Showing posts with label Standard ML. Show all posts
Showing posts with label Standard ML. Show all posts

Friday, January 2, 2009

W3R-2: Importing Reduce Into Python

Last time, we took a look at reduce in Standard ML, where it is known as a fold. Now, we'll translate those folds into Python. Let's keep calling them folds, reserving reduce for describing the existing Python function.

The main challenge in the translation process is that Python doesn't have the right data types—Python's lists are actually arrays, not (linked) lists. As a first step, introduce two classes Cons and Nil to define linked lists:
class Cons(object):

def __init__(self, head, tail):
self.head = head
self.tail = tail

def __iter__(self):
lnkLst = self
while True:
if lnkLst.isNull():
break
yield lnkLst.head
lnkLst = lnkLst.tail

def __repr__(self):
return "Cons(%s, %s)" % (repr(self.head), repr(self.tail))

def isNull(self):
return False

class Nil(object):

def __repr__(self):
return "Nil()"

def __iter__(self):
if False:
yield

def isNull(self):
return True
With these classes, we can construct lists in a Lisp-like fashion:
Cons(0, Cons(1, Cons(2, Cons(3, Cons(4, Nil())))))
For convenience, we'll also introduce a function to convert a Python list into a linked list:
def fromList(lst):
lnkLst = Nil()
for x in lst[::-1]:
lnkLst = Cons(x, lnkLst)
return lnkLst
Using fromList(range(5)) is a lot easier than nesting Cons repeatedly, as shown above.

With linked lists available, it is easy to define a (left) fold:
def fold_recur(function, init, linkedList):
if linkedList.isNull():
result = init
else:
acc = function(init, linkedList.head)
result = fold_recur(function, acc, linkedList.tail)
return result
I'll note two important differences between the SML folds and fold_recur. First, the arguments to the function taken by fold_recur are reversed from those taken by its SML counterpart, in order to match Python's reduce. Second, and far more importantly, fold_recur is terribly broken because it is a tail recursive function, but only has Python's broken support for recursion to draw on. Trying to use fold_recur will cause stack overflows when the list is too long—that is, containing 1000 elements with the default recursion limit1.

To rectify this problem, we need to transform fold_recur by hand into an imperative version. Fortunately, fold_recur is constructed in an iterative form, so the transformation is easy:
def fold_imper(function, init, linkedList):
acc = init
while not linkedList.isNull():
acc = function(acc, linkedList.head)
linkedList = linkedList.tail
return acc
With the imperative definition, our fold will not cause a stack overflow.

Since we've defined linked lists to be iterable, we can still make our fold a bit more elegant:
def fold_ll_iter(function, init, linkedList):
acc = init
for x in linkedList:
acc = function(acc, x)
return acc
Although fold_ll_iter was obtained by iterating over a linked list, it will work with any iterable. Rewrite it to make that clearer:
def fold_iter(function, init, iterable):
acc = init
for x in iterable:
acc = function(acc, x)
return acc
Through this process, the deconstruction of a data structure has been shifted from SML fold functions to Python iterators. This is unsurprising: in Python, it is quite standard to use iterators to deconstruct data structures. It is not completely equivalent, but covers many cases.

The iterator approach suggests another modification of fold, which is normally called scan:
def scan(function, init, iterable):
acc = init
yield acc
for x in iterable:
acc = function(acc, x)
yield acc

Our scan function processes the list in the same sequence, yielding values accumulated at each step along the way. Using scan, we define one last version of fold:
def fold(function, init, iterable):
for x in scan(function, init, iterable): pass
return x


We can use scan and fold to define many useful functions in a simple manner. For example:
import operator
def sum(numbers): return fold(operator.add, 0, numbers)
def cumsum(numbers): return scan(operator.add, 0, numbers)
def product(numbers): return fold(operator.mul, 1, numbers)
def cumproduct(numbers): return scan(operator.mul, 1, numbers)
I think these functions illustrate rather well the difference between fold and scan. In sum, we use fold to total up the numbers in a list, while in cumsum we use scan to generate the partial sums as we proceed through the list. The product and cumproduct functions are similar.

More complex functions are possible as well. Consider finding the three smallest items in a list. This can be done in linear time by iterating through the list and keeping track of the three smallest as we proceed. But that's exactly what fold is for! The accumulator is a collection of the three smallest items, and we just need a function to update that. Putting that together:
def min3(lst):
def comp(acc, x):
a,b,c = acc
if x < a:
result = x,a,b
elif x < b:
result = a,x,b
elif x < c:
result = a,b,x
else:
result = acc
return result
init = sorted(lst[:3])
return fold(comp, init, lst[3:])
For simplicity, I've assumed that min3 takes a list, but it could easily be modified to work with any iterable.

I've presented a lot of program code in this post, transforming a direct translation of the SML folds into function that are defined in a more natural way for Python. Next time, I'll digress briefly to present an alternative structure that has some practical advantages over what's been shown here, then return to examining why reduce doesn't see much use in Python.




1 In practice, this might as well read that the limit is 1000 elements, end of story. The limit can be increased with sys.setrecursionlimit, but "a too-high limit can lead to a crash."

Thursday, January 1, 2009

W3R-1: Origins of Reduce

To begin our investigation of reduce in Python, we need to come to grips with what reduce actually is. I find that the documentation for reduce is rather poor, requiring a fair amount of thinking to determine just what you need to pass in as arguments. Instead, let's examine the origins of reduce to better understand it.

Reduce originates outside of Python. It comes to Python through a contributed patch:
About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches.

The fate of reduce() in Python 3000
Guido van Rossum, 2005
That's not very helpful, so let's derive reduce by abstracting from some examples. Instead of Lisp, I'll use Standard ML. SML has two key advantages over Lisp: the syntax is more approachable for the uninitiated, and its robust (Hindley-Milner) type system and pattern matching will make things a little clearer. I'll follow the definitions in SML rather than Python, so we'll refer to reduce-like functions as folds, with argument order matching those in the Standard ML Basis Library.

Consider adding up a list of integers. In SML, that list is a linked list, so we'll have a typical recursive solution:
fun sum [] = 0
| sum (h::t) = h + sum t
I've made use of pattern matching to define sum in terms of two cases, that of an empty list and of a non-empty list. The double colon is a cons operator, joining the head h and tail t of the list.

To define map, a similar recursion scheme is used:
fun map f [] = []
| map f (h::t) = (f h) :: (map f t)
Map is written as a curried function, so different list processors can be defined by specifying just the function map f, only giving the list later.1

Just about any function over a list uses the same recursion scheme. Let's extract that recursions scheme into a higher order function called foldr:
fun foldr f acc [] = acc
| foldr f acc (h::t) = f(h, (foldr f acc t))
Foldr deconstructs a list from right to left, starting with the tail and accumulating the values. As with map, foldr is a curried function.

The type of foldr can make it easier to understand:
('a * 'b -> 'b) -> 'b -> 'a list -> 'b
The first term, in the parentheses, describes the function f; it takes a list element and an accumulator, returning an updated accumulator. The second argument to foldr is an initial value for the accumulator. Third is the list to process. Importantly, the types of the accumulator and the list elements can differ.

We can use foldr to define sum and map. Here's sum:
val sum = foldr Int.+ 0
For map, we'll introduce a helper function and use foldr on that:
fun map f lst = let
fun maphead(x, acc) = (f x) :: acc
in
foldr maphead [] lst
end
To get the helper function maphead right, I just looked at the types needed for foldr and wrote the obvious function to satisfy those types.

While foldr allows elegant definition of our two example functions, there is a problem. As written, foldr walks through the entire list before starting to do any work. If the list is long, this can cause a stack overflow. We need a tail-recursive version, leading us naturally to foldl:
fun foldl f acc [] = acc
| foldl f acc (h::t) = foldl f (f(h, acc)) t
In contrast to foldr, foldl processes the list from left to right, starting with the head. This definition of foldl is tail recursive, so operates in constant space. It can also be used to define a tail recursive foldr by reversing the list before passing it to foldl.

Foldl is substantially similar to reduce in Python. Next time, I'll translate the above definitions to Python, to make the connection transparent.




1 Having a curried map in SML makes for a big conceptual difference from the map in Python. It's quite reasonable to view map as transforming a function from acting on values to a function acting on lists of values, without map itself actually having anything to do with the list.