Last time, we took a look at

`reduce`

in Standard ML, where it is known as a

fold. Now, we'll translate those folds into Python. Let's keep calling them folds, reserving reduce for describing the existing Python function.

The main challenge in the translation process is that Python doesn't have the right data types—Python's lists are actually arrays, not (linked) lists. As a first step, introduce two classes

`Cons`

and

`Nil`

to define linked lists:

class Cons(object):

def __init__(self, head, tail):

self.head = head

self.tail = tail

def __iter__(self):

lnkLst = self

while True:

if lnkLst.isNull():

break

yield lnkLst.head

lnkLst = lnkLst.tail

def __repr__(self):

return "Cons(%s, %s)" % (repr(self.head), repr(self.tail))

def isNull(self):

return False

class Nil(object):

def __repr__(self):

return "Nil()"

def __iter__(self):

if False:

yield

def isNull(self):

return True

With these classes, we can construct lists in a Lisp-like fashion:

Cons(0, Cons(1, Cons(2, Cons(3, Cons(4, Nil())))))

For convenience, we'll also introduce a function to convert a Python list into a linked list:

def fromList(lst):

lnkLst = Nil()

for x in lst[::-1]:

lnkLst = Cons(x, lnkLst)

return lnkLst

Using

`fromList(range(5))`

is a lot easier than nesting

`Cons`

repeatedly, as shown above.

With linked lists available, it is easy to define a (left) fold:

def fold_recur(function, init, linkedList):

if linkedList.isNull():

result = init

else:

acc = function(init, linkedList.head)

result = fold_recur(function, acc, linkedList.tail)

return result

I'll note two important differences between the

SML folds and

`fold_recur`

. First, the arguments to the

`function`

taken by

`fold_recur`

are reversed from those taken by its SML counterpart, in order to match Python's

`reduce`

. Second, and far more importantly,

`fold_recur`

is terribly broken because it is a

tail recursive function, but only has Python's broken support for recursion to draw on. Trying to use

`fold_recur`

will cause stack overflows when the list is too long—that is, containing 1000 elements with the default recursion limit

^{1}.

To rectify this problem, we need to transform

`fold_recur`

by hand into an imperative version. Fortunately,

`fold_recur`

is constructed in an iterative form, so the transformation is easy:

def fold_imper(function, init, linkedList):

acc = init

while not linkedList.isNull():

acc = function(acc, linkedList.head)

linkedList = linkedList.tail

return acc

With the imperative definition, our fold will not cause a stack overflow.

Since we've defined linked lists to be iterable, we can still make our fold a bit more elegant:

def fold_ll_iter(function, init, linkedList):

acc = init

for x in linkedList:

acc = function(acc, x)

return acc

Although

`fold_ll_iter`

was obtained by iterating over a linked list, it will work with any iterable. Rewrite it to make that clearer:

def fold_iter(function, init, iterable):

acc = init

for x in iterable:

acc = function(acc, x)

return acc

Through this process, the deconstruction of a data structure has been shifted from SML fold functions to Python iterators. This is unsurprising: in Python, it is quite standard to use iterators to deconstruct data structures. It is not completely equivalent, but covers many cases.

The iterator approach suggests another modification of fold, which is normally called

scan:

def scan(function, init, iterable):

acc = init

yield acc

for x in iterable:

acc = function(acc, x)

yield acc

Our

`scan`

function processes the list in the same sequence, yielding values accumulated at each step along the way. Using

`scan`

, we define one last version of

`fold`

:

def fold(function, init, iterable):

for x in scan(function, init, iterable): pass

return x

We can use

`scan`

and

`fold`

to define many useful functions in a simple manner. For example:

import operator

def sum(numbers): return fold(operator.add, 0, numbers)

def cumsum(numbers): return scan(operator.add, 0, numbers)

def product(numbers): return fold(operator.mul, 1, numbers)

def cumproduct(numbers): return scan(operator.mul, 1, numbers)

I think these functions illustrate rather well the difference between

`fold`

and

`scan`

. In

`sum`

, we use

`fold`

to total up the numbers in a list, while in

`cumsum`

we use

`scan`

to generate the partial sums as we proceed through the list. The

`product`

and

`cumproduct`

functions are similar.

More complex functions are possible as well. Consider finding the three smallest items in a list. This can be done in linear time by iterating through the list and keeping track of the three smallest as we proceed. But that's exactly what

`fold`

is for! The accumulator is a collection of the three smallest items, and we just need a function to update that. Putting that together:

def min3(lst):

def comp(acc, x):

a,b,c = acc

if x < a:

result = x,a,b

elif x < b:

result = a,x,b

elif x < c:

result = a,b,x

else:

result = acc

return result

init = sorted(lst[:3])

return fold(comp, init, lst[3:])

For simplicity, I've assumed that

`min3`

takes a list, but it could easily be modified to work with any iterable.

I've presented a lot of program code in this post, transforming a direct translation of the SML folds into function that are defined in a more natural way for Python. Next time, I'll digress briefly to present an alternative structure that has some practical advantages over what's been shown here, then return to examining why

`reduce`

doesn't see much use in Python.

^{1} In practice, this might as well read that the limit is 1000 elements, end of story. The limit can be increased with

`sys.setrecursionlimit`

, but "

a too-high limit can lead to a crash."