Sunday, January 4, 2009

W3R-4: What's Wrong With Reduce.

We've used fold—our model of reduce—to define sum, product, and a function min3 that finds the three smallest elements in a list. We have not yet used fold to define map, which was one of the functions we used to motivate fold in the first place. Let's do that now1.

With linked lists, it is easy:
def reverse(lnkLst):
def app(acc, x): return Cons(x, acc)
return fold(app, Nil(), lnkLst)

def map_ll(function, lnkLst):
def app(acc, x):
return Cons(function(x), acc)
return reverse(fold(app, Nil(), lnkLst))
I've introduced a reverse function to preserve the order, as we didn't bother to define a right-to-left foldr, just the left-to-right fold that parallels Pythons reduce. Otherwise, the definition is straightforward. The accumulating function needs to create a new linked list by transforming values from the source list. This is a perfect match for fold.

The same idea can be applied to Python's lists (actually arrays):
def map(function, lst):
def app(acc, x):
return acc + [function(x)]
return fold(app, [], lst)
However, this is quite poor. This implementation of map creates a new list at each step, and thus performs very badly—O(N2) time, where N is the length of the list, instead of the O(N) that should be taken. Even though fold—i.e., reduce—is applicable to any iterable, that does not mean it has captured a useful pattern for every iterable!

So what went wrong? Folds are a good way to break down data structures, which led us to iterators, the standard way to take apart data structures in Python. That worked fine, and we really have no reason to expect it to suddenly fail. However, we have introduced something new: we're also using reduce to build up a new data structure—specifically, a new list. This works fine with the functional link list but fails badly with the imperative arrays.

To see this, let's expand the fold out again, using fold_iter (introduced earlier) to do it:
def map_iter(function, lst):
acc = []
for x in iterable:
acc = acc + [function(x)]
return acc
The final form is just not how it should be done. In other words, fold is an abstraction over the wrong pattern!

What has happened is that we've gone about things entirely backwards. In a functional programming language like SML, folds are useful. That usefulness was (apparently) the basis for thinking reduce should be useful in Python. However, folds are useful in functional programming languages because they capture a particular pattern of recursion that is common in functional programming. That pattern of recursion is not common in Python, so we have no reason to conclude that reduce will be useful2.

1 For simplicity, I'll only allow a single list to be given as an argument to map, unlike the Python built-in function.

2Nor may we conclude on logical grounds that reduce will not be useful. Experience suggests it is not particularly so, but we don't know if that is something fundamental about reduce or something incidental, such as the documentation.

No comments: