Thursday, January 8, 2009

W3R-6: Dualing Reduce

By examining actual programs, we saw that reduce is the wrong control abstraction in many cases. By starting from programs instead of starting with the abstractions, we saw that app is a more natural control abstraction for Python. The comparatively few cases where reduce is right were handled by introducing an adaptor Accumulator, with app and Accumulator being sufficient to define reduce.

Unfortunately, the approach of defining an adaptor to get app to do what we want leaves out an extremely common way that for loops get used. Consider defining map in terms of the lazy imap1. It looks something like:
def map_lazy(function, iterable):
return list(imap(function, iterable))

def imap(function, iterable):
for x in iterable:
yield function(x)
All the real work is now done in imap. However, imap cannot be written using app, since there is no way to pass in a function that will yield values we can use2.

In imap, both for and yield express control flow, distinct from that given by app. Similarly, consider defining filter using an ifilter:
def ifilter(predicate, iterable):
for x in iterable:
if predicate(x):
yield x
In ifilter, we have another pattern of control flow, regulated with an if statement. Thus, we need an additional higher order procedures if we want to abstract over these patterns. It should be, informally, a dual to app, creating an iterator instead of consuming one.

I don't know of a standard name for the dual procedure—it's a generator (probably confusing to call it that) and has a great deal of similarity to unfolds. At the risk of some confusion, I'll go with the short name of gen. A possibility for writing it is:
def gen(init, update, shouldEmit, emit):
state = init
while True:
if shouldEmit(state):
yield emit(state)
update(state)
Note that I've included no explicit treatment of the internal state, so gen can in principle handle any sort of internal state, be it an iterator or otherwise.

I don't think I've seen anything like gen in Python code. Instead, generator functions are written out explicitly on a case-by-case basis. Looking at how we'd implement imap with gen suggests why:
def imap_gen(function, iterable):
def update(state):
it = state[2]
state[0] = it.next()
state[1] = True
def emit(state): return function(state[0])
def shouldEmit(state): return state[1]
init = [None, False, iter(iterable)]
return gen(init, update, shouldEmit, emit)
There is quite a lot of effort needed to convert iterable into something that we can manage with gen. A relatively simple class could be given to initialize and update the iterator state, paralleling the approach of app plus an adaptor.

Perhaps there is an alternative to gen that would be more natural for iterators as the internal state, while also cleanly handling non-iterable states. However, I'm not going to pursue the matter further, as gen demonstrates how to encode another pattern of control complementary to that encoded by app (or reduce). Taken together, app and gen can accomplish a great deal, creating an iterator with gen and processing its outputs using app.






1 Arguably, we don't need map at all when we have imap. This is the case in Python 3, where map returns an iterator.

2 This is a limitation of Python's coroutines. More general (and more complicated) routines, such as those in Lua, do allow such an implementation.

No comments: