Applied Abstraction: 2009

Friday, November 6, 2009

Replacement for SubEthaEdit's Command Line Tool

The see command line tool for SubEthaEdit makes scripting needlessly complex. Because it tries to write the contents of the document to stdout upon close of the document, you wind up having to jump through hoops to get sensible behavior. The end result is that it is easy to write an AppleScript where SubEthaEdit calls out to the shell, but hard to have the shell communicate back to SubEthaEdit.

Here, I present an alternative. It is a shell script that uses osascript to open a document in SubEthaEdit. Optionally, a specific line can be given, and, if UI scripting is enabled, the document will be scrolled to show the line.

#!/bin/sh

PROGRAM=$(basename $0)

usage()
{
    echo "Usage: $PROGRAM [-gh] filename"
}

lineGiven=false
while getopts :g:h opt
do
    case $opt in
    g)      lineGiven=true
            lineToShow="$OPTARG"
            ;;
    h)      usage
            exit 0
            ;;
    '?')    echo "$PROGRAM: invalid option -$OPTARG" >&2
            usage >&2
            exit 1
            ;;
    esac
done

shift $((OPTIND - 1))

fileName="$1"

if [ -f "$fileName" ]
then
    Dir="$(dirname "$fileName")"
    Base="$(basename "$fileName")"
    AbsDir="$(cd "$Dir" && pwd -P)"
    AbsPath="$AbsDir/$Base"
else
    echo "Unknown file: $fileName" >&2
    exit 1
fi

echo $AbsPath

/usr/bin/osascript > /dev/null <<ASCPT
    set fileToOpen to POSIX file "$AbsPath"

    if $lineGiven then
        showLine of fileToOpen at ${lineToShow:-0}
    else
        tell application "SubEthaEdit"
            activate
            open fileToOpen
        end tell
    end if

    to showLine of fileToOpen at lineNumber
        tell application "SubEthaEdit"
            activate
            set activeFile to open fileToOpen
            tell activeFile
                set selection to paragraph lineNumber
            end tell
        end tell
        scrollToVisible()
    end showLine

    to scrollToVisible()
        tell application "System Events"
            if UI elements enabled then
                tell process "SubEthaEdit"
                    tell menu bar 1
                        tell menu bar item "Find"
                            tell menu 1
                                click menu item "Jump to Selection"
                            end tell
                        end tell
                    end tell
                end tell
            end if
        end tell
    end scrollToVisible
ASCPT

Monday, August 24, 2009

The Hard Part of Getting Started With Scala

In an interesting interview, Martin Odersky notes:

Clearly it is easiest for a Java or .NET developer to learn Scala. For other communities, the stumbling blocks don't have so much to do with the language itself as with the way we package it and the way the tools are set up, which is Java specific. Once they learn how these things are set up, it should not be hard to learn the language itself.

I'd already come to this conclusion on my own, as someone who has never used Java and is trying to guess at the Java conventions needed for Scala. I think this is typical of newer languages: you can't easily learn them without knowing a specific other language already.

Saturday, May 9, 2009

Hidden Setting for Plaxo

I use Plaxo, mainly for backing up my address book. Plaxo also lets you connect a number of online services to your Plaxo profile and share updates with connected acquaintances, providing a sort of ad hoc online social network.

And Plaxo nags you every time you go to your profile page, asking you to update your status by entering a little text into a field. There doesn't seem to be any way to turn off the status report feature, so you also don't have any way to get it to quit nagging you. But I've found the hidden setting that lets the people linked to you know that you're not using the status reports, and, more importantly, gets Plaxo to quit nagging you.

All you have to do is enter the secret text into the status report field. The secret text reads: "is not providing status reports."

Friday, May 1, 2009

"Hello World" for Numerics

I'm taking a serious look at Scala, and liking what I see very much. It struck me that I have my own version of the standard Hello World program. I recognize, of course, that getting a program to print "Hello World" is not about testing I/O, but instead seeing that you can get a program running at all. Nonetheless, the first test I always make is a different one, that tells me something vital.

I do this:

[~ 479] scala
Welcome to Scala version 2.7.3.final (Java HotSpot(TM) Client VM, Java 1.5.0_16).
Type in expressions to have them evaluated.
Type :help for more information.

scala> 1 + 2*3
res0: Int = 7

If 1+2*3 returns a value other than 7, I automatically reject the language. For completeness, I also check 2*3+1. What programming I do is normally math-intensive enough that I refuse to deal with languages that can't handle normal precedence rules.

Prefix languages like the Lisp family pass this test, as they won't return a value. I've not worked with postfix languages like Forth, but I think they'd pass, too.

Friday, April 10, 2009

Experimenting with Implementing Automator Actions

I've recently experimented with implementing actions for Automator using AppleScript. Overall, it was an interesting experience, and one I'd definitely recommend trying yourself.

Creating an action is generally quite simple, consisting of three main steps:

Laying out controls in Interface Builder and establishing bindings to an NSObjectController,

Defining the properties of the action by filling out some values in plists,

Writing an AppleScript to actually implement the behavior.

I found Apple's documentation and an article by Matt Neuburg from the now-defunct O'Reilly Mac developer center. The latter article is a nice tutorial, but a little dated in spots; some caution is warranted.

As a specific example, I decided to take some of the AppleScript handlers for SubEthaEdit I already had on hand and convert them into actions. After reviewing the possibilities, I decided that there were just two actions that looked sensible: getting text from the front SubEthaEdit document and setting text in the front SubEthaEdit document. These are quite similar to existing actions for TextEdit. Additionally, I decided to expand them a little bit to also allow getting or setting the selection or selected lines in the document, not just the full contents of the document.

The first step was pretty simple. With well-defined, compact actions like the ones I considered, the interface requires little thought and little effort. In my case, it was just a matter of populating the menu entries for a pop-up button and establishing a binding.

To my surprise, I spent the most time and had the most difficulty with the second step. I'd expected this to be easy, but found that there were quite a few details that weren't made especially clear in the description of the keys for the property lists. The analogous actions for TextEdit proved very helpful, here: in several cases, I determined the correct choices by examining the plists from the bundles for the TextEdit actions. I'd expect this step to get easier with practice.

Actually implementing the behavior was especially easy, in this case. As I already had working handlers, I was able to just focus on connecting the action to the handlers. The structure for this is quite pleasant: a record of parameters is injected into the run handler, containing values obtained from the bindings to the NSObjectController. The resulting structure for my actions was then just using if-then statements to interpret the parameters, calling out to different handlers.

Although implementation of the behavior was easy, I did find testing to be quite tedious. It was helpful to save a workflow for testing, as suggested by Neuburg. More importantly than that, I'd strongly recommend that you have working handlers before trying to make them part of your action—test out the handlers in, e.g., Script Editor, and test that your Automator action dispatches to the correct handler.

The actions I created are available, as are the sources.

Wednesday, February 18, 2009

Setting up SkyBlueCanvas for Local Use on Mac OS X Leopard

SkyBlueCanvas (SBC) is a content management system (CMS) for web sites. It can be set up to run on a Mac OS X system using Web Sharing, but some steps need to be taken first. The result is a subfolder of your Sites folder, which shows off pretty much the full features of SBC. Unfortunately, you won't get to this point just by naively following the installation instructions for SBC, because it depends on features not enabled by default in Mac OS X.

Note that I'm not making any recommendations as to whether or not you should use SBC, but am instead just sharing what I had to do to get SBC working. This was driven by my own interest in trying out SBC. I have insufficient experience with CMSs in general or SBC in particular to make an informed judgement.

Several of the steps are multi-step themselves. I've just linked to the websites I found that provided a useful description. Total time to set things up should be under half an hour. After the initial set-up, starting a new site can be done by starting with step four, taking under five minutes.

Enable PHP.

Make sure that your Sites folder is reachable.

Enable .htaccess files.

Copy the SkyBlueCanvas folder to your Sites folder, and give it a reasonable name (I used skyblue, and will assume you do the same).

Change the file permissions of the skyblue folder to 755. You can do this in the Terminal with a command like chmod -R 755 ~username/Site/skyblue—replace username with your actual user name, of course. (I didn't actually find this to be necessary, although it is in the SBC installation instructions.)

Edit htaccess.txt in the skyblue folder, adding a second line of RewriteBase /~username/skyblue/. Rename htaccess.txt to .htaccess.

Change the group and owner for the SBC files. In Terminal, change to the skyblue directory, and run sudo chown -R www:www *. If you don't do this, PHP won't be able to make needed changes to the files.

Start Apache. Open up the Sharing panel in the System Preferences, and check Web Sharing.

Point your web browser at http://localhost/~username/skyblue/ and follow the instructions. When asked for the site URL, give http://localhost/~username/skyblue/.

Saturday, January 17, 2009

W3R-7: Summary and Assessment

In the What's Wrong With Reduce series, I've attempted to resolve contrasting views of the usefulness of higher order function called reduce or fold. By examining the origins of reduce, we saw that it is an abstraction over a particular recursive pattern of control flow. Further, reduce can be readily defined in Python.

However, it is easy to find cases where reduce works well for a functional programming language but poorly in Python. In essence, reduce is an abstraction over a common pattern for functional programming languages that is uncommon in imperative programming languages like Python. Instead, a much narrower selection of Python procedures map well onto the reduce control abstraction.

By examining control flow patterns actually used in Python, a more appropriate higher order procedure¹, app, was identified as an alternative to reduce. Several common control flow patterns cannot be captured by app, so a dual gen procedure was also defined.

Between app and gen, we can define many functions to process or create iterators. However, an important question remains: should we use app or gen? I'd have to say that it isn't worthwhile. Both app and gen are abstractions over imperative patterns, depending crucially on side effects. As a consequence, it is of equal importance to understand the order in which, say, app processes the elements of an iterator. Thus, app fails to provide any conceptual benefit over just using a for loop—in contrast, functional programming with reduce, map, and other higher order functions lets you ignore the order in which things are done.

That said, I use and define higher order procedures all the time in Python. I just define them as (generator) functions using imperative constructs like for or while loops. Also, like the itertools module, my higher order procedures typically take iterators as arguments or return them as results. So long as iterators only need to be used once, this looks a lot like functional programming. Really, it is closer to dataflow programming.

Having an effective alternative to reduce is relatively unimportant for this iterator-intensive programming style. Far more useful would be higher order procedures that simplify composing multiple procedures taking and returning iterators. The overall result should be more like constructing a shell pipeline instead of the extensive nesting of procedure calls currently seen.

¹ For clarity, I distinguish here between higher order functions and higher order procedures. Functions should really be mathematical functions, i.e., no side effects, while procedures are more general.

Thursday, January 8, 2009

W3R-6: Dualing Reduce

By examining actual programs, we saw that reduce is the wrong control abstraction in many cases. By starting from programs instead of starting with the abstractions, we saw that app is a more natural control abstraction for Python. The comparatively few cases where reduce is right were handled by introducing an adaptor Accumulator, with app and Accumulator being sufficient to define reduce.

Unfortunately, the approach of defining an adaptor to get app to do what we want leaves out an extremely common way that for loops get used. Consider defining map in terms of the lazy imap¹. It looks something like:

def map_lazy(function, iterable):
    return list(imap(function, iterable))

def imap(function, iterable):
    for x in iterable:
        yield function(x)

All the real work is now done in imap. However, imap cannot be written using app, since there is no way to pass in a function that will yield values we can use².

In imap, both for and yield express control flow, distinct from that given by app. Similarly, consider defining filter using an ifilter:

def ifilter(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x

In ifilter, we have another pattern of control flow, regulated with an if statement. Thus, we need an additional higher order procedures if we want to abstract over these patterns. It should be, informally, a dual to app, creating an iterator instead of consuming one.

I don't know of a standard name for the dual procedure—it's a generator (probably confusing to call it that) and has a great deal of similarity to unfolds. At the risk of some confusion, I'll go with the short name of gen. A possibility for writing it is:

def gen(init, update, shouldEmit, emit):
    state = init
    while True:
        if shouldEmit(state):
            yield emit(state)
        update(state)

Note that I've included no explicit treatment of the internal state, so gen can in principle handle any sort of internal state, be it an iterator or otherwise.

I don't think I've seen anything like gen in Python code. Instead, generator functions are written out explicitly on a case-by-case basis. Looking at how we'd implement imap with gen suggests why:

def imap_gen(function, iterable):
    def update(state):
        it = state[2]
        state[0] = it.next()
        state[1] = True
    def emit(state): return function(state[0])
    def shouldEmit(state): return state[1]
    init = [None, False, iter(iterable)]
    return gen(init, update, shouldEmit, emit)

There is quite a lot of effort needed to convert iterable into something that we can manage with gen. A relatively simple class could be given to initialize and update the iterator state, paralleling the approach of app plus an adaptor.

Perhaps there is an alternative to gen that would be more natural for iterators as the internal state, while also cleanly handling non-iterable states. However, I'm not going to pursue the matter further, as gen demonstrates how to encode another pattern of control complementary to that encoded by app (or reduce). Taken together, app and gen can accomplish a great deal, creating an iterator with gen and processing its outputs using app.

¹ Arguably, we don't need map at all when we have imap. This is the case in Python 3, where map returns an iterator.

² This is a limitation of Python's coroutines. More general (and more complicated) routines, such as those in Lua, do allow such an implementation.

Tuesday, January 6, 2009

W3R-5: Cart Follows Horse

Higher order functions can be used to implement control abstractions, but that doesn't mean all higher order functions provide useful control abstractions. As we've seen, a useful control abstraction in one language may not work so well in another. In SML, folds are frequently used, while the analogous reduce is uncommon in Python. In retrospect, this doesn't seem so surprising: SML is principally a functional programming language depending on declarative (i.e., recursive) control flow, while Python is a hybrid procedural and object-oriented programming depending on imperative control flow. Where the two languages overlap is where reduce becomes useful, but that overlap is relatively small.

If reduce abstracts over the wrong pattern of control flow, then what are the correct patterns for Python? Let's return to map and see what we can conclude. An imperative definition of map is natural, along the lines of:

def map_imper(function, iterable):
    acc = []
    for x in iterable:
        acc.append(function(x))
    return acc

This version of map utilizes a typical imperative pattern of control, consisting of mutating an object based on values encountered as we iterate though a collection using a for loop. Python has no higher order procedure capturing this imperative pattern; there is such a higher order procedure for Standard ML, where it is called app.

Let's define app for Python. It's easy:

def app(mutate, iterable):
    for x in iterable:
        mutate(x)

Note that app has no useful return value, always returning None, emphasizing that it is always called for its side effects. Using app to define map, we obtain

def map_app(function, iterable):
    acc = []
    def mutate(x): acc.append(function(x))
    app(mutate, iterable)
    return acc

The definition of map_app is no simpler than that of map_imper, but does illustrate an appropriate control abstraction.

Although not needed for map, several generalizations of app could be made. One possibility would be to include optional arguments in order to allow slicing of the iterable:

import itertools
def app(mutate, iterable, start=None, stop=None, step=None):
    if start is not None or stop is not None or step is not None:
        iterable = itertools.islice(iterable, start, stop, step)
    for x in iterable:
        mutate(x)

Many other variations are possible. I'll not suggest which is correct, as that is a question best addressed by empirical analysis of existing code bases.

Now let's reconsider our other motivating example, sum. An imperative implementation is quite similar to that for map:

def sum_imper(numbers):
    acc = 0
    for x in numbers:
        acc += x
    return x

This is the same pattern of control as used in map, mutation within a for loop, but it cannot be expressed using app. In sum, the mutation is that of the variable acc. We can't pass in a variable to app and expect it to be modified, as only the local variable would be changed. That is, app mutates values, while sum mutates a variable.

I can think of two solutions. First, we could introduce another higher order procedure to abstract over what happens in sum. That procedure—function, in this case—is just reduce. Second, we could adapt the value from an immutable number to some mutable object. Which to choose is pretty much a matter of taste. Having two substantially similar control flow mechanisms may be undesirable, but reduce is also a known quantity. The adaptor approach is not as well explored, but may turn out to be more natural. Let's take a look.

When combined with app, the adaptor needs to produce the same effect as reduce. Since app is handling the control flow, the adaptor just needs to handle the accumulation. One possible definition is to use an Accumulator class, defined as:

class Accumulator(object):
    def __init__(self, function, init):
        self.function = function
        self.value = init

    def update(self, x):
        self.value = self.function(self.value, x)

As an example of usage, here is an accumulator for summation:

>>> acc_sum = Accumulator(operator.add, 0)
>>> acc_sum.update(4)
>>> acc_sum.value
4
>>> acc_sum.update(37)
>>> acc_sum.value
41

Putting together an Accumulator instance and app, we can give another definition of sum:

def sum_adapt(numbers):
    acc = Accumulator(operator.add, 0)
    app(acc.update, numbers)
    return acc.value

As a second example, here is yet another definition of a fold:

def fold_app(function, init, iterable):
    acc = Accumulator(function, init)
    app(acc.update, iterable)
    return acc.value

Personally, I find these definitions to be quite natural and appealing; app and Accumulator would fit in well with my programming style.

Sunday, January 4, 2009

W3R-4: What's Wrong With Reduce.

We've used fold—our model of reduce—to define sum, product, and a function min3 that finds the three smallest elements in a list. We have not yet used fold to define map, which was one of the functions we used to motivate fold in the first place. Let's do that now¹.

With linked lists, it is easy:

def reverse(lnkLst):
    def app(acc, x): return Cons(x, acc)
    return fold(app, Nil(), lnkLst)

def map_ll(function, lnkLst):
    def app(acc, x):
        return Cons(function(x), acc)
    return reverse(fold(app, Nil(), lnkLst))

I've introduced a reverse function to preserve the order, as we didn't bother to define a right-to-left foldr, just the left-to-right fold that parallels Pythons reduce. Otherwise, the definition is straightforward. The accumulating function needs to create a new linked list by transforming values from the source list. This is a perfect match for fold.

The same idea can be applied to Python's lists (actually arrays):

def map(function, lst):
    def app(acc, x):
        return acc + [function(x)]
    return fold(app, [], lst)

However, this is quite poor. This implementation of map creates a new list at each step, and thus performs very badly—O(N²) time, where N is the length of the list, instead of the O(N) that should be taken. Even though fold—i.e., reduce—is applicable to any iterable, that does not mean it has captured a useful pattern for every iterable!

So what went wrong? Folds are a good way to break down data structures, which led us to iterators, the standard way to take apart data structures in Python. That worked fine, and we really have no reason to expect it to suddenly fail. However, we have introduced something new: we're also using reduce to build up a new data structure—specifically, a new list. This works fine with the functional link list but fails badly with the imperative arrays.

To see this, let's expand the fold out again, using fold_iter (introduced earlier) to do it:

def map_iter(function, lst):
    acc = []
    for x in iterable:
        acc = acc + [function(x)]
    return acc

The final form is just not how it should be done. In other words, fold is an abstraction over the wrong pattern!

What has happened is that we've gone about things entirely backwards. In a functional programming language like SML, folds are useful. That usefulness was (apparently) the basis for thinking reduce should be useful in Python. However, folds are useful in functional programming languages because they capture a particular pattern of recursion that is common in functional programming. That pattern of recursion is not common in Python, so we have no reason to conclude that reduce will be useful².

¹ For simplicity, I'll only allow a single list to be given as an argument to map, unlike the Python built-in function.

²Nor may we conclude on logical grounds that reduce will not be useful. Experience suggests it is not particularly so, but we don't know if that is something fundamental about reduce or something incidental, such as the documentation.

W3R-3: Object-Oriented Reduce

We now have worked through deriving reduce in Python. In doing so, we have defined two closely related functions, fold¹ and scan. Since they're so similar, it makes sense to briefly digress and consider bundling them together into a class. I won't make much use of the class in future installments of this series, but it is an interesting exercise to consider alternative definitions that might be a better fit to Python.

Define:

class Fold(object):

    def __init__(self, function, init):
        self.function = function
        self.init = init

    def __call__(self, acc, x):
        return self.function(acc, x)

    def scan(self, iterable):
        function = self.function
        acc = self.init
        yield acc
        for x in iterable:
            acc = function(acc, x)
            yield acc

    def fold(self, iterable):
        for x in self.scan(iterable): pass
        return x

The Fold class is defined to take the function and initial value for the accumulator when the class is instantiated. This gives us some of the benefits we saw for the curried SML functions.

Putting the Fold class to use, we can define the sum and cumsum functions that we saw last time:

sum = Fold(operator.add, 0).fold
cumsum = Fold(operator.add, 0).scan

Perhaps more useful is to define something like<

add = Fold(operator.add, 0)

and then using add.scan and add.fold as needed.

Indeed, any function of two arguments could be given this treatment, so long as a sensible default (or identity) value can be given. Note that the min3 function from last time is not well suited to this sort of treatment, because it has no clear default value. However, the Fold class presents no additional difficulties over separate fold and scan functions.

¹ As a reminder, fold is another name for reduce. We'll continue to reserve reduce as the name of the Python built-in function.

Friday, January 2, 2009

W3R-2: Importing Reduce Into Python

Last time, we took a look at reduce in Standard ML, where it is known as a fold. Now, we'll translate those folds into Python. Let's keep calling them folds, reserving reduce for describing the existing Python function.

The main challenge in the translation process is that Python doesn't have the right data types—Python's lists are actually arrays, not (linked) lists. As a first step, introduce two classes Cons and Nil to define linked lists:

class Cons(object):

    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

    def __iter__(self):
        lnkLst = self
        while True:
            if lnkLst.isNull():
                break
            yield lnkLst.head
            lnkLst = lnkLst.tail

    def __repr__(self):
        return "Cons(%s, %s)" % (repr(self.head), repr(self.tail))

    def isNull(self):
        return False

class Nil(object):

    def __repr__(self):
        return "Nil()"

    def __iter__(self):
        if False:
            yield

    def isNull(self):
        return True

With these classes, we can construct lists in a Lisp-like fashion:

Cons(0, Cons(1, Cons(2, Cons(3, Cons(4, Nil())))))

For convenience, we'll also introduce a function to convert a Python list into a linked list:

def fromList(lst):
    lnkLst = Nil()
    for x in lst[::-1]:
        lnkLst = Cons(x, lnkLst)
    return lnkLst

Using fromList(range(5)) is a lot easier than nesting Cons repeatedly, as shown above.

With linked lists available, it is easy to define a (left) fold:

def fold_recur(function, init, linkedList):
    if linkedList.isNull():
        result = init
    else:
        acc = function(init, linkedList.head)
        result = fold_recur(function, acc, linkedList.tail)
    return result

I'll note two important differences between the SML folds and fold_recur. First, the arguments to the function taken by fold_recur are reversed from those taken by its SML counterpart, in order to match Python's reduce. Second, and far more importantly, fold_recur is terribly broken because it is a tail recursive function, but only has Python's broken support for recursion to draw on. Trying to use fold_recur will cause stack overflows when the list is too long—that is, containing 1000 elements with the default recursion limit¹.

To rectify this problem, we need to transform fold_recur by hand into an imperative version. Fortunately, fold_recur is constructed in an iterative form, so the transformation is easy:

def fold_imper(function, init, linkedList):
    acc = init
    while not linkedList.isNull():
        acc = function(acc, linkedList.head)
        linkedList = linkedList.tail
    return acc

With the imperative definition, our fold will not cause a stack overflow.

Since we've defined linked lists to be iterable, we can still make our fold a bit more elegant:

def fold_ll_iter(function, init, linkedList):
    acc = init
    for x in linkedList:
        acc = function(acc, x)
    return acc

Although fold_ll_iter was obtained by iterating over a linked list, it will work with any iterable. Rewrite it to make that clearer:

def fold_iter(function, init, iterable):
    acc = init
    for x in iterable:
        acc = function(acc, x)
    return acc

Through this process, the deconstruction of a data structure has been shifted from SML fold functions to Python iterators. This is unsurprising: in Python, it is quite standard to use iterators to deconstruct data structures. It is not completely equivalent, but covers many cases.

The iterator approach suggests another modification of fold, which is normally called scan:

def scan(function, init, iterable):
    acc = init
    yield acc
    for x in iterable:
        acc = function(acc, x)
        yield acc

Our scan function processes the list in the same sequence, yielding values accumulated at each step along the way. Using scan, we define one last version of fold:

def fold(function, init, iterable):
    for x in scan(function, init, iterable): pass
    return x

We can use scan and fold to define many useful functions in a simple manner. For example:

import operator
def sum(numbers): return fold(operator.add, 0, numbers)
def cumsum(numbers): return scan(operator.add, 0, numbers)
def product(numbers): return fold(operator.mul, 1, numbers)
def cumproduct(numbers): return scan(operator.mul, 1, numbers)

I think these functions illustrate rather well the difference between fold and scan. In sum, we use fold to total up the numbers in a list, while in cumsum we use scan to generate the partial sums as we proceed through the list. The product and cumproduct functions are similar.

More complex functions are possible as well. Consider finding the three smallest items in a list. This can be done in linear time by iterating through the list and keeping track of the three smallest as we proceed. But that's exactly what fold is for! The accumulator is a collection of the three smallest items, and we just need a function to update that. Putting that together:

def min3(lst):
    def comp(acc, x):
        a,b,c = acc
        if x < a:
            result = x,a,b
        elif x < b:
            result = a,x,b
        elif x < c:
            result = a,b,x
        else:
            result = acc
        return result
    init = sorted(lst[:3])
    return fold(comp, init, lst[3:])

For simplicity, I've assumed that min3 takes a list, but it could easily be modified to work with any iterable.

I've presented a lot of program code in this post, transforming a direct translation of the SML folds into function that are defined in a more natural way for Python. Next time, I'll digress briefly to present an alternative structure that has some practical advantages over what's been shown here, then return to examining why reduce doesn't see much use in Python.

¹ In practice, this might as well read that the limit is 1000 elements, end of story. The limit can be increased with sys.setrecursionlimit, but "a too-high limit can lead to a crash."

Thursday, January 1, 2009

W3R-1: Origins of Reduce

To begin our investigation of reduce in Python, we need to come to grips with what reduce actually is. I find that the documentation for reduce is rather poor, requiring a fair amount of thinking to determine just what you need to pass in as arguments. Instead, let's examine the origins of reduce to better understand it.

Reduce originates outside of Python. It comes to Python through a contributed patch:

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches.

The fate of reduce() in Python 3000
Guido van Rossum, 2005

That's not very helpful, so let's derive reduce by abstracting from some examples. Instead of Lisp, I'll use Standard ML. SML has two key advantages over Lisp: the syntax is more approachable for the uninitiated, and its robust (Hindley-Milner) type system and pattern matching will make things a little clearer. I'll follow the definitions in SML rather than Python, so we'll refer to reduce-like functions as folds, with argument order matching those in the Standard ML Basis Library.

Consider adding up a list of integers. In SML, that list is a linked list, so we'll have a typical recursive solution:

fun sum [] = 0
  | sum (h::t) = h + sum t

I've made use of pattern matching to define sum in terms of two cases, that of an empty list and of a non-empty list. The double colon is a cons operator, joining the head h and tail t of the list.

To define map, a similar recursion scheme is used:

fun map f [] = []
  | map f (h::t) = (f h) :: (map f t)

Map is written as a curried function, so different list processors can be defined by specifying just the function map f, only giving the list later.¹

Just about any function over a list uses the same recursion scheme. Let's extract that recursions scheme into a higher order function called foldr:

fun foldr f acc [] = acc
  | foldr f acc (h::t) = f(h, (foldr f acc t))

Foldr deconstructs a list from right to left, starting with the tail and accumulating the values. As with map, foldr is a curried function.

The type of foldr can make it easier to understand:

('a * 'b -> 'b) -> 'b -> 'a list -> 'b

The first term, in the parentheses, describes the function f; it takes a list element and an accumulator, returning an updated accumulator. The second argument to foldr is an initial value for the accumulator. Third is the list to process. Importantly, the types of the accumulator and the list elements can differ.

We can use foldr to define sum and map. Here's sum:

val sum = foldr Int.+ 0

For map, we'll introduce a helper function and use foldr on that:

fun map f lst = let
  fun maphead(x, acc) = (f x) :: acc
in
  foldr maphead [] lst
end

To get the helper function maphead right, I just looked at the types needed for foldr and wrote the obvious function to satisfy those types.

While foldr allows elegant definition of our two example functions, there is a problem. As written, foldr walks through the entire list before starting to do any work. If the list is long, this can cause a stack overflow. We need a tail-recursive version, leading us naturally to foldl:

fun foldl f acc [] = acc
  | foldl f acc (h::t) = foldl f (f(h, acc)) t

In contrast to foldr, foldl processes the list from left to right, starting with the head. This definition of foldl is tail recursive, so operates in constant space. It can also be used to define a tail recursive foldr by reversing the list before passing it to foldl.

Foldl is substantially similar to reduce in Python. Next time, I'll translate the above definitions to Python, to make the connection transparent.

¹ Having a curried map in SML makes for a big conceptual difference from the map in Python. It's quite reasonable to view map as transforming a function from acting on values to a function acting on lists of values, without map itself actually having anything to do with the list.

Applied Abstraction