Wednesday, December 31, 2008

What's Wrong With Reduce?

Functional programming seems to be on the rise—there's even an O'Reilly book on Haskell, now. I'm not quite sure how that increase in popularity happened, but would guess that Paul Graham's advocacy of Lisp had something to do with it, as did Douglas Crockford's work with JavaScript. I also suspect that the higher-order constructs in Python played an important role, allowing imperative programmers to gradually start using a functional style.

However, there is something curious about the functional constructs in Python. One of the higher-order functions in Python—reduce—seems not to find much acceptance. Probably the best example of this is from the BDFL himself, who wrote:
…almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.



The fate of reduce() in Python 3000

Guido van Rossum, 2005

I don't think the above is out of line with what most Python users think, but it clashes severely with the conventional thinking among users of functional programming languages.
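As a rough sketch of the contrast, here is the same computation written both ways. The sample data and the names step, extremes_reduce, and extremes_loop are made up for illustration, and the import reflects Python 3, where reduce lives in functools rather than among the builtins.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
words = ["fold", "map", "reduce", "zip", "filter"]

# reduce() with a non-trivial function argument: the accumulator is a
# (shortest, longest) pair that is updated for each word in turn.
def step(acc, word):
    shortest, longest = acc
    return (min(shortest, word, key=len), max(longest, word, key=len))

extremes_reduce = reduce(step, words[1:], (words[0], words[0]))

# The same computation written as an explicit accumulation loop.
shortest = longest = words[0]
for word in words[1:]:
    if len(word) < len(shortest):
        shortest = word
    if len(word) > len(longest):
        longest = word
extremes_loop = (shortest, longest)

print(extremes_reduce, extremes_loop)
```

Both versions keep the earliest word on ties, so they agree. Which one reads more easily is exactly the point under dispute.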

For example, in an influential paper, John Hughes wrote:
…a little modularisation can go a long way. By modularising a simple function (sum) as a combination of a “higher order function” and some simple arguments, we have arrived at a part (reduce) that can be used to write down many other functions on lists with no more programming effort.
and
…functional languages allow functions which are indivisible in conventional programming languages to be expressed as a combination of parts—a general higher order function and some particular specialising functions. Once defined, such higher order functions allow many operations to be programmed very easily. Whenever a new datatype is defined higher order functions should be written for processing it. This makes manipulating the datatype easy, and also localises knowledge about the details of its representation. The best analogy with conventional programming is with extensible languages—it is as though the programming language can be extended with new control structures whenever desired.


Why Functional Programming Matters

John Hughes, 1984

My own experience with ML is quite in line with what Hughes wrote: reduce, or foldl as it is also known, is used all the time, and seems very natural.
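The style Hughes describes can be approximated in Python. This is only a sketch under stated assumptions: fold, total, product, any_true, and length are names invented here, and Python's reduce combines elements from the left, so it corresponds to foldl rather than to a right fold.

```python
from functools import reduce
import operator

def fold(f, init):
    """Return a function on sequences that combines elements with f, starting from init."""
    return lambda xs: reduce(f, xs, init)

# One general higher-order part plus small specialising arguments
# yields several list functions (all names here are illustrative).
total = fold(operator.add, 0)
product = fold(operator.mul, 1)
any_true = fold(operator.or_, False)
length = fold(lambda n, _: n + 1, 0)

print(total([1, 2, 3, 4]), product([1, 2, 3, 4]))
print(any_true([False, True, False]), length("abc"))
```

Each definition is a one-liner once fold exists, which is the kind of reuse Hughes is pointing at.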

Why the difference? I'm not aware of any case where GvR has demonstrated much knowledge of functional programming—maybe he was just speaking outside his area of expertise, and got it wrong? The PLT Scheme people may have thought so, satirizing him shortly after in a (quite funny!) April Fool's Day announcement.

However, I think that explanation is overly facile. Personally, while I use folds in functional languages, I rarely do in Python. I think there are real reasons for that, and will examine them in future blog posts.

Wednesday, December 10, 2008

Power Laws and Social Problems

Million-Dollar Murray is a fascinating article by Malcolm Gladwell about how "power laws" (really, heavy-tailed distributions) can have surprising policy implications for dealing with social problems.

The article focuses most strongly on homelessness. The core of how power laws clash with our assumptions is summarized as:
That is what is so perplexing about power-law homeless policy. From an economic perspective the approach makes perfect sense. But from a moral perspective it doesn't seem fair. Thousands of people in the Denver area no doubt live day to day, work two or three jobs, and are eminently deserving of a helping hand—and no one offers them the key to a new apartment. […] Social benefits are supposed to have some kind of moral justification. We give them to widows and disabled veterans and poor mothers with small children. Giving the homeless guy passed out on the sidewalk an apartment has a different rationale. It's simply about efficiency.

Another issue considered is police brutality. After the Rodney King beating in Los Angeles, a special commission headed by Warren Christopher investigated the LAPD:
But what was the commission's most memorable observation? It was the story of an officer with a known history of doing things like beating up handcuffed suspects who nonetheless received a performance review from his superior stating that he "usually conducts himself in a manner that inspires respect for the law and instills public confidence." This is what you say about an officer when you haven't actually read his file, and the implication of the Christopher Commission's report was that the L.A.P.D. might help solve its problem simply by getting its police captains to read the files of their officers. The L.A.P.D.'s problem was a matter not of policy but of compliance. The department needed to adhere to the rules it already had in place, and that's not what a public hungry for institutional transformation wants to hear. Solving problems that have power-law distributions doesn't just violate our moral intuitions; it violates our political intuitions as well.

The third issue explored is pollution by automobile exhaust. This seems quite prosaic in comparison to the first two, but may be the best illustration of the way that we can incorrectly address a problem if we have an erroneous view of the distribution of the problem. For cars, we have:
Most cars, especially new ones, are extraordinarily clean. A 2004 Subaru in good working order has an exhaust stream that's just .06 per cent carbon monoxide, which is negligible. But on almost any highway, for whatever reason—age, ill repair, deliberate tampering by the owner—a small number of cars can have carbon-monoxide levels in excess of ten per cent, which is almost two hundred times higher. In Denver, five per cent of the vehicles on the road produce fifty-five per cent of the automobile pollution.
Given this heavy-tailed distribution, alternatives to the customary annual inspection become more relevant:
[Donald Stedman, a chemist and automobile-emissions specialist at the University of Denver] proposes mobile testing instead. Twenty years ago, he invented a device the size of a suitcase that uses infrared light to instantly measure and then analyze the emissions of cars as they drive by on the highway. […] He says that cities should put half a dozen or so of his devices in vans, park them on freeway off-ramps around the city, and have a police car poised to pull over anyone who fails the test. A half-dozen vans could test thirty thousand cars a day. For the same twenty-five million dollars that Denver's motorists now spend on on-site testing, Stedman estimates, the city could identify and fix twenty-five thousand truly dirty vehicles every year, and within a few years cut automobile emissions in the Denver metropolitan area by somewhere between thirty-five and forty per cent. The city could stop managing its smog problem and start ending it.
Not mentioned by Gladwell is that this mobile testing would also have a moral benefit: costs could be shifted to polluters, instead of spreading the cost to everyone.

In summary, a fascinating article that connects a current scientific issue with current societal issues. I'm sure it only scratches the surface. No doubt, many other issues depend crucially on the distribution of underlying factors.

Monday, December 8, 2008

Formatted Program Text for Blogger

This blog features a fair amount of code. So far, I've been doing the formatting from SubEthaEdit. Using SEE has worked OK, but the procedure is fiddly. I open the desired program, copy as XHTML, paste that into a new document, remove all the line breaks, copy the modified text, and paste it into MarsEdit. No step is difficult, but automating it is a pain, requiring GUI scripting in AppleScript. Further, it would require running everything through SEE, whether or not that makes sense.

Thus, before scripting the above approach, I looked for other options. Two seem promising: Pygments and highlight. Both provide shell filters for converting program text into HTML. For now, I've settled on highlight, since Pygments lacks support for highlighting Standard ML.

Several command line options are appropriate. For example, I can format an SML file with
cat real-ord.sml | highlight --inline-css -f --enclose-pre -S sml -s seashell | pbcopy 

Pasting that into MarsEdit nicely formats the contents of real-ord.sml:
structure RealOrdKey : ORD_KEY where type ord_key = Real.real = struct
open Real
type ord_key = real
end

I'm sure I'll need to refine the approach a bit as I go, but this already seems easier than the approach I'd been using.

Another Skeptical Take on Python 3

Jens Alfke wonders "What’s The Point?" of Python 3.

Via Daring Fireball.

Friday, December 5, 2008

Thoughts on Python 3.0

Python 3.0 has been released. I first used Python at either the end of 1999 or the beginning of 2000. It's with some surprise that I find myself rather ambivalent about what is a major landmark for a tool that I've used on a near-daily basis for almost nine years.

There are a variety of reasons for my ambivalence. Largely, my own programming interests have diverged from the main features of Python. Thus, the significant changes that are present in Python 3 just aren't addressing the shortcomings of Python that affect me. In particular,
  • My interest in functional programming has continually increased since I first learned about it in 2002. Python is pretty limited in its support for functional programming, especially in its inability to do tail call optimization and its lack of suitable data structures.
  • Learning about functional programming led me to Standard ML and its kin. Reading Harper's Standard ML book profoundly affected how I think about programming, perhaps even more than did reading SICP. Much to my surprise, I found that static typing didn't have to be the nightmare I remembered from C programming. Indeed, typeful programming turns out to be a natural match for how I like to work, with types encoding—and enforcing—a great deal of the assumptions that go into function definition. Python, with its untyped variables, doesn't lend much support to this style.
  • Concurrency appears increasingly relevant. I'd like to learn more about concurrent programming, and apply it in practice. Specifically, message passing concurrency shows great promise, interacting well with a functional programming style and providing ready control over when nondeterminism appears (see, e.g., a recent post on Lambda the Ultimate). Python isn't structured for the approach I want to explore.
Basically, there are things that I want to do that Python just gets in the way of, and the future of Python seems quite unlikely to include the changes I need.
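As a small illustration of the tail call point in the list above, here is a sketch (count_down and count_down_loop are made-up names, and it assumes CPython's default behavior and Python 3.5 or later for RecursionError). The recursive call is in tail position, but the interpreter still adds a stack frame per call, so a deep enough recursion fails where the equivalent loop does not.

```python
import sys

def count_down(n, acc=0):
    # The recursive call is the last thing the function does (a tail call),
    # yet each call still consumes a new stack frame in CPython.
    if n == 0:
        return acc
    return count_down(n - 1, acc + 1)

def count_down_loop(n):
    # The same computation as a loop, which uses constant stack space.
    acc = 0
    while n > 0:
        n, acc = n - 1, acc + 1
    return acc

# Go well past the interpreter's recursion limit.
depth = sys.getrecursionlimit() * 10

try:
    count_down(depth)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed, count_down_loop(depth) == depth)
```

In Standard ML or Scheme the recursive version would run in constant stack space. In Python the loop has to be written by hand.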

In light of the above, the landmark Python 3 release comes across as a good time to re-assess how I use Python. As a whole, I'm left thinking that I'd be better off directing more of my time to Scala, OCaml, or Haskell. The changes in Python 3 hardly seem worth even the relatively minor effort needed to upgrade at this point, so I think I'll just stick with the installation I have for now.