Tuesday, November 18, 2008

Passive-Aggressive Python

Consider sum. It's great for adding up a list of numbers:
Python 2.5.1 (r251:54863, Feb  4 2008, 21:48:13) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = range(10); print x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sum(x)

Strings, on the other hand, it doesn't care for:
>>> y = list("abcdefghi"); print y
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> sum(y, "")
Traceback (most recent call last):
File "", line 1, in
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This is about performance. Repeatedly adding up strings behaves quite a bit differently than adding up numbers. Each addition causes a new string to be created, turning the whole thing into O(n2), where n is the number of strings being concatenated, instead of the O(n) you would get by using join or when summing up numbers.

Think that seems like a fair tradeoff? Think again:
>>> u = map(lambda a: [a], x); print u
[[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
>>> sum(u, [])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
and again:
>>> v = map(lambda a: (a,), x); print v
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
>>> sum(v, ())
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Both of those have the same performance issues with sum that strings have.

So much for polymorphism. The interface is muddled, being neither specialized to numbers, nor polymorphic to objects that support addition. It's all quite silly, really, because there's no reason for sum to behave like this. The worst-case time complexity is intended to be O(n), but easily becomes O(n2) as shown above—and actually is entirely unpredictable because objects can make arbitrary calculations. On the other hand, the TypeError raised for a string is an arbitrary restriction, since it could instead just internally make use of ''.join.

Apparently, sum is the way it is because that's just not how one should concatenate a string. Anything else that supports addition, go ahead, but for strings there is one—and only one—right way to do it. By fiat.

Thursday, November 13, 2008

Phrase from nearest book

Odd idea that I've seen first on Grig Gheorghiu's blog. Here are the rules:
  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Dont dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

For me, the nearest book was Practical Statistics by Russell Langley, giving this:

The average duration of numbness in these tests was 50% longer with Prilocaine than with Lignocaine, but the lack of any information about the degree of dispersion around these arithmetic means makes it impossible to assess the significance of the observed difference.

It may not be terribly poetic, but it is a quite profound idea in statistics.