Tuesday, November 18, 2008

Passive-Aggressive Python

Consider sum. It's great for adding up a list of numbers:
Python 2.5.1 (r251:54863, Feb  4 2008, 21:48:13) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = range(10); print x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sum(x)
45


Strings, on the other hand, it doesn't care for:
>>> y = list("abcdefghi"); print y
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> sum(y, "")
Traceback (most recent call last):
File "", line 1, in
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This is about performance. Repeatedly adding up strings behaves quite a bit differently than adding up numbers. Each addition causes a new string to be created, turning the whole thing into O(n2), where n is the number of strings being concatenated, instead of the O(n) you would get by using join or when summing up numbers.

Think that seems like a fair tradeoff? Think again:
>>> u = map(lambda a: [a], x); print u
[[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
>>> sum(u, [])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
and again:
>>> v = map(lambda a: (a,), x); print v
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
>>> sum(v, ())
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>>

Both of those have the same performance issues with sum that strings have.

So much for polymorphism. The interface is muddled, being neither specialized to numbers, nor polymorphic to objects that support addition. It's all quite silly, really, because there's no reason for sum to behave like this. The worst-case time complexity is intended to be O(n), but easily becomes O(n2) as shown above—and actually is entirely unpredictable because objects can make arbitrary calculations. On the other hand, the TypeError raised for a string is an arbitrary restriction, since it could instead just internally make use of ''.join.

Apparently, sum is the way it is because that's just not how one should concatenate a string. Anything else that supports addition, go ahead, but for strings there is one—and only one—right way to do it. By fiat.

No comments: