Tuesday, November 18, 2008

Passive-Aggressive Python

Consider sum. It's great for adding up a list of numbers:
Python 2.5.1 (r251:54863, Feb  4 2008, 21:48:13) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = range(10); print x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sum(x)
45


Strings, on the other hand, it doesn't care for:
>>> y = list("abcdefghi"); print y
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> sum(y, "")
Traceback (most recent call last):
File "", line 1, in
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This is about performance. Repeatedly adding up strings behaves quite a bit differently than adding up numbers. Each addition causes a new string to be created, turning the whole thing into O(n2), where n is the number of strings being concatenated, instead of the O(n) you would get by using join or when summing up numbers.

Think that seems like a fair tradeoff? Think again:
>>> u = map(lambda a: [a], x); print u
[[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
>>> sum(u, [])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
and again:
>>> v = map(lambda a: (a,), x); print v
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
>>> sum(v, ())
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>>

Both of those have the same performance issues with sum that strings have.

So much for polymorphism. The interface is muddled, being neither specialized to numbers, nor polymorphic to objects that support addition. It's all quite silly, really, because there's no reason for sum to behave like this. The worst-case time complexity is intended to be O(n), but easily becomes O(n2) as shown above—and actually is entirely unpredictable because objects can make arbitrary calculations. On the other hand, the TypeError raised for a string is an arbitrary restriction, since it could instead just internally make use of ''.join.

Apparently, sum is the way it is because that's just not how one should concatenate a string. Anything else that supports addition, go ahead, but for strings there is one—and only one—right way to do it. By fiat.

Thursday, November 13, 2008

Phrase from nearest book

Odd idea that I've seen first on Grig Gheorghiu's blog. Here are the rules:
  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Dont dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.


For me, the nearest book was Practical Statistics by Russell Langley, giving this:

The average duration of numbness in these tests was 50% longer with Prilocaine than with Lignocaine, but the lack of any information about the degree of dispersion around these arithmetic means makes it impossible to assess the significance of the observed difference.

It may not be terribly poetic, but it is a quite profound idea in statistics.

Tuesday, September 16, 2008

My Results for the Sarah Palin Baby Name Generator

The Sarah Palin Baby Name Generator has been mentioned by many sites lately, so a description is probably not needed. My name would be Thump Hummer Palin.

Friday, August 1, 2008

An Anniversary, of Sorts

It has been one year since the O'Reilly MacDevCenter site has published an article on Cocoa. The articles since then are a mix of various Mac OS X tips and tricks, with a heavy emphasis on Aperture—so no actual Mac development articles.

It's been quite a while since O'Reilly has published a book on Mac development, too. There is the Missing Manual series, but it is really something distinct. Has O'Reilly given up on Mac developers?

Sunday, July 27, 2008

AppleScript Syntax: Dealing with the Depressing Reality

Given the preceding two posts, I hope it is clear that I think syntax is of too much concern in AppleScript discussions. The preoccupation with syntax can hide real problems that might actually be sensibly discussed. It seems appropriate to mention some of the aspects of AppleScript that I think could be usefully discussed. I'll limit myself here to points where I can provide some practical tips as starting points.

To begin, name resolution is pretty tricky. Rather than some sensible lexical scoping, there are complex rules. John Gruber provides an excellent description of how names are resolved, in the context of explaining a subtle bug. There is a fact that I've found useful for avoiding name conflicts: you do not have use tell blocks.

Frequently, you just need to tell applications to do a few things. However, example code often has the entire program in a single tell block. That leads to all the tricky rules for determining when a term is looked up in an application dictionary, or a Scripting Addition, or as a handler in the script. Many of these issues can be eliminated by doing something like tell application "Finder" to … instead of using the block format. In essence, this is the same idea as avoiding the use of an implicit this or self object, as seen in some object oriented languages.

Said approach also is particularly enlightening on how much of your program is actually spent dealing with interprocess communication, and how much is program logic. Frequently, not much is actually spent on dealing with IPC, but it leads to a lot of complications. Simple solution: encapsulate IPC in handlers.

Text handling is important in many scripts. AppleScript is pretty weak in its text handling. Mac OS X comes with a powerful text processing system: the Unix shell. Use do shell script for text manipulation.

AppleScript is lacking in the data structures it provides. The one I most frequently miss is an associative array, but it's not the only one. Many other languages provide a richer library of data structures, better support for defining your own, or both. You could often dispense entirely with AppleScript, except for communicating with applications, which might just be a few lines. You can communicate with applications using the osascript shell command. This lets you use just about any language you like, and still be able to gain the main benefit of AppleScript.

There must be many more.

AppleScript Syntax: Some Depressing Examples

Let's expand on the previous post by looking at some examples of how discussion of syntax poisons discussion of AppleScript. Just to be clear, the examples are taken from blogs I like and read regularly; I'm sure there are many other examples, but I haven't gone out of my way to find these.

First, let's return to Daring Fireball. John Gruber recently wrote that
AppleScript, as a programming language, is a noble but failed experiment.
To support this, he links to an earlier article, The English-Likeness Monster, in part of which he makes the far more modest claim that AppleScript's English-like syntax is a failed experiment.

Well, which is it? Languages are more than their syntax. Reading the article doesn't clarify much. Gruber details how a bug he experienced was caused by a subtle name conflict in scripting dictionary terminology. He gives a cogent description of the semantics of name resolution in AppleScript, showing how the name resolution semantics leads to the bug he experienced. However, buried within is a lengthy rant on AppleScript's syntax. It looks like, and is identified as, a digression (well, it is called an "interpolation"—Gruber is a David Foster Wallace fan, as I recall), but gives the post its title. Just what is the intent?

Additionally, there is mention of Python and JavaScript as having clearer, if more abstract, syntax than AppleScript, which helps to prevent such problems. However, if you had AppleScript's syntax but Python's name resolution, you literally could not have the same error. There is a difference at a far deeper level than the syntax. To what extent are these articles supposed to be about syntax, semantics, surrounding tools, libraries (i.e., scripting additions),…?

As a second example, let's take a look at an interesting article from Daniel Jalkut's Red Sweater Blog, called Apple's Script. Jalkut makes what is essentially an economic argument that Apple should make JavaScript the default scripting language for the Mac, keeping AppleScript as an alternative point of access to the Open Scripting Architecture. His key point is that Apple is devoting significant resources to JavaScript, and will continue to do so for strategic reasons. Further, due to wide-spread experience with JavaScript, it is in practice easier for users, despite the supposed ease of use of AppleScript (aside: I dispute that AppleScript is easy to use).

Nothing in the article depends in any way on the syntax of AppleScript, but look through the responses! Several people bring up AppleScript syntax, both as a positive and as a negative. Once the issue of syntax appears, the discussion pretty much stays there. It's a shame, because I think that Jalkut's point was an interesting one, and really does warrant some thought.