Wednesday, December 31, 2008

What's Wrong With Reduce?

Functional programming seems to be on the rise—there's even an O'Reilly book on Haskell, now. I'm not quite sure how that increase in popularity happened, but would guess that Paul Graham's advocacy of Lisp had something to do with it, as did Douglas Crockford's work with JavaScript. I also suspect that the higher-order constructs in Python played an important role, allowing imperative programmers to gradually start using a functional style.

However, there is something curious about the functional constructs in Python. One of the higher-order functions in Python—reduce—seems not to find much acceptance. Probably the best example of this is from the BDFL himself, who wrote:
…almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.

The fate of reduce() in Python 3000

Guido van Rossum, 2005

I don't think the above is out of line with what most Python users think, but it clashes severely with the conventional thinking of users of functional programming languages.

For example, in an influential paper, John Hughes wrote
…a little modularisation can go a long way. By modularising a simple function (sum) as a combination of a “higher order function” and some simple arguments, we have arrived at a part (reduce) that can be used to write down many other functions on lists with no more programming effort.
…functional languages allow functions which are indivisible in conventional programming languages to be expressed as a combination of parts—a general higher order function and some particular specialising functions. Once defined, such higher order functions allow many operations to be programmed very easily. Whenever a new datatype is defined higher order functions should be written for processing it. This makes manipulating the datatype easy, and also localises knowledge about the details of its representation. The best analogy with conventional programming is with extensible languages—it is as though the programming language can be extended with new control structures whenever desired.

Why Functional Programming Matters

John Hughes, 1984

My own experience with ML is quite in line with what Hughes wrote: reduce, or foldl as it is also known, is used all the time, and seems very natural.
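To make the contrast concrete, here is a minimal sketch, written for Python 3, where reduce lives in functools. The helper names (fold_sum, count_words_fold, count_words_loop) are hypothetical and come from neither quoted text. It puts Hughes's example, sum as a fold over addition, next to the kind of non-trivial reduce that the first quote objects to, together with the explicit loop that does the same job.

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4]

def fold_sum(xs):
    # Hughes's example: sum is a fold with an associative operator
    # (addition) and its identity (0).
    return reduce(operator.add, xs, 0)

def count_words_fold(words):
    # A non-trivial fold: the accumulator (a dict) and the element
    # (a word) play different roles, so argument order matters.
    def step(counts, word):
        # Build a new dict so the step function has no side effects.
        return {**counts, word: counts.get(word, 0) + 1}
    return reduce(step, words, {})

def count_words_loop(words):
    # The explicit accumulation loop for the same computation.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Both word counters return the same table. Which one reads more naturally is the question at issue here.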

Why the difference? I'm not aware of any case where GvR has demonstrated much knowledge of functional programming—maybe he was just speaking outside his area of expertise, and got it wrong? The PLT Scheme people may have thought so, satirizing him shortly after in a (quite funny!) April Fool's Day announcement.

However, I think that explanation is overly facile. Personally, while I use folds in functional languages, I rarely do in Python. I think there are real reasons for that, and will examine them in future blog posts.

Wednesday, December 10, 2008

Power Laws and Social Problems

Million-Dollar Murray is a fascinating article by Malcolm Gladwell about how "power laws" (really, heavy-tailed distributions) can have surprising policy implications for dealing with social problems.

The article focuses most strongly on homelessness. The core of how power laws clash with our assumptions is summarized as:
That is what is so perplexing about power-law homeless policy. From an economic perspective the approach makes perfect sense. But from a moral perspective it doesn't seem fair. Thousands of people in the Denver area no doubt live day to day, work two or three jobs, and are eminently deserving of a helping hand—and no one offers them the key to a new apartment. […] Social benefits are supposed to have some kind of moral justification. We give them to widows and disabled veterans and poor mothers with small children. Giving the homeless guy passed out on the sidewalk an apartment has a different rationale. It's simply about efficiency.

Another issue considered is police brutality. After the Rodney King beating in Los Angeles, a special commission headed by Warren Christopher investigated the LAPD:
But what was the commission's most memorable observation? It was the story of an officer with a known history of doing things like beating up handcuffed suspects who nonetheless received a performance review from his superior stating that he "usually conducts himself in a manner that inspires respect for the law and instills public confidence." This is what you say about an officer when you haven't actually read his file, and the implication of the Christopher Commission's report was that the L.A.P.D. might help solve its problem simply by getting its police captains to read the files of their officers. The L.A.P.D.'s problem was a matter not of policy but of compliance. The department needed to adhere to the rules it already had in place, and that's not what a public hungry for institutional transformation wants to hear. Solving problems that have power-law distributions doesn't just violate our moral intuitions; it violates our political intuitions as well.

The third issue explored is pollution by automobile exhaust. This seems quite prosaic in comparison to the first two, but may be the best illustration of the way that we can incorrectly address a problem if we have an erroneous view of the distribution of the problem. For cars, we have:
Most cars, especially new ones, are extraordinarily clean. A 2004 Subaru in good working order has an exhaust stream that's just .06 per cent carbon monoxide, which is negligible. But on almost any highway, for whatever reason—age, ill repair, deliberate tampering by the owner—a small number of cars can have carbon-monoxide levels in excess of ten per cent, which is almost two hundred times higher. In Denver, five per cent of the vehicles on the road produce fifty-five per cent of the automobile pollution.
Given this heavy-tailed distribution, alternatives to the customary annual inspection become more relevant:
[Donald Stedman, a chemist and automobile-emissions specialist at the University of Denver] proposes mobile testing instead. Twenty years ago, he invented a device the size of a suitcase that uses infrared light to instantly measure and then analyze the emissions of cars as they drive by on the highway. […] He says that cities should put half a dozen or so of his devices in vans, park them on freeway off-ramps around the city, and have a police car poised to pull over anyone who fails the test. A half-dozen vans could test thirty thousand cars a day. For the same twenty-five million dollars that Denver's motorists now spend on on-site testing, Stedman estimates, the city could identify and fix twenty-five thousand truly dirty vehicles every year, and within a few years cut automobile emissions in the Denver metropolitan area by somewhere between thirty-five and forty per cent. The city could stop managing its smog problem and start ending it.
Not mentioned by Gladwell is that this mobile testing would also have a moral benefit: costs could be shifted to polluters, instead of spreading the cost to everyone.
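
As a rough illustration of how lopsided the quoted figures are, the arithmetic can be sketched as follows. The function name per_vehicle_ratio is hypothetical, and the calculation assumes that the quoted shares can be treated as simple group averages.

```python
def per_vehicle_ratio(share_of_vehicles, share_of_pollution):
    """Average emissions of a vehicle in the dirty group,
    relative to the average vehicle in the remaining group."""
    dirty = share_of_pollution / share_of_vehicles
    clean = (1 - share_of_pollution) / (1 - share_of_vehicles)
    return dirty / clean

# Denver figures from the quote: 5% of vehicles, 55% of the pollution.
denver = per_vehicle_ratio(0.05, 0.55)

# Carbon monoxide in the exhaust: ten per cent against .06 per cent.
co_ratio = 10 / 0.06
```

Under those assumptions, an average vehicle in the dirty five per cent pollutes a little over 23 times as much as an average vehicle in the rest of the fleet, and a car at ten per cent carbon monoxide is about 167 times dirtier than the Subaru in the quote.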

In summary, a fascinating article that connects a current scientific issue with current societal issues. I'm sure it only scratches the surface. No doubt, many other issues depend crucially on the distribution of underlying factors.

Monday, December 8, 2008

Formatted Program Text for Blogger

This blog features a fair amount of code. So far, I've been doing the formatting from SubEthaEdit. Using SEE has worked OK, but was kind of a fiddly procedure. I'd open the desired program, copy as XHTML, paste that into a new document, remove all the line breaks, copy the modified text, and paste it into MarsEdit. No step is difficult, but automating it is a pain, requiring GUI scripting in AppleScript. Further, it would require running everything through SEE, whether that would make sense or not.

Thus, before scripting the above approach, I looked for other options. Two seem promising: Pygments and highlight. Both provide shell filters for converting program text into HTML. For now, I've settled on highlight, since Pygments lacks support for highlighting Standard ML.

Several command line options are appropriate. For example, I can format an SML file with
cat real-ord.sml | highlight --inline-css -f --enclose-pre -S sml -s seashell | pbcopy 

Pasting that into MarsEdit nicely formats the contents of real-ord.sml:
structure RealOrdKey : ORD_KEY where type ord_key = Real.real = struct
    open Real
    type ord_key = real
end

I'm sure I'll need to refine the approach a bit as I go, but this already seems easier than the approach I'd been using.
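
If the pipeline ever needs to be driven from a script rather than typed by hand, it could be wrapped along these lines. This is only a sketch: the function names are hypothetical, it assumes highlight is on the PATH and writes UTF-8, and it uses only the options from the command shown above.

```python
import subprocess

def highlight_command(syntax, style):
    # Only the options from the command line in the post are used here.
    return ["highlight", "--inline-css", "-f", "--enclose-pre",
            "-S", syntax, "-s", style]

def highlight_file(path, syntax="sml", style="seashell"):
    """Return the HTML that highlight writes to stdout for the file at path."""
    with open(path, "rb") as source:
        result = subprocess.run(highlight_command(syntax, style),
                                stdin=source,
                                stdout=subprocess.PIPE,
                                check=True)
    # Assumption: the formatter's output is UTF-8 text.
    return result.stdout.decode("utf-8")
```

Calling highlight_file("real-ord.sml") would then return the markup that the shell version sends to pbcopy.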

Another Skeptical Take on Python 3

Jens Alfke wonders "What’s The Point?" of Python 3.

Via Daring Fireball.

Friday, December 5, 2008

Thoughts on Python 3.0

Python 3.0 has been released. I first used Python at either the end of 1999 or the beginning of 2000. It's with some surprise that I find myself rather ambivalent about what is a major landmark for a tool that I've used on a near-daily basis for almost nine years.

There are a variety of reasons for my ambivalence. Largely, my own programming interests have diverged from the main features of Python. Thus, the significant changes that are present in Python 3 just aren't addressing the shortcomings of Python that affect me. In particular,
  • My interest in functional programming has continually increased since I first learned about it in 2002. Python is pretty limited in its support for functional programming, especially in its inability to do tail call optimization and its lack of suitable data structures.
  • Learning about functional programming led me to Standard ML and its kin. Reading Harper's Standard ML book profoundly affected how I think about programming, perhaps even more than did reading SICP. Much to my surprise, I found that static typing didn't have to be the nightmare I remembered from C programming. Indeed, typeful programming turns out to be a natural match for how I like to work, with types encoding—and enforcing—a great deal of the assumptions that go into function definition. Python, with its untyped variables, doesn't lend much support to this style.
  • Concurrency appears increasingly relevant. I'd like to learn more about concurrent programming, and apply it in practice. Specifically, message passing concurrency shows great promise, interacting well with a functional programming style and providing ready control over when nondeterminism appears (see, e.g., a recent post on Lambda the Ultimate). Python isn't structured for the approach I want to explore.
Basically, there are things that I want to do that Python just gets in the way of, and the future of Python seems quite unlikely to include the changes I need.

In light of the above, the landmark Python 3 release comes across as a good time to re-assess how I use Python. As a whole, I'm left thinking that I'd be better off directing more of my time to Scala, OCaml, or Haskell. The changes in Python 3 hardly seem worth even the relatively minor effort needed to upgrade at this point, so I think I'll just stick with the installation I have for now.

Tuesday, November 18, 2008

Passive-Aggressive Python

Consider sum. It's great for adding up a list of numbers:
Python 2.5.1 (r251:54863, Feb  4 2008, 21:48:13) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = range(10); print x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sum(x)
45

Strings, on the other hand, it doesn't care for:
>>> y = list("abcdefghi"); print y
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> sum(y, "")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This is about performance. Repeatedly adding strings behaves quite differently from adding numbers. Each addition creates a new string, turning the whole thing into O(n²), where n is the number of strings being concatenated, instead of the O(n) you would get by using join or when summing numbers.

Think that seems like a fair tradeoff? Think again:
>>> u = map(lambda a: [a], x); print u
[[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
>>> sum(u, [])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
and again:
>>> v = map(lambda a: (a,), x); print v
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
>>> sum(v, ())
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Both of those have the same performance issues with sum that strings have.
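
The cost can be made visible without timing anything. The following sketch (Python 3, hypothetical function names) simply counts how many elements get copied when the pieces are added one at a time, as sum does, compared with a single pass over them. It assumes that each addition builds a fresh sequence of the combined length.

```python
from itertools import chain

def copies_by_repeated_addition(pieces):
    """Count elements copied when pieces are added one at a time."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # each + builds a new sequence of this length
    return copied

def copies_by_single_pass(pieces):
    """Count elements copied when every piece is visited exactly once."""
    return sum(len(piece) for piece in pieces)

u = [[a] for a in range(10)]

# A linear-time way to flatten the same list of lists.
flat = list(chain.from_iterable(u))
```

For the ten one-element lists above, repeated addition copies 55 elements while a single pass copies 10, and the gap grows quadratically with the number of pieces.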

So much for polymorphism. The interface is muddled, being neither specialized to numbers, nor polymorphic to objects that support addition. It's all quite silly, really, because there's no reason for sum to behave like this. The worst-case time complexity is intended to be O(n), but easily becomes O(n²) as shown above, and is in fact entirely unpredictable, because objects can perform arbitrary calculations when added. On the other hand, the TypeError raised for a string is an arbitrary restriction, since sum could instead just make use of ''.join internally.
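
To show that such a design is at least possible, here is a minimal sketch of a sum that treats strings as a special case. The name flexible_sum is hypothetical; this is not how the built-in is implemented, only an illustration of the alternative described above.

```python
def flexible_sum(items, start=0):
    """Add up items, delegating to str.join when the start value is a string."""
    if isinstance(start, str):
        # Strings take the linear-time path instead of raising TypeError.
        return start + "".join(items)
    total = start
    for item in items:
        total = total + item  # plain addition for everything else
    return total
```

With this definition, flexible_sum(list("abc"), "") gives "abc", while numbers, lists, and tuples behave as they do with the built-in.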

Apparently, sum is the way it is because that's just not how one should concatenate a string. Anything else that supports addition, go ahead, but for strings there is one—and only one—right way to do it. By fiat.

Thursday, November 13, 2008

Phrase from nearest book

An odd idea that I first saw on Grig Gheorghiu's blog. Here are the rules:
  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don't dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

For me, the nearest book was Practical Statistics by Russell Langley, giving this:

The average duration of numbness in these tests was 50% longer with Prilocaine than with Lignocaine, but the lack of any information about the degree of dispersion around these arithmetic means makes it impossible to assess the significance of the observed difference.

It may not be terribly poetic, but it is a quite profound idea in statistics.

Tuesday, September 16, 2008

My Results for the Sarah Palin Baby Name Generator

The Sarah Palin Baby Name Generator has been mentioned by many sites lately, so a description is probably not needed. My name would be Thump Hummer Palin.

Friday, August 1, 2008

An Anniversary, of Sorts

It has been one year since the O'Reilly MacDevCenter site published an article on Cocoa. The articles since then are a mix of various Mac OS X tips and tricks, with a heavy emphasis on Aperture, so no actual Mac development articles.

It's been quite a while since O'Reilly published a book on Mac development, too. There is the Missing Manual series, but it is really something distinct. Has O'Reilly given up on Mac developers?

Sunday, July 27, 2008

AppleScript Syntax: Dealing with the Depressing Reality

Given the preceding two posts, I hope it is clear that I think syntax is of too much concern in AppleScript discussions. The preoccupation with syntax can hide real problems that might actually be sensibly discussed. It seems appropriate to mention some of the aspects of AppleScript that I think could be usefully discussed. I'll limit myself here to points where I can provide some practical tips as starting points.

To begin, name resolution is pretty tricky. Rather than some sensible lexical scoping, there are complex rules. John Gruber provides an excellent description of how names are resolved, in the context of explaining a subtle bug. There is a fact that I've found useful for avoiding name conflicts: you do not have to use tell blocks.

Frequently, you just need to tell applications to do a few things. However, example code often has the entire program in a single tell block. That leads to all the tricky rules for determining when a term is looked up in an application dictionary, or a Scripting Addition, or as a handler in the script. Many of these issues can be eliminated by doing something like tell application "Finder" to … instead of using the block format. In essence, this is the same idea as avoiding the use of an implicit this or self object, as seen in some object oriented languages.

Said approach also is particularly enlightening on how much of your program is actually spent dealing with interprocess communication, and how much is program logic. Frequently, not much is actually spent on dealing with IPC, but it leads to a lot of complications. Simple solution: encapsulate IPC in handlers.

Text handling is important in many scripts. AppleScript is pretty weak in its text handling. Mac OS X comes with a powerful text processing system: the Unix shell. Use do shell script for text manipulation.

AppleScript is lacking in the data structures it provides. The one I most frequently miss is an associative array, but it's not the only one. Many other languages provide a richer library of data structures, better support for defining your own, or both. You could often dispense entirely with AppleScript, except for communicating with applications, which might just be a few lines. You can communicate with applications using the osascript shell command. This lets you use just about any language you like, and still be able to gain the main benefit of AppleScript.
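
As a sketch of that division of labor, here is how a Python program (Python 3) might keep its logic and data structures to itself and hand only the application messaging to osascript. The function names and the sample script text are illustrative assumptions; the only thing relied on from osascript is its -e option for passing a script on the command line.

```python
import subprocess

def osascript_command(script):
    # osascript -e runs the given text as a script.
    return ["osascript", "-e", script]

def run_applescript(script):
    """Run a one-line AppleScript and return its textual result (Mac OS X only)."""
    result = subprocess.run(osascript_command(script),
                            stdout=subprocess.PIPE,
                            check=True)
    return result.stdout.decode("utf-8").strip()

# The associative array lives on the Python side, as an ordinary dict.
# The script text is an illustrative example, kept to a single tell line.
queries = {
    "startup disk": 'tell application "Finder" to get name of startup disk',
}
```

Each entry in queries could be passed to run_applescript as needed, so the AppleScript part shrinks to one line per message.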

There must be many more.

AppleScript Syntax: Some Depressing Examples

Let's expand on the previous post by looking at some examples of how discussion of syntax poisons discussion of AppleScript. Just to be clear, the examples are taken from blogs I like and read regularly; I'm sure there are many other examples, but I haven't gone out of my way to find these.

First, let's return to Daring Fireball. John Gruber recently wrote that
AppleScript, as a programming language, is a noble but failed experiment.
To support this, he links to an earlier article, The English-Likeness Monster, in part of which he makes the far more modest claim that AppleScript's English-like syntax is a failed experiment.

Well, which is it? Languages are more than their syntax. Reading the article doesn't clarify much. Gruber details how a bug he experienced was caused by a subtle name conflict in scripting dictionary terminology. He gives a cogent description of the semantics of name resolution in AppleScript, showing how the name resolution semantics leads to the bug he experienced. However, buried within is a lengthy rant on AppleScript's syntax. It looks like, and is identified as, a digression (well, it is called an "interpolation"—Gruber is a David Foster Wallace fan, as I recall), but gives the post its title. Just what is the intent?

Additionally, there is mention of Python and JavaScript as having clearer, if more abstract, syntax than AppleScript, which helps to prevent such problems. However, if you had AppleScript's syntax but Python's name resolution, you literally could not have the same error. There is a difference at a far deeper level than the syntax. To what extent are these articles supposed to be about syntax, semantics, surrounding tools, libraries (i.e., scripting additions),…?

As a second example, let's take a look at an interesting article from Daniel Jalkut's Red Sweater Blog, called Apple's Script. Jalkut makes what is essentially an economic argument that Apple should make JavaScript the default scripting language for the Mac, keeping AppleScript as an alternative point of access to the Open Scripting Architecture. His key point is that Apple is devoting significant resources to JavaScript, and will continue to do so for strategic reasons. Further, due to widespread experience with JavaScript, it is in practice easier for users, despite the supposed ease of use of AppleScript (aside: I dispute that AppleScript is easy to use).

Nothing in the article depends in any way on the syntax of AppleScript, but look through the responses! Several people bring up AppleScript syntax, both as a positive and as a negative. Once the issue of syntax appears, the discussion pretty much stays there. It's a shame, because I think that Jalkut's point was an interesting one, and really does warrant some thought.

AppleScript Syntax: the Depressing Corollary

John Gruber at Daring Fireball links to William Cook's paper on the history of AppleScript. I've been meaning to write about the paper for a few weeks, based on a few recent blog posts relating to AppleScript. The history contains several facts that, in my opinion, are vital for understanding what AppleScript is today, and how we should approach it. In particular, there are some important points to be learned about AppleScript's syntax.

I first read the paper a couple of years ago, after learning about it from Lambda the Ultimate. William Cook is one of the original developers of AppleScript. He describes the development of the language for the third History of Programming Languages conference. The history of a programming language sounds likely to be rather dull, but Cook's paper is far from dull. Cook sheds light on many aspects of AppleScript, making clear that the language is a mix of successes and failures. What also becomes clear is that the designers were willing to try genuinely new ideas, not all of which worked out due to practical considerations. As Cook writes,
AppleScript was developed by a small group with a short schedule, a tight budget and a big job. There was neither time nor money to fully research design choices.
It should not be surprising that some of those design choices were suboptimal or even failures.

AppleScript's natural language syntax was one of those failures. Cook writes
The experiment in designing a language that resembled natural languages (English and Japanese) was not successful. It was assume[d] that scripts should be presented in “natural language” so that average people could read and write them.… In the end the syntactic variations and flexibility did more to confuse programmers than to help them out. The main problem is that AppleScript only appears to be a natural language. In fact[, it] is an artificial language, like any other programming language.… It is easy to read AppleScript, but quite hard to write it.
(I've corrected a few typos that were in the copy of the paper I have, which was an early draft.) Besides making AppleScript accessible to average people, there were additional goals for the natural language syntax. None of them were successful.

In his conclusion, Cook writes
Many of the current problems in AppleScript can be traced to the use of syntax based on natural language…
Sadly, many critics of AppleScript would have that be the whole story. It is not, and acting as if it were hinders understanding of real problems with AppleScript, some of which could be addressed without changing the syntax at all. In fact, I will assert that the biggest problem with AppleScript's syntax is that it prevents meaningful discussion of AppleScript!

I propose a variant of Godwin's law for AppleScript:
As an online discussion of AppleScript grows longer, the probability of the discussion devolving into a debate on the merits of AppleScript's syntax approaches one. At this point, nothing meaningful will be said, and the discussion is effectively over.
Too frequently, discussion of AppleScript actually begins on the topic of syntactic merits. This leads to the depressing corollary:
Most discussions of AppleScript consist only of a debate on the syntactic merits of the language. There is nothing to be learned from these discussions.
Or, more simply:
Most discussions of AppleScript contain nothing of value.
While harsh, I do feel these are accurate descriptions of most online (and, for that matter, offline) discussions of AppleScript that I've seen.

Wednesday, June 25, 2008

A Small Tip on Catch All Address and Spam Blocking in Google Apps

I've recently signed up for Google Apps, based largely on an article on Lifehacker. One interesting point was the idea of using a catch-all email address to block spam, as described in a comment or with somewhat more detail in a different article. Both explanations omit a relatively minor point that can lead to some frustration.

To summarize the method, you just set up filters in a catch-all address to identify email that matches a simple pattern, and forward it to your own account. The result is that you can give out email address "aliases" for, e.g., web registration forms, without inviting spam to your real address. If you start getting spam to one of the aliases, you can block it directly with a filter.

I set this up, and sent a test message to try it out. The message showed up in the catch-all account, but didn't get forwarded to my own account. The filter I used to set up the forward was identifying the message correctly, but didn't send it on for some reason.

After a fair amount of web searching, I found the answer. Email for Google Apps is Gmail. Gmail does some modestly smart things to present a convenient interface; perhaps the best known is the way Gmail groups messages into "conversations." Something else it does is to hide messages that you've forwarded to yourself. This seems quite sensible in the context of conversations.

It's probably clear what was happening, at this point. I sent some test messages from my own account, which were then forwarded back to me from the catch-all account. Gmail saw a message forwarded to myself, and didn't show it. I was left scratching my head in puzzlement.

After finding out about the hiding of self-forwards, I tried sending a test message from a different account. It forwarded without any trouble. Thus, for testing, you should always send email from a different account than the one you want to receive it in.

Saturday, June 7, 2008

Plaxo Synchronization Oddity

I've used Plaxo for a while to synchronize my address book across a couple of Macs and a web mail account or two. This was the original intent for Plaxo, and it has served me reasonably well as a replacement for the .Mac synchronization (which didn't serve me well, and was overpriced to boot). Today, I discovered a rather strange bit of behavior in the synchronization.

Since I started using it, Plaxo has added a feature they call Pulse. Pulse seems like an interesting variant on the social networking websites, focusing on allowing you to connect up feeds from the sites you actually use and centralize the social networking aspects (e.g., from Blogger and Flickr). I set up Pulse today.

Without Pulse, I never had much reason to take a look at the online version of my address book. With Pulse, I could see my own address plainly while setting up my profile. It was wrong.

That was quite strange. The addresses at Plaxo came from synchronizing with the Address Book data on my Mac. Those were right! However, there was a chunk of my previous address showing up on Plaxo, mixed in with my current address.

Here's what happened. I now live in Vienna. The address format for my Address Book card is thus set to Austrian. Before coming to Vienna, I lived in Portugal. The Portuguese format has a field for territorial subdivision, while the Austrian one doesn't use that. The extra stuff showing up in my current address is just the old territorial subdivision. Switching the display format for my Address Book card back to Portuguese shows the territorial subdivision field again. The old data is still there, just hidden with the Austrian formatting. It seems that Plaxo doesn't handle this situation correctly, treating all the data as still relevant (to be fair, I don't know if it is even possible for Plaxo to handle it right).

Admittedly, the described case is a bit unusual. Regardless, it may affect others. The solution, fortunately, is simple: switch the Address Book card format back to the earlier country setting, delete the hidden, outdated fields, switch back to the correct country setting, and sync with Plaxo.

Update: Or maybe Pulse doesn't let you connect a feed from Blogger. Mine doesn't seem to be working, anyway.

Thursday, May 1, 2008

Tradeoffs for Type Inference

Here's a nice essay by Chris Smith: What to Know Before Debating Type Systems. Smith gives a nice introduction to some of the ideas and terminology for programming language type systems, aimed at the perennial static vs. dynamic debate. I quite like that he dismisses the strong vs. weak typing distinction as meaningless. Some terminology that I'd have liked to see is that of untyped languages and type safe languages, since so-called dynamic typing is just that: untyped, but type safe through dynamic checking. I suppose that would have caused a lot of, e.g., Python users to stop reading, which would be a shame.

For the most part, I think Smith is right on with his presentation, although there are a few points where I'd quibble about details. However, I think that type inference warrants some further consideration. As Smith states, "Type inference is generally a big win." I fully agree with that, but there are some details worth making explicit.

First, let's distinguish type inference from what we might call type propagation or local type inference. Type inference is figuring out the types of the expressions from the expressions themselves. Type inference is seen mostly in functional programming languages such as Standard ML, Objective Caml, and Haskell. For the most part, you don't need to write down the type signature for a function, you just define it and the compiler can work out the types. Despite this, it is common to give the type signatures for functions anyway, since the types provide a great deal of information. The type information is useful for understanding how to call and compose functions, which is obviously important for functional programming. Further, declaring the type for a function can be a useful design aid, allowing a correctly typed function stub to be provided that will just raise an exception when called. In Standard ML, such a stub is defined like fun stub x = raise Fail "Not implemented". The inferred type of stub is 'a -> 'b, which is too general to be useful in any real function. By manually giving a type to stub, you can use it to define other functions and the compiler will be able to do more useful checks.
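
The stub idea carries over, loosely, to languages with optional annotations. As a sketch only, here is the same design aid in Python 3. The names parse_record and total are hypothetical, and the annotations are assumed to be checked by an external tool such as mypy, since the interpreter itself ignores them.

```python
from typing import Callable, List

def parse_record(line: str) -> List[float]:
    """Stub: the declared type is fixed before the body exists."""
    raise NotImplementedError("Not implemented")

def total(lines: List[str],
          parse: Callable[[str], List[float]] = parse_record) -> float:
    """Written against the stub's declared type, not its body."""
    return sum(sum(parse(line)) for line in lines)
```

Because the stub carries a specific signature rather than something as general as 'a -> 'b, a checker can already verify code such as total before parse_record is actually written.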

Type propagation is using known types to figure out the types of as many expressions as possible. This might take the form of declaring the types for functions and allowing the compiler to work out the types for all the expressions within the function bodies. If you would normally write out the types for functions anyway, you aren't really out much. You might have to write a few more type signatures, since you'd probably skip the signatures for little helper functions, but those types should be easy, anyway. There can also be gains from this approach, since you can use a more expressive type system for which type inference might not be decidable. Such an approach is taken in the local type inference for Scala. There's no guarantee that types can be inferred, but they usually can be, so you don't need to write type declarations for every variable. On the other hand, by softening the guarantees for inference, types can be used to do more. It can even fit in with untyped languages for optimization purposes; my understanding is that this was a big part of how Dylan compilers were able to produce fast executables.

As a whole, type inference not only provides benefits, but also places restrictions on the type system. Improving the type system may well make it worth giving up a general type inference algorithm in favor of weaker type propagation. The marginal cost of annotating some extra functions with types is just not a high price to pay. Turning it around, one really has the worst of both worlds with languages like C, C++, and Java: you have to declare the type of every variable, but it really doesn't provide you with any benefit.

(Via Lambda the Ultimate.)

Thursday, April 24, 2008

Code indentation in SubEthaEdit

There's a recent message on the Yahoo! group for SubEthaEdit asking about automatic formatting of code in SubEthaEdit. By building on SubEthaEditTools, it only takes about twenty minutes to connect an appropriate shell script to any given mode, calling out to, e.g., indent for the C mode. But we can do better!

It takes only a little more effort to install a general system for code reformatting into SubEthaEdit. First, we create a suitable AppleScript that calls out to a formatter:
property prettyPrinter: "${PRETTYPRINTER}"

if not documentIsAvailable() then
    return
end if

if (modeSetting for "PRETTYPRINTER") is missing value then
    display alert "Unable to re-indent." message "You need to define PRETTYPRINTER for the mode."
    return
end if

if selectionIsEmpty() then
    setDocumentText to (beautifulCode from documentText())
else
    extendSelection with extendingFront and extendingEnd
    setSelectionText to (beautifulCode from selectionText())
end if

on beautifulCode from sourceText
    shellTransform of sourceText for modeEnvironment() through prettyPrinter without alteringLineEndings
end beautifulCode

on seescriptsettings()
    {displayName:"Re-Indent Lines", shortDisplayName:"Re-Indent", keyboardShortcut:"^@i", toolbarIcon:"ToolbarIconRun", inDefaultToolbar:"no", toolbarTooltip:"Automatically re-indents lines", inContextMenu:"yes"}
end seescriptsettings


The formatter is stored as a PRETTYPRINTER environment variable. The include at the very end is an m4 command, not AppleScript; if you don't know what it's for, you can either just cut and paste the contents of SubEthaEditTools into the script at that point, or read about it here.

The logic of the AppleScript is pretty straightforward. I first check to make sure a document is open, then check to see if a formatter is defined for the mode through a PRETTYPRINTER environment variable. I didn't do these sorts of checks in other scripts for SubEthaEdit, because those were always for just one mode, so I could assume that there was an open document and provide a sensible default for the mode. Here, we're going to put the script into the general scripts folder ~/Library/Application Support/SubEthaEdit/Scripts, so we can't make those assumptions anymore.

At this point, I've made all the safety checks needed, so I can proceed to do the reformatting. If no text is selected, I pass all the text to the beautifulCode handler, otherwise I extend the selection to full lines and pass the new contents to beautifulCode. All that happens in beautifulCode is that the text is handed off to the shell for formatting, with a suitable environment obtained from the mode. The results of beautifulCode are then substituted for the original text.
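The hand-off to the shell can be pictured as an ordinary filter. The sketch below is not the shellTransform handler from SubEthaEditTools; it is a hypothetical shell function, reindent, which assumes that PRETTYPRINTER holds a command that reads stdin and writes stdout, and which falls back to cat when the variable is unset.

```shell
# Hypothetical sketch: run stdin through whatever command PRETTYPRINTER names.
# Assumes the command is a filter (reads stdin, writes stdout).
reindent() {
    # Fall back to cat, which passes the text through unchanged.
    eval "${PRETTYPRINTER:-cat}"
}

# Example: an uppercasing "formatter" standing in for a real one such as indent.
PRETTYPRINTER='tr a-z A-Z'
printf 'int main() { return 0; }\n' | reindent
```

Because the value is run with eval, any arguments containing spaces need their own quoting inside the PRETTYPRINTER string.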

Finally, seescriptsettings are defined. I set a keyboard shortcut of Command-Control-i; it was available, and allows a mnemonic of i for indent. I defined a toolbar item for it, with the run icon, which is pretty arbitrary; I didn't see anything better from the standard list. It can also be invoked within a document using the contextual menu. Remember, this goes into the general scripts, so appears under the scripts menu, not the Mode menu. I saved mine as PrettyPrint.scpt.

One thing remains: how do we define the PRETTYPRINTER for a mode? It's remarkably easy. The script I used for setting the LaTeX mode environment doesn't actually depend on the LaTeX mode in any way, so it can be used directly by just putting it in the general scripts folder instead of within a mode. Actually, it is probably a good idea to also check that a document is open so that a mode can be identified. With SubEthaEditTools, there's almost nothing to it:
if not documentIsAvailable() then
    return
end if

on seescriptsettings()
    return {displayName:"Customize Mode...", shortDisplayName:"Environment", inContextMenu:"no"}
end seescriptsettings


This script, too, goes into the general scripts folder for SubEthaEdit; I named mine OpenEnvironment.scpt. Although this second script is quite simple, it has a very profound effect, adding a preferences system for all the SubEthaEdit modes. I wonder what else it could be used for?

And that's it. Install the two scripts, define some appropriate PRETTYPRINTER values (being sure to quote the shell command appropriately!), and reformat away. For your convenience, you can download compiled scripts for the formatter and the mode environment settings.

Saturday, April 19, 2008

Very Slick Video

I'm not a big fan of Rush, but this video is quite fun, bordering on stunning.

Watch it.

(Via Pharyngula.)

Saturday, April 12, 2008

SEEing LaTeX 29: Listening to LaTeX

The SubEthaEdit mode I've developed in this series of posts is generally rather quiet. It doesn't announce successful production of a PDF from a LaTeX file, it just creates it and opens it. It also doesn't tell you when an error occurs.

You can have the errors reported by piping the output from the call to pdflatex into some command line tool. Exactly what tool to use is less clear. The natural choice for SubEthaEdit would be the see command line tool, but it's not very convenient, and has some real problems. Beyond that, calling out to see repeatedly will produce a new window each time, which is just annoying.

The Python mode has a nicer behavior. It has a Check Syntax command that always writes to a window named Python Check Syntax, opening it if need be or overwriting the contents otherwise. The Lua mode has an analogous Check Syntax command with similar behavior.

Let's make a shell script that does something similar. First, I encapsulate the relevant bits of AppleScript from the Python and Lua modes into some simple handlers. Second, I embed those into a shell script using the approach recently discussed here. The resulting script I call seeless, as it is usable as something like the less pager. I'll defer the text of the script until the end, first describing the usage.

Basic usage is just to pipe in some text:
echo Hello World! | seeless
This writes the text Hello World into a SubEthaEdit window titled seeless message, opening a new window if necessary or replacing the text in an existing window. Multiple windows with the same title are a bit problematic, with no guarantee that the window you want will be the one written to. Don't do that.

The title of the window can be specified:
echo Hello World! | seeless -t"A message for you, direct from the shell"
Appropriate choice of title allows a SubEthaEdit mode to establish its own reporting window.

The window is normally brought to the front when it is written to. Call seeless as
echo Hello World! | seeless -b
to leave the window in the background. This allows, e.g., a reporting window to be kept out of the way as a tab and only checked when something seems to be wrong.

There are two different modes, insert and append, for writing to the window. For insert mode, the window is cleared before writing the text from stdin, while append mode just appends to any existing text. To set append mode, use:
echo Hello World! | seeless -a
Insert mode is the default, but a flag exists for it, too:
echo Hello World! | seeless -i
Multiple flags for the insert and append mode can be given, with the final one determining the behavior.

In append mode, the text from a second call to seeless follows immediately after the text from the first. To give some visual space, specify a separator:
echo Hello World! | seeless -s"----------"
When a separator is given with the -s flag, append mode is automatically set. It is also possible to use a separator in insert mode, giving a form of header. For example,
echo Hello World! | seeless -s"$(date)" -i
shows the time when seeless was called. Another possibility would be to have a cluster of related programs writing to the same window, and using the separator to specify the program that wrote the latest text.
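The way the flags interact can be sketched with getopts. This is an illustration of the behavior described above, not the seeless source; resolve_mode is a hypothetical helper that only reports which mode a sequence of flags ends up in and whether a separator was requested.

```shell
# Hypothetical helper: report the final mode for a sequence of flags.
resolve_mode() {
    OPTIND=1             # restart option parsing on every call
    mode=insert          # insert is the default
    separate=no
    while getopts :ias: opt "$@"
    do
        case $opt in
        i) mode=insert ;;
        a) mode=append ;;
        s) mode=append; separate=yes ;;   # a separator implies append mode
        esac
    done
    echo "$mode $separate"
}

resolve_mode -a -i            # the last mode flag wins
resolve_mode -s"$(date)" -i   # separator kept, but insert mode restored
```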

Let's take a look at how I put those options into use with the LaTeX mode. I set SEE_BIBTEX to:
'bibtex "${FILE%.tex}" | "$HOME/Library/bin/seeless" -t"LaTeX Messages" -s"bibtex ran at $(date)\n" -b -i &> /dev/null &'
'latexmk -C "$FILE" | "$HOME/Library/bin/seeless" -t"LaTeX Messages" -s"latexmk -C ran at $(date)\n" -b -i &> /dev/null &'
'latexmk -pdf -quiet "$FILE" | "$HOME/Library/bin/seeless" -t"LaTeX Messages" -s"latexmk ran at $(date)\n" -b -i &> /dev/null &'
Note that I have redirected the output of each call to seeless to /dev/null and made the calls asynchronous with &—SubEthaEdit hangs without doing this, requiring a force quit.

With the above settings, the LaTeX mode will cause SubEthaEdit to open up a report window titled LaTeX Messages whenever a document is typeset, bibtex is run, or the auxiliary files are cleaned up. I can put it out of the way, either as a tab or background document; because I've used the -b flag for each call, the report window will stay out of the way until I want it brought to the front. I've used a separator to show which feature was most recently used and at what time it was called. I've set insert mode with an -i flag, so I only see the latest call; by eliminating this flag, I'd have a chronological record of all the calls made (in the current editing session, anyway).

I haven't been using this very long, so there may be some bugs. However, it seems quite solid, and is definitely useful already. Download it here.

As promised above, I'll give the text of the script here, too. I've formatted the script as a shell script, to better show how the shell variables are used to adapt the behavior of the embedded AppleScript.

# Writes stdin to a SubEthaEdit document, modifying the contents if the
# document already exists. Document is selected by title, with a default
# title of 'seeless'. Title can be specified with a '-t' flag. If there
# are multiple documents with the given title, one is chosen arbitrarily.
# Writing to the document occurs in two modes, insert and append. With
# insert mode, the document is cleared before any text is written to
# it. In append mode, any existing text is maintained, with the new
# text appended at the end. By giving an '-a' flag, append mode is set.
# Giving an '-i' flag sets insert mode; insert mode is the default.
# A separator can be specified. The separator is written to the
# document before the text from stdin. Specifying a separator also sets
# append mode (but this can be turned off again if desired with an '-i'
# flag). The separator is specified with an '-s' flag; the argument
# following the flag is used as the separator.
# Giving an '-h' flag shows a usage summary. Any other flags are ignored.

#$Id:,v 1.9 2008/04/12 20:33:56 mjb Exp $

# Copyright (c) 2008, Michael J. Barber
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject
# to the following conditions:
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.

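PROGRAM=$(basename "$0")

usage() {
    echo "Usage: $PROGRAM [-ahi] [-t title] [-s separator]"
}

# Defaults: insert mode, no separator, window brought to the front.
title="$PROGRAM message"
shouldClear=true
shouldSeparate=false
shouldBringToFront=true
separator=""
while getopts :t:ias:bh opt
do
    case $opt in
    t)      title="$OPTARG"
            ;;
    i)      shouldClear=true
            ;;
    a)      shouldClear=false
            ;;
    s)      separator="$OPTARG"
            shouldSeparate=true
            shouldClear=false
            ;;
    b)      shouldBringToFront=false
            ;;
    h)      usage
            exit 0
            ;;
    '?')    echo "$PROGRAM: invalid option -$OPTARG" >&2
            usage >&2
            exit 1
            ;;
    esac
done

shift $((OPTIND - 1))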

/usr/bin/osascript > /dev/null <<ASCPT
    set newContents to "$(cat | sed -e 's/\\/\\\\/g' -e 's/\"/\\\"/g')"
    set seeDoc to (ensureSEEDocumentExists for "$title")
    if $shouldClear then
        replaceContents for the seeDoc by ""
    end if
    if $shouldSeparate then
        extendContents for the seeDoc by the "$separator"
    end if
    extendContents for the seeDoc by the newContents
    if $shouldBringToFront then
        tell application "SubEthaEdit" to show the seeDoc
    end if
    to ensureSEEDocumentExists for doctitle
        tell application "SubEthaEdit"
            if exists document named doctitle then
                document named doctitle
            else
                make new document with properties {name:doctitle}
            end if
        end tell
    end ensureSEEDocumentExists
    to replaceContents for seeDoc by newContents
        tell application "SubEthaEdit"
            set the contents of seeDoc to newContents
            clear change marks of seeDoc
            try
                set modified of seeDoc to false
            end try
        end tell
    end replaceContents
    to extendContents for seeDoc by moreContents
        tell application "SubEthaEdit"
            if "" is not equal to the contents of the last paragraph of seeDoc then
                set the contents of the last insertion point of the last paragraph of seeDoc to return
            end if
            set the contents of the last insertion point of the last paragraph of seeDoc to moreContents
            clear change marks of seeDoc
            try
                set modified of seeDoc to false
            end try
        end tell
    end extendContents
ASCPT

A Bit More on AppleScript and stdin

I can get even simpler than the earlier script to pass stdin to an AppleScript embedding within a shell script. Consider this:

/usr/bin/osascript > /dev/null <<ASCPT
    set stdinText to "$(cat | sed -e 's/\\/\\\\/g' -e 's/\"/\\\"/g')"
    tell application "TextEdit"
        make new document with properties {text:stdinText}
    end tell
ASCPT
No need for the temporary file anymore, just use cat from within the AppleScript portion. The result of cat needs to be piped through sed in order to prevent problems with quoting, with a bunch of backslashes to get the special characters right. This may need further tweaking.
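To see what the escaping is meant to achieve, here is a minimal sketch of the same two substitutions as a standalone filter. The name escape_for_applescript is hypothetical, and the number of backslashes differs from the version above because this filter is not embedded in a here document; treat it as an illustration of the idea rather than a drop-in replacement.

```shell
# Escape backslashes first, then double quotes, so the text can sit inside
# an AppleScript string literal. (Hypothetical helper name.)
escape_for_applescript() {
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

printf '%s\n' 'say "hi" \ there' | escape_for_applescript
```

The order matters: escaping the quotes first would introduce backslashes that the second substitution would then double.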

I ran into this when I tried using the earlier script for showing the results of the various scripts for the SubEthaEdit LaTeX mode. The earlier script failed when used for showing the results of bibtex. For some reason that I've not been able to work out, the AppleScript Standard Additions are no longer reached and the call to do shell script fails. The new approach works, so far at least.

Thursday, April 10, 2008

AppleScript, Shell, and stdin

AppleScript can be called from shell scripts, effectively giving access to Mac OS X applications from the underlying Unix tools. The AppleScript is embedded into the shell script as a here document, and invoked using osascript. I've given an example of this approach before.

However, there is something that I've never really been clear on. How does the AppleScript portion of the shell script read from stdin? Standard I/O is fundamental to Unix programming, so it is essential that AppleScript be able to access stdin. Searching with Google hasn't enlightened me. There seems to be no useful parallel with stdout, which works as you'd hope, with osascript writing transparently to it.

After a bit of thinking, I came up with this:

STANDIN=$(mktemp /tmp/seereport.XXXXXXXXXXXX) || exit 1
# Remove the temporary file on exit, even if osascript fails.
trap 'rm -f "$STANDIN"' EXIT
cat > "$STANDIN"

/usr/bin/osascript > /dev/null <<ASCPT
    set stdinText to do shell script "cat $STANDIN"
    tell application "TextEdit"
        make new document with properties {text:stdinText}
    end tell
ASCPT

I write stdin to a temporary file using cat, then read it back out in the AppleScript portion, again using cat. As an example, I just open a new TextEdit document with the text from stdin as its contents.
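The buffering idea can be tried without AppleScript at all. In the sketch below, buffer_stdin is a hypothetical helper: it copies stdin to a temporary file and then hands the file name to whatever command is given as arguments, which stands in for the osascript call.

```shell
# Hypothetical sketch: copy stdin to a temporary file, then let a second
# command read it back by name.
buffer_stdin() {
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/stdin.XXXXXXXXXXXX") || return 1
    cat > "$tmpfile"
    # "$@" stands in for the consumer (osascript in the original).
    "$@" "$tmpfile"
    rc=$?
    rm -f "$tmpfile"
    return $rc
}

echo hello world | buffer_stdin cat
```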

Saving this as, I can then create a new TextEdit document from the shell with:
echo hello world | ./

This works, so I've managed to get at stdin. It seems pretty roundabout though. Is there a better or recommended approach?

Update: I suppose it is worth mentioning that one could use pbcopy and pbpaste to avoid using the temporary file. However, doing so modifies the clipboard, so I prefer the approach shown.

Further, it would be possible to read the temp file using AppleScript commands, instead of calling cat with do shell script. That's too fiddly for a minimal example. Beyond that, I don't see much point to it, since I'd do any processing in the shell, just using AppleScript to pass the text to an application. I can't think of any applications where a stream approach would buy us anything.

Update: Modified example of using to actually use!

Sunday, March 23, 2008

SEEing LaTeX 28: Some Critical Notes on the 'see' Command Line Tool

I've used the see command line tool to relate a LaTeX previewer to SubEthaEdit using pdfsync. Unfortunately, I've become aware that the natural usage of see has a definite problem. Plugging in see and -g %line "%file" for the "PDFSync support" preferences in Skim doesn't work quite the way one would hope.

With that usage, which I'd consider to be the most natural, you wind up producing a new see process each time you command-click in Skim to switch over to SubEthaEdit. That process, due to shortcomings in the design of see, will hang around until you close the document in SubEthaEdit. Of course, if you're using SubEthaEdit and Skim together like that, you're in the middle of editing a LaTeX document, so you're not likely to be closing the document very promptly. Do that enough, and you could consume all the user processes allowed by Mac OS X. If you've not experienced running out of user processes, let's just describe it as Not Fun.

I don't think it is likely to be a big problem, especially with the more friendly process limits in Mac OS X Leopard, but it is definitely a real problem. The problem gets compounded with each new usage of see for integrating external applications or reporting from a mode script. End result is that some caution is warranted when using see, and it should be omitted in favor of another approach, if necessary. Personally, I'm still using it.

I've requested an enhancement to see consisting of a 'quiet' mode, like that seen in, e.g., grep. That would eliminate the issue, and, as a side benefit, considerably simplify using see from AppleScripts for SubEthaEdit modes. Let's hope the Coding Monkeys act on it.

Sunday, February 17, 2008

SEEing LaTeX 27: End in Sight

The activity here today, after a few weeks of silence, reflects that I've finished the documentation. I've spent some time earlier today getting everything ready. Finally, I am very pleased to announce that you can download the mode.

What comes next? That depends on you! Download it, try it out, and let me know about any problems, either by posting a comment or by sending me some email.

Update: The scripts developed in this series have been incorporated into the LaTeX mode distributed with SubEthaEdit. It should not normally be necessary to download the mode from the link given here, but I'll leave the link active in case anyone sees a need for my version.

Update 2: With SubEthaEdit 3.5, the LaTeX mode linked here is out of date, not supporting folding. You should not use it without good reason, as the mode that comes with SEE has everything this version does and more. I will leave the mode available for now, but I don't see much use for it.

SEEing LaTeX 26: Sample Environment Settings

As a starting point for customizing the LaTeX mode, here are some sample environment settings:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">
    <string>'bibtex "${FILE%.tex}" | open -f -a SubEthaEdit'</string>
    <string>'latexmk -C "$FILE"'</string>
    <string>'"$SEE_MODE_RESOURCES"/bin/ "% "'</string>
    <string>'latexmk -pdf -quiet "$FILE"'</string>
    <string>'open -a Skim "$PRODUCT"'</string>
    <string>'export __CF_USER_TEXT_ENCODING=0x1F5:0:0; /Applications/ $LINE "$PRODUCT"'</string>

Monday, January 7, 2008

SEEing LaTeX 25: Putting Things in Order

As noted below, the entries in the LaTeX mode menu for SubEthaEdit need to be put into a sensible order. After some thinking, I decided that there are really three groups of scripts: scripts for interacting with the LaTeX system, scripts for simplifying typing, and a script for interacting with the LaTeX mode itself.

Ideally, we'd put the three into groups divided by a horizontal rule. Unfortunately, SEE does not (yet) allow dividing lines to be inserted into the menu. Regardless, let's organize the scripts into the three groups and, with one exception, just alphabetize within the groups. The exception is for the "Typeset and View" menu item; it strikes me as natural to put that first in the list.

End result is a menu ordered as:

Typeset and View
Clean Up Auxiliary Files
Run BibTeX

Complete Citation
Inline Math
Insert Environment...
Un/Comment Selected Lines

Customize Mode...

Although I've broken the groups apart with spacing, they'll just run together in the menu. Perhaps a later version of SEE will allow some more structure to be added.

One final change is that I've renamed the "Mode Environment..." menu item to "Customize Mode...". The two distinct meanings of "environment" struck me as confusing, and "Insert Environment..." is simply too appropriate for LaTeX to change.

I'll add another entry to the menu as soon as possible. It will go into the third group, with a title something like "Mode Help". I still have to write the mode help, first.

Sunday, January 6, 2008

SEEing LaTeX 24: For Completeness, BibTeX

I guess the right solution is to add a menu item for running bibtex, and that's it. Since I don't use makeindex myself, I'll leave that aside, unless someone wants to contribute scripts or just examples of use. I'd guess that the scripts would be easy enough, just adapt the ones I'll present for BibTeX.

Here's the AppleScript:
checkSaveStatus without updating
set bibScript to join of {modeEnvironment(), quotedForm for "$SEE_MODE_RESOURCES/bin/", quotedForm for documentPath()} by space
do shell script bibScript

on seescriptsettings()
    return {displayName:"Run BibTeX"}
end seescriptsettings


And here's the shell script that it calls:

#$Id:,v 1.1 2008/01/06 19:09:49 mjb Exp $

export PATH

BIBTEX=${SEE_BIBTEX:-'bibtex "$(basename $FILE .tex)"'}
FILE="$(basename "$1")"
DIRNAME="$(dirname "$1")"

# Run from the document's directory so bibtex finds the auxiliary files.
cd "$DIRNAME" || exit 1

eval "$BIBTEX"
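
The default-or-override pattern used for SEE_BIBTEX can be tried in isolation. In this sketch, MY_TOOL_CMD and run_tool are hypothetical names, and echo stands in for the real command; the point is only that the single-quoted default keeps $FILE unexpanded until eval runs it.

```shell
# Sketch of the default-or-override pattern used for SEE_BIBTEX.
# MY_TOOL_CMD and run_tool are hypothetical names.
run_tool() {
    FILE="$1"
    # Single quotes keep $FILE unexpanded until eval runs the command.
    CMD=${MY_TOOL_CMD:-'echo "default: $FILE"'}
    eval "$CMD"
}

run_tool paper.tex                      # uses the built-in default
MY_TOOL_CMD='echo "custom: $FILE"'
run_tool paper.tex                      # uses the override
```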

SEEing LaTeX 23: Am I Done?

Last July, I listed some desirable features for the LaTeX mode for SubEthaEdit. The mentioned features were integrating with a PDF viewer using pdfsync, enabling insertion of citation keys in bibtex format, allowing typesetting by calling pdflatex from within SEE, cleaning up auxiliary files, and commenting out selected lines. I also mentioned that it might be nice, if inessential, to be able to insert environments and formatting.

I've accomplished all of that. As well, I have introduced a mechanism for customizing the shell script environment for the mode and have developed a set of AppleScript handlers useful for scripting SEE. I'm quite pleased with how all of that has worked out. Not only do I now have a LaTeX mode that covers my main needs, but I've got a solid foundation on which I - and hopefully others! - can build support scripts for other modes.

That said, the LaTeX mode is not quite finished. It's clear that I should write some documentation, and add a "Mode Help" menu item. I should also better document SubEthaEditTools. Beyond that, there are two additional tasks.

First, the scripts in the LaTeX mode menu should be put into some sort of reasonable order, rather than the haphazard order that currently is there. The items appear in order based on the names of the scripts, which can differ from the entry in the menu. What is the right order to use?

Second, I've omitted some important elements of a LaTeX system, because latexmk handles them for me. For example, I have not provided a way to run bibtex. Should additional elements be added? If so, which? bibtex? makeindex? Something else?

Saturday, January 5, 2008

SEEing LaTeX 22: Inline Math

Environments aren't the only common LaTeX constructs that are awkward to type. The delimiters for inline math are pretty awkward, too. Let's add those as well:
set mathText to selectionText()
set wrappedText to "\\( " & mathText & " \\)"
setSelectionText to wrappedText
if (length of mathText) equals 0 then
    set {startChar, nextChar} to selectionRange without extendingFront or extendingEnd
    setSelectionRange to (startChar + length of mathText + 3)
end if

on seescriptsettings()
    {displayName:"Inline Math", keyboardShortcut:"@~^m", inContextMenu:"yes"}
end seescriptsettings


SEEing LaTeX 21: Inserting Environments

Environments again? Yes, but this time we're going to look at environments in LaTeX, not the shell environment. Environments are a bit of a pain to type, but the repetitive structure makes them suitable for automation: we just get the name of the environment, and then have a more-or-less standard form:
\begin{name}
    body
\end{name}

Indentation of the body can be a little unclear. For example, I usually indent equation environments, but would never dream of indenting the document environment.

To add environment insertion into the LaTeX mode for SubEthaEdit, we begin by getting the environment name using display dialog:
try
    display dialog "Enter environment name:" with title "Insert Environment" default answer "equation"
on error number -128 -- user canceled
    return
end try
set envName to text returned of result

The try block handles the case where the user cancels instead of entering an environment name. Somewhat arbitrarily, I set the default answer to be "equation", since my guess is that equations are probably the most common environment.

After that, there is just some fiddly work getting the formatting of the environment correct. It needs to intelligently insert newlines and tabs to keep the document readable. As well, something needs to be done with the selection text. There are two possibilities as I see it: (1) treat the selection as the name of the environment and (2) treat the selection as the body of the environment. I went with the latter. Finally, it would be nice to place the insertion point somewhere reasonable; I think it works nicely at the end of the body, especially since the insertion point is positioned to start typing immediately if the body is empty. Putting it all together, we have:
set {startSelected, nextSelected} to selectionRange without extendingFront or extendingEnd
set {startExtended, nextExtended} to selectionRange with extendingFront and extendingEnd

set prefix to selectByComparing(startSelected, startExtended, "", "\n")
set suffix to selectByComparing(nextSelected, nextExtended, "", "\n")
set indent to selectByComparing(startSelected, nextSelected, "\t", "")

set beforeInsertion to (join of {prefix, "\\begin{", envName, "}\n", indent, selectionText()} by "")
set afterInsertion to (join of {suffix, "\\end{", envName, "}\n" } by "")

setSelectionText to (beforeInsertion & afterInsertion)
setSelectionRange to startSelected + (count of beforeInsertion) - 1 + (count of suffix)
Note that I've made repeated use of a convenience function:
to selectByComparing(val1, val2, sameVal, diffVal)
    if val1 equals val2 then
        return sameVal
    else
        return diffVal
    end if
end selectByComparing

The selectByComparing handler is not part of SubEthaEditTools - maybe it should be?
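
For readers who want to experiment with the layout outside SubEthaEdit, here is a rough shell analogue. It is a simplification under stated assumptions: select_by_comparing mirrors the handler above, while wrap_environment is a hypothetical function that always puts the body on its own line, indents only when the body is empty (as the AppleScript does for an empty selection), and ignores the prefix, suffix, and cursor placement.

```shell
# Shell analogue of selectByComparing: pick one of two values by equality.
select_by_comparing() {
    if [ "$1" = "$2" ]; then
        printf '%s' "$3"
    else
        printf '%s' "$4"
    fi
}

# Hypothetical sketch: wrap a body in \begin{...} and \end{...}.
wrap_environment() {
    env_name=$1
    body=$2
    tab=$(printf '\t')
    # Indent with a tab only when there is no body yet.
    indent=$(select_by_comparing "$body" "" "$tab" "")
    printf '\\begin{%s}\n%s%s\n\\end{%s}\n' "$env_name" "$indent" "$body" "$env_name"
}

wrap_environment equation 'E = mc^2'
```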

That's all there is to it, apart from the boilerplate:
on seescriptsettings()
    {displayName:"Insert Environment...", keyboardShortcut:"@^e", inContextMenu:"yes"}
end seescriptsettings


Update: There is an interesting possibility for adjusting the default environment. I had used "equation" as the default, but it was pretty arbitrary. Another approach would be to just repeat whichever environment was last given. We just add a property to the script that holds the default environment, use its value in making the dialog, and update the property based on the dialog result:
property defaultEnvironment: "equation"

try
    display dialog "Enter environment name:" with title "Insert Environment" default answer defaultEnvironment
on error number -128 -- user canceled
    return
end try
set envName to text returned of result
set defaultEnvironment to envName

Is this actually a good idea? or will it just be annoying? Hard to say without using it, so I guess I'll try it out for a while.

Friday, January 4, 2008

SEEing LaTeX 20: Getting Citations Right

I started enhancing the LaTeX mode for SubEthaEdit by looking at citations. The original approach, based on just using the input manager supplied with BibDesk, proved to be unsatisfactory. Let's fix that now.

The general approach seems clear enough. We need to determine a search term based on the cursor position in the LaTeX document, pass that to BibDesk to get matching documents, let the user pick which documents are relevant, format the selected documents, and insert the result into the document. Using SubEthaEditTools, our SEE scripting abstraction layer, it proves to be fairly straightforward.

The first step, determining the search term, is probably the trickiest. What I envision is that the user can enter a partial citation and have it finished by the script. Thus, we'll need to examine the text immediately before the insertion point and determine a partial citation key. We should not cross an open brace "{", as we don't want to include the macro. Additionally, we shouldn't cross a comma, since we might be looking at multiple citations within one macro. Let's not check the calling macro; this allows the completion to be invoked at inappropriate points, but also allows the completion to be invoked for user-defined macros.

One final issue is what we do when there is a range of text selected. Since the default behavior should be to have no text selected, we should treat a non-empty selection as meaningful. Let's take it to mean that the search should be constrained to only include the selected text. Therefore, we determine a search term based either on the selected text or all the text on the line preceding the insertion point.

Putting all that together, I came up with:
set {startChar, nextChar} to selectionRange without extendingFront or extendingEnd
if startChar equals nextChar then
    -- empty selection, try the whole line
    set selectionContents to extendedSelectionText with extendingFront without extendingEnd
else
    set selectionContents to selectionText()
end if
set macroArgument to the last item of the (tokens of the selectionContents between "{")
set searchTerm to the last item of the (tokens of the macroArgument between ",")

I've used a couple of the new handlers from SubEthaEditTools. These are hopefully self-explanatory, but check the implementation in case they are not.
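
The two-step tokenizing can be tried from the shell with parameter expansion. This is only an analogue of the AppleScript above; search_term is a hypothetical function that takes the text before the insertion point as its argument.

```shell
# Hypothetical sketch: derive the partial cite key from the text before the cursor.
search_term() {
    text=$1
    # Drop everything up to and including the last "{".
    macro_argument=${text##*\{}
    # Then drop everything up to and including the last ",".
    printf '%s\n' "${macro_argument##*,}"
}

search_term 'as shown in \cite{smith2001,jon'
```

If the text contains no brace or comma, the corresponding expansion leaves it unchanged, which matches taking the last item of a one-element token list.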

Using the two tokens calls, we obtain a partial citation key. We pass that to BibDesk:
tell application "BibDesk"
    set citeMatches to search for searchTerm with for completion
end tell

The ungrammatical with for completion will give us a list of cite keys, not just document titles. Each completion is given as a string containing both the cite key and some document information, separated by a " % " string.

We next present the list of completions to the user, in order to narrow the list down to just the appropriate citations. To present a list, we use choose from list from the AppleScript StandardAdditions. There are three cases worth considering. First, if there is only one publication, we can select it by default, so the user just confirms whether it is correct. Second, if there are multiple possible publications, we just show the list, declining to guess. Finally, if there aren't any matches, we just inform the user with display alert; in this case, we'll set the list of publications to the empty list. If there aren't any publications returned, that means the user canceled, so we should just exit and leave the document unchanged. Putting it all together, we arrive at:
if (count of citeMatches) equals 1 then
    choose from list citeMatches with title "Citation Matches" with prompt "One matching publication:" default items citeMatches
    set pubs to result
else if (count of citeMatches) > 1 then
    choose from list citeMatches with title "Citation Matches" with prompt "Please select publications:" with multiple selections allowed
    set pubs to result
else
    display alert "No matches found for partial citation \"" & searchTerm & "\""
    set pubs to {}
end if

if (count of pubs) equals 0 then
    -- user canceled, do nothing
    return
end if

We now have a non-empty list of completions. We need to split those apart to get the cite keys, then join the cite keys with commas. I used awk:
set citation to shellTransform of (join of pubs by "\n") for "" through "awk -F' % ' 'NR == 1 { printf(\"%s\", $1) } NR > 1 { printf(\",%s\", $1) }'" without alteringLineEndings

Pretty ugly! Let's reformat the core of the awk program to make it clearer:
NR == 1 { printf("%s", $1) }
NR > 1 { printf(",%s", $1) }

In this form, it's clear enough, keeping in mind that we use " % " as the field separator: we just print out the first field, corresponding to the cite key, with the first record treated specially to get the number of commas right.
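
The awk program can also be tried on its own from a shell. The two completion strings below are invented for illustration; they only imitate the "cite key % document information" shape described earlier, and the variable names are likewise made up.

```shell
# Hypothetical completions, one per line, in "citekey % document info" form.
pubs='knuth1984 % Knuth, Literate Programming
lamport1994 % Lamport, LaTeX: A Document Preparation System'

citation=$(printf '%s\n' "$pubs" | awk -F' % ' '
    NR == 1 { printf("%s", $1) }    # first record: cite key only
    NR > 1  { printf(",%s", $1) }   # later records: comma, then cite key
')

printf '%s\n' "$citation"
```

For this input the result is knuth1984,lamport1994, with no trailing newline or comma inside the captured value.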

Finally, we insert the formatted citation back into the document. I also move the insertion point to the end of the formatted citation, as a typing convenience. I had considered closing the brace for the citation macro, but that would make multi-citation lists more awkward, so I decided against it. This is none too difficult:
setSelectionRange to {nextChar - (length of searchTerm), nextChar - 1}
setSelectionText to citation
setSelectionRange to (nextChar - (length of searchTerm) + (length of citation))

Again, I've used new handlers from the SubEthaEditTools library.

Putting it all together, and adding a seescriptsettings handler to integrate it into SEE, we get:
-- $Id: BibDeskCompletions.applescript,v 1.2 2008/01/04 18:40:16 mjb Exp mjb $

(*
Need to figure out the search term. Treat a selection as meaning to constrain the search
term to lie within the selection, and an empty selection as meaning to get the search term
from the preceding text on the line. We don't cross an opening brace, so that the search term
comes from a call to a macro. However, we don't check to see if the macro is one of the
standard citation macros, since we do want to allow user macros.
*)

set {startChar, nextChar} to selectionRange without extendingFront or extendingEnd
if startChar equals nextChar then
    -- empty selection, try the whole line
    set selectionContents to extendedSelectionText with extendingFront without extendingEnd
else
    -- non-empty selection, constrain the search to the selected text
    set selectionContents to selectionText()
end if
set macroArgument to the last item of the (tokens of the selectionContents between "{")
set searchTerm to the last item of the (tokens of the macroArgument between ",")

tell application "BibDesk"
    set citeMatches to search for searchTerm with for completion
end tell

-- get list of publications, customizing user interaction based on number of matches
if (count of citeMatches) equals 1 then
    choose from list citeMatches with title "Citation Matches" with prompt "One matching publication:" default items citeMatches
    set pubs to result
else if (count of citeMatches) > 1 then
    choose from list citeMatches with title "Citation Matches" with prompt "Please select publications:" with multiple selections allowed
    set pubs to result
else
    display alert "No matches found for partial citation \"" & searchTerm & "\""
    set pubs to {}
end if

if (count of pubs) equals 0 then
    -- user canceled, do nothing
    return
end if

(*
At this point, there is a non-empty list of matches, which replaces the search term. By
construction, the search term always immediately precedes the end of the selection.
Call out to the shell to format the publication list into a LaTeX citation, insert the citation,
and then move the insertion point just after the citation.
*)

set citation to shellTransform of (join of pubs by "\n") for "" through "awk -F' % ' 'NR == 1 { printf(\"%s\", $1) } NR > 1 { printf(\",%s\", $1) }'" without alteringLineEndings
setSelectionRange to {nextChar - (length of searchTerm), nextChar - 1}
setSelectionText to citation
setSelectionRange to (nextChar - (length of searchTerm) + (length of citation))

on seescriptsettings()
    {displayName:"Complete Citation", shortDisplayName:"Citation", keyboardShortcut:"@^j",  toolbarIcon:"ToolbarBibDesk.png", inDefaultToolbar:"yes", toolbarTooltip:"Complete citation using BibDesk", inContextMenu:"yes"}
end seescriptsettings