Friday, December 21, 2007

SEEing LaTeX 15: Comments on Comments

I really thought that getting comments working in the SubEthaEdit LaTeX mode would be easy. It wasn't.

My expectation was that I could just adapt the Un/Comment Selected Lines script from the Objective C mode. However, I didn't really find the script to be satisfactory. Aesthetically, it's unappealing. Instead of using all the selected lines to determine whether to comment or uncomment the lines, just the first line is used. You can see the comments being applied one line at a time, because the AppleScript implementation loops over the lines, with each iteration in the loop sending its own, slow AppleEvent to SEE.

More importantly, the script functions in a manner that I consider to simply be incorrect. If lines are already commented, the objc mode script leaves them unaltered. Now consider working with a block of lines, only some of which are commented. If you invoke the Un/Comment Selected Lines script twice, the first invocation will comment, and the second will uncomment. Since the first invocation leaves the originally commented lines unchanged, there is no distinction between them and the originally uncommented lines. The second invocation thus removes the comments from all the lines, failing to restore the original state, changing the meaning of the program. In Objective C, that probably leads to an error at compile time. In LaTeX, you've just altered your document in a way that is quite likely to still be valid. It's also quite likely to be wrong, wrong, wrong!
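To make the failure concrete, here is a rough shell emulation of the behaviour just described. It is not the Objective C mode script itself, which is AppleScript. The function name naive_toggle is hypothetical. The function mimics only the two rules above: the first line decides what happens, and lines that are already commented are left alone when commenting.

```bash
#!/bin/bash
# Hypothetical emulation of the behaviour described above (NOT the actual
# Objective C mode script): the first line decides the action, and lines
# that are already commented are left alone when commenting.
naive_toggle() {
    local c="$1"
    awk -v c="$c" '
        # Decide once, from the first line only.
        NR == 1 { uncomment = (substr($0, 1, length(c)) == c) }
        {
            starts = (substr($0, 1, length(c)) == c)
            if (uncomment) {
                # Strip the comment from every line that has one.
                print (starts ? substr($0, length(c) + 1) : $0)
            } else {
                # Comment only the lines that are not commented yet.
                print (starts ? $0 : c $0)
            }
        }'
}

# A live line followed by a line that was deliberately commented out.
original=$'\\section{Intro}\n% \\label{old}'
twice=$(printf '%s\n' "$original" | naive_toggle '%' | naive_toggle '%')
printf '%s\n' "$twice"
```

After two invocations, the previously commented `\label{old}` line is live again. The document is still valid LaTeX, but it no longer matches the original.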

In short, I decided I needed to rewrite the Un/Comment Selected Lines script to be (1) more aesthetically pleasing and (2) its own inverse. Working directly in AppleScript was not so easy. To eliminate the loop that causes the aesthetic issues, you really need to use a where clause that addresses all the lines at once. I couldn't get that to work. Maybe someone who's better with AppleScript could. But, really, why put in the effort, when the shell is considerably more powerful for text manipulation?

Once again, I'd figured it would be easy. I thought a little grep would detect whether I should comment or uncomment, and a little sed would make the actual change. After actually trying it, I quickly realized that dynamically building the regular expressions needed for grep and sed posed some real challenges. As I often do when scripting, I found awk to be the solution to my problem. The result is the following script:

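#!/bin/bash
# Toggle comment status for text lines. Text lines are read from
# stdin and un/commented text lines are written to stdout.
# Comments are defined by the line starting with a text string given
# by the first argument. The lines will be uncommented if all lines
# are commented, and commented if any of the lines are
# uncommented. The script is its own inverse, i.e., piping the text
# through the script twice writes the original text to stdout.

#$Id:,v 1.3 2007/12/18 21:40:47 mjb Exp $

tmp=$(mktemp /tmp/comments.XXXXXXXXXXXXXXXXXXXX)

# Length of the comment string (in bytes).
clen=$(printf "%s" "$1" | wc -c)

# Save a copy of the input and keep the first clen characters of each
# line. grep -v succeeds if some prefix is not the comment string.
# The output goes to /dev/null instead of using -q so that grep reads
# all of its input and tee is not cut off before the temp file is complete.
# -e protects comment strings that begin with "-".
tee "$tmp" |
    awk -v clen="$clen" '{ print substr($0, 1, clen) }' |
        grep -F -v -e "$1" > /dev/null

if (($?)); then
    # every line is commented: uncomment
    awk -v lnbeg=$((clen+1)) '{ print substr($0, lnbeg) }' "$tmp"
else
    # at least one line is uncommented: comment
    awk -v comment="$1" '{ print comment $0 }' "$tmp"
fi

trap "rm -f $tmp; exit" EXIT HUP INT TERM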

The script reads lines from stdin, writing either commented or uncommented lines to stdout. The comment string is given by the first argument to the script.

In the script, I first make a temporary file, since I'll need to go through the lines twice. Using tee, I can make a copy of the lines in the temp file. I then determine the length of the comment string. Next, I use awk to keep just the first few characters of each line (as many as the comment string has) and compare them to the comment string with grep -F (-F for fixed strings, no regular expressions!). That determines whether or not all lines start with the comment string.
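The detection step can be pulled out on its own to see what it decides. This sketch uses a hypothetical helper, all_commented, that repeats the awk and grep pipeline from the script. It succeeds only when every input line begins with the comment string. As in the script, empty input counts as "all commented", so the script then uncomments nothing.

```bash
#!/bin/bash
# all_commented is a hypothetical helper isolating the detection step:
# it returns 0 only when every input line begins with "$1".
all_commented() {
    local clen
    clen=$(printf '%s' "$1" | wc -c)
    # Keep the first clen characters of each line; grep -v prints any
    # prefix that is not the (fixed-string) comment.
    if awk -v clen="$clen" '{ print substr($0, 1, clen) }' |
        grep -F -v -e "$1" > /dev/null
    then
        return 1    # found at least one uncommented line
    else
        return 0    # every prefix matched (or there was no input)
    fi
}

printf '%%a\n%%b\n' | all_commented '%' && echo "all commented"
printf '%%a\nb\n' | all_commented '%' || echo "some uncommented"
```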

When all the lines start with the comment string, I uncomment by chopping off the initial characters using awk. Otherwise, I comment by printing both the comment string and the lines, again using awk. Again, note that I've avoided using regular expressions.
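To check the self-inverse property without SubEthaEdit, the script's logic can be wrapped in a shell function and run twice. This is a sketch. The name toggle_comments is hypothetical, and the function removes its temp file directly instead of using trap. Because nothing is treated as a regular expression, a comment string such as `--` works without any escaping.

```bash
#!/bin/bash
# Hypothetical function version of the toggle script, for experimenting.
toggle_comments() {
    local comment="$1" tmp clen
    tmp=$(mktemp "${TMPDIR:-/tmp}/comments.XXXXXX") || return
    clen=$(printf '%s' "$comment" | wc -c)
    # Copy the input aside and test whether any line lacks the comment prefix.
    tee "$tmp" |
        awk -v clen="$clen" '{ print substr($0, 1, clen) }' |
        grep -F -v -e "$comment" > /dev/null
    if (($?)); then
        # every line is commented: drop the first clen characters
        awk -v lnbeg=$((clen + 1)) '{ print substr($0, lnbeg) }' "$tmp"
    else
        # some line is uncommented: prefix every line
        awk -v comment="$comment" '{ print comment $0 }' "$tmp"
    fi
    rm -f "$tmp"
}

# A mixed block: the middle line was already commented out.
text=$'\\section{Intro}\n% \\label{old}\nSome text.'
once=$(printf '%s\n' "$text" | toggle_comments '%')
twice=$(printf '%s\n' "$once" | toggle_comments '%')
[ "$twice" = "$text" ] && echo "round trip restores the original"
```

The first pass adds a second `%` to the already-commented line, so the second pass removes exactly one `%` from every line. This is what keeps the originally commented line distinct from the others.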

The last line ensures that the temp file is removed. There's not much else to say about it.

With the script available, it now remains to get the necessary text from the LaTeX document and send it to the script. I follow the basic strategy shown on the Coding Monkeys website, consisting of copying the text to the clipboard and using pbpaste to pipe it into a desired shell script. I wrapped all that up into an AppleScript handler:
on shellTransform of inText for envString through pipeline given alteringLineEndings:altEnds
    set shellscript to envString & " export __CF_USER_TEXT_ENCODING=0x1F5:0x8000100:0x8000100; pbpaste | " & pipeline
    set the oldClipboard to the clipboard
    set the clipboard to the inText
    try
        set shellresponse to do shell script shellscript altering line endings altEnds
    on error errMsg number errNum from badObject
        -- restore the clipboard before passing the error on
        set the clipboard to the oldClipboard
        error errMsg number errNum from badObject
    end try
    set the clipboard to the oldClipboard
    return shellresponse
end shellTransform

Note the use of the try block to restore the clipboard in case of error, followed by another statement that restores the clipboard on success. The clipboard should therefore always end up in its original state. This construction is a little awkward, which makes the handler a natural abstraction to hide the mess. The environment variable __CF_USER_TEXT_ENCODING follows the Coding Monkeys site example exactly. It doesn't seem to hurt when I omit it, but I'll trust that it is correct.

What remains is to specify exactly what the text is. At first glance, it seems like we should just take the selected text. This has a serious drawback: you need to completely select all the lines you're interested in, or you'll add comments to the middle of a line. As an important special case, you'd be unable to just press the keyboard shortcut to comment out the current line with no selected text. So, I decided to extend the selection to complete the first and last lines of the selection; the special case is handled cleanly in this way, too. I defined another handler to manage the selection:
to completeSelectedLines()
    tell the front document of application "SubEthaEdit"
        set {startChar, nextChar} to {startCharacterIndex of paragraph (startLineNumber of selection), nextCharacterIndex of paragraph (endLineNumber of selection)}
        set selection to {startChar, nextChar - 1}
    end tell
end completeSelectedLines

The handler is a little hard to read because it does all the work in a single set statement. An equivalent form is:
to completeSelectedLines2()
    tell the front document of application "SubEthaEdit"
        set startLineNum to startLineNumber of the selection
        set endLineNum to endLineNumber of the selection
        set startChar to startCharacterIndex of paragraph startLineNum
        set nextChar to nextCharacterIndex of paragraph endLineNum
        set selection to {startChar, nextChar - 1}
    end tell
end completeSelectedLines2

The second form is a lot slower, though, because it sends a separate AppleEvent for each value. The first form handles the same work in its single set statement.
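The idea behind completeSelectedLines can be stated apart from SubEthaEdit: widen a character range so that it covers whole lines. The following awk sketch assumes 1-based character offsets over a plain text stream. SubEthaEdit's own indexing and its treatment of the trailing newline may differ. The function name complete_lines is hypothetical. It takes the offset of the first and last selected characters (pass the same offset twice for a bare cursor). It prints the offset of the first character of the first line and the last character before the newline of the last line.

```bash
#!/bin/bash
# Hypothetical sketch: extend the selection [s, e] (1-based, inclusive)
# to complete lines, reading the text from stdin.
complete_lines() {
    awk -v s="$1" -v e="$2" '
        {
            first = pos + 1                # offset of the line start
            nl = pos + length($0) + 1      # offset of the trailing newline
            # The line containing s supplies the new start.
            if (s >= first && s <= nl && !a) a = first
            # The line containing e supplies the new end (before its newline).
            if (e >= first && e <= nl) { b = nl - 1; exit }
            pos = nl
        }
        END { print a, b }'
}

printf 'ab\ncd\nef\n' | complete_lines 2 5   # selection "b\nc" grows to "ab\ncd"
printf 'ab\ncd\nef\n' | complete_lines 8 8   # cursor on "f" selects "ef"
```

As with the handler, the no-selection case needs no special treatment: a cursor is just a selection whose start and end fall on the same line.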

With that, all the infrastructure needed to put together the desired Un/Comment Selected Lines script is at hand. I'll do that in the next installment.

Update: A better shell script for handling the comments is available.
