Showing posts with label ctags. Show all posts

Saturday, February 20, 2010

Exploring Ctags: Summary

To facilitate learning about Ctags, I've written two AppleScripts and several supporting shell scripts. These scripts were not written by an expert on Ctags, so there may be some sub-optimal, or outright wrong, choices in how they were implemented. Please let me know of any bugs found or suggestions for possible improvements.

The AppleScripts use Ctags to add a couple of features to SubEthaEdit (SEE). First, there is the text completion AppleScript, which looks up a string in the tag file and identifies possible matches. SEE already does text completion, but only in open files; by using Ctags as a basis for completions, matching symbols can be found across all the files in a large project. The second AppleScript finds definitions of selected symbols, again facilitating working with a large number of files.

The interactions with the tag file are handled using shell scripts. These are written to handle tag files created by invoking Exuberant Ctags with a variety of different options, notably including either absolute or relative paths and either numeric or ex pattern references for the location in the files. The shell scripts need to be placed somewhere on the paths defined in the AppleScripts; if in doubt, ~/Library/Application Support/SubEthaEdit/bin/ will work.

A zip archive with the scripts is available for download.

The scripts are described in a series of blog posts:

  1. Exploring Ctags: Motivations

  2. Find That Tags File!

  3. Tag Matching

  4. Ctags in SubEthaEdit

  5. Ctags from SubEthaEdit to the Shell

  6. Text Completions with Ctags in SubEthaEdit

  7. Finding Definitions with Ctags in SubEthaEdit



Update: I've added another AppleScript and accompanying shell script for creating or updating a tag file for the front document in SEE. These are now in the zip archive, available at the same download link given above.

Finding Definitions with Ctags in SubEthaEdit

As with text completion, finding definitions for symbols can be expressed largely in terms of the shell scripts and AppleScript handlers already presented. One more handler, openTaggedSources, is needed to open files at the location of the selected tag or tags.

The resulting AppleScript is again quite concise:
on seescriptsettings()
	{displayName:"Find Definition using Ctags", shortDisplayName:"Ctags Definition", keyboardShortcut:"@^f", inContextMenu:"yes"}
end seescriptsettings

try
	requireValidDocumentForCtags()
	set tagfilepath to findTagFile()
	set searchTerm to determineSearchTerm with userIntervention
	set taglist to (pipeMatches of searchTerm out of tagfilepath thru "")
	set tagsToOpen to (pickTags from taglist with multipleSelectionsAllowed)
	openTaggedSources for tagsToOpen from tagfilepath
on error errMsg number errNum
	if errNum is equal to 901 then
		-- tag processing abandoned (e.g., user canceled): exit quietly
		return
	else if errNum is equal to 902 then
		-- known type of problem: beep, then exit
		beep
		return
	else
		-- anything else is passed on for SEE to report
		error errMsg number errNum
	end if
end try
The structure directly parallels that used for the text completion script.

Let's take a look inside the openTaggedSources handler. My approach is to dump all the selected tags back to the shell, where the shell script open-tag-files will finish the job. Here's the handler:
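to openTaggedSources for tags from tagfile
	-- pass tags to external script that opens them in SEE
	-- TAGDIR lets open-tag-files resolve relative paths in the tag file
	set exportTagsFile to "export TAGDIR=\"$(dirname " & (quoted form of tagfile) & ")\";"
	-- printf '%s\n' prints the tags verbatim; used as the format string, any % or \ in them would be interpreted
	set openTagFilesPipeline to join of {"printf '%s\\n' " & quoted form of tags, "open-tag-files RelTo=\"$TAGDIR\""} by "|"
	-- run in the background with output discarded, so do shell script returns right away
	set openTagFilesScript to join of {UnixPath, exportTagsFile, openTagFilesPipeline, "> /dev/null 2>&1 &"} by space
	do shell script openTagFilesScript
end openTaggedSources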
I pass the location of the tag file to the script so that either absolute or relative paths can be used in the tag file. Beyond that, the handler simply passes the selected tags to open-tag-files on standard input.

So let's look at open-tag-files:
#! /usr/bin/awk -f

# Fields in each tag line are separated by tabs
BEGIN {
	FS="\t"
}

{
	# Treat relative filenames as relative to RelTo
	if ($2 ~ /^\//) {
		filePath = $2
	} else {
		filePath = RelTo "/" $2
	}
	# Handle both numeric and regex patterns
	if ($3 ~ /^[[:digit:]]+(;\")?$/) {
		match($3, /^[[:digit:]]+/)
		gotoLine = "-g " substr($3, RSTART, RLENGTH)
	} else {
		patternPlusExtras = substr($0, index($0, $3))
		numTokens = split(patternPlusExtras, token, "/")
		if (length(token[1])) {
			# Pattern looks invalid, so can't specify the line
			gotoLine = ""
		} else {
			# Rejoin the pieces split on "/" until one does not end in an escaping backslash
			exQuery = ""
			for (n=2; n<=numTokens; n++) {
				exQuery = exQuery "/" token[n]
				if (token[n] !~ /[^\\](\\\\)*\\$/) {
					break
				}
			}
			exQuery = exQuery "/"
			# sed prints lines up to and including the first match, so the count is its line number
			command = "cat '"filePath"' | sed -e '"exQuery" q' | wc -l"
			command | getline lineCount
			close(command)
			gotoLine = "-g "lineCount
		}
	}
	#printf("see %s \"%s\" &\n", gotoLine, filePath)
	system("altsee "gotoLine" \""filePath"\" &")
}
This is an awk script which mostly consists of handling different ways that the tag file can be structured. Since the point is to provide a platform for experimenting with Ctags, it seems premature to commit to specific choices of absolute or relative paths, numeric line references or ex patterns, extended fields from Exuberant Ctags or just vanilla Ctags lines. For what it is worth, I'm invoking Exuberant Ctags as ctags -n --fields=+a+m+n+S -R (but there may well be better choices).

At the end of open-tag-files, I use altsee to open the source files. This is a replacement for the see command-line tool that comes with SubEthaEdit. I find see a bit of a hassle for this sort of use, so I gave up on it here (if you can get open-tag-files working cleanly with multiple selected files, I'd love to hear how!).
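When experimenting with different ctags options, it can help to see how a tag line will be interpreted without opening anything, in the spirit of the commented-out printf in open-tag-files. The sketch below is a hypothetical tag_target shell function, not one of the downloadable scripts: it reproduces the absolute/relative path handling and the numeric-address case, but it does not resolve ex patterns (their line field is left empty), and it prints its results rather than calling altsee.

```shell
#!/bin/sh
# tag_target: hypothetical dry-run helper, not part of the downloadable scripts.
# Reads tag-file lines on stdin and prints "path<TAB>line" for each one.
# $1 is the directory containing the tag file, used to resolve relative paths.
# Ex-pattern addresses are not resolved; their line field is printed empty.
tag_target() {
	awk -F'\t' -v RelTo="$1" '{
		# Absolute paths are used as-is; relative ones are resolved against RelTo
		if ($2 ~ /^\//)
			path = $2
		else
			path = RelTo "/" $2
		# Numeric addresses, as written by ctags -n, look like 12;"
		line = ""
		if (match($3, /^[0-9]+/))
			line = substr($3, RSTART, RLENGTH)
		print path "\t" line
	}'
}

# Example: a relative numeric entry and an absolute ex-pattern entry
printf 'main\tsrc/main.c\t12;"\tf\nhelper\t/tmp/util.c\t/^int helper(void)$/;"\tf\n' |
	tag_target /Users/me/project
```

Pointing it at real output from ctags -n is a quick way to confirm that relative paths resolve against the tag file's directory as expected.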

All the scripts and handlers need to be assembled into a compiled AppleScript in ~/Library/Application Support/SubEthaEdit/Scripts/ with the shell scripts set to be executable and on the path defined in the AppleScripts. If you're not sure where to put the shell scripts, I'd suggest creating a ~/Library/Application Support/SubEthaEdit/bin/ directory for SubEthaEdit-related shell scripts, and putting the scripts there. A compiled script with the needed shell script support is available for download.
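For reference, the installation step amounts to a few shell commands. The install_see_tools function below is a hypothetical sketch of it, assuming the scripts have been unzipped into the current directory. It creates the destination directory if needed, copies the scripts there, and marks them executable.

```shell
#!/bin/sh
# install_see_tools: hypothetical helper for installing the shell scripts.
# Usage: install_see_tools DEST_DIR SCRIPT...
# Creates DEST_DIR if needed, copies each script into it, and makes it executable.
install_see_tools() {
	dest=$1
	shift
	mkdir -p "$dest" || return 1
	for script in "$@"; do
		cp "$script" "$dest/" || return 1
		chmod +x "$dest/$(basename "$script")" || return 1
	done
}

# Typical use, with the directory suggested above (run from the unzipped archive):
# install_see_tools "$HOME/Library/Application Support/SubEthaEdit/bin" climb open-tag-files
```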

Text Completions with Ctags in SubEthaEdit

With the infrastructure set up in the last few posts, it is now relatively easy to add Ctags-based text completions to SubEthaEdit (SEE). We use the shell scripts and AppleScript handlers to locate the tag file, determine a search term, get a list of tags matching the search term, and put up a dialog to have the user pick a tag. The only thing we're missing is a handler to actually insert the selected tag.

Here's a handler that does the job:
to insertCompletion of baseText by completionText
	-- assumes that the baseText is what was determined from the selection
	set {startChar, nextChar} to selectionRange without extendingFront and extendingEnd
	if the completionText does not start with the baseText then
		error "Invalid completion"
	end if
	if length of baseText is equal to length of completionText then
		-- completion is the same as the existing text, just position the insertion point
		setSelectionRange to nextChar
	else if startChar is equal to nextChar then
		-- empty selection, search term was inferred and only the difference needs to be included
		-- (a text range, unlike a list of characters coerced to text, is unaffected by text item delimiters)
		set completion to text (1 + (length of baseText)) through (length of completionText) of completionText
		setSelectionText to completion
		setSelectionRange to nextChar + (length of completion)
	else
		-- text selected, just replace it
		setSelectionText to completionText
		setSelectionRange to startChar + (length of completionText)
	end if
end insertCompletion
The handler takes the base text that was searched for in the tag file and the selected tag. These two strings, along with the length of the selection in SEE, determine exactly how much text to insert. It would have been possible to use just the SEE selection, without passing in the base text, but that would have required essentially repeating the entire process of determining the search term; I think the design could be improved here, but I can live with this for now.
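The arithmetic for the empty-selection case is easy to try outside SEE. The hypothetical completion_suffix shell function below only illustrates that branch of insertCompletion and is not part of the scripts: it checks that the completion starts with the base text and prints the characters that still need to be inserted.

```shell
#!/bin/sh
# completion_suffix BASE COMPLETION
# Prints the part of COMPLETION that follows BASE, mirroring the
# empty-selection branch of insertCompletion. Fails if COMPLETION
# does not start with BASE (the "Invalid completion" case).
completion_suffix() {
	base=$1
	completion=$2
	case $completion in
		"$base"*) ;;  # quoting makes BASE match literally
		*) echo "Invalid completion" >&2; return 1 ;;
	esac
	# Strip the literal prefix; an empty result means there is nothing to insert
	printf '%s\n' "${completion#"$base"}"
}

completion_suffix get_ get_tag_file   # prints: tag_file
completion_suffix get_ get_           # prints an empty line
```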

Using all these handlers, the logic for the text completion script is now expressible in a compact form:
try
	requireValidDocumentForCtags()
	set tagfilepath to findTagFile()
	set searchTerm to determineSearchTerm without userIntervention
	--set taglist to (pipeMatches of searchTerm out of tagfilepath thru "awk -F\"\\t\" '{ print $1 }' | sort -u")
	set taglist to (pipeMatches of searchTerm out of tagfilepath thru "cut -f1 | sort -u")
	set selectedTag to (pickTags from taglist without multipleSelectionsAllowed)
	insertCompletion of searchTerm by selectedTag
on error errMsg number errNum
	if errNum is equal to 901 then
		return
	else if errNum is equal to 902 then
		beep
		return
	else
		error errMsg number errNum
	end if
end try
The try block catches the errors we defined, letting any others go through for SEE to inform us about.

The last component needed is a seescriptsettings handler. I used this:
on seescriptsettings()
	{displayName:"Complete using Ctags", shortDisplayName:"Ctags Completion", keyboardShortcut:"@^t", inContextMenu:"yes"}
end seescriptsettings
All this needs to be assembled into a script, which is saved as a compiled script in ~/Library/Application Support/SubEthaEdit/Scripts/. A compiled script is available for download.

Thursday, February 18, 2010

Ctags from SubEthaEdit to the Shell

In the last few posts on Ctags, I've presented shell scripts for locating a tag file and looking up a tag in it, and AppleScripts for identifying what tag file should be used and what tag to search for in it. In this post, I'll present AppleScript handlers that bridge between these two scripting systems. As in the previous post, I'll use my SubEthaEditTools to simplify the process.

Essentially, the handler will need to construct a shell command that invokes look to find a tag in the tag file. Beyond that, I'll include the option to post-process the matching lines, which I'll use for text completion. For finding the definition of a tag, no post-processing is needed, so the handler checks for an empty pipeline and handles it cleanly.

The handler is:
to pipeMatches of tag out of tagfile thru pipeline
	-- an empty (or all-white-space) pipeline means no post-processing
	ignoring white space
		if "" is equal to pipeline then
			set postProcess to ""
		else
			set postProcess to "| " & pipeline
		end if
	end ignoring
	-- quote the tag as well as the path, so any shell metacharacters in it reach look literally
	set lookupScript to (join of {UnixPath, "look", quoted form of tag, quoted form of tagfile, postProcess} by space)
	try
		do shell script lookupScript
	on error
		error "Pipeline failed to process tag matches" number 902
	end try
	paragraphs of the result
end pipeMatches
Note that the handler ends by taking the paragraphs of the shell script result. This converts the lines selected by look (and any post-processing) into a list of matches.
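To see the kind of command pipeMatches builds, here is a shell sketch of the same construction. lookup_command and shell_quote are hypothetical helpers. shell_quote follows the same single-quoting convention as AppleScript's quoted form of, and, as in the handler, a pipeline that is empty or only white space adds no post-processing. The sketch only prints the command; it does not run look.

```shell
#!/bin/sh
# shell_quote: wrap a string in single quotes, writing embedded single
# quotes as '\'' (the convention AppleScript's "quoted form of" uses).
# Trailing newlines in the input are lost to command substitution.
shell_quote() {
	printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# lookup_command TAG TAGFILE [PIPELINE]
# Prints the look command for TAG, followed by "| PIPELINE" unless the
# pipeline is missing, empty, or only white space.
lookup_command() {
	cmd="look $(shell_quote "$1") $(shell_quote "$2")"
	case ${3-} in
		*[![:space:]]*) cmd="$cmd | $3" ;;
	esac
	printf '%s\n' "$cmd"
}

lookup_command get_ /Users/me/project/tags 'cut -f1 | sort -u'
lookup_command "it's" /tmp/tags ''
```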

In both use cases, the user will need to pick a relevant tag or tags from the list of matches. With text completion, only one selection makes sense, but more than one might be OK for finding definitions. Here's a handler for the two cases:
to pickTags from taglist given multipleSelectionsAllowed:allowMultiple
	try
		if allowMultiple then
			set chosen to choose from list taglist with title "Matching tags" with prompt "Select tag:" default items (first item of taglist) with multiple selections allowed
		else
			set chosen to choose from list taglist with title "Matching tags" with prompt "Select tag:" default items (first item of taglist)
		end if
	on error
		-- no usable list of tags, do nothing
		error number 901
	end try
	-- choose from list returns false, rather than raising an error, when the user cancels
	if chosen is false then error number 901
	if allowMultiple then
		return join of chosen by "\n"
	else
		return first item of chosen
	end if
end pickTags


We're nearly done. What remains is to assemble all these handlers into AppleScripts for the two use cases, adding whatever specifics are needed for the two tasks.

Wednesday, February 17, 2010

Ctags in SubEthaEdit

We've now looked at how to locate the right tags file and match a tag against it by working in the shell. But our goal is to connect Ctags to an editor, SubEthaEdit (SEE) in this case. We thus will need to switch from the world of the shell to the world of AppleScript. In this post, I'll just focus on getting the path to the tags file and a tag for which to search from SEE.

I'll not be working directly with SubEthaEdit's AppleScript dictionary, instead using my SubEthaEditTools handlers as a basis. Should anyone be interested in connecting Ctags to another Mac OS X editor that supports AppleScript, it would probably be better to port the SubEthaEditTools handlers to work with the editor and directly use the scripts I'll present here.

As a general design strategy, I'll identify two AppleScript error numbers with expected behaviors. First, I'll use number 901 to indicate that tag processing should be abandoned. Second, I'll use number 902 to indicate that an error of known type has occurred. This lets me handle a broad class of troubles by either quietly exiting, or beeping then exiting. Any other errors will just be unhandled, causing SubEthaEdit to show a sheet with details of the error.

Additionally, I'll need to define a search path for shell tools. Rather than using a customizable environment as I've done before, I'll just define one as an AppleScript property:
property UnixPath : "export PATH=\"$HOME/Library/Application Support/SubEthaEdit/bin:/Library/Application Support/SubEthaEdit/bin:$HOME/Library/bin:/usr/local/bin:/opt/local/bin:/usr/bin:/bin:/usr/local/sbin:/opt/local/sbin:/usr/sbin:/sbin\";"


To find the tags file, I first need to make sure a document is available to use as the starting point for the search. Second, I just need to call out to the shell with an appropriate command. Encapsulating these in handlers, I define:
on requireValidDocumentForCtags()
	if not documentIsAvailable() then
		error "No document open" number 902
	end if
	checkSaveStatus without updating
end requireValidDocumentForCtags

to findTagFile()
	-- search upward from the document's directory for a file named tags
	set findTagfileScript to (join of {UnixPath, "climb", "-b \"$(dirname", quoted form of documentPath(), ")\"", "tags"} by space)
	try
		do shell script findTagfileScript
	on error
		error "Unable to locate tags file"
	end try
end findTagFile


Getting the candidate tag is harder than getting the path to the tag file, mostly because it is not as well-defined a task. Since Ctags can index lots of different languages, it won't be easy to get a solution that is right for every language. Instead, I'll define a handler that works reasonably for a lot of languages and still lets the user specify the candidate precisely. This latter case is straightforward: if there is text selected in SEE, we'll search for that tag.

When no text is selected, we need to get a candidate tag in some other way. To me, it makes sense that finding symbol definitions should let the user enter a term in a dialog, and that text completion should use the text preceding the cursor. But how much text should be used? I don't think the longest possible tag makes sense: for a Python method invocation of the form obj.method, it would use the whole thing, even though that full term is unlikely to be indexed in the tag file. It would be better to use just method as the candidate tag. A reasonable choice for many languages, then, is the longest run of alphanumeric characters and underscores immediately preceding the insertion point. Those choices lead to the handler:
to determineSearchTerm given userIntervention:shouldAsk
	set {startChar, nextChar} to selectionRange without extendingFront and extendingEnd
	if startChar is equal to nextChar then
		-- empty selection
		if shouldAsk then
			try
				display dialog "Enter search term:" default answer "" with title "Find Definition"
			on error number -128
				error "User canceled" number 901
			end try
			text returned of result
		else
			-- try the whole line
			set selectionContents to extendedSelectionText with extendingFront without extendingEnd
			-- keep the trailing run of alphanumerics and underscores ([[:<:]] is BSD sed's start-of-word marker)
			get shellTransform of the selectionContents for "" thru "sed -E -e 's/.*([[:<:]][[:alnum:]_]+)$/\\1/'" without alteringLineEndings
			-- sed returns lines that are terminated with linefeeds, so get text before the final linefeed
			paragraph -2 of the result
		end if
	else
		-- just use the selection; there is too much variation in what could be a tag to guess
		selectionText()
	end if
end determineSearchTerm
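The trailing-word rule can also be tried in the shell. The handler's substitution depends on [[:<:]], a start-of-word marker in BSD sed. The hypothetical trailing_word function below uses a more portable expression instead, deleting everything up to the last character that is not a letter, digit, or underscore. One behavioral difference: if the text ends with some other character, this sketch prints an empty string, whereas the handler's substitution finds no match and leaves the text unchanged.

```shell
#!/bin/sh
# trailing_word TEXT
# Prints the longest run of letters, digits, and underscores at the end
# of TEXT (the text before the insertion point). Prints an empty line
# if TEXT ends with any other character.
trailing_word() {
	printf '%s\n' "$1" | sed -e 's/^.*[^[:alnum:]_]//'
}

trailing_word 'result = obj.method'   # prints: method
trailing_word '    total_count'       # prints: total_count
```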


The handlers presented in this post are enough to get the path to the tag file and a (partial) tag to search for. Next time, I'll connect these values from SubEthaEdit to the shell scripts handling the lookup.

Sunday, February 14, 2010

Tag Matching

Our goal remains to add support for Ctags to an application. We know how to locate the relevant tags file, but what do we do with it? Fundamentally, we use the tag file to match identifiers against tags indexed by Ctags; let's make that specific, restricting ourselves for the moment to just working in the shell.

The tag file consists of sorted lines, each a record of tab-separated fields. The first field in each line is the tag; the other fields identify the position of the tag in a particular source file. With this, we can check a candidate tag $TAG against the tag file $TAGFILE using look:

look "$TAG" "$TAGFILE"

Easy and fast.

To use tags to find the definition of a symbol, we'll want to hang onto all the information about each matching tag; the above use of look is all we need. For use in text completion, we'll want a longer pipeline eliminating extraneous information:

look "$TAG" "$TAGFILE" | cut -f1 | sort -u

The pipeline drops all fields but the first, the tag field, using cut and eliminates duplicates with sort -u (I suspect that uniq should work here, but look is curiously unspecific about whether it always produces its output in sorted order).
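The post-processing half of the completion pipeline can be checked by feeding it a few lines shaped like look's output. The sample lines and the completion_candidates wrapper below are made up for illustration; two entries for the same tag (say, a prototype and a definition) collapse into a single candidate.

```shell
#!/bin/sh
# completion_candidates: keep only the tag field (field 1 of the
# tab-separated lines on stdin) and drop duplicates.
completion_candidates() {
	cut -f1 | sort -u
}

# Made-up matches for the prefix "parse"; parse_args appears twice
printf 'parse_args\tcli.h\t4;"\tp\nparse_args\tcli.c\t17;"\tf\nparse_file\tio.c\t9;"\tf\n' |
	completion_candidates
# prints:
# parse_args
# parse_file
```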

And that's it for matching tags. The file format was clearly set up with just this sort of use in mind. More details on the file format are available elsewhere.

Find That Tags File!

Our first challenge in incorporating Ctags into an editor is locating the tags file. A first attempt might be to look for a file named tags in the same directory as the document in the frontmost editor window. But this isn't quite good enough. Ctags can create a tags file by recursively descending into subdirectories, so a useful tags file might be located somewhere higher in the directory tree.

It seems like there should be a standard shell command to search upward in the directory tree, but I couldn't find it. The task isn't really that hard, so I wrote a shell script climb to do it instead of spending more time fruitlessly searching. Usage is patterned after which. To look for a tags file that recursively indexed the present directory, just do climb tags. Options are available to set where the search starts and stops.

Here's my script:
#!/bin/sh
#
# climb -- locate a file by ascending the directory tree
#
# climb [-b bottomdir] [-t topdir] filename
#
# Climb directory tree looking for a file named filename. The search
# starts by checking in the bottom directory (defaults to the current
# directory), with each parent directory checked until either the
# file is found or the top directory (defaults to root) is reached.
#


# Options allow setting the search range. Defaults are starting the
# search in the current directory and ending at root. The default
# starting directory is standardized the same way as one given with -b.
upTo="/"
upFrom="$(pwd -P)"

while getopts b:t: opt
do
	case $opt in
	b)	upFrom="$OPTARG"
		if ! [ -d "$upFrom" ]
		then
			echo "$0: $upFrom: No such directory" >&2
			exit 2
		else
			# standardize the lowermost directory path
			upFrom="$(cd "$upFrom" && pwd -P)"
		fi
		;;
	t)	upTo="$OPTARG"
		if ! [ -d "$upTo" ]
		then
			echo "$0: $upTo: No such directory" >&2
			exit 2
		else
			# standardize the uppermost directory path
			upTo="$(cd "$upTo" && pwd -P)"
		fi
		;;
	*)	echo "usage: $0 [-b bottomdir] [-t topdir] filename" >&2
		exit 2
		;;
	esac
done
shift $((OPTIND - 1))

targetFile="$1"

# To ensure termination, require that the uppermost directory is
# an ancestor of the directory where the search begins. (This is a
# string-prefix test; the check for root in the loop below stops the
# search in any case.)
indx=$(awk -v d1="$upTo" -v d2="$upFrom" 'BEGIN { print index(d2, d1) }')
if [ "$indx" -ne 1 ]
then
	echo "$0: $upFrom is not a descendant of $upTo" >&2
	exit 2
fi

# Check each directory for the target file, moving up the directory tree
# until either the target is found or the uppermost directory has been
# searched. Both the lowermost directory and the uppermost directory
# are checked for the file.
while true
do
	if [ -f "$upFrom/$targetFile" ]
	then
		break
	fi
	if [ "X$upTo" = "X$upFrom" ] || [ -z "$upFrom" ] || [ "X$upFrom" = "X/" ]
	then
		exit 1
	else
		upFrom=$(dirname "$upFrom")
	fi
done

echo "$upFrom/$targetFile"

Most of the script deals with establishing the starting and ending points of the search, which the script refers to as the lowermost and uppermost directories, respectively. They're put into a standardized format and tested for consistency, then used to define the search. The search is simple, amounting to nothing more than successively chopping off the last element of the directory path and checking whether the target file is in the resulting directory. The search stops when the uppermost directory is reached, or when root is reached, just in case.

The script is general purpose, suitable for finding more than just tags files. I have mostly just called climb from AppleScripts in SubEthaEdit, with a pretty well-behaved file name and start directory. It may well be that more complex use would reveal bugs, so use with caution.
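Stripped of option handling and consistency checks, the core of climb is a short loop. The hypothetical climb_up function below is a sketch of just that loop, handy for trying the search on a scratch directory tree; it always searches up to root and has no equivalent of the -t option.

```shell
#!/bin/sh
# climb_up START NAME
# Starting in directory START, look for a file NAME in each directory on
# the way up to /. Prints the first path found; returns 1 if none exists.
climb_up() {
	dir=$(cd "$1" && pwd -P) || return 2
	while :; do
		if [ -f "$dir/$2" ]; then
			printf '%s\n' "$dir/$2"
			return 0
		fi
		# Stop after checking the root directory
		[ "$dir" = / ] && return 1
		dir=$(dirname "$dir")
	done
}

# Example: a tags file at the top of a scratch project is found from a subdirectory
proj=$(mktemp -d)
mkdir -p "$proj/src/lib"
: > "$proj/tags"
climb_up "$proj/src/lib" tags   # prints the path of $proj/tags, with symlinks resolved
rm -rf "$proj"
```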



Saturday, February 13, 2010

Exploring Ctags: Motivations

I've been vaguely aware of Ctags for years, but only in the last few months have I gotten a handle on how it would benefit me. Part of the problem is that most mentions of Ctags seem to assume you already know the benefits: the Wikipedia entry does this, as does the Exuberant Ctags site. Worse, many discussions make it seem that it is just an auxiliary for vi-family editors, so perhaps not even relevant to those who, like me, haven't seriously used a vi derivative in years.

After seeing an explanation in the context of BBEdit, I have a much better idea of what Ctags provides. Essentially, it generates an index called a tags file that allows for easier code navigation across multiple files, in particular providing text completions and navigating to the definition of functions or other symbols. Within BBEdit, tags also are used to improve syntax highlighting.

I must admit that I find some of the praise for it to be overblown, but maybe I just need to try it. Of course, I don't use BBEdit, either. In fact, no editor that I regularly use supports Ctags. Let's do something about that. I'll work in the context of SubEthaEdit (SEE), since I have a fair amount of experience with scripting it, and of Exuberant Ctags, since it supports more languages than the Ctags built into Mac OS X.

I'll add two features to SEE, text completion and finding definitions. To some extent, these are redundant, in that SEE has text completions and a function pop-up, but they don't extend across multiple files in the same way as Ctags. I won't be able to do anything with syntax highlighting, as in BBEdit, but it should still be enough to try out Ctags.

Both features will be structured as AppleScripts invoking shell scripts to do most of the work. The AppleScripts both have a similar structure, consisting of:

  1. locating the tags file

  2. determining a search term to match against the tags file

  3. identifying and processing matching tags

  4. letting the user select from the matching tags

  5. doing something with the selection


I'll break these stages out into several posts.