Sunday, August 22, 2010

Syntax highlighting for Clojure code

I left people hanging last week, so here is the answer: the tool I used
to format the code for this blog, which is hosted on Blogger, is
GNU enscript.

None of the available tools would work out of the box, but GNU enscript provides the most bang for the buck. I've been using it for printing text for a while now - it has all the flexibility the "in the browser" highlighters have, at least when printing PostScript. While its natural medium is PostScript, it can also output HTML (which I use for the blog), RTF, overstrike (for printers) and ANSI terminal codes.

The last of these is particularly interesting. A quick addition to my shell aliases:
alias ccat="enscript -o - --language ansi --color"

allows me to just do:
$ ccat -E foo.sh

to print foo.sh in an xterm with syntax highlighting via the ANSI control codes. I expect this to be very handy.

The out-of-the-box problem with GNU enscript is that it doesn't have a highlighting mode for Clojure. Adapting a state file from one of the Lisp language variants was straightforward. Working through the code previously posted in the blog uncovered some corner cases that were easily fixed. That Clojure is a functional language did present one interesting choice: highlight all occurrences of the builtin functions, thus also highlighting variables that reuse those names, or only highlight them in the function slot of an s-expression, thus missing them when they are passed as values to other functions? I eventually chose the latter, as that's what Emacs does.
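To make that trade-off concrete, here is a rough shell sketch of the two strategies. It is not how the enscript state file works; the tokeniser, the function names (tokens, everywhere, function_slot) and the three-name builtin list are all made up for illustration.

```shell
# Hypothetical helpers, not part of enscript: they only illustrate the
# two matching strategies on a one-line Clojure form.

# Split input into one token per line, treating brackets as tokens.
tokens() {
    sed 's/[][(){}]/ & /g' | tr -s ' \t' '\n\n' | sed '/^$/d'
}

# Strategy 1: report a builtin name wherever it appears.
# $1 is a space-separated list of builtin names.
everywhere() {
    tokens | awk -v names="$1" '
        BEGIN { n = split(names, a, " "); for (i = 1; i <= n; i++) known[a[i]] = 1 }
        ($0 in known) { print }'
}

# Strategy 2: report a builtin name only when it directly follows "(".
function_slot() {
    tokens | awk -v names="$1" '
        BEGIN { n = split(names, a, " "); for (i = 1; i <= n; i++) known[a[i]] = 1 }
        prev == "(" && ($0 in known) { print }
        { prev = $0 }'
}

# Made-up sample: "list" is reused as a parameter name, "count" is
# passed to map as a value, and "map" sits in the function slot.
sample='(defn f [list] (map count list))'
printf '%s\n' "$sample" | everywhere 'map count list'      # prints: list, map, count, list
printf '%s\n' "$sample" | function_slot 'map count list'   # prints: map
```

With the first strategy the parameter named list lights up even though it is just a variable; with the second, count is missed because it is only passed to map as a value. The second behaviour is the one I settled on.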

Once you've installed GNU enscript, you'll want to get the Clojure state file from the Bitbucket repo for the blog, and then install it in the share/enscript/hl directory where enscript stores the language state files. After doing that, you can use -Eclojure to get Clojure highlighting.
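As a rough sketch of that install step: the file name clojure.st and the /usr/local prefix below are assumptions (the state files that ship with enscript use the .st extension, but check what the repo actually contains and where your enscript keeps its share directory). The install_state_file helper is hypothetical, just a guarded cp, and foo.clj is a placeholder file name.

```shell
# Hypothetical helper: copy a state file into enscript's hl directory.
# $1 = path to the state file (assumed here to be called clojure.st)
# $2 = install prefix of enscript (assumed to be /usr/local if omitted)
install_state_file() {
    src=$1
    prefix=${2:-/usr/local}
    hl_dir="$prefix/share/enscript/hl"
    # Refuse to guess if the directory is not where we expect it.
    if [ ! -d "$hl_dir" ]; then
        echo "no such directory: $hl_dir" >&2
        return 1
    fi
    cp "$src" "$hl_dir/"
}

# Example (may need root, depending on the prefix):
#   install_state_file clojure.st /usr/local
#   enscript -Eclojure --color --language=html -o foo.html foo.clj
```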

Thursday, August 19, 2010

On overengineering software

Overengineering in a software project results in code that is more
complicated than the problem at hand calls for. It's usually the
result of the engineer being fascinated by the technology being used
for the problem, and hence applying it where it isn't appropriate, or
simply in excess.

I ran into a common - and very popular - example of this while moving
this blog from WordPress to Blogger. WordPress provides a tool for
displaying code from various programming languages with proper
highlighting. Blogger has no such facility. While this type of thing
is important - possibly critical for a blog on programming - it wasn't
such a big deal for me. I wasn't very happy with the WordPress tool,
and got less happy as I looked into it.

Seems there are three broad classes of tools for this: stand-alone
facilities for the desktop, tools for exporting HTML from editors that
do the highlighting, and JavaScript that does the highlighting in the
client. The first two are a problem for blogging, as web-based
applications typically don't integrate well with the desktop, so you
wind up having to cut and paste the text, instead of just inserting
a file into the blog text.

The JavaScript one is slightly more convenient for the author - they
can cut and paste directly from the source, and then wrap tags around
it to indicate what should be highlighted. They may be required to fix
characters that are magic to HTML, or not, depending on the blog and
software. It might also put the code in a scrollable area, number the
lines for reference, and add widgets to let you copy the text out
easily - all good things.

Unfortunately, these widgets aren't so convenient for the reader when
compared to the other alternatives. If the reader is using a
lightweight or downrev browser with no JavaScript support, is paranoid
enough to use something like NoScript or otherwise disable JavaScript
for untrusted domains, or is simply behind a corporate firewall that
blocks anything it thinks is executable code from reaching the user,
then they get no highlighting. Since the highlighting is done in the
client at load time, it will slow down page loading. This effect
scales - the more code you have, the longer it takes to load the
page. Chances are the user won't notice this - they'll be reading text
until after the code is highlighted. However, as someone who develops
distributed systems, running this code every time the page is loaded
on every client system when it could be done once before the text is
uploaded just strikes me as wrong. Especially when that same
JavaScript could be used to do the job once in the author's browser
before the blog is posted instead of in every reader's browser every
time they load it.

I can see why the authors did it this way - this is really cool
technology, and you get a really spiffy result. Doing the same thing
on the desktop just isn't as cool. Still, if the job is to deliver
highlighted text to the reader, the desktop versions do the job
better. The authors got caught up in the technology, and used it where
it wasn't quite appropriate.