
Saturday, July 30, 2011

Eddie - shell scripting with Haskell

The last few years have seen an increase in interest in functional programming languages, as they are better suited to dealing with the concurrency issues found on modern multi-core desktops. Functional programs have functions that are functions in the mathematical sense: the result depends only on the input parameters, with global state not playing a role. With no global state to change, you don't have to worry about whether or not the code is running concurrently, which is certainly easier than having to worry about such things.

I've written about my own exploration of Clojure for just this reason. One of the recurring requests in the Clojure community was wanting to use it for the kinds of things that scripting languages - most notably Perl and Python - were used for. This was sufficiently common that someone even posted a bounty for a tool that would facilitate such usage. This was interesting enough that I designed the tool, though I never got around to writing it.

Functional shell scripting isn't quite as strange as it seems. Shell filters are generally functional in nature - the output depends strictly on the input and the parameters of the function. So building filters with a functional programming language should be a natural thing. This turns out to be true.

When I eventually decided that the Java infrastructure made Clojure - lovely as the language might be - unsuitable for my needs, I decided to investigate Haskell next. While I don't see requests for shell scripting tools for Haskell, it became apparent that my design for a Clojure shell tool would work even better in Haskell, as Haskell has a much cleaner syntax for combining functions in various ways.
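To give a feel for what "combining functions" buys you, here is a small sketch of two filters written as plain Haskell functions of type String -> String. The names topThree and countLines are made up for this illustration and are not part of any tool; the comparisons to Unix commands are only approximate.

```haskell
module Main where

import Data.List (nub, sort)

-- Hypothetical filter, roughly `sort -u | head -n 3`:
-- split into lines, sort, drop duplicates, keep the first three.
topThree :: String -> String
topThree = unlines . take 3 . nub . sort . lines

-- Hypothetical filter, roughly `wc -l`: count the lines of the input.
countLines :: String -> String
countLines = (++ "\n") . show . length . lines

-- interact (from the Prelude) feeds standard input to the function
-- and writes the result to standard output.
main :: IO ()
main = interact topThree
```

Each stage is an ordinary function, and the dot operator plays the role that the pipe plays in the shell, reading right to left.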

The result is eddie - a tool for using Haskell functions as shell filters. Eddie itself is fairly simple - it's mostly infrastructure to compile the function argument, and arrange for the file or files to be fed to the resulting function, possibly one line at a time.

Eddie is pretty simple to use: the command "eddie function files" runs the Haskell function named by the first argument over the contents of the files concatenated together, assuming that the function has a type of String -> String. The simplest example is to reverse a file, which is simply "eddie reverse file". The project site for eddie includes a page describing the various ways you can reverse a file (with eddie commands for each of them) as well as eddie commands to simulate a number of Unix commands, so I won't go into more detail here.
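For readers who want to see the idea in ordinary Haskell, the following is a rough sketch of what applying a String -> String function to concatenated files amounts to. It is not eddie's code: runFilter and perLine are hypothetical helpers invented here, and the function (reverse) is fixed at compile time rather than taken from the command line as eddie does.

```haskell
module Main where

import System.Environment (getArgs)

-- Hypothetical helper: apply a String -> String function to the
-- concatenated contents of the named files, or to standard input
-- when no files are given.
runFilter :: (String -> String) -> [FilePath] -> IO ()
runFilter f []    = interact f
runFilter f paths = do
  contents <- mapM readFile paths
  putStr (f (concat contents))

-- Hypothetical helper: lift a function on a single line to a function
-- on whole input, for the "one line at a time" style of processing.
perLine :: (String -> String) -> String -> String
perLine f = unlines . map f . lines

main :: IO ()
main = do
  paths <- getArgs
  -- Reverses the entire input, character by character.
  runFilter reverse paths
```

Swapping reverse for perLine reverse would reverse the characters within each line while leaving the order of the lines alone.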

If you're a Haskell programmer, you should have cabal installed, and can use that to fetch eddie from Hackage and install it. I'm not otherwise distributing eddie, except as source tarballs from the project page. The next paragraph explains why.

I think the examples show that eddie is a really flexible tool for doing odd things from the shell. The downside is that it has to drag most of the Haskell compiler along with it. The standard Haskell compiler - ghc - is a 40 megabyte file on my Unix box. Eddie weighs in at around 28 megabytes on the same box. As a result, the first time you run eddie, it winds up loading most of that into the cache, which causes a noticeable delay. Further runs are then reasonably quick. Not great, but not horrible. Haskell's standard String type is a linked list of characters, which also slows things down. There are libraries to avoid this, but given the size handicap it's already suffering from, fixing this just doesn't seem like a priority.
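As an illustration of the kind of library that avoids list-of-characters strings, here is a minimal sketch using the text package's Data.Text type. This is an assumption about which library one might pick, not something eddie does; reverseLines is a made-up name for the example.

```haskell
module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical filter: reverse the characters of each line, working on
-- packed Text values instead of linked lists of Char.
reverseLines :: T.Text -> T.Text
reverseLines = T.unlines . map T.reverse . T.lines

-- Data.Text.IO.interact is the Text counterpart of the Prelude's interact.
main :: IO ()
main = TIO.interact reverseLines
```

The shape of the filter is the same as the String version; the cost is that the function now has type Text -> Text, so it no longer fits a tool that expects String -> String.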

The bottom line is that I think eddie shows that the idea of using a functional language for oddball scripting like this works well, but the implementation has sufficient problems to make using it a bit impractical. What I think such a tool needs is a much simpler language. Eddie does not need all of Haskell's type machinery, just strings, characters, integers and possibly floats. Add in the libraries for dealing with those, and a large selection of higher-order functions, and that should be nearly as powerful as having all of Haskell available for the purpose of writing shell filters.