Wednesday, December 29, 2010

The whats and whys of rooting and jailbreaking smart phones

I've been asked about this a couple of times, so I decided to write it up here.


Android runs on Linux - a Unix-like operating system - including the all-powerful user named "root".  To break into a Unix system means to gain access to that account. Since Android phones aren't normally shipped with that access, you have to violate the security in some way to get it, hence you've "rooted" the phone. Doing so lets you re-arrange the system in ways that someone along the way probably didn't want you to. A second level of extra access to Android phones is unfortunately known as "unlocking the boot loader" - the point of that being to install an Android build other than those approved of by whoever gave you the phone. For some Android phones, this is no big deal - you can already install pretty much any build. Developer phones even come with a command to let you change the boot loader. The general reason for doing this is to install a system that has the changes you'd like to make as root already built into it, so you don't have to do them all by hand.


iPhones run iOS, which is also based on a number of different Unix-like systems. Initially, they just ran everything as root - you had root access by default. However, Apple also employed a form of Digital Rights Management (DRM) - in spite of the fact that they know DRM has never worked - to keep users from running programs that Apple deemed "undesirable". To circumvent that, you have to install a new version of the system - also circumventing the security in the boot loader. However, iPhones can also be "unlocked" (hence my use of the word "unfortunate"), which means allowing the phone to be used on other carriers. This usage predates smartphones, since cell phone companies like to "lock" their discounted phones so they can only be used on their service. Unlocking an iPhone allows it to be used on any compatible carrier. This generally isn't an issue for Android, as most carriers offer reasonable-quality Android phones.


Ok, why do you want to do these things? With an iPhone, it's usually to get some application that Apple doesn't think anyone would want or should have. For example, I initially jailbroke my iPhone to install an ebook reader, since at the time Apple believed nobody would read, and that we'd be happy with web apps - 0 for 2 on those - and a web app is about the worst possible way to read an ebook. Later, when the applications became real, I wanted to multitask some of them, but Apple - in retrospect correctly, as the hardware and software of the time couldn't make this as smooth an experience as Apple wanted - didn't think users should be allowed to do that. Even after that was fixed, I still wanted to sync my iPhone to two different Macs: the one connected to the TV for videos, and my laptop for music and contact information. Other reasons might include wanting a soft keyboard that Apple didn't provide, or a better home page handler, and so on.


For Android, those reasons don't really apply - there have always been applications, and there have seldom been restrictions on legitimate applications, or on adding any other piece of software that might plug into the system: alternative home page handlers, keyboards, etc. No, the reasons for rooting an Android phone are to get around restrictions that the provider may have put in place - like only being able to install apps from the Android Market. But more commonly, it's a performance issue. The first generation of Android phones didn't allow applications to be stored on the SD card, but with root privileges you could do that anyway, thus saving on memory. Custom kernel builds more tightly coupled to the hardware provided better performance. They might even include updates the provider wasn't willing to support on the phone. For Android, you should root your phone - and probably install a custom build of Android - only if your phone's performance is a problem.

Tuesday, November 23, 2010

Evaluating Clojure

The point of the X10 Controller web site project was to put together a simple but not trivial project in Clojure to evaluate the language. I think it served that purpose well.

Clojure is a modern LISP. It reuses the best parts of LISP - simple syntax, functional programming, and a powerful macro facility - and adds modern data structures, high level concurrency tools and concepts like abstract data types. And then it all runs on top of the JVM, making it relatively easy to generate the various Java blobs that deploy as applications, to web servers, or out on the cloud. The parts mostly fit together well, but it's still a work in progress. For instance, the protocol, records and types facilities are new in 1.2, and flagged as "experimental".

Overall, I like the language. It still retains enough of LISP's flavor that my time spent with similar languages makes me comfortable there. Better still, the modern data structures are integrated into the language better than I remember them being in older LISPs, with tools for reaching through multiple levels of structure much like one would use the c[ad]*r functions on lists. The end result is that I can create short, concise, but readable code for navigating relatively complex structures. The simple (get-in *devices* [display what]), which finds a device or returns nil if it doesn't exist, would take several lines in most languages, as you'd either have to check the first index to see if it exists, or deal with the exception thrown if it doesn't.
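To illustrate, here's a minimal sketch with a made-up nested map standing in for the real *devices* structure:

(def *devices* {"Devices" {"Office Light" :a-device}})

(get-in *devices* ["Devices" "Office Light"])  ; => :a-device
(get-in *devices* ["Devices" "No Such Thing"]) ; => nil
(get-in *devices* ["Nonsense" "Office Light"]) ; => nil, no exception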

While I wasn't really dealing with concurrency issues here, the concurrency features - which are part of what drew me to the language in the first place - included a facility that exactly met my needs, even though they aren't what it was intended for. That kind of serendipity is a good sign, and I'm looking forward to exploring those features in depth later.

The Java integration, on the other hand, is a two-headed beast. Some consider it an abandonment of what makes LISP LISP. Then again, it's also part of what attracted me to Clojure. Classic LISP systems did not integrate well into the environment - they wanted to be the environment. There have been a number of attempts to deal with this issue since I last worked with LISP, but none that really captured my attention.

Clojure solved that problem, mostly very neatly. I had little trouble using the available Java libraries for dealing with either serial ports - which would have required wrapping non-Java code at some point in any case - or the CM17A library. Creating a deployable executable was a bit more work, but more on that later. Finally, another bit of serendipity: the protocol and record structures - designed to make it easy to generate objects that interface with Java code - also turned out to be exactly right for both X10 controllers and X10 devices - at least as I needed them. They felt much lighter than the Python versions of these things I had been contemplating, though in all fairness they are also less capable. On the other hand, since they are designed for Java interop instead of being LISP functions, I had to wrap them in functions to be able to use them the way I wanted.

Now for the bad side. In spite of running on Unix systems, the Java environment does not share the philosophy of Unix systems. In particular, I'm used to simple things being relatively simple. For instance, I can build an executable and deploy it without needing some kind of file describing all the bits and a tool that uses that to pull them together, though that facility is there if I really want it. In the Java world - and hence in the Clojure world as well - I don't seem to have a choice. The Java environment, in particular, seems to follow the language in being very verbose for these things, taking a lot of what's essentially boilerplate and repeated information to get things done. Where the tools have been reimplemented in Clojure, things aren't so bad. But in this area, as elsewhere, things are still a work in progress.

The environment makes Clojure a poor choice for projects much smaller than the X10 controller - the capabilities scattered across four files and another four support files would probably have wound up in three files with no support files in Python. On the other hand, the expressiveness and power of the language makes it quite pleasant to use in projects large enough that the infrastructure is a convenience instead of a pain, or where deploying a Java blob of some kind is part of the goal.

I look forward to getting to know Clojure better in the future.

Tuesday, November 2, 2010

An X10 Controller in Clojure

Having finally dealt with the disasters and personal matters, and found the pieces I needed, I've finished the project I started in order to study Clojure. Or at least enough of it to be useful.

The goal is to have a web page to let me control the various X10 PLC modules scattered around the house: mostly it's lights, but the thermostat, a grill, a printer and the garage door opener are all controllable this way. This is an ideal type of project for learning about a language and its environment. It's about as simple as a project can get without being trivial. A small collection of data, a little bit of state information, and a web page to display that information - or parts of it - along with controls to change that state - or parts of it. It can reasonably be done as one page, but has natural representations on multiple pages as well.

For those not familiar with them, X10 devices are used to control power to arbitrary bits of electronics. They and the controllers the user uses plug into the wall, and communicate with each other over the power line (hence Power Line Control). Later additions include RF receivers that act as dual-purpose control and power units: RF remotes transmit signals to them, which they echo over the power line, and they can also control one plug themselves. The modules have addresses consisting of a house code between A and P and a unit number between 1 and 16.

The computer accesses all of these through a dongle hanging off the serial line - or rather, the dongle transmits RF signals to the receivers. There are quite a few nice packages for doing all kinds of fancy things with these devices, both open-source and commercial, for pretty much any platform, that include the functionality I wanted. They are all overkill for what I want, and none have - or rather, had when I started down this path - web pages formatted for use on a smart phone.

In theory, access to serial line devices from Java is via Sun's comm module. In practice, Sun only releases jars for Solaris and Windows. The solution is to use the rxtx module, an LGPL'ed rewrite of the comm module. That was available from the package system for my target platform - all well and good.

The X10 serial line protocol is - well, ugly doesn't really do it justice. Fortunately, Michel Dalal has already written a CM17A driver that worked with my hardware and the Sun comm module. While the comm module can supposedly be configured to use the rxtx module on non-Solaris Unix platforms, that never worked right. Possibly a version problem, or an issue with the build for my platform. Changing the type of the serial comm object CM17A used and recompiling it solved that - I could use the rxtx objects directly, and pass those to CM17A.

X10 Controller protocol



The bottom-level abstraction is a wrapper around the CM17A class, found in src/x10/controllers.clj. At this point, controllers have a simple protocol: open them for use, close them when I'm done, and set an X10 device's state:

(defprotocol Controller
  "X10 Controller Modules"
  (open [this] "Whatever is needed to open the controller and ports")
  (close [this] "Close & free the port for possible reuse.")
  (set-device [this code unit state]
    "Set the device at address code/unit to state"))

Since I only have one controller, I only have one controller type, hence only one implementation of this protocol, the CM17A-controller:

(deftype CM17A-controller [^CM17A controller ^String port delay]
  Controller
  (open [this]
    (when-not (.getSerialPort controller)
      (.setSerialPort controller
                      (.open (CommPortIdentifier/getPortIdentifier port)
                             "Mike's X10 Controller" 2000))
      (Thread/sleep (Integer/parseInt delay)))
    this)
  (close [this] (.close (.getSerialPort controller)) this)
  (set-device [this code unit new-state]
    (.setState controller (first code) (Integer/parseInt unit) new-state)
    (Thread/sleep (Integer/parseInt delay))
    this))

I'm eventually going to build a map from human-legible device names (like "Office Light") to these controllers. The tricky part is that I can't open the serial port for the controller when I build the map, as that will cause building the war for deployment to open the port, and fail (because the port is in use by the running application) or hang (not sure about that one - seems to be an issue with the -war plugin for lein). So open is a distinct action, and it checks to make sure the device isn't already open before it does anything so I can safely invoke it on every request.

To go from the config information - which will eventually be in a database of some sort - to actual code, I use a map from the name of the controller type to a bit of code that creates both a Java CM17A object and a Clojure CM17A-controller object:

(def *controller-type-map*
  {"CM17A" (fn [port delay] (CM17A-controller. (new CM17A) port delay))})

This is wrapped up as a function, as I want to create a new controller object for each port one is attached to. To actually create it, I map it over the *controllers* list from the config information:

(def *controller-map*
  (walk (fn [[name module port delay]]
          {name (agent ((*controller-type-map* module) port delay))})
        #(apply merge %)
        *controllers*))
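The *controllers* config itself isn't shown in the post; judging from the destructuring above, it's presumably a sequence of four-element entries, something like this (values invented for illustration):

(def *controllers* [["office" "CM17A" "/dev/ttyS0" "500"]])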

Since each X10 controller is a bit of hardware - something you can kick - that is usually only serially reusable, I wrap the actual instances in a Clojure agent. This will serialize access to the controller. The agent interface expects the functions that it applies to its data to create new data, which they then return. Since I'm using mutable Java objects, I don't create new data - but I still need to return the old one, which is why each method in the type ends with this.

In the end, *controller-map* and the open, close and set-device methods are the only things a client of this module needs.

X10 Device protocol



Since I wanted to control things other than single devices - groups of devices to start with, and eventually some kind of macro facility, and possibly variables - this seemed like another place to use the Clojure 1.2 protocol/record system. Ok, maybe it's overkill - we really only have one method for the protocol: set-state, which sets the state of the device, returning a string describing what it did. The multimethod support might work as well, but using a protocol eliminates the need to write dispatching code. The protocol is:

(defprotocol X10
  "Protocol to set the state of x10 objects."
  (set-state [this newstate] "Set object state. Returns a string to show user."))

As noted, set-state returns a string to tell the user what it did (or, given the nature of X10 PLC kit, what it tried to do). The actual implementation for an X10 module is straightforward:

(defrecord Module [controller name code unit state]
  X10
  (set-state [this new-state]
    (do
      (send controller open)
      (reset! state new-state)
      (send controller set-device code unit new-state)
      (str "Turned " name (if new-state " on." " off.")))))

The fields of the module record are the controller it's attached to, the name to display for this module, the code and unit of its address, and the state - an atom holding the state we last set it to. The delay - the number of milliseconds this controller needs to change a device's state - lives in the controller itself. set-state is where the open method gets used, to ensure that the controller is open and ready to accept commands.

I also wanted to make sure that things other than hard devices would work. The simplest of those is a Group, which is just a collection of devices that toggle on and off with a single command. The implementation for those is trivial:

(defrecord Group [name devices state]
  X10
  (set-state [this new-state]
    (join " " (for [device devices]
                (set-state device new-state)))))

Finally, I need a map from device and group names to actual devices and groups that the presentation layer can use. That is the *devices* map:

(def *devices*
  (let [name-map (walk (fn [[name controller code unit]]
                         {name (Module. (*controller-map* controller) name
                                        code unit (atom nil))})
                       #(apply merge %)
                       *names*)]
    {"Devices" name-map
     "Groups" (walk (fn [[name group]]
                      {name (Group. name (map name-map group) (atom nil))})
                    #(apply merge %)
                    *groups*)}))

This builds the device map first, so that it can be used to translate the device names in the group configuration into references to the actual devices.
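Neither *names* nor *groups* appears in the post either; from the destructuring above, the config data would look something like this (entries invented, matching the *controllers* sketch earlier):

(def *names*  [["Office Light" "office" "A" "1"]
               ["Porch Light"  "office" "A" "2"]])

(def *groups* [["All Lights" ["Office Light" "Porch Light"]]])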

The web page



So now I need to connect that to a web server. The standard for doing that with Clojure is the Ring module. Ring calls a handler with a single argument - a Clojure map containing the request information - and expects a map containing the result, including a page to be displayed, to be returned. My handler is defined like so:

(defn my-handler [req]
  (let [{:strs [command display what] :or {display "Devices"}} (:query-params req)
        result (if-let [device (get-in *devices* [display what])]
                 (do-command command device)
                 (if what (str "Unknown device " what " in " display ".")))]
    {:status 200
     :headers {"Content-type" "text/html" "Cache-Control" "no-cache"}
     :body (write-page *devices* display (:uri req) result)}))

(def handler (wrap-params my-handler))

The first part is the handler proper. The second part wraps it with a Ring tool to parse the query parameters. The handler proper destructures those query parameters to get a command, the device type (as display) and which device in particular to use. If that display and device are found in *devices*, the do-command function will take care of running the command on the device. Otherwise, the result of the command is going to be an unknown-device message.
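With the config sketched above, a quick REPL poke at the wrapped handler might look like this (device name hypothetical, and assuming the CM17A is actually plugged in; wrap-params parses :query-string for us):

user=> (:status (handler {:uri "/" :query-string "command=on&display=Devices&what=Office+Light"}))
200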

do-command is a trivial function to look up the command in a case statement and execute the code associated with it, or return an unknown command message:

(defn do-command [command device]
  (case command
    "on" (set-state device true)
    "off" (set-state device false)
    (str "Unknown command " command ".")))

That leaves the actual body of the web page, which is generated by the write-page function, using the bitchin html template engine that I wrote to start this project:

(defn write-page [devices display uri response]
  (html (head (title "X10 Controller")
              (meta_ {:name "viewport" :content "width=160"}))
        (body
         (h1 "X10 Controller")
         (p (join " * " (cons
                         (a {:href (format "%shelp.html" uri)} "Help")
                         (for [new-display (keys devices)]
                           (if (= display new-display)
                             (span {:style "font-weight: bold"} display)
                             (a {:href (format "%s?display=%s" uri new-display)}
                                new-display))))))
         (apply table
                (for [device (map second (devices display))]
                  (let [fmt (format "%s?command=%%s&display=%s&what=%s"
                                    uri display (:name device))]
                    (tr (td (a {:href (format fmt "on")} "on"))
                        (td {:align "center" :style (case @(:state device)
                                                      true "color:green"
                                                      false "color:red"
                                                      "color:inherit")}
                            (:name device))
                        (td (a {:href (format fmt "off")} "off"))))))
         (p response))))

This is fairly straightforward: other than the headers, there's a paragraph that lists the top-level maps in *devices* - "Devices" and "Groups" - along with a link to a "Help" page. This is followed by a table with one line for each module in the chosen map, with "on" and "off" buttons, and the module name. The module text's color is chosen depending on the state: green (on), red (off) or inherited if it hasn't been set. Then the last paragraph, with the results of running the command just requested.

Yup, this is a plain, web-1.0ish interface. Then again, that was the goal. It's plain because my designs get uglier the further I get from plain; I'd have to get help to make it pretty. It could be extended to make the "on" and "off" buttons use AJAX to run the command, change the color of the module name and update the result paragraph, but for an application that's never going to have more than a half-dozen users all on the local LAN, that's overkill.

That completes the project as originally conceived. The code is now available in the bitbucket repo, along with an update to clojure.st to support clojure 1.2.

Next, the post-mortem.

Thursday, October 7, 2010

How to end the Reply-To wars

As a professional programmer, electronic mail lists are vital to me - they are where I get support for most of the tools I use, look for work, look for people to hire, etc. One of the ongoing annoyances on mail lists is random outbreaks of the Reply-To war.

The origins of these wars go all the way back to the beginning of email. The creators of email were clever folks, and gave us numerous headers to denote different roles - From for who the mail is from, Sender for who sent it, To for who it was written to, Cc for who else should be sent a copy, and Reply-To for who the replies should go to if not the author. The problem is then "who do you reply to?" You commonly want to either reply to the author, or to everyone. So mail readers added a reply-all command (I'd say buttons, but mice weren't very common back then, and most users were using character devices) that replied to the To and Cc addresses as well as the From or Reply-To address.

Mail lists - another clever invention - are almost as old as electronic mail. They leverage the electronic nature of email to make it possible to send a message to many people without having to create many physical copies of the mail. However, the standard reply/reply-all commands are insufficient, because there are three reasonable defaults for a reply to mail from a list:

1) The default reply on a discussion list that only members can post to should only go to the list.
2) The default reply on a want ads style list should only go to the author.
3) The default reply on a help list should go to both, as you want to make getting help easy.

With only two reply commands, you clearly can't have all three replies easily available. In particular, neither standard command replies to just the list. One way to fix this is to use (or abuse, depending) the Reply-To header by setting it to the list address on mail to the list. This means a mailer with two reply commands no longer has a way to reply to the author, as that option has been co-opted to reply to the list.

This is the root of the Reply-To war: should a mail list set the Reply-To address on messages or not? Some people believe that should never be the case, and argue that doing this violates the RFCs and breaks software and user expectations. Others believe that this should be the case most of the time, and argue that not doing so is impractical and breaks software and user expectations. Most people don't care - they either view this as just another annoyance of dealing with computers, or have gotten so tired of the discussion that they don't want to hear it again. But a strident few on both sides will bring it up on lists that don't act the way they want, generating replies and flames from the opposition.

Oddly enough, the complaints boil down to the same no matter which side you're on: "The reply commands on my mailer don't behave the way I expect them to for this list." Both sides, in fact, regularly tell the other side to get a better mailer to fix their problems. While this is the right spirit, it applies to both sides once you realize what better really means. The solution has been around since 1998, when RFC 2369 was published, which provided guidelines for adding headers to messages to make list functionality easy to access. Most mail list software now adds a List-Post header that tells you how to post to the list, which contains the address that would be put in the Reply-To header should the list abuse (or use) that. Most mail readers still don't have a way to use the List-Post header, or don't have that enabled by default, and still just offer reply/reply-all commands with the old behaviors, so the old arguments persist.
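For reference, the relevant headers on a message from a list that follows RFC 2369 look something like this (addresses invented):

From: alice@example.com
To: cool-list@lists.example.org
Subject: Re: something interesting
List-Post: <mailto:cool-list@lists.example.org>
List-Help: <mailto:cool-list-request@lists.example.org?subject=help>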

If you're going to get a better mailer, you should get one that has the functionality you actually need: a reply-list command that uses the List-Post header; a reply-author command that replies to the From header, ignoring Reply-To in case some mail list set it; and then the old reply-all command, so you actually have all three list replies. Personally, I also have the original reply command, which doesn't ignore Reply-To, for those rare messages that use Reply-To for what it was originally intended for.

So if you're tired of these wars - or tired of inconsistent mail list behavior - check your mailer to see if it supports List-Post, and enable it if you need to. If it does, post a comment here telling us what it is. If it doesn't, submit an enhancement request to get that added. And then bug your mail lists to support List-Post if they don't.

Mailers I know support List-Post: Claws-mail, which provides the four commands named above. KMail, which points the reply command at List-Post if it is present, and provides reply-author with the reply behavior.

Sunday, September 5, 2010

A Monkey in Clojure

One of the things contributing to the hiatus was unpacking and cataloging my office library. During that process, I found a book I couldn't remember buying, much less reading: The Practice of Programming by Kernighan and Pike. It naturally went to the top of my reading list.

Chapter three in TPoP presents the same algorithm in five different languages: C, C++, Java, awk and Perl. The algorithm in question is what's commonly known as a Markov Chain, this one dealing with words. Markov Chains generate output that resembles an input sequence by recording the suffixes that follow fixed-length prefixes in the input. If the sequence is composed of words, you get text that resembles the input more closely as the prefix length increases. If the sequences are composed of letters, you get text that goes from gibberish to ever-longer words from the input as the prefix length increases. These were a popular project when I was in college, though we called them "monkeys", because, according to a then-popular meme, a million character monkeys with length-zero prefixes would eventually output all of Shakespeare. I'm using the more colorful name.

On being reminded of the problem, I was instantly struck by how well clojure was suited for it: read an input sequence into a map from vectors of words (the prefix) to collections of words (the possible suffixes), then use that map to generate a sequence of output words. Should be a slam-dunk.

With a major caveat - there are many different ways to tweak this algorithm. Besides the obvious characters or words or other objects, you can fix the prefix length or make it variable, and there are various ways to start the sequence and to cause the generation phase to terminate. If you're doing words, then there are a variety of ways of parsing the input, dealing with punctuation, etc. All of those can make things more complicated. Kernighan and Pike chose a simple model: any printing character belongs to the word, all punctuation included. They also used sentinel values both to start the sequence of words and as an indicator that the generated text should stop. As they point out, this is in general a good practice, because it makes the resulting programs simpler.

So, the first step was to see what was available on the web. While google turned up a couple of markov chain implementations, they didn't sit quite right with me: they were closer in length to the C++ version than the awk and Perl versions. I felt that it should be possible to do better than that - that a problem this well suited to clojure should be expressible in about the same number of lines of code as the Perl and awk versions, or the Python version I found in the Python Cookbook. This is not to say the other implementations I found were bad - they were solving slightly different - and more intricate - versions of the problem. At least one could be used for either character or word monkeys.

First step - generate a sequence of words from an input file. Since we don't want to have to keep all of the words in memory, we're going to make it a lazy sequence. This being my first use of lazy sequences in clojure, I'm not sure I actually managed that goal. Anyway, read-words returns a lazy sequence of the words from standard input. It goes to a little bit of trouble - trimming whitespace from the beginning of the input and then filtering empty strings - to deal with odd formatting on the input.

;;;; A variable-order word monkey (markov chain of words).

(ns monkey (:use [clojure.contrib.str-utils2 :only (split trim)]))

(defn read-words []
  (letfn [(read-unfiltered []
            (if-let [line (read-line)]
              (lazy-cat (split (trim line) #"[\s]+")
                        (read-unfiltered))))]
    (filter seq (read-unfiltered))))

The sub-function read-unfiltered does most of the work, using lazy-cat to create a lazy sequence from the sequence resulting from splitting each line on whitespace after trimming any leading and trailing whitespace. The main body just uses seq to filter out empty strings from read-unfiltered, which makes the call to trim superfluous.

The sequence of words is going to be used to create a native clojure data structure: a hash-map that uses a vector of words as the key, and returns a collection of words as possible suffixes. That takes all of two functions and six lines of code:

(defn add-map [markov-map prefix-plus]
  (let [prefix (vec (butlast prefix-plus))]
    (assoc markov-map prefix (conj (markov-map prefix) (last prefix-plus)))))

(defn build-map [prefix-len col]
  (reduce add-map {} (partition (inc prefix-len) 1
                                (concat (repeat prefix-len nil) (col) [nil]))))

add-map adds the suffix to the map entry for the prefix that it is passed, using a fairly standard clojure idiom for adding to a map entry. This avoids some of the issues seen in the other implementations because conjing a value onto the nil returned by a map lookup on a missing key gives you back a collection containing just that value. build-map uses partition to create a lazy sequence of lists containing prefix and suffix from the input sequence, which is what add-map expects to be passed. Reduce is then used to build up the complete map by adding all of those in turn to an initially empty map.
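A quick REPL sketch with a made-up three-word input shows the shape of the resulting map (hash-map entry order may differ):

user=> (build-map 2 (fn [] ["the" "quick" "fox"]))
{[nil nil] ("the"), [nil "the"] ("quick"),
 ["the" "quick"] ("fox"), ["quick" "fox"] (nil)}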

Given that I started with a lazy sequence of words, it only made sense to finish with one. The last two functions generate that sequence, and then print the part of the results we want:

(defn generate [markov-map prefix]
  (let [suffixes (markov-map (vec prefix))]
    (if-let [suffix (nth suffixes (rand-int (count suffixes)))]
      (lazy-cat [suffix] (generate markov-map (concat (rest prefix) [suffix]))))))

(defn word-monkey
  ([maxgen] (word-monkey maxgen 2))
  ([maxgen prefix-len]
   (let [markov-map (build-map prefix-len read-words)]
     (dorun (dec maxgen)
            (map println (generate markov-map (repeat prefix-len nil)))))))

The generate function takes a map and an initial prefix, then returns a lazy sequence of words that follow that prefix by calling itself with the same map and a new prefix. The word-monkey function builds the map, then uses dorun to print either the whole sequence or the maximum number of words wanted from it, one word per line, as per TPoP.

Two issues here: one is that the code is a bit wider than I would have liked. My initial version assigned the lazy sequences to variables to shorten the lazy-cat call, but that meant I was holding the head of those sequences, which meant I had them all in memory, defeating the purpose of using the lazy sequences. The other is that we cheat a little on TPoP by using nil for the sentinel value. It has the requisite properties, in that it won't be generated by read-words, but saves having to create and assign it.

As expected, the clojure version at 23 lines is about the same length - counting only code lines, not blank or comment lines - as the similarly-commented Perl (18), awk (20) and Python (20) versions. The difference is that the word-monkey function is a function instead of the body of the script, and - unlike the others - this version doesn't fix the prefix length at 2, but allows it to be set by the invoker. Dropping those two things would make it three or four lines shorter.

As a final dose of goodness, the add-map, build-map and generate functions will work with sequences of any objects, so long as nil isn't a valid object. The character-monkey version using those is:

(defn char-monkey [file maxgen prefix-len]
  (let [markov-map (build-map prefix-len (fn [] (slurp file)))]
    (dorun (dec maxgen)
           (map print (generate markov-map (repeat prefix-len nil))))))

Sunday, August 22, 2010

Syntax highlighting for clojure code

Having left people hanging last week: the tool I used to format the code for this blog, hosted on Blogger, is GNU enscript.

None of the available tools would work out of the box, but GNU enscript provides the most bang for the buck. I've been using it for printing text for a while now - it has all the flexibility the "in the browser" highlighters have, at least when printing postscript. While its natural medium is postscript, it can also output html (which I use for the blog), rtf, overstrike (for printers) and ansi terminal codes.

The latter is particularly interesting. A quick addition to my shell aliases:
alias ccat="enscript -o - --language ansi --color"

allows me to just do:
$ ccat -E foo.sh

to print foo.sh in an xterm with syntax highlighting via the ansi control codes. I expect this to be very handy.

The not-working-out-of-the-box problem with GNU enscript is that it doesn't have a highlighting mode for clojure. Adapting a state file from one of the LISP language variants was straightforward. Working through the code previously posted in the blog uncovered some corner cases that were easily fixed. That clojure is a functional language did present one interesting choice: highlight all occurrences of the builtin functions, thus highlighting variables that reuse those names, or only highlight them in the function slot of an s-expression, thus missing them when they're passed as values to other functions? I eventually chose the latter, as that's what emacs does.

Once you've installed GNU Enscript, you'll want to get the clojure state file from the bitbucket repo for the blog, and then install it in the share/enscript/hl directory where enscript stores the language state files. After doing that, you can use -Eclojure to get clojure highlighting.

Thursday, August 19, 2010

On overengineering software

Overengineering in a software project results in code that is more
complicated than the problem at hand calls for. It's usually the
result of the engineer being fascinated by the technology being used
for the problem, and hence applying it where it isn't appropriate, or
simply in excess.

I ran into a common - and very popular - example of this while moving
this blog from Wordpress to Blogger. Wordpress provides a tool for
displaying code from various programming languages with proper
highlighting. Blogger has no such facility. While this type of thing
is important - possibly critical for a blog on programming - it wasn't
such a big deal for me. I wasn't very happy with the Wordpress tool,
and got less happy as I looked into it.

Seems there are three broad classes of tools for this: stand-alone
facilities for the desktop, tools for exporting HTML from editors that
do the highlighting, and javascript that does the highlighting in the
client. The first two are a problem for blogging, as web-based
applications typically don't integrate well with the desktop, so you
wind up having to cut and paste the text, instead of just inserting
a file into the blog text.

The JavaScript one is slightly more convenient for the author - they
can cut and paste directly from the source, and then wrap tags around
it to indicate what should be highlighted. They may be required to fix
characters that are magic to HTML, or not, depending on the blog and
software. It might also put the code in a scrollable area, number the
lines for reference, and add widgets to let you copy the text out
easily - all good things.

Unfortunately, these widgets aren't so convenient for the reader when
compared to the other alternatives. If the reader is using a
lightweight or downrev browser with no JavaScript support, is paranoid
enough to use something like NoScript or otherwise disable JavaScript
for untrusted domains, or is simply behind a corporate firewall that
disallows anything it thinks is executable code from reaching the user,
then they get no highlighting. Since the highlighting is done in the
client at load time, it will slow down page loading. This effect
scales - the more code you have, the longer it takes to load the
page. Chances are the user won't notice this - they'll be reading text
until after the code is highlighted. However, as someone who develops
distributed systems, running this code every time the page is loaded
on every client system when it could be done once before the text is
uploaded just strikes me as wrong. Especially when that same
JavaScript could be used to do the job once in the author's browser
before the blog is posted instead of every reader's browser every time
they load it.

I can see why the author did it this way - this is really cool
technology, and you get a really spiffy result. Doing the same thing
on the desktop just isn't as cool. Still, if the job is to deliver
highlighted text to the reader, the desktop versions do the job
better. The authors got caught up in the technology, and used it where
it wasn't quite appropriate.

Tuesday, February 23, 2010

There's not an app for that, take 2

In order to keep better track of what iPhone users are missing out on, I've set up a page listing Android applications - or at least categories of them - that Apple doesn't allow on the iPhone. You can find it under Pages on the right hand side of any page on the blog, listed as Apps you won't find in the app store.

Monday, February 22, 2010

Phrase groups in clojure

A recent interview with yags* consisted of solving problems and writing demo code for them, which is pretty standard for a gs. One of the problems captured my attention, because it looked like a perfect problem to explore the strengths and weaknesses of clojure.

The problem is simple enough to state: Given a set of lists of words, remove every phrase group that occurs in all the lists. A phrase group is any sequence of three or more words in a row. Two things make this interesting for clojure: 1) it's a purely functional problem: the output result depends only on the input values, and 2) it involves manipulating relatively deep structures - at least in the solution I came up with. That is something working with immutable data structures typically makes painful.

Since I'm still exploring clojure, I chose to do this in Python in the interview. The code here - and in the repository at bitbucket - uses the same algorithm, expressed with the high-level data structures of clojure instead of Python.

First, a couple of short helper functions:
(defn mfilter
  "Return a hash-map built by removing entries for which (pred entry)
  returns false from mapin."
  [pred mapin]
  (apply hash-map (apply concat (filter #(pred %) mapin))))

(defn enumerate
  "Return pairs of an index into sequence, and the value at that index"
  [sequence]
  (map vector (iterate inc 0) sequence))
As the document string for mfilter says, it returns a copy of a hash-map built from a map by removing entries for which the predicate applied to the entry returns false. Likewise, enumerate counts the elements in a sequence, starting at 0, and returns pairs of the counter and each element.

Note that we only have to worry about phrases of length three. A phrase of length four will show up as two phrases of length three with an overlap, and one of length five as three such phrases, all overlapping. So we can ignore phrases longer than three words. And of course, phrases shorter than three words aren't phrase groups, so we ignore them as well. So the list of phrases in an input list is a list of triples that has two fewer elements than the input list, as the last two elements don't start phrases.

The data structure used for my solution is a dictionary, indexed by phrase groups - meaning triples of words. Each entry is also a dictionary, indexed by an input list's index in the list of input lists. The entries in those dictionaries are lists of the places where that phrase group starts in the input list. I'm going to call this the phrase dictionary.
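For instance, if the phrase ["now" "is" "the"] started at position 4 in the first input list and position 0 in the second, the phrase dictionary would contain an entry shaped like this (a hypothetical sketch):

{["now" "is" "the"] {0 (4), 1 (0)}}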

That leads us to the first building block function, which accepts a phrase dictionary and a phrase, along with the index of a list and the index of that phrase in the list, and returns the updated phrase dictionary:
(defn add-phrase-to-phrase-dict [phrase-dict phrase list-index phrase-index]
  (if (or (phrase-dict phrase) (= list-index 0))
    (update-in phrase-dict [phrase list-index] conj phrase-index)
    phrase-dict))
This uses the clojure function update-in, which is something I don't ever remember seeing in a lisp before (though cl's setf might be close). It finds an element in a nested structure - like the phrase dictionary - using a sequence of indices into the structure. That's [phrase list-index] here, so this finds the list of places where phrase appears in the input list list-index. Further, if the elements aren't there, it creates the intermediate maps needed, and returns nil if the last lookup fails. The resulting value gets passed to the function argument, along with any remaining arguments, and the result of that call is used in this position in the new version of the structure that is returned. While this might sound expensive, since everything in the structure is immutable, the two versions can actually share everything but the values along the path to the updated value.

The conditional application of update-in is an optimization. If the phrase we're updating isn't in the dictionary, we want two different behaviors: if this is the first input list, we want to add that phrase to the dictionary. Otherwise, we can ignore the phrase, because it isn't in at least one input list - the first one. Adding it won't change the final result, but will generate more work. So we check for it there, and then only add it if this is the first input list. The alternative case returns the input phrase dictionary unmodified.

We now need to invoke this function on every phrase in every input list. Let's start with a function to invoke it on every phrase in a single input list:
(defn add-list-to-phrase-dict [phrase-dict list-index list]
  (reduce (fn [phrase-dict [phrase-index phrase]]
            (add-phrase-to-phrase-dict phrase-dict phrase list-index phrase-index))
          phrase-dict
          (enumerate (map vector list (rest list) (nthnext list 2)))))
As input, we get the phrase dictionary we're going to update, the index of the input list in the list of lists, and the input list itself. Wanting to update a value based on processing values in a list is what the lisp function reduce is for. It takes a function, an initial value, and a list, then calls that function with the accumulated value and each element in the list. So our function - introduced by (fn - takes two arguments, a phrase dict and a pair of phrase-index and phrase, and returns the result of calling add-phrase-to-phrase-dict with them and the input list values. The initial value is the input phrase dictionary. We generate the phrase list by mapping vector over the input list, the result of removing its first element, and the result of removing its first two elements, giving three-element vectors of three consecutive words - which are our phrases. Again, the primitive does the right thing for us, and stops producing vectors when the shortest sequence runs out, so the last two words in the initial input list don't start phrases. We pass that list to enumerate to get the index and phrase pairs we need.
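A quick REPL check of that phrase-generation trick, on a made-up four-element list:

user=> (map vector '(a b c d) (rest '(a b c d)) (nthnext '(a b c d) 2))
([a b c] [b c d])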

Given that, we basically repeat the same construct to deal with all the input lists:
(defn build-phrase-dict [lists]
  (let [phrase-dict (reduce (fn [phrase-dict [list-index list]]
                              (add-list-to-phrase-dict phrase-dict
                                                       list-index list))
                            {}
                            (enumerate lists))
        list-count (count lists)]
    (mfilter #(= (count (val %)) list-count) phrase-dict)))
The input is a sequence of lists. That's the sequence we reduce over here, once again passing it to enumerate to generate our indices. This time, the initial value is an empty map. The most important change is that, instead of returning the result of reduce directly, we use the mfilter function defined earlier to remove any elements that don't have as many entries as we had input lists. That's the criterion for inclusion - that the phrase be in every list. If it isn't in some list, then its dictionary entry won't have an entry for that list, and hence the count won't match the length of the list of lists.

That's the first half of the problem - building up the phrase dictionary. We now need to use that to remove the phrases from the input lists. Oddly enough, the phrases themselves don't matter - we just care about their positions. So we write a function that takes the list of phrase positions, and removes them from a list:
(defn remove-phrases-from-list [phrase-starts list]
  (mfilter (fn [[x _]] (not-any? #(and (>= x %) (< x (+ % 3))) phrase-starts))
           (apply sorted-map (mapcat vector (iterate inc 0) list))))
Here's the second mfilter - this time, it's removing words from the list. We start by creating a sorted map indexed by word position of the words in the list. Then mfilter uses not-any? to check each word's position against the phrase start positions, specifically whether the position is at or after a start position and less than three plus that start position. If a position falls into that range for any phrase in the phrase-starts list, it's removed.

So all that's left is printing the result:
(defn remove-shared-phrases [lists]
  (let [phrases (apply merge-with concat (vals (build-phrase-dict lists)))]
    (doseq [[idx list] (enumerate lists)]
      (println (vals (remove-phrases-from-list (phrases idx) list))))))
This actually combines everything so far: building the phrase dictionary from the lists, extracting the lists of phrase positions from that, using merge-with to concat the lists in each of the element dictionaries into a single list for each input list, then using doseq to call println on the result of removing the phrases in each list from the list. At this point, we see why the output map was a sorted-map: that forces the words of each list to print in the proper order.
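As a sanity check, here's a REPL run on two made-up lists sharing the phrase "now is the"; the surviving words should print as:

user=> (remove-shared-phrases [["a" "now" "is" "the" "time"]
                               ["now" "is" "the" "b"]])
(a time)
(b)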

All in all, this has been an illuminating exercise. Clojure's functions for dealing with deep structures and non-list structures work very well with them, providing what is still a very Lisp-like environment, even though we're not working with lists. In particular, the handling of what in python were special cases is right by default, which feels very much like classic lisp behavior. I'd say Clojure held up very well here.

*) That would be Yet Another Google Spinoff.

Saturday, February 13, 2010

mired in code now available from bitbucket

To make it easier to get the code published here I've set up a mercurial repository at bitbucket.org. Assuming you have mercurial installed, then you can get a current copy of the code with:

$ hg clone https://bitbucket.org/mwm/mired-in-code/

That will also leave the most recent code checked out. You can update the repository and check out the most recent updates with:

$ hg pull -u

For anything more advanced, you'll want to check out Mercurial's documentation with:

$ hg help

Friday, February 12, 2010

bitchin web framework - templates

One of the advantages of making the templates functions is that clojure  does almost all the work. Tags basically come in two flavors, depending on whether they render newlines between the elements of their contents to make the resulting HTML more readable. So we could define a tag like so:
(defn html
  ([] (format "<%s />" "html"))
  ([attribs & args]
   (if (associative? attribs)
     (str "<" "html" (make-attrib-strings attribs)
          (if (nil? args)
            " />"
            (format ">%s%s%s</%s>" "\n" (str-join "\n" args) "\n" "html")))
     (apply html (concat [{} attribs] args)))))
Where make-attrib-strings handles the attribute map, and looks like:
(defn- make-attrib-strings [attrs]
  (apply str (map #(format " %s=%s" (if (keyword? %) (name %) %)
                          (quoted-attribute (attrs %)))
                  (keys attrs))))

(defn quoted-attribute [input]
  (format "\"%s\""
          (apply str (replace {\& "&amp;", \" "&quot;", \> "&gt;"} (str input)))))
These two functions are fairly straightforward. quoted-attribute takes a string with the characters that are dangerous inside of attributes - '"', '&' and '>' (the latter only for very old browsers, but it's easy to fix as well) - and returns a string with them replaced by the appropriate XHTML entities. We have to process the output through str, as the result of replace is a sequence. Likewise, we pass the input through str, just in case they gave us a symbol (which we will use later). make-attrib-strings is passed an associative object, and maps the keys through a function that turns each into a string of the form key="value", with value being quoted by quoted-attribute. It does check to see if the key is a keyword, and if so uses its name, so that users can use :href instead of 'href. Again, we pass the results to str to turn it into a string for output.
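A quick REPL check of the quoting (the result is shown with reader escapes):

user=> (quoted-attribute "x>3 & \"y\"")
"\"x&gt;3 &amp; &quot;y&quot;\""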

The problem with the above approach is that it requires writing out that rather long function for every tag we want to use. Fortunately, clojure provides us with macros, so we can just write a macro to do all that for us, like so:
(defmacro make-tags
  ([sep name & names]
   `(do
      (make-tags ~sep ~name)
      (make-tags ~sep ~@names)))
  ([sep name]
   (let [name# (first (re-split #"_" (str name)))]
     `(defn ~name
        ([] (format "<%s />" ~name#))
        ([attribs# & args#]
         (if (associative? attribs#)
           (str "<" ~name# (make-attrib-strings attribs#)
                (if (nil? args#)
                  " />"
                  (format ">%s%s%s</%s>"
                          ~sep (str-join ~sep args#) ~sep ~name#)))
           (apply ~name (concat [{} attribs#] args#))))))))
The first clause is a standard clojure idiom - we want to allow multiple tag names, so we handle the case with more than one name by calling the macro with one name, and then again with the rest of the names.

The second clause is the code from the html function above, with the
string or symbol html taken out and replaced by an expression
involving name. With one hiccup - some of the html tag names are core
clojure function names, so we can't use them. Instead, we use map_ and
meta_, and extract the string into the symbol name#. The # causes
clojure to generate a unique symbol for this binding, so we don't have
to worry about name collisions. The rest of the macro is the function
from above, with ~name or ~name# (the ~ causing evaluation) replacing
what was html, and ~sep replacing the separator string.

So now we can put that all together, along with two invocations of the macro to create all the tags we normally use. And of course, the user gets the make-tags function if they want to use custom tags of some kind or another.
(ns org.mired.bitchin.html
  (:use [clojure.contrib.str-utils :only (str-join re-split)]))

(defmacro make-tags
  ([sep name & names]
   `(do
      (make-tags ~sep ~name)
      (make-tags ~sep ~@names)))
  ([sep name]
   (let [name# (first (re-split #"_" (str name)))]
     `(defn ~name
        ([] (format "<%s />" ~name#))
        ([attribs# & args#]
         (if (associative? attribs#)
           (str "<" ~name# (make-attrib-strings attribs#)
                (if (nil? args#)
                  " />"
                  (format ">%s%s%s</%s>"
                          ~sep (str-join ~sep args#) ~sep ~name#)))
           (apply ~name (concat [{} attribs#] args#))))))))

(defn quoted-attribute [input]
  (format "\"%s\""
          (apply str (replace {\& "&amp;", \" "&quot;", \> "&gt;"} (str input)))))

(defn- make-attrib-strings [attrs]
  (apply str (map #(format " %s=%s" (if (keyword? %) (name %) %)
                          (quoted-attribute (attrs %)))
                  (keys attrs))))


;; The standard html tags with newlines between elements in the contents
(make-tags "\n" html head meta_ base link script style body div dl ol ul li table
           tr colgroup col thead tbody tfoot form select input)

;; And now the same, with nothing between elements in the contents.
(make-tags "" title h1 h2 h3 h4 h5 h6 p pre address blockquote hr dd dt th td
caption option label textarea button a b i big small strong em q
tt a cite dfn abbr acronym code samp kbd var sub del ins font span img
br map_ area object param embed noembed)
As expected, the tests had to be debugged as well. Mostly, it was spacing inside of tags. However, the design changed just slightly - the template system doesn't automatically quote characters in content strings (but it does in attribute strings). The recursive nature of the invocations makes it hard - if not impossible - to keep track of what has and has not been quoted, and quoting a second time would be a failure. This may be a reason to change from a code representation to a vector representation, if the problem can be solved.

Sunday, February 7, 2010

The bitchin web framework - template testing

I've been building web sites for nearly twenty years now. In that time, I've learned some lessons that apply here:

1) I shouldn't do visual or interface design. My eye for graphics sucks, and my mind doesn't work the way most users do when it comes to UI issues.

2) CSS is sufficient for visual design. The existence of "div soup" web pages demonstrates that. Since those can go in a stylesheet file, we get automatic separation of design and content. Given #1, this is good for me.

3) If you're doing HTML, the reader always gets the final say about presentation! There are browsers with options - or extensions that add options - to turn off CSS completely, or use the user's own CSS. Worst comes to worst, they can use View Source on your page. Attempting to tell them how to set their browser will, if you're lucky, get you laughed at before they read your text. If you're not lucky, they'll take their business elsewhere.

With those in mind, bitchin is a programmers framework. Templates are meant to be embedded in code, with design decisions going into a style sheet.

A template is a data structure that can be "evaluated" in a context to fill in values from various data sources to provide content, producing HTML text - or a fragment thereof. In most programming languages, we have a tool for that already - it's called a function! Further, modern HTML is an XML application, and the parallels between S-expressions and XML have been pointed out numerous times. Given that Clojure functions are S-expressions - well, mostly - I'm going to use them as bitchin templates.

So a tag is a callable object. If the first argument is a Clojure map - or any associative object - the key/value pairs will turn into attributes for the tag. Any remaining arguments will be evaluated to generate the contents of the tag. So a typical simple web page template might look like:
(html (head (title "A short test"))
      (body (h1 "A short test")
            (p "And some text")))
And then render as:
<html>
<head>
<title>A short test</title>
</head>
<body>
<h1>A short test</h1>
<p>And some text</p>
</body>
</html>
So the first step is test code. While I am a fan of test suites, I'm not a fan of test driven development. The problem is - the tests are code, and hence likely to have bugs. The only proper test for them is the code they're supposed to test. So to do TDD on the tests, you need to write the code to be tested first. The reality is that it doesn't matter which you write first - you're going to debug them in parallel. So I normally write them in parallel as well.

In this case, the tests are providing examples for how the code should behave. As such, that's worth writing out explicitly. Given the loose nature of XML, I'll probably still wind up debugging them in parallel.

The test code uses the clojure.test unit testing framework, dividing the tests up into groups, doing a simple equality comparison between the string we want out and the code that should generate it:
;; Unit tests

(ns org.mired.bitchin.test
  (:use [clojure.test :only (is deftest run-tests)]
        org.mired.bitchin.html))

(deftest inline-elements
  (is (= "<p />" (p)) "empty contents")
  (is (= "<map />" (map_)) "keyword tag")
  (is (= "<p>now</p>" (p "now")) "string contents")
  (is (= "<p>now</p>" (p 'now)) "symbol contents")
  (is (= "<p>:now</p>" (p :now)) "keyword contents")
  (is (= "<p><br /></p>" (p (br))) "element content")
  (is (= "<p>now &amp;</p>" (p "now &")) "ampersand encoding")
  (is (= "<p>now &lt;</p>" (p "now <")) "less than encoding")
  (is (= "<p>now is the</p>" (p "now " 'is " the")) "multipart contents"))

(deftest block-elements
  (is (= "<div />" (div)) "empty contents")
  (is (= "<meta />" (meta_)) "keyword tag")
  (is (= "<div>\nthen\n</div>" (div "then")) "string contents")
  (is (= "<div>\nthen\n</div>" (div 'then)) "symbol contents")
  (is (= "<div>\n:then\n</div>" (div :then)) "keyword contents")
  (is (= "<div>\n<br />\n</div>" (div (br))) "element contents")
  (is (= "<div>\nnow &amp;\n</div>" (div "now &")) "ampersand encoding")
  (is (= "<div>\nnow &lt;\n</div>" (div "now <")) "less than encoding")
  (is (= "<div>\nthen\nwas\nthe\n</div>" (div "then" 'was "the"))
      "multipart contents"))

(deftest mixed-content
  (is (= "<div>\n<span>now is the</span>\n<b>time</b>\n</div>"
         (div (span "now is the") (b 'time)))))

(deftest attributes-block
  (is (= "<div border=\"0\" />" (div {:border 0})) "keyword name")
  (is (= "<div border=\"0\" />" (div {'border 0})) "symbol name")
  (is (= "<div border=\"0\" />" (div {"border" 0})) "string name")
  (is (= "<div width=\"100%\" />" (div {:width "100%"})) "string value")
  (is (= "<div border=\"0\" width=\"100%\" />" (div {:border 0, 'width "100%"}))
      "multiple attributes")
  (is (= "<div border=\"0\" width=\"100%\">\n<span width=\"90%\">Text</span>\n</div>"
         (div {"border" 0, :width "100%"} (span {'width "90%"} "Text")))
      "multiple attributes with content"))

(deftest attributes-inline
  (is (= "<span border=\"0\" />" (span {:border 0})) "keyword name")
  (is (= "<span border=\"0\" />" (span {'border 0})) "symbol name")
  (is (= "<span border=\"0\" />" (span {"border" 0})) "string name")
  (is (= "<span width=\"100%\" />" (span {:width "100%"})) "string value")
  (is (= "<span border=\"0\" width=\"100%\" />" (span {:border 0, 'width "100%"}))
      "multiple attributes")
  (is (= "<span class=\"now&amp;then\" />" (span {:class "now&then"}))
      "ampersand encoding")
  (is (= "<span class=\"&quot;FOO&quot;\" />" (span {:class "\"FOO\""}))
      "quote encoding")
  (is (= "<span class=\"x&gt;3\" />" (span {:class "x>3"}))
      "greater than encoding")
  (is (= "<span border=\"0\" width=\"100%\"><b width=\"90%\">Text</b></span>"
         (span {"border" 0, :width "100%"} (b {'width "90%"} "Text")))
      "multiple attributes with content"))

(run-tests 'org.mired.bitchin.test)

Friday, January 29, 2010

Backup on ZFS, part 1

One of the nice things about having systems on ZFS was that the disk failures in the last few days didn't cost me any noticeable downtime per se. Pulling and replacing disks - without hot-swappable hardware - and the system upgrade those failures inspired still cost time, as do hardware failures that leave a system unbootable. But in general, disk problems with ZFS file systems are just minor problems: you notice the disk is no longer in service, decide how to deal with it, and then do so.

Part of that is having reliable backups. ZFS makes even that easier. The best example is of course the OpenSolaris "Time Slider" tool, which uses the ZFS snapshot feature to let you recover old versions of files. Snapshots also make backups to other disks - suitable for taking offsite, for instance - easier to deal with as well.

As disks have gotten cheap, it's become common to keep backups online. A typical home-grown backup script will use something like rsync to copy files to the destination disk or file server. To make old versions available, it will then play games with copies of the directory tree and links, creating an image of the tree at each backup time without duplicating files that haven't changed between backups.
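
Most such scripts use hard links rather than symlinks; rsync's --link-dest option does the bookkeeping for you, linking unchanged files to the previous backup instead of recopying them. A minimal sketch, with the paths invented for the example:
#!/bin/sh
# Each day gets what looks like a full tree, but unchanged files are
# hard links into the previous day's tree.
DEST=/backups/myhost
TODAY=$(date +%F)
rsync --archive --delete --link-dest="$DEST/latest" /home/ "$DEST/$TODAY/"
rm -f "$DEST/latest" && ln -s "$TODAY" "$DEST/latest"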

Snapshots can go one better. If your copy software writes just the changed blocks in a file, instead of recreating the entire file, then the blocks that haven't changed will also be shared across snapshots. Better yet, the snapshot can be created by running one command - a "zfs snapshot backuppool/mybackup@$(date +%F)" - on the system the backup resides on.
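
The flip side is just as simple: each snapshot is browsable read-only under the file system's .zfs/snapshot directory, so recovering an old file is an ordinary copy. A sketch, with the paths invented for the example:
# On the backup host: take today's snapshot of the backup fs...
zfs snapshot backuppool/mybackup@2010-01-29
# ...and later pull a single file back out of it.
cp /backups/mybackup/.zfs/snapshot/2010-01-29/etc/rc.conf /tmp/rc.conf.recovered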

The final nicety is that even systems without the hardware oomph for ZFS - it was designed for 64-bit CPUs with a gigabyte of RAM - or an OS that doesn't support ZFS can take advantage of this in their backups. Here's the script I use for my local backups. While I use it in production, it's not up to product status, in that it's really intended for use by relatively astute system admins. In particular, there's no nice error reporting and no simple tools for either complete restores or single-file recovery, etc. Those shouldn't be hard to build on top of this, but it's good enough for my use.

As with the previous script, the goal is more to get people thinking about how to leverage ZFS for these types of chores. If you've already done that and have tools available, provide a link in the comments and I'll pull it into the body so you get the traffic. If you feel moved to productize this script - the same applies.
#!/bin/sh

# Where backups land, the ZFS file system behind that directory, and
# who runs the snapshot on the backup host.
BACKUP_DEST=/export/backups
BACKUP_FS=external/export/backups
BACKUP_HOST=backups
BACKUP_USER=operator

# Set DEBUG to anything to print commands instead of running them.
if [ "$DEBUG" = "" ]
then
    ECHO=""
else
    ECHO=echo
fi

# Figure out what to dump, and which rsync flags this OS needs.
case $(uname) in
    Darwin)
        dump_list=$(df -T ufs,hfs | awk 'NR != 1 { print $NF }')
        extra_flags="--extended-attributes"
        hostname=$(hostname -s) ;;
    FreeBSD)
        dump_list=$(mount -p -t ufs,zfs | awk '{ print $2 }')
        extra_flags="--acls --xattrs"
        hostname=$(hostname -s) ;;
    SunOS)
        dump_list=$(/usr/gnu/bin/df -P -t zfs -t ufs | awk 'NR != 1 && !/^external/ { print $NF }')
        extra_flags=""
        hostname=$(hostname) ;;
esac

# An explicit name and file system list can be given on the command line.
if [ $# -eq 0 ]
then
    dump_name=$hostname
else
    dump_name=$1; shift
    dump_list="$@"
fi

for dir in $dump_list
do
    case $dir in
        /tmp*) echo Skipping $dir ;;
        *) $ECHO rsync --verbose --archive --hard-links --delete --one-file-system --no-whole-file $extra_flags --exclude /.zfs $dir $BACKUP_DEST/$dump_name$dir ;;
    esac
done

# Snapshot the backup fs - locally if we are the backup host, via ssh if not.
SNAPSHOT_COMMAND="/usr/sbin/zfs snapshot -r $BACKUP_FS/$dump_name@$(date +%F)"
if [ "$BACKUP_HOST" = "$hostname" ]
then
    $ECHO $SNAPSHOT_COMMAND
else
    $ECHO su $BACKUP_USER -c "ssh $BACKUP_HOST 'pfexec $SNAPSHOT_COMMAND'"
fi
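
Assuming it's saved as backup.sh (the name is mine), typical invocations look like this; setting DEBUG makes the script print the rsync and snapshot commands instead of running them:
# Dry run: show what would be copied and snapshotted.
DEBUG=yes sh backup.sh
# Normal run, backing up everything in dump_list under this host's name.
sh backup.sh
# Back up just /home under an explicit name.
sh backup.sh mylaptop /home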

Monday, January 25, 2010

Some better practices for ZFS on FreeBSD

Rather than working on the Clojure web framework, I've been dealing with broken hardware, including some system reinstalls. So let's talk about that.

ZFS has been available in FreeBSD for a while now, and in the recently released 8.0 it is considered production quality. There are a number of write-ups on the web about how to set up various configurations of FreeBSD on ZFS: with a UFS boot, on a GPT Mac drive, with no UFS at all, etc. Most seem to have one thing in common - they just duplicate a standard FreeBSD UFS file system configuration, without seeming to consider how ZFS has changed the game. That's not really the fault of the authors; I did much the same when I set up my first ZFS system a few years ago. But having those few years of experience - and seeing how the OpenSolaris folks set things up - indicates that there are better ways. I want to talk about that in hopes of getting others to spend more time thinking about this.

First, a note on terminology. Those of you familiar with FreeBSD on x86 can skip this. Unix has had "partitions" since before there was a DOS, and FreeBSD continues to call them that. What the DOS folks - and most everyone else - call partitions are called "slices". A FreeBSD installation typically has one slice for FreeBSD on the disk, with multiple partitions - one per file system - in that slice. Slices are numbered starting at 1; partitions are lettered, usually a-h. A typical FreeBSD partition name is ad0s1a, meaning drive number 0 on the ATA controller, slice 1, partition a.

Now a quick overview of how to set up FreeBSD with a ZFS root file system. Details are easy to find on Google if you need them:

  1. Partition the drive, providing a swap and a data partition. If you're using GPT for partitioning, you'll need a boot partition as well. Note that on OpenSolaris, giving ZFS a partition is a bad idea, as it disables write caching on the drive - OpenSolaris has file systems that can't handle drive write caching. On FreeBSD, all the file systems handle drive write caching properly, so this isn't a problem.

  2. Create a zfs pool on that partition.

  3. Install the system onto an fs in that pool. Most people seem to like copying the files from a running system. I used the method documented in /usr/src/UPDATING to install to a fresh partition. For that to work cleanly, you'll want NO_FSCHG defined in /etc/make.conf, or -DNO_FSCHG on the command line, as FreeBSD's ZFS doesn't do file flags. You'll also need to make sure that /boot/loader was built with LOADER_ZFS_SUPPORT defined.

  4. Install a boot loader. Just install the appropriate ones for your partitioning scheme.

  5. Config for ZFS. You may want to set the bootfs property on your pool to the root file system, to tell the boot loader where to find /boot/loader. You'll want to set zfs_load="YES" and vfs.root.mountfrom="zfs:data/root/fs" in /boot/loader.conf to tell the loader where the root file system is. Set zfs_enable="yes" in /etc/rc.conf so the system knows to turn on ZFS. Finally, to prevent ZFS from trying to mount your root file system a second time, set the mountpoint property on that file system to "legacy".

  6. Last step: export and import the resulting pool, then copy /boot/zfs/zpool.cache to the same location on your new system. (A condensed sketch of steps 2-6 follows this list.)
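
Pulling those together, here's one possible shape for steps 2 through 6, assuming an MBR-style layout with the new root eventually mounted at /altroot; the device, pool, and file system names are examples:
# Step 2: create the pool on the data partition.
zpool create data ad0s1b
# Create a root file system (see the layout discussion below).
zfs create -p data/root/8.0
# Step 5: tell the loader where /boot lives, and avoid a double mount.
zpool set bootfs=data/root/8.0 data
zfs set mountpoint=legacy data/root/8.0
# These lines go in the new system's /boot/loader.conf and /etc/rc.conf:
#   zfs_load="YES"
#   vfs.root.mountfrom="zfs:data/root/8.0"
#   zfs_enable="yes"
# Step 6: export and import to regenerate the cache file, then copy it over.
zpool export data
zpool import data
cp /boot/zfs/zpool.cache /altroot/boot/zfs/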


Again, this is a quick overview. Google for details if you need them.

Now to the point - how to set up your filesystems under ZFS, considering how ZFS has changed the game.

For instance, it's much more robust than the UFS file systems, so there's little point in creating partitions to protect things from corruption - though the UFS file systems have been solid enough for that for a while. Likewise, ZFS file systems aren't locked to a pool of blocks, so there's not much point in creating file systems to allocate disk space to different purposes - though you can put limits on a specific file system if you want to. Those are the classic reasons for setting up new file systems.

With ZFS, file systems are cheap and easy to create. The reason for creating one is that you want it to have different properties than its parent, or that you might want a group of file systems to inherit some set of properties. You might also want to use the spiffy ZFS snapshot/clone tools on some file systems without using them on all of them.

So, the first thing to notice is that it's easy to boot from a different root file system. Booting from a UFS file system needs the root on partition a - unless that's been changed and I didn't notice - meaning you need to create a new slice for it, and possibly put it on a new disk. With ZFS, you can boot from a new root file system by changing two settings - bootfs on the boot pool, and vfs.root.mountfrom in /boot/loader.conf (I'm sure one of those will vanish at some point) - and rebooting. So you could, in theory, have a couple of different versions of FreeBSD installed on the same pool and boot between them.

In fact, that looks like it's worth automating: it's trivial to do, and a script cuts down on the number of places you can typo the pool name. So here's zfsbootfrom.sh:
#!/bin/sh
# Usage: zfsbootfrom.sh pool/path/to/rootfs
FS=$1
POOL=$(echo $FS | sed 's;/.*;;')    # the pool is everything before the first /
$DEBUG zfs set bootfs=$FS $POOL
$DEBUG sed -i .old "/vfs.root.mountfrom/s;=.*;=\"zfs:$FS\";" /boot/loader.conf
This is simple enough I didn't add options; to debug it, just run it as "DEBUG=echo zfsbootfrom.sh newrootfs". Better yet, grab the tool cryx mentioned in the comments from http://anonsvn.h3q.com/projects/freebsd-patches/browser/manageBE/manageBE.

You can do this with your root file system as the top file system in the pool, but that's going to get confusing eventually. Best to sort things out into their own group. So my suggestion is that the root file system be something like "data/root/8.0". An 8-STABLE root might be "data/root/8-STABLE".

You can even avoid having to do fresh installs on each new system - unless you want to - by leveraging the power of ZFS. To wit:
zfs snapshot data/root/8.0@NOW
zfs clone data/root/8.0@NOW data/root/8-STABLE
mount -t zfs data/root/8-STABLE /altroot
cd /altroot/usr/src
make update
# proceed with the build, installing to /altroot rather than modifying your running system
You can now boot that and try a new version of FreeBSD - without having to change your old file system. If it doesn't work, just point zfsbootfrom.sh back at the old root, delete the new file system, and try again later. Or ignore it until you feel like updating it and trying again.

So, what would we want to share between two different versions of FreeBSD this way? Not /usr - the userland is pretty tightly tied to the kernel in FreeBSD. /usr/local? Certainly - packages and anything you build will work on updates to the current release, and on the next one (or more) with the appropriate compatibility options. For that matter, /usr/ports probably wants to be its own file system, since the ports project explicitly supports multiple versions. /etc? Maybe. The system knobs can probably be shared, but having some apply to each system and some not will be confusing. On the other hand, the ports/package system writes to /etc/make.conf as well. If you're not running a mirrored root, you might consider making /etc a file system just to set "copies=2" on it to improve reliability. /home? Certainly it should be shared. /var? Most likely, as ports put information there as well as the normal spool things. In fact, enough different things go on in /var that you may want it to be a file system so you can create sub-filesystems with the appropriate properties.

If you're exporting file systems over NFS, you can create an fs to hold them and set the sharenfs property on it to the most common value, so you don't have to set it on its children; the file systems underneath will then all be exported automatically.

With all that said, step 2 above might wind up looking something like this:
zpool create data ad0s1
zfs create -p data/root/8.0
zfs create -o mountpoint=/home data/home
zfs create -o mountpoint=/usr/ports -o compression=on data/ports
zfs create -o compression=off data/ports/distfiles
zfs create -o mountpoint=/export -o sharenfs="rw=@192.168.195.0/24" data/export
zfs create -o mountpoint=/var data/var
zfs create -o copies=2 data/root/8.0/etc
You can also set the properties exec (allow - or not - execution of files on that fs) and setuid (honor - or not - the setuid bit) as appropriate for each of these file systems. /var, in particular, bears a closer look. You might consider turning off setuid and exec on it and most of its descendants. /var/log might be compressed unless you send all your logs elsewhere. /var/db/pkg is a candidate for compression. Some database packages install things in /var/db as well; in that case, check the ZFS best practices wiki for advice on that database.
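
For example, a possible treatment of /var along those lines, using the dataset names from the example above (whether exec=off is safe depends on what you run there):
# Children inherit setuid=off from data/var.
zfs set setuid=off data/var
zfs create -o compression=on -o exec=off data/var/log
zfs create -p -o compression=on data/var/db/pkg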

One final note. I mentioned a mirrored root pool. I run most of my systems this way, and recommend it. It means that, even though the hardware failures did cost me time, it was time spent repairing them, not time the services in question were unavailable. Setting up the mirror is simple. You'll need to install the boot loaders on both disks - that's step 4 above, repeated for the second disk. You also need to use a mirror vdev on the zpool create command, which changes it to something like "zpool create data mirror ad0s1 ad1s1". The rest of the zfs commands will be the same; the only thing that knows the pool is mirrored is zpool.
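
If you've already got a single-disk pool, you don't have to rebuild it to get there; zpool attach converts a lone disk into a mirror (device names again examples):
# Attach a second disk to the existing single-disk vdev...
zpool attach data ad0s1 ad1s1
# ...watch the resilver complete...
zpool status data
# ...and remember the boot loader on the new disk (step 4 above).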