Custom Search

Friday, October 28, 2011

Leveraging Haskell's derived classes.

One of the mantras of dynamic languages is "Minimize boilerplate". By boilerplate they mean repeated code sequences that just change a few names. A typical example is the setter/getter methods of Java objects - at least most of  them. Python, for example, deals with those by providing a decorater that turns an attribute get or set into a method invocation. This eliminates the primary motivation for setter/getter boilerplate, since you can turn attribute accesses into methods without changing the external API.

I recently wrote enough boilerplate code for a Haskell application that I went searching for a way to eliminate it. Sure enough, I found one. It uses a technique that doesn't seem to be discussed on the web, even though it's so powerful that I eliminated all of the boilerplate that had driven me to this search.

A little Haskell data

This section explains some of the syntax of Haskell. If you're already familiar with that, you should probably skip down to the next section.

Haskell uses about the simplest construct possible for creating the equivalent of a record or structure (or maybe class) in other lanugages:

    data Name = Constructor [ Type ... ]

This creates the new data type Name with a constructor function named Constructor. Following that is a list of the type of each element that is going to be in Name.

A union type isn't much different, and explains why you need both a Constructor and Name in the data declaration:
    data Name = Constructor1 [ Type1 ... ] | Constructor2 [Type2 ... ] [| .... ]

Which says that an object of type Name will be either a Constructor1 object or a Constructor2 object or a Constructor from that last ..., each of which can be built by the appropriate constructor name.

The last bit of syntax is an optional
deriving clause at the end of a data statement. It lets you specify a list of typeclasses - from a relatively small set - and the compiler will automatically provide the boilerplate to make your new type implement the interfaces for those typeclasses. Besides the limit on which typeclasses this works for, all the components of your type have to implement the interfaces you want to use.

The problem data type

One of the Project Euler problems involves evaluating poker hands. So, given the appropriate definitions of Rank (the value of a card - Ace, Two, etc) and Card ( a Rank and a Suit), a data type that represents poker hands would be: 

    data Hand = HighCard [Card]
              | PairOf Rank [Card]
              | TwoPair Rank Rank [Card]
              | ThreeOfAKind Rank [Card]
              | Straight [Card]
              | Flush [Card]
              | FullHouse Rank Rank [Card]
              | FourOfAKind Rank [Card]
              | StraightFlush [Card]
                deriving (Show, Eq, Ord)

I'll get straight to the punchline: the above is sufficient for evaluating poker hands. The Ord typeclass provides the ordering functions <>, and similar as well as min and max. The automatically derived code will work such that hand1 > hand2 if and only if hand1 is a better poker hand than hand2. Provided, of course, that the code that creates the hands is correct.

The derived operators

The operators you get from Ord have straightforward behavior (though I couldn't seem to find an explanation of it outside the language specification). A list - which in haskell holds homogeneous types, but can be of different lengths - is compared elementwise from front to back. The first non-equal pair determines the result. If one list ends before that and before the other then it has a lesser value. If they end at the same time without any differences, they're equal. This is a pretty standard behavior for list comparison, implemented by most modern languages. Tuples (not used above) are a fixed-size list of heterogeneous types. The elements are also compared in order. If two tuples are of different lengths, comparing them is a compile time type error. Ditto if a pair of elements that need to be compared aren't comparable. Again, the derived behavior is pretty standard behavior for modern languages.

A record data type is, to a large degree, isomorphic to a tuple of the same list of types. The derived behavior follows that - the elements in the record are compared in the order they appear in the constructor. This is rather unusual. At least, I'm not aware of other languages that do this automatically, or advertise it if they do. Many modern ones don't really have a record type, but instead have a class type that mixes in other behaviors as well.

Finally, the union type. The comparison is simple: those that appear first in the data statement have lower values. The closest thing to this I've seen elsewhere are enum types. Oddly enough, if none of your constructors have an argument - meaning they're just a bare token - you have what amounts to an enum type, and that same ordering rule applies.

In action

So, we now know how the derived ordering behaves, and have a relatively complex data type to see how it works.

Given two hands to compare, the first check will be whether or not they use the same constructor. If so, those will be compared as such. If not, then the first one in the list is the lesser value. Checking the declaration, that means  a StraightFlush beats everything, FourOfAKind everything but a flush, etc. Exactly what we want.

If the two hands are the same type, then the elements of the hand are compared. That's why there's a Rank or two for some hands - you compare the cards that give the hand it's type before comparing high cards in the rest of the hand. So if those Ranks are correct - and in the right order for TwoPair and FullHouse - then the derived comparison will work correctly. As an aside, for some games you only need the first rank for a FullHouse and don't need the [Cards] for it, ThreeOfAKind and FourOfAKind, because two hands where those match are impossible. But providing them makes the code more symmetric and saves code in the long run.

Finally, if any Ranks present are the same, the [Card]s are compared. It's sufficient to make sure the [Card]s are sorted. While the cards in the Ranks aren't part of that comparison in poker, they will either wind up in the same place in the lists and be equal or the comparison will be done before their places are checked.

Conclusion

So, having written the data statement properly, and making sure that all Hands are properly formed, the derived code will do the right thing. All of the essentially boilerplate code to compare two Hands has been eliminated.  I can make sure that all Hands are properly formed by exporting a Hand-making function that does that, and not exporting any of the Hand constructors. This means the only way client code can get a Hand to compare is to use my Hand-making function, and anything else compared to a Hand will cause a compile time type error.