Custom Search

Sunday, June 5, 2011

Dynamic types in a statically typed language

One of the more interesting surprises in Haskell was the regular expression library - or rather, libraries, as the same API can be used with a number of different regular expression engines. Before looking at that, let's look at a couple of more conventional languages.

Python has a typical OO regex API: the module includes a couple of routines for creating regex objects or running common sequences of actions using them (compile, find, match), the regex objects have a couple of methods for matching (find, match, findall, finditer) one or more times or running common sequences of actions for this regexp, most of which return match objects that have methods to query the result of the match (group, groups, pos, span, etc.).

Perl, on the other hand, has a single regex type that is built with the / or m operators and then matched against a string.  The result of the match is either a list of groups from the regex (in a list context) or a boolean indicating whether or not it matched (in a scalar context), with variables set to the groups as a side effect. Various regex features can be used with loops to iterate through multiple matches, etc.

Other functional languages have regex objects and collections of functions for manipulating them, similar to the methods of the Python compiled regex objects.

You would expect that Haskell - being neither OO nor dynamically typed - would follow those other functional languages. Except that Haskell's type system is very flexible. In particular, it uses type variables to stand in for actual types (what some languages call generics) - even in compiled code. It also has a typeclass system that can be used to constrain type variables to types which implement a specific set of operations. The type returned by a function can be a typeclass, and the compiler will check all the known instances of the typeclass against the type the calling context expects, then choose an instance of the typeclass that matches the calling context.

The Haskell regex library includes a typecless - in Text.Base.Regex.RegexLike - and instances - in Text.Regex.Base.Context - that includes instances for Bool, groups from a regex, lists of matches (as returned by Python's findall), lists of match positions, and more. There's essentially one operator =~ (=~~ is for use in monads) that matches a string representation of a regex against a string and return the typeclass instance for that context.

Instead of having a scalar context, the Haskell regex operator has three of them - Bool contexts get a did/didn't match return, Int contexts get a count of matches in the string, and String contexts get the string for the first match. There are also tuple contexts that can return things like start & length information for the match, before/match/after strings, before/match/after/groups, and so on. Contexts of lists of scalars or tuples should return a list of those values for all the matches found.

So the answer then is that Haskell's regex operators are dynamically typed, like Perl's - only with a much finer set of contexts, since they are more completely typed. Further, whereas Perl's dynamic regex operators are built into the language and not something you could write in user code, Haskell's dynamic regex operators are written in user code, and the technics can be used anywhere it might be appropriate. As a final bonus, Haskell's operators are statically typed, so if the type you ask for isn't one of the types that the regex operator returns, you will get a type error at compile time.