Tuesday, August 21, 2012

Monads in Python

Just a quick note based on a recent Aha moment, showing
that learning Haskell is improving my Python coding.

C# and Python's string.find

I'm translating some C# into Python for a client, because -
well, C# just isn't suited to big data problems. They keep running
into problems with it, and asked me to come aboard to redo this in
Python. This code involves lots of string processing. It
translates from C# into Python by changing the method names and
block delimiters, so I get lots of things like:


i = data.find('target1')
if i > -1:
    i = data.find('target2', i + 7)
    if i > -1:
        i2 = data.find('targetend', i + 7)
        if i2 > -1:
            result = data[i + 7:i2]

Yeah, it's ugly. But I recognized the pattern: that's just
Haskell's Maybe Monad! string.find returns a value of
type Maybe Int, with -1 as the
Nothing value.
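To make the analogy concrete, here is a rough sketch of what chaining Maybe-style values could look like in Python. The helper names find_maybe, bind and extract_with_bind are made up for illustration and are not part of the client code; None stands in for Nothing.

```python
def find_maybe(data, target, start=0):
    """Like str.find, but returns None (our Nothing) instead of -1."""
    i = data.find(target, start)
    return None if i == -1 else i


def bind(value, func):
    """Apply func to value, unless value is already None."""
    # Compare with None explicitly: index 0 is a valid result.
    return None if value is None else func(value)


def extract_with_bind(data):
    """Return the text between 'target2' and 'targetend', or None."""
    start = bind(find_maybe(data, 'target1'),
                 lambda i: find_maybe(data, 'target2', i + 7))
    end = bind(start, lambda i: find_maybe(data, 'targetend', i + 7))
    return bind(end, lambda i2: data[start + 7:i2])
```

Each step runs only if the previous one found something, so a single failed search makes the whole chain return None without any nested if statements.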

Haskell's Maybe monad and string.index

Haskell has a better way to string together the results of Maybe
functions, one that makes any Nothing skip to the end, using the
>>= operator. But Python also has a better way, using
string.index as the function returning a value of
type Maybe Int. This lets me rewrite the code as:


try:
    i = data.index('target1')
    i = data.index('target2', i + 7)
    i2 = data.index('targetend', i + 7)
    result = data[i + 7:i2]
except ValueError:
    pass

Here, the Nothing value is the
ValueError exception. For absolute technical
correctness, the assignment to result ought to be in the
else: clause of the try statement, but
either way reads better than the string of if's
crawling across the page.
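For reference, a minimal sketch of the else: variant, wrapped in a function so that a failed search has a definite result. The function name extract_with_index and the choice of returning None on failure are assumptions for illustration.

```python
def extract_with_index(data):
    """Return the text between 'target2' and 'targetend', or None."""
    try:
        i = data.index('target1')
        i = data.index('target2', i + 7)
        i2 = data.index('targetend', i + 7)
    except ValueError:
        # Any failed search lands here, like Nothing skipping to the end.
        return None
    else:
        # Runs only when all three searches succeeded.
        return data[i + 7:i2]
```

Keeping the slice in the else: clause means the try block guards only the calls that are expected to raise ValueError.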


2 comments:


  1. It is a nice approach, and a good use of a Pattern, using the exception; but wouldn't it be Even Better to use a regexp?

    import re

    targetre = re.compile('target1.*?target2(.*?)targetend')
    # search, not match: target1 may appear anywhere in data
    m = targetre.search(data)
    if m:
        result = m.group(1)

    And of course you could compile the regexp once, and use it repeatedly.

  2. In the one example I used, a regular expression is a better solution. However, this pattern cleanly extends from cases where RE's are overkill (searching for a constant string) to handling things that RE's can't, not to mention not needing to change things if the patterns are variables instead of fixed strings.

    If you've got one or two such things in a large program, then picking the best solution for each works well. When your program consists of trying dozens of such things, using one tool that can cleanly handle all of them seems like a better solution.
