Custom Search

Saturday, December 22, 2012

Type checking for algorithmic errors

While listening to a recent podcast about Yesod - a web framework designed to be type safe - the host commented that he didn't buy into static type checking, because it was neither necessary nor sufficient, as it couldn't catch algorithmic bugs. While he's right that static typing isn't necessary - otherwise dynamic languages wouldn't work - or sufficient - it can't catch all bugs, he's wrong in that it can catch algorithmic bugs. At least some of them. The example he gave - that if you have a month and a day, the type system can't insure that the two are a valid date - is one such case. I threw together a quick example of that, which I'll present here.


The idea is to encode the algorithm state in the data type. So there's an input data type, call it Step0Type. Then the processing at each step will accept data of type Step(N-1)Type and produce data of type StepNType. This means that if you have an error where you skipped a step on some piece of data, you'll try passing type Step(N-1)Type to a function expecting StepNType, so the type checker will catch the error.

Most practical examples only need one step. If you're passing strings to an external command processor - a shell, an SQL server or whatever - that's suspectable to to data injection attacks, having a type that's been sanitized against such attacks that the function that actually handles the IO requires will allow the type checker to flag attempts to pass unsanitized commands to the external processor. Yesod uses this technique for HTML, CSS and JavaScript text, each having a type indicating it's been quoted, to insure that each winds up only where it belongs, and properly quoted.

For the date example, the input type is three values consisting of a year, month and day of the month. The output data type represents a date known to be valid. You then only allow users of the type to access the function that that creates valid dates. So, we're going to write a Date type that holds the year, month and day, and a date function that takes a year, month and day as arguments, verifies that they represent a valid date and returns the appropriate Date. The package doesn't export the primitive Date constructor, but does export the function that constructs only valid dates, so that any Date objects that appear in client code will be valid. Functions declared to accept Date parameters will never see an invalid date.


The data types:
type Year = Int
type Day = Int

data Month = Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
           deriving (Show, Eq, Ord, Enum)

data Date = Date Year Month Day deriving (Show, Eq, Ord)
These are pretty generic declarations. The only typing information is that Year and Day are introduced as aliases for Int. Now the function that checks that a date is valid and returns the Date object, plus a couple of helpers:
date :: Year -> Month -> Day -> Date
date year month day | year == 1752 && month == Sep && day > 2 && day < 14 =
  error "Date not in US calendar."
                    | day < 1 = error "Month days start at 1."
                    | day > daysInMonth year month = error "Day not in month."
                    | otherwise = Date year month day

daysInMonth :: Year -> Month -> Day
daysInMonth year month | month `elem` [Jan, Mar, May, Jul, Aug, Oct, Dec] = 31
                       | month `elem` [Apr, Jun, Sep, Nov] = 30
                       | month == Feb = if isLeapYear year then 29 else 28

isLeapYear :: Year -> Bool
isLeapYear year | year > 1752 && year `mod` 400 == 0 = True
                | year > 1752 && year `mod` 100 == 0 = False
                | otherwise = year `mod` 4 == 0
The US converted to the Gregorian calendar in September of 1752. Before that, every fourth year was a leap year. IsLeapYear reflects that. The date was also adjusted, so that the day after 1752 Sep 2 is 1752 Sep 14. date checks for that first, then that the day is in the given month.

Finally, to control the export, the module starts with:
-- A date type that enforces the US calendar restrictions on dates.

module Date (Date, Month(..), date, year, month, day) where
This explicitly exports the Date and Month
types, along with the constructors for Month, the date function described above, and some getters that we haven't discussed. Most notably, it does not export the Date constructor, so that the only way client code can construct a Date is to use the date function.

While I wouldn't recommend this as a real, general-purpose date package, the error handling is simply to raise an exception, which is probably not the best choice, the entire file - including exporting the getters for the date type - can be found in the googlecode repository listed on the right, in the haskell directory.