
Wednesday, April 27, 2011

Why Haskell?

This should not be considered an expert overview of the language. It's a deep language, and I'm still learning it. This is a discussion of what I've seen so far to explain why I chose it.

As I mentioned, I'm not learning Haskell because I expect it to be a marketable skill; I'm learning Haskell because I expect it to improve my programming skills in general. That should happen because it's radically different from other languages I know, so its idioms should be new to me, and will thus provide more options when working in other languages. At least one person regrets learning it - because they wind up thinking about how much simpler things would be in Haskell.

As a final bonus, there appears to be more higher mathematics in use when writing Haskell than in most other languages. So I'm hoping to get a little more use from my mathematics degree than I've gotten so far.

Things I look for in a language

Being well-designed

I've heard it claimed that Haskell is in the LISP family. Looking at a typical program, you'd have to wonder about that - it looks more like Perl, with obscure operators scattered throughout the text. Once I figured out what's really going on, I could see the logic of the claim. It's at least as valid as the claim that Scheme is a descendant of Algol. Outside of the type system (more on that later), there's a bit more syntax than LISP, but not a lot. Most of a program's code consists of function calls and binding the value of those calls to names - just like functional code in LISP. The most prominent difference is that, where LISP has one way to express a function call - (function arg arg arg) - Haskell has two: function arg arg arg (without LISP's parentheses) and arg `function` arg for functions of two arguments. The tricky part is that a valid symbol quoted in backquotes as shown is an operator, and an operator enclosed in parentheses is a function name. So 2 + 3 = (+) 2 3 = 5. Basically, it's a syntax tweak that lets the programmer decide if something is better used as a function or an operator both at definition time - by choosing a name that's an operator or a symbol - and at use time, by choosing to change the function name from one to the other.
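
A quick sketch of the two forms, using nothing beyond the standard Prelude (the names five, five' and seven are mine):
-- An operator in parentheses becomes an ordinary function;
-- a two-argument function in backquotes becomes an operator.
five :: Int
five = 2 + 3        -- operator used infix

five' :: Int
five' = (+) 2 3     -- the same operator used prefix, as a function

seven :: Int
seven = 3 `max` 7   -- a function used infix; same as: max 3 7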

So it's not as chaotic as it seems. And the bit about math plays into it - the design of the various functions and the function application system all seem to have a bit of mathematical logic in them.

High-level data types

Haskell has a nice selection of high-level data types. Unlike most languages I've dealt with - Clojure being the most notable exception - they're all immutable. Since evaluation is lazy (another thing that makes it different from other languages), lists - or streams, if you prefer - feature prominently; another point in favor of the LISP-family claim. But it also has tuples, arrays, and hash maps - though the latter aren't quite up to the performance levels I'm used to. Writing immutable hash maps where both lookup and insert (which creates a new map) are fast is apparently difficult.
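
A small example of the laziness at work - the list below is infinite, but only the prefix you consume is ever computed:
-- An infinite list; laziness means only the demanded part exists.
naturals :: [Integer]
naturals = [1..]

firstFive :: [Integer]
firstFive = take 5 naturals   -- [1,2,3,4,5]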

Boilerplate minimization

This is part of the static vs. dynamic type checking debate. One argument goes that static type checking catches errors at compile time rather than test time, so shortens development time. The counter-argument is that typing in type information takes time and adds more places for errors, so lengthens development time.

Haskell lands squarely on both sides. The compiler does static type checking at compile time. However, it also infers the types of most things, so you almost never have to provide type information. From what I can tell, type information is provided more often to nail down a complex function's type - so the compiler will complain if you get it wrong - than because the compiler can't infer a type.
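
For instance, a toy case (the names are mine):
-- No type information supplied; the compiler infers the most
-- general numeric type on its own.
addPair (x, y) = x + y

-- An optional signature nails the type down, so the compiler
-- complains if the definition drifts away from the intent.
addPair' :: (Int, Int) -> Int
addPair' (x, y) = x + y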

Further, Haskell one-ups most dynamically type-checked languages when it comes to reducing boilerplate. If not having to provide type information for parameters cuts down development time by removing places you can make mistakes, then not having to provide names for the parameters should be even more effective. There's a Haskell style that encourages doing just that: instead of defining a function with parameters that returns the result of the calculations with those parameters, you define a function without parameters that returns a function that does the calculations when passed those parameters.
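
A minimal sketch of that style - the same function written with and without naming its parameter (the names are mine):
-- Pointful: the list parameter is named explicitly.
sumOfSquares :: [Int] -> Int
sumOfSquares xs = sum (map (^ 2) xs)

-- Point-free: no parameter names at all; the function is
-- assembled by composing smaller functions.
sumOfSquares' :: [Int] -> Int
sumOfSquares' = sum . map (^ 2)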

Support for concurrent programming

It seems to be there as well, in a set of features similar to Clojure's. Both languages use immutable data structures and functional techniques to make the bulk of a program concurrency-safe by default. Clojure has some container types that will raise exceptions if used in ways that aren't concurrency-safe. Haskell has a type system that captures the notion of "not concurrency-safe", so you get compile-time errors if you try such things accidentally.
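
A minimal sketch of the compile-time side, using the stm library: shared state lives in a TVar, and the types only allow it to be touched inside a transaction:
import Control.Concurrent.STM

-- A shared counter guarded by software transactional memory.
-- readTVar/writeTVar only type-check inside a transaction, so
-- unprotected access is a compile-time error, not a runtime bug.
main :: IO ()
main = do
  counter <- atomically (newTVar (0 :: Int))
  atomically $ do
    v <- readTVar counter
    writeTVar counter (v + 1)
  final <- atomically (readTVar counter)
  print final   -- prints 1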

The REPL

Yup, the Haskell implementations I've looked at all have a REPL.

Plays well with others

At least one implementation can compile to native code on a number of systems as well. There seem to be wrappers for most of my favorite C/C++ libraries, so the external function support should be quite good.
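
As a small illustration of the foreign function interface, this binds sin from the C math library directly (the name c_sin is mine):
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble)

-- Import the C library's sin directly as a Haskell function.
foreign import ccall "math.h sin"
  c_sin :: CDouble -> CDouble

main :: IO ()
main = print (c_sin 0.5)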

A note about homoiconicity

Haskell isn't homoiconic. That means it can't have real (i.e. LISP) macros. Which means LISP is more powerful. Right?

Maybe. LISP macros are generally used for one of three things:
  1. Controlling evaluation to create new constructs. That's built into Haskell (see the sketch after this list).
  2. Creating domain specific languages. The ability to make operators functions and vice versa pretty much covers this.
  3. Automated code creation. Um...
Ok, Haskell doesn't have that one. And that is one of the more powerful uses of macros. There is an extension to Haskell (Template Haskell) that covers this. It's not portable. It's poorly documented. It works on ASTs instead of source code. But it is there.
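
Back to the first point from the list: because evaluation is lazy, a new control construct is just an ordinary function - the branch not taken is never evaluated. A minimal sketch (the names are mine):
-- A hand-rolled 'if'. No macro needed; laziness means the
-- unused branch is simply never forced.
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e

bottom :: Int
bottom = error "never evaluated"

answer :: Int
answer = myIf True 42 bottom   -- 42; the error never fires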

I'll be blogging about my experiments with Haskell as well, and possibly revisiting some of the things I did in Clojure. Stay tuned...

Tuesday, April 26, 2011

Personal choices in marketability

Having figured out what skills are going to be in demand, the next step is to factor in your personal attributes: which things you enjoy doing - I assume you're a programmer because you enjoy it; we certainly don't do it for the fame or money - and which things you are good at; in particular, what innate abilities might be required for some choice, and how well you do at those.

My tastes and skills

Probably the single most important factor in what I choose to do is that I can't do design work involving graphics. I've studied the classics in the field, and can critique a design or UI, but my own work is ... well, the further I get from "bland" and "boring", the worse it gets. This blog is about as far as I dare go.

That ties into the things I enjoy doing. While I enjoyed working with web 1.0 technologies, the later versions bother me. JavaScript seems to combine the worst features of Perl and Java. Most of the web frameworks seem to be designed to let designers inject code into a design - which means they play to my weaknesses. I prefer toolkits that are designed for programmers - Seaside and QtWui come to mind. But those aren't very popular.

If that hasn't made it obvious, I'll state it outright: I'm a picky SOB. I don't want to work with things I don't enjoy: Windows, Java, C++, Perl. JavaScript I already mentioned not liking, which drags me even further away from doing client-side work. I managed to survive the last 35 years with DOS and Windows work being at most a minimal part of my work; I suspect I can survive what's left of my career without having to deal with web client software, and only dealing with server software that I like.

As for what I like: languages that are obviously designed to meet some goal, as opposed to being a mere accumulation of features. Anything that minimizes the boilerplate in the programming process is good - provided it's done in the language, not the IDE. I want high-level data types, because those are sufficient for many tasks by themselves. Well done, they can be used for lots of things that aren't necessarily obvious. Finally, I consider a REPL to be one of the best tools for learning a language a beginner could ask for.

The languages I've chosen in the past are Scheme, Python, and Eiffel - and they are all fun. Python is fairly marketable. I like C - at least the early versions, when it was designed to displace assembler as a systems programming language, as opposed to what it grew into as it was co-opted for ever more things - and that's also marketable, though I'd rather be writing Python.

Finally, there's what I'm looking for in working with concurrent programs. Basically, I want tools that relate to concurrency issues the way garbage collection relates to memory management: they should hide the details from me and at the same time prevent obvious mistakes. Rather than alloc (lock) and free (unlock) calls, I want high-level protection mechanisms that can't deadlock or leave dangling locks - preferably something more advanced than CSP or its variants. And of course, if I access or modify something that needs protection without that protection, I'd like a warning about it - at either run time or compile time.

So what am I studying now?

Well, this blog has chronicled my experiments with Clojure. I chose it because it runs on the JVM, which has the potential of letting me write mobile applications, and meets my concurrency requirements. It also exposed me to the Java environment, which is nothing like the Unix environment I'm used to. The major difference is that it's targeted at enterprise applications. Which is also the problem - the amount of boilerplate required to do simple things isn't noticeably less than what those enterprise applications need. I got over it, created WAR files for a simple app, and got an education.

One of the problems with Scheme (and LISPs in general) is that they don't play well with others. Clojure solved that by integrating tightly with Java. But this makes it inappropriate for use in the Unix world. Writing a quick little hack to automate some task in a JVM-based language doesn't work very well - because the overhead of starting the JVM keeps it from being a quick little hack. People are "solving" this by making the JVM a server process you connect to to handle running your quick little hack. Basically, making that quick little hack part of an enterprise-class system.

With that in mind, I'm now studying Haskell. This is more a "learn it to improve your programming" than something that's directly marketable. I'll discuss why in the next entry.

Saturday, April 23, 2011

Keeping yourself marketable

A recent question on LinkedIn asked about the skill set to be working on if you're hoping to make yourself more marketable. This is something every professional programmer should be thinking about as long as they want to keep doing technical work. I've been thinking about this for a bit, and it ties into what I wanted to write about next in any case.


Before considering what skills you want to be working on, you have to start with what skills are going to be in demand - which means thinking about what users are going to want from their computers in the future.


Enter the mobile internet

Unless you've been hiding under a rock for the last few years, you've run into some form of mobile device with internet access - smartphones, netbooks and tablets of various kinds. In particular, Apple (and I'm only an Apple fanboy when compared to Windows fanboys) changed the world with the iPhone. Full-screen touchpad devices - with or without keyboards - are everywhere. The iPhone changed the pocket computer (with or without a network connection) from something a small percentage of geeks carried to something almost a quarter of the people in the country have. And they're just getting started, with tablet prices dropping drastically since Apple introduced the iPad, and wireless broadband prices falling.


I've believed since the 90s that the mobile internet was going to be the real growth area for mobile devices. You might never be able to carry enough storage in your pocket to keep a database of, for example, every movie currently playing in the US, but even in the late 90s you could access that information from a cell phone or internet-connected personal digital assistant.


Lately, I've seen examples of what I think people are going to want from applications in this environment. Users will want their data to be available no matter what device they're using. As they move from adding notes on a tablet or laptop in a meeting, to major editing on their desktop, to a quick fact check on a cell phone while out at lunch, there's no reason to actually have to worry about making sure that everything is in sync - it should just happen. Save will no longer just mean put this in local storage; it will also imply and copy it to the server, and Open will mean check the server for updates. Variants include classic client/server relationships, where the user edits directly on the server, or saves and opens go only to the server - the user won't really care. In this environment, not only are the devices mobile, but the data is as well.


Technologies for mobile data


The development model for such applications is still being worked out. Obviously, they'll need a network (invariably web) back end no matter what platforms are targeted. The front end can run in a browser - which has the advantage of being portable to many platforms out of the box. With HTML5 and proper CSS, it will even work reasonably well on everything from small mobile devices to the desktop. However, in the brief history of such devices, users have shown a strong preference for native applications over browser-based applications, if for no other reason than mobile devices can't always reach the network. Mobile platforms - unless they're running a lightweight version of a desktop OS - tend to have restrictions on the development environment that vary from platform to platform. Finally, the back end may well run on a cloud service such as Amazon's Elastic Cloud or Google's App Engine, which may have its own set of restrictions.


The desktop client can be written in whatever technologies are most suitable for the desktop, which will be discussed in the next section.


Mobile applications depend on the platform. The only one I follow closely is Android; I'll expect corrections from readers for the rest. 
For iOS, it's pretty much restricted to Objective C. Symbian is C++. Blackberry is Java. All three of these allow some selection of web tools like JavaScript, Flash and AIR - though which varies from platform to platform. Android is primarily Java-based, though its open nature means you can develop for the embedded Linux system using anything in the Gnu Compiler Collection. Google supports a variety of scripting languages via the Android Scripting Environment. Finally, the fact that the Android Market is open means a number of different programming languages are available for Android. Many are ports of other languages to the JVM.


While Amazon's Elastic Cloud is unrestricted, App Engine is restricted to Python and Java. Given Java's popularity in the enterprise computing world, it's unlikely that any cloud server won't allow things running on the JVM.


Exit the clock race


Another relatively recent change is the end of the chip manufacturers' war to turn out the chip with the highest clock rate. Basically, it got to the point where it was more cost-effective to wring more processing out of chips by putting more processors on them than by cranking up the speed. So my old 3.8GHz dual-core Pentium 4 remains among the fastest-clocked (and hottest) popular CPUs sold in the US, but is slower than many modern 2.xGHz systems. And these days, multi-core chips are the norm for desktops, laptops and tablets, and expected in the generation of smartphones and similar-sized devices just being released.


This changes the development world, since using the extra speed in these processors means running your code in parallel - which means dealing with concurrency issues. As a result, a number of different tools for dealing with concurrency at a higher level than locking code segments have emerged. Eventually, these tools will show up in mainstream languages, but there's no harm in getting a head start on them if you can do so.


Summing up


The most valuable people will be able to work on multiple front-end platforms - and on the back end as well. Since an application's first implementation is liable to be a pure web application, this makes the web tools - JavaScript and its libraries, advanced HTML, and CSS - the tools to study. There isn't much that can be done about that.


Mobile platforms are liable to be the second front end. Those web technologies and Java are the only things that show up on multiple platforms (unless you're writing to the embedded OS under the application layer).


The back end could be any number of things. Ruby on Rails and Python with a number of web frameworks (Django, Pylons, Zope) are popular. Both SQL and the new noSQL databases are frequently used in such applications, and hence worth keeping in mind.  Java is also popular, at least at the enterprise level, where Java's web server technologies are excellent.


So web technologies and Java would seem to be the things to study. However, Java does not necessarily mean the programming language here, as there are a number of languages that run on the JVM but aren't Java. Most notably, Jython and JRuby are Python and Ruby running on the JVM, so one of those plus a web framework might well be a good combination. The other thing to note is that some of these languages - like Clojure and Scala - have interesting concurrency features.


Those who've been following this blog will have read about my experiments with Clojure. Next I'll discuss my preferences, and how that factors into what I choose to do next.

Thursday, April 7, 2011

Almost like a real web app

I took a long weekend over the first, and did the next round of changes to my X10 controller web app. The changes revolve around loading the config information from an SQL database rather than using lists wired into the code, and arranging things so that changed data can be reloaded without restarting the app. A database is normally a critical part of a web app, so this almost makes it a real web app.

Config

The config.clj file has pretty much been rewritten from scratch. It's picked up code from controllers.clj that built the maps used for rendering, etc. The rest of the controllers code has moved into core.clj.

First, the ns changes to pick up clojure.contrib.sql. I chose that SQL module because this is a simple SQL application, so there's no reason to go to the work of finding a more sophisticated solution:
(ns x10.config
  [:use [clojure.set :only (difference)]
        [clojure.contrib.sql
         :only (with-connection with-query-results transaction)]]
  [:require x10.core]
  [:import gnu.io.CommPortIdentifier [com.micheldalal.x10 CM17A]])
This also picks up the set code for later use, the controller types now in x10.core, and the X10 controller and Java IO classes.

The globals have changed to references - so the data can be reloaded later - and the one that isn't mutable has changed its name to use + instead of * for "earmuffs" (CL uses *'s for all globals; Clojure reserves them for mutable ones):
;; The *devices* ref holds the devices map for rendering and finding
;; controllers.
(def *devices* (ref {}))

;; The *ports* ref holds the map from port names (aka /dev/*) to open
;; port objects
(def *ports* (ref {}))

;;; Map from controller type names to code.
;; Note that the keys here must match the types in the controllers
;; table in the database. There should be one entry for each unique
;; controller module. It maps to a function to produce a new
;; controller given a port and delay.
(def +controller-type-map+
  {"CM17A" (fn [port delay] (x10.core.CM17A-controller. (new CM17A) port delay))})
All the actual database interaction happens in load-database, which recreates from the database - with some tweaks - the lists that were wired into the original code:
;; Load the device database into a set of lists.
;; Since the results are lazy, we have to force them with doall before
;; getting the next set of results.
(defn load-database [db-file]
  (with-connection  {:classname "org.sqlite.JDBC" :subprotocol "sqlite"
                     :subname db-file}
    (transaction
     [(with-query-results controllers ["select * from controllers"]
        (doall controllers))
      (with-query-results names ["select * from names order by name"]
        (doall names))
      (with-query-results groups ["select * from groups order by name"]
        (doall groups))
      (with-query-results ports ["select distinct port from controllers"]
        (doall ports))
      (with-query-results codes ["select * from code"] (doall codes))])))
Mostly, this just loads tables into lists of maps, and returns a vector of those. The things to note are that it collects the port names from the controllers table, and that the queries are wrapped in a transaction to ensure consistent values across the tables (assuming anything updating the database does the same). And, as noted in the comments, the result lists are lazy, so they have to be consumed before running another query, which the doall calls take care of.

The last step is to translate those lists of maps into the map that will be stored in *devices*. That is straightforward, if a bit long. The let in load-devices takes care of it, a step at a time:
;; Load the device database in db-file into the *devices* map.
(defn load-devices [db-file]
  (let [[controllers names groups ports codes]
         (load-database db-file)
        make-map
         (fn [f list] (apply sorted-map (mapcat f list)))
        controller-map
         (make-map (fn [{:keys [name module port delay]}]
                       [name (agent ((+controller-type-map+ module)
                                      port delay)
                                    :error-mode :continue)])
                   controllers)
        name-map
         (make-map (fn [{:keys [name controller code unit]}]
                       [name (x10.core.Module.
                               (controller-map controller)
                               name code unit (atom nil))])
                    names)
        group-map
         (make-map (fn [[name group]]
                       [name (x10.core.Group. 
                               name (map name-map group))])
                   (map (fn [[key vals]] [key (map :device vals)])
                            (group-by :name groups)))
        code-map
         (conj (make-map (fn [{:keys [name command]}]
                             [name (x10.core.Command.
                                     name command)])
                             codes)
                {"Reload" (x10.core.Command.
                            "Reload" (str "(x10.config/load-devices \""
                                          db-file "\")"))})]
    (doall
     (map #(.close %) 
          (dosync
           (let [[ports-map closing] (make-ports-map (map :port ports)
                                                     (ensure *ports*))]
             (doall (map #(x10.core/set-port @% ports-map)
                         (vals controller-map)))
             (ref-set *ports* ports-map)
             (ref-set *devices* {"Devices" name-map
                                 "Groups" group-map
                                 "Code" code-map})
             closing)))))) 
It loads the lists with load-database, and then uses make-map to transform each dictionary into the thing actually used in the code. The controllers list gets turned into controller objects, and that is then used to attach controllers to the devices created from the names list, which are then used to create groups of devices that can be controlled at once. The groups list needs to be tweaked, as it's maps of name/device pairs, and is turned into a list of name/list of devices before being passed to make-map. Finally, the code table is used to create code entries - a new feature - with a Reload entry that runs the load-devices function again to reload the database. After all that is done, the function starts a Clojure transaction with dosync, and in the body of that calls make-ports-map to create the new set of ports being used and a list of those to be closed, adds the ports to the devices in controller-map, and changes the two ref objects. The transaction returns the list of ports to be closed, which has .close map'ed over it to actually close them.


Note that everything done inside the dosync can safely be run more than once. It may attach a port to a controller more than once, but the second one is a nop.


Nothing this does depends on the old value of *devices*, so changes to it won't change what happens here. The set of ports used doesn't depend on that old value, so this always attaches the same set of ports to controllers. The set of ports closed depends on the old value of *ports*, since a port is only closed if it was in the old value. If another thread changes the value of *ports* to no longer include some port, then that thread will be responsible for closing that port.


Plumbing
These changes had remarkably little impact on the rest of the application. Things that used to reference *devices* needed to use @*devices*. The page rendering code picked up the new page of options automatically. The other changes to handle this were mostly plumbing changes.


The most significant one is the change to the Controller protocol and type: instead of having open/close, it has set-port (mandatory) and get-port (a debugging aid):
;;; Bottom level abstraction - an X10 controller object.
;; Controller "devices" set other devices to a state (currently true
;; for on and false for off). They actually go out and *do*
;; things. Must be eating their Powdermilk Biscuits.
(defprotocol Controller
  "X10 Controller Modules"
  (set-device [this code unit state]
    "Set the device at address code/unit to state")
  (set-port [this portmap] "Set the com port to use")
  (get-port [this] "Return the port this controller is attached to."))

;;; To add a new controller type
;; Write the appropriate deftype that instantiates the Controller
;; protocol, then add an entry to the config.+controller-type-map+ to
;; map the name used in the config database to the record.
(deftype CM17A-controller [^CM17A controller port delay]
  Controller
  (set-device [this code unit new-state]
    (.setState controller (first code) unit new-state)
    (Thread/sleep delay)
    this)       ; we send set-device, so return ourselves
  (set-port [this portmap]
    (.setSerialPort controller (portmap port)))
  (get-port [this]
    (.getSerialPort controller)))
set-port is a bit odd, in that it takes the map from port names to open ports to look up the open port object to use. Other than that, this is straightforward.

There's also the new Command record, which uses read-string to translate the string from the database into Clojure code that it then eval's when turned on:
;;; A command is Clojure code that we run when it's turned on.
;; This also keeps a state to make the renderers happy.
;; We use eval here, instead of at load time, so any symbols get resolved here
;; instead of in config. Not sure why....
(defrecord Command [name value]
  X10
  (set-state [this new-state]
    (when new-state
      (eval (read-string value))
      (str "Evaluated " name "."))))
The war.clj file was changed to include the new init code, as well as some shutdown code:
(ns x10.war
  [:use [x10.web :only (handler)]
        [x10.config :only (load-devices close-ports)]
        [ring.middleware.stacktrace :only (wrap-stacktrace)]
        [ring.middleware.file :only (wrap-file)]
        [ring.util.servlet :only (defservice)]]
  (:gen-class :extends javax.servlet.http.HttpServlet
              :exposes-methods {init initSuper}))
  
(defn -init
  ([this config]
     (. this initSuper config)
     (let [db (.getInitParameter this "device-db")]
       (load-devices db)
       (.log this (str "Setting up *devices* from " db))))
  ([this]))     ; because the super config will eventually try and call this.

(defn -destroy [this]
  (close-ports)
  (shutdown-agents))

(def app (wrap-stacktrace #'handler))
(defservice app)
The -init function takes some explaining. The war framework will try to invoke -init both with and without an argument, from two different ancestors. If you provide only one, Clojure will fail to find the other when that invocation happens. Further, the version with an argument needs to invoke the superclass method of the same name, so the :exposes-methods keyword in the ns macro is used to make that available under the name initSuper.

Finally, the web.xml file has additions to run the init code at startup and provide the database file name as an argument:
<web-app>
  <!-- Servlet class taken from first :aot namespace -->
  <servlet>
     <servlet-name>x10</servlet-name>
     <servlet-class>x10.war</servlet-class>
     <load-on-startup>1</load-on-startup>
     <init-param>
       <param-name>device-db</param-name>
       <param-value>/usr/local/etc/x10.db</param-value>
     </init-param>
  </servlet>
  <servlet>
    <servlet-name>default</servlet-name>
    <servlet-class>org.mortbay.jetty.servlet.DefaultServlet</servlet-class>
  </servlet>
  <!-- Servlet is mapped to / by default  -->
  <servlet-mapping>
    <servlet-name>default</servlet-name>
    <url-pattern>/help.html</url-pattern>
  </servlet-mapping>
  <servlet-mapping>
     <servlet-name>x10</servlet-name>
     <url-pattern>/*</url-pattern>
  </servlet-mapping>
</web-app>
Future work

To turn this into a real web app, it needs an interface to allow editing the various controllers, groups, etc. in a web interface. Since these change rarely - a few times a year is typical - and I'm happy with an SQL prompt, I'm not likely to do that. Adding a DSL for macros and variables is probably next.

As usual, the code can be downloaded from the Google Code repository.