Custom Search

Tuesday, March 8, 2011

Easy parallel processing in Clojure

I recently ran into a reference to the eight - or n - queens problem again, and it struck me that this would make an excellent tool for poking at Clojure's parallel programming facilities.

The problem
For those not familiar with it, the eight queens problem is to place eight queens on a chess board such that none of them are attacking each other. Since queens attack in eight directions, this means that none of them are in the same row or column, and that none of them share a diagonal. It's interesting, but not incredibly difficult, to do this. The n-queens problem generalizes this to n queens on an nxn board. Solving this problem takes O(n!) time.

Serial solution
Let's start with a non-parallel solution. The heart of solution is the find-boards function:

(defn find-boards
  "Find all solutions for the N-queens problem starting with board up to row."
  [board row]
  (let [cols (count board)
        nr (inc row)
        nbs (filter #(checkboard % row) 
                    (map #(assoc board row %) (range cols)))]
    (if (== nr cols) nbs 
        (mapcat #(find-boards % nr) nbs))))

This routine accepts a representation of a board as a vector of length n. Each element of the vector represents the column the queen in that row is on. An element of -1 means the queen hasn't been placed yet. The second argument to find-boards is the row the routine will try to place a queen on. The queens in earlier rows are assumed to not be attacking each other.

Creating the value bound to nbs is where the hard work happens: it creates a list of boards with a queen in each column of the row being worked on, then filters that through checkboard, which verifies that the queen in row isn't attacking any of the previously placed queens. So nbs has a list of boards with queens placed in non-attacking positions through row.

The body of the let checks to see if this was the last row, and if so returns the  list nbs. If this wasn't the last row, find-boards recursively calls itself on every element of nbs to find solutions for those boards, using mapcat to concatenate the resulting lists together.

A simple wrapper to invoke this constructs an empty board and starts on row 0:

(defn recursive-queens
  "Recursively find solutions for the N-queens problem."
  [n]
  (find-boards (vec (repeat n -1)) 0))

For those following along in the REPL, checkboard is:

(defn attacks
  "Check if queen on row q attacks queen on row r on board."
  [board q r]
  (let [qc (nth board q) rc (nth board r)]
    (or (== q r) (== qc rc) (== qc (+ rc (- r q)))
        (== qc (- rc (- r q))))))

(defn checkboard
  "Check that all queens on board up to row don't attack queen on row."
  [board row]
  (not (some #(attacks board row %) (range row))))

Note that this code hasn't been tuned in any way - this is not a very fast n-queens solver. The point isn't speed - the point is to see how easy it is to cut down wall clock time using Clojure's parallel processing facilities.

For the record, this version solves the 8 queens problem in about 26 milliseconds, and the 12 queens problem in about 25 seconds on a 2.4GHz quad-core processor. This is an early quad-core CPU, with memory more tightly coupled than in current processors.

Everything Parallel
A first attempt at making things parallel is to use pmap in find-boards, leading to find-boards-threading

(defn find-boards-threading
  "Start threads to find solutions for the N-queens problem from our starting point."
  [board row]
  (let [cols (count board)
        nr (inc row)
        nbs (filter #(checkboard % row)
                    (map #(assoc board row %) (range cols)))]
    (if (== nr cols) nbs
        (apply concat (pmap #(find-boards-threading % nr) nbs)))))

The only change is the last line. Instead of using mapcat, it apply's concat to pmapping find-boards-threading recursively over the list of good boards in nbs.

This version solves the 8 queens problem in about 36 milliseconds, and runs out of threads trying to solve anything larger than that. This is actually expected - pmap is documented as suitable for use parallelizing expressions that take significantly longer to compute than the overhead of keeping track of all the threads. With this version, each thread does a near-trivial amount of computation - then launches more threads! So we expect little if any improvement in wall-clock time, and lots of threads - in fact, O(n!) threads.

One level of parallelism
An easy change produces a version that only uses one level of parallelism. Instead of calling find-boards-threading to recurse, call find-boards. So only the first row creates new threads. The rows after that run serially in their thread:

(defn find-boards-threading-1
  "Start threads to find solutions for the N-queens problem from our starting point."
  [board row]
  (let [cols (count board)
        nr (inc row)
        nbs (filter #(checkboard % row)
                    (map #(assoc board row %) (range cols)))]
    (if (== nr cols) nbs
        (apply concat (pmap #(find-boards % nr) nbs)))))

Instant improvement. This version solves the 8 queens problem in under 13 milliseconds, and completes the 12 queens problem about 9 seconds. Still, that's barely twice as fast, even with four processors. I think I can do better:

Dynamic parallelism
Instead of always and only creating threads for just the first row, let's try creating threads if there's lots of work to do for the row being worked on:

(defn find-boards-threading-n
  "Start threads to find solutions for the N-queens problem from our starting point."
  [board row]
  (let [cols (count board)
        nr (inc row)
        nbs (filter #(checkboard % row)
                    (map #(assoc board row %) (range cols)))
        finder (if (< (count nbs) 8) find-boards find-boards-threading-n)]
    (if (== nr cols) nbs
        (apply concat (pmap #(finder % nr) nbs)))))

This version adds a new variable - finder - which is either the non-threading find-boards, or this latest version of find-boards-threading-n, depending on whether or not there are 8 or more boards in nbs. So early rows - with lots of work yet to be done - will be run in parallel, but later ones will run serially in the existing threads.

This version solves the 8 queens problem in just under 13 milliseconds, which is expected, as it should only use parallel processing for the first row, the same as the previous version. On the other hand, it solves the 12 queens problem in about 7 seconds, trimming better than 20% off of the previous solution.

Conclusion
With a couple of hours work, I managed to improve the wall-clock time of this algorithm by better than a factor of three - at least for large enough problems - on a quad-core processor with a poor shared memory architecture. I don't think there's a lot more to be dug out of the algorithm this way. There are lots of things that can be done to speed things up - tweaking the clojure to make sure I'm using java primitives all the way through, and algorithmic tweaks described by Jeff Somers, but that's for another day.

Those of you who grab the sources from Google Code will find both a board printing routine and the traditional backtracking n-queens solver in clojure in queens.clj. The latter's performance is similar to that of the recursive-queens function described here.