Custom Search

Saturday, December 28, 2013

An arduino anemometer

Why make an anemometer?

There are lots of anemometers on the market, with a lot of different features. What they don't have is a configurable display. By coupling a good wind sensor with an android board and a cheap display, the display can be configured to do things that simply aren't possible with any of the off the shelf systems. See my rc blog for more details about this.

Friday, November 15, 2013

Threads vs. processes for parallel processing


Concurrency is a hot topic these days, largely because multi-core systems are the norm rather than the exception, and have been for a while. The thing is, multi-core isn't as much of a win for concurrent operations as it is for parallel processing. While the difference is subtle, it's also crucial for deciding what tools you want to use.

Some definitions

So, what is concurrent operations, parallel processing, and the tools I'm talking about? Or more importantly, given that they are used in a number of different ways in the industry, how am I using them?

Parallel processing

Parallel processing means carrying out multiple computations at the same time. Waiting for external devices is not carrying out a computation, so doing computing while IO is going on isn't as parallel processing - at least not in this document.

Concurrent operations

Concurrent operations means doing multiple things at the same time. Unlike parallel processing, concurrent operations includes doing IO. In fact, doing nothing but IO can qualify. Parallel processing is a form of concurrent operations, but I'm going to exclude parallel processing from the definition of concurrent operations in this document as a matter of convenience.

Thread of execution

A thread of execution is a series of instructions that are to be executed by the CPU in order. The operating system may interrupt the thread and allow other threads of execution to operate between instructions in a thread. This really isn't a term used in the industry, but I wanted a term that covered both threads and processes. I'm going to use it to avoid saying threads and processes or thread or process all the time.


A process is a thread of execution plus an address space. The address space is not shared with other processes. Unix was a revolutionary step forward for processes. Before Unix, processed tended to be expensive and painful to create. Users usually got one process, and running a program would involve the command processor (which was running in the users process) loading the executable into the users address space and then calling the executable as a subroutine. It was possible to do concurrent operations in a process, and even have multiple threads of execution active, but it involved executable images specifically built for the purpose, since loading a standard executable would destroy the one that was running. By making processes easy and cheap to create from standard binaries, Unix allowed the command processor to run in it's own process, meaning that concurrent operations could be done by simply having the command processor not wait for a program to finish before prompting the user for the next command.


A thread is a popular - and very effective - tool for doing concurrent operations. A thread is just a therad of execution that shares its address space with other threads. Threads work very well for concurrent IO, since you can just code the IO normally, and then sync with the main process when it's done. For parallel processing, not so well. I'm going to discuss those problems next.

The problems with threads

Using threads creates a couple of problems that using processes avoids.

Shared everything

Actually, there's only one problem with threads, but it creates a number of issues. The problem is that all your data is shared implicitly. The immediate affect of this is that you have to figure out which of your data objects might be modified in different threads, and do something to make sure that doesn't happen concurrently.

In my experience, the biggest source of concurrency problems in threaded programs is programmers who didn't realize that some object they were modifying could be modified by another thread. I have seen the claim - from people I trust and respect - that a properly designed programming language (meaning not C++ or Java) and programming practices can avoid these problems. I haven't used that language, and find it hard to believe that masses of programmers will all follow proper practices. Maybe if a single entity controlled the language and everyone learned from their examples - which may well be the case for this language.

Elsewhere, functional programming languages that make most data immutable are catching on as a solution, allowing either the language runtime or type system to catch such errors.

Once you've identified those places, you've got to arrange to provide the appropriate interlocks. Language constructs that do this automatically are another hot research topic.


If your program does something completely unexpected, then you may well have no idea what state it's in. The question becomes - how do you deal with this situation? If you are using threads for concurrency, then you have to worry about what the rogue thread may have done to data being processed by another thread, even if it they were logically unconnected - one of the consequences of implicitly sharing everything. To be reasonably sure that everything that might have been corrupted is discarded, you have to discard all the threads in the process.

If you using processes for concurrency, discarding that process provides the same level of assurance, but that's only one execution threads worth of work.

So using threads makes your programs fragile.

No distributed processing

If you're dealing with any significant amount of data, you're going to want to have multiple machines working on it. If your application is built using processes for concurrent operations and sockets for data sharing, this is trivial. All you have to do is add some kind of rendezvous mechanism.

If you're using threads, on the other hand - well, you pretty much have to start over to get any communications between the execution threads at all.

Sharing data between processes

Given that processes don't share an address space, the obvious issue is how do you get data between them. I've actually had proponents of threads claim that you couldn't share data between processes without serializing it (though to be fair, they were probably trolling). While that is one solution - and pretty much required for large problems - there are alternatives, each with their own downside.

The important thing to realize is that any non-trivial program will have multiple types of information to be shared, each of which will have it's own requirements. Using threads provides a single solution you have to use, even if it doesn't fit the requirements for the data at hand. If you're used to that model, you'll need to get past it and think about how you're actually using your data.

Explicit sharing

You can always share memory explicitly between processes. This provides the same level of communications as threads, without the nasty implicit sharing of everything. The downside is that I know of no language that provides language-level support for insuring that everything you need to share winds up in the shared memory segment. This means you are in practice limited to machine primitives composed with arrays and records. In my experience this hasn't been a problem, as most of the cases where you need this level of communications use such data.

Serial communications

You can also serialize the data and write it outside of your process. This is required if you want to distribute the processing of your data to more than one machine. The downside is that - well, compared to shared memory, it's expensive. You have to serialize and deserialize the data, and copy it between the processes in serialized form. If you have lots of communications going on (not the same thing as lots of data - passing one byte back and forth a million times is as bad as passing a megabyte back and forth once), this won't work very well. Unless, of course, you more data than will fit in the address space, in which case you pretty much have to do things this way.

Disk communications

An interesting variant on the idea of serial communications is storing the data on disk. This has the advantage of preventing data loss.

You store each chunk of data in a file in a queue directory. Each worker process scans the queue directory, trying to get an exclusive lock on each file until it succeeds in doing so. It processes the data therein, writing the output to a locked file in a second queue directory. When it's done, it can unlock the output file and then remove the input file. There's no chance of data loss, though you could wind up with duplicated data. Adding a work directory between the two queues reduces that problem, but then requires code to clean up the work directory at startup, moving input files back to the input queue and removing output files.


If you have a data structure small enough to fit in memory that all the processes need read access to, and any writes are not shared between processed, inheriting the data by creating it in the parent and forking to pass it to children works very well.

Configuration data for the application often fits this mold. Having all the configuration data handled by the parent - rather than having it in distinct files and the parent passing along the file name - means the parent can verify things between processes, like that the output of one process is correctly connected to the input of the next. Better yet, it can provide intelligent defaults for some values, and share configuration information between process, which has resulted in reducing configuration information by 99% in at least one case.

The downside is that the communications is one-directional. If you need to communicate a value back to the parent, you'll have to find another method to do so.

Exit status

This is another one-directional channel, and is very narrow - basically a single byte. While it doesn't seem very useful, it's used all the time in Unix systems. In every conditional statement in a shell script, the condition being tested is the exit status of a child process (unless it's a builtin, anyway). Any application that does process-based concurrency needs to deal with these, if for no other reason than to check that the child process actually finished properly.

A real-world example

I was involved in refactoring a credit card payment processing system from a single monolithic, threaded program into a distributed application.

Being distributed was part of the requirements, as one of the goals of the refactoring was to decrypt the financially sensitive data on machines that weren't exposed to the network. So serial communications was used to between hosts on the periphery of the network and the internal hosts with the crypto hardware.

The configuration information on each host was created by the parent process and inherited by the workers.

Each machine had a state table for all open connections to it. This was explicitly shared between the processes, so that the parent could correctly deal with child process failing, so that 1) no transaction was reported as completed until it was actually completed, and 2) no transaction was processed twice. Further, it was shared via a file mmaped to disk, so that these same guarantees could be made if the entire machine failed.


OK, so why not just mix and match threading and processes? Well, there's one last little issue with everything being implicitly shared. In order to fork safely, you need to know things about the state of all the locks - both external and internal - in your program. Getting that information becomes much more difficult if you have locks being used in multiple threads - and when do you use threads without some form of locking?

While not every case where a lock is held by a thread that isn't created in the child and hence won't be freed in the child is a disaster, it is in a lot of them. If an internal lock is locked by another thread when you fork and you need it, you're hosed because you'll never be able to get it. External locks you may or may not be able to get, even if you shouldn't be able to if the parent has them, and freeing them may or may not cause them to be freed for everyone, even if the parent is still using them.

Mixing threads and forks arbitrarily can be done, but - well, it's not at all clear it's less trouble than simply staying with threads for everything. On the other hand, if you limit each process to one or the other - parents never create threads but use forks to create workers, which use threads if they need them, but never fork - then you'll be OK.

Except, of course, if you're building something like a CI system, where what the workers do is fork off processes to build things. In that case, workers should avoid using threads.

Sunday, October 6, 2013

Dynamic typing is doomed


Ok, the title is a bit sensationalist. I don't think dynamic typing is really dead. Different tasks call for different language characterstics, and there will always be a place for dynamically typed languages. However, I believe the decades-long debate about whether dynamic typing or static typing is the better choice for general-purpose computing is pretty much over, and dynamic typing lost.

Sunday, July 14, 2013

Slowing down even more..

I know I haven't been exactly a blogging machine - it's a hobby, not a job, so I do it when I feel it - but things are going to get even worse, for two reasons.

  1. I've launched a new blog about my remote control doings. If you're at all interested in remote control craft - mostly aircraft, but I could talk about hovercraft, cars, balls, etc. - go check it out.
  2. My new contract with FP Complete involves writing what amounts to programming blog entries on a regular basis.  So any ideas for a programming blog first get evaluated for it, as that is a job! Their School of Haskell  is an amazing tool for writing blogs using Haskell, and their IDE product currently in beta test is an equally amazing tool for building and deploying web sites in Haskell.

Saturday, June 1, 2013

iOS as Bloatware

I've just bought my fourth Android phone, but only the second that wasn't an AOSP device. The two non-AOSP devices had bloatware problems that reminded me of the things that caused me to switch from the iPhone to Android phones.

For those not familiar with the term - and IIUC, iPhones don't have bloatware in the traditional sense - bloatware is software added to an Android system by either the manufacturer or the network provider that you can't delete. In that, it resembles the OS software - except it wasn't designed by the same group that did the OS, and seldom integrates well.

Droid 3

My first non-AOSP phone was a Droid 3. I bought it as part of changing carriers. The bloatware on it was worse than useless - it actually broke basic functionality.

The least annoying bloatware was the launcher provided by Motorola. It was slow and clunky - on a high end, dual-core phone! At least I could replace it by installing a third party launcher, which mean that it just took up room in the system ROM.

The annoying one was the task killer. Task killers on Android are a bad idea. I've never used them. People running broken software on early versions of Android may have benefited from them, but that wasn't true of Gingerbread, and Gingerbread had been out for most of a year by the time the Droid 3 came out. As far as I can tell, the only thing this ever did was pop up to interrupt my music or ebook reader when I used them for a long time.

Like most bloatware, it couldn't be removed, and couldn't be turned off. So it occupied room in both ROM and RAM and wasted CPU cycles.

But the music player was even worse. It was a modified version of Google's music player, with the bluetooth and headset controls hardwired to go to it. However, it predated Google's cloud-based music service. Installing the update from Google left me with two music players, with the same name and icon. Confusing. Worse yet, once I got the right player working, if I used a bluetooth or headset "pause" button on it, the wrong one would start playing.

When I asked Verizon for help, they couldn't do anything and suggested I talk to Motorola. Motorola's response was "That's the way it's supposed to be."

As you can expect, I rooted this phone ASAP, just so I could turn those things off. Even so, the bluetooth and headset music controls never worked properly on that phone. I was more than glad to get rid of it, and will be avoiding Motorola phones unless they do an AOSP phone.


This is my third HTC Android phone, all of which have been Ones (the others being a G1 and a Nexus One). The bloatware on it is called "Sense". Sense actually gets good reviews - people seem to like it. I can see why, but it's a waste of space for me.

For instance, it has a clock, calendar and weather widget built into the launcher, car dock and lock screens. OK, cool. I normally don't use a clock widget (there's a clock in the notifications bar), but getting the other two in a built-in widget would make it worthwhile.

Except ... I fly RC aircraft as a hobby. Wind speed is crucial for this - if it's to high, I can't fly! I won't use a weather app that doesn't provide this. And the one provided with Sense doesn't. You can't change the app used by - or even started by - the widget, and you can't remove the widget.

There is an odd thing about this. Newer versions of Android have a feature that let the user disable bloatware - if whoever installed it lets you. The weather app on the HTC One can be disabled, and I did so. Now the widget just complains about not being able to get the weather. Seems like they could do better than that.

Similarly, the car dock application has a music player widget built into it. But the music player can't access Google's cloud music, so there's nothing on it I want to hear. I can't remove this, so the car dock needs to be replaced.

The other interesting part of Sense is "Blinkfeed", which HTC has been advertising heavily. It's a great idea - the latest news from all your news sources available on the lock screen! Except - well, no RSS feed. No email. Those are my primary news feeds. It's either various syndicated news sources or a small set of social networks - not including the one I use (Google Plus).

Fortunately, this is easy to fix. Install a third part launcher (again). Then install Executive Assistant. I've been using EA since the Nexus One. Much like Blinkfeed, it provides the latest information I want - except it actually provides the ones I want! Set it to act as a lock screen, and disable the stock lock screen in the Security settings, and it works better than ever.


So, how does iOS fit into this discussion? After all, Apple maintains strict control over the software on it, and it doesn't (or didn't when I was using it) have bloatware from the manufacturer or the cell provider.

It fits somewhere between the Droid 3 and the HTC One. Do recall that I haven't owned an iPhone in a while, so some of these practices may have changed. Updates are certainly appreciated.

Apple provides a smooth, integrated experience with it's devices. This is great - if it's the experience you want. If not, and it's part of the experience that Apple is particularly proud of, then you can't replace it - so it's indistinguishable from bloatware.

Unlike the Droid 3, none of the Apple software breaks third party software - assuming it's available. Unlike the HTC One, many of the parts can't be fixed by changing settings.

Apple doesn't have Widgets at all, so you can't configure the home screen display to provide information you want that apple doesn't provide. You can't replace the home screen software, so you can't fix things that way. Or even fix a home screen that doesn't provide the features you want. If it's not what you want and you can't replace it, it's bloatware, no matter who it comes from.

One of the things quietly taking over the Android world are "trace" keyboards. These let you enter words into a virtual keyboard without ever taking your finger off the keyboard. You make the same motions you'd make to type out the letters, but leave your finger on the keyboard. The software looks at where your finger changed directions, and then uses a dictionary to guess what word you are entering. Yes, it's not 100% accurate, but it's not really any worse than the iOS keyboard with auto-correction turned on. And it's both easier and faster than trying to type in the letters. They are now so popular that all the general-purpose keyboards I have available provide this functionality.

In fact, the ability to use third party keyboards is a major win in and of itself. Try opening an ssh session to another system and using a modern shell. Characters that are common in that environment range from hard to impossible to use. So I normally install a third party app that provides a keyboard designed for this situation.

Google has gone out of it's way to make the Android system pluggable, allowing users (and, unfortunately, manufacturers and cell providers) to plug in replacements for most functionality. After experiencing this, Apple's approach of locking users into their own software is indistinguishable from bloatware. Right now, none of my devices are rooted, though I've thought about it in order to install Titanium Backup for backup services, because between the built-in functionality and third party replacements, they do almost exactly what I want. My iPhones, on the other hand, never lasted long without being jail-broken. In fact, I'd usually delay OS upgrades until after the jailbreak was available, simply because they never did enough out of the box.

So iOS comes with software I don't like, don't want to use, and can't get rid of. Not as good as the HTC One, in that I have to jailbreak the phone in order to install an alternative. Better than the Droid 3, in that it's not fundamentally broken as is, and generally doesn't break the alternatives.

Sunday, March 17, 2013

Idiomatic C and Arduino

The few of you who read this blog regularly will be wondering about "C". Not Python, not some new-fangled functional language, but good old-fashioned C, the workhorse language of the last 30 years or so. That's because I'm talking Arduino, and C - sorta kinda - is the language of choice for that platform.

For those not familiar with it, Arduino is an open source hardware platform built around the ATMega micro-controller CPUs. The boards - whose design is covered by a Creative Commons Attribution Share-Alike license - provide the micro-controller and support circuitry to make using external devices easy, and have a standardized form factor for adding expansion hardware. There is also an IDE for writing code, for the C/C++ framework to run on the controller. The boards - or clones of them - are available for as little as $15 shipped

These are 8 bit CPUs with memory sizes (flash, eeprom, and ram) measured in K, or maybe 10s of K. There is no way the run time system of a modern language will run on them, though there is a forth system or two available, as well as some custom interpreters. You can use a modern language to control the hardware if you run it on the desktop with some kind of serial link between the devices. But normally you cross-compile a small language - like C or a DSL for hardware systems like Atom - for the thing. C isn't much of a reach - these systems aren't much less powerful than the machines C was developed on, though just enough so that a C compiler probably won't fit.

The project

As mentioned, the Arduino boards are designed to make it easy to use external hardware, and the standard examples start with LEDs. They're cheap, easy to hook up, and hard to screw up.

So, you give a geek a bunch of LEDs that can be controlled by a computer, and the first thing they're going to do is build a binary clock, unless they have a grid of LEDs, in which case they'll play life on them. Because that is, after all cool - unless you're not a geek, in which case it's geeky.

Once I noticed that there were wearable Arduino platforms available, I figured a binary wrist watch was a natural. So I built a prototype:

I'm not going to discuss the code here, but it has been added to the blog repository. What I want to discuss are the implementations of the binary clocks I found.

The code

None of the implementations felt like idiomatic C to me. Possibly the problem is with me - I learned C long enough ago that =+ was a valid operator, and this code is more C++ than C. Possibly it's the way hardware people (which is where Arduino's roots are) approach C. Possibly it's the result of learning programming with relatively modern languages instead of things like C and it's predecessors. But we're going to look at one function: display_value(uint8_t *, uint16_t)

display_value accepts a pointer to an array of pin numbers (Arduino output devices, which for this purpose can be set to either 0 or 1 to turn an LED off or on), and a 16 bit unsigned value to display on those LEDs. A common variation is to use a starting pin number and assume the pins are assigned in order, but this doesn't work well with lots of LEDs, as some pins have special capabilities, and wiring constraints may cause you to want to use pins in strange orders. The array of pin numbers allows them to be assigned to whatever pins are convenient.

One common version of the function (usually just coded in line) looks like so:
display_value(uint8_t *pins, uint16_t value) {

  for (int8_t i = 0; i < 8; i++) {
    digitalWrite(pins[i], value & (1 << i)) ;

The digitalWrite function sets an output pin to the given value. This version is bad on the ATmega processors because they don't have multi-bit shift/rotate operations - you have to do them one bit at a time. So the 1 << i turns into an inner loop. The Arduino framework includes bit twiddling functions that would presumable avoid this, but they don't seem to be in common use, and aren't recommended in the language documentation about >> and <<. In fact, that the processors needs a loop to do variable multi-bit shifts isn't mentioned, either.

The second form avoids that, by using a mask variable that it modifies with each pass through the loop:
display_value(uint8_t *pins, uint16_t value) {

  for (uint8_t bit = 0x8, i = 0; bit; i++, bit >>= 1) {
    digitalWrite(pins[i], value & bit) ;

This version doesn't have the inner loop for the shift, and when disassembled takes about half the instructions of the first version, and a bit less stack space.

The version I wrote as "natural" C changes the interface slightly, using a terminal zero in the array for the last element, then doing pointer arithmetic instead of an explicit index:
display_value(uint8_t *pins, uint16_t value) {

  for (; *pins; pins++, value >>= 1) {
    digitalWrite(*pins, value & 1) ;

When disassembled, this one is slightly shorter and uses slightly less stack than the second version. However, if it were open coded, it would require another variable, or two, depending on whether or not value could be destroyed.

As I said, this feels natural to me. I learned to do array processing in C with pointer arithmetic. I can see that that might no longer be fashionable - after all, it is error-prone. However, that it allows me to avoid introducing two new variables seems to more than make up for that, as those introduce there own errors.

What I'm wondering is if there's some good reason to avoid the third version for Arduino projects, or modern C in general. Disassembling it didn't turn up any. Anyone else have an opinion?