
Friday, April 20, 2012

Processes - a modern approach to concurrency

I seem to keep running into an assumption that the only solution for concurrent operations is having multiple threads of control in a shared address space - aka "threading". This is in spite of this model having the rather nasty downside of exposing everything in memory to shared access and modification, making it very easy to inadvertently write ill-behaved code. It's as easy as linking with a standard library instead of the special "thread-safe" version of that same library - assuming there is a "thread-safe" version of the library!
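To make that concrete, here's a minimal Python sketch (my own toy example, not from any particular library) of a few threads stepping on one shared counter. Whether the lost updates actually show up on a given run depends on the interpreter and scheduler, but nothing in the code prevents them:

```python
import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1  # racy read-modify-write: load, add, store can interleave

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # frequently less than the expected 400000
```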

Letting concurrent operations share an address space is an old model. There is a modern alternative - one that people either dismiss or aren't aware of - that doesn't suffer from the problems mentioned above. Unfortunately, more primitive operating systems don't support it very well, so languages and libraries that want to be cross-platform tend to use or recommend the more primitive threading model. I suspect that's why the assumption that threading is the only solution is so widespread. I'm going to write this as if you didn't know about the alternative, even though it's very likely most of you do.

Some history

The shared address space model used by threads is very old. Early computers didn't support multiple address spaces, so it was forced on them. A typical system would divide the address space into parts, some for use by the system and some for use by user programs, and you could only have one normal user program running at a time - because there was only one address space.

Of course, there were user programs that you wanted always running, or running concurrently with some other program. Such programs could be written to run in the existing shared address space, but they had to be written specifically to do so. There were lots of issues with doing this, so the less said about this form of concurrency, the better.

When hardware support for multiple address spaces was first introduced, systems still had address spaces shared between the user and the system, and concurrent operations were still painful. The new support was mostly used for simultaneous users rather than concurrent operations in a single application. If you wanted the latter, you still had no choice but to write code specifically intended to share a single address space. It might be possible to create a new address space, but that was hideously expensive and convoluted, and communication between address spaces was painful at best.

As an aside, some systems did make things a bit easier - they insisted that user programs be position-independent code, so they could be loaded anywhere in the shared address space. This meant that concurrent operations didn't require special code, because all user programs were already special.

The breakthrough idea was putting each user program in its own address space, and making communication between them cheap and easy. To be practical, this meant that creating a new address space also had to be cheap and easy, at least compared to doing the same thing on earlier systems. The resulting constructs are known as processes.

The cost of processes for concurrency

While non-shared address spaces do solve the problems that come from everything being shared, they come with some extra costs. Those costs aren't as great as they appear at first glance.

Creation costs

In a modern system, a new process is created by copying the parent. This involves the cost of creating a new thread of control - which has to be paid whether or not the address space is shared. It also involves copying the descriptors for the data space, marking the new ones copy-on-write. This isn't very expensive, as it all happens in memory. The cost goes up as the child writes to non-shared memory, since the system automatically copies those parts of memory. This is probably more expensive than properly locking all access to that memory, but there is no way to create a locking bug. It's similar to garbage collection, where you pay with performance for having something else pick up unused memory, but in return you eliminate the possibility of bugs that use freed memory. That's generally considered a good trade-off.
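Here's a rough sketch of what that looks like on a POSIX system, using Python's os.fork. The exact copy-on-write mechanics are up to the OS, but the visible semantics are as described:

```python
import os

data = list(range(1_000_000))  # built once in the parent

pid = os.fork()                # cheap: page tables copied, marked copy-on-write
if pid == 0:
    data[0] = -1               # child's write copies only the touched pages
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print(data[0])             # prints 0 - the parent's copy is untouched
```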

Memory costs

The threading model uses minimal extra memory. It just needs to allocate a new stack for each thread.

The process model doesn't allocate a new stack; the child starts with the parent's stack and gets private copies of it as it's reused. More importantly, as objects get written to, the system creates copies of the pages being written. This is an unavoidable cost of not sharing those objects.

For many applications, the latter isn't really a problem. Code won't be written to, and static data structures shouldn't be written to. However, if the language uses reference counting as part of its garbage collection strategy, that can cause lots of writes that aren't really needed. On the other hand, every one of those writes represents a lock acquire and release in the threading model. So even in this case, it's the classic memory vs. performance trade-off.

Data Sharing costs

The trade-offs here are different from those of threads. In some cases, processes have a lower cost; in others, a greater one.

The base cost in the threading model is that every object that is written to by multiple threads needs to be identified, and either moved into thread-local storage (if it shouldn't be shared) or wrapped with the proper locks (if it should). Assuming you get all the locking correct (which, in my experience, is highly unlikely), every time you touch such data, it costs a little extra. You may wind up paying a lot if there's contention.
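For illustration, here's what "wrapped with the proper locks" typically looks like in Python - a toy counter, with every access paying an acquire and release:

```python
import threading

class SharedCounter:
    """A counter that is safe to touch from multiple threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:   # every touch pays an acquire/release
            self._value += 1

    def value(self):
        with self._lock:
            return self._value
```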

In the process model, data can be shared in three different ways. First, anything created in the parent process before the child is created will be available to the child. Alterations made by the child won't be visible to the parent, but this model handles a surprising number of cases. In particular, configuration, user preferences, and similar data all work well this way.
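A sketch of this first style, using Python's standard multiprocessing module. I'm assuming the POSIX-only "fork" start method here, and CONFIG is a hypothetical stand-in for whatever settings your program loads at startup:

```python
import multiprocessing as mp

CONFIG = {"verbose": True, "workers": 4}  # hypothetical settings, loaded once

def worker():
    # The child sees the parent's configuration with no explicit hand-off.
    print("child sees workers =", CONFIG["workers"])
    CONFIG["workers"] = 99  # stays local to the child

if __name__ == "__main__":
    mp.set_start_method("fork")  # inherit-by-copy; available on POSIX systems
    p = mp.Process(target=worker)
    p.start()
    p.join()
    print("parent still has workers =", CONFIG["workers"])  # prints 4
```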

Second, data can be communicated over an external channel of some sort. Small bits of data just get serialized and passed back and forth. For data that needs to move in one direction, this works well. This model of communicating sequential processes is very popular, even in systems that use threading for concurrency.
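Here's a sketch of that second style with a multiprocessing pipe; values are pickled on one end and reconstructed on the other:

```python
import multiprocessing as mp

def producer(conn):
    for item in ["alpha", "beta", "gamma"]:
        conn.send(item)          # each value is pickled and written to the pipe
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = mp.Pipe()
    p = mp.Process(target=producer, args=(child_end,))
    p.start()
    child_end.close()            # the parent keeps only its own end open
    while True:
        try:
            print("received:", parent_end.recv())
        except EOFError:         # the producer has closed its end
            break
    p.join()
```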

If the data is very large - so that it won't reasonably fit in memory - then it winds up stored on disk. In this case, the costs are the same whether you have a shared address space or not, as the communication will generally happen via disk. You can even get the effect of sharing the OS object for the open file if the parent creates it before creating the child - at least in an OS that properly supports this model of concurrency - though that's not usually desirable.
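A small demonstration of that shared-open-file effect, and of why it's usually undesirable - the child's reads move the file offset out from under the parent (POSIX-only sketch; the file path is just an example):

```python
import os

fd = os.open("/tmp/shared_offset_demo", os.O_RDWR | os.O_CREAT | os.O_TRUNC)
os.write(fd, b"0123456789")
os.lseek(fd, 0, os.SEEK_SET)

pid = os.fork()
if pid == 0:
    os.read(fd, 5)               # advances the offset in the shared OS object
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print(os.read(fd, 5))        # prints b'56789' - the child moved the offset
    os.close(fd)
```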

Finally, the processes can explicitly arrange to share a chunk of memory. This needs locks just like the threading model. It may also require that the objects stored in it be position-independent, unless the processes can negotiate an address for the memory. The major downside here is that support for such sharing in most high-level languages isn't very good - at least not yet. It does work very well for arrays of machine primitive types, which is common in cases that have lots of shared data.
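Python's multiprocessing is one place where this does work reasonably well for arrays of primitive types. A sketch, using a shared array of doubles guarded by its built-in lock:

```python
import multiprocessing as mp

def accumulate(shared, start, stop):
    for i in range(start, stop):
        with shared.get_lock():          # same locking discipline as threads
            shared[i % len(shared)] += i

if __name__ == "__main__":
    shared = mp.Array("d", 8)            # eight doubles in shared memory
    ps = [mp.Process(target=accumulate, args=(shared, k * 100, (k + 1) * 100))
          for k in range(4)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    print(list(shared))
```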

Another benefit

Actually, a number of these sharing methods have another major downside: they can't work between processes that aren't on the same machine. This means that if you need more concurrency than one machine can comfortably handle, you lose. If you're building a truly scalable system, you'll choose one of the methods that can work between machines - communicating over an external channel does. The threading model is simply unusable in this situation.

Summary

My point isn't that processes are the solution for concurrency. My point is that they are a perfectly acceptable solution for many problems needing concurrency. They are a more modern approach than the shared address space model of threads, and like many such things (garbage collection, run-time type checking, lazy evaluation, STM, etc.) they make the programmer's job easier at some cost in performance and flexibility.

My point is that, if you automatically reach for threads when you need concurrent operations, you're ignoring an important advance in software development. And if you assume, without measuring, that processes can't meet your performance requirements, you've committed the error of premature optimization.

So next time you need concurrent operations, consider using processes before opening the can of worms that threading represents.