Thinking in Parallel
While working on and maintaining SocketCluster, I’ve noticed that many software developers tend to run into difficulties when trying to build scalable systems. Maybe this is because, often, the simplest and most obvious solutions tend not to scale; that might explain why they’re sometimes referred to as ‘naive’ solutions.
In any case, writing scalable code is not that difficult once you understand certain fundamental concepts and guidelines. One of the most important of these is Amdahl’s law. You don’t really need to understand the math behind it, but it’s crucial that you understand its central idea: that “programs are limited by their serial parts”. To do this, you have to understand the distinction between ‘parallel code’ and ‘serial code’.
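You can see the limit for yourself by plugging numbers into the formula. In this sketch, p is the fraction of the program that can be parallelized and n is the number of processors; only the function name is mine, the formula itself is Amdahl’s:

```ts
// Amdahl's law: the speedup from running the parallelizable fraction
// `p` of a program on `n` processors. The serial remainder (1 - p)
// caps the overall gain no matter how large `n` gets.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

// Even with unlimited processors, a program that is 95% parallel
// can never run more than 20x faster:
console.log(amdahlSpeedup(0.95, 8));        // ~5.9x
console.log(amdahlSpeedup(0.95, 1024));     // ~19.6x
console.log(amdahlSpeedup(0.95, Infinity)); // 20x (the hard limit)
```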
To understand parallel code, you first have to understand what non-parallel (serial) code is. As a rule of thumb, serial code tends to make the following assumptions:
- That it needs to have access to all of the data about an entire problem in order to solve that problem.
- That it needs to have solved the entire problem by the time it completes execution.
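To make those assumptions concrete, here’s a deliberately serial sketch (the log-counting scenario is just an invented example): it expects the entire data set up front, and it isn’t done until the whole answer has been computed.

```ts
// A deliberately serial sketch: count the error lines in a log.
// It assumes it can see ALL of the data at once, and it isn't
// finished until the total for the ENTIRE data set is computed.
function countErrors(allLogLines: string[]): number {
  let total = 0;
  for (const line of allLogLines) {
    if (line.includes('ERROR')) {
      total++;
    }
  }
  return total;
}
```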
While some problems are inherently serial, it turns out that a lot of problems can be parallelized. Problems whose solutions can be entirely parallelized are sometimes referred to as ‘embarrassingly parallel’.
Embarrassingly parallel code can scale indefinitely; serial code can’t, and eventually needs to be rewritten. Because of this, some would say that serial code is poorly designed. That’s one reason why many developers get into functional programming: languages like Haskell, Erlang and Scala encourage you to write parallel code by discouraging shared state and mutable data.
Unfortunately, choosing a specific programming language is not enough; to write parallel code, you also need to change your thinking a little bit. If you recall the points mentioned earlier about serial code, you can infer that parallel code requires you to make the following assumptions:
- That it only needs to deal with a subset of the data about the overall problem that it’s trying to solve.
- That it only needs to have solved a subset of the overall problem by the time it completes execution.
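Here’s the parallel counterpart of the earlier log-counting sketch: each call only sees one shard of the data and only produces a partial answer, and the partial answers can be combined later by any process, in any order.

```ts
// The parallel counterpart: each worker only receives its own shard
// of the logs, and only reports a partial result.
function countErrorsInShard(shard: string[]): number {
  return shard.filter((line) => line.includes('ERROR')).length;
}

// Partial results can be combined later, by any process, in any order:
function combineCounts(partialCounts: number[]): number {
  return partialCounts.reduce((sum, count) => sum + count, 0);
}
```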
In a client-server (networking) context, the general problem is typically that we want to process requests from a (sometimes large) number of concurrent users, usually so that they can concurrently read and/or modify significant amounts of our system’s data. These kinds of problems typically involve many distinct (non-overlapping) data sets, so they can easily be parallelized.
Thanks to the magic of hashing, indexing and sharding (both in the database and as part of the application’s core logic), the processing of each request can be completely isolated from the others. This is the key to writing embarrassingly parallel code: make sure that your code only ever deals with a limited subset of your whole data at any given time.
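As a minimal sketch of that idea (using Node’s built-in crypto module; the helper name is mine): hashing a key picks the process that owns it, so the same key always lands on the same process without any coordination.

```ts
import { createHash } from 'crypto';

// Deterministically map a key (a user id, a channel name, ...) to one
// of `numProcesses` shards. The same key always maps to the same
// process, so each process can own its subset of the data outright.
function shardForKey(key: string, numProcesses: number): number {
  const digest = createHash('md5').update(key).digest();
  return digest.readUInt32BE(0) % numProcesses;
}

shardForKey('alice-channel', 4); // always the same shard index for this key
```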
To summarize: the key to thinking in parallel is to think about your data as being made up of small, fixed-size subsets that can be dealt with independently.
To give you a practical example related to my open source work (with real-time systems), consider the following case:
We want to build a real-time system that is made up of two separate processes; let’s call them ProcessA and ProcessB. Let’s assume that we have two users (Alice and Bob) connected to our system; Alice is connected to ProcessA and Bob is connected to ProcessB. What we want to do is to allow Alice to send a message to Bob. So the problem is: how does a message get from Alice (on ProcessA) to Bob (on ProcessB)?
A naive, non-parallel (serial) solution to this problem would be for ProcessA to simply ask ProcessB whether it is connected to the user ‘Bob’; if so, ProcessA could pass the message along to ProcessB, which would then send it to Bob.
This ‘naive’ serial solution actually works pretty well when we only have two users and two processes in our system, but what if our system had many processes, each holding connections for many different users?
As you can imagine, it’s not feasible for ProcessA to communicate and coordinate with every other process every time a message passes through the system; it would mean that all of our processes end up operating on the same data (in this case, every message) over and over again.
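Sketched in code, the naive approach looks something like this (the Peer interface is a hypothetical stand-in for whatever RPC mechanism the processes use to talk to each other):

```ts
// Hypothetical peer interface, standing in for whatever RPC mechanism
// the processes use to talk to each other:
interface Peer {
  hasConnectedUser(userId: string): Promise<boolean>;
  deliver(userId: string, message: string): Promise<void>;
}

// The naive approach: to deliver one message, the sending process has
// to ask EVERY other process whether it holds the recipient's
// connection. With P processes, every message costs O(P) round trips.
async function naiveSend(recipient: string, message: string, peers: Peer[]): Promise<void> {
  for (const peer of peers) {
    if (await peer.hasConnectedUser(recipient)) {
      await peer.deliver(recipient, message);
      return;
    }
  }
}
```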
A simple parallel algorithm which solves this problem is pub/sub. There are many ways to implement it, but the core principle is to partition messages across different channels and to keep track of the subscribers for each of those channels. A typical implementation of pub/sub involves sharding channels across multiple processes, so that each process handles a subset of all possible channel names. To send a message between two or more users, you simply publish the message to an appropriate channel by handing it off to the responsible process (you can hash the channel name to determine which process is responsible for that channel), and that process then forwards the message to the relevant subscribers.
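Here is a minimal, single-process sketch of that idea. All the names (Broker, subscribe, publish, the hash function) are illustrative rather than SocketCluster’s actual API, and the three brokers stand in for three separate processes:

```ts
type Handler = (message: string) => void;

// One Broker instance stands in for one process's share of the channels.
class Broker {
  private subscribers = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): void {
    let handlers = this.subscribers.get(channel);
    if (!handlers) {
      handlers = new Set();
      this.subscribers.set(channel, handlers);
    }
    handlers.add(handler);
  }

  publish(channel: string, message: string): void {
    this.subscribers.get(channel)?.forEach((handler) => handler(message));
  }
}

// A simple string hash for illustration; a real system would use a
// proper hash function.
function shardForChannel(channel: string, shardCount: number): number {
  let hash = 0;
  for (let i = 0; i < channel.length; i++) {
    hash = (hash * 31 + channel.charCodeAt(i)) >>> 0;
  }
  return hash % shardCount;
}

// Each broker owns a fixed subset of all possible channel names, so a
// publisher never has to ask around; the hash says exactly where to go.
const brokers = [new Broker(), new Broker(), new Broker()];

function brokerFor(channel: string): Broker {
  return brokers[shardForChannel(channel, brokers.length)];
}

brokerFor('bob-channel').subscribe('bob-channel', (msg) => console.log(msg));
brokerFor('bob-channel').publish('bob-channel', 'Hi Bob, from Alice!');
```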
With pub/sub, you could set up channels for specific users by incorporating the user’s name (or id) into the channel name (e.g. ‘alice-channel’, ‘bob-channel’) and then only allowing users to subscribe to their own channels. The ‘embarrassingly parallel’ idea behind pub/sub is that each process only deals with a fixed subset of all the channels and messages which pass through the system. The more processes you add, the fewer channels and messages each process needs to handle. There is no limit to how many channels your system can handle, so long as you can keep adding more processes (and more hardware for those processes to run on).
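The subscribe-time rule itself can be as simple as a string comparison. This standalone helper is hypothetical (in a real SocketCluster app you would enforce the rule in middleware), but it captures the idea:

```ts
// Hypothetical subscribe-time check: a user may only subscribe to the
// channel derived from their own id, so private messages stay private.
function canSubscribe(userId: string, channel: string): boolean {
  return channel === `${userId}-channel`;
}

canSubscribe('alice', 'alice-channel'); // true
canSubscribe('alice', 'bob-channel');   // false
```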
Aside from resource limits, the only real limit of this pub/sub implementation is the message throughput on a per-channel basis, but that is usually an acceptable limit. Messages are usually meant for humans, and each human can only consume so many messages per minute; in effect, the practical limits of pub/sub tend to exceed the natural limits required for a good user experience. That said, for high-throughput (backend/big-data) applications, you can always shard pub/sub channels into smaller subsets like ‘alice-channel-1’, ‘alice-channel-2’, …, ‘alice-channel-n’ without having to make any changes to the pub/sub algorithm.
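To illustrate that last trick, here’s a hypothetical helper that spreads one logical channel’s traffic across n physical channels; publishers hash some per-message value into the channel name, and subscribers simply subscribe to all n sub-channels:

```ts
// Spread one busy logical channel over `n` physical sub-channels
// (numbered 1..n as in the examples above). Each message lands on
// exactly one sub-channel, so the per-channel throughput limit is
// effectively divided by n.
function subChannelFor(baseChannel: string, messageId: number, n: number): string {
  return `${baseChannel}-${(messageId % n) + 1}`;
}

subChannelFor('alice-channel', 12345, 4); // 'alice-channel-2'
```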
Ultimately, though, the best way to understand parallel code is to start writing it! I just hope that this article can give you an incentive to try it out for yourself.