Writing good multi-threaded code is a skill in itself. There are several concepts one needs to be familiar with before starting out to write a multi-threaded program. I will briefly explain some of these concepts with a focus on C++.
Multi-threading was added to the Standard Library in C++11. Several other thread libraries for C++ existed before the standard thread library was introduced. Among them, Boost threads and POSIX threads are closest to the standard thread library. In this article and in the subsequent ones, I will focus entirely on the standard thread library.
Process vs Thread
When talking about threads, it is important to know the difference between a thread and a process. A process is an instance of a running program that has its own Virtual Address Space. A process may consist of one or more threads. Each thread within a process gets its own call stack, but the rest of the Virtual Address Space is shared among the threads. Note that it is still possible for these threads to access each other's stack space, if needed. If we write a normal program (without any multi-threaded code), it is essentially a single-threaded process that runs on a logical CPU.
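The sharing described above can be seen in a small sketch. The variable and function names below are my own for illustration; the point is that a global lives in memory visible to both threads, while each thread's locals live on its private stack:

#include <cassert>
#include <mutex>
#include <thread>

// Threads in the same process share globals (the rest of the virtual
// address space); each thread only gets its own private call stack.
int shared_counter = 0;          // visible to every thread in the process
std::mutex m;                    // protects shared_counter

void add_n(int n) {
    int local = 0;               // lives on this thread's own stack
    for (int i = 0; i < n; ++i) ++local;
    std::lock_guard<std::mutex> lock(m);
    shared_counter += local;     // both threads update the same variable
}

int main() {
    std::thread t1(add_n, 1000);
    std::thread t2(add_n, 1000);
    t1.join();
    t2.join();
    assert(shared_counter == 2000);  // updates from both threads are visible
    return 0;
}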
A Linux process spawns child processes (sub-processes) via the fork() and clone() system calls. fork() creates a child process that is an exact copy of the parent process. Once the child process is created, exec() can be used to replace the contents of the child process with a new program binary (note that exec() itself does not create a new process; it reuses the calling one). clone() gives finer control over what state is shared between the two processes, and thus provides a capable interface for implementing threads.
The parent process should be around to collect the exit code returned by the child process. If the parent process exits before the child, the child becomes an Orphan process and the 'init' process is assigned as its parent. When a child process exits, its entry in the process table remains until the parent process collects its exit code via wait() or waitpid(). A child process in this state is called a Zombie!
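A minimal sketch of the fork()/exec()/wait() flow on Linux. This is my own illustration, and it assumes a Linux system where /bin/true exists; the function name is hypothetical:

#include <iostream>
#include <sys/wait.h>   // waitpid(), WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork(), execl(), _exit()

// Fork a child, replace its image with /bin/true via execl(),
// and return the child's exit code as collected by the parent.
int run_true_in_child() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: replace this process image with a new program binary.
        execl("/bin/true", "true", static_cast<char*>(nullptr));
        _exit(127);     // reached only if execl() fails
    }
    int status = 0;
    waitpid(pid, &status, 0);   // parent reaps the child, avoiding a zombie
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int main() {
    std::cout << "child exited with " << run_true_in_child() << '\n';
    return 0;
}

Until the waitpid() call runs, the exited child sits in the zombie state; the call is what removes its entry from the process table.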
Notice that I used the term 'logical CPU' earlier. A logical CPU is an abstraction over the physical CPU cores. It is the number of physical cores times the number of threads that can run on each core (when hyper-threading is enabled). Each logical CPU can run a separate thread. There is a nifty function in the C++ thread library, std::thread::hardware_concurrency(), that gives us the number of logical CPUs in the system. It is also possible to get this number from the command line in Linux using commands like cat /proc/cpuinfo and lscpu.
#include <iostream>
#include <thread>

using namespace std;

int main() {
    cout << thread::hardware_concurrency();
    return 0;
}
Optimal number of threads
The performance of a multi-threaded application is heavily dependent on the number of concurrent threads running on the CPU. There could be several OS-created background threads running in parallel with the user-created threads. The more concurrent threads there are, the more context switching the OS has to perform, and context switching incurs an overhead.
There is no hard and fast rule regarding the optimal number of concurrent threads used by a process. It is always good to check whether the process is CPU-bound or IO-bound. A CPU-bound process performs CPU-intensive tasks, and as a result, it is the performance of the CPU that determines the performance of the process. An IO-bound process, on the other hand, is limited by IO operations: it is the network latency or the disk read/write speed that determines the performance of the process.
A good rule of thumb for a CPU-bound process is to cap the number of threads at the number of logical CPUs in the system. For an IO-bound process, having more threads than the number of logical CPUs could be beneficial up to a certain point. Benchmarking the application with different numbers of threads is always the ideal way to figure out the optimal thread count.
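The rule of thumb for the CPU-bound case can be sketched as below. The helper name is my own, and the fallback value is an assumption; note that the standard permits hardware_concurrency() to return 0 when the count is not computable:

#include <iostream>
#include <thread>
#include <vector>

// Pick a worker-pool size from the logical CPU count (CPU-bound rule of thumb).
unsigned pick_thread_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 2u : n;   // 0 means "unknown"; fall back to a small default
}

int main() {
    unsigned n = pick_thread_count();
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i)
        workers.emplace_back([i] { /* CPU-bound work for shard i */ });
    for (auto& t : workers)
        t.join();             // wait for every worker to finish
    std::cout << "ran " << n << " worker threads\n";
    return 0;
}

For an IO-bound workload, the same skeleton applies but n would be chosen larger than the logical CPU count, ideally guided by benchmarking.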
Enabling or disabling hyper-threading can also change the performance of the application. This is typically done in the BIOS.