Writing good multi-threaded code is a skill in itself. There are several concepts to be familiar with before you start writing a multi-threaded program. I will briefly explain some of these concepts with a focus on C++.
Multi-threading was added to the Standard Library in C++11. Several other thread libraries for C++ existed before the standard thread library was introduced. Of these, Boost threads and POSIX threads are the closest to the standard thread library. In this article and the subsequent ones, I will focus entirely on the standard thread library.
Process vs Thread
When talking about threads, it is important to know the difference between a thread and a process. A process is an instance of a running program that has its own virtual address space. A process consists of one or more threads. A normal program (without any multi-threaded code) is essentially a single-threaded process that runs on a logical CPU.
Notice that I used the term ‘logical CPU’ in the last paragraph. A logical CPU is an abstraction over the physical CPU cores. The number of logical CPUs is the number of physical cores times the number of threads that can run on each core (more than one when hyper-threading is enabled). Each logical CPU can run a separate thread. The thread library provides a nifty static member function, std::thread::hardware_concurrency(), that returns the number of concurrent threads the implementation supports, which is typically the number of logical CPUs in the system. The value is only a hint, and the function may return 0 if the number cannot be determined. On Linux, the same number is also available from the command line through commands such as cat /proc/cpuinfo and lscpu.
Each thread within a process gets its own call stack, while the rest of the virtual address space is shared between all the threads. The stacks are not protected from one another, though: because they live in the same address space, a thread can access another thread's stack if it is given a pointer or reference into it.
Optimal number of threads
The performance of a multi-threaded application depends heavily on the number of concurrent threads running on the CPU. Several background threads created by the OS may also be running alongside the threads created by the user. The more concurrent threads there are, the more context switching the OS has to perform, and context switching incurs an overhead.
There is no hard and fast rule for the optimal number of concurrent threads in a process. It helps to first determine whether the process is CPU-bound or I/O-bound. A CPU-bound process performs CPU-intensive tasks, so it is the performance of the CPU that determines the performance of the process. An I/O-bound process, on the other hand, is limited by I/O operations: factors such as network latency or disk read/write speed determine its performance.
A good rule of thumb for a CPU-bound process is to use at most as many threads as there are logical CPUs in the system. For an I/O-bound process, having more threads than logical CPUs can be beneficial up to a certain point. Benchmarking the application with different numbers of threads is always the ideal way to find the optimal number.
Enabling or disabling hyper-threading can also change the performance of the application. It can be enabled or disabled in the BIOS.