C++ – Memory Alignment

I was once asked in an interview about the importance of memory alignment. I had absolutely no clue as to what it meant or why it was important in high performance/low latency programming.

I have been trying to read up on this topic ever since, and the more I read, the more I understand how important it is to know the hardware in order to write fast code. I am still a noob in this area, so everything you read below needs to be taken with a pinch of salt.

So why is it important for a Software Engineer to know the hardware? Well, hardware is not a single monolithic piece that directly executes the applications we develop. It is an ensemble of different components that work in tandem with the operating system. Not all components give their best performance in this setting, but a Software Engineer can help by writing code that extracts the maximum performance out of them. Knowing how RAM works, how CPU caches are designed, how a process really gets executed on the CPU, etc. goes a long way in writing highly performant code. Please read this article to get a better understanding of the life-cycle of a C++ program.

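While executing instructions, the CPU fetches bytes from physical memory in chunks called cache lines. Even if the processor needs only a few of those bytes, it still fetches the entire cache line from RAM. On current x86 processors a cache line is typically 64 bytes. A cache line is not the same thing as a machine word or a RAM page frame: the word size is 4 bytes (32 bits) on a 32-bit CPU and 8 bytes (64 bits) on a 64-bit CPU, and a page is usually several kilobytes. To keep the pictures small, the rest of this post models memory as a sequence of 4-byte blocks, as on a 32-bit system.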

Consider the following struct. Try to guess its size on a 32-bit system.

Example 1

struct Student{
    char sex;
    int id;
    char type;
    int age;
};

If you guessed 1 + 4 + 1 + 4 = 10 bytes, you are wrong. The correct answer is 16 bytes. Now, let's rearrange the members of the above struct and check the size again.

Example 2

struct Student{
    int id;
    int age;
    char sex;
    char type;
};

Now the size is only 12 bytes. What happened in these two cases? Why is the answer not 10 bytes? The reason is the memory alignment performed by the compiler. Every object type has a property called its alignment requirement: the address of an object of that type must be a multiple of this value. By default, the compiler inserts unused padding bytes between members (and at the end of the struct) to satisfy the requirement, so that the hardware can access every member efficiently. Fetching data from main memory costs the processor many CPU cycles. If an int is not aligned in memory, it can straddle two 4-byte words, or even two cache lines, and the CPU then has to do two fetches to get its value instead of one. Some architectures do not support misaligned access at all and raise a fault instead.
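The two sizes can be checked with `sizeof`. The following is a minimal sketch, assuming a platform where `int` is 4 bytes with 4-byte alignment (the common case on mainstream 32-bit and 64-bit compilers). The names `StudentV1`, `StudentV2` and `print_sizes` are hypothetical; they are only there so that both orderings can live in one file.

```cpp
#include <cstddef>   // std::size_t
#include <cstdio>    // std::printf

// Example 1 ordering (hypothetical name).
struct StudentV1 {
    char sex;
    int  id;
    char type;
    int  age;
};

// Example 2 ordering (hypothetical name).
struct StudentV2 {
    int  id;
    int  age;
    char sex;
    char type;
};

// Assumption: 4-byte int with 4-byte alignment.
static_assert(sizeof(int) == 4 && alignof(int) == 4, "sketch assumes a 4-byte int");
static_assert(sizeof(StudentV1) == 16, "3 padding bytes after each char");
static_assert(sizeof(StudentV2) == 12, "2 padding bytes at the end");

// Padding = total size minus the bytes the members really need.
constexpr std::size_t payload    = 2 * sizeof(char) + 2 * sizeof(int);
constexpr std::size_t padding_v1 = sizeof(StudentV1) - payload;
constexpr std::size_t padding_v2 = sizeof(StudentV2) - payload;

// Prints size and padding for both orderings.
void print_sizes() {
    std::printf("Example 1: %zu bytes, %zu of them padding\n",
                sizeof(StudentV1), padding_v1);
    std::printf("Example 2: %zu bytes, %zu of them padding\n",
                sizeof(StudentV2), padding_v2);
}
```

Calling `print_sizes()` from `main` reports the sizes at run time, while the `static_assert` lines make the build fail if the platform does not match the assumptions.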

As mentioned earlier, the block size we are picturing for a 32-bit system is 32 bits. A block can therefore be thought of as an array of 4 elements, where each index stores one byte of data. The unused bytes that the compiler inserts to keep members aligned are called padding. This is how the compiler lays out the above two structs in memory. Each memory block/frame is 4 bytes, and "-" marks a padding byte.

Example 1

             sex  -   -   -
Frame 1 -     0   1   2   3
             id  id  id  id
Frame 2 -     0   1   2   3
             type -   -   -
Frame 3 -     0   1   2   3
             age age age age
Frame 4 -     0   1   2   3

Example 2

             id  id  id  id
Frame 1 -     0   1   2   3
             age age age age
Frame 2 -     0   1   2   3
             sex type -   -
Frame 3 -     0   1   2   3
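The diagrams above can be cross-checked with `offsetof`, which gives the byte offset of a member from the start of its struct. This is a rough sketch under the same assumption of a 4-byte, 4-byte-aligned `int`; `StudentV1`, `StudentV2` and `print_layouts` are hypothetical names.

```cpp
#include <cstddef>   // offsetof
#include <cstdio>    // std::printf

struct StudentV1 { char sex; int id; char type; int age; };   // Example 1
struct StudentV2 { int id; int age; char sex; char type; };   // Example 2

// Prints the byte offset of every member of both structs.
// Offset / 4 is the frame number (counting from 0), offset % 4 the index in it.
void print_layouts() {
    std::printf("Example 1: sex=%zu id=%zu type=%zu age=%zu\n",
                offsetof(StudentV1, sex), offsetof(StudentV1, id),
                offsetof(StudentV1, type), offsetof(StudentV1, age));
    std::printf("Example 2: id=%zu age=%zu sex=%zu type=%zu\n",
                offsetof(StudentV2, id), offsetof(StudentV2, age),
                offsetof(StudentV2, sex), offsetof(StudentV2, type));
}
```

With a 4-byte `int`, Example 1 places its members at offsets 0, 4, 8 and 12 (one per frame), while Example 2 places them at 0, 4, 8 and 9, so `sex` and `type` share the third frame.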

Please note that the compiler lays out objects at virtual addresses, not physical ones. Virtual memory comes in pages, and the OS maps each page onto a page-aligned physical frame/block when the program is loaded, so an alignment chosen within a virtual page is preserved in physical memory.

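Compilers follow the rule below when padding. For the basic types the alignment requirement usually equals the size, so:
Objects of size 1 byte go to any address.
Objects of size 2 bytes go to addresses that are multiples of 2 (0, 2, 4, …)
Objects of size 4 bytes go to addresses that are multiples of 4 (0, 4, 8, …)
Objects of size 8 bytes go to addresses that are multiples of 8 (0, 8, 16, …)
A struct takes the alignment of its most strictly aligned member, and its size is rounded up to a multiple of that alignment so that every element of an array of such structs stays aligned. That is why Example 2 occupies 12 bytes rather than 10.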
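The rule can be applied by hand to predict a struct's size. The sketch below is only an illustration, not how any particular compiler is implemented, and it assumes that every member's alignment equals its size, which holds for the 1, 2, 4 and 8 byte cases listed above. `align_up` and `padded_size` are hypothetical helper names.

```cpp
#include <cstddef>            // std::size_t
#include <initializer_list>   // std::initializer_list

// Rounds offset up to the next multiple of alignment.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Walks the members in declaration order and returns the padded struct size.
// Assumption: each member's alignment requirement equals its size.
std::size_t padded_size(std::initializer_list<std::size_t> member_sizes) {
    std::size_t offset = 0;      // where the next member would start
    std::size_t max_align = 1;   // strictest alignment seen so far
    for (std::size_t size : member_sizes) {
        offset = align_up(offset, size);   // padding before the member
        offset += size;                    // the member itself
        if (size > max_align) max_align = size;
    }
    return align_up(offset, max_align);    // padding at the end of the struct
}
```

For the member sizes of Example 1, `padded_size({1, 4, 1, 4})`, this gives 16, and for Example 2, `padded_size({4, 4, 1, 1})`, it gives 12.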

It is possible to tell the compiler not to perform padding. If the compiler does not pad, it packs, i.e., it places all the members back to back in memory without regard to their alignment requirements. For example, if the struct from Example 1 is declared under #pragma pack(1) in Visual C++, its size is reported as 10 bytes. The price is that accesses to the misaligned members may be slower.
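A minimal sketch of a packed version of the Example 1 struct, assuming the `#pragma pack` directive, which Visual C++ supports and which GCC and Clang also accept. The name `PackedStudent` is hypothetical, and a 4-byte `int` is assumed.

```cpp
#include <cstddef>   // offsetof

#pragma pack(push, 1)        // pack members on 1-byte boundaries from here on
struct PackedStudent {       // same members as Example 1
    char sex;
    int  id;
    char type;
    int  age;
};
#pragma pack(pop)            // restore the previous packing for later code

// With no padding, the size is just the sum of the members.
static_assert(sizeof(PackedStudent) == 10, "1 + 4 + 1 + 4 bytes");
// id now starts right after sex, at an address that is not a multiple of 4.
static_assert(offsetof(PackedStudent, id) == 1, "id is misaligned");
static_assert(offsetof(PackedStudent, age) == 6, "age is misaligned");
```

The `push`/`pop` pair limits the packing to this one struct, so declarations later in the file keep their default alignment.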

Starting from C++11, it is also possible to give basic types and user-defined types a custom alignment. The alignas specifier sets the alignment of a type or variable (it can only make the alignment stricter, not weaker), and the alignof operator returns the alignment requirement of a type.
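Here is a small sketch of both keywords. The type `Counter`, the helper `is_aligned`, and the choice of 64 bytes (a stand-in for a typical cache-line size) are assumptions made for illustration only.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t

// Request 64-byte alignment for every object of this (hypothetical) type.
struct alignas(64) Counter {
    int value;
};

// alignof reports the alignment requirement of a type.
static_assert(alignof(char) == 1, "char can live at any address");
static_assert(alignof(Counter) == 64, "alignas raised the requirement");
// The size is rounded up to a multiple of the alignment.
static_assert(sizeof(Counter) == 64, "60 bytes of padding after value");

// Returns true when p is located at a multiple of alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For any `Counter c;`, `is_aligned(&c, alignof(Counter))` is expected to be true. `alignas` also works on individual variables, for example `alignas(16) char buffer[4];`.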

Most compilers today take care of memory alignment themselves. Knowing how memory alignment works is not a prerequisite for being a good Software Engineer. But if you are the kind of person who wants to squeeze every last bit of performance out of your code, knowing the hardware is a necessity. I would suggest reading this article, which explains the topic in great detail.
