Transition to 5-Level Paging

In one of my recent posts I introduced how paging works in modern operating systems, using the x86 architecture for the explanation. For about a year now, both OS and hardware designers have been bringing up the topic of 5-level paging. Paging was expected to move to a 5-level form in the near future, and a few days ago the new kernel version (4.14-rc1) was finally released with official (and allegedly stable) support for 5-level paging.

The reason this change was scheduled is that with previous processors the physical memory range the CPU could address was at most 64 TB. One may think 64 terabytes of RAM is more than enough for today’s computers, but apparently technology tends to develop faster than many people expect. To be honest, I personally don’t know of any computer, or even any server, that uses this amount of memory, but note that not all computer systems look like what you recognize as conventional computers. Today’s search engines, data centers, social networks and many other complex infrastructures may need to combine and use huge amounts of memory, beyond many people’s imagination. If that is the case, the processor needs to be able to address that memory.

Intel processors (and similarly other architectures) used to use at most 46 bits for physical addresses. With 46 bits you can address at most 2^46 bytes = 64 TB of memory. In normal cases the processor is designed to use fewer address bits than this. Many commercial processors that regular users have on their motherboards have a 39-bit address bus, which means the CPU can address up to 2^39 bytes = 512 GB of physical RAM. You can see the address width of your processor with various third-party benchmarking tools, and on Linux by reading the "address sizes" line of the /proc/cpuinfo pseudo-file. Intel published a white paper early this year describing the hardware features and programmer guidelines for the new 5-level paging support in the next generation of its processors.
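
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the usual x86 Linux format of the "address sizes" line in /proc/cpuinfo ("N bits physical, M bits virtual"); the helper names parse_address_sizes and addressable are my own and purely illustrative. Sizes are reported in binary multiples (1 TB = 1024^4 bytes), which is how the figures in this post are counted.

```python
import re

def parse_address_sizes(cpuinfo_text):
    """Return (physical_bits, virtual_bits) from /proc/cpuinfo-style text, or None."""
    # Assumed line format: "address sizes : 39 bits physical, 48 bits virtual"
    match = re.search(
        r"address sizes\s*:\s*(\d+) bits physical, (\d+) bits virtual",
        cpuinfo_text,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def addressable(bits):
    """Size of a 2**bits byte address space, in binary multiples."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    size = 1 << bits
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"

if __name__ == "__main__":
    print("46 bits:", addressable(46))  # 64 TB
    print("39 bits:", addressable(39))  # 512 GB
    try:
        with open("/proc/cpuinfo") as f:
            sizes = parse_address_sizes(f.read())
    except OSError:
        sizes = None  # not on Linux, or /proc is unavailable
    if sizes is not None:
        physical, virtual = sizes
        print(f"this CPU: {physical} bits physical ({addressable(physical)}), "
              f"{virtual} bits virtual ({addressable(virtual)})")
```

On a machine where the line is missing or formatted differently, parse_address_sizes simply returns None rather than guessing.
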
If you have read my previous posts, you probably remember how 4-level paging works on Intel processors. We had four 9-bit indexes (36 bits in total) for selecting page table entries and the physical page in RAM, plus a 12-bit offset into the selected page (so linear addresses used 36+12=48 bits in total). In 5-level paging, another 9-bit index is used for a fifth page table, extending the virtual (linear) address length to 48+9=57 bits. The new table is called PML5 and its physical base address is, as expected, stored in the CR3 register. As the Intel manual explains:

CPUID.80000008H:EAX[bits 7:0] enumerates the maximum physical-address width supported by 
the processor. Processors that support Intel 64 architecture have enumerated at most 46 
for this value. Processors that support 5-level paging are expected to enumerate higher values, 
up to 52.

In other words, the new processors can use 52-bit physical addresses and thus address 2^52 bytes = 4 PB of memory. That is 4096 TB, or roughly four million gigabytes, of physical RAM. The format of a PML5 entry is very similar to that of a PML4 entry. In fact, moving from 4-level to 5-level paging is only a matter of expanding the maximum amount of memory the system can use. With 5-level paging we can have 128 PB of virtual and 4 PB of physical address space.

Of course, to comply with the new hardware, the way operating systems handle page tables needs to change as well in order to take advantage of this expansion. The new kernel versions differ from the earlier ones (including the kernel I based my 4-level paging explanation on) only in that they manage one more page table, in exactly the same manner as they did in 4-level paging mode. So technically not much has changed in the memory management policies of the kernel. It’s just that from now on we have an operating system that can handle memories as large as 4 PB!
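
The 9+9+9+9+9+12 split of a 57-bit linear address can be sketched in a few lines of Python. This is only an illustration of the bit layout, not how the kernel or the hardware implements the walk. The function name split_linear_address is hypothetical, and the names PDPT, PD and PT for the three lower tables are Intel’s conventional ones rather than anything introduced in this post.

```python
INDEX_BITS = 9    # each table has 2**9 = 512 entries
OFFSET_BITS = 12  # 4 KB pages

# Table names from the top level down (5-level paging).
LEVEL_NAMES = ["PML5", "PML4", "PDPT", "PD", "PT"]

def split_linear_address(addr, levels=5):
    """Split a linear address into per-level table indexes and a page offset.

    levels=5 gives the 57-bit layout, levels=4 the older 48-bit one.
    Bits above the translated width (the sign-extension bits) are ignored.
    """
    width = levels * INDEX_BITS + OFFSET_BITS
    addr &= (1 << width) - 1
    offset = addr & ((1 << OFFSET_BITS) - 1)
    indexes = []
    for level in range(levels):
        # The top-level index sits in the highest bits, so shift the most first.
        shift = OFFSET_BITS + INDEX_BITS * (levels - 1 - level)
        indexes.append((addr >> shift) & ((1 << INDEX_BITS) - 1))
    return indexes, offset

if __name__ == "__main__":
    # Example address built by hand: index 1 in PML5, 2 in PML4, and so on.
    addr = (1 << 48) | (2 << 39) | (3 << 30) | (4 << 21) | (5 << 12) | 0x678
    indexes, offset = split_linear_address(addr)
    for name, index in zip(LEVEL_NAMES, indexes):
        print(f"{name} index: {index}")
    print(f"page offset: {offset:#x}")
```

Calling the same function with levels=4 drops the PML5 index and returns the familiar four indexes, which is essentially the only difference between the two modes as far as the address layout is concerned.
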
Since the kernel is always designed to be backward compatible, processors with only 4-level paging support can still be used with this 5-level scheme with the help of a relatively simple hack called the 5level fixup. Earlier kernels had the same kind of hack, named the 4level fixup, for processors that supported at most 3 levels of paging. The idea is to define macros that rewrite some code at compile time to fit the specific hardware model and requirements for which the kernel is compiled, for example substituting pud_alloc(...) with pgd throughout the kernel, and so on.

Okay, this was a short description of the recent change in Intel processors that is reflected in the latest version of the Linux kernel. You can find more information in the related official Intel documentation.

