Paging and Segmentation on x86-64 Architectures

 

All modern x86 processors support both paging and segmentation, although segmentation is no longer used the way it was in the old days. The reason is that paging alone handles every responsibility that used to belong to segmentation, and it manages memory more flexibly and effectively. In this post I focus on the Intel 64 architecture (AKA Intel 64 bit processors), as it’s ubiquitous nowadays. I also introduce the Linux kernel data structures that manage this design.

Each architecture, and even each execution mode of a processor, has its own way of implementing these two memory management techniques; for example, the different operating modes of Intel processors translate virtual addresses differently. What I’m going to explain pertains to the Intel 64 architecture and its IA-32e operating mode, which is the mode the processor operates in when a 64 bit operating system is running. The concepts of memory management are similar on other architectures; only the implementation differs. Intel 64 bit processors also have a “Legacy Protected” mode and a “Real” mode, which exist for backward compatibility. Remember that “protected mode” is not what the CPU is operating in while executing 64 bit code. Under a 64 bit operating system, 32 bit code runs in compatibility mode, which is a sub-mode of IA-32e rather than legacy protected mode.

Before explaining the details of address translation, you need to understand a few terms:
Logical address: The address that the running code asks to access. This is what a memory address means from the executing code’s point of view. For example, when your code wants to access a word in memory at address 0x3464f3, that is a logical address. (Strictly speaking, a logical address is this offset paired with a segment selector.)
Linear address: Address translation is done in two phases. The first phase resolves the appropriate segment and the second consults the page tables. The result of the first phase is called the “linear address”.
Physical address: The real address understood by main memory. It can be used directly to index the RAM.

Segmentation
Memory segmentation is not used anymore in modern x86-64 processors. Unfortunately the phrase “not used” does not tell the whole story, so it needs more explanation. First let’s see how segmentation worked in older Intel architectures. Segmentation means dedicating different areas of memory to different purposes. For example, one area can store program code, another can hold the program’s dynamic memory (the heap), and so on. These areas were traditionally called partitions and later, segments. When a program issues an address, the processor first needs to know which segment the address belongs to. The segment is determined by the machine instruction that is about to be executed. For example, for a “call” instruction the processor refers to the code segment; for “mov [eax],0” it refers to the data segment.

To find a segment in memory the processor needs its base address, because segments are scattered over memory and have no pre-defined location. The base address of each segment is stored in its segment descriptor, so the processor first needs to know where that descriptor resides in memory. For this job, a segment selector is used to point to the descriptor. The selector serves as an index into the Global Descriptor Table (GDT), and the address of the GDT is stored in a special CPU register called GDTR (Global Descriptor Table Register).
A segment selector has the following format:

 
|-------- 13 bits ---------| 1 bit | 2 bits |
            ^                  ^      ^
            '                  '      '
            '                  '       `-------RPL: Requester Privilege Level
            '                  `----- TI: Table Indicator
            `------------ Index to GDT

The table indicator specifies whether the segment descriptor is in the GDT or in an LDT (Local Descriptor Table). Usually one GDT is shared by all code, while each task may have its own LDT.
The RPL field holds either the privilege level of the currently running code (the Current Privilege Level, AKA CPL, when the selector sits in the CS register) or the privilege level the code requests when it supplies an explicit selector. Implicit selectors are kept in dedicated CPU registers (CS, DS, ES, SS, FS, GS), which lets the CPU find the respective descriptor faster by avoiding an extra memory access to load the selector from RAM.
RPL (or CPL) is very important and, as we will see later in the post, it is the only reason the processor still uses the “Code Segment register” in 64 bit mode. This value shows the current privilege level of the running code. The CPU has some inner workings for comparing the CPL with the RPL of the desired selector (which may be implicit or explicit) to moderate access to segments (data segments especially), but they are outside the scope of this article. Just remember what these 2 bits mean in the CS register. The two bits can hold values 0 through 3. Linux uses only 0 (kernel mode execution) and 3 (user mode execution).
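
To make the selector layout concrete, here is a small C sketch that pulls the three fields out of a 16 bit selector. The struct and function names are my own, chosen for illustration; they do not come from the Intel manuals or the kernel.

```c
#include <stdint.h>

/* Decoded fields of a 16 bit segment selector (illustrative names). */
struct selector_fields {
    uint16_t index; /* bits 3..15: index into the GDT or LDT   */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT                  */
    uint8_t  rpl;   /* bits 0..1: requested privilege level     */
};

/* Split a raw selector value into its index, TI and RPL fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 0x1);
    f.rpl   = (uint8_t)(sel & 0x3);
    return f;
}
```

For instance, decode_selector(0x33) gives index 6, TI 0 and RPL 3. 0x33 is the value typically found in CS for 64 bit user space code on Linux, so the RPL of 3 matches user mode execution.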

Now let’s put all this together. The first stage of translating a virtual address to a physical address works like this:

– The processor reads the address from the instruction that it is about to execute.
– The processor determines which segment the address belongs to. This is done either by using the default segment for the registers encoded in the instruction (for example when dereferencing the value in eax) or by an explicit segment override prefix, for example [CS:EAX].
– The processor reads the TI bit of the selector to decide which table to consult. In most cases it is 0, denoting the GDT. For this explanation we also assume the descriptor is in the GDT (everything is similar for the LDT).
– The processor loads the linear address (note that it is linear, not physical) of the GDT from the GDTR register.
– Each segment descriptor (as shown later) is 8 bytes long. So if the base address is A, the first segment descriptor occupies addresses A to A+7, the second one A+8 to A+15, and the entry with index n (counting from 0) occupies A+8*n to A+8*(n+1)-1. Given the index of the descriptor (read from the segment selector), the respective entry is therefore at address A+8*index in the linear address space. The processor loads the entry from main memory after resolving the corresponding physical address through the page tables, unless it is already cached in the hidden part of the segment register (the hidden part holds the descriptor to avoid going to memory to find it). Each segment descriptor has the following format: (scroll horizontally to see the full figure)

 
|--- 16 bits --- | --- 16 bits --- | --- 8 bits --- | --- 8 bits --- | --- 4bits --- | --- 4 bits --- | --- 8 bits --- |
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                '
        '               '                  '                '                '                '                `-----Base[24:31]
        '               '                  '                '                '                `-----Flags[8:11]
        '               '                  '                '                `-----Limit[16:19]
        '               '                  '                `-----Flags[0:7]
        '               '                  `---Base[16:23]
        '               `---Base[0:15]
        ` Limit[0:15]
Limit: The segment size
Base: The segment base address
Flags: The most important one is the "L" bit, which marks a code segment as containing 64 bit code (L=1) rather than 32 bit code (L=0).

– The processor reads the base address and treats the requested address (from the instruction) as an offset from this base. Before adding them, it checks whether the offset exceeds the limit. If it does, the translation stops with a general protection exception. Otherwise the sum of the base address and the offset is the linear address, which is the input to the next translation stage.
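
The steps above can be sketched in C roughly as follows. This is only a model of what the hardware does, under a few assumptions of mine: the 8 byte descriptor has already been read into a 64 bit little-endian integer, the limit is counted in bytes (granularity flag clear), and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear address of descriptor `index`, given the table base from GDTR. */
static uint64_t descriptor_address(uint64_t gdt_base, uint16_t index)
{
    return gdt_base + 8u * (uint64_t)index; /* each entry is 8 bytes */
}

/* Reassemble the 32 bit base that is scattered over the descriptor. */
static uint32_t descriptor_base(uint64_t desc)
{
    return (uint32_t)(((desc >> 16) & 0xFFFFFFu)        /* Base[0:23]  */
                    | (((desc >> 56) & 0xFFu) << 24));   /* Base[24:31] */
}

/* Reassemble the 20 bit limit. */
static uint32_t descriptor_limit(uint64_t desc)
{
    return (uint32_t)((desc & 0xFFFFu)                   /* Limit[0:15]  */
                    | (((desc >> 48) & 0xFu) << 16));    /* Limit[16:19] */
}

/* The "L" flag is bit 53 of the descriptor. */
static bool descriptor_is_64bit_code(uint64_t desc)
{
    return ((desc >> 53) & 1u) != 0;
}

/* First translation stage: limit check, then base + offset.
 * Returns false where the CPU would raise a general protection exception. */
static bool logical_to_linear(uint64_t desc, uint32_t offset, uint32_t *linear)
{
    if (offset > descriptor_limit(desc))
        return false;
    *linear = descriptor_base(desc) + offset;
    return true;
}
```

With a GDT base of 0x1000 and index 6, descriptor_address returns 0x1030. Feeding in a descriptor whose base is 0x12345678 and an offset of 0x100 yields the linear address 0x12345778, as long as the offset is within the limit.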

 
 ---------------------------            ------------------------------------------------                                       
| Segment selector [16bits] |          |      Address (offset into the segment)         |
 ---------------------------            ------------------------------------------------
     '                                                      '
     '                                                      '
     '           Desc. Table                                '
     '         ---------------                              '
     '        |               |                             '
     '        |               |                             '
     '        |               |  Seg. linear base address   '
     `---->   |Seg. Descriptor|-----------------------------+
              |               |                             '
              |               |                             '
              |               |                             '
               ---------------                              '
                                                            '
                                                     Linear address

Now this is the important part. In 64 bit mode, the base addresses of all segments (except sometimes GS and FS, which are not related to our explanation) are zero and the limit is not used. This means all segments overlap each other and cover the whole linear address space. In this case the logical address and the corresponding linear address in 64 bit mode (IA-32e) are equal:
Logical address + 0 = Linear Address => Logical address = Linear address
So what’s the point of having these segment descriptors in 64 bit mode? Only the CPL held in the CS register keeps them useful. By reading this value the processor knows the privilege level of the running code (if you ever asked yourself how kernel mode is separated from user mode at the hardware level, it is these two bits that do the job). The data segment plays no part in address translation in 64 bit mode.

Paging
As said before, the second stage of translation turns the linear address into a physical RAM address. This is done with the help of page tables. IA-32e mode uses 4 level paging, and any operating system kernel must respect this design, so it needs data structures to manage the 4 levels. The page tables are created and modified by the OS, while the translation itself is done in hardware. Each program page, if loaded, is mapped to a physical frame of the same size as the page. Let’s look at these 4 levels from the processor’s perspective, together with their corresponding kernel data structures.

 
                                           These two parts are omitted for 32 bit addressing
                                                                  ^
                                                     -----------------------------
                                                     '                           '
                                                     '                           '
                                                     '                           '
                                                     '                           '
                        9 bits                      9 bits                   9 bits                     9 bits          12 bits 
Linux Naming:  |  Page Global Directory  |   Page Upper Directory   |   Page Middle Directory    |    Page Table    |    OFFSET   |
Intel Naming:  |         PML4            |      Directory Ptr       |          Directory         |       Table      |    OFFSET   |
                                                                                                          '                '
                                                                                                          '
                                                                                                          '
                                                                                                          '
                                                                                                          ---------------|---------  =>  52bit 
                                                                                                        40 bits
Kernel types for the page levels (each holds one entry of that level):
    pgd_t: Page global directory entry
    pud_t: Page upper directory entry
    pmd_t: Page middle directory entry
    pte_t: Page table entry

These types are defined in header files under the /arch/x86/include/asm directory. For example, consider pgd_t. It is defined in the header file pgtable_types.h (line 258 at the time of writing; the exact line varies between kernel versions) as:
typedef struct { pgdval_t pgd; } pgd_t;
In turn, pgdval_t is defined in pgtable_64_types.h (line 16):
typedef unsigned long pgdval_t;
So these entries are 64 bits wide, and you can probably tell from the schema above that each page is 4 KB in size (the 12 bit offset indexes into a page, and 2^12 = 4096). I should note that only 48 bits of a 64 bit linear address are used for translation. The remaining upper bits are not translated; they must be a sign extension of bit 47. It may be interesting, even strange, to learn that the physical address can be at most 52 bits wide. (Of course today’s computers have far less RAM than would need 52 bits to address, so a considerable part of this 52 bit address goes unused.)
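
As a rough illustration of how the 48 translated bits break down, the following C sketch splits a linear address into the four 9 bit indices and the 12 bit offset, and checks the sign extension rule. The struct, its field names and the function names are hypothetical; the kernel has its own macros for this.

```c
#include <stdbool.h>
#include <stdint.h>

/* The five fields of a 48 bit linear address (illustrative names). */
struct linear_parts {
    unsigned pgd;    /* A[39:47]: index into the Page Global Directory (PML4) */
    unsigned pud;    /* A[30:38]: index into the Page Upper Directory         */
    unsigned pmd;    /* A[21:29]: index into the Page Middle Directory        */
    unsigned pte;    /* A[12:20]: index into the Page Table                   */
    unsigned offset; /* A[0:11]:  byte offset inside the 4 KB page            */
};

static struct linear_parts split_linear(uint64_t a)
{
    struct linear_parts p;
    p.pgd    = (unsigned)((a >> 39) & 0x1FF); /* 9 bits -> 512 entries */
    p.pud    = (unsigned)((a >> 30) & 0x1FF);
    p.pmd    = (unsigned)((a >> 21) & 0x1FF);
    p.pte    = (unsigned)((a >> 12) & 0x1FF);
    p.offset = (unsigned)(a & 0xFFF);         /* 12 bits -> 4096 bytes */
    return p;
}

/* True when bits 48..63 are all copies of bit 47 (sign extended). */
static bool is_sign_extended48(uint64_t a)
{
    uint64_t top = a >> 47; /* bits 47..63, 17 bits in total */
    return top == 0 || top == 0x1FFFF;
}
```

For example, the address 0xffffffff80000000 splits into indices 511, 510, 0, 0 with offset 0 and passes the sign extension check, while 0x0000800000000000 fails it because bit 47 is set but the upper bits are clear.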

Let’s sum up the second stage of translation in schematic form:


A=Linear address[64bit] ------------------------------------------------------------------------------------------------------------------------
                    '                     '                             '                        '                                              '
                    ' A[39:47]            ' A[30:38]                    ' A[21:29]               ' A[12:20]                                     ' A[0:11] 
                    '                     '                             '                        '                                              '
                    '                     '                             '                        '                                              '
                    '                     '                             '                        '                                              '
                    '                     '                             '                        '                                              '
                    '                     '                             '                        '       40 bits physical frame address         '
                 |PML4|-----------> |Directory Pointer|------------> |Directory|------------->|Table|-------------------------------------------+
                                                                                                                                                '
                                                                                                                                                '
                                                                                                                                                '
                                                                                                                                                / 
                                                                      -------------------------------------------------------------------------
                                                                      '
                                                                      '
                                                                      '
                                                       Physical Address [52 bits max]
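
The walk in the schema can be mimicked in software. The sketch below is a toy model, not kernel code: it keeps the tables in a small array standing in for physical memory, only looks at the present bit and the 40 bit frame address of each entry, and ignores large pages and permission bits. On real hardware the physical address of the top level table comes from the CR3 register; here it is simply passed in. All names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT     0x1ULL                /* bit 0: entry is valid            */
#define ENTRY_ADDR_MASK   0x000FFFFFFFFFF000ULL /* bits 12..51: next table or frame */
#define TOY_FRAMES        8
#define ENTRIES_PER_TABLE 512

/* Toy "physical memory": 8 frames of 4 KB, viewed as 64 bit entries. */
static uint64_t toy_ram[TOY_FRAMES * ENTRIES_PER_TABLE];

/* Store one entry in the table located at physical address table_phys.
 * The caller must keep table_phys inside the toy RAM. */
static void write_entry(uint64_t table_phys, unsigned index, uint64_t value)
{
    toy_ram[table_phys / 8 + index] = value;
}

/* Load one entry; anything outside the toy RAM reads as "not present". */
static uint64_t read_entry(uint64_t table_phys, unsigned index)
{
    uint64_t slot = table_phys / 8 + index;
    if (slot >= (uint64_t)TOY_FRAMES * ENTRIES_PER_TABLE)
        return 0;
    return toy_ram[slot];
}

/* Walk the 4 levels for `linear`, starting at the top level table.
 * Returns false where real hardware would raise a page fault. */
static bool walk(uint64_t root_phys, uint64_t linear, uint64_t *phys)
{
    uint64_t table = root_phys;
    for (int shift = 39; shift >= 12; shift -= 9) {
        unsigned index = (unsigned)((linear >> shift) & 0x1FF);
        uint64_t entry = read_entry(table, index);
        if (!(entry & ENTRY_PRESENT))
            return false;
        table = entry & ENTRY_ADDR_MASK; /* next table, or the final frame */
    }
    *phys = table | (linear & 0xFFF);    /* frame address + 12 bit offset */
    return true;
}
```

To try it, place the four tables at toy addresses 0x0000, 0x1000, 0x2000 and 0x3000 with write_entry, point the last entry at a frame such as 0xABCDE000, and call walk with a linear address whose indices select those entries. The result is the frame address with the low 12 bits of the linear address copied in.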

 

Okay that’s it, I try to make my posts short but rich. Feel free to ask any question if you need more details. Thanks for reading!
