I explained in previous posts how paging works at the kernel level. If you have read them, you may remember an important feature of the way mainstream operating systems manage page tables: kernel space is always accessible, irrespective of which top-level page table (the page global directory) is currently loaded in the CR3 register, or in other words, regardless of which address space is in use at any given moment. This avoids the immense overhead of switching to a separate kernel address space every time the CPU runs kernel code or accesses kernel data structures.
Unfortunately, this important feature has disappeared in an emergency update released this week. The kernel developers have completely separated kernel memory from all user processes, out of fear of kernel addresses being exposed to user-space programs. It has been reported that, because of a hardware-level vulnerability, user-space programs can infer (and perhaps tamper with) the kernel's protected memory. No working proof of concept for the related attacks has (unsurprisingly) been disclosed. However, it is said that resolving the issue properly requires a hardware-level change.
There is not even any documentation about the change yet, and I am too busy right now to study the code in depth, but a quick look at the source shows that “Page Table Isolation” is enabled if the early kernel code detects that the CPU is considered “insecure”. There is a new definition for this in the file “/arch/x86/include/asm/cpufeatures.h”, named “X86_BUG_CPU_INSECURE”:
...
#define X86_BUG_ESPFIX        X86_BUG(9)  /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */
#endif
#define X86_BUG_NULL_SEG      X86_BUG(10) /* Nulling a selector preserves the base */
#define X86_BUG_SWAPGS_FENCE  X86_BUG(11) /* SWAPGS without input dep on GS */
#define X86_BUG_MONITOR       X86_BUG(12) /* IPI required to wake up remote CPU */
#define X86_BUG_AMD_E400      X86_BUG(13) /* CPU is among the affected by Erratum 400 */
#define X86_BUG_CPU_INSECURE  X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */
...
You can see the definition above. Apparently, Intel x86 processors are the most affected by the vulnerability, but I am not quite sure how the situation looks for AMD x86 processors. Two papers have been published about the issue: “Meltdown” and “Spectre”. The papers suggest that by exploiting these two vulnerabilities it is possible to read the entire contents of system RAM from user space.
With all that said, from what I have read, this patch appears to be very expensive in terms of performance, and the reason is exactly the one that kept kernel memory from being isolated in the first place: the kernel now has to switch page tables every time the CPU moves between user space and kernel space, which happens hundreds or even thousands of times each second. Microsoft has also been reported to be doing the same for its Windows operating systems.
We will have to wait until a proper resolution to this important issue is proposed and implemented, whether at the hardware or the software level. For now, all we get from this patch is a slower and more complex kernel.