Practical Timing Side Channel Attacks Against Kernel Space ASLR

Practical Timing Side Channel Attacks Against Kernel Space ASLR Fall 2014 Instructor: Kun Sun, Ph.D.

Outline • Background • Side Channel Attack • Kernel Space ASLR • Use timing-based side channel to defeat kernel space ASLR (A heavy engineering work) • How to prevent/mitigate this side channel attack?


Side Channel • Side channel vs. covert channel • Side channel: information leaks incidentally and unintentionally • Covert channel: colluding parties communicate intentionally • A side channel is a communication channel based on the use of system resources not normally intended for communication between the subjects (processes) in the system

Side Channel Types • Computer systems abound with side channels • Storage Channels • use system variables and attributes (other than time) to signal information • classic example is resource exhaustion channel • Timing Channels • vary the amount of time required to complete a task to signal information • classic example is load sensing channel • Side channels are typically noisy but information theory techniques can be used to achieve error-free communication

Resource Exhaustion Channel • Given a 5 MB pool of dynamically allocated memory • High process • bit = 1 → request 5 MB of memory • bit = 0 → request 0 MB of memory • Low process • request 5 MB of memory • if allocated then bit = 0, otherwise bit = 1
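The protocol on this slide can be sketched as a toy simulation. This is only an illustration under stated assumptions: `MemoryPool`, `send_bit`, `receive_bit`, and `transmit` are hypothetical names, and the "pool" is a plain counter rather than a real allocator.

```python
POOL_MB = 5  # shared pool size taken from the slide


class MemoryPool:
    """Toy shared allocator (hypothetical): a fixed pool both processes draw from."""

    def __init__(self, size_mb):
        self.free_mb = size_mb

    def request(self, mb):
        """Return True if the allocation succeeds, False if the pool is exhausted."""
        if mb <= self.free_mb:
            self.free_mb -= mb
            return True
        return False

    def release(self, mb):
        self.free_mb += mb


def send_bit(pool, bit):
    """High process: hold the whole pool to signal 1, hold nothing to signal 0."""
    held = POOL_MB if bit == 1 else 0
    pool.request(held)
    return held


def receive_bit(pool):
    """Low process: try to grab the whole pool; a failure means the sender holds it."""
    if pool.request(POOL_MB):
        pool.release(POOL_MB)
        return 0
    return 1


def transmit(bits):
    """Send each bit through the pool and collect what the low process decodes."""
    pool = MemoryPool(POOL_MB)
    received = []
    for bit in bits:
        held = send_bit(pool, bit)
        received.append(receive_bit(pool))
        pool.release(held)  # sender frees its memory before the next bit
    return received
```

The sketch assumes the two processes take strict turns; a real channel would need some synchronization and would be noisy, as the earlier slide notes.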

Load Sensing Channel • High process • bit = 1 → enter a computation-intensive loop • bit = 0 → go to sleep • Low process • perform a task with known computational requirements • if completed quickly then bit = 0, otherwise bit = 1
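A deterministic toy model of this channel is sketched below. The work size, the threshold, and the assumption that a busy high process halves the CPU share of the low process are all made up for illustration; a real receiver would measure wall-clock time and have to cope with noise.

```python
TASK_WORK = 100   # work units the low process's task needs (assumed value)
THRESHOLD = 150   # elapsed ticks above which the low process decodes a 1 (assumed value)


def elapsed_ticks(work, high_is_busy):
    """Toy single-CPU round-robin scheduler: while the high process spins,
    the low process gets only every other tick, so its task takes twice as long."""
    return work * 2 if high_is_busy else work


def receive_bit(high_is_busy):
    """Low process: run the known task and compare its duration to the threshold."""
    return 1 if elapsed_ticks(TASK_WORK, high_is_busy) > THRESHOLD else 0


def transmit(bits):
    """High process spins for a 1 and sleeps for a 0; return the decoded bits."""
    return [receive_bit(high_is_busy=(bit == 1)) for bit in bits]
```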

Mitigating Side Channels • Audit for storage-based channels • Time normalization for timing-based channels • Slow the channel down to reduce the transmission data rate

ASLR Deployment • Windows: user space and kernel space ASLR since Vista • PaX patches in Linux (not in kernel) • Mac OS X since 10.5 • Mobile OSes including Android and iOS

User Space ASLR vs. Kernel Space ASLR • Shacham et al. showed that user mode ASLR on 32-bit architectures leaves only 16 bits of randomness, which is not enough to withstand brute-force attacks. • However, such brute-force attacks are not applicable to kernel space ASLR. • When an attacker exploits a vulnerability in kernel code, a wrong offset crashes the whole system, so the attacker has only one attempt to perform an exploit.

Windows Kernel Space ASLR • Applies to both 32-bit and 64-bit Vista, Windows 7, and Windows 8. • Details obtained by reverse engineering • During the boot process, the Windows loader allocates kernel_region for the kernel image and the hardware abstraction layer (HAL) • The entropy is 6 bits: 32 × 2 = 64 = 2^6

Timing-based Side Channel Attack • Goal: show how a local attacker with restricted rights can mount a timing-based side channel attack against the memory management system to deduce information about the layout of the privileged, ASLR-protected kernel space.

Attack Model • The system is protected by ASLR. • A local adversary can run user-space applications but cannot execute privileged kernel mode code.

Challenges in Exploiting the Attack • Timing attacks against a modern CPU are complicated. • Performance optimizations interfere: hardware prefetching, speculative execution, multi-core architectures, and branch prediction. • This work targets modern CPUs and recent OSes. • Previous work focused on older processors without such optimizations.

TLB and Cache • Translation lookaside buffer (TLB): caches virtual-to-physical address translations • Cache: virtually indexed, physically tagged http://www.cs.princeton.edu/courses/archive/fall04/cos471/lectures/19-VirtualMemory.pdf

Intel i7 Memory Architecture plus Clock Latency

General Approach • Two steps: • Set the system in a specific state from user mode. • Measure the duration of a certain memory access operation. The time span of this operation then (possibly) reveals certain information about the kernel space layout. • One example: • Cache a system call handler by executing the system call. • Access a designated set of user space addresses and execute the system call again. • If the system call takes longer than expected, the accesses to the user space addresses evicted the system call handler code, which reveals address information about the system code.
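The evict-and-time example above can be sketched with a small simulated cache. Everything here is an assumption chosen for illustration: the tiny 8-set, 2-way LRU cache, the made-up hit and miss latencies, and the hypothetical names `Cache`, `probe`, and `find_handler_set`. Real hardware is far noisier and uses more complex indexing.

```python
from collections import OrderedDict

LINE = 64           # cache line size in bytes
SETS = 8            # toy cache: 8 sets (assumed)
WAYS = 2            # 2-way set associative (assumed)
HIT, MISS = 1, 10   # made-up latencies


class Cache:
    """Tiny set-associative cache with LRU replacement."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]

    def access(self, addr):
        """Touch addr and return the simulated latency."""
        line = addr // LINE
        entries = self.sets[line % SETS]
        tag = line // SETS
        if tag in entries:
            entries.move_to_end(tag)     # mark as most recently used
            return HIT
        if len(entries) >= WAYS:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return MISS


def probe(cache, handler_addr, candidate_set):
    """Return True if filling candidate_set slows down the 'system call'."""
    cache.access(handler_addr)           # step 1: cache the handler by calling it
    for i in range(WAYS):                # step 2: touch user lines in one set
        user_addr = 0x100000 + (i * SETS + candidate_set) * LINE
        cache.access(user_addr)
    return cache.access(handler_addr) == MISS  # slow call means handler was evicted


def find_handler_set(handler_addr):
    """Try every cache set and report those whose eviction slows the call."""
    cache = Cache()
    return [s for s in range(SETS) if probe(cache, handler_addr, s)]
```

For a handler at the hypothetical address `0x80000140`, only the set that the handler maps to shows a slow second call, which is the information leak the slide describes.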

Handling Noise • Circumvent performance optimizations • E.g., test memory in non-consecutive order to circumvent memory prefetching. • OS scheduler, shared cache • The two steps require 5,000 clock cycles • OS time slices last several ms • This guarantees that some measurements run with no interrupts between the two steps.
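One common way to act on these points is sketched below: repeat each measurement and keep the minimum (an interrupt can only add time), and probe addresses in shuffled order so a prefetcher sees no regular stride. Keeping the minimum is an assumption of this sketch, not a procedure stated on the slide, and `make_noisy_timer` is a hypothetical stand-in for a real cycle-accurate measurement.

```python
import random


def robust_latency(measure, repeats=100):
    """Repeat a measurement and keep the minimum; interrupted runs are discarded
    implicitly because an interrupt can only make a run slower."""
    return min(measure() for _ in range(repeats))


def probe_order(addresses, seed=None):
    """Return the addresses in shuffled (non-consecutive) order."""
    order = list(addresses)
    random.Random(seed).shuffle(order)
    return order


def make_noisy_timer(base_cycles, penalty_cycles, seed=0):
    """Hypothetical measurement: base cost plus an occasional interrupt penalty."""
    rng = random.Random(seed)

    def measure():
        interrupted = rng.random() < 0.3  # assumed interrupt probability
        return base_cycles + (penalty_cycles if interrupted else 0)

    return measure
```

A usage sketch: `robust_latency(make_noisy_timer(5000, 200000))` recovers the uninterrupted cost as long as at least one of the repeated runs was not interrupted.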

Three Attacks • Attacks on L1/L2/L3 caches and TLB/PS caches • Cache probing • Double page fault • Cache preloading

Cache Probing • Based on multiple memory addresses that map into the same cache set and compete for its available slots • n-way set associative mode: • all available slots are grouped into sets of size n, and each memory chunk can be stored in any slot of one particular set. • E.g., with 32-bit addresses, a typical 8 MB L3 cache is 16-way set associative. It consists of (8,192 × 1,024)/64 = 131,072 single slots that are grouped into 131,072/16 = 8,192 different sets. • Requires reverse-engineering the Sandy Bridge hash function used for parallel L3 access among multiple cores
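The arithmetic on this slide, plus a way to find addresses that compete for one set, can be sketched as follows. The sketch assumes plain modulo indexing on the line number and deliberately ignores the Sandy Bridge hash function mentioned above; `set_index` and `colliding_addresses` are hypothetical helper names.

```python
CACHE_BYTES = 8192 * 1024   # 8 MB L3 cache, as on the slide
LINE_BYTES = 64             # cache line size
WAYS = 16                   # 16-way set associative

SLOTS = CACHE_BYTES // LINE_BYTES   # number of single slots (cache lines)
SETS = SLOTS // WAYS                # number of sets


def set_index(phys_addr):
    """Set selected by an address under plain modulo indexing (assumption:
    no slice hash, unlike real Sandy Bridge L3 caches)."""
    return (phys_addr // LINE_BYTES) % SETS


def colliding_addresses(phys_addr, count=WAYS):
    """Addresses that map to the same set as phys_addr: they differ by a
    multiple of SETS * LINE_BYTES, so `count` of them can fill the whole set."""
    stride = SETS * LINE_BYTES
    return [phys_addr + i * stride for i in range(1, count + 1)]
```

Accessing the 16 addresses returned by `colliding_addresses` would, in this simplified model, evict whatever previously occupied that set.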

Double Page Fault • Based on an observation about Intel CPUs • When the page translation succeeds but the access permission check fails (e.g., when kernel space is accessed from user mode), a TLB entry is still created. • Does not work on AMD CPUs • Can be easily fixed by a hardware redesign.

Double Page Fault • First page fault: for each kernel space page p, access p from user mode; the resulting page fault is handled by the operating system and forwarded to the exception handler of the process. • When p refers to an allocated page, the MMU creates a TLB entry for p although the subsequent permission check fails. • When p refers to an unallocated page, the translation fails, so the MMU does not create a TLB entry for p. • Second page fault: access p again; if p refers to an allocated kernel page, the page fault is delivered faster due to the TLB hit.
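The two-fault procedure can be sketched against a toy MMU model. The latencies are made up, `ToyMMU` is a hypothetical class that only encodes the Intel behavior described on these slides, and the set of allocated pages is ground truth that a real attacker cannot see and must infer from timing.

```python
FAST, SLOW = 100, 150   # made-up page fault latencies in cycles


class ToyMMU:
    """Hypothetical model: a TLB entry is created whenever translation succeeds,
    even if the permission check then fails."""

    def __init__(self, allocated_pages):
        self.allocated = set(allocated_pages)   # hidden kernel allocation map
        self.tlb = set()

    def user_access(self, page):
        """User-mode access to a kernel page: always faults; return the latency."""
        if page in self.tlb:
            return FAST                 # translation served from the TLB
        if page in self.allocated:
            self.tlb.add(page)          # translation succeeded, entry is cached
        return SLOW                     # a full page walk was needed


def double_page_fault(mmu, page):
    """Fault twice on the same page and return the latency of the second fault."""
    mmu.user_access(page)               # first fault: may populate the TLB
    return mmu.user_access(page)        # second fault: fast only if allocated


def allocation_map(mmu, pages):
    """Pages whose second fault was fast, i.e. pages inferred to be allocated."""
    return [p for p in pages if double_page_fault(mmu, p) == FAST]
```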

Cache Preloading • The double page fault attack estimates the allocation map of the kernel space but does not determine at which concrete base addresses the kernel and drivers are loaded. • 3 steps in cache preloading: • Flush all caches (i.e., address translation and instruction/data caches) to start with a clean state. • Preload the address translation caches by indirectly calling into kernel code, for example by issuing a sysenter operation. • Intentionally generate a page fault by jumping to some kernel space address and measure the time. If the faulting address lies in the same memory range as the preloaded kernel memory → a shorter time
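The three steps can be sketched with a simulated translation cache. This is a toy model with made-up latencies; `ToyTranslationCache` and `locate_handler` are hypothetical names, and `handler_pages` stands for the pages the system call touches, which only the simulation knows.

```python
FAST, SLOW = 100, 150   # made-up page fault latencies in cycles


class ToyTranslationCache:
    """Hypothetical stand-in for the TLB and paging-structure caches."""

    def __init__(self):
        self.entries = set()

    def flush(self):
        self.entries.clear()

    def preload(self, pages):
        self.entries.update(pages)

    def fault_latency(self, page):
        """Latency of a deliberately triggered fault on the given kernel page."""
        return FAST if page in self.entries else SLOW


def locate_handler(cache, handler_pages, candidate_pages):
    """Run flush, preload, and timed fault for each candidate page."""
    hits = []
    for page in candidate_pages:
        cache.flush()                           # step 1: start from a clean state
        cache.preload(handler_pages)            # step 2: stand-in for a system call
        if cache.fault_latency(page) == FAST:   # step 3: timed page fault
            hits.append(page)
    return hits
```

In this model the candidates with a fast fault are exactly the pages that the preloading call pulled into the translation caches.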

Testbed • Tested on 32-/64-bit systems running Windows 7/Linux • Tested on different CPUs + VM: • Intel i7-870 (Nehalem/Lynnfield, Quad-Core) • Intel i7-950 (Nehalem/Bloomfield, Quad-Core) • Intel i7-2600 (Sandy Bridge, Quad-Core) • AMD Athlon II X3 455 (Triple-Core) • VMware Player 4.0.2 on Intel i7-870 (with VT-x)

Results

Outline • Background • Side Channel Attack • Kernel Space ASLR • Use timing-based side channel to defeat kernel space ASLR (A heavy engineering work) • How to prevent/mitigate this side channel attack?

Mitigation Approaches • Root cause: the memory hierarchy is a shared resource • Also, not enough randomization: only 64 different slots for the Windows kernel • Option one: split caches for user and kernel. • Shortcoming: checking overhead, reduced maximum cache size • Option two: suppress the creation of TLB entries on successful address translation if an access violation happens • Option three: timing normalization; modify the execution time of the page fault handler
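Option three can be illustrated with a minimal sketch: pad every page fault up to a fixed budget so that fast and slow faults become indistinguishable. The latencies and the padding rule are assumptions for illustration, and the function names are hypothetical; this is not a description of any real kernel's fault handler.

```python
FAST, SLOW = 100, 150   # made-up page fault latencies in cycles


def raw_fault_latency(page, cached_pages):
    """Unmitigated behavior: faults on cached translations are delivered faster."""
    return FAST if page in cached_pages else SLOW


def normalized_fault_latency(page, cached_pages, budget=SLOW):
    """Pad each fault to at least `budget` cycles (assumed to cover the worst case)."""
    return max(raw_fault_latency(page, cached_pages), budget)
```

With the budget set to the worst-case latency, an attacker timing faults on any set of pages observes a single value, at the cost of making every fault as slow as the slowest one.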

Future Work • Test Mac OS X • Test Android • Obtain the physical address of a certain memory location from user mode.

Questions?
