Software fault isolation with API integrity and multi-principal modules

Yandong Mao, Haogang Chen (MIT CSAIL), Dong Zhou (Tsinghua University IIIS), Xi Wang, Nickolai Zeldovich, Frans Kaashoek (MIT CSAIL)

Kernel security is important
• The kernel is fully privileged
• Kernel compromises are devastating
  • A remote attacker takes control of the whole machine
  • A local user gains root privilege

The Linux kernel is vulnerable
• Vulnerabilities in Linux are routinely discovered
  • CVE 2010: 145 vulnerabilities in the Linux kernel
• Many exploits attack kernel modules
  • 67% of the Linux kernel vulnerabilities in CVE 2010 are in modules
• This talk focuses on vulnerabilities in kernel modules

Threat
• A module programmer makes a mistake
• An attacker exploits the mistake to mount an attack
  • Example: a buffer overflow is used to set the current UID to root
[Figure: the module writes beyond module memory into kernel memory and overwrites the UID: privilege escalation!]

One approach: type-safe languages
• Write the kernel and modules in Java or C#
• No reference to the UID object => the module cannot directly change the UID
  • The attacker cannot synthesize references
• But most kernels are not written in a type-safe language!

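Software Fault Isolation (SFI [SOSP93])
• SFI inserts a check before the module's memory writes; the module cannot bypass it:
    char *p = (char *)0xf7;
    sfi_check_memory(p);
    *p = 0;
• SFI runtime:
    void sfi_check_memory(void *p) {
        if (!in_module_memory(p))   /* p not in "Module memory" */
            stop_module();
    }
• The UID lies outside module memory, so the module cannot overwrite it directly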

Memory safety is insufficient to stop attacks!
• Challenge: the module needs to call kernel functions
• Core kernel:
    void spin_lock_init(spinlock_t *lock) { lock->v = 0; }
• spin_module:
    spinlock_t mylock;
    spin_lock_init(&mylock);

Problem: API abuse
• The attacker tricks fully privileged kernel code into overwriting the UID
• Core kernel:
    void spin_lock_init(spinlock_t *lock) { lock->v = 0; }
• Compromised spin_module:
    spin_lock_init((spinlock_t *)&cur_proc->uid);
• spin_lock_init stores 0 into the UID, and UID 0 is root: privilege escalation!

Challenge: lack of API integrity
• Kernel APIs are not written defensively
  • They assume the calling module obeys implicit rules
  • They do not check arguments, permissions, etc.
• Problem: modules cannot be trusted to follow the rules
  • A module can trick the kernel into performing unexpected actions
• An ideal system would enforce the rules for kernel APIs
  • Analogy: system call code assumes nothing about its caller and checks every assumption

State of the art for protecting APIs
• SFI [SOSP93]: memory safety only
• XFI [OSDI06]: no argument checks
• BGI [SOSP09]: manually wrap functions to make the kernel defensive when kernel code invokes callbacks
  • Error-prone and time-consuming
  • Works only if kernel code is well structured (Linux is not)

Our approach: an annotation language
• Helps enforce two types of API integrity:
  • Argument integrity: the programmer controls which arguments a module can pass to functions
  • Callback integrity: the kernel invokes a callback only if the module could have invoked it directly
• Lets programmers specify principals for privilege separation within a module
• Less error-prone than manual wrapping, and applicable to complex APIs such as those in Linux

Contributions
• LXFI: a software fault isolation system for Linux kernel modules
• An annotation language for:
  • Argument integrity
  • Callback integrity
  • Privilege separation within a module
• Evaluation:
  • Few annotations needed for 10 Linux kernel modules
  • Stops three real exploits
  • 2-4X CPU overhead for netperf

Goals for the annotation language
• Enforce argument integrity, callback integrity, and privilege separation within a module
• Minimize programmer effort, e.g.:
  • Few annotations
  • Avoid data structure and API changes
  • Compatible with C

Preventing module exploits
• Compile time:
  • The programmer annotates the core kernel
  • LXFI translates the annotations into runtime checks, using compiler plugins
  • Safe default: reject a module if it calls an unannotated API
• Runtime:
  • LXFI performs the checks, consulting a dynamic table of capabilities for each module
• If the annotations capture all implicit rules, a compromised module cannot violate them to gain additional privileges

Design of the annotation language
• Argument integrity annotations
  • Illustrated with the spin_lock_init example
• Callback integrity annotations
  • Not discussed here; see the paper
• Privilege separation annotations
  • Illustrated with dm_crypt (a real Linux kernel module)

Enforcing argument integrity
• Protecting spin_lock_init requires three annotations (on spin_lock_init, kmalloc, and kfree)

Example: enforcing argument integrity for spin_lock_init
• Core kernel annotation:
    void spin_lock_init(spinlock_t *lock)
        pre(check(write(lock, sizeof(spinlock_t))));
• spin_module's capability table: write(mylock, 8)
• Checks performed by the LXFI runtime before each call:
    lxfi_check_write(mylock, 8);                   /* passes */
    spin_lock_init(mylock);
    ...
    lxfi_check_write(&cur_proc->uid, 8);           /* fails: no capability */
    spin_lock_init((spinlock_t *)&cur_proc->uid);
• Privilege escalation prevented

Where does the capability come from?
• It is granted on allocation
• This takes two more annotations

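Example: granting the spinlock capability
• Core kernel annotation:
    void *kmalloc(size_t size, gfp_t flags)
        post(copy(write(return, size)));
• spin_module, with the grant performed by the LXFI runtime after kmalloc returns:
    spinlock_t *mylock = kmalloc(8, GFP_KERNEL);
    lxfi_copy_write(mylock, 8);
    ...
• spin_module's capability table now contains write(mylock, 8)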

What happens when memory is freed?
• The capability must be revoked so the memory can be reused safely
• Strawman: revoke the capability from the caller
  • Insufficient! Other modules may hold copies of the capability
• Goal: after the free, no copies of the capability remain

Example: safely freeing a spinlock
• Core kernel annotation:
    void kfree(void *p)
        pre(transfer(write(p, no_size)));
• spin_module, with the transfer performed by the LXFI runtime before kfree runs:
    lxfi_transfer_write(mylock, -1);   /* -1: no_size */
    kfree(mylock);
    ...
• Both spin_module's and other_module's capability tables held write(mylock, 8); the transfer revokes it from both

Why can spin_module call spin_lock_init, kmalloc, and kfree?
• Call capabilities
  • Granted initially according to the module's symbol table
    • We trust the module author not to import unnecessary functions
  • Granted dynamically when a callback function is passed

Putting it together: spin_module
• Core kernel annotations:
    void *kmalloc(size_t size, gfp_t flags)
        post(copy(write(return, size)));
    void spin_lock_init(spinlock_t *lock)
        pre(check(write(lock, sizeof(spinlock_t))));
    void kfree(void *p)
        pre(transfer(write(p, no_size)));
• spin_module's capability table: call(kmalloc), call(spin_lock_init), call(kfree)
• spin_module, with the checks performed by the LXFI runtime:
    spinlock_t *mylock = kmalloc(8, GFP_KERNEL);
    lxfi_copy_write(mylock, 8);
    ...
    lxfi_check_write(mylock, 8);                   /* passes */
    spin_lock_init(mylock);
    ...
    lxfi_check_write(&cur_proc->uid, 8);           /* fails */
    spin_lock_init((spinlock_t *)&cur_proc->uid);
    ...
    lxfi_transfer_write(mylock, -1);
    kfree(mylock);

No way for a compromised spin_module to gain root privilege
• SFI ensures memory safety
• Call capabilities ensure only 3 functions can be called
• None of these functions can modify the UID, because:
  • kmalloc never modifies allocated memory
  • spin_lock_init can only be called with writable memory (from kmalloc)
  • kfree ensures no capabilities remain after the free
• spin_module cannot modify the UID!

Privilege separation within a module
• dm_crypt: a transparent encryption service for block devices
• This example requires a third type of capability: ref(t, a), which allows passing argument a as type t

Privilege separation: writing through dm_crypt
• User space: write("/etc/secret.txt", "foo")
• Kernel space: the core kernel hands the request to dm_crypt: write(enc_disk, "foo", …)
• Core kernel annotation:
    int bdev_write(struct block_device *dev, const char *data, …)
        pre(check(ref(block_device), dev));
• dm_crypt's capability table: ref(block_device, enc_disk->bdev)
  • Writing to a block device does not require write access to the memory of enc_disk->bdev, so a ref capability suffices
• LXFI runtime check before dm_crypt writes the encrypted data:
    lxfi_check_ref(block_device, enc_disk->bdev);
    bdev_write(enc_disk->bdev, E("foo"), …);

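Privilege separation: the attack
• User space: read(…) from the encrypted USB device
• Malicious data from the USB device exploits dm_crypt's decryption code
• Core kernel annotation (as before):
    int bdev_write(struct block_device *dev, const char *data, …)
        pre(check(ref(block_device), dev));
• With a single capability table, dm_crypt holds both ref(block_device, enc_disk->bdev) and ref(block_device, enc_usb->bdev), so the check passes:
    lxfi_check_ref(block_device, enc_disk->bdev);
    bdev_write(enc_disk->bdev, "/etc/pwd", "foo");
• Result on the disk: /etc/pwd: rootpwd=foo
• Fix: give each instance its own capability table, one holding ref(block_device, enc_disk->bdev) and the other holding ref(block_device, enc_usb->bdev)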

How to define principals
• Associate a principal with every instance a module supports (e.g., each block device in dm_crypt)
• Problem: how to specify and name principals?
  • Recall the goal: minimize changes to existing data structures
• Idea: reuse the address of a data structure as the principal's name
  • The principal can typically be identified from one of the function arguments

Specifying principals

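Privilege separation with principals
• Core kernel annotation: the map callback runs as the principal named by its argument
    struct dm_type {
        int (*map)(struct dm_target *di) principal(di);
    };
• Before invoking the callback, the LXFI runtime switches principals:
    lxfi_set_princ(enc_usb);
    dm_crypt.map(enc_usb);
• dm_crypt has one capability table per principal: write(enc_disk->bdev, 100) for enc_disk and write(enc_usb->bdev, 100) for enc_usb
• Compromised decryption code running as enc_usb fails the check:
    lxfi_check_write(enc_disk->bdev, 100);   /* fails */
    bdev_write(enc_disk->bdev, "/etc/pwd", "foo");
• The disk never receives /etc/pwd: rootpwd=foo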

Principal name aliasing
• Problem: the kernel may identify one LXFI principal by multiple addresses
  • e1000_probe is called with pcidev, but e1000_xmit is called with the net_device
• Solution: insert code into the module to create an alias
  • The same principal then has multiple names
    int e1000_probe(struct pci_dev *pcidev) {
        struct net_device *ndev = alloc_etherdev(...);
        ndev->pcidev = pcidev;
        lxfi_princ_alias(pcidev, ndev);
        ...
    }
    int e1000_xmit(struct net_device *dev) { … }

Other annotation language features
• Save annotation effort for complex objects that need multiple capabilities
• Express conditional actions, such as granting a privilege only if the return value indicates success
• Special principals:
  • Global: a principal with full privilege
  • Shared: a principal with minimal privilege

Implementation
• Linux 2.6.36, x86-64, single-core
• gcc plugin: rewrites the kernel for callback integrity
• Clang/LLVM plugin: rewrites modules
• Annotation propagation saves effort by inferring annotations for module functions

Example: annotation propagation
    // from linux/include/pci_driver.h
    struct pci_driver {
        int (*probe)(struct pci_dev *pcidev)
            principal(pcidev)
            pre(copy(ref(struct pci_dev), pcidev));
    };

    // linux/drivers/net/e1000/e1000_main.c
    int e1000_probe(struct pci_dev *pcidev) { … }
    struct pci_driver e1000_driver = { .probe = e1000_probe };

    // linux/drivers/net/ixgbe/ixgbe_main.c
    int ixgbe_probe(struct pci_dev *pcidev) { … }
    struct pci_driver ixgbe_driver = { .probe = ixgbe_probe };
• LXFI propagates the annotations on probe to the module functions assigned to it (e1000_probe, ixgbe_probe)

Evaluation
• Security
• Annotation effort
• Performance overhead

Security
• Tested LXFI with three real privilege escalation exploits
• Stopping these real attacks requires API integrity

Annotation effort
• Annotated kernel APIs for 10 modules, one at a time
• Counted:
  • # of annotated core kernel functions a module calls
  • # of function pointer declarations a module exports to the core kernel

Sharing reduces annotation effort

LXFI performance
• netperf, 1 Gigabit e1000 network card, LAN
• Stresses LXFI
• [Figure callout: ~30% decrease]

CPU time of LXFI actions for netperf
• Room for improvement
• [Figure callout: 80%]

Future work
• Improve performance
  • Faster capability management, such as BGI's
• Extend the annotation language to enforce other types of API integrity
  • Perhaps based on Singularity's contracts

Related work
• Type-safe kernels: Singularity [MSR-TR05]
  • LXFI provides similar guarantees in C
  • Good support for revocation (transfer) and principals
• Software fault isolation
  • LXFI extends existing SFI systems (SFI, XFI, BGI) with an annotation language

Conclusion
• Extend SFI with an annotation language for:
  • Argument integrity
  • Callback integrity
  • Principals
• LXFI: a prototype for Linux
  • Annotated 10 kernel modules
  • Prevented 3 real privilege escalation exploits
  • 2-4X CPU overhead when stressed with netperf

Q & A
