Hypervisor-Assisted Application Checkpointing for High Availability
Min Lee
Joint work with A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik

Introduction
• Virtualization technology
  • Is widely adopted
  • Has proven its usefulness
• Most applications run well
  • They run natively
• Some important applications do not run well
  • Certain operations cannot run natively
  • Instead they use hypercalls
• Our target: application checkpointing

Xen Virtual Machine Monitor
[Figure: Xen architecture, top to bottom: applications; modified guest OSes (virtual machines); virtual hardware (vCPU, vDisk, vNIC, vMemory, etc.); Xen hypervisor; physical hardware (CPU, disk, NIC, memory, etc.). Taken/adapted from the ‘Xen and co.’ slides.]

High Availability Approaches
• Categories
  • Application-transparent
    • No changes to the application or guest
    • Xen-specific: Remus, Kemari
  • Application-assisted
    • The application implements the checkpointing logic
    • Flexible and lightweight
• We are targeting
  • Application-assisted checkpointing under virtualization
  • Xen-specific implementation
  • Approach applicable to hypervisors in general

Hypervisor-Assisted Application Checkpointing
• Application checkpointing
  • Provides transactional properties to the traditional heap
  • Makes the heap highly available
  • Processes survive failures
  • Has performance issues in Xen
• Our technique improves application-checkpointing performance in Xen

High Availability
[Figure: a "magical mirror" carries the changes made by List_add() and List_del() over to a backup; after a crash, the backup takes over and continues with List_add().]

Transaction APIs
APIs:
  int declare(addr, size);
  void undeclare(Tid);
  void Tstart(Tid);
  void Tend(Tid, dirty_pages);
• List of dirty pages
  • Written pages
  • mprotect() system call
    • Write-protect
    • SIGSEGV signal
Examples:
  Tstart(); List_add(); Tend();
  Tstart(); List_add(); List_del(); List_add(); List_del(); Tend();

PT – Existing Approach
• Get dirty pages
  Declare() {}
  Tstart();
  handler() { mprotect(unprotect); add_to_dirty_pages(); }
  List_add();
  List_add();
  Tend();
  …
  Undeclare() {}
[Figure: process's view of virtual pages 1-12; the two List_add() calls write pages 5 and 7, which end up in the dirty-page list.]
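
The PT scheme above can be sketched in user space as follows. This is a minimal illustration for Linux, not the authors' implementation: the names `pt_declare`, `pt_tstart`, `pt_tend`, and `MAX_PAGES` are hypothetical, and unlike the slide's empty `Declare() {}`, `pt_declare` here also allocates the region and installs the handler so the sketch stands alone. Calling mprotect() from a signal handler is not guaranteed to be async-signal-safe by POSIX, although it is the usual way this technique is written on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64              /* illustrative limit, not from the slides */

static char  *heap_base;          /* start of the declared region */
static size_t heap_pages;         /* number of pages in the region */
static size_t page_size;
static volatile sig_atomic_t dirty[MAX_PAGES];

/* SIGSEGV handler: the first write to a write-protected page lands here. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)heap_base;
    if (addr < base || addr >= base + heap_pages * page_size)
        _exit(1);                 /* a genuine crash, not a tracked write */
    size_t idx = (addr - base) / page_size;
    dirty[idx] = 1;               /* add_to_dirty_pages() */
    /* mprotect(unprotect): later writes to this page no longer fault. */
    mprotect(heap_base + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Hypothetical declare(): map a page-aligned region, install the handler. */
static char *pt_declare(size_t pages)
{
    assert(pages <= MAX_PAGES);
    page_size  = (size_t)sysconf(_SC_PAGESIZE);
    heap_pages = pages;
    heap_base  = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(heap_base != MAP_FAILED);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    return heap_base;
}

/* Tstart(): clear the dirty list and write-protect the whole region. */
static void pt_tstart(void)
{
    for (size_t i = 0; i < heap_pages; i++)
        dirty[i] = 0;
    mprotect(heap_base, heap_pages * page_size, PROT_READ);
}

/* Tend(): unprotect everything and return the number of dirty pages. */
static size_t pt_tend(void)
{
    size_t n = 0;
    mprotect(heap_base, heap_pages * page_size, PROT_READ | PROT_WRITE);
    for (size_t i = 0; i < heap_pages; i++)
        n += (size_t)dirty[i];
    return n;
}
```

Each dirty page costs one fault, one signal delivery, and one mprotect() call, which is the per-page overhead the call-flow slide points at.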

PT Call-Flow
• Pure user-level
[Figure: call flow repeated for every dirty page across three layers. User: mprotect(), mprotect(), signal. OS: page fault. Hypervisor: TLB flush, TLB flush.]

Approaches
• Our approaches

Emulation
• Under the condition that most transactions are small
  Declare();
  Tstart() {}
  handler() { emulate(); log_to_write_buffer(); }
  List_add();
  List_add();
  Tend();
  …
  Undeclare();
[Figure: process's view of virtual pages 1-12; the writes are emulated and logged to the write buffer as the entries (Addr1,100) and (Addr2,200).]
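
The write-buffer side of the emulation approach might look like the sketch below. It rests on two assumptions that the slide does not confirm: each buffer entry pairs a location with the value written, and the fault handler's instruction decoding is replaced by an explicit call to a hypothetical `logged_write()`. The names `write_log`, `logged_write`, `replay`, and `LOG_CAPACITY` are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 128   /* illustrative limit, not from the slides */

/* One logged store: where it went and what was written (assumed layout). */
struct write_rec {
    size_t   offset;       /* word index into the transactional heap */
    uint32_t value;
};

struct write_log {
    struct write_rec rec[LOG_CAPACITY];
    size_t count;
};

/* Stands in for emulate() + log_to_write_buffer(): perform the store on
 * the primary heap and append a record of it to the log. */
static void logged_write(struct write_log *log, uint32_t *heap,
                         size_t offset, uint32_t value)
{
    assert(log->count < LOG_CAPACITY);
    heap[offset] = value;
    log->rec[log->count].offset = offset;
    log->rec[log->count].value  = value;
    log->count++;
}

/* At Tend(), the backup replays the records in order.
 * Returns the number of payload bytes the log represents. */
static size_t replay(const struct write_log *log, uint32_t *backup)
{
    for (size_t i = 0; i < log->count; i++)
        backup[log->rec[i].offset] = log->rec[i].value;
    return log->count * sizeof(struct write_rec);
}
```

The log grows with the number of writes rather than the number of pages, which is consistent with the slide's condition that most transactions are small.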

Hypervisor-Assisted: User-to-Hypervisor Call
• Overhead of going through the OS is unnecessary
  • Talk to Xen directly
  • Move checkpointing to the Xen level
• Add a new interrupt vector
  • 0x80: system call
  • 0x82: hypercall from guest OS
  • 0x84: hypercall from user (newly added)
• Xen-based approaches need no changes to the guest OS

Hypervisor-Assisted: User-to-Hypervisor Call
• User-to-hypervisor call

PTxen
• Implement PT in Xen
  Declare();
  Tstart() {}
  List_add();
  List_add();
  Tend();
  …
  Undeclare() {}
  ----- Xen -----
  Process1, (1-12)
  page_fault() { mprotect(unprotect); add_to_dirty_pages(); }
[Figure: process's view of virtual pages 1-12; Xen records the declared range for Process1 and collects dirty pages 5 and 7.]

Emulxen
• Emulation in Xen
  Declare();
  Tstart() {}
  List_add();
  List_add();
  Tend();
  …
  Undeclare();
  ----- Xen -----
  Process1, (1-12)
  page_fault() { emulate(); log_to_write_buffer(); }
[Figure: process's view of virtual pages 1-12; Xen records the declared range for Process1 and logs the write-buffer entries (Addr1,100) and (Addr2,200).]

Scanxen
• Idea
  • Scan the page table rather than trapping writes
  • The hardware marks written pages by setting the dirty bit in the page table
  Declare();
  Tstart() {}
  List_add();
  List_add();
  Tend();
  …
  Undeclare();
  ----- Xen -----
  Process1, (1-12)
  scan_page_table() { collect_dirty_bit(); add_to_dirty_pages(); }
[Figure: process's view of virtual pages 1-12; the scan finds dirty pages 5 and 7.]
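
A rough model of the scan is shown below. It is only a sketch: the page table is a plain array of 64-bit entries rather than a real multi-level table, `scan_page_table` mirrors the slide's name but its signature is made up here, and `PTE_DIRTY` uses the x86 dirty-bit position (bit 6). A real hypervisor would presumably also need to flush the TLB after clearing the bits so that later writes set them again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_DIRTY (UINT64_C(1) << 6)   /* x86 page-table-entry dirty bit */

/* Walk all n entries of the declared range, record the index of every
 * entry whose dirty bit is set, and clear the bit for the next
 * transaction. Returns the number of dirty pages found. */
static size_t scan_page_table(uint64_t *pte, size_t n, size_t *dirty_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (pte[i] & PTE_DIRTY) {      /* collect_dirty_bit() */
            dirty_out[found++] = i;    /* add_to_dirty_pages() */
            pte[i] &= ~PTE_DIRTY;
        }
    }
    return found;
}
```

The loop visits every entry whether or not it was written, so the cost tracks the size of the declared heap and not the number of writes.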

Microbenchmark
• Transactional heap size
  • For simplicity, the whole heap is protected
• Transaction
  • Writes per page (wpp)
    • # of writes per page
  • Pages per transaction (ppt)
    • # of unique pages written
  • # of writes = wpp * ppt
• Scanxen
  • Affected only by heap size
  • Not by wpp, ppt, or transaction size
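
One way to generate such a transaction is sketched below; `run_transaction` and its parameters are illustrative names, not the benchmark's actual code. It writes wpp bytes into each of the first ppt pages and returns the total number of writes, which by construction equals wpp * ppt.

```c
#include <assert.h>
#include <stddef.h>

/* Issue one synthetic transaction: wpp writes to each of ppt pages.
 * Returns the total number of writes performed. */
static size_t run_transaction(unsigned char *heap, size_t page_size,
                              size_t ppt, size_t wpp)
{
    size_t writes = 0;
    assert(wpp <= page_size);          /* all writes stay inside the page */
    for (size_t p = 0; p < ppt; p++) {
        for (size_t w = 0; w < wpp; w++) {
            heap[p * page_size + w] = (unsigned char)(w + 1);
            writes++;
        }
    }
    return writes;
}
```

With 4096-byte pages, (ppt=8, wpp=16), (ppt=16, wpp=8), and (ppt=32, wpp=4) all produce 128 writes, which matches how the three ppt axes line up on the "Emulation vs emulxen" slide.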

PT vs PTxen
• PTxen shows a 10x speedup
• PT and PTxen are affected by ppt
[Figure: PT and PTxen curves; for each, the wpp = 4, 8, 16 curves overlap.]

Emulation vs Emulxen
• Emulation-based approaches are affected by transaction size
• Emulxen shows a 4x speedup
[Figure: emul and emulxen curves, plotted against three aligned x-axes:]
  ppt (wpp=16): 1 2 3 4 5 6 7 8
  ppt (wpp=8): 2 4 6 8 10 12 14 16
  ppt (wpp=4): 4 8 12 16 20 24 28 32

PT Call-Flow
• Pure user-level
• Xen-assisted
[Figure: side-by-side call flows for the pure user-level and Xen-assisted variants across the User, OS, and Hypervisor layers. Labels: "For every dirty page", declare(), mprotect(), signal, page fault, TLB flush.]

Evaluation
• Source: the book “Data Structures and Algorithm Analysis in C (Second Edition)” by Mark Allen Weiss

Evaluation Results 1
[Figure: result charts comparing PT and PTxen.]

Evaluation Results 2
[Figure: result charts comparing PT and PTxen.]

Evaluation Results 3
• Scanxen shows almost constant 2.5 sec across all
[Figure: result charts comparing PT and PTxen.]

Evaluation Summary
• Emulxen achieves up to a 4x speedup over emulation
• PTxen achieves up to a 13x speedup over PT

Transaction Aggregation
• OPT (operations per transaction) = 1
  • A single operation (e.g. an insert or a delete)
• OPT=5
  • Multiple operations merged into one transaction
  • # of writes increases linearly
  • # of unique pages touched remains the same in most cases
• It should benefit PT-based approaches
  • Because of their heavy dependence on ppt
• Details in the paper
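
The effect of merging can be illustrated by counting distinct pages in a list of write offsets. The helper below is a sketch with made-up names (`unique_pages`, `MAX_TRACKED_PAGES`); it assumes, as the slide suggests for most cases, that successive operations on one data structure keep landing on the same few pages.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED_PAGES 256   /* illustrative bound, not from the slides */

/* Count the distinct pages touched by n writes, given as byte offsets
 * into the transactional heap. */
static size_t unique_pages(const size_t *offsets, size_t n, size_t page_size)
{
    unsigned char seen[MAX_TRACKED_PAGES] = {0};
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t page = offsets[i] / page_size;
        assert(page < MAX_TRACKED_PAGES);
        if (!seen[page]) {
            seen[page] = 1;
            count++;
        }
    }
    return count;
}
```

If one operation writes offsets 100 and 4200 (two pages of 4096 bytes) and four more operations write nearby offsets on the same two pages, the merged transaction has 10 writes but still only 2 dirty pages. A PT-style tracker then pays for 2 faults per merged transaction, compared with 10 in total when the five operations run as separate transactions.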

Conclusion
• A family of application-checkpointing techniques introduced
• Emulation-based techniques
  • Useful for small transactions (fewer writes)
• Hypervisor-assisted application checkpointing
  • 4x~13x faster than the user-space implementation

Emulation vs PT
• Emulation-based approaches are good for small transactions
• The breakeven points are roughly wpp=5 and wpp=1.3
[Figure: emulation vs PT charts; note the scale difference.]

Scanxen vs PT
• For a small buffer and large ppt, Scanxen might be better
  • Not the case in our experiments
[Figure: Scanxen at heap sizes of 40KB, 80KB, 120KB and 1MB, 2MB, 3MB, 4MB, 5MB, compared with PT and PTxen; note the scale difference.]

Scanxen vs Emulation
• Scanxen might be better than emulation
  • For big transactions
[Figure: emul, Scanxen, and emulxen curves.]

Operations per Transaction
• OPT=5, merging transactions
  • No impact on emulation-based approaches
  • Some slowdown for Scanxen
• Merging transactions
  • Total # of pages written goes down effectively
  • PT and PTxen become much better than emul/emulxen
  • Still a 13x improvement from PT to PTxen

Bandwidth: Amount
• Emulation-based approaches mostly use less than 2MB
• No ‘diff’ process for emulation-based approaches
Note that tree-insert is 56311.34375, which is out of scale.

Bandwidth: Time
• Emulation-based approaches mostly take less than 5 ms

Bandwidth: Percentage
• Relatively small fraction
• Except PTxen, due to its minimal runtime

Microbenchmark
[Figure: three charts comparing (1) PT, Scanxen, and PTxen; (2) emul, Scanxen, and emulxen; (3) PT, emul, Scanxen, emulxen, and PTxen.]

Microbenchmark
[Figure: timelines over the transactional heap and the dirty pages within it. PT: Tstart(), writes, Tend(), with three separate mprotect() calls. PTxen: Tstart(), writes, Tend(), with a single PTxen() call.]

[Figure: the main process hands each dirty page to a diff process, which sends the diff over the network to the backup process.]
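
The diff step in this pipeline could be as simple as the sketch below, which compares a dirty page with the copy from the last checkpoint and emits the changed byte runs. The slides only say "diff", so the run encoding and the names `diff_page`, `apply_diff`, and `struct run` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One contiguous range of changed bytes within a page. */
struct run {
    size_t off;
    size_t len;
};

/* Compare the current page with the last checkpointed copy and record
 * every run of differing bytes. Returns the number of runs found. */
static size_t diff_page(const unsigned char *old, const unsigned char *cur,
                        size_t size, struct run *runs, size_t max_runs)
{
    size_t n = 0, i = 0;
    while (i < size) {
        if (old[i] == cur[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < size && old[i] != cur[i])
            i++;
        assert(n < max_runs);
        runs[n].off = start;
        runs[n].len = i - start;
        n++;
    }
    return n;
}

/* Backup side: patch the stale copy using the runs and the new bytes. */
static void apply_diff(unsigned char *backup, const unsigned char *cur,
                       const struct run *runs, size_t n)
{
    for (size_t k = 0; k < n; k++)
        memcpy(backup + runs[k].off, cur + runs[k].off, runs[k].len);
}
```

Only the bytes inside the runs need to cross the network, which is the point of diffing a dirty page instead of shipping it whole.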