Memory-efficient Virtual Machine High Availability
Karen Kai-Yuan Hou, Prof. Kang G. Shin, University of Michigan
Mustafa Uysal (VMware), Arif Merchant (HP Labs), Sharad Singhal (HP Labs)

Protect VM from Host Failures
• Set up backup by primary VM replication
• Backup takes over execution promptly if primary fails
• High memory cost, e.g., to protect a 1 GB VM, an additional 1 GB of memory is reserved just to hold the backup
[Diagram: primary VM (App 1, App 2) on the primary host and backup VM (App 1, App 2) on the backup host, each above a hypervisor; the primary host suffers a physical host failure]

Use a Shared Storage
• “Maintain” backup VM in storage instead of RAM
• Improve resource and energy efficiency
• Recover anywhere
[Diagram: primary VMs (App 1, App 2) and other primary (active) VMs running above hypervisors on Host 1, Host 2, ..., Host n; the backup VM is kept in shared storage]

Protection: Tracking Primary VM State
• Take checkpoints of the primary VM
• Incremental, periodic, copy-on-write checkpoints
[Diagram: primary VM (App 1, App 2); VM memory space checkpointed into the VM fail-over image]
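
The incremental part of the checkpointing idea above can be sketched roughly as follows. This is only a toy model under stated assumptions, not the prototype's implementation: `ToyVM`, `take_checkpoint`, the 4 KiB `PAGE_SIZE`, and the dictionary standing in for the fail-over image are all hypothetical names chosen for illustration. Copy-on-write is only loosely mirrored here, in that the snapshot holds references to immutable page contents rather than copying them while the VM is paused.

```python
from typing import Dict, Set

PAGE_SIZE = 4096  # bytes; assumed page size, for illustration only


class ToyVM:
    """Hypothetical stand-in for a primary VM's memory, tracked per page."""

    def __init__(self, num_pages: int) -> None:
        self.pages: Dict[int, bytes] = {i: bytes(PAGE_SIZE) for i in range(num_pages)}
        # Before the first checkpoint every page counts as dirty.
        self.dirty: Set[int] = set(self.pages)

    def write(self, page_no: int, data: bytes) -> None:
        # Replace the page contents and remember that the page changed.
        self.pages[page_no] = data[:PAGE_SIZE].ljust(PAGE_SIZE, b"\0")
        self.dirty.add(page_no)


def take_checkpoint(vm: ToyVM, image: Dict[int, bytes]) -> int:
    """Commit only the pages dirtied since the last checkpoint; return their count."""
    # Capture references to the dirty pages (conceptually, while the VM is paused).
    snapshot = {n: vm.pages[n] for n in vm.dirty}
    vm.dirty.clear()
    # Commit the captured pages to the fail-over image (conceptually, after resume).
    image.update(snapshot)
    return len(snapshot)
```

Calling `take_checkpoint` periodically means each call after the first writes only the pages touched since the previous call, which is what keeps the image updates small.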

Fail-over: Bringing Up Backup VM
• Slim VM Restore
• Load only necessary information and switch on backup VM quickly
• Fetch pages on-demand as the backup VM executes
[Diagram: restored backup VM (App 1, App 2); VM memory space populated from the VM fail-over image]
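
A minimal sketch of the on-demand fetching idea, assuming the fail-over image can be modelled as a mapping from page number to page contents. `DemandPagedVM` and its fields are hypothetical names for illustration; a real hypervisor would trap page faults rather than go through a `read` method.

```python
from typing import Dict


class DemandPagedVM:
    """Hypothetical restored backup VM that starts with no memory pages loaded."""

    def __init__(self, image: Dict[int, bytes]) -> None:
        self.image = image                     # fail-over image (stand-in for shared storage)
        self.resident: Dict[int, bytes] = {}   # pages already fetched into RAM
        self.fetches = 0                       # number of fetches from the image so far

    def read(self, page_no: int) -> bytes:
        # First touch of a page triggers a fetch; later touches are served from RAM.
        if page_no not in self.resident:
            self.resident[page_no] = self.image[page_no]
            self.fetches += 1
        return self.resident[page_no]
```

In this model the VM can start running immediately, and the fetch cost is paid only for pages the workload actually touches.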

Improving I/O Efficiency with SSDs
• Small, random I/Os are more efficient on SSDs
• Primary side: updating the VM image continuously (small, random writes to the VM fail-over image)
• Restore side: fetching from the VM image on demand (small, random reads from the VM fail-over image)
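
To make the access pattern concrete, here is a small sketch of page-granular reads and writes at arbitrary offsets in an image. It assumes a flat image layout in which page `n` lives at byte offset `n * PAGE_SIZE`; that layout, the 4 KiB page size, and the helper names are assumptions for illustration, and an in-memory buffer stands in for the image on shared storage.

```python
import io
from typing import BinaryIO

PAGE_SIZE = 4096  # bytes; assumed page size, for illustration only


def new_image(num_pages: int) -> io.BytesIO:
    """In-memory stand-in for a zero-filled fail-over image file."""
    return io.BytesIO(bytes(num_pages * PAGE_SIZE))


def write_page(image: BinaryIO, page_no: int, data: bytes) -> None:
    """Primary side: overwrite one page in place (a small, random write)."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    image.seek(page_no * PAGE_SIZE)
    image.write(data)


def read_page(image: BinaryIO, page_no: int) -> bytes:
    """Restore side: fetch one page by number (a small, random read)."""
    image.seek(page_no * PAGE_SIZE)
    return image.read(PAGE_SIZE)
```

Each operation moves one page at an offset determined by which page was dirtied or touched, so consecutive requests are generally not sequential on the device.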

Preliminary Evaluation
• Prototype built on Xen 3.3.2
• Questions
  • How much overhead does continuous checkpointing introduce on the primary VM?
  • How does the shared storage support continuous updating of the fail-over image?
  • How quickly can our system bring up a backup VM?
  • How does the backup VM perform when it executes by fetching pages on-demand?

Checkpointing Overheads
• Kernel Compilation
• RUBiS

CoW and SSD Enhancements
• CoW reduces VM pause time for taking checkpoints
• Checkpoints commit faster on an SSD

Fail-over Time and Demand Fetching
• Time required to bring up a backup VM
• Overheads of fetching VM pages on-demand

Interesting Observations: Page Fetching Behavior
• How a VM uses (demand fetches) its pages while compiling a kernel:

Interesting Observations: Page Fetching Behavior
• What actually happens on disk (recorded by blktrace):

Conclusions