A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines • Kenichi Kourai, Shigeru Chiba • Tokyo Institute of Technology
Server consolidation with VMs • Server consolidation is widely carried out • Multiple server machines are integrated on one physical machine • Recently, this is done using virtual machines (VMs) • VMs run on a virtual machine monitor (VMM) • The VMM multiplexes resources among the VMs • [Figure: several VMs on a VMM on the hardware]
Software aging of VMMs • Software aging of a VMM is critical • Software aging is the phenomenon that software state degrades with time • E.g. exhaustion of system resources • Software aging of a VMM affects all VMs on it • E.g. performance degradation • [Figure: several VMs on a VMM]
Software rejuvenation of VMMs • Preventive maintenance • Performed before software aging of a VMM affects its VMs • Occasionally stops a VMM, cleans its internal state, and restarts it • Typical example: rebooting a VMM • Cleans the internal state automatically and completely • The easiest way
Drawbacks (1/2): Increasing service downtime • The VMM reboot needs: • Rebooting all OSes running on the VMs • The time tends to be long • It grows with a larger number of VMs • It grows with a longer startup time of services • A hardware reset • The BIOS power-on self test is time-consuming • [Figure: timeline of a VMM reboot: OS shutdown, VMM shutdown, hardware reset, VMM boot, OS boot]
Drawbacks (2/2): Performance degradation • The file cache is lost by the OS reboot • OSes cannot restore performance until the file cache is re-filled • They strongly rely on the file cache to speed up file accesses • The time tends to be long • The file cache size is increasing • VMs are given a large amount of memory • Free memory is used as the file cache • [Figure: a process, the file cache in the OS, and the disk]
Warm-VM reboot • Fast rejuvenation technique • Efficiently reboots only a VMM • The VMM reboot causes no OS reboot • Basic idea • Suspend all VMs before the VMM reboot • Resume them after the reboot • Challenge • How does a VMM efficiently deal with the large memory images of VMs?
On-memory suspend of VMs • Freezes the memory images of VMs on the main memory • That memory area is just reserved • The time does not depend on the memory size • Saving them into a slow disk is inefficient • Analogous to the ACPI S3 state (suspend to RAM) for VMs • Traditional suspend corresponds to the ACPI S4 state (suspend to disk) • [Figure: a VM frozen in main memory rather than saved to disk]
On-memory resume of VMs • Unfreezes the memory images preserved on the main memory • They are reused directly as the memory of VMs • No need to read them from a slow disk • The file cache of OSes is also restored • No performance degradation • [Figure: a VM unfrozen from main memory rather than read from disk]
Quick reload of VMMs • Directly boots a new VMM without a hardware reset • The memory images of VMs are preserved through the VMM reboot • Software can keep track of them • A hardware reset does not guarantee this • A VMM is rebooted quickly • No overhead due to a hardware reset • [Figure: main memory holding the VM images, the old VMM, and the preloaded new VMM]
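The three mechanisms above (on-memory suspend, quick reload, on-memory resume) can be illustrated with a toy model. This is only a sketch under stated assumptions: `Machine`, `ToyVMM`, and `warm_vm_reboot` are hypothetical names invented here, memory is modeled as a Python dictionary, and nothing in it reflects the actual Xen or RootHammer code. The point it shows is that suspend and resume only move bookkeeping (which frames belong to which VM), so no page contents are copied.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Toy physical machine (hypothetical): frame number -> page contents."""
    frames: dict = field(default_factory=dict)
    # Bookkeeping that survives a quick reload in this model:
    # VM name -> frame numbers holding its frozen memory image.
    reserved: dict = field(default_factory=dict)


class ToyVMM:
    """Toy VMM (hypothetical) that tracks which frames each VM uses."""

    def __init__(self, machine):
        self.machine = machine
        self.running = {}  # VM name -> list of frame numbers

    def create_vm(self, name, pages):
        start = len(self.machine.frames)
        numbers = list(range(start, start + len(pages)))
        for number, page in zip(numbers, pages):
            self.machine.frames[number] = page
        self.running[name] = numbers

    def suspend_on_memory(self, name):
        # Freeze: only record the frame list as reserved; copy nothing.
        self.machine.reserved[name] = self.running.pop(name)

    def resume_on_memory(self, name):
        # Unfreeze: reuse the preserved frames directly as VM memory.
        self.running[name] = self.machine.reserved.pop(name)

    def read(self, name):
        return [self.machine.frames[n] for n in self.running[name]]


def warm_vm_reboot(old_vmm):
    """Suspend all VMs, start a fresh VMM on the same memory, resume all."""
    for name in list(old_vmm.running):
        old_vmm.suspend_on_memory(name)
    # Quick reload in this model: a new VMM object on the same Machine,
    # so the frames are never cleared as a hardware reset might do.
    new_vmm = ToyVMM(old_vmm.machine)
    for name in list(new_vmm.machine.reserved):
        new_vmm.resume_on_memory(name)
    return new_vmm
```

In this model the cost of suspend and resume is one dictionary update per VM, independent of how many pages the VM holds, which mirrors the slide's claim that the time does not depend on the memory size.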
Comparison with other methods • Cold-VM reboot • Needs the OS reboot • Saved-VM reboot • A naive implementation of the warm-VM reboot • VMs are saved into a disk
Model for availability • Must consider the software rejuvenation of both a VMM and OSes • Warm-VM reboot • The OS rejuvenation is independent • Cold-VM reboot • The OS rejuvenation is affected by the VMM rejuvenation • # of OS rejuvenations increases • [Figure: timelines of OS rejuvenation and VMM rejuvenation for the two methods]
RootHammer • We have implemented the warm-VM reboot in Xen 3.0.0 • On-memory suspend/resume • Based on Xen's suspend/resume • Manages the mapping from the VM memory to the physical memory • Quick reload • Based on the kexec mechanism in Linux • Kexec for a VMM is included in the latest Xen • It is not designed for reusing the memory images • [Figure: mapping from VM memory to physical memory]
Experiments • Examine whether the warm-VM reboot reduces downtime and performance degradation • Comparison • Cold-VM reboot with the OS reboot • Saved-VM reboot using Xen's suspend/resume • [Figure: experimental setup: a server running Linux VMs on the VMM (2 dual-core Opterons, 12 GB SDRAM, 15,000 rpm SCSI disk) and a Linux client, connected by gigabit Ethernet]
Performance of on-memory suspend/resume • Suspend/resume of one VM with 11 GB of memory • Ours: 1 sec • Xen's: 280 sec • Depends on the memory size • Suspend/resume of 11 VMs • Ours: 4 sec • OS reboot: 58 sec • Depends on # of VMs
Effect of quick reload • The time of rebooting a VMM with no VMs • Warm-VM reboot • 11 sec • The time of quick reload is negligible • Cold-VM reboot • 59 sec • The time due to a hardware reset is 48 sec
Downtime of services • Warm-VM reboot • Always the same • 42 sec • Saved-VM reboot • Depends on # of VMs • 429 sec (11 VMs) • Cold-VM reboot • Affected by the service type • 157 sec (sshd) • 241 sec (JBoss)
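As a quick check of how the downtime figures above relate to one another, the relative reduction achieved by the warm-VM reboot can be computed directly. The numbers are taken from the slide; the function name `reduction` and the dictionary labels are just illustrative choices.

```python
# Downtime figures from the slide, in seconds.
WARM_DOWNTIME_S = 42
OTHER_DOWNTIME_S = {
    "saved-VM reboot (11 VMs)": 429,
    "cold-VM reboot (sshd)": 157,
    "cold-VM reboot (JBoss)": 241,
}


def reduction(baseline_s, improved_s):
    """Fraction of the baseline downtime that is eliminated."""
    return 1.0 - improved_s / baseline_s


for label, seconds in OTHER_DOWNTIME_S.items():
    print(f"vs {label}: {reduction(seconds, WARM_DOWNTIME_S):.1%} less downtime")
```

Against the cold-VM reboot with JBoss this works out to roughly 83%, which appears to be the case behind the maximum reduction quoted in the conclusion.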
Availability of JBoss • The warm-VM reboot achieves four 9s • Assumptions • OS rejuvenation every week • 34 sec • VMM rejuvenation every 4 weeks • Performed 0.5 week after the last OS rejuvenation • [Figure: timeline with OS rejuvenation every 1 week and VMM rejuvenation 0.5 week after the last OS rejuvenation]
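The four-9s claim can be sanity-checked with a simplified model. The sketch below assumes that, with the warm-VM reboot, OS rejuvenation stays on its weekly schedule regardless of the VMM, that each OS rejuvenation costs 34 sec of downtime, and that each VMM rejuvenation costs the 42 sec reported on the downtime slide. It is not the authors' exact availability model, and it does not attempt the cold-VM case, where VMM rejuvenation changes the OS rejuvenation schedule.

```python
WEEK_S = 7 * 24 * 3600  # seconds in one week


def warm_vm_availability(os_interval_weeks=1, os_downtime_s=34,
                         vmm_interval_weeks=4, vmm_downtime_s=42):
    """Availability over one VMM rejuvenation period (simplified model)."""
    period_s = vmm_interval_weeks * WEEK_S
    # OS rejuvenations are assumed independent of the VMM rejuvenation.
    os_rejuvenations = vmm_interval_weeks / os_interval_weeks
    downtime_s = os_rejuvenations * os_downtime_s + vmm_downtime_s
    return 1.0 - downtime_s / period_s


print(f"{warm_vm_availability():.6f}")
```

Under these assumptions the total downtime is 4 × 34 + 42 = 178 sec per 4 weeks, which gives an availability just above 0.9999.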
Performance degradation • The throughput of the Apache web server • before and after the VMM reboot • Warm-VM reboot • No degradation • Cold-VM reboot • Degraded by 69%
Software rejuvenation in a cluster environment • Clustering achieves zero downtime • Multiple hosts can provide the same service • Let us consider the total throughput of all hosts in a cluster • Warm-VM reboot • (m-1)p • Cold-VM reboot • (m-1)p • (m-0.69)p for a while after the reboot • m: # of hosts • p: throughput of one host • [Figure: total throughput over time t, dropping from mp to (m-1)p for 42 sec with the warm-VM reboot and for 241 sec with the cold-VM reboot]
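The throughput expressions above can be turned into an estimate of the work a cluster loses during one rejuvenation. This is a sketch under assumptions: the reboot durations (42 sec and 241 sec) come from the downtime results, while `recovery_s`, the length of the "for a while" period at (m-0.69)p, is not given on the slide and is left as a hypothetical parameter. Note that the loss does not depend on m, because only the rebooting host is affected.

```python
def lost_work_warm(p, reboot_s=42):
    """Work lost while one host is down: throughput is (m-1)p instead of mp."""
    return p * reboot_s


def lost_work_cold(p, recovery_s, reboot_s=241, degradation=0.69):
    """Loss during the reboot plus loss while the file cache refills.

    recovery_s is a hypothetical parameter: the slide only says the
    throughput stays at (m-0.69)p "for a while" after the reboot.
    """
    return p * reboot_s + degradation * p * recovery_s


p = 100.0  # requests/sec per host, an arbitrary illustrative value
print(lost_work_warm(p))
print(lost_work_cold(p, recovery_s=60))
```

Even with `recovery_s` set to zero, the cold-VM reboot loses several times more work than the warm-VM reboot, simply because the host is out of service for longer.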
Comparison with VM migration in a cluster environment • VM migration achieves nearly zero downtime • VMs are moved to another host • Xen's live migration, VMware's VMotion • Total throughput • Normal run • (m-1)p • One host is reserved for migration • Live migration • (m-1.12)p • [Figure: total throughput over time t, comparing the 42 sec drop to (m-1)p for the warm-VM reboot with the 17 min of live migration]
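To compare the two approaches over a longer period, the total work delivered between two rejuvenations can be estimated. This sketch assumes that a cluster using the warm-VM reboot runs all m hosts at mp and drops to (m-1)p for 42 sec, whereas a cluster using migration runs at (m-1)p because one host is reserved and drops to (m-1.12)p for the 17 min of live migration. The function names, the interval, and the values of m and p are hypothetical.

```python
def work_with_warm_reboot(m, p, interval_s, reboot_s=42):
    """All m hosts serve requests; one host is down for reboot_s."""
    return m * p * interval_s - p * reboot_s


def work_with_migration(m, p, interval_s, migration_s=17 * 60,
                        extra_loss=0.12):
    """One host is reserved, so the baseline is (m-1)p.

    During live migration the total drops to (m-1.12)p, that is,
    a further 0.12p below the baseline.
    """
    return (m - 1) * p * interval_s - extra_loss * p * migration_s


m, p = 4, 100.0            # hypothetical cluster size and per-host rate
interval_s = 3600          # hypothetical one-hour observation window
print(work_with_warm_reboot(m, p, interval_s))
print(work_with_migration(m, p, interval_s))
```

Under these assumptions the dominant cost of migration is the reserved host rather than the migration itself, so the gap grows with the length of the interval.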
Related work • Microreboot [Candea et al. '04] • Reboots only some of the subcomponents • The warm-VM reboot enables rebooting only a parent component (the VMM for VMs) • Checkpointing/restart [Randell '75] • Saves/restores OS processes • Similar to suspend/resume of VMs • Optimizations of suspend/resume • Incremental suspend, compression of memory images
Conclusion • We proposed the warm-VM reboot • On-memory suspend/resume • Freezes/unfreezes the memory images of VMs • Quick reload • Preserves the memory images through the VMM reboot • It achieved fast rejuvenation • Downtime reduced by 83% at maximum • No performance degradation