Live Migration of Virtual Machines. Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen†, Eric Jul†, Christian Limpach, Ian Pratt, Andrew Warfield. University of Cambridge Computer Laboratory; † Department of Computer Science, University of Copenhagen, Denmark.
Live Migration of Virtual Machines • Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen†, Eric Jul†, Christian Limpach, Ian Pratt, Andrew Warfield • University of Cambridge Computer Laboratory • † Department of Computer Science, University of Copenhagen, Denmark • USENIX NSDI ’05
Introduction • Operating system virtualization has attracted considerable interest in recent years, particularly in the data center and cluster computing communities • It allows many OS instances to run concurrently on a single physical machine • Migrating an entire OS and all of its applications as one unit • Compared with process migration, this avoids residual dependencies on the original host
Introduction • Live Migration • Moves a running VM without interrupting its network connections • Allows a separation of concerns between the users and the operator of a data center or cluster • Allows hardware and software considerations to be separated
Introduction • Downtime • The period during which services are entirely unavailable • Total migration time • The period during which state on both machines is synchronized, which may therefore affect reliability • This paper uses the “pre-copy” approach to achieve live migration and aims to reduce downtime (implemented on Xen)
Design • Network • Generate an unsolicited ARP reply from the migrated host, advertising that the IP has moved to a new location • Storage • Use a network-attached storage (NAS) device, so disk storage does not need to be migrated
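To make the network step concrete, here is a minimal sketch of what an unsolicited ARP reply frame looks like on the wire. It only builds the bytes and does not send anything. The function name, the MAC address, and the IP address are made-up illustration values, and the choice to broadcast the frame and to fill the target hardware address with the broadcast address is an assumption; real implementations differ on that detail.

```python
import socket
import struct

def build_unsolicited_arp_reply(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP reply that announces ip -> mac.

    Hypothetical helper for illustration; it does not transmit the frame.
    """
    hw = bytes.fromhex(mac.replace(":", ""))   # 6-byte sender MAC
    addr = socket.inet_aton(ip)                # 4-byte IPv4 address
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth_header = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 2 (reply)
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)

    # Sender and target protocol address are both the migrated VM's IP.
    arp_body = hw + addr + broadcast + addr
    return eth_header + arp_header + arp_body

frame = build_unsolicited_arp_reply("00:16:3e:00:00:01", "10.0.0.5")
print(len(frame), frame.hex())
```

Peers that receive such a frame update their ARP caches, so traffic for the VM's IP address starts flowing to the new host.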
Design • Memory Transfer • Push phase • Stop-and-copy phase • Pull phase • Most practical solutions select one or two of the three phases, e.g. pure stop-and-copy or pure demand-migration • This paper uses an iterative push phase followed by a typically very short stop-and-copy phase
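The iterative push followed by a short stop-and-copy can be sketched with a toy model. This is not the Xen implementation: the function name, the parameters, and the assumption that the guest dirties pages at a constant rate, capped by a fixed hot set, are all simplifications chosen for illustration.

```python
def simulate_precopy(total_pages, hot_pages, dirty_rate, bandwidth,
                     stop_threshold, max_rounds=30):
    """Toy model of iterative pre-copy (all parameters are hypothetical).

    total_pages    : VM memory size in pages
    hot_pages      : upper bound on distinct pages dirtied in one round
    dirty_rate     : pages dirtied per second by the guest
    bandwidth      : pages transferred per second
    stop_threshold : switch to stop-and-copy once this few pages remain
    Returns (pages sent in each push round, pages left for stop-and-copy,
    estimated downtime in seconds).
    """
    rounds = []
    to_send = total_pages              # round 1 pushes every page
    for _ in range(max_rounds):
        rounds.append(to_send)
        # Pages dirtied while this round was in flight must be re-sent.
        # Integer arithmetic: dirty_rate * (to_send / bandwidth), rounded down.
        dirtied = dirty_rate * to_send // bandwidth
        to_send = min(hot_pages, dirtied)
        if to_send <= stop_threshold:
            break
    downtime = to_send / bandwidth     # VM is paused for the final copy
    return rounds, to_send, downtime

rounds, final_pages, downtime = simulate_precopy(
    total_pages=200_000, hot_pages=20_000,
    dirty_rate=2_000, bandwidth=25_000, stop_threshold=50)
print(rounds, final_pages, downtime)
```

When the dirty rate is below the bandwidth, each round is shorter than the last and the stop-and-copy set shrinks quickly. When it is not, the rounds stop shrinking and the loop ends at `max_rounds` with a large final copy.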
Related Work • Shut down the VM • Pre-Copy • VMware
Related Work • Post-Copy Live Migration of Virtual Machines • Michael R. Hines, Umesh Deshpande, and Kartik Gopalan, Computer Science, Binghamton University (SUNY) • ACM SIGPLAN/SIGOPS VEE ’09
Writable Working Sets • Some pages will seldom or never be modified and hence are good candidates for pre-copy • Some will be written often and so are best transferred via stop-and-copy • This frequently written set is the Writable Working Set (WWS)
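One possible way to picture the split between pre-copy candidates and the writable working set is to count how often each page shows up in per-round dirty sets. The function name and the `min_fraction` cutoff below are hypothetical; the slides do not specify how a page is classified.

```python
from collections import Counter

def writable_working_set(dirty_rounds, min_fraction=0.5):
    """Return pages dirtied in at least min_fraction of the observed rounds.

    dirty_rounds is a list of sets of page numbers, one set per round.
    The cutoff is an illustrative assumption, not a value from the paper.
    """
    counts = Counter(page for dirty in dirty_rounds for page in dirty)
    cutoff = min_fraction * len(dirty_rounds)
    return {page for page, seen in counts.items() if seen >= cutoff}

observed = [{1, 2, 3}, {2, 3}, {3, 9}, {3, 2}]
hot = writable_working_set(observed)
print(sorted(hot))
```

Pages outside the returned set were written rarely, so copying them early is unlikely to be wasted work.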
Dynamic Rate-Limiting • Dynamically adapt the bandwidth limit during each pre-copying round • The administrator selects a minimum (m) and a maximum (M) bandwidth limit • The first pre-copy round transfers pages at the minimum bandwidth m
Dynamic Rate-Limiting • Dirtying rate = (number of pages dirtied in the previous round) / (duration of the previous round) • Bandwidth limit for the next round = dirtying rate + 50 Mbit/sec • Stop pre-copy when either • the calculated rate > M, or • less than 256KB remains to be transferred
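The rate rule and the two stop conditions can be written down directly. The sketch below assumes 4KB pages and treats 1 Mbit as 10^6 bits; both are assumptions, as are the function names. Running the final stop-and-copy at the maximum rate M is also an assumption of this sketch.

```python
HEADROOM_MBIT = 50            # constant increment from the slide
STOP_BYTES = 256 * 1024       # stop once less than 256KB remains

def next_round_rate(pages_dirtied, round_seconds, page_bytes=4096):
    """Dirtying rate of the previous round in Mbit/sec, plus the headroom."""
    dirty_mbit_per_sec = pages_dirtied * page_bytes * 8 / 1e6 / round_seconds
    return dirty_mbit_per_sec + HEADROOM_MBIT

def plan_next_round(pages_dirtied, round_seconds, remaining_bytes, max_mbit):
    """Return ('precopy', rate) or ('stop_and_copy', max_mbit)."""
    rate = next_round_rate(pages_dirtied, round_seconds)
    if rate > max_mbit or remaining_bytes < STOP_BYTES:
        # Assumption: the final copy runs at the maximum allowed rate.
        return ("stop_and_copy", max_mbit)
    return ("precopy", rate)

# 12,500 pages dirtied in 4.096 s is about 100 Mbit/sec of dirtying.
print(plan_next_round(12_500, 4.096, remaining_bytes=50_000_000, max_mbit=500))
print(plan_next_round(125_000, 4.096, remaining_bytes=50_000_000, max_mbit=500))
```

Adding a constant on top of the measured dirtying rate keeps each round slightly ahead of the guest, so the set of pages left to send shrinks instead of merely keeping pace.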
Some Implementation Issues • Rapid Page Dirtying • Hot pages do not need to be transferred in every round • Freeing Page Cache Pages • Reduces the amount of memory to copy in the first round • Stunning Rogue Processes • Limit each process to 40 write faults before it is moved to a wait queue
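Two of these ideas are easy to sketch. The first function skips pages that have already been dirtied again in the current round, since sending them now would be wasted work. The second is a per-process write-fault counter with the limit of 40 from the slide. All names are hypothetical, and stalling a process exactly when its count reaches the limit is an assumption about how the limit is applied.

```python
def pages_worth_sending(dirtied_last_round, dirtied_so_far_this_round):
    """Pages from the last round that have not been dirtied again yet."""
    return set(dirtied_last_round) - set(dirtied_so_far_this_round)


class WriteFaultLimiter:
    """Counts write faults per process and flags those that hit the limit."""

    def __init__(self, limit=40):
        self.limit = limit
        self.faults = {}

    def record_fault(self, pid):
        """Return True once the process has used up its fault allowance."""
        self.faults[pid] = self.faults.get(pid, 0) + 1
        return self.faults[pid] >= self.limit


print(sorted(pages_worth_sending({1, 2, 3, 4}, {2, 4, 9})))

limiter = WriteFaultLimiter()
stalled = [limiter.record_fault(pid=7) for _ in range(40)]
print(stalled.count(True))
```

A caller would move a flagged process to a wait queue and reset its counter when the process is released.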
Evaluation • Dell PE-2650 server-class machines • Dual Xeon 2GHz CPUs • 2GB memory • Connected via Gigabit Ethernet • Storage: NAS accessed via the iSCSI protocol • XenLinux 2.4.27
a. Simple Web Server • Apache 1.3 web server • Continuously serving a single 512KB file • Memory allocation: 800MB • Migration initially rate-limited to 100Mbit/sec • 776MB of memory to be transferred in the first round • 165ms outage
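A back-of-envelope calculation shows why the first round dominates total migration time for this workload. The helper below is hypothetical; it treats 1MB as 10^6 bytes and 1 Mbit as 10^6 bits, and it ignores protocol overhead and any rate changes within the round.

```python
def transfer_seconds(megabytes, mbit_per_sec):
    """Time to move `megabytes` of data at a fixed rate of `mbit_per_sec`."""
    return megabytes * 8 / mbit_per_sec

# 776MB at the initial 100Mbit/sec limit
first_round = transfer_seconds(776, 100)
print(round(first_round, 2), "seconds")
```

Under these assumptions the first round alone takes about a minute, while the outage seen by clients is measured in milliseconds.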
b. Complex Web Workload: SPECweb99 • Memory allocation: 800MB • 30% of requests require dynamic content generation • 16% are HTTP POST operations • 0.5% execute a CGI script • The server generates access and POST logs • 210ms outage
c. Low-Latency Server: Quake 3 • A multiplayer online game server • A virtual machine with 64MB of memory • Six players joined the game and started to play within a shared arena • Very little data (148KB) is transferred in the last round • Downtime: 60ms
d. A Diabolical Workload: MMuncher • A virtual machine that writes to memory faster than it can be transferred • Memory: 512MB • A simple C program that writes constantly to a 256MB region of memory • Downtime: 3.5 seconds
Conclusion • A pre-copy live migration method on Xen • Takes the writable working set (WWS) into account • Dynamic network-bandwidth adaptation • Realistic server workloads such as SPECweb99 can be migrated with just 210ms downtime • A Quake 3 game server is migrated with an imperceptible 60ms outage