Live Wide-Area Migration of Virtual Machines Including Local Persistent State • Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, Harald Schiöberg • Presented by Kit Cischke
Differences in Contributions • The first paper discussed transferring run-time memory of a VM in a LAN. • Cool. • This paper expands on that to transfer the VM’s image, its persistent state and on-going network connections over a WAN as well. • By combining pre-copying, write-throttling and a block-driver, we can achieve this.
Introduction • In this project, the authors want to extend live VM migration to include: • Persistent state (file systems used by the VM) • Open network connections • Why? • Many apps running on a VM need that storage, and NAS systems may not be available in the new location. • Moving across a WAN will almost certainly involve an IP change, and we don’t want to (overly) disrupt TCP connections. • Contribution • A system that enables live migration of VMs that use local storage and have open network connections, without severely disrupting their live services.
Highlights • Some highlights of this work: • Built upon the Xen Live Migration facility as part of XenoServer. • Enables: • Live migration • Consistency • Minimal Service Disruption • Transparency • Utilizes: • Pre-copying, write-throttling and IP tunneling.
System Design - Environment • Both the source and destination run Xen, with the VM running XenLinux. • Uses blktap to export block devices into the migrated VM. • Block devices are file-backed, meaning the contents of the block device are stored in an ordinary file on the file system of the storage VM.
System Design - Architecture • The initialization stage starts things off by prepping the migration. • The bulk transfer stage pre-copies the disk image of the VM to the destination while the VM continues to run. • Xen transfer is then initiated, which performs incremental migration, again without stopping the VM. • While the transfers are occurring, all disk writes are intercepted as deltas that will be forwarded to the destination. • Deltas include the data written, the location written and the size of the data. • The deltas are recorded into a queue that will be transferred later. • If write activity is too high and too many deltas are being generated, write-throttling is engaged to slow down the VM. • In parallel with Xen transfer, the deltas are applied to the destination VM. • At some point, the source VM is paused, the destination is started and a temporary network redirect is created to handle the potential IP changes.
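The delta bookkeeping described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' block-driver code: the names `Delta`, `DeltaRecorder`, `on_write` and `drain` are hypothetical, and the real system intercepts writes in the block driver rather than in Python.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Delta:
    """One intercepted disk write: the data written and where it was written."""
    offset: int   # location of the write within the disk image, in bytes
    data: bytes   # the bytes that were written

    @property
    def size(self) -> int:
        # Size of the data, the third field the slide lists for a delta.
        return len(self.data)


class DeltaRecorder:
    """Hypothetical recorder that queues deltas in write order."""

    def __init__(self) -> None:
        self.queue: Deque[Delta] = deque()

    def on_write(self, offset: int, data: bytes) -> None:
        # Called for every write issued while the migration is in progress.
        self.queue.append(Delta(offset, bytes(data)))

    def drain(self) -> List[Delta]:
        # Hand over everything recorded so far, oldest first, and empty the queue.
        drained = list(self.queue)
        self.queue.clear()
        return drained
```

Keeping the queue in write order matters: two deltas may touch the same location, and the destination must end up with the later one.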
Implementation - Initialization • Authentication, authorization and access control are handled by XenoServer. • The migration client forks, creating a listener process that signals the block-driver to enter record mode. • In record mode, the driver copies the writes to the listener process, which transfers them to the destination. • The other half of the migration client begins the bulk transfer. • The migration daemon at the destination also forks: one process receives the bulk transfer, the other receives the deltas.
Implementation – Bulk Transfer • The VM’s disk image is transferred from the source to the destination. • The XenoServer platform supports copy-on-write along with immutable template disk images, so we can just transfer the changes, rather than the whole image.
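To make the "transfer only the changes" idea concrete, here is a small sketch that finds the blocks of an image that differ from its template. It is only a stand-in: a real copy-on-write layer already knows which blocks were written and does not need to compare contents. The function name `changed_blocks` and the 4096-byte block size are assumptions for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def changed_blocks(template: bytes, image: bytes, block_size: int = BLOCK_SIZE):
    """Yield (block_index, block_bytes) for each block of `image` that differs from `template`."""
    n_blocks = (len(image) + block_size - 1) // block_size
    for i in range(n_blocks):
        start = i * block_size
        block = image[start:start + block_size]
        # Only blocks that diverged from the immutable template need to be sent.
        if block != template[start:start + block_size]:
            yield i, block
```

If the destination already holds the same template, sending only these blocks is enough to reconstruct the full image there.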
Implementation – Xen Migration • The system here relies on the built-in migration mechanism of Xen. • Xen logs dirty memory pages and copies them to the destination without stopping the source VM. • Eventually, we will have to pause the source and copy the remaining memory pages. • Then we start up the migrated VM.
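The iterative behaviour described here can be illustrated with a toy model. This is not Xen's actual algorithm or stopping rule: the function `precopy_rounds`, its parameters, and the assumption of constant send and dirty rates are all simplifications made up for this sketch.

```python
def precopy_rounds(total_pages, send_rate, dirty_rate, stop_threshold, max_rounds=30):
    """Toy model of iterative pre-copy.

    total_pages:    memory pages the VM has
    send_rate:      pages per second the link can carry (assumed constant)
    dirty_rate:     pages per second the running VM dirties (assumed constant)
    stop_threshold: pause the VM once this few pages remain

    Returns (pages sent in each live round, pages left for the paused final copy).
    """
    to_send = total_pages
    rounds = []
    while to_send > stop_threshold and len(rounds) < max_rounds:
        rounds.append(to_send)
        # Pages dirtied while this round was in flight must be sent again.
        # Integer arithmetic: (to_send / send_rate) seconds * dirty_rate.
        to_send = min(total_pages, to_send * dirty_rate // send_rate)
    return rounds, to_send


# Example with made-up rates: 512 MB of 4 KB pages is 131072 pages.
rounds, remaining = precopy_rounds(131072, send_rate=10000, dirty_rate=1000, stop_threshold=50)
```

In this model each round shrinks only when the VM dirties pages more slowly than the link can send them; otherwise the loop stops at `max_rounds` and the final paused copy is large.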
Implementation - Intercepting • The blkfront device driver communicates with the dedicated storage VM via a ring buffer. • The blktap framework intercepts requests, but does it in user space. • Once a disk request makes it to the backend, it is both committed to the disk and sent to the migration client. • The client then packages the write up as a delta.
Implementation - Application • After the bulk transfer, and in parallel with the Xen transfer, the deltas are transferred and applied to the migrated VM’s disk image by the migration daemon in the storage VM at the destination. • If the Xen migration finishes before all forwarded deltas have been applied, I/O requests from the migrated VM are put on hold until the remaining deltas are applied. • The authors found that delta application was normally finished before Xen migration, adding zero time to the overall migration.
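Applying deltas to a file-backed image amounts to replaying the recorded writes in order. The sketch below assumes each delta is an `(offset, data)` pair and that the image is an ordinary file; `apply_deltas` is a hypothetical name, and the real migration daemon works through the storage VM rather than a plain Python file handle.

```python
import os
import tempfile


def apply_deltas(image_path, deltas):
    """Replay (offset, data) deltas onto a file-backed disk image, in recorded order."""
    with open(image_path, "r+b") as image:
        for offset, data in deltas:
            image.seek(offset)
            image.write(data)
        # Push the writes to stable storage before the VM is allowed to do I/O.
        image.flush()
        os.fsync(image.fileno())


# Usage: a 16-byte stand-in "disk image" and two overlapping writes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(16))
    path = f.name

apply_deltas(path, [(4, b"AAAA"), (6, b"BB")])
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

Because the second write overlaps the first, replaying them out of order would leave different bytes on disk, which is why the migrated VM's I/O has to wait until every outstanding delta is applied.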
Implementation – Write Throttling • If the VM attempts to complete more writes than a given threshold value, future write attempts are delayed by the block driver. • This process repeats, with the delay and threshold doubling each time. • Experimentally, an initial threshold of 16384 writes with an initial delay of 2048 μs works well. • Enforcement is separated from policy for extensibility.
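One possible reading of this doubling policy is sketched below. It assumes the threshold counts cumulative writes since throttling was armed, which the slide does not specify; the class `WriteThrottle` and its method name are hypothetical. The class only decides the delay (policy), leaving the caller to actually stall the write (enforcement).

```python
class WriteThrottle:
    """Sketch of a doubling write-throttle policy (assumed semantics)."""

    def __init__(self, threshold=16384, delay_us=2048):
        self.next_threshold = threshold   # write count that triggers the next step
        self.next_delay_us = delay_us     # delay that step will impose
        self.current_delay_us = 0         # delay applied to writes right now
        self.writes = 0                   # writes seen so far

    def delay_for_next_write(self) -> int:
        """Count one write and return how long, in microseconds, to delay it."""
        self.writes += 1
        if self.writes > self.next_threshold:
            # Threshold crossed: start (or increase) the delay, then double both values.
            self.current_delay_us = self.next_delay_us
            self.next_threshold *= 2
            self.next_delay_us *= 2
        return self.current_delay_us
```

With a small threshold of 4 writes and a 10 μs delay, the first four writes pass undelayed, writes 5 to 8 are each delayed 10 μs, and writes from the 9th onward are delayed 20 μs until the next threshold of 16 is crossed.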
Implementation – WAN Redirection • If the IP of the VM changes, we use IP tunneling and Dynamic DNS to prevent dropped network connections. • Just before the source VM is paused, an IP tunnel is created from the source to destination using iproute2. • Once the destination VM is capable of responding to requests at its new IP, the Dynamic DNS entry is updated so that new connections go to the new IP. • Packets that arrive during the final stage of migration are simply dropped. • Once no connections exist that use the old IP, the tunnel is torn down. • Practically, this works because: • The source server only needs to cooperate for a short time. • Most network connections are short-lived. • If nothing else, it’s no worse than what you get if the VM doesn’t even try.
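The slides do not give the exact iproute2 invocations, so the following is only a generic sketch of what source-side tunnel setup and teardown could look like with an IP-in-IP tunnel. The function names, the tunnel name and the addresses (taken from the documentation ranges) are all assumptions, and the matching configuration on the destination host is omitted. The functions only build argument lists; nothing is executed.

```python
import ipaddress


def tunnel_setup_commands(name, source_host_ip, dest_host_ip, vm_old_ip):
    """Build iproute2 commands for the source host (assumed IP-in-IP tunnel)."""
    for addr in (source_host_ip, dest_host_ip, vm_old_ip):
        ipaddress.ip_address(addr)  # raises ValueError on a malformed address
    return [
        # Create the tunnel endpoint pointing at the destination host.
        ["ip", "tunnel", "add", name, "mode", "ipip",
         "local", source_host_ip, "remote", dest_host_ip],
        ["ip", "link", "set", name, "up"],
        # Send traffic for the VM's old address into the tunnel.
        ["ip", "route", "add", f"{vm_old_ip}/32", "dev", name],
    ]


def tunnel_teardown_commands(name):
    """Build the command that removes the tunnel once no old-IP connections remain."""
    return [["ip", "tunnel", "del", name]]


# Example with placeholder addresses from the documentation ranges.
setup = tunnel_setup_commands("mig0", "192.0.2.10", "198.51.100.20", "192.0.2.55")
teardown = tunnel_teardown_commands("mig0")
```

Each list could be passed to `subprocess.run` by a privileged process; keeping command construction separate makes the sequence easy to inspect before anything touches the network configuration.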
Evaluation - Metrics • Want to evaluate the disruption of the system, as perceived by users. • Spoiler: Results look good. • Want to show the system handles diabolical workloads, defined in this paper as being heavy disk accessors, rather than heavy memory accessors. • Downtime: Time between pausing the VM on the source and resuming on the destination. • Disruption Time: Time during which clients observe a reduction in service responsiveness. • Additional Disruption: Difference between disruption time and downtime. • Migration time: Time from migration request to running VM at destination. • Number of Deltas and Delta Rate: How many file system changes and how often they occur.
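The derived metrics on this slide are simple arithmetic. The helper names below are hypothetical, and treating the delta rate as deltas per second averaged over the whole migration is an assumption about how it is computed.

```python
def additional_disruption(disruption_time_s: float, downtime_s: float) -> float:
    """Disruption that clients observe beyond the VM's actual downtime."""
    return disruption_time_s - downtime_s


def delta_rate(num_deltas: int, migration_time_s: float) -> float:
    """Average deltas per second over the migration (assumed definition)."""
    return num_deltas / migration_time_s
```

For example, with made-up numbers, a 3.0 s disruption around a 1.0 s downtime gives 2.0 s of additional disruption, and 500 deltas over a 250 s migration is a rate of 2.0 deltas per second.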
Eval – Workload Overview • Three workloads: a web server serving static content, a dynamic web application, and video streaming. • Chosen for realistic usage scenarios and because they neatly trifurcate the spectrum of disk I/O patterns: • Dynamic app generates lots of bursty writes • Static workload generates a medium amount of constant writes • Streaming video causes few writes, but is very sensitive to disruption.
Eval – Experimental Setup • Three hosts • Dual Xeon 3.2 GHz, 4 GB DDR RAM, mirrored RAID array of U320 SCSI disks. • The migrated VM was provided with 512 MB RAM and a single CPU. • All hosts were connected by a 100 Mbps switched Ethernet network. • The migrated VM was running Debian on a 1 GB ext3 disk. • Host C is the client. • To emulate WAN transfers, traffic shaping was used to limit the bandwidth to 5 Mbps with 100 ms of latency. • Representative of a connection between a host in London and one on the U.S. east coast.
Results – LAN Migration • Measured disruption is 1.04 seconds and “practically unnoticeable by a human user.” • Few deltas are generated, and those come from the web server’s log files. • If you’re shooting for “5 9’s” uptime, you still get 289 migrations a year.
Results – LAN Migration • phpBB with a script posting at random. • Disruption is 3.09 seconds due to more deltas. • HTTP throughput is almost unaffected and total migration time is shorter. • 5 9’s uptime still allows 98 migrations a year.
Results – LAN Migration • Streamed a large video file, viewed by a human on host C. • Disruption is 3.04 seconds and alleviated by the buffer of the video player. • No packets are lost, but there is lots of retransmission.
Comparison to Freeze-and-Copy • Clearly, freeze-and-copy causes much worse disruption than live migration.
Results – WAN Migration • Longer migration time leads to more deltas. • The tunneling let the connections persist. • 68 total seconds of disruption, which is a lot, but much less than freeze-and-copy.
Results – Diabolical Workload • Ran the Bonnie benchmark as a diabolical process, generating lots of disk writes. • Needed to throttle twice. • Initially, the bulk transfer is severely impeded; throttling fixes this. • The overall migration takes 811 s; without throttling, the transfer would have taken 3 days.
Results – I/O Overhead • The overhead of intercepting deltas is pretty low and only noticeable during the migration.
Conclusions • Demonstrated a VM migration scheme that includes persistent state, maintains open network connections and therefore works over a WAN without major disruption. • It can handle high I/O workloads too. • Works much better than freeze-and-copy. • Future work includes batching of deltas, data compression and “better support for sparse files.”