Predicting The Performance Of Virtual Machine Migration

Sherif Akoush, Ripduman Sohan, Andrew W. Moore, Andy Hopper (University of Cambridge). Presented by: Eli Nazarov

Agenda • Introduction. • How to migrate? • Defining migration performance. • Performance prediction. • The AVG & HIST models. • Evaluation. • Conclusions.

Why does performance prediction matter? • Provision and control computing capacity. • Guarantee performance levels. • Efficient management. • Better VM placement. • Better resource utilization (e.g. load balancing).

How to migrate? • Stop-and-Copy: minimizes total migration time; highest downtime. • On-Demand: short downtime; very high total migration time.

Pre-Copy migration • Pre-Copy migration involves 6 steps: • Initialization: pre-select a target for migration. • Reservation: reserve resources on the destination host. • Iterative Pre-Copy: the first iteration sends all RAM; each later iteration sends the pages modified since the previous one. • Stop-and-Copy: stop the VM for the final transfer. • Commitment: the destination host acknowledges that the copy finished correctly. • Activation: re-attach resources to the VM on the destination host. • Diagram labels: pre-copy phase, copy phase.

Xen Stop Conditions • Fewer than 50 pages were dirtied during the last pre-copy iteration. • Guarantees short downtime. • 29 pre-copy iterations have been carried out. • More than 3*|VM| has already been copied. • Worst case: at iteration N-1 we have copied 3*|VM| - 1 page. • Any of these conditions forces the Stop-and-Copy stage.
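The three conditions above can be read as a single check made after each pre-copy iteration. The following is a minimal sketch of that reading, not Xen's actual code: the function name, argument names, and return labels are hypothetical, and only the thresholds (50 pages, 29 iterations, 3 times the VM size) come from the slide.

```python
def should_stop_precopy(iteration, dirtied_last_iter, pages_sent, vm_pages):
    """Return the reason pre-copy should end, or None to keep iterating.

    Thresholds are taken from the slide; everything else is illustrative.
    """
    if dirtied_last_iter < 50:
        return "few_dirty_pages"   # short downtime is guaranteed
    if iteration >= 29:
        return "max_iterations"    # 29 pre-copy iterations carried out
    if pages_sent > 3 * vm_pages:
        return "sent_limit"        # more than 3*|VM| already copied
    return None


# Example: a 1,000-page VM after 4 iterations with 200 pages still dirty.
print(should_stop_precopy(4, 200, 2500, 1000))  # keeps iterating
```

Whichever condition fires first ends the iterative phase and hands over to Stop-and-Copy.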

Migration & Down times

How To Predict? • Calculate Bounds.
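The slide does not show the bound formulas, so the following is only one plausible reading, derived from the Xen stop conditions listed earlier. It assumes the best case sends the VM exactly once with nothing left for the final round, and the worst case sends 3*|VM| - 1 page by iteration N-1, one more full pass in iteration N, and a full pass during Stop-and-Copy. The function name, the 4 KiB page size, and the dictionary keys are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def migration_bounds(vm_bytes, link_bytes_per_s,
                     pre_overhead_s=0.0, post_overhead_s=0.0):
    """Rough lower/upper bounds on migration time (MT) and downtime (DT)."""
    one_copy = vm_bytes / link_bytes_per_s      # time to send the VM once
    overheads = pre_overhead_s + post_overhead_s
    return {
        "mt_lb": overheads + one_copy,
        # Assumed worst case: 5*|VM| - 1 page sent in total.
        "mt_ub": overheads + (5 * vm_bytes - PAGE_SIZE) / link_bytes_per_s,
        "dt_lb": post_overhead_s,               # nothing left to send
        "dt_ub": post_overhead_s + one_copy,    # whole VM sent while stopped
    }


# Example: 1,024 MB VM over a 10 Gbps link (1.25e9 bytes/s), no overheads.
bounds = migration_bounds(1024 * 2**20, 1.25e9)
print(bounds)
```

Under these assumptions the upper bound on migration time is roughly five times the lower bound, which illustrates why bounds alone give a wide range.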

Bounds are not enough! • Don’t give an accurate prediction. • Reason: significant differences between lower and upper bounds due to link speed and VM size correlation. • Example: for VM size = 1,024 MB • MT = Total Migration Time, DT = Total Downtime, LB = Lower Bound, UB = Upper Bound • For large VM memory sizes the differences are even larger. • We need something more accurate.

Parameters affecting migration • Migration link bandwidth. • Higher speed links allow faster transfers. • Pre- and post-migration overheads. • Operations that aren’t part of the actual transfer. • Examples: • Initializing a container on the destination host. • Reattaching device drivers to the new VM. • etc. • Example: 10 Gbps, VM size = 512 MB: pre-overhead = 77%

Parameters affecting migration (cont.) • Page dirty rate. • The rate at which memory pages in the VM are modified. • Affects the number of pages transferred in each pre-copy iteration. • The relation between page dirty rate and performance is not linear. • Reason: link speed.

Page dirty rate and link speed • Downtime at low page dirty rate is almost constant and close to the lower bound. • Downtime increases to the upper bound when the page dirty rate is high (reaches link capacity). • Figure: total downtime at 10 Gbps.

Page dirty rate and link speed (cont.) • Total migration time increases with page dirty rate. • Total migration time goes back to the lower bound for extremely high page dirty rates: back to pure Stop-and-Copy. • Figures: total migration time at 10 Gbps; total migration time at 100 Mbps.

What's next? • Prediction using all parameters affecting migration. • Link speed. • Page dirty rate. • VM memory size. • Overheads. • Two models: • AVG: Average Page Dirty Rate. • HIST: History-Based Page Dirty Rate.

The AVG model • Based on the migration logic. • Assumes constant or average page dirty rate. • Useful when the page dirty rate is stable. • Follows the core functionality of migration in Xen.

The AVG model (cont.) • Input parameters: • Link Speed. • Page Dirty Rate. • Analytically determinable. • Pre/Post overheads. • Time spent during actual transfer – Time to migrate idle VM • VM Size. • Xen functionality: • sim_clean(): returns the set of dirty pages and sets the state to “all clean”. • sim_peek(): returns the bitmap of dirty pages (no state change).

Algorithm - the AVG model • Each Pre-Copy phase: • Get dirty bitmap: sim_peek(). • Skip the pages re-dirtied in this iteration. • Collect at most 1024 pages (a batch). • migration_time += • if (last_iteration) • downtime += • Clean pages status: sim_clean(). • Calculate the total times: • total_migration_time = migration_time + pre_overheads + post_overheads. • total_downtime = downtime + post_overheads.
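The loop above can be approximated at iteration granularity. The sketch below is a simplification under stated assumptions, not the authors' simulator: it assumes 4 KiB pages, treats each increment of migration_time as the time needed to send the pages of that round (the slide elides the actual expression), and replaces the sim_peek()/sim_clean() bitmap and the 1024-page batches with a simple count of pages dirtied while a round is in flight. Because it omits the skipping of re-dirtied pages, it does not reproduce the drop back to the lower bound at extremely high dirty rates. The function and parameter names are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def predict_avg(vm_bytes, link_bytes_per_s, dirty_pages_per_s,
                pre_overhead_s=0.0, post_overhead_s=0.0):
    """Return (total_migration_time, total_downtime) in seconds."""
    vm_pages = vm_bytes // PAGE_SIZE
    page_time = PAGE_SIZE / link_bytes_per_s   # seconds to send one page
    to_send = vm_pages                         # first iteration: all RAM
    pages_sent = 0
    migration_time = 0.0

    for _ in range(29):                        # at most 29 pre-copy iterations
        round_time = to_send * page_time
        migration_time += round_time
        pages_sent += to_send
        # Pages dirtied while this round was being sent, capped at |VM|.
        to_send = min(vm_pages, int(dirty_pages_per_s * round_time))
        if to_send < 50 or pages_sent > 3 * vm_pages:
            break                              # Xen stop conditions

    downtime = to_send * page_time             # final Stop-and-Copy round
    migration_time += downtime
    total_migration_time = migration_time + pre_overhead_s + post_overhead_s
    total_downtime = downtime + post_overhead_s
    return total_migration_time, total_downtime


# Example: 1,024 MB VM, 10 Gbps link, 10,000 dirtied pages per second.
print(predict_avg(1024 * 2**20, 1.25e9, 10_000))
```

With a dirty rate of zero the sketch returns one full copy and no downtime beyond the post-migration overhead; with a dirty rate far above link capacity it sends the VM five times in total, one full copy of which happens while the VM is stopped.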

The HIST model • Used in cases where the dirty page rate is a function of time. • Depends on the history log of page dirty rate.

The HIST model (cont.) • Given the start time of migration, t. • Predict migration times based on: t+1, t+2, …, t+N. • Changed sim_clean() and sim_peek() to return the number of dirty pages at the above points in time from the log. • Use the AVG algorithm with these functions. • Observation: • For deterministic processes the set of dirtied pages at any point in time will be approximately the same as for previous runs of the same workload running in a similar environment.
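The history lookup described above can be illustrated with a small helper. This is a sketch under assumptions, not the authors' code: it assumes the log stores one sample per second (pages dirtied during that second), works at whole-second resolution, and sums the samples, which can overcount pages dirtied more than once. The helper name and the log layout are hypothetical.

```python
import math


def dirtied_pages(history, start_s, end_s, vm_pages):
    """Pages dirtied between start_s and end_s according to the log.

    history[i] is the number of pages dirtied during second i of a
    previous run of the same workload (assumed log format).
    """
    first = int(math.floor(start_s))
    last = min(int(math.ceil(end_s)), len(history))
    total = sum(history[i] for i in range(first, last))
    return min(total, vm_pages)   # cannot exceed the VM's page count


# Example: migration starts at t = 1 s and a round lasts until t = 3 s.
log = [10, 20, 30, 40]
print(dirtied_pages(log, 1, 3, 1000))  # → 50
```

A helper like this would take the place of the constant dirty rate in an AVG-style loop, so each round reads the dirty page count from the log for the interval it covers.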

Evaluation • Test-bed: • XenServer 5.5.0 (Xen 3.3.1) on 3 servers. • 1 pool master, 2 hosts for migration. • Each server: 2 Intel® Xeon™ 2.13 GHz, 6 GB DDR3. • SAN: IBM eServer xSeries 336. • 2 GB DIMM. • Ultra320 SCSI. • Ubuntu 2.6.27-7 kernel. • Compared to: • Actual migration using 2 Solarflare 10 Gbps NICs.

Evaluation (Cont.) • Page Modification Micro-Benchmark • Can be used both for AVG & HIST. • Deterministic application. • Writes to memory pages at fixed rates. • High resolution of page modification • Up to pages/sec. • Over 25,000 live migrations.
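A deterministic page-modification benchmark of the kind described above might look like the following. This is an illustrative sketch only, not the benchmark used in the evaluation: the 4 KiB page size, the round-robin page order, the one-second pacing, and all names are assumptions.

```python
import time

PAGE_SIZE = 4096  # bytes; assumed page size


def dirty_pages(buf, first_page, count):
    """Write one byte in each of `count` consecutive pages (wrapping).

    Returns the index of the page to start from on the next call.
    """
    n_pages = len(buf) // PAGE_SIZE
    for i in range(count):
        offset = ((first_page + i) % n_pages) * PAGE_SIZE
        buf[offset] = (buf[offset] + 1) % 256   # the write dirties the page
    return (first_page + count) % n_pages


def run(buf, pages_per_sec, seconds):
    """Dirty `pages_per_sec` pages every second for `seconds` seconds."""
    next_page = 0
    for _ in range(seconds):
        started = time.monotonic()
        next_page = dirty_pages(buf, next_page, pages_per_sec)
        # Sleep for the remainder of the second to hold the rate fixed.
        time.sleep(max(0.0, 1.0 - (time.monotonic() - started)))


# Example: an 8-page buffer, dirtying 4 pages starting at page 6.
buf = bytearray(8 * PAGE_SIZE)
print(dirty_pages(buf, 6, 4))  # → 2
```

Because the pages are touched in a fixed order at a fixed rate, repeated runs dirty the same pages at the same times, which is the property both AVG and HIST rely on.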

Evaluation (cont.) - Results • Figures: HIST vs. real migration; AVG vs. real migration.

Evaluation (cont.) - Results • For |VM| = 1024 MB, link speed = 10 Gbps: • HIST mean deviation from the measurements: • 3.3% - total migration time. • 6.2% - total downtime. • AVG mean deviation from the measurements: • 2.6% - total migration time. • 3.3% - total downtime.
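The slide does not define "mean deviation". One common reading, assumed here, is the mean absolute difference between prediction and measurement, expressed as a percentage of the measurement. The function name and the sample values below are made up for illustration and are not taken from the evaluation.

```python
def mean_deviation_percent(predicted, measured):
    """Mean absolute relative deviation of predictions, in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    deviations = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(deviations) / len(deviations)


# Illustrative values only (seconds): two predicted vs. measured times.
print(mean_deviation_percent([1.1, 1.9], [1.0, 2.0]))
```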

Evaluation (cont.) - Industry workloads • Comparing against a set of industry-standard workloads. • SPEC CPU • For CPU-bound workloads. • SPECweb • Web server workloads. • SPECsfs • I/O, MapReduce & non-interactive workloads.

Industry workloads - Results • MT = Total Migration Time, DT = Total Downtime, A = Actual Measurements, P = HIST Prediction

Comments • Presented an accurate model for prediction. • Performed a large-scale evaluation. • Very specific to the Xen implementation. • Didn’t perform an evaluation comparing to other prediction methods. • Didn’t state how to predict with bounds.

Questions?
