Sherif Akoush, Ripduman Sohan, Andrew W. Moore, Andy Hopper (University of Cambridge). Predicting the Performance of Virtual Machine Migration. Presented by: Eli Nazarov.
Agenda • Introduction. • How to migrate? • Defining migration performance. • Performance prediction. • The AVG & HIST models. • Evaluation. • Conclusions.
Why does performance prediction matter? • Provision and control computing capacity. • Guarantee performance levels. • Efficient management. • Better VM placement. • Better resource utilization (e.g. load balancing).
How to migrate? • Stop-and-Copy • Minimizes total migration time. • Highest downtime. • On-Demand. • Short downtime. • Very high total migration time.
Pre-Copy migration • Pre-Copy migration involves 6 steps: • Initialization: pre-select a target for migration. • Reservation: reserve resources on the destination host. • Iterative Pre-Copy: the first iteration sends all RAM; each later iteration sends the pages modified since the previous one. • Stop-and-Copy: stop the VM for the final transfer. • Commitment: the destination host acknowledges that the copy finished correctly. • Activation: re-attach resources to the VM on the destination host.
Xen Stop Conditions • Fewer than 50 pages were dirtied during the last pre-copy iteration. • Guarantees short downtime. • 29 pre-copy iterations have been carried out. • More than 3*|VM| of data has already been copied. • At iteration N-1 we copied 3*|VM| - 1 page. • Forces the Stop-and-Copy stage.
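The three stop conditions above can be written as a single predicate. This is a minimal sketch: the function name, parameter names, and the choice to count everything in pages are assumptions made for illustration, not Xen's actual code.

```python
def should_stop_precopy(iteration, dirtied_last_iter, pages_sent, vm_pages,
                        min_dirty=50, max_iters=29, max_factor=3):
    """Return True when any of the three stop conditions from the slide holds.

    All quantities are in pages. The thresholds default to the values
    quoted on the slide; the function itself is a hypothetical helper.
    """
    # Condition 1: few pages were dirtied, so the final copy will be short.
    if dirtied_last_iter < min_dirty:
        return True
    # Condition 2: the iteration budget is used up.
    if iteration >= max_iters:
        return True
    # Condition 3: more than max_factor times the VM size was already sent.
    if pages_sent > max_factor * vm_pages:
        return True
    return False
```

For a 1,024 MB VM with 4 KiB pages (262,144 pages), `should_stop_precopy(5, 1000, 1000, 262144)` is false, so pre-copy continues; the same call with `dirtied_last_iter=10` is true.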
How To Predict? • Calculate Bounds.
Bounds are not enough! • They don't give an accurate prediction. • Reason: the lower and upper bounds differ significantly, and the gap varies with link speed and VM size. • Example: for VM size = 1,024 MB • MT = Total Migration Time, DT = Total Downtime, LB = Lower Bound, UB = Upper Bound • For larger VM memory sizes the differences are even greater. • We need something more accurate.
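One way to see why the bounds are so far apart is to derive them from the Xen stop conditions. The sketch below is an assumption-laden reconstruction, not the formula from the slides: it takes the best case to be a single copy of RAM with an empty final transfer, and the worst case to be 3*|VM| - 1 page sent by iteration N-1, plus one more full pre-copy iteration, plus a full stop-and-copy. All names are hypothetical.

```python
def migration_bounds(vm_bytes, link_bytes_per_s,
                     pre_overhead_s=0.0, post_overhead_s=0.0, page_bytes=4096):
    """Rough lower/upper bounds for migration time (MT) and downtime (DT).

    Assumed worst case: (3*|VM| - 1 page) + |VM| + |VM| = 5*|VM| - 1 page.
    Returns a dict with keys mt_lb, mt_ub, dt_lb, dt_ub (seconds).
    """
    one_copy_s = vm_bytes / link_bytes_per_s
    worst_bytes = 5 * vm_bytes - page_bytes
    overheads = pre_overhead_s + post_overhead_s
    return {
        "mt_lb": overheads + one_copy_s,
        "mt_ub": overheads + worst_bytes / link_bytes_per_s,
        "dt_lb": post_overhead_s,               # nothing left to send
        "dt_ub": post_overhead_s + one_copy_s,  # whole VM sent while paused
    }
```

Under these assumptions the upper bound on migration time is almost five times the lower bound before overheads, which is consistent with the slide's point that bounds alone are too loose to act as a prediction.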
Parameters affecting migration • Migration link bandwidth. • Higher speed links allow faster transfers. • Pre- and post-migration overheads. • Operations that aren’t part of the actual transfer. • Examples: • Initializing the container in the destination host. • Reattaching device drivers to the new VM. • etc. • Example: • 10 Gbps, VM size = 512 MB: pre-overhead = 77%
Parameters affecting migration (cont.) • Page dirty rate. • The rate at which memory pages in the VM are modified. • Affects the number of pages transferred in each pre-copy iteration. • The relation between page dirty rate and performance is not linear. • Reason: link speed.
Page dirty rate and link speed • Downtime at low page dirty rate is almost constant and close to the lower bound. • Downtime increases to the upper bound when the page dirty rate is high (reaches link capacity). • Figure: total downtime at 10 Gbps.
Page dirty rate and link speed (cont.) • Total migration time increases with page dirty rate. • Total migration time goes back to the lower bound for extremely high page dirty rates: back to pure Stop-and-Copy. • Figures: total migration time at 10 Gbps and at 100 Mbps.
What's next? • Prediction using all parameters affecting migration. • Link speed. • Page dirty rate. • VM memory size. • Overheads. • Two models: AVG (Average Page Dirty Rate) and HIST (History-Based Page Dirty Rate).
The AVG model • Based on the migration logic. • Assumes constant or average page dirty rate. • Useful when the dirty page rate is stable. • Follows the core functionality of migration in Xen.
The AVG model (cont.) • Input parameters: • Link speed. • Page dirty rate. • Analytically determinable. • Pre/post overheads. • Time spent during actual transfer – Time to migrate idle VM • VM size. • Xen functionality: • sim_clean(): returns the set of dirty pages + sets state to “all clean”. • sim_peek(): returns bitmap of dirty pages (no state change).
Algorithm - the AVG model • Each pre-copy iteration: • Get dirty bitmap – sim_peek(). • Skip the pages re-dirtied in this iteration. • Collect at most 1024 pages – batch. • migration_time += • if (last_iteration) • downtime += • Clear the dirty-page status – sim_clean(). • Calculate the total times: • total_migration_time = migration_time + pre_overheads + post_overheads. • total_downtime = downtime + post_overheads.
The HIST model • Used in cases where the dirty page rate is a function of time. • Depends on the history log of page dirty rate.
The HIST model (cont.) • Given the start time of migration – t • Predict migration times based on: t+1, t+2, …, t+N • Changed sim_clean() and sim_peek() to return the number of dirty pages at the above points in time, taken from the log. • Use the AVG algorithm with these functions. • Observation: • For deterministic processes the set of dirtied pages at any point in time will be approximately the same as for previous runs of the same workload running in a similar environment.
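The idea of replaying a history log in place of a constant rate can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the log is taken to be a non-empty list where `log[i]` is the number of pages dirtied during second `i`, the last entry is reused past the end of the log, and the iteration-level loop ignores batching. All names are hypothetical.

```python
def dirtied_between(log, start_s, end_s):
    """Pages dirtied in [start_s, end_s), with log[i] = pages in second i."""
    total = 0.0
    i = int(start_s)
    while i < end_s:
        rate = log[i] if i < len(log) else log[-1]  # assumed: hold last value
        overlap = min(end_s, i + 1) - max(start_s, i)
        total += rate * overlap
        i += 1
    return total


def simulate_hist(vm_pages, link_pages_per_s, log, t0=0.0,
                  min_dirty=50, max_iters=29, max_factor=3):
    """Pre-copy sketch driven by a dirty-page history log starting at t0.

    Returns (migration_time, downtime) in seconds, without overheads.
    """
    now = t0
    to_send = vm_pages
    sent_total = 0
    for _ in range(max_iters):
        duration = to_send / link_pages_per_s
        dirtied = min(vm_pages, int(dirtied_between(log, now, now + duration)))
        now += duration
        sent_total += to_send
        to_send = dirtied
        if dirtied < min_dirty or sent_total > max_factor * vm_pages:
            break
    downtime = to_send / link_pages_per_s
    return (now - t0) + downtime, downtime
```

Changing `t0` shifts which part of the log the migration overlaps, which is how a time-dependent workload can produce different predictions for different start times.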
Evaluation • Test-bed: • XenServer 5.5.0 (Xen 3.3.1) on 3 servers. • 1 pool master, 2 hosts for migration. • Each server: 2 Intel® Xeon™ 2.13 GHz, 6 GB DDR3. • SAN – IBM eServer xSeries 336. • 2 GB DIMM. • Ultra320 SCSI. • Ubuntu 2.6.27-7 kernel. • Compared to: • Actual migration using 2 Solarflare 10 Gbps NICs.
Evaluation (Cont.) • Page Modification Micro-Benchmark • Can be used both for AVG & HIST. • Deterministic application. • Writes to memory pages at fixed rates. • High resolution of page modification • Up to pages/sec. • Over 25,000 live migrations.
Evaluation (cont.) - Results • Figures: HIST vs. real migration; AVG vs. real migration.
Evaluation (cont.) - Results • For |VM| = 1024 MB, link speed = 10 Gbps: • HIST mean deviation from the measurements: • 3.3% - total migration time. • 6.2% - total downtime. • AVG mean deviation from the measurements: • 2.6% - total migration time. • 3.3% - total downtime.
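The slides do not define how "mean deviation" is computed. A common choice, assumed here purely for illustration, is the mean absolute relative error between predicted and measured values, expressed as a percentage. The helper name is hypothetical.

```python
def mean_relative_deviation(predicted, measured):
    """Mean of |predicted - measured| / measured, as a percentage.

    Assumes equal-length, non-empty sequences and non-zero measurements.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(errors) / len(errors)
```

For example, predictions of 110 s and 90 s against two measurements of 100 s each give a mean deviation of about 10%.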
Evaluation (cont.) – Industry workloads • Comparing against a set of industry-standard workloads. • SPEC CPU • For CPU-bound workloads. • SPECweb • Web server workloads. • SPECsfs • I/O, MapReduce & non-interactive workloads.
Industry workloads - Results • MT = Total Migration Time, DT = Total Downtime, A = Actual Measurements, P = HIST Prediction
Comments • Presented an accurate model for prediction. • Performed a large-scale evaluation. • Very specific to the Xen implementation. • Did not compare against other prediction methods. • Did not state how to predict with bounds.