Network-aware migration control and scheduling of differentiated virtual machine workloads. Alexander Stage and Thomas Setzer, Technische Universität München (TUM), Chair for Internet-based Information Systems
Network-aware migration control and scheduling of differentiated virtual machine workloads. Alexander Stage and Thomas Setzer, Technische Universität München (TUM), Chair for Internet-based Information Systems. ICSE Workshop on Software Engineering Challenges in Cloud Computing, Vancouver, Canada, May 2009
Introduction • Workload consolidation based on server virtualization is increasingly used to: • Raise server utilization levels • Ensure cost-efficient data center operations • Unforeseen spikes or shifts in workloads require dynamic workload management to avoid server overload. • Placements of virtual machines (VMs) are continuously realigned by means of VM live migration.
How Does Live Migration Work? • Phase 1: Setup • Create a TCP connection between source and destination • Copy the VM’s profile to the destination • Create a VM on the destination [Figure: configuration data moving from Source Node (Host A) to Destination Node (Host B); labels: .VHD, .VSV, .XML, .BIN, Network Storage]
How Does Live Migration Work? • Phase 2: Memory migration • Transfer the memory to the destination • Track the pages that change while the memory is being transferred • Pause the VM on the source node when the last transfer starts [Figure: memory content moving from Source Node to Destination Node; labels: .VHD, .VSV, .XML, .BIN, Network Storage]
How Does Live Migration Work? • Phase 3: State migration • Migrate the VM’s register state from the source node • Start the VM on the destination node • Clean up the old VM on the source node [Figure: running state moving from Source Node to Destination Node; labels: .VHD, .VSV, .XML, .BIN, Network Storage]
Motivation • VM live migration enables: • Dynamic resource provisioning • Load balancing • But it imposes significant overheads that need to be considered and controlled: • CPU overhead [17] • Network overhead, which depends on the network topology • Reference: [17] T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif. Black-box and gray-box strategies for virtual machine migration. In 4th USENIX Symp. on Networked Systems Design and Impl., pages 229–242, 2007.
Network overhead of Live Migration(1/2) • In live migration phase 2, iterative, bandwidth-adapting pre-copy memory page transfer algorithms are used. • Objectives: • Minimize VM downtime • Keep total migration time low • Lower the aggregated bandwidth consumption of a migration • The network overhead is non-negligible [5]: • 500 Mb/s for 10 seconds for a trivial web server VM • Reference: [5] C. Clark, K. Fraser, S. H, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proc. of 2nd ACM/USENIX Symp. on Network Systems Design and Implementation, pages 273–286, 2005.
Network overhead of Live Migration(2/2) • Example: • 20 VM migrations must be executed within 5 minutes. • Assume each migration consumes 1 Gb/s for 20 seconds, i.e. 400 Gbit in total. • Scheduling them sequentially over a single 10 Gb/s link is equivalent to saturating the link completely for 40 seconds; over a 1 Gb/s link, for 400 seconds. • Outcome: Migrations cause sudden network load increases that can lead to resource shortages.
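The arithmetic behind this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `saturation_seconds` and its parameters are made up here, and the model simply divides the total migration volume by the link capacity.

```python
def saturation_seconds(n_migrations, rate_gbps, seconds_each, link_gbps):
    """Seconds of full link saturation equivalent to the total migration volume."""
    total_gbit = n_migrations * rate_gbps * seconds_each  # data volume in Gbit
    return total_gbit / link_gbps


window_s = 5 * 60  # the 5-minute window from the example

for link_gbps in (10.0, 1.0):
    busy = saturation_seconds(20, 1.0, 20.0, link_gbps)
    print(f"{link_gbps:g} Gb/s link: {busy:g} s saturated, "
          f"fits in window: {busy <= window_s}")
```

With the numbers from the slide this gives 40 seconds on the 10 Gb/s link and 400 seconds on the 1 Gb/s link; the latter is longer than the 300-second window, so the 20 migrations cannot all complete in time on the slower link.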
Migration scheduling architecture • To deal with the network overhead of live migration, we propose a migration scheduling architecture. [Figure: architecture diagram around the data center; component responsibilities shown: collect performance parameters; classify workload type and predict host utilization; determine expected resource bottlenecks and low utilization levels; decide on an operational live migration plan that avoids migration-related SLA violations; handle unexpected situations such as sudden surges in resource demand]
Workload classifier(1/2) • We identify the following main workload attributes for our classification: • Predictability: • A workload is predictable if its behavior can be reliably forecast for a given period of time. • Forecasting errors are tightly bounded. • Trend: • Refers to the degree of upward or downward demand trends. • Periodicity: • Indicates the length (time scale) and the strength of recurring patterns.
Workload classifier(2/2) • For example: • Predictive, low-variability, low-trend workloads: • Can be co-hosted more aggressively by exploiting workload complementarities. • Highly non-predictive workloads: • Require a certain buffer capacity on hosts to guarantee that overload is avoided. • Note: The implementation of the workload classifier is not within the scope of this paper. The classifier would: • Observe the target for a period of time • Make a class-assignment decision
Allocation Planner(1/2) • For predictive workload classes • Intuition: • Cohosting VMs with complementary workloads • High resource utilization can be achieved. • Method: • During runtime, use live migration to execute VM re-allocation plan to optimize the VM allocation. • Objective: • Decrease the number of required hosts. • High resource utilization can be achieved without overload.
Allocation Planner(2/2) • For non-predictive workload classes • Method: • Set a rather conservative threshold on overall host utilization to avoid overload. • If the threshold is exceeded, one or more VMs are selected as migration candidates. • Objective: • Avoiding overload is the first priority.
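The threshold rule for non-predictive workloads can be sketched as follows. The slides do not say how candidates are chosen, so the largest-load-first heuristic, the function name `select_migration_candidates`, and the load units are assumptions made only for illustration.

```python
def select_migration_candidates(vm_loads, host_capacity, threshold):
    """Pick VMs to move away until host load is at or below threshold * capacity.

    vm_loads: dict mapping VM name -> current load (same unit as host_capacity).
    Assumption: the largest VMs are selected first; the source does not
    prescribe a selection order.
    """
    limit = threshold * host_capacity
    remaining = sum(vm_loads.values())
    candidates = []
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= limit:
            break  # host is back under the conservative threshold
        candidates.append(name)
        remaining -= load
    return candidates


loads = {"vm-a": 30, "vm-b": 25, "vm-c": 20, "vm-d": 10}
print(select_migration_candidates(loads, host_capacity=100, threshold=0.6))
```

A lower threshold leaves more buffer capacity on the host but triggers migrations earlier, which is the trade-off the conservative setting accepts.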
Migration Scheduler(1/3) • Bandwidth-adapting pre-copy memory page transfer algorithm: 1. All main memory pages are transferred. 2. Only the memory pages that were written to (dirtied) during the previous iteration are transferred. 3. Bandwidth usage is adaptively increased in each iteration; if the set of dirtied memory pages is sufficiently small or the upper bandwidth limit is reached, go to step 4, otherwise go to step 2. 4. The last pre-copy iteration is started (this causes the service downtime). • Notation: iq = the duration of the q-th iteration of VM i; bi = the constant bandwidth adaptation rate of VM i; mi = the memory size of VM i; ri = the constant memory dirtying rate of VM i
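The four steps above can be simulated with a small model. This is a sketch under stated assumptions, not the paper's formulas: bandwidth grows by a constant step per iteration, the dirtying rate is constant, and the names `simulate_precopy`, `bw_start`, `bw_step`, `bw_max`, and `stop_mb` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrecopyResult:
    iteration_seconds: List[float]  # duration of each pre-copy iteration
    downtime_seconds: float         # duration of the final stop-and-copy round
    transferred_mb: float           # total data sent, including resent pages


def simulate_precopy(mem_mb, dirty_mb_per_s, bw_start, bw_step, bw_max, stop_mb):
    """Simulate iterative pre-copy with constant dirtying rate (MB, MB/s units)."""
    if min(mem_mb, bw_start, bw_step, bw_max, stop_mb) <= 0 or dirty_mb_per_s < 0:
        raise ValueError("sizes and rates must be positive")
    bw = min(bw_start, bw_max)
    pending = mem_mb                # step 1: send all memory pages
    iteration_seconds = []
    transferred = 0.0
    while True:
        seconds = pending / bw
        iteration_seconds.append(seconds)
        transferred += pending
        # step 2: pages dirtied during this round must be sent again
        pending = min(mem_mb, dirty_mb_per_s * seconds)
        # step 3: stop when the dirty set is small or bandwidth is at its limit
        if pending <= stop_mb or bw >= bw_max:
            break
        bw = min(bw + bw_step, bw_max)
    # step 4: VM is paused while the remaining dirty pages are copied
    downtime = pending / bw
    transferred += pending
    return PrecopyResult(iteration_seconds, downtime, transferred)


result = simulate_precopy(mem_mb=1024, dirty_mb_per_s=10,
                          bw_start=32, bw_step=32, bw_max=128, stop_mb=8)
print(len(result.iteration_seconds), "pre-copy iterations,",
      f"downtime {result.downtime_seconds:.3f} s,",
      f"{result.transferred_mb:.0f} MB sent")
```

The total data sent exceeds the VM's memory size because dirtied pages are retransmitted; this surplus is the migration's extra bandwidth consumption that a scheduler has to account for.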
Migration Scheduler(2/3) • Currently, bandwidth usage cannot be controlled during a migration. • Only the maximum bandwidth usage level can be controlled. [Figure: only 2 migrations can be launched simultaneously (D is rejected). Deadlines: A: t1/t5, B: t1/t6, C: ignored/ignored, D: t2/t5]
Migration Scheduler(3/3) • The migration scheduler should control the bandwidth usage of migrations. [Figure: 3 migrations can be launched simultaneously. Deadlines: A: t1/t5, B: t1/t6, C: ignored/ignored, D: t2/t5]
Schedule plan-Offline scheduling Plan(1/2) • Assumption 1: • A fixed available bandwidth on each link is reserved for VM migrations • We allow for different amounts of reservations on different links • Offline scheduling can be used for predictive VM workload clusters with periodicity or for clusters with trend. • Objective • Avoid the risk of overloading network links by migration-related bandwidth consumption
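Under Assumption 1, offline planning amounts to fitting migrations into the reserved bandwidth before their deadlines. The sketch below is a simple greedy heuristic (earliest deadline first, earliest feasible start) for a single link with discrete time slots. It is not the paper's scheduling model, and the `Migration` fields and `schedule_offline` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Migration(NamedTuple):
    name: str
    bandwidth: float  # bandwidth needed in every slot while running
    duration: int     # number of consecutive slots
    release: int      # earliest start slot
    deadline: int     # slot index by which the migration must have finished


def schedule_offline(migrations: List[Migration], reserved: float,
                     horizon: int) -> Dict[str, Optional[int]]:
    """Return a start slot per migration, or None if it cannot meet its deadline."""
    free = [reserved] * horizon  # reserved migration bandwidth left per slot
    plan: Dict[str, Optional[int]] = {}
    for m in sorted(migrations, key=lambda m: m.deadline):
        plan[m.name] = None
        last_start = min(m.deadline, horizon) - m.duration
        for start in range(m.release, last_start + 1):
            window = range(start, start + m.duration)
            if all(free[t] >= m.bandwidth for t in window):
                for t in window:
                    free[t] -= m.bandwidth
                plan[m.name] = start
                break
    return plan


jobs = [
    Migration("A", 0.5, 2, 0, 4),
    Migration("B", 0.5, 2, 0, 5),
    Migration("D", 0.5, 2, 1, 4),
]
print(schedule_offline(jobs, reserved=1.0, horizon=6))
```

Because the plan never allocates more than the reserved amount in any slot, migration traffic cannot overload the link as long as the reservation itself is honored. Extending the sketch to several links with different reservations would require checking every link on a migration's path.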
Schedule plan-Offline scheduling Plan(2/2) • Without assumption 1: • Objective • Minimize the migration-related risk of network congestion with respect to bandwidth demand fluctuations, since the available bandwidth is not known exactly in advance. • Solution: • Predict the average utilization of network links for all time slots (e.g. via the Network Weather Service [16]). • Continuously adjust the bandwidth usable for migrations to match the predicted link utilization. • A more conservative available-bandwidth prediction is advisable. • Reference: [16] R. Wolski. Dynamically forecasting network performance using the network weather service. Journal of Cluster Computing, 1:119-132, 1998.
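One way to make the available-bandwidth prediction conservative is to subtract a safety margin from the predicted free capacity. The sketch below uses the per-slot mean of past utilization plus k standard deviations. This is a plain statistical stand-in, not the Network Weather Service forecasting method, and `conservative_available` and `k` are assumed names.

```python
import statistics


def conservative_available(capacity, history_by_slot, k=2.0):
    """Per-slot bandwidth usable for migrations, with a safety margin.

    history_by_slot: one list of past utilization samples per time slot,
    in the same unit as capacity. Assumption: mean + k * stddev is used as
    a pessimistic utilization estimate.
    """
    available = []
    for samples in history_by_slot:
        expected = statistics.fmean(samples)
        spread = statistics.pstdev(samples)
        pessimistic_use = expected + k * spread
        available.append(max(0.0, capacity - pessimistic_use))
    return available


history = [[4.0, 4.0, 4.0], [2.0, 6.0]]
print(conservative_available(capacity=10.0, history_by_slot=history, k=1.0))
```

Both slots have the same average utilization, but the second one fluctuates more and therefore gets a smaller migration budget, which is the intent of a conservative prediction.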
Schedule plan-Online scheduling Plan(1/2) • Characteristics • The sequence of migration requests is not known in advance. • Migrations can be delayed as long as their finishing deadlines are still met. • A migration might be rejected if it cannot be executed.
Schedule plan-Online scheduling Plan(2/2) • Solution: • Emergency migrations may temporarily supersede the bandwidth allocations of lower-priority migrations, as shown in Figure 3. • This issue is similar to the prioritization problem in network revenue management. [Figure 3]
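The superseding rule for emergency migrations can be illustrated with a small admission check on a single link. This is a hypothetical sketch: the slides give no algorithm, so the lowest-priority-first preemption order and the names `Active` and `admit` are assumptions.

```python
from typing import List, NamedTuple, Tuple


class Active(NamedTuple):
    name: str
    priority: int     # larger value = more urgent
    bandwidth: float  # bandwidth currently allocated on the link


def admit(active: List[Active], new: Active,
          capacity: float) -> Tuple[bool, List[str]]:
    """Decide whether `new` can start; return (admitted, names to suspend).

    Only migrations with strictly lower priority may be superseded, lowest
    priority first (an assumed policy).
    """
    free = capacity - sum(m.bandwidth for m in active)
    if new.bandwidth <= free:
        return True, []
    suspended = []
    for m in sorted(active, key=lambda m: m.priority):
        if m.priority >= new.priority:
            break  # no more lower-priority allocations to take over
        suspended.append(m.name)
        free += m.bandwidth
        if new.bandwidth <= free:
            return True, suspended
    return False, []


running = [Active("A", 1, 0.5), Active("B", 2, 0.5)]
print(admit(running, Active("emergency", 3, 0.5), capacity=1.0))
print(admit(running, Active("routine", 1, 0.5), capacity=1.0))
```

In this sketch a suspended migration would later be resumed or rescheduled, and a rejected routine migration could be delayed and retried as long as its deadline still allows it.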
Summary • In this paper we propose: • Network-topology-aware scheduling models for VM live migrations that explicitly take bandwidth requirements and the network topology into account • A scheme for classifying VM workloads • Future work: • In co-operation with a commercial data center operator, we are currently implementing the proposed architecture.
Comment • Considering bandwidth management for live migration is a good point. • But there is no arithmetic model for the migration schedule. • The prediction model is simple and not practical. • How to predict workloads is a direction I intend to research in depth.