Black-box and Gray-box Strategies for Virtual Machine Migration
Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif†
Univ. of Massachusetts Amherst, †Intel
4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 2007)
Introduction • Applications operate in data centers • Goal: effective management of data center resources while meeting SLAs • Approach: virtualization • Benefits of virtualization: • Application isolation • Server consolidation (multiplexing) • Handling workload dynamics
Motivation • Efficient data center resource management • Live migration • However, detecting workload hotspots and initiating a migration is currently handled manually • Manual approaches lack the agility to respond to sudden workload changes • Multiple resources must be considered: • CPU, network, and memory
Solution • Automated black-box and gray-box strategies for virtual machine migration (Sandpiper) • Monitoring system resource usage • Hotspot detection • Determining a new mapping • Initiating the necessary migrations
The Sandpiper Architecture • Nucleus (one per physical server): gathers resource usage statistics on that server • Gathers processor, network, and memory swap statistics for each VM • Implements a daemon to gather OS-level statistics and application logs • Control plane: monitors usage profiles to detect hotspots • Hotspot: any resource exceeds a threshold (or an SLA is violated) for a sustained period • Constructs resource usage profiles for each virtual server (to predict PM workload) • Determines: • which virtual servers should migrate • where to move them • how much of each resource to allocate to the virtual servers after migration
Black-box monitoring (1/4) • VM resource usage is inferred solely from external observations • All measurements are taken from Domain-0 • Monitored parameters: • CPU usage • Network bandwidth • Memory swap rate • Statistics are gathered once per monitoring interval
Black-box monitoring (2/4) - CPU monitoring • VM CPU usage can be determined by tracking scheduling events in the hypervisor • This does not include the CPU overhead of the VM's disk I/O and network processing • That overhead is accounted to Domain-0, so each VM is additionally charged: • Domain-0's CPU usage × (VM's I/O requests / total I/O requests), as sketched below • Assumption: the overhead of the monitoring engine and the nucleus is negligible
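A minimal sketch of this charging rule (the control plane is written in Python, so Python is used here; function and variable names are illustrative, not Sandpiper's actual code):

```python
# Hypothetical sketch of the black-box CPU accounting rule: each VM is
# charged its hypervisor-reported CPU usage plus a share of Domain-0's
# CPU usage proportional to the I/O requests the VM issued.

def charge_cpu(vm_cpu, dom0_cpu, vm_io):
    """vm_cpu: {vm: CPU fraction from hypervisor scheduling events}
    dom0_cpu: CPU fraction consumed by Domain-0 (I/O processing)
    vm_io: {vm: I/O requests issued during the monitoring interval}"""
    total_io = sum(vm_io.values())
    charged = {}
    for vm, cpu in vm_cpu.items():
        share = vm_io.get(vm, 0) / total_io if total_io else 0.0
        charged[vm] = cpu + dom0_cpu * share  # apportion Domain-0 overhead
    return charged
```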
Black-box monitoring (3/4) - Network monitoring • Background: • Domain-0 in Xen implements the network interface driver • VMs access the driver via clean device abstractions (the virtual firewall-router (VFR) interface) • The monitoring engine can read each virtual NIC's usage via the Linux /proc interface • /proc/net/dev (see the sketch below)
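A minimal sketch of sampling per-interface byte counters from /proc/net/dev; the assumption that each VM's backend interface appears in Domain-0 with a name starting with "vif" is mine:

```python
# Read cumulative RX/TX byte counters for Xen virtual interfaces from
# /proc/net/dev; bandwidth is the difference between two successive
# samples divided by the monitoring interval.

def read_vif_counters(path="/proc/net/dev"):
    counters = {}
    with open(path) as f:
        for line in f.readlines()[2:]:        # skip the two header lines
            name, data = line.split(":", 1)
            name = name.strip()
            if not name.startswith("vif"):    # keep only per-VM interfaces
                continue
            fields = data.split()
            counters[name] = (int(fields[0]), int(fields[8]))  # (rx, tx) bytes
    return counters
```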
Black-box monitoring (4/4) - Memory monitoring • Challenge: • Domain-0 cannot directly monitor each VM's actual memory usage or utilization • It only knows the amount of memory assigned to each VM • Solution: • Observing swap activity from Domain-0 can be used to infer working set sizes [11] [11] S. Jones, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. Geiger: Monitoring the buffer cache in a virtual machine environment. In Proc. ASPLOS'06, pages 13-23, October 2006.
Gray-box monitoring • Motivation: • Black-box monitoring cannot "peek inside" a VM to gather fine-grained usage statistics • Solution: • Install a light-weight monitoring daemon inside each virtual server • Use the /proc interface to gather OS-level statistics (sketched below) • CPU, network, memory • Application-level statistics • The daemon obtains statistics from interfaces provided by the application itself • E.g., web/database server: request rate, request drop rate, service time
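A minimal sketch of the kind of OS-level statistics such a daemon could read from /proc; the specific fields shown are illustrative, not the paper's exact metric set:

```python
# Gather a few OS-level statistics from inside the VM via /proc.

def os_level_stats():
    with open("/proc/meminfo") as f:
        mem = {line.split(":")[0]: int(line.split()[1]) for line in f}  # kB
    with open("/proc/loadavg") as f:
        load_1min = float(f.read().split()[0])
    return {
        "mem_total_kb": mem["MemTotal"],
        "mem_free_kb": mem["MemFree"],
        "load_1min": load_1min,
    }
```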
Profile Generation (1/2) • Profile: a compact description of a server's resource usage over a sliding time window W • Profile contents: • Black-box parameters: • CPU utilization, network bandwidth utilization, and memory swap rate • Gray-box parameters: • memory utilization, service time, request drop rate, and incoming request rate (assuming an Apache web server)
Profile Generation (2/2) • Profile types (both are sketched below): • Distribution profile: • the probability distribution of resource usage over the window W • Time series profile: • captures temporal fluctuations; simply a list of all reported observations within the window W
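A minimal sketch of the two profile types over a sliding window (W = 200 observations, matching the implementation defaults later in these slides; the class itself is illustrative):

```python
from collections import deque

class Profile:
    """Sliding-window resource usage profile for one VM and one resource."""

    def __init__(self, window=200):            # W = last 200 observations
        self.obs = deque(maxlen=window)

    def report(self, value):                   # called once per interval
        self.obs.append(value)

    def time_series(self):
        """Time series profile: the raw observations, in temporal order."""
        return list(self.obs)

    def distribution(self, bins=10):
        """Distribution profile: histogram of usage (values in [0, 1])."""
        hist = [0] * bins
        for v in self.obs:
            hist[min(int(v * bins), bins - 1)] += 1
        n = len(self.obs)
        return [c / n for c in hist] if n else hist
```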
Hotspot detection • Goal: • Signal a need for VM migration whenever SLA violations are detected • A hotspot is flagged only if thresholds or SLAs are exceeded for a sustained time: • at least k of the n most recent observations and the next predicted value exceed the threshold • Uses the time series profile • The next value is predicted with a first-order autoregressive predictor, AR(1), as sketched below
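A minimal sketch of the k-out-of-n test combined with an AR(1) one-step forecast. The AR(1) form û = μ + φ(u_k − μ) is the standard first-order autoregressive predictor; estimating φ from the lag-1 autocorrelation is an assumption about the fitting method:

```python
def ar1_predict(series):
    """One-step AR(1) forecast: mu + phi * (last - mu)."""
    n = len(series)
    mu = sum(series) / n
    num = sum((series[i] - mu) * (series[i - 1] - mu) for i in range(1, n))
    den = sum((u - mu) ** 2 for u in series) or 1.0
    phi = num / den                       # lag-1 autocorrelation estimate
    return mu + phi * (series[-1] - mu)

def is_hotspot(series, threshold=0.75, k=3, n=5):
    """Flag a hotspot only if at least k of the n most recent observations
    AND the predicted next value exceed the threshold."""
    exceed = sum(1 for u in series[-n:] if u > threshold)
    return exceed >= k and ar1_predict(series) > threshold
```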
Resource Provisioning • Goal: • Ensure that SLAs are not violated even in the presence of peak workloads • Estimate the peak CPU, network, and memory requirements of each overloaded VM • Black-box provisioning • Gray-box provisioning
Black-box provisioning (1/3) • Estimating peak CPU and network bandwidth needs: • use the distribution profile • use historical data to predict the peak • Challenge: estimation error • Background: • both the CPU scheduler and the network packet scheduler in Xen are work-conserving, so an overloaded VM's observed usage never exceeds what it was actually able to obtain
Black-box provisioning (2/3) • Estimation error: • Example: • two virtual machines are assigned CPU weights of 1:1 (50% each) • assume VM1 is overloaded and requires 70% of the CPU to meet its peak needs • the scheduler caps VM1 near its 50% share whenever VM2 is busy, so the observed usage understates VM1's true peak
Black-box provisioning (3/3) • Solution to the estimation error: • add a constant Δ to scale up the estimate (sketched below) • Estimating peak memory needs: • if swap activity exceeds a threshold, • the current allocation is deemed insufficient and is increased by a constant amount Δm
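A minimal sketch of both black-box estimates; the 95th-percentile choice and the Δ values are placeholders, not the paper's tuned constants:

```python
def blackbox_peak(observations, percentile=0.95, delta=0.10):
    """Peak CPU/network estimate from a distribution profile: a high
    percentile of observed usage, scaled up by a constant headroom delta
    (needed because a work-conserving scheduler caps what an overloaded
    VM can reveal)."""
    ordered = sorted(observations)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return min(ordered[idx] + delta, 1.0)

def next_memory_allocation(current_mb, swap_rate, swap_threshold,
                           delta_m_mb=32):
    """If swap activity exceeds the threshold, the allocation is deemed
    insufficient and grown by a constant delta_m (32 MB in these slides)."""
    return current_mb + delta_m_mb if swap_rate > swap_threshold else current_mb
```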
Gray-box provisioning (1/3) • The gray-box approach has access to application-level logs • This gives it the ability to estimate the peak resource needs of the application even when the resource is fully utilized • Estimating peak CPU needs: • an application model is necessary to estimate the peak CPU needs
Gray-box provisioning (2/3) • Estimating peak CPU needs (cont.): • Applications such as web and database servers can be modeled as G/G/1 queuing systems [23] • G/G/1 queuing system parameters [13]: • s = mean service time (obtained from server logs) • d = mean response time of a request (specified by the SLA) • λ = request arrival rate • σ_a² = variance of inter-arrival time (obtained from server logs) • σ_b² = variance of service time (obtained from server logs) [23] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning for multi-tier internet applications. In Proc. ICAC '05, June 2005. [13] L. Kleinrock. Queueing Systems, Volume 2: Computer Applications. John Wiley and Sons, Inc., 1976.
Gray-box provisioning (3/3) • Estimating peak CPU needs (cont.): • The G/G/1 result [13] gives the maximum request rate sustainable within the SLA: λ_cap ≈ [s + (σ_a² + σ_b²) / (2(d − s))]⁻¹ • Mapping the current CPU usage to the current request rate λ, the peak CPU usage can then be calculated as: CPU_peak = (λ_cap / λ) × CPU_current • Estimating peak network bandwidth: peak bandwidth ≈ λ_cap × b • b = mean requested file size • A sketch of these estimates follows
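A minimal sketch of these gray-box estimates, assuming (as the comments at the end of these slides note) that CPU usage scales linearly with request rate:

```python
def lambda_cap(s, d, var_arrival, var_service):
    """Max request rate sustainable within the SLA response time d,
    from the G/G/1 result; requires d > s."""
    return 1.0 / (s + (var_arrival + var_service) / (2.0 * (d - s)))

def peak_cpu(cpu_now, lam_now, lam_cap):
    """Scale current CPU usage from the current to the peak request rate."""
    return cpu_now * (lam_cap / lam_now)      # linear-scaling assumption

def peak_bandwidth(lam_cap, mean_file_size):
    """Peak network need: sustainable rate times mean requested file size."""
    return lam_cap * mean_file_size
```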
Hotspot mitigation (1/3) • Hotspot mitigation algorithm: • Goal: • determine which VMs should be migrated, and to where, to dissipate the hotspot • Challenge: • NP-hard: a multi-dimensional bin packing problem • bins = physical servers, dimensions = resource constraints • Solution: • a heuristic that decides: • which overloaded VMs to migrate • where to migrate them, such that migration overhead is minimized • migration overhead cannot be neglected
Hotspot mitigation (2/3) • Hotspot mitigation algorithm (cont.): • Intuition: • move load from the most overloaded servers to the least-loaded servers • minimize the data copying incurred during migration • Volume: captures the degree of load along multiple dimensions in a unified fashion: • Volume = [1 / (1 − cpu)] × [1 / (1 − net)] × [1 / (1 − mem)] • where cpu, net, and mem are the corresponding utilizations of that resource for the virtual or physical server (sketched below)
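A minimal sketch of the volume metric; the product form means high load on any single resource drives the volume up sharply:

```python
def volume(cpu, net, mem, eps=1e-6):
    """Volume = product of 1/(1 - utilization) over CPU, network, memory."""
    v = 1.0
    for u in (cpu, net, mem):
        v *= 1.0 / max(1.0 - u, eps)   # eps guards against division by zero
    return v
```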
Hotspot mitigation (3/3) • Hotspot mitigation algorithm (cont.): • Volume-to-size ratio (VSR): • Volume / Size (Size = the memory footprint of the VM) • Migration decision (sketched below): • move the highest-VSR VM from the highest-volume server and determine whether it can be housed on the least-loaded physical server • Swap decision (only 2-way swaps are considered): • activated when a simple migration cannot resolve the hotspot • swap the highest-VSR VM on the highest-volume hotspot server with the k lowest-VSR VMs on the lowest-volume server • if a swap cannot be found, the next least-loaded server is considered • Note: a swap may require a third server as scratch space (RAM constraints)
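A minimal sketch of the migration step only (swaps omitted); the capacity test is abstracted into a callback, and all data structures are illustrative:

```python
def plan_migration(servers, fits):
    """servers: {pm: [(vm, volume, mem_size), ...]}
    fits(pm, vm): True if pm can host vm without becoming a hotspot.
    Returns (vm, source_pm, dest_pm) or None if a swap is needed instead."""
    def pm_volume(pm):
        return sum(vol for _, vol, _ in servers[pm])

    hottest = max(servers, key=pm_volume)
    # Highest volume-to-size ratio first: most load moved per byte copied.
    candidates = sorted(servers[hottest],
                        key=lambda x: x[1] / x[2], reverse=True)
    for vm, vol, size in candidates:
        # Try destinations from least to most loaded.
        for pm in sorted(servers, key=pm_volume):
            if pm != hottest and fits(pm, vm):
                return (vm, hottest, pm)
    return None
```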
Implementation • Virtualization platform: • Xen • Sandpiper control plane: • runs on the control node (written in Python) • Profiling engine: • uses the past 200 measurements to generate profiles • Hotspot trigger: • k/n = 3/5: three of the five most recent readings plus the next predicted value must exceed the threshold • Default threshold: • 75% • Monitoring engine: • Gray-box monitoring daemon: • a Linux OS daemon plus an Apache module (service time, request rate, drop rate, file size)
Evaluation Environment • Data center: • 20 servers (2.4 GHz Pentium-4) • connected by gigabit Ethernet • at least 1 GB RAM each • OS: • Linux 2.6.16 + Xen 3.0.2-3 • Workload generator: • a cluster of Pentium-3 Linux servers
Experiment 1 - Migration Effectiveness • Experiment 1 uses 3 physical servers and 5 VMs with varying memory allocations • All VMs run Apache serving dynamic PHP web pages • httperf is used to inject the workload
Experiment 1 - Migration Effectiveness (cont.) • t=166: hotspot detected; VM1 has the highest VSR; PM3 has the lowest volume, so VM1 migrates to PM3 • t=362: hotspot detected; no PM has enough capacity to host VM3, so VM4 (the second-highest VSR) is chosen; PM1 has the lowest volume, so VM4 migrates to PM1 • Final phase: VM1 and VM5 have the same volume, but VM5 uses less memory (higher VSR); PM2 has the lowest volume, so VM5 migrates to PM2
Experiment 2 - Virtual Machine Swaps • Experiment setting: • as before, clients use httperf to request dynamic PHP pages
Experiment 2 - Virtual Machine Swaps (cont.) • Hotspot detected on PM1; the only viable solution is to swap VM2 with VM4 (a three-party swap using the scratch server) • VM4 uses the smallest amount of memory, so it is the VM migrated twice: • VM4 is first moved to the scratch node • when VM4's migration completes, VM2 is migrated to PM2 • when VM2's migration completes, VM4 is migrated to PM1 • Migration overhead is visible during the copies
Experiment 3 - Mixed resource workloads • Experiment setting: • VM2 is a database that stores its tables in memory • PM2 has more physical memory
Experiment 3 - Mixed resource workloads (cont.) • PM1 has a network hotspot and PM2 has a CPU hotspot • Sandpiper swaps a network-intensive VM for a CPU-intensive VM at t=130
Experiment 3 - Mixed resource workloads (cont.) • Sandpiper responds by increasing the RAM allocation in steps of 32 MB every time swapping is observed • When no additional RAM is available, the VM is swapped to the second physical server at t=430 • The two network-intensive VMs (VM1 and VM2) are swapped
Experiment 4 - Gray vs. Black: Memory Allocation • Goal: • compare the effectiveness of the black-box and gray-box approaches in mitigating memory hotspots • The SPECjbb2005 benchmark is used to generate memory load
Experiment 4 - Gray vs. Black: Memory Allocation (cont.) • Experiment result: • Observation: • the gray-box system can reduce or eliminate swapping without significant overprovisioning of memory
Experiment 4 - Gray vs. Black: Apache Performance • Settings: • httperf generates requests for CPU-intensive PHP scripts on all VMs
Experiment 4 - Gray vs. Black: Apache Performance (cont.) • The black-box strategy makes erroneous guesses of peak needs, requiring a series of corrective migrations (shown as steps 1-4 in the original figure)
Experiment 4 - Gray vs. Black: Apache Performance (cont.) • Comparing the gray-box strategy with the black-box strategy: • the gray-box strategy can migrate VM3 to PM2 and VM1 to PM3 concurrently
Experiment 5 - Prototype Data Center Evaluation • Data center environment: • 16 servers running a total of 35 VMs • 1 additional server runs the control plane • 1 additional server is reserved as a scratch node for swaps • Settings: • six physical servers, running a total of 14 VMs, are driven into overload • four servers see a CPU hotspot and two see a network hotspot
Experiment 5 - Prototype Data Center Evaluation (cont.) • Result: • Sandpiper eliminates the hotspots on all six servers by interval 60 • migration overhead is visible in the interim
Sandpiper overhead and scalability • Sandpiper's CPU and network overhead: • depends on the number of PMs and VMs in the data center • the gray-box strategy's overhead is also affected by the size of the application-level statistics gathered
Sandpiper overhead and scalability (cont.) • Nucleus overhead: • Network: • each report uses only 288 bytes per VM • the resulting overhead on a gigabit LAN is negligible • CPU usage: • compare the performance of a CPU benchmark with and without the resource monitors running • on a single physical server running 24 concurrent VMs, • the nucleus reduces the CPU benchmark score by approximately 1%
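For scale: assuming, say, a 10-second reporting interval (the interval is an assumption here, not stated on these slides), the 35-VM deployment above generates roughly 35 × 288 B / 10 s ≈ 1 KB/s of monitoring traffic, which is indeed negligible on a gigabit LAN.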
Sandpiper overhead and scalability (cont.) • Control plane scalability: • main source of computational complexity: • computing a new mapping of virtual machines to physical servers after a hotspot is detected
Conclusion & future work • In this paper, we proposed Sandpiper, an automated system that: • monitors resource usage and detects hotspots • determines a new mapping of physical to virtual resources • initiates the necessary migrations • We discussed a black-box strategy and a gray-box strategy • Evaluation showed that Sandpiper achieves rapid hotspot elimination in data center environments • Future work: • support replicated services • automatically determine whether to migrate a VM or to spawn a replica
Comment • Advantages: • separating the monitoring strategy into black-box and gray-box is a good design choice • Sandpiper's architecture and strategy may fit our "Plan A" • Shortcomings: • the relationship between CPU utilization and request rate may not be linear • the hotspot mitigation algorithm only balances workload across physical machines • it should also consider how to drive PM utilization as high as possible without creating hotspots