Energy-Efficient Server Consolidation for Multi-threaded Applications in the Cloud. Can Hankendi Ayse K. Coskun Boston University Electrical and Computer Engineering Department IGCC’13, Arlington, VA. This work has been partially funded by VMware, Inc. and MGHPCC seed funds.
Energy-Efficiency in Computing Clusters
• Energy-related costs are among the biggest contributors to the total cost of ownership.
• Consolidating multiple workloads on the same physical node improves energy efficiency.
Multi-threaded Applications in the Cloud
• HPC applications are expected to shift towards cloud resources.
• Resource allocation decisions significantly affect the energy efficiency of server nodes.
• Non-HPC:
  • Low utilization
  • High VM density
• HPC:
  • High utilization
  • Scalability matters
[Figure: VMs with multiple vCPUs each, running on top of virtualization layers]
Outline
• Background
• Analyses on Performance Isolation
• Autonomous Resource Allocation Technique
• Results
Background
Cluster-level Management
• VM migration techniques [Bobroff et al. INM’07, Beloglazov et al. MGC’10, VMware DRS]
• Infrastructure scale-up/down [Bonvin et al. CCGrid’11]
Node-level Management
• Finding best thread mixes [Frachtenberg et al. TPDS’05, McGregor et al. IPDPS’05]
• Finding best application pairs (e.g., pairing high-IPC applications with low-IPC ones) [Dhiman et al. ISLPED’09, Bhadauria et al. ICS’10]
[Figure: nodes with virtualization layers (cluster level); App-0 and App-1 running on an OS (node level)]
Virtualized System Setup
• 12-core AMD Magny-Cours server
  • 2x 6-core dies attached side by side in the same package
  • Private L1 and L2 caches for each core
  • 6 MB shared L3 cache for each 6-core die
• Virtualized through the VMware vSphere 5.1 ESXi hypervisor
  • 2 virtual machines with Ubuntu Server guest OS
Power and Performance Monitoring
• Performance monitoring through virtualized performance counters
  • Performance counter multiplexing
  • 100 ms sampling period
  • Guest-OS-level monitoring
• Selected performance counters:
  • CPU cycles, retired instructions, L2-cache misses, L3-cache misses, bus utilization, stall cycles, branch mispredictions, floating-point instructions
• System-level power measurement using a Wattsup power meter logger
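As a rough illustration of how raw counter samples like those listed above are typically turned into per-interval metrics, the sketch below computes instructions per cycle (IPC) and L3 misses per thousand instructions from one sample. The `CounterSample` type, its field names, and the choice of derived metrics are assumptions made for this example; the slides do not specify how the counters are post-processed.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One sampling interval (e.g. 100 ms) of raw counter deltas. Hypothetical type."""
    cycles: int
    instructions: int
    l3_misses: int


def ipc(sample: CounterSample) -> float:
    """Retired instructions per CPU cycle; 0.0 if no cycles were counted."""
    return sample.instructions / sample.cycles if sample.cycles else 0.0


def l3_misses_per_kilo_instruction(sample: CounterSample) -> float:
    """L3-cache misses per 1000 retired instructions; 0.0 if nothing retired."""
    if not sample.instructions:
        return 0.0
    return 1000.0 * sample.l3_misses / sample.instructions
```

With counter multiplexing, each counter is only observed for part of an interval, so a real monitor would also need to scale the raw counts before deriving ratios; that step is omitted here.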
Performance Isolation on Virtual Systems
• Consolidating multiple workloads can degrade performance due to resource contention
• CPU binding and NUMA balancing can mitigate the performance variation
[Figure: placement of Thread-0 and Thread-1 on CPU and memory, without and with binding; configurations compared: Native, VM w/ NUMA balancing, VM w/o NUMA balancing]
CPU Resources vs. Performance
• Performance benefits from increasing CPU resources vary significantly across PARSEC benchmarks
User-defined Constraints
• Users can define and put constraints on the allocation decisions (e.g., minimum throughput, fairness)
• The resource allocation routine is designed as a closed-loop controller to satisfy the constraints
[Figure: closed-loop controller on ESXi 5.1 with blocks: User-defined Constraints, Compute weights, Check constraint, Monitor Application, Adjust CPU Limits]
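One iteration of such a closed loop could look roughly like the sketch below: compute weights, turn them into CPU limits, check a minimum-throughput constraint against monitored values, and adjust the limits if it is violated. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-VM `scores`, the proportional weight rule, the fixed `step_mhz` correction, and the choice of donor VM are all hypothetical, and no hypervisor API is called.

```python
def compute_weights(scores):
    """Normalize per-VM scores (hypothetical input) into weights that sum to 1."""
    total = sum(scores.values())
    return {vm: score / total for vm, score in scores.items()}


def control_step(scores, throughput, min_throughput, total_mhz, step_mhz):
    """One controller iteration; returns the new CPU limit (MHz) per VM.

    scores         -- per-VM score used to derive weights (assumption)
    throughput     -- monitored throughput per VM from the last interval
    min_throughput -- user-defined minimum throughput for constrained VMs
    """
    # Compute weights and translate them into CPU limits.
    weights = compute_weights(scores)
    limits = {vm: w * total_mhz for vm, w in weights.items()}

    # Check constraints; on a violation, move a fixed amount of CPU
    # from the VM with the largest limit to the violating VM.
    for vm, required in min_throughput.items():
        if throughput[vm] >= required:
            continue
        donors = [d for d in limits if d != vm]
        if not donors:
            continue  # nothing to take resources from
        donor = max(donors, key=limits.get)
        limits[donor] -= step_mhz
        limits[vm] += step_mhz
    return limits
```

In a running system this step would repeat every monitoring interval, with the returned limits applied to the VMs before the next throughput measurement.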
Runtime Behavior w/ Constraints
Successive allocation steps for blackscholes and dedup (the last step applies the minimum throughput constraint):

Step   Weight (w)                 Resource (MHz)
       blackscholes   dedup       blackscholes   dedup
1      0.50           0.50        11970          11970
2      0.63           0.37        14963          8977
3      0.62           0.38        15648          8292
4      0.66           0.34        15648-1995     8292+1995
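The resource figures above can be checked with a little arithmetic. Every pair of limits sums to 23940 MHz, which equals 12 cores at 1995 MHz each, so the final adjustment of 1995 MHz corresponds to moving one core's worth of CPU from blackscholes to dedup. The per-core frequency is inferred from these numbers and is not stated on the slides; the helper names below are hypothetical.

```python
CORES = 12
MHZ_PER_CORE = 1995  # inferred from the slide's numbers, not stated explicitly
TOTAL_MHZ = CORES * MHZ_PER_CORE  # 23940 MHz shared by the two VMs


def equal_split(n_vms):
    """Starting point: divide the total CPU capacity evenly across VMs."""
    return TOTAL_MHZ // n_vms


def shift(limits, donor, receiver, mhz=MHZ_PER_CORE):
    """Move `mhz` of CPU limit from one VM to another; the total is unchanged."""
    new_limits = dict(limits)
    new_limits[donor] -= mhz
    new_limits[receiver] += mhz
    return new_limits


# Limits before the minimum throughput constraint is enforced.
before = {"blackscholes": 15648, "dedup": 8292}
# Limits after shifting one core's worth of MHz to dedup.
after = shift(before, donor="blackscholes", receiver="dedup")
print(after)
```

The shift leaves blackscholes with 13653 MHz and dedup with 10287 MHz, still totalling 23940 MHz.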
Overall Results
• For 50 randomly generated workload sets, the proposed technique, combined with the MPC*Utilization application selection policy, improves energy efficiency by 17% on average.
Increasing Number of VMs
• Energy efficiency improvements are 13% lower with a higher number of VMs (i.e., 6 VMs) running multi-threaded applications
• Application set (12 apps): 2x blackscholes, 2x dedup, 2x vips, bodytrack, canneal, facesim, swaptions, streamcluster, x264
Conclusions
• Performance isolation in virtual environments limits the benefits of application-selection-based consolidation strategies
• We propose a runtime resource management technique that takes performance scalability into account
• Our proposed technique improves energy efficiency by 17% over state-of-the-art techniques