Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems
Can Hankendi, Ayse K. Coskun
Boston University, Electrical and Computer Engineering Department
This project has been partially funded by:
Energy Efficiency in Computing Clusters • Energy-related costs are among the biggest contributors to the total cost of ownership. • Consolidating multiple workloads on the same physical node improves energy efficiency. (Source: International Data Corporation (IDC), 2009)
Multi-threaded Applications in the Cloud • HPC applications are expected to shift towards cloud resources. • Resource allocation decisions significantly affect the energy efficiency of server nodes. • Energy efficiency is a function of application characteristics. Contribution: An adaptive resource allocation policy for multi-threaded applications on consolidated server nodes
Outline • Background • Methodology • Adaptive Resource Sharing • Results • Conclusions
Background • Cluster-level VM Management: consolidation policies across server nodes; VM migration techniques [Srikantaiah, HotPower’08] [Bonvin, CCGrid’11] • Node-level Management: co-scheduling based on thread communication; identifying the best thread mixes to co-schedule [Frachtenberg, TPDS’05] [McGregor, IPDPS’05] • Recent co-scheduling policies: co-scheduling contrasting workloads; balancing performance events (cache misses, IPC, bus accesses) across nodes [Dhiman, ISLPED’09] [Bhadauria, ICS’10]
Virtualized System Setup • 12-core AMD Magny Cours Server • 2x 6-core dies attached side by side in the same package • Private L1 and L2-caches for each core • 6 MB shared L3-cache for each 6-core die • Virtualized through VMware vSphere 5 ESXi hypervisor • 2 Virtual Machines (VM) with Ubuntu Server Guest OS
Methodology: Measurement Setup • System-level power measurements at a 1 s sampling rate • Performance counter collection through vmkperf at a 1 s sampling rate • Counters: CPU cycles, retired instructions, L3-cache misses • VM-level CPU and memory utilization data collection through esxtop at a 2 s sampling rate [Diagram: esxtop and vmkperf counter streams are combined with system-level power measurements by a logger]
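A minimal sketch of how the logger in the diagram above might be driven is shown below. It assumes it runs on the ESXi host (or is adapted to resxtop remotely); only esxtop's batch-mode flags (-b, -d, -n) are standard, the power-meter reader is a placeholder, and vmkperf counter setup is omitted.

```python
# Hedged sketch: run esxtop in batch mode for VM-level utilization and log
# 1 s power samples alongside it. read_power_sample() is a placeholder for the
# external power meter; vmkperf counter collection is configured separately.
import csv
import subprocess
import time

SAMPLES = 300  # number of 2 s esxtop samples (about 10 minutes)

def start_esxtop(outfile="esxtop.csv", delay=2, samples=SAMPLES):
    """Launch esxtop in batch mode (-b); it writes one CSV row every `delay` seconds."""
    return subprocess.Popen(["esxtop", "-b", "-d", str(delay), "-n", str(samples)],
                            stdout=open(outfile, "w"))

def read_power_sample():
    """Placeholder: query the system-level power meter (interface-specific)."""
    return 0.0

def main():
    esxtop_proc = start_esxtop()
    with open("power.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "watts"])
        for _ in range(2 * SAMPLES):  # power is sampled once per second
            writer.writerow([time.time(), read_power_sample()])
            time.sleep(1)
    esxtop_proc.wait()

if __name__ == "__main__":
    main()
```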
Parallel Workloads • PARSEC 2.1 benchmark suite [Bienia et al., 2008]
Tracking Parallel Phases • consolmgmt: consolidation management interface • Synchronizes the ROIs (regions of interest) of multiple workloads [Diagram: consolmgmt coordinates with parsecmgmt and hooks.c; the serial input/output phases of Benchmark A and Benchmark B are excluded via sleep(), and roi-Trigger(), start-Logging(), and end-Logging() calls align measurement with the parallel phases]
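The synchronization idea can be illustrated with a small conceptual sketch; this is not the actual consolmgmt code, and the benchmark command lines are placeholders. Each co-scheduled benchmark blocks at its ROI boundary until all of them are ready, so measurement covers only the overlapping parallel phases.

```python
# Conceptual sketch of ROI synchronization across co-scheduled benchmarks.
# In the real setup, hooks.c calls back into consolmgmt at the ROI boundary;
# here a multiprocessing.Barrier emulates that rendezvous.
import multiprocessing as mp
import subprocess
import time

def run_benchmark(cmd, roi_barrier):
    # The benchmark's serial input phase would run before this point.
    roi_barrier.wait()               # roi-Trigger(): wait until every workload reaches its ROI
    subprocess.run(cmd, check=True)  # parallel phase, executed under measurement
    # The benchmark's serial output phase would run after this point.

def main():
    # Placeholder command lines; replace with parsecmgmt invocations or binaries.
    benchmarks = [["./benchmark_a"], ["./benchmark_b"]]
    barrier = mp.Barrier(len(benchmarks) + 1)  # +1 for the logging process
    procs = [mp.Process(target=run_benchmark, args=(cmd, barrier)) for cmd in benchmarks]
    for p in procs:
        p.start()
    barrier.wait()                   # start-Logging(): all workloads are inside their ROI
    t0 = time.time()
    for p in procs:
        p.join()
    print(f"end-Logging(): overlapping parallel phases ran for {time.time() - t0:.1f} s")

if __name__ == "__main__":
    main()
```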
Performance Impact of Consolidation • Consolidating multiple workloads can degrade performance due to resource contention. • Virtualization provides performance isolation by managing memory and NUMA node affinities. • Under a native OS, performance variation across co-runners is 2.5x higher than in the virtualized setup. (Figure: average throughput of Streamcluster when co-scheduled with another PARSEC benchmark)
Outline • Background • Methodology • Adaptive Resource Sharing • Results • Conclusions
Impact of Application Selection • Previous co-scheduling policies focus on application selection to improve energy efficiency. • Application selection is based on balancing memory operations and CPU usage.
Predicting Power Efficiency • To improve energy efficiency, we need to allocate more CPU resources to power-efficient workloads. • The IPC*CPU-utilization metric shows a strong correlation with power efficiency.
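A small sketch of this proxy metric is shown below, assuming per-VM counter and utilization samples; the field names and the example values are illustrative, not vmkperf/esxtop output formats.

```python
# Sketch of the power-efficiency proxy used for classification: IPC scaled by
# CPU utilization, computed from per-VM counter and utilization samples.
from dataclasses import dataclass

@dataclass
class Sample:
    retired_instructions: int
    cpu_cycles: int
    cpu_utilization: float   # fraction of the allocated CPU actually used, in [0, 1]

def efficiency_metric(s: Sample) -> float:
    """IPC * CPU utilization: higher values indicate more efficient use of the
    cycles a VM is given and correlate with throughput-per-watt."""
    ipc = s.retired_instructions / max(s.cpu_cycles, 1)
    return ipc * s.cpu_utilization

# Example: a compute-bound VM vs. a memory-bound, stall-heavy VM (made-up numbers).
print(efficiency_metric(Sample(9_000_000, 6_000_000, 0.95)))   # ~1.43
print(efficiency_metric(Sample(2_000_000, 6_000_000, 0.60)))   # ~0.20
```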
Application Classification • The IPC*CPU-utilization metric is used to classify applications according to their power-efficiency levels. • We use a density-based clustering algorithm (DBSCAN) to determine application groups based on their power-efficiency classes.
[Diagram: benchmark-to-VM mapping for two consolidation configurations (Case 1 and Case 2) on the ESXi host]
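The clustering step can be sketched with scikit-learn's DBSCAN; the eps/min_samples parameters and the metric values below are illustrative, not the ones used in this work.

```python
# Hedged sketch of the classification step: cluster applications by their
# IPC * CPU-utilization scores with DBSCAN (scikit-learn).
import numpy as np
from sklearn.cluster import DBSCAN

apps = ["blackscholes", "swaptions", "canneal", "streamcluster", "vips"]
metric = np.array([[1.35], [1.28], [0.31], [0.31], [0.88]])  # assumed IPC * utilization values

# min_samples=1 means every point joins a cluster; eps sets how close two
# applications' scores must be to fall into the same power-efficiency class.
labels = DBSCAN(eps=0.15, min_samples=1).fit_predict(metric)

for app, label in zip(apps, labels):
    print(f"{app:>14s} -> efficiency class {label}")
# VMs hosting applications in a higher-efficiency class are then given a larger
# share of CPU resources (see the reconfiguration sketch below).
```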
Reconfiguring Resource Allocations • CPU hot-plugging: adding/removing vCPUs at runtime. Con: removing a vCPU is not supported by some guest OSes. • Resource allocation adjustment: allocating/limiting CPU resources for VMs. Pro: fine granularity (the allocation unit is MHz). • Both techniques have low overhead, less than 1%. (Figure: resource configuration comparison)
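As an illustration of the resource-allocation-adjustment path, the sketch below caps a VM's CPU limit through pyVmomi. The slides rely on vSphere/ESXi but do not prescribe this exact API; the host address, credentials, VM names, and MHz values are placeholders.

```python
# Hedged sketch: adjust a VM's CPU limit (in MHz) via the vSphere API (pyVmomi).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_cpu_limit(si, vm_name, limit_mhz):
    """Cap the CPU resources a VM may use, in MHz (-1 removes the limit)."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == vm_name)
    spec = vim.vm.ConfigSpec()
    spec.cpuAllocation = vim.ResourceAllocationInfo(limit=limit_mhz)
    vm.ReconfigVM_Task(spec=spec)   # asynchronous reconfiguration task

# Placeholder connection parameters and limits.
si = SmartConnect(host="esxi-host", user="root", pwd="***",
                  sslContext=ssl._create_unverified_context())
set_cpu_limit(si, "VM0", 9600)   # give the more power-efficient VM a higher cap
set_cpu_limit(si, "VM1", 4800)
Disconnect(si)
```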
Reconfiguration Runtime Behavior • Resource allocation limits can be dynamically adjusted according to application classes. • CPU allocation limits can be effectively reconfigured within a second.
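Putting the pieces together, a simplified control loop could look like the sketch below. It re-splits CPU limits proportionally to each VM's IPC*CPU-utilization score, which is a simplification of the class-based limit assignment described here, and it assumes the efficiency_metric and set_cpu_limit helpers from the earlier sketches plus a hypothetical get_sample(vm_name) callback.

```python
# Simplified adaptive loop (illustrative policy, not the exact one in the slides):
# every epoch, divide the CPU budget among VMs in proportion to their
# IPC * CPU-utilization scores, so more power-efficient VMs receive more CPU.
import time

TOTAL_MHZ = 20_000   # placeholder: total CPU capacity to divide among the VMs

def adaptive_loop(si, vm_names, get_sample, epoch_s=1.0):
    while True:
        scores = {name: efficiency_metric(get_sample(name)) for name in vm_names}
        total = sum(scores.values()) or 1.0
        for name, score in scores.items():
            set_cpu_limit(si, name, int(TOTAL_MHZ * score / total))
        time.sleep(epoch_s)   # limits take effect within about a second
```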
Results • Proposed approach improves throughput-per-watt by up to 25% and by 9% on average.
Results • We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications. Example sets: Set 1: 4x blackscholes, 2x vips, 1x bodytrack, 1x freqmine, 1x streamcluster, 1x swaptions. Set 2: 3x canneal, 3x ferret, 2x bodytrack, 1x dedup, 1x vips.
Results • We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications. • The proposed resource sharing technique improves throughput-per-watt by 12% on average in comparison to application-selection-based co-scheduling techniques.
Conclusions & Future Work • Consolidation is a powerful technique for improving energy efficiency in data centers. • The energy efficiency of parallel workloads varies significantly depending on application characteristics. • Adaptive VM configuration for parallel workloads improves energy efficiency by 12% on average over existing co-scheduling algorithms. • Future research directions include: • Investigating the effect of memory allocation decisions on energy efficiency; • Utilizing application-level instrumentation to explore power/energy optimization opportunities; • Expanding the application space.