Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems
Can Hankendi, Ayse K. Coskun
Boston University, Electrical and Computer Engineering Department
This project has been partially funded by:
Energy Efficiency in Computing Clusters • Energy-related costs are among the biggest contributors to the total cost of ownership. • Consolidating multiple workloads on the same physical node improves energy efficiency. (Source: International Data Corporation (IDC), 2009)
Multi-threaded Applications in the Cloud • HPC applications are expected to shift towards cloud resources. • Resource allocation decisions significantly affect the energy efficiency of server nodes. • Energy efficiency is a function of application characteristics. Contribution: An adaptive resource allocation policy for multi-threaded applications on consolidated server nodes
Outline • Background • Methodology • Adaptive Resource Sharing • Results • Conclusions
Background • Cluster-level VM Management: consolidation policies across server nodes; VM migration techniques [Srikantaiah, HotPower’08] [Bonvin, CCGrid’11] • Node-level Management: co-scheduling based on thread communication; identifying the best thread mixes to co-schedule [Frachtenberg, TPDS’05] [McGregor, IPDPS’05] • Recent co-scheduling policies: co-scheduling contrasting workloads; balancing performance events (cache misses, IPC, bus accesses) across nodes [Dhiman, ISLPED’09] [Bhadauria, ICS’10]
Virtualized System Setup • 12-core AMD Magny Cours Server • 2x 6-core dies attached side by side in the same package • Private L1 and L2-caches for each core • 6 MB shared L3-cache for each 6-core die • Virtualized through VMware vSphere 5 ESXi hypervisor • 2 Virtual Machines (VM) with Ubuntu Server Guest OS
Methodology: Measurement Setup • System-level power measurements at a 1 s sampling rate • Performance counter collection through vmkperf at a 1 s sampling rate • Counters: CPU cycles, retired instructions, L3-cache misses • VM-level CPU and memory utilization data collection through esxtop at a 2 s sampling rate [Diagram: esxtop and vmkperf counter streams are combined with system-level power measurements by a logger]
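A minimal sketch of how the logger in the diagram above might be driven is shown below. It assumes it runs on the ESXi host (or is adapted to resxtop remotely); only esxtop's batch-mode flags (-b, -d, -n) are standard, the power-meter reader is a placeholder, and vmkperf counter setup is omitted.

```python
# Hedged sketch: run esxtop in batch mode for VM-level utilization and log
# 1 s power samples alongside it. read_power_sample() is a placeholder for the
# external power meter; vmkperf counter collection is configured separately.
import csv
import subprocess
import time

SAMPLES = 300  # number of 2 s esxtop samples (about 10 minutes)

def start_esxtop(outfile="esxtop.csv", delay=2, samples=SAMPLES):
    """Launch esxtop in batch mode (-b); it writes one CSV row every `delay` seconds."""
    return subprocess.Popen(["esxtop", "-b", "-d", str(delay), "-n", str(samples)],
                            stdout=open(outfile, "w"))

def read_power_sample():
    """Placeholder: query the system-level power meter (interface-specific)."""
    return 0.0

def main():
    esxtop_proc = start_esxtop()
    with open("power.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "watts"])
        for _ in range(2 * SAMPLES):  # power is sampled once per second
            writer.writerow([time.time(), read_power_sample()])
            time.sleep(1)
    esxtop_proc.wait()

if __name__ == "__main__":
    main()
```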
Parallel Workloads • PARSEC 2.1 benchmark suite [Bienia et al., 2008]
Tracking Parallel Phases • consolmgmt: consolidation management interface • Synchronizes the ROIs (regions of interest) of multiple workloads [Diagram: consolmgmt coordinates with parsecmgmt and hooks.c; the serial input/output phases of Benchmark A and Benchmark B are excluded via sleep(), and roi-Trigger(), start-Logging(), and end-Logging() calls align measurement with the parallel phases]
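The synchronization idea can be illustrated with a small conceptual sketch; this is not the actual consolmgmt code, and the benchmark command lines are placeholders. Each co-scheduled benchmark blocks at its ROI boundary until all of them are ready, so measurement covers only the overlapping parallel phases.

```python
# Conceptual sketch of ROI synchronization across co-scheduled benchmarks.
# In the real setup, hooks.c calls back into consolmgmt at the ROI boundary;
# here a multiprocessing.Barrier emulates that rendezvous.
import multiprocessing as mp
import subprocess
import time

def run_benchmark(cmd, roi_barrier):
    # The benchmark's serial input phase would run before this point.
    roi_barrier.wait()               # roi-Trigger(): wait until every workload reaches its ROI
    subprocess.run(cmd, check=True)  # parallel phase, executed under measurement
    # The benchmark's serial output phase would run after this point.

def main():
    # Placeholder command lines; replace with parsecmgmt invocations or binaries.
    benchmarks = [["./benchmark_a"], ["./benchmark_b"]]
    barrier = mp.Barrier(len(benchmarks) + 1)  # +1 for the logging process
    procs = [mp.Process(target=run_benchmark, args=(cmd, barrier)) for cmd in benchmarks]
    for p in procs:
        p.start()
    barrier.wait()                   # start-Logging(): all workloads are inside their ROI
    t0 = time.time()
    for p in procs:
        p.join()
    print(f"end-Logging(): overlapping parallel phases ran for {time.time() - t0:.1f} s")

if __name__ == "__main__":
    main()
```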
Performance Impact of Consolidation • Consolidating multiple workloads can degrade performance due to resource contention. • Virtualization provides performance isolation by managing memory and NUMA node affinities. • Under a native OS, performance variation across co-runners is 2.5x higher than in the virtualized setup. (Figure: average throughput of Streamcluster when co-scheduled with another PARSEC benchmark)
Outline • Background • Methodology • Adaptive Resource Sharing • Results • Conclusions
Impact of Application Selection • Previous co-scheduling policies focus on application selection to improve energy efficiency. • Application selection is based on balancing memory operations and CPU usage.
Predicting Power Efficiency • To improve energy efficiency, we need to allocate more CPU resources to power-efficient workloads. • The IPC*CPU-utilization metric shows a strong correlation with power efficiency.
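A small sketch of this proxy metric is shown below, assuming per-VM counter and utilization samples; the field names and the example values are illustrative, not vmkperf/esxtop output formats.

```python
# Sketch of the power-efficiency proxy used for classification: IPC scaled by
# CPU utilization, computed from per-VM counter and utilization samples.
from dataclasses import dataclass

@dataclass
class Sample:
    retired_instructions: int
    cpu_cycles: int
    cpu_utilization: float   # fraction of the allocated CPU actually used, in [0, 1]

def efficiency_metric(s: Sample) -> float:
    """IPC * CPU utilization: higher values indicate more efficient use of the
    cycles a VM is given and correlate with throughput-per-watt."""
    ipc = s.retired_instructions / max(s.cpu_cycles, 1)
    return ipc * s.cpu_utilization

# Example: a compute-bound VM vs. a memory-bound, stall-heavy VM (made-up numbers).
print(efficiency_metric(Sample(9_000_000, 6_000_000, 0.95)))   # ~1.43
print(efficiency_metric(Sample(2_000_000, 6_000_000, 0.60)))   # ~0.20
```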
Application Classification • The IPC*CPU-utilization metric is used to classify applications according to their power-efficiency levels. • We use a density-based clustering algorithm (DBSCAN) to determine application groups based on their power-efficiency classes.
[Diagram: benchmark-to-VM mapping for two consolidation configurations (Case 1 and Case 2) on the ESXi host]
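The clustering step can be sketched with scikit-learn's DBSCAN; the eps/min_samples parameters and the metric values below are illustrative, not the ones used in this work.

```python
# Hedged sketch of the classification step: cluster applications by their
# IPC * CPU-utilization scores with DBSCAN (scikit-learn).
import numpy as np
from sklearn.cluster import DBSCAN

apps = ["blackscholes", "swaptions", "canneal", "streamcluster", "vips"]
metric = np.array([[1.35], [1.28], [0.31], [0.31], [0.88]])  # assumed IPC * utilization values

# min_samples=1 means every point joins a cluster; eps sets how close two
# applications' scores must be to fall into the same power-efficiency class.
labels = DBSCAN(eps=0.15, min_samples=1).fit_predict(metric)

for app, label in zip(apps, labels):
    print(f"{app:>14s} -> efficiency class {label}")
# VMs hosting applications in a higher-efficiency class are then given a larger
# share of CPU resources (see the reconfiguration sketch below).
```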
Reconfiguring Resource Allocations • CPU hot-plugging: adding/removing vCPUs at runtime. Con: removing a vCPU is not supported by some guest OSes. • Resource allocation adjustment: allocating/limiting CPU resources for VMs. Pro: fine granularity (the allocation unit is MHz). • Both techniques have low overhead, less than 1%. (Figure: resource configuration comparison)
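As an illustration of the resource-allocation-adjustment path, the sketch below caps a VM's CPU limit through pyVmomi. The slides rely on vSphere/ESXi but do not prescribe this exact API; the host address, credentials, VM names, and MHz values are placeholders.

```python
# Hedged sketch: adjust a VM's CPU limit (in MHz) via the vSphere API (pyVmomi).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def set_cpu_limit(si, vm_name, limit_mhz):
    """Cap the CPU resources a VM may use, in MHz (-1 removes the limit)."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == vm_name)
    spec = vim.vm.ConfigSpec()
    spec.cpuAllocation = vim.ResourceAllocationInfo(limit=limit_mhz)
    vm.ReconfigVM_Task(spec=spec)   # asynchronous reconfiguration task

# Placeholder connection parameters and limits.
si = SmartConnect(host="esxi-host", user="root", pwd="***",
                  sslContext=ssl._create_unverified_context())
set_cpu_limit(si, "VM0", 9600)   # give the more power-efficient VM a higher cap
set_cpu_limit(si, "VM1", 4800)
Disconnect(si)
```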
Reconfiguration Runtime Behavior • Resource allocation limits can be dynamically adjusted according to application classes. • CPU allocation limits can be effectively reconfigured within a second.
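Putting the pieces together, a simplified control loop could look like the sketch below. It re-splits CPU limits proportionally to each VM's IPC*CPU-utilization score, which is a simplification of the class-based limit assignment described here, and it assumes the efficiency_metric and set_cpu_limit helpers from the earlier sketches plus a hypothetical get_sample(vm_name) callback.

```python
# Simplified adaptive loop (illustrative policy, not the exact one in the slides):
# every epoch, divide the CPU budget among VMs in proportion to their
# IPC * CPU-utilization scores, so more power-efficient VMs receive more CPU.
import time

TOTAL_MHZ = 20_000   # placeholder: total CPU capacity to divide among the VMs

def adaptive_loop(si, vm_names, get_sample, epoch_s=1.0):
    while True:
        scores = {name: efficiency_metric(get_sample(name)) for name in vm_names}
        total = sum(scores.values()) or 1.0
        for name, score in scores.items():
            set_cpu_limit(si, name, int(TOTAL_MHZ * score / total))
        time.sleep(epoch_s)   # limits take effect within about a second
```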
Results • Proposed approach improves throughput-per-watt by up to 25% and by 9% on average.
Results • We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications. Example sets: Set 1: 4x blackscholes, 2x vips, 1x bodytrack, 1x freqmine, 1x streamcluster, 1x swaptions. Set 2: 3x canneal, 3x ferret, 2x bodytrack, 1x dedup, 1x vips.
Results • We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications. • The proposed resource sharing technique improves throughput-per-watt by 12% on average in comparison to application-selection-based co-scheduling techniques.
Conclusions & Future Work • Consolidation is a powerful technique for improving energy efficiency in data centers. • The energy efficiency of parallel workloads varies significantly depending on application characteristics. • Adaptive VM configuration for parallel workloads improves energy efficiency by 12% on average over existing co-scheduling algorithms. • Future research directions include: • Investigating the effect of memory allocation decisions on energy efficiency; • Utilizing application-level instrumentation to explore power/energy optimization opportunities; • Expanding the application space.