Balancing Performance and Power Consumption in Data-Intensive Computing Clusters

Project Proposal by: Shan Li, Josh Sorchik, Juzi Zhao

Problem Statement:
• How can we balance performance and power consumption in a data-intensive computing cluster by adjusting the set of active nodes according to the type of job being executed?

Background:
• Data-intensive computing is constrained by power consumption
• Low-power hardware is commercially available
• APIs allow software to exploit the low-power features of modern processors
• Prior research has focused on dynamically adapting the number of available nodes using "bidding", machine learning, and other methods

Methodology:
• Place unneeded nodes in a low-power state
• Use benchmarks to determine:
  • Which types of operations draw the most power (CPU, I/O, network)?
  • How does power consumption vary when nodes are added or removed?
  • How does execution time vary when nodes are added or removed?
  • How do power consumption and execution time vary with the input data size?
• Based on the test results, create scheduling policies that optimize energy consumption for each job type
• Determine the trade-off between power usage and acceptable performance
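The last two methodology steps can be sketched as a simple selection rule. This is a minimal Python sketch, not the project's policy: it assumes each configuration has already been benchmarked and recorded as (active nodes, execution time, energy), and it stands in for "acceptable performance" with a hypothetical `max_slowdown` bound relative to the fastest measured run. The names `Measurement` and `choose_active_nodes` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One benchmark run: active node count, execution time (s), energy (J)."""
    active_nodes: int
    exec_time_s: float
    energy_j: float


def choose_active_nodes(runs: list[Measurement], max_slowdown: float) -> int:
    """Pick the node count with the lowest energy whose execution time is
    within (1 + max_slowdown) of the fastest measured run.

    max_slowdown is a hypothetical tuning knob (e.g. 0.2 accepts runs up to
    20% slower than the fastest one).
    """
    if not runs:
        raise ValueError("need at least one measurement")
    if max_slowdown < 0:
        raise ValueError("max_slowdown must be non-negative")
    fastest = min(r.exec_time_s for r in runs)
    limit = fastest * (1.0 + max_slowdown)
    acceptable = [r for r in runs if r.exec_time_s <= limit]
    # Lowest energy wins; ties are broken in favour of fewer active nodes.
    best = min(acceptable, key=lambda r: (r.energy_j, r.active_nodes))
    return best.active_nodes
```

Raising `max_slowdown` trades performance for energy; a bound of 0 always selects the fastest configuration. A per-job-type policy would simply keep one set of measurements, and possibly one bound, per job type.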

Benchmarks
• Vary the number of active nodes and the data set size while running the following Hadoop benchmarks:
  • RandomWriter
  • Grep
  • Sort
• Analyze workload characteristics (e.g., data size, write/read ratio, execution time) and the relationship between these characteristics and power consumption

Model
• N = total number of nodes
• Na = number of active nodes
• Ni = number of inactive nodes (Na + Ni = N)
• E(N) = total energy consumption
• Ea = power consumption of one active node
• Ei = power consumption of one inactive node
• D(N) = execution time as a function of N
• T = time between jobs
• E(N) = Ea*D(N) + Ei*(T - D(N))
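The model's energy equation can be evaluated directly from measured execution times. This Python sketch implements E(N) = Ea*D(N) + Ei*(T - D(N)) exactly as written; it does not scale by Na or Ni, because the formula does not. The function names, and the choice of watts and seconds (giving joules), are assumptions.

```python
def model_energy(ea_w: float, ei_w: float, d_s: float, t_s: float) -> float:
    """E(N) = Ea*D(N) + Ei*(T - D(N)).

    ea_w: power of an active node (W); ei_w: power of an inactive node (W);
    d_s: execution time D(N) (s); t_s: time between jobs T (s).
    Returns energy in joules. Requires 0 <= D(N) <= T.
    """
    if not 0 <= d_s <= t_s:
        raise ValueError("execution time must lie within the period T")
    # Active for D(N) seconds, then inactive for the rest of the period.
    return ea_w * d_s + ei_w * (t_s - d_s)


def energy_by_node_count(
    exec_times_s: dict[int, float], ea_w: float, ei_w: float, t_s: float
) -> dict[int, float]:
    """Apply the model to measured D(N) values, keyed by node count N."""
    return {n: model_energy(ea_w, ei_w, d, t_s) for n, d in exec_times_s.items()}
```

With made-up values, `model_energy(30, 5, 60, 100)` returns 2000 (30 W for 60 s plus 5 W for the remaining 40 s). Rearranging the formula as E(N) = (Ea - Ei)*D(N) + Ei*T shows that, whenever Ea > Ei, it is minimized by the configuration with the shortest D(N).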

Evaluating Results:
• Apply the derived scheduling policies to new jobs (CPU-bound, I/O-bound, etc.) to determine whether the policies affect the performance/power ratio
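To make the "performance/power ratio" concrete, one common choice (assumed here; the proposal does not define it) is performance per watt, taking performance as throughput (1/execution time) and power as average power (energy/execution time). The ratio then reduces to jobs per joule, i.e. 1/energy. The helper names are hypothetical.

```python
def performance_per_watt(exec_time_s: float, energy_j: float) -> float:
    """Throughput (jobs/s) divided by average power (W), i.e. jobs per joule."""
    if exec_time_s <= 0 or energy_j <= 0:
        raise ValueError("execution time and energy must be positive")
    throughput = 1.0 / exec_time_s          # jobs per second
    avg_power_w = energy_j / exec_time_s    # joules per second = watts
    return throughput / avg_power_w


def policy_improvement(
    baseline: tuple[float, float], policy: tuple[float, float]
) -> float:
    """Relative change in performance per watt of a policy versus a baseline.

    Each argument is (exec_time_s, energy_j) for the same job; a positive
    result means the policy improved the ratio.
    """
    base = performance_per_watt(*baseline)
    new = performance_per_watt(*policy)
    return (new - base) / base
```

The baseline could be the same job run with all nodes active; computing the improvement separately for CPU-bound and I/O-bound jobs would show which job types benefit from a given policy.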

Cluster specifications
• Intel Atom 330 dual-core processors
• Zotac motherboards
• Intel X25-M solid-state drives
• Dell PowerConnect 2848 switch

Milestones
• Configure a cluster that uses low-power hardware (CPU, motherboard, solid-state drive) and runs the Hadoop framework
• Execute Hadoop benchmarks, collecting execution time and power consumption, to analyze how the number of active compute nodes and Hadoop workload characteristics affect performance and power
• Develop scheduling policies, based on the experimental results, that depend on the type of job being executed
• Evaluate the scheduling policies on the various Hadoop benchmarks
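Collecting execution time and power consumption typically means turning a stream of power readings into energy. The sketch below assumes a meter that reports (timestamp in seconds, power in watts) samples for the duration of a job (the sample format is an assumption, since the proposal does not name a meter) and integrates them with the trapezoidal rule. `summarize_run` is a hypothetical name.

```python
def summarize_run(samples: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (execution_time_s, energy_j) from (timestamp_s, power_w) samples.

    Energy is integrated with the trapezoidal rule; accuracy depends on the
    sampling interval of the meter.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between consecutive readings.
        energy_j += 0.5 * (p0 + p1) * (t1 - t0)
    execution_time_s = samples[-1][0] - samples[0][0]
    return execution_time_s, energy_j
```

Running it once per node and summing the energies gives a cluster-wide figure for each benchmark run.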
