Achieving Power-Efficiency in Clusters without Distributed File System Complexity
Hrishikesh Amur, Karsten Schwan (Georgia Tech)
Green Computing Research Initiative at GT
• Datacenter and beyond: design, IT management, HVAC control… (ME, SCS, OIT…)
• Rack: mechanical design, thermal and airflow analysis, VPTokens, OS and management (ME, SCS); focus of our work
• Power distribution and delivery (ECE)
• Board: VirtualPower, scheduling/scaling/operating system… (SCS, ME, ECE)
• Chip and Package: power multiplexing, spatiotemporal migration (SCS, ECE)
• Circuit level: DVFS, power states, clock gating (ECE)
Focus: data-intensive applications that use distributed storage
Approach to Power-Efficiency of a Cluster • Power off entire nodes
Modifications to Data Layout Policy • One replica of all data is placed on a small set of nodes • This primary replica maintains availability, allowing the nodes storing the other replicas to be turned off [Sierra, Rabbit]
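To make the layout policy concrete, here is a minimal sketch in the spirit of Sierra/Rabbit, not their actual algorithms; the node names, hash-based placement, and replica count of three are illustrative assumptions.

```python
import hashlib

class PowerAwareLayout:
    """Place the first replica of every block on a small 'primary set'
    of always-on nodes; spread the remaining replicas over the rest of
    the cluster, which may be powered down without losing availability."""

    def __init__(self, primary_nodes, other_nodes, replicas=3):
        self.primary_nodes = primary_nodes  # small always-on subset
        self.other_nodes = other_nodes      # candidates for power-down
        self.replicas = replicas

    def place(self, block_id):
        h = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
        # Primary replica always lands in the always-on set.
        targets = [self.primary_nodes[h % len(self.primary_nodes)]]
        # Remaining replicas go to nodes that are allowed to turn off.
        for i in range(1, self.replicas):
            targets.append(self.other_nodes[(h + i) % len(self.other_nodes)])
        return targets

layout = PowerAwareLayout(["p0", "p1"], ["s0", "s1", "s2", "s3", "s4", "s5"])
print(layout.place("block-42"))  # primary copy on p0 or p1, rest on s-nodes
```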
Handling New Data Where should new data be written when part of the cluster is turned off?
New Data: Temporary Offloading • Temporarily off-loading writes to 'on' nodes is one solution • But it incurs the cost of additionally copying large amounts of data • It consumes network bandwidth • And it increases complexity!
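A minimal sketch of what such an offloading write path could look like; the storage dictionaries, helper names, and reintegration step are hypothetical, chosen only to show where the extra copies and bookkeeping come from.

```python
def write_block(block_id, data, targets, is_on, storage, offload_log):
    """Write a replica set whose target nodes may be powered off.
    Replicas destined for 'off' nodes are staged on a powered-on node
    and logged for later reintegration -- the extra copying, network
    traffic, and bookkeeping that make offloading costly."""
    for node in targets:
        if is_on(node):
            storage[node][block_id] = data
        else:
            # Stage on any powered-on node that is not already a target.
            proxy = next(n for n in storage if is_on(n) and n not in targets)
            storage[proxy][block_id] = data            # extra copy
            offload_log.append((block_id, proxy, node))

def reintegrate(is_on, storage, offload_log):
    """When off nodes come back, move staged replicas to their real home."""
    for entry in list(offload_log):
        block_id, proxy, node = entry
        if is_on(node):
            storage[node][block_id] = storage[proxy].pop(block_id)  # copy again
            offload_log.remove(entry)

storage = {"p0": {}, "s0": {}, "s1": {}}
log = []
write_block("b1", b"data", ["p0", "s1"], lambda n: n != "s1", storage, log)
reintegrate(lambda n: True, storage, log)  # s1 is powered back on
print(storage)  # {'p0': {'b1': b'data'}, 's0': {}, 's1': {'b1': b'data'}}
```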
Handling Primary Failures • Failure of a primary node causes a large number of nodes to be started up to restore availability • To solve this, additional groups with secondary, tertiary, etc. copies have to be maintained • Again, increased complexity!
Making a DFS power-proportional increases its complexity significantly
Our Solution Provide fine-grained control over which components to turn off
How do we save power? Switch between two extreme power modes: max_perf and io_server
How does this keep the DFS simple? Fine-grained control allows all disks to be kept on, maintaining access to stored data
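A minimal sketch of the two-mode policy, assuming a hypothetical per-node controller with per-component power controls; the component names and the utilization threshold are illustrative, not from the prototype.

```python
# Hypothetical per-node controller: in max_perf mode everything is on;
# in io_server mode only what is needed to serve stored data (disks,
# NIC) stays on. Disks are on in BOTH modes, so the DFS keeps serving
# every replica and its data layout never has to change.
MODES = {
    "max_perf":  {"big_cores": True,  "full_dram": True,  "disks": True, "nic": True},
    "io_server": {"big_cores": False, "full_dram": False, "disks": True, "nic": True},
}

def select_mode(cpu_util, threshold=0.2):
    """Drop to io_server when the node is doing little compute (the
    threshold is illustrative); stay in max_perf otherwise."""
    return "max_perf" if cpu_util >= threshold else "io_server"

def apply_mode(node_state, mode):
    node_state.update(MODES[mode])
    return node_state

print(apply_mode({}, select_mode(cpu_util=0.05)))
# {'big_cores': False, 'full_dram': False, 'disks': True, 'nic': True}
```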
Prototype Node Architecture [diagram: an Obelix node and an Asterix node, running a VMM, share disks through a SATA switch]
max_perf Mode [diagram: the VM runs on the Obelix node, which reaches the shared disks through the SATA switch]
io_server Mode [diagram: the VM runs on the Asterix node, which takes over the shared disks through the SATA switch]
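Under this architecture, a mode switch amounts to rerouting the shared disks and moving the VM between the two nodes. The sketch below assumes hypothetical control handles for the SATA switch and the VMM; the prototype's actual interfaces are not specified in the slides.

```python
# Hypothetical controls for the prototype pair: a SATA switch that can
# route the shared disks to either node, and a VMM that can migrate the VM.
class PrototypePair:
    def __init__(self):
        self.disk_owner = "obelix"   # node the SATA switch routes disks to
        self.vm_host = "obelix"      # node currently running the VM
        self.obelix_on = True

    def to_io_server(self):
        """Power-save mode: move the VM and the disks to the low-power
        Asterix node, then power the Obelix node down. All stored data
        stays reachable the whole time."""
        self.vm_host = "asterix"     # migrate the VM
        self.disk_owner = "asterix"  # flip the SATA switch
        self.obelix_on = False

    def to_max_perf(self):
        """Performance mode: bring Obelix back and move everything over."""
        self.obelix_on = True
        self.disk_owner = "obelix"
        self.vm_host = "obelix"

pair = PrototypePair()
pair.to_io_server()
print(pair.vm_host, pair.disk_owner, pair.obelix_on)  # asterix asterix False
```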
Summary • Turning entire nodes off complicates the DFS • Good to be able to turn components off, or to achieve more power-proportional platforms/components • Our prototype uses separate machines and shared disks
Load Management Policies
• Static
  • e.g., DFS, DMS, monitoring/management tasks…
• Dynamic
  • e.g., based on runtime monitoring and management/scheduling…
  • helpful to do power metering on a per-process/per-VM basis
  • x86 + Atom + IB…
VM-level Power Metering: Our Approach
• Build power profiles for the various platform resources
  • CPU, memory, cache, I/O…
• Use low-level hardware counters to track resource utilization on a per-VM basis
  • xenoprofile, IPMI, Xen tools…
  • track sets of VMs separately
• Maintain low/acceptable overheads while preserving the desired accuracy
  • limit the amount of information and the number of monitored events: use instructions retired/s and LLC misses/s only
  • establish accuracy bounds
• Apply the monitored information to a power model to determine VM power utilization at runtime (see the sketch below)
  • in contrast to static, purely profile-based approaches
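To illustrate the last step, a minimal sketch of a counter-based runtime power model; the linear form and all coefficients are assumptions standing in for the offline power profiles.

```python
# Linear power model fitted offline from the platform power profiles.
# Only the two events named on the slide are monitored per VM:
# instructions retired/s (compute activity) and LLC misses/s
# (memory activity). All coefficients below are illustrative.
P_IDLE = 60.0        # W, platform idle power (assumed)
C_INSTR = 4.0e-9     # W per (instruction retired / s) (assumed)
C_LLC = 2.0e-6       # W per (LLC miss / s) (assumed)

def vm_power(instr_per_s, llc_miss_per_s, idle_share):
    """Estimate a VM's power draw at runtime from its two counters,
    plus its share of the platform's idle power."""
    return (idle_share * P_IDLE
            + C_INSTR * instr_per_s
            + C_LLC * llc_miss_per_s)

# e.g. a VM retiring 2e9 instr/s with 5e5 LLC misses/s on a 4-VM host:
print(round(vm_power(2e9, 5e5, idle_share=0.25), 1))  # 24.0 W
```

Keeping the event set this small is what keeps the runtime overhead low while the model, unlike a static profile, tracks each VM's actual behavior.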