
Achieving Power-Efficiency in Clusters without Distributed File System Complexity


Presentation Transcript


  1. Achieving Power-Efficiency in Clusters without Distributed File System Complexity. Hrishikesh Amur, Karsten Schwan, Georgia Tech

  2. Green Computing Research Initiative at GT (http://img.all2all.net/main.php?g2_itemId=157)
  • Datacenter and beyond: design, IT management, HVAC control… (ME, SCS, OIT…)
  • Rack: mechanical design, thermal and airflow analysis, VPTokens, OS and management (ME, SCS); focus of our work
  • Power distribution and delivery (ECE)
  • Board: VirtualPower, scheduling/scaling/operating system… (SCS, ME, ECE)
  • Chip and Package: power multiplexing, spatiotemporal migration (SCS, ECE)
  • Circuit level: DVFS, power states, clock gating (ECE)

  3. Focus: Data-intensive applications that use distributed storage

  4. Per-system Power Breakdown

  5. Approach to Cluster Power-Efficiency • Power off entire nodes

  6–12. Turning Off Nodes Breaks Conventional DFS (sequence of figure-only slides)

  13. Modifications to Data Layout Policy • One replica of all data placed on a small set of nodes • Primary replica maintains availability, allowing nodes storing other replicas to be turned off [Sierra, Rabbit]
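
A minimal sketch (not taken from the slides) of the kind of layout policy described on this slide, in the spirit of Sierra and Rabbit: every block keeps one replica inside a small, always-on primary node set, so the nodes holding the remaining replicas can be powered down without losing availability. The node names and replication factor below are hypothetical.

```python
import random

# Hypothetical cluster: a small always-on "primary" set plus the remaining nodes.
PRIMARY_NODES = ["node01", "node02", "node03"]            # always kept powered on
SECONDARY_NODES = [f"node{i:02d}" for i in range(4, 33)]  # candidates for power-down
REPLICATION_FACTOR = 3

def place_block(block_id: int) -> list[str]:
    """Choose the nodes that hold the replicas of one block.

    The first replica always lands in the small primary set, so every block
    stays readable even when all secondary nodes are powered off; the other
    replicas are spread over the rest of the cluster as usual.
    """
    primary = PRIMARY_NODES[block_id % len(PRIMARY_NODES)]
    others = random.sample(SECONDARY_NODES, REPLICATION_FACTOR - 1)
    return [primary] + others

# Example: print where the first few blocks would be stored.
for block in range(4):
    print(block, place_block(block))
```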

  14. Handling New Data: Where is new data to be written when part of the cluster is turned off?

  15. New Data: Temporary Offloading

  16. New Data: Temporary Offloading • Temporarily off-loading writes to ‘on’ nodes is a solution • Cost of additional copying of large amounts of data • Use of extra network bandwidth • Increased complexity!!
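
To make the cost concrete, here is a hedged sketch of what temporary off-loading implies: writes aimed at powered-off nodes are redirected to ‘on’ nodes and must later be copied back when the original owners return, which is exactly where the extra copying, network traffic, and bookkeeping complexity come from. All names below are illustrative, not the actual DFS code.

```python
# Illustrative bookkeeping for temporary off-loading (all names are hypothetical).
offloaded = {}   # intended_node -> list of (block_id, temp_node) awaiting copy-back

def write_block(block_id, intended_node, on_nodes, node_is_on):
    """Write to the intended node if it is on; otherwise off-load to an 'on' node."""
    if node_is_on(intended_node):
        return intended_node
    temp_node = on_nodes[block_id % len(on_nodes)]      # any powered-on node will do
    offloaded.setdefault(intended_node, []).append((block_id, temp_node))
    return temp_node

def reintegrate(intended_node, copy_block):
    """Copy every off-loaded block back once its intended node powers on again.

    This second copy is the extra data movement and network usage the slide
    lists, and the `offloaded` map is the extra state the DFS has to manage.
    """
    for block_id, temp_node in offloaded.pop(intended_node, []):
        copy_block(block_id, src=temp_node, dst=intended_node)
```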

  17. Handling Primary Failures • Failure of a primary node causes a large number of nodes to be started up to restore availability • To solve this, additional groups holding secondary, tertiary, etc. copies have to be maintained • Again, increased complexity!!

  18. Making a DFS power-proportional increases its complexity significantly

  19. Our Solution: Provide fine-grained control over which components to turn off

  20. How do we save power? Switch between two extreme power modes: max_perf and io_server

  21. How does this keep the DFS simple? Fine-grained control allows all disks to be kept on, maintaining access to stored data

  22. Prototype Node Architecture (figure labels: Obelix node, Asterix node, SATA switch)

  23. Prototype Node Architecture (figure labels: VMM, Obelix node, Asterix node, SATA switch)

  24. max_perf Mode (figure labels: VM, Obelix node, Asterix node, SATA switch)

  25. io_server Mode (figure labels: Obelix node, VM, Asterix node, SATA switch)
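
A hedged sketch of how the two modes shown in the figures could be driven, assuming (from the naming and the summary slide) that Obelix is the high-power machine, Asterix is the low-power machine, and both reach the same disks through the shared SATA switch. The object and method names are assumptions for illustration, not the prototype's actual interfaces.

```python
# Hypothetical orchestration of the two power modes (APIs are assumptions).

def set_mode(mode, vm, obelix, asterix, sata_switch):
    """Switch a node pair between max_perf and io_server.

    In both modes the data disks stay reachable through the shared SATA
    switch, so stored data remains available and the DFS layout is untouched;
    only the machine serving it changes.
    """
    if mode == "max_perf":
        obelix.power_on()
        sata_switch.attach(obelix)      # disks now served by the high-power node
        vm.live_migrate(to=obelix)      # run the full workload there
        asterix.power_off()
    elif mode == "io_server":
        asterix.power_on()
        sata_switch.attach(asterix)     # disks now served by the low-power node
        vm.live_migrate(to=asterix)     # keep serving I/O at a fraction of the power
        obelix.power_off()
    else:
        raise ValueError(f"unknown mode: {mode}")
```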

  26–29. Increased Performance/Power (sequence of figure-only slides)

  30. Virtualization Overhead: Reads

  31. Virtualization Overhead: Writes

  32. Summary • Turning entire nodes off complicates the DFS • Better to be able to turn individual components off, or to build more power-proportional platforms/components • Our prototype uses separate machines and shared disks

  33. Load Management Policies
  • Static: e.g., DFS, DMS, monitoring/management tasks…
  • Dynamic: e.g., based on runtime monitoring and management/scheduling…
  • Helpful to do power metering on a per-process/VM basis
  • X86+Atom+IB…

  34. VM-level Power Metering: Our Approach
  • Built power profiles for various platform resources (CPU, memory, cache, I/O…)
  • Utilize low-level hardware counters to track resource utilization on a per-VM basis
    • xenoprofile, IPMI, Xen tools…
    • track sets of VMs separately
  • Maintain low/acceptable overhead while preserving the desired accuracy
    • limit the amount of necessary information and the number of monitored events: use only instructions retired/s and LLC misses/s
    • establish accuracy bounds
  • Apply the monitored information to a power model to determine VM power utilization at runtime
    • in contrast to static, purely profile-based approaches
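
The slides do not give the power model itself; the sketch below assumes one plausible form, a linear model whose coefficients would be fit offline from the per-resource power profiles and which is evaluated at runtime from the two monitored counter rates per VM. All coefficient values and names here are made up for illustration.

```python
# Hypothetical linear per-VM power model driven by the two monitored counters.
# The coefficients would come from the offline power profiles; the numbers
# below are placeholders, not measured values.
P_STATIC = 20.0        # watts of static/idle power attributed to a VM (illustrative)
W_INSTR = 4.0e-9       # watts per (instruction retired / s)          (illustrative)
W_LLC_MISS = 1.5e-6    # watts per (LLC miss / s)                     (illustrative)

def vm_power(instr_per_s: float, llc_misses_per_s: float) -> float:
    """Estimate one VM's power draw from its two counter rates."""
    return P_STATIC + W_INSTR * instr_per_s + W_LLC_MISS * llc_misses_per_s

def attribute_power(counters_by_vm: dict) -> dict:
    """Apply the model to the counter rates sampled for each VM."""
    return {vm: vm_power(i, m) for vm, (i, m) in counters_by_vm.items()}

# Example with made-up counter readings for two VMs:
print(attribute_power({"vm1": (2.0e9, 1.0e6), "vm2": (5.0e8, 4.0e6)}))
```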
