Challenges in Distributed Energy Adaptive Computing

Presentation Transcript


  1. Challenges in Distributed Energy Adaptive Computing K. Kant NSF and GMU K. Kant, Modeling Challenges in Distributed Energy Adaptive Computing

  2. Information & Communication Technology (ICT) has a problem • Performance centric → Energy & sustainability centric • How do we get there?

  3. ICT Power Growth until 2020 • Increase in spite of power-efficient designs • Clients: 8X in number, 3X in power • Data centers: >2X increase • Network: 3X increase [Chart labels: Network; Clients; Transmission, conversion & distribution; Data Center]

  4. Current State: Unsustainable Computing

  5. Data Center Infrastructure • Resource intensive: Water, cabling, metal, … • ~50% power wasted before getting to racks

  6. Distribution Infrastructure • ~10% distribution loss + high carbon impact [Diagram labels: IT load; 2.5 MW generator, ~180 gallons/hour; voltage levels 115 kV, 13.2 kV, UPS: 480 V, 208 V; ~1% loss in switchgear and conductors; stage losses of 6% (94% efficient), 1.0% (99.0% efficient), 0.3% (99.7% efficient), 0.5% (99.5% efficient)]
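
The per-stage figures on slide 6 compound multiplicatively rather than add. The following is a minimal sketch of that arithmetic, not something taken from the talk: the stage names are placeholders, because the diagram that tied each efficiency to a specific piece of equipment did not survive in the transcript.

```python
from math import prod

# Per-stage efficiencies as listed on the slide. The keys are placeholder
# names (an assumption); the transcript does not say which equipment each
# figure belongs to.
stage_efficiency = {
    "stage_a": 0.94,
    "stage_b": 0.99,
    "stage_c": 0.997,
    "stage_d": 0.995,
}


def end_to_end_efficiency(efficiencies):
    """Multiply stage efficiencies: losses in series compound."""
    return prod(efficiencies)


overall = end_to_end_efficiency(stage_efficiency.values())
loss_percent = (1.0 - overall) * 100.0
print(f"overall efficiency: {overall:.4f}, loss: {loss_percent:.1f}%")
```

Under these assumptions the four listed stages alone lose a little under 8%; adding the ~1% switchgear and conductor loss brings the total close to the ~10% quoted on the slide.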

  7. ~50% Rack Power Wasted

  8. Sustainable Computing

  9. Renewable Energy Push • Limit energy draw from grid • Less infrastructure • Lower losses • But variable supply • Need better power adaptability

  10. High-Temperature DCs • Chiller-less operation • Less energy/materials, but space inefficient • High-temperature operation • Smaller T_outlet – T_inlet • More throttling • More failure prone (?) • Need smarter thermal adaptability

  11. Overdesign • Overdesign is the norm today • Huge power supplies, fans, heat sinks, server cases, high rack capacity, UPS capacity, … • Engineered for worst case → rarely encountered • Huge power wastage, waste of materials, energy, … • What if we right-size everything? • Highly energy efficient, but needs smarter control • Need better energy adaptability to deal with frugal design

  12. Energy Adaptive Computing • EAC strives for dynamic end-to-end adjustment through: • Workload adaptation for graceful QoS degradation under energy limitations • Infrastructure adaptation to cope with temporary energy deficiencies • Requires coordinated power/thermal management of computation, network & storage • Enhances sustainability of IT infrastructure

  13. EAC Instances

  14. Client-server EAC • Transparently adapt to client energy states • State = {on-AC, normal, low-battery, …} • Service contract Ci = {setup QoS, operational QoS} • Adaptation Challenges • Communicating & enforcing contracts • Group adaptation of clients forced by network/servers?
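
One way to picture the state set and the per-state service contract from slide 14 is as a small lookup table. This is only an illustrative sketch: the `EnergyState` and `ServiceContract` names, the `contract_for` helper, and every QoS value in the table are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class EnergyState(Enum):
    """Client energy states named on the slide (the slide's list is open-ended)."""
    ON_AC = "on-AC"
    NORMAL = "normal"
    LOW_BATTERY = "low-battery"


@dataclass(frozen=True)
class ServiceContract:
    """Contract Ci = {setup QoS, operational QoS}; field contents are assumed."""
    setup_qos: str
    operational_qos: str


# Hypothetical contract table: the QoS levels are invented for illustration.
CONTRACTS = {
    EnergyState.ON_AC: ServiceContract("full negotiation", "high-resolution video"),
    EnergyState.NORMAL: ServiceContract("full negotiation", "standard-resolution video"),
    EnergyState.LOW_BATTERY: ServiceContract("cached session", "audio only"),
}


def contract_for(state: EnergyState) -> ServiceContract:
    """Return the contract the server would apply for a reported client state."""
    return CONTRACTS[state]


print(contract_for(EnergyState.LOW_BATTERY))
```

A table like this covers only the lookup; the slide's open challenges (communicating and enforcing contracts, and group adaptation) are not addressed by it.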

  15. Cluster EAC • Adaptation to intra & inter-DC limits • Multi-level: Server, rack & DC levels • Adaptation Challenges • Estimate & collect power deficits/surplus at multiple levels • Coordination across large range of devices • Location based services • Coordination across levels • Simultaneously handle client-server loop

  16. P2P EAC • Adaptation based on “available energy” • Content: video resolution, audio coding, … • Network: modulate wireless radio usage (?) • Energy proportional use of peer resources • Energy driven content replication & reorganization • Adaptation Challenges • Satisfying QoS? • Balancing src/dest usage vs. relay node energy usage?

  17. Challenges: Some Specific Issues

  18. Power Estimation Challenges • Notion of effective power? • Additive relationship: Workload → power • Why is this hard? Interference • Available power • Determined by power, thermal & perhaps other issues (noise) • Required at multiple levels: facility, enclosure, machine, …

  19. Network Role in EAC • Energy Adaptation • Aggressive control of switch/router ports • Speed, state & width controls • Traffic consolidation across paths • Adaptation induced congestion • Propagation (e.g., ECN, EBCN) & response • Computation – communication tradeoff? • Redirection? • Network protocol support for adaptation?

  20. Other Issues • EAC Security • Attacks on power sources • Energy Attacks on IT, e.g., • Demanding too much, cyclic demands, … • Storage adaptation • Storage devices, controllers & network. • Coordinated end to end control is hard! • Formal models to understand impact of energy adaptation.

  21. Energy Adaptation in Data Centers

  22. Adaptation Methods • Workload Adaptation • Coarse grain: Shut down low priority tasks • Fine grain: Graceful QoS degradation, e.g., • Batched service, poorer resolution, … • Infrastructure Adaptation • Operation at lower speeds (DVFS) • Effective use of low power modes & “width” control. • Workload adaptation always done first
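
The ordering on slide 22 (coarse-grain workload adaptation, then fine-grain QoS degradation, then infrastructure adaptation such as DVFS) can be sketched as a simple planner. Everything concrete here is an assumption made for illustration: the `Task` fields, the priority cutoff, the single lumped DVFS saving, and the `plan_adaptation` helper itself are hypothetical and are not the scheme evaluated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int            # higher means more important (assumed convention)
    power_w: float           # power currently attributed to the task
    degraded_power_w: float  # power after graceful QoS degradation


def plan_adaptation(tasks, deficit_w, low_priority_cutoff=0, dvfs_saving_w=0.0):
    """Return (actions, uncovered deficit), applying workload adaptation first."""
    actions = []
    remaining = deficit_w
    by_priority = sorted(tasks, key=lambda t: t.priority)

    # Coarse grain: shut down low-priority tasks, least important first.
    for task in by_priority:
        if remaining <= 0:
            break
        if task.priority <= low_priority_cutoff:
            actions.append(("shutdown", task.name))
            remaining -= task.power_w

    # Fine grain: degrade QoS of the tasks that stay up.
    for task in by_priority:
        if remaining <= 0:
            break
        if task.priority > low_priority_cutoff:
            actions.append(("degrade_qos", task.name))
            remaining -= task.power_w - task.degraded_power_w

    # Infrastructure adaptation: lower operating speed only if still short.
    if remaining > 0 and dvfs_saving_w > 0:
        actions.append(("dvfs_lower_speed", "server"))
        remaining -= dvfs_saving_w

    return actions, max(remaining, 0.0)


tasks = [
    Task("batch-report", 0, 40.0, 40.0),
    Task("video", 2, 60.0, 35.0),
    Task("search", 3, 80.0, 70.0),
]
print(plan_adaptation(tasks, deficit_w=70.0))
```

In this example a 70 W deficit is covered by workload adaptation alone, so the DVFS step is never reached; a larger deficit would fall through to it.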

  23. Infrastructure Adaptation • Need a multilevel scheme • From individual “assets” up to the entire data center • Need both supply & demand side adaptations

  24. Supply Side Adaptation • Supply side Limits • Hard caps at higher levels (true limit) vs. “soft” (artificial) caps at lower levels. • Limits may be a result of thermal/cooling issues. • Load consolidation • An essential part of energy efficient operation • Load consolidation vs. soft capping • Need to address workload adaptation changes as a result of supply increase & decrease.

  25. Demand Side Adaptation • Adaptation to fluctuating demand • Transactional workload: Migrate queries or app VMs? • Issues w/ combined supply & demand side adaptations • Imbalance: One node squeezed while other has surplus power • Ping-pong Control: Oscillatory migration of workload • Error accumulation down the hierarchy.

  26. A Proposed Algorithm • Unidirectional control • Load migration moves up the hierarchy, from local to global. • Local migrations are temporary & do not trigger changes to “soft” caps on supply. • Target Node selection • Based on bin packing (best-fit decreasing) • Allows for more imbalance, which can be exploited for workload consolidation • Properties • Avoids ping-pong, attempts to minimize imbalance
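
Slide 26 names best-fit decreasing bin packing as the basis for target node selection. The sketch below shows only that textbook heuristic applied to power surplus; the function name, the dictionary-based inputs, and the idea of returning unplaced workloads for escalation to the next level are assumptions, and the talk's full algorithm (unidirectional control, soft caps) is not reproduced.

```python
def best_fit_decreasing(demands_w, surplus_w):
    """Place workloads on nodes using best-fit decreasing.

    demands_w: workload name -> power needed (W)
    surplus_w: node name -> spare power available (W)
    Returns (placement, unplaced, remaining surplus per node).
    """
    remaining = dict(surplus_w)
    placement = {}
    unplaced = []

    # Decreasing: consider the largest demands first.
    ordered = sorted(demands_w.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in ordered:
        # Best fit: among nodes that can host the demand, pick the one
        # that would be left with the least spare power.
        candidates = [
            (spare - demand, node)
            for node, spare in remaining.items()
            if spare >= demand
        ]
        if not candidates:
            # Assumption: such workloads would be escalated up the hierarchy.
            unplaced.append(name)
            continue
        _, target = min(candidates)
        placement[name] = target
        remaining[target] -= demand

    return placement, unplaced, remaining


demands = {"app1": 50.0, "app2": 30.0, "app3": 20.0}
surplus = {"s1": 60.0, "s2": 35.0, "s3": 100.0}
print(best_fit_decreasing(demands, surplus))
```

Because best fit packs tightly, some nodes end up nearly full while others keep a large surplus, which matches the slide's remark that the resulting imbalance can be exploited for consolidation.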

  27. Experimental Results • Scenario • 3 levels, 18 identical servers (4+4 + 5+5) • 3 applications, total of 25 app instances • Any app can run on any server • Demand: Poisson (active power ∝ utilization)
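
The demand model on slide 27 (Poisson demand, active power proportional to utilization) can be mimicked with a few lines of standard-library Python. This is a rough sketch under stated assumptions, not the authors' simulator: the maximum active power, the per-server capacity, the round-robin placement of instances, and the mean demand are all invented, and the distinction between the 3 applications is ignored.

```python
import math
import random

NUM_SERVERS = 18             # from the slide
NUM_INSTANCES = 25           # from the slide
MAX_ACTIVE_POWER_W = 200.0   # assumed active power at 100% utilization
CAPACITY_PER_SERVER = 10.0   # assumed demand units a server can serve per interval


def poisson_sample(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_interval(rng, mean_demand_per_instance):
    """Return the active power (W) of each server for one interval."""
    demand = [0] * NUM_SERVERS
    # Round-robin placement of app instances on servers (an assumption).
    for instance in range(NUM_INSTANCES):
        demand[instance % NUM_SERVERS] += poisson_sample(rng, mean_demand_per_instance)
    utilization = [min(d / CAPACITY_PER_SERVER, 1.0) for d in demand]
    # Active power is taken to be directly proportional to utilization.
    return [u * MAX_ACTIVE_POWER_W for u in utilization]


rng = random.Random(42)
active_power = simulate_interval(rng, mean_demand_per_instance=3.0)
print([round(p, 1) for p in active_power])
```

Running many such intervals and feeding the per-server power into a migration policy would be the next step in reproducing an experiment of this kind.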

  28. Migration Frequency • Migration drivers: consolidation vs. energy deficiency • Low utilization → consolidation; high utilization → energy deficiency • Other characteristics • Migration frequency low in all cases • No ping-pong observed

  29. Thermal Impacts • Additional Issues • Energy consumption limited by thermal/cooling issues, not energy availability • Migrations required to limit temperature • Temperature & power have nonlinear relationship • Need to account for both power & thermal effects

  30. Results w/ Thermal Effects • Imbalanced cooling • Servers 1-14: Ta = 25 °C, Servers 15-18: Ta = 40 °C • Temperature limit: 65 °C • Power demand is adjusted by the algorithm to account for higher temperature

  31. Conclusions • Need to go beyond energy efficiency • Design devices/systems to minimize life-cycle energy footprint • Creatively adapt to available energy to operate “at the edge” • Ongoing/future work • Coordinated server, network & storage mgmt. • Explore tradeoffs between QoS, power savings and admission control performance

  32. Thank you!

  33. Power Inefficiencies [Diagram labels, in transcript order: Wasted leakage & clock power; Rack supply; 90-95% efficient; CPU; Voltage Regulators; 280V; Server PSU; DRAM & Mem controller; ±12, ±5V; 70-90% efficient; Fans; Storage; Adapters; 95% efficient; Idle wasted power]

  34. Operating Regimes

  35. So, What’s the Problem? • Local constraints & controls → end-to-end impacts • DC to DC load shift • Service disruption & post-shift impact • Client request to alter content • Less or more work for server • Potential conflicting controls [Diagram labels: DC1 storage; Server1; DC2 storage; Server2; Client; Client; Network; Core Network]