
Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound

HPPAC 2012. Barry Rountree, Dong H. Ahn, Bronis R. de Supinski, David K. Lowenthal, Martin Schulz. Monday, May 21st.


Presentation Transcript


  1. Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound • HPPAC 2012 • Barry Rountree, Dong H. Ahn, Bronis R. de Supinski, David K. Lowenthal, Martin Schulz • Monday, May 21st

  2. Computing under a power bound forces us to rethink performance • Traditional • All components can operate at highest power level simultaneously • Power provisioned for “worst case” • Users are happily oblivious (about power) • Few if any applications limited by power • Exascale (if not sooner) • Not all components can operate at highest power level simultaneously • Power provisioning is best effort • Users must tune power for performance • Nearly every application limited by power

  3. Computing under a power bound forces us to rethink performance • Traditional • Utilization measured in node-hours • Weak-scaling jobs perform best using as many nodes as possible • Running all components as fast as possible reliably leads to top performance • Exascale (if not sooner) • Utilization measured in kilowatt hours • Weak-scaling jobs may perform optimally with fewer, faster nodes • Running all components as fast as possible cannot be done. Running most components at identical speeds is suboptimal

  4. An Unexpected Power Bound: Merlot cluster at LLNL • Each processor uses some amount of power • Average processor power bound: the sum of processor power draw divided by processor count must be at or below this level • Short-term solution: Disable Turbo Boost globally • Mid-term solution: Buy more power (this does not scale) • Long-term solution: Schedule power to optimize performance • [Figure: Linpack + Intel Turbo Boost. Axis labels: Power (Watts), GHz from non-turbo (2.6 GHz) to max turbo (3.3 GHz), Processors. Bound levels marked: rzmerl (Early April), rzmerl (Mid April), exascale (?). Annotation: Lost performance]
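
The average-bound rule on this slide can be written as a short check. This is a minimal sketch, not code from the talk: the function names and the sample wattages are hypothetical and chosen only to show that individual processors may exceed the bound while the average stays under it.

```python
def average_processor_power(draws_watts):
    """Return total processor power divided by processor count."""
    if not draws_watts:
        raise ValueError("need at least one processor reading")
    return sum(draws_watts) / len(draws_watts)


def within_bound(draws_watts, bound_watts):
    """True when the average per-processor draw is at or below the bound."""
    return average_processor_power(draws_watts) <= bound_watts


# Hypothetical readings (watts), not measurements from the talk.
sample = [70.0, 90.0, 80.0, 60.0]
print(average_processor_power(sample))  # 75.0
print(within_bound(sample, 80.0))       # True: 90 W processor is offset by others
print(within_bound(sample, 70.0))       # False: the average exceeds the bound
```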

  5. Scheduling Power with Processor Hardware: Intel’s RAPL • Running Average Power Limit (RAPL) • Measures cumulative joules (power × time) • Three separate power meters • Clamping on package and DRAM power • Turbo suppression • Effective frequency • libmsr currently under development

  6. Domains and Features of Running Average Power Limit Technology • Introduced on Sandy Bridge processors • Onboard energy meters measure accumulated joules • Divide by time to get average power • Can place a user-specified limit on average power over a user-specified time window • Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B
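
The "accumulated joules divided by time" step can be sketched as follows. This is an illustration under stated assumptions, not libmsr code: it assumes the meter is exposed as a raw 32-bit counter that wraps around, and that a conversion factor from counts to joules (`joules_per_count`, a hypothetical parameter here) has already been obtained from the hardware. How the counter is read is outside the sketch.

```python
COUNTER_BITS = 32  # assumption: the cumulative energy counter wraps at 2**32


def average_power_watts(raw_start, raw_end, elapsed_s, joules_per_count):
    """Average power between two cumulative energy-counter readings."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    # Modular subtraction tolerates one counter wraparound between readings.
    delta_counts = (raw_end - raw_start) % (1 << COUNTER_BITS)
    return delta_counts * joules_per_count / elapsed_s


# Hypothetical unit: one count = 2**-16 joules.
UNIT = 1.0 / 65536

# 60 joules accumulated over one second.
print(average_power_watts(0, 60 * 65536, 1.0, UNIT))  # 60.0

# Counter wrapped between the two readings: 2 joules over one second.
print(average_power_watts(2**32 - 65536, 65536, 1.0, UNIT))  # 2.0
```

Sampling often enough that the counter cannot wrap more than once between readings keeps the modular subtraction valid.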

  7. Bounding Package Power with RAPL • Setting LOCK fixes power limits until reboot • Two windows allow tweaking peak and average power • Higher bound, smaller window for peak power • Lower bound, wider window for average power • Limits are ignored until enable bits are set • Power limit is enforced using average watts over a user-specified window • Resolution: ~1 ms • Max window: ~46 ms • Watts granularity: 0.125 W • Minimum power bound: 51 W • Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B
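
The granularity, minimum bound, and windowed-average semantics listed on this slide can be modeled in a few lines. This is a sketch of the arithmetic only, assuming evenly spaced power samples; it does not encode the actual register layout, and the names `quantize_limit` and `WindowedAverage` are hypothetical. Rounding down (rather than to nearest) is a design choice made here so the programmed limit never exceeds the request.

```python
from collections import deque

WATTS_GRANULARITY = 0.125  # from the slide
MIN_BOUND_WATTS = 51.0     # from the slide


def quantize_limit(requested_watts):
    """Round a requested limit down to the supported granularity."""
    if requested_watts < MIN_BOUND_WATTS:
        raise ValueError("requested limit is below the minimum power bound")
    steps = int(requested_watts / WATTS_GRANULARITY)
    return steps * WATTS_GRANULARITY


class WindowedAverage:
    """Running average over the most recent `window_samples` power samples."""

    def __init__(self, window_samples):
        self.samples = deque(maxlen=window_samples)

    def add(self, watts):
        self.samples.append(watts)
        return sum(self.samples) / len(self.samples)


# Two windows: a short one with a higher bound (peak power) and a
# wide one with a lower bound (average power). Values are hypothetical.
peak = WindowedAverage(window_samples=2)
avg = WindowedAverage(window_samples=4)
PEAK_BOUND = quantize_limit(80.0)
AVG_BOUND = quantize_limit(65.0)

for watts in [52.0, 52.0, 52.0, 90.0, 90.0]:
    peak_ok = peak.add(watts) <= PEAK_BOUND
    avg_ok = avg.add(watts) <= AVG_BOUND
    print(watts, "peak ok" if peak_ok else "peak exceeded",
          "avg ok" if avg_ok else "avg exceeded")
```

In this model a single 90 W sample passes both checks, while a second consecutive 90 W sample pushes both running averages over their bounds.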

  8. Bounding DRAM Power with RAPL • Similar interface for DRAM power control • Only one power limit supported • Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B

  9. Processors are Heterogeneous Under a Power Bound • rzzin, mg.C.8, 64 processors, 34 power bounds • No power bound • Processors take similar time • Significant variation in power • Power variation expected and acceptable • 51 W power bound • Processors require same amount of power • Individual processor efficiency has not changed • Efficiency variation manifests as performance variation • Processors are heterogeneous under a power bound • Where should the hot processors go? • Is it worth paying a premium for efficient processors?

  10. Wide Variation in Application Package Power Draw • rzmerl, NPB C.8, 234 processors • Wide variation in power consumption across applications • Provisioning power for the most power-hungry application leaves the remaining applications node-bound, not power-bound • [Figure: average watts per processor; processors ordered by cg.C.8 average PKG power]

  11. Wide Variation in Application DRAM Power Draw • rzmerl, NPB C.8, 234 processors • Memory power substantially lower than package power • [Figure: average watts per processor; processors ordered by cg.C.8 average PKG power]

  12. Exascale Is Not Only Bigger: Exascale Is Fundamentally Different • Overprovision hardware • Processors are cheap and plentiful • Power is not • Measure performance at max power consumption • May require turning off nodes • Running out of nodes before running out of power means application is not power-bound • Expect heterogeneous processor performance • Put most-efficient nodes on the critical path if possible • Put least-efficient nodes where they will do the least harm
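
The placement advice on this slide (most-efficient nodes on the critical path, least-efficient where they do the least harm) can be illustrated with a simple greedy pairing. This is one possible sketch, not the authors' method: the rank weights, the per-node efficiency scores, and the function name `place_ranks` are all hypothetical inputs that a real scheduler would have to measure or estimate.

```python
def place_ranks(critical_path_weight, node_efficiency):
    """Map each rank to a node, most critical rank to most efficient node.

    critical_path_weight: dict of rank -> share of critical-path work
    node_efficiency: dict of node -> relative performance under the same
                     power bound (higher is better)
    """
    if len(node_efficiency) < len(critical_path_weight):
        raise ValueError("not enough nodes for the requested ranks")
    ranks = sorted(critical_path_weight,
                   key=critical_path_weight.get, reverse=True)
    nodes = sorted(node_efficiency,
                   key=node_efficiency.get, reverse=True)
    # Pair the sorted lists; surplus (least-efficient) nodes stay unused.
    return dict(zip(ranks, nodes))


# Hypothetical values for illustration only.
weights = {"rank0": 0.2, "rank1": 0.9, "rank2": 0.5}
efficiency = {"nodeA": 1.00, "nodeB": 0.92, "nodeC": 0.97, "nodeD": 0.88}
print(place_ranks(weights, efficiency))
# {'rank1': 'nodeA', 'rank2': 'nodeC', 'rank0': 'nodeB'}
```

Leaving the least-efficient node idle in the example mirrors the slide's point that an overprovisioned, power-bound system may need to turn nodes off.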
