
Dynamic Power Redistribution in Failure-Prone CMPs


Presentation Transcript


  1. Dynamic Power Redistribution in Failure-Prone CMPs
Paula Petrica, Jonathan A. Winter* and David H. Albonesi
Cornell University; *Google, Inc.

  2. Motivation
• Hardware failures expected to become prominent in future technology generations
[Figure: core diagram showing the Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]

  3. Motivation
• Deconfiguration tolerates defects at the expense of performance
• Pipeline imbalance
  • Units correlated with the deconfigured one might become overprovisioned
  • Power inefficiencies
• Application specific
[Figure: core diagram (FE, BE, LSQ) with question marks over potentially overprovisioned units]

  4. Research Goal
• Given a CMP with a set of failures and a power budget:
  • Eliminate power inefficiencies
  • Improve performance

  5. Outline
• Motivation
• Architecture
• Power Harnessing
• Performance Boosting
• Power Transfer Runtime Manager
• Conclusions and future work

  6. Two-step approach
[Figure: architecture overview with two cores (Core 1, Core 2), each containing a Front End (FE), Back End (BE), and Load-Store Queue (LSQ); step 1 harnesses power within a core, step 2 transfers the harnessed power to another core]

  7. Power Harnessing
[Figure: pipeline diagram of the core: I-Cache, FQ, BPred, Decode/Rename, Dispatch, IQ, Select, RF, ROB, LSQ, D-Cache, grouped into Front End (FE) and Back End (BE)]

  8. Pipeline Imbalance
[Plot: performance loss vs. power saved]

  9. Performance Boosting
• Distribute accumulated margin of power to boost performance
• Temporarily enable a previously dormant feature
• Requirements
  • Small area and fast power-up
  • Small PPR (Power-Performance Ratio); see the selection sketch below
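
As an illustration of the PPR criterion, the sketch below greedily enables the boosting features with the smallest power cost per unit of performance gain until the harnessed power margin is spent. The feature names and all numbers are hypothetical placeholders, not measurements from this work:

# Greedily enable boosting features with the smallest PPR until the
# power margin harnessed through deconfiguration is exhausted.
def select_boosts(features, power_margin_watts):
    """features: list of (name, power_watts, perf_gain) tuples."""
    # A smaller PPR means more performance per watt, so enable those first.
    ranked = sorted(features, key=lambda f: f[1] / f[2])
    enabled = []
    for name, power, gain in ranked:
        if power <= power_margin_watts:
            enabled.append(name)
            power_margin_watts -= power
    return enabled

# Hypothetical candidates: (name, power cost in W, fractional speedup).
candidates = [
    ("speculative_L2_issue", 0.4, 0.03),
    ("parallel_L2_tag_data", 0.6, 0.02),
    ("CLEAR_value_prediction", 1.0, 0.05),
    ("DVFS_step_up", 2.5, 0.04),
]
print(select_boosts(candidates, power_margin_watts=2.0))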

  10. Performance Boosting Techniques
• Speculative Cache Access
  • Speculatively send L1 requests to the L2 cache
  • Speculatively access both tag and data in the L2 cache at the same time (rather than serially)
• Turned on independently or in combination
• Approximately linear power-performance relationship
• Benefits applications limited by L1 cache capacity
[Figure: cache access paths contrasting serial tag-then-data L2 access on an L1 miss with speculative, parallel tag and data access]
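
A rough way to see the benefit is through average memory access time (AMAT); the sketch below models the two options, with overlap cycles, latencies, and miss rates that are hypothetical rather than taken from this work:

# Model AMAT with and without the two speculative cache options.
# All cycle counts and miss rates below are hypothetical.
def amat(l1_hit, l1_miss_rate, l2_latency, l2_miss_rate, mem_latency,
         speculative_issue=False, parallel_tag_data=False,
         issue_overlap=2, serial_tag_penalty=3):
    # Speculative issue overlaps part of the L1 lookup with the L2 request.
    l2 = l2_latency - (issue_overlap if speculative_issue else 0)
    # Serial tag-then-data access adds latency that parallel access hides.
    l2 += 0 if parallel_tag_data else serial_tag_penalty
    return l1_hit + l1_miss_rate * (l2 + l2_miss_rate * mem_latency)

base = amat(2, 0.10, 12, 0.30, 200)
boosted = amat(2, 0.10, 12, 0.30, 200,
               speculative_issue=True, parallel_tag_data=True)
print(f"AMAT: {base:.2f} -> {boosted:.2f} cycles")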

  11. Performance Boosting Techniques
• Boosting main memory performance: CLEAR [N. Kirman et al., HPCA 2005]
  • Predict and speculatively retire long-latency loads
  • Supply predicted values to destination registers
  • Free processor resources for non-dependent instructions
• Linear power-performance relationship
• Benefits memory-bound applications

  12. Performance Boosting Techniques
• DVFS
  • Scale up voltage and frequency
  • Already built in
  • Cubic power cost for a linear performance benefit
  • Benefits high-IPC applications
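
The cubic cost follows from the standard dynamic power model, under the common assumption that supply voltage must scale roughly linearly with frequency:

P_dyn = α · C · V² · f,  and with V ∝ f,  P_dyn ∝ f³

So frequency (and thus, at best, performance) rises linearly while dynamic power rises cubically, which is why DVFS pays off mainly for high-IPC applications that convert the extra frequency into real speedup.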

  13. Comparison of Boosting Techniques
[Plot: performance improvement of each boosting technique]

  14. Two-step approach
[Figure: the two-core diagram from slide 6: power is harnessed within a core, then transferred between Core 1 and Core 2 (each with FE, BE, LSQ)]

  15. Power Transfer Runtime Manager
• Periodically coordinate a chip-wide effort to relocate power among cores
  • Obtain the current local hardware deconfiguration status (due to faults)
  • Determine additional components to be deconfigured
  • Transfer power to one or more mechanisms that make the best use of it

  16. Power Transfer Runtime Manager
[Diagram: manager operation, alternating between a Sampling Phase and a Steady Phase]
• Local decisions: sample deconfigurations, choose additional deconfiguration, sample performance boosting
• Global decisions: compute global throughput with fairness, choose the best 4-core configuration, apply DVFS (greedy); see the sketch below
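
To make the global step concrete, the self-contained sketch below picks one sampled (configuration, performance, power) candidate per core so that the chip stays within its power budget and a fairness-aware throughput metric is maximized. The Core class, the harmonic-mean fairness metric, and every number are hypothetical stand-ins for the paper's actual mechanisms:

import itertools

class Core:
    def __init__(self, name, candidates):
        # candidates: (label, perf, power) triples collected during the
        # sampling phase by trying deconfiguration/boosting options.
        self.name = name
        self.candidates = candidates

def fair_throughput(perfs):
    # Harmonic mean rewards balanced per-core performance; a simple
    # stand-in for "global throughput with fairness".
    return len(perfs) / sum(1.0 / p for p in perfs)

def choose_global_config(cores, chip_power_budget):
    # Exhaustively try one candidate per core: viable for 4 cores,
    # while heuristics for larger CMPs are left as future work.
    best, best_score = None, float("-inf")
    for combo in itertools.product(*(c.candidates for c in cores)):
        if sum(power for _, _, power in combo) > chip_power_budget:
            continue  # infeasible under the chip power budget
        score = fair_throughput([perf for _, perf, _ in combo])
        if score > best_score:
            best, best_score = combo, score
    return best

cores = [
    Core("core0", [("baseline", 1.00, 20), ("boost_L2spec", 1.04, 21)]),
    Core("core1", [("deconf_LSQ", 0.90, 16), ("deconf_LSQ+CLEAR", 0.96, 18)]),
    Core("core2", [("baseline", 1.00, 20), ("boost_DVFS", 1.08, 25)]),
    Core("core3", [("deconf_FE", 0.85, 15), ("deconf_FE+DVFS", 0.92, 19)]),
]
for core, (label, perf, power) in zip(cores, choose_global_config(cores, 80)):
    print(f"{core.name}: {label} (perf={perf}, power={power} W)")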

  17. Global vs Local Optimization
• 100 4-core configurations, with random errors and random SPEC CPU2000 benchmarks
[Plot: speedup of 22.2% for global optimization versus 10.0% for local-only optimization]

  18. Diversity of Boosting Techniques
• 100 4-core configurations, with random errors and random SPEC CPU2000 benchmarks
[Plot: speedup of 22.2% with the full set of boosting techniques versus 6.3% with reduced diversity]

  19. Power Transfer Runtime Manager
• 100 4-core configurations, with random errors and random SPEC CPU2000 benchmarks
[Plot: speedup of 22.2% for the full runtime manager, compared with 15.3%, 10.0%, and 6.3% for the restricted variants of the two preceding slides]

  20. Conclusions
• We proposed a technique to increase performance under a given power budget in the presence of hard faults
• Exploited the deconfiguration capabilities already built into microprocessors
• Demonstrated that pipeline imbalances and additional deconfiguration are application-dependent
• Proposed several boosting techniques
• Demonstrated the potential for substantial performance gains on a 4-core CMP

  21. Future Work
• Heuristic approaches to scale this problem to many cores (see the sketch below)
  • Simulated annealing, genetic algorithms
• Pareto-optimal fronts to reduce the number of combinations
• Hierarchical optimization
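
As a taste of how such a heuristic might look, the sketch below applies simulated annealing to the same per-core candidate lists used by the exhaustive search on slide 16; it reuses the hypothetical Core and fair_throughput helpers from that sketch and is an illustrative direction, not an evaluated design:

import math, random

def anneal(cores, budget, steps=5000, t0=0.1):
    # State: one candidate index per core; mutate one core at a time.
    choice = [random.randrange(len(c.candidates)) for c in cores]

    def score(ch):
        combo = [c.candidates[i] for c, i in zip(cores, ch)]
        if sum(power for _, _, power in combo) > budget:
            return float("-inf")  # infeasible under the power budget
        return fair_throughput([perf for _, perf, _ in combo])

    cur = score(choice)
    best, best_score = list(choice), cur
    for step in range(steps):
        temp = t0 * (1 - step / steps)  # linear cooling schedule
        i = random.randrange(len(cores))
        old = choice[i]
        choice[i] = random.randrange(len(cores[i].candidates))
        new = score(choice)
        # Accept improvements, or worse moves with Boltzmann probability.
        if new >= cur or random.random() < math.exp((new - cur) / max(temp, 1e-9)):
            cur = new
            if new > best_score:
                best, best_score = list(choice), new
        else:
            choice[i] = old  # revert the rejected mutation
    return best, best_score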

  22. Questions?
