Dynamic Power Redistribution in Failure-Prone CMPs
Paula Petrica, Jonathan A. Winter* and David H. Albonesi
Cornell University   *Google, Inc.
Motivation
• Hardware failures expected to become prominent in future generations
[Figure: a core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Motivation
• Deconfiguration tolerates defects at the expense of performance
• Pipeline imbalance: units correlated with the deconfigured one may become overprovisioned
• Power inefficiencies
• Application specific
[Figure: a core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ); "?" marks potentially overprovisioned units]
Research Goal
• Given a CMP with a set of failures and a power budget:
  • Eliminate power inefficiencies
  • Improve performance
Outline
• Motivation
• Architecture
• Power Harnessing
• Performance Boosting
• Power Transfer Runtime Manager
• Conclusions and future work
Architecture
• Two-step approach: harness power, then transfer power
[Figure: Core 1 and Core 2, each with Front End (FE), Back End (BE), and Load-Store Queue (LSQ); power harnessed in one core is transferred to the other]
Power Harnessing
[Figure: core pipeline showing the Front End (FE: I-Cache, BPred, FQ, Decode/Rename, Dispatch), the Back End (BE: IQ, Select, RF, ROB), and the Load-Store Queue (LSQ) with the D-Cache]
Pipeline Imbalance
[Chart: performance loss and power saved when additional units are deconfigured]
Performance Boosting
• Distribute the accumulated margin of power to boost performance
• Temporarily enable a previously dormant feature
• Requirements:
  • Small area and fast power-up
  • Small PPR (Power-Performance Ratio); see the sketch below
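To make the PPR requirement concrete, here is a minimal sketch (not from the presentation; the mechanism names and numbers are made up) of how a runtime policy might rank candidate boosting mechanisms by power spent per unit of performance gained and spend a harvested power margin on the cheapest performance first.

```python
# Minimal sketch (assumed, not from the talk): rank boosting mechanisms by
# PPR = extra power / performance gain, then greedily spend a harvested
# power margin on the cheapest performance first.
from dataclasses import dataclass

@dataclass
class Booster:
    name: str
    extra_power_w: float      # additional power drawn when enabled
    perf_gain: float          # fractional speedup, e.g. 0.05 = 5%

    @property
    def ppr(self) -> float:   # power-performance ratio (lower is better)
        return self.extra_power_w / self.perf_gain

def pick_boosters(candidates, power_margin_w):
    chosen = []
    for b in sorted(candidates, key=lambda b: b.ppr):
        if b.extra_power_w <= power_margin_w:
            chosen.append(b)
            power_margin_w -= b.extra_power_w
    return chosen

# Example numbers are illustrative only.
candidates = [
    Booster("speculative L1->L2", 0.5, 0.03),
    Booster("parallel L2 tag/data", 0.8, 0.04),
    Booster("DVFS +1 step", 3.0, 0.05),
]
print([b.name for b in pick_boosters(candidates, power_margin_w=2.0)])
```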
Performance Boosting Techniques
• Speculative Cache Access
  • Speculatively send requests to the L2 cache before the L1 miss is known
  • Speculatively access both tag and data in the L2 cache at the same time (rather than serially)
  • Turned on independently or in combination
  • Approximately linear power-performance relationship
  • Benefits applications limited by L1 cache capacity (see the latency sketch below)
[Figure: a load probes the L1 cache; on an L1 miss the request goes to the L2 cache, where tag and data can be accessed serially or in parallel]
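A back-of-the-envelope latency model (illustrative only; the cycle counts below are assumptions, not figures from the talk) shows why both variants help: issuing the L2 request in parallel with the L1 lookup hides the L1 miss-detection time, and probing L2 tag and data together replaces a serial sum with a max.

```python
# Illustrative latency model (all cycle counts are assumptions, not measured):
# compare baseline serial access against the two speculative variants.
L1_LOOKUP = 3          # cycles to detect an L1 hit/miss
L2_TAG    = 6          # cycles for the L2 tag check
L2_DATA   = 8          # cycles for the L2 data array read

def l2_latency(parallel_tag_data: bool) -> int:
    # Parallel tag/data replaces tag + data with max(tag, data).
    return max(L2_TAG, L2_DATA) if parallel_tag_data else L2_TAG + L2_DATA

def miss_latency(speculative_l1_to_l2: bool, parallel_tag_data: bool) -> int:
    # Without speculation the L2 access starts only after the L1 miss is known.
    start = 0 if speculative_l1_to_l2 else L1_LOOKUP
    return start + l2_latency(parallel_tag_data)

for spec in (False, True):
    for par in (False, True):
        print(f"spec L1->L2={spec!s:5} parallel tag/data={par!s:5} "
              f"-> L1-miss latency {miss_latency(spec, par)} cycles")
```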
Performance Boosting Techniques
• Boosting main memory performance
  • CLEAR [N. Kirman et al., HPCA 2005]
  • Predict and speculatively retire long-latency loads
  • Supply predicted values to destination registers
  • Free processor resources for non-dependent instructions (see the sketch below)
  • Linear power-performance relationship
  • Benefits memory-bound applications
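The idea can be illustrated with a toy last-value predictor (a heavy simplification of the actual CLEAR mechanism described in the HPCA 2005 paper): a predicted value is supplied so dependent instructions can proceed while the load is outstanding, and the prediction is checked when the real value returns.

```python
# Toy sketch of value prediction for long-latency loads (a simplification,
# not the CLEAR implementation): a last-value predictor supplies a guess so
# dependents can run; the guess is verified when the real value arrives, and
# a mismatch would trigger recovery/rollback.
last_value = {}            # predictor state: address -> last loaded value

def predict(addr):
    return last_value.get(addr, 0)          # guess; 0 if never seen

def verify(addr, real_value):
    correct = (predict(addr) == real_value)
    last_value[addr] = real_value           # train the predictor
    return correct

# Simulated stream of (address, value actually returned by memory).
loads = [(0x40, 7), (0x40, 7), (0x40, 9), (0x80, 3)]
for addr, real in loads:
    guess = predict(addr)
    ok = verify(addr, real)
    print(f"load {addr:#x}: predicted {guess}, real {real}, "
          f"{'commit speculative work' if ok else 'misprediction -> recover'}")
```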
Performance Boosting Techniques
• DVFS
  • Scale up voltage and frequency
  • Already built in
  • Cubic power cost for a linear performance benefit (worked out below)
  • Benefits high-IPC applications
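The cubic cost follows from textbook CMOS scaling: dynamic power is roughly P ≈ C · V² · f, and since voltage must scale roughly in proportion to frequency, P grows roughly as f³ while performance improves only about linearly with f. A small worked example (generic reasoning, not numbers from the talk):

```python
# Generic CMOS scaling reasoning, not figures from the presentation:
# dynamic power P ~ C * V^2 * f, with V scaled roughly in proportion to f,
# so P ~ f^3 while performance improves roughly linearly with f.
def relative_power(freq_scale: float) -> float:
    return freq_scale ** 3

for boost in (1.05, 1.10, 1.20):
    print(f"+{(boost - 1) * 100:.0f}% frequency -> "
          f"~{(relative_power(boost) - 1) * 100:.0f}% more dynamic power")
# +5% frequency  -> ~16% more dynamic power
# +10% frequency -> ~33% more dynamic power
# +20% frequency -> ~73% more dynamic power
```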
Comparison of Boosting Techniques
[Chart: performance improvement achieved by each boosting technique]
Architecture (revisited)
• Two-step approach: after harnessing power, transfer it to boost other cores
[Figure: Core 1 and Core 2, each with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Power Transfer Runtime Manager
• Periodically coordinate chip-wide effort to relocate power among cores
  • Obtain current local hardware deconfiguration status (due to faults)
  • Determine additional components to be deconfigured
  • Transfer power to one or more mechanisms that make best use of it
Power Transfer Runtime Manager
• Sampling phase
  • Local decisions: sample deconfigurations, choose additional deconfiguration, sample performance boosting
  • Global decisions: compute global throughput with fairness, choose the best 4-core configuration, apply DVFS (greedy)
• Steady phase
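A minimal, self-contained sketch of one manager epoch under stated assumptions (the candidate configurations, the fairness metric, and all numbers are placeholders, not the paper's algorithm): each core samples its candidate boosting configurations locally, then a global step picks the per-core combination that maximizes a fairness-weighted throughput metric within the harvested power margin.

```python
# Structural sketch of one manager epoch (names and numbers are placeholders):
# local sampling produces per-core candidates, then a global step picks the
# combination that maximizes fairness-weighted throughput under the margin.
import itertools

# Per-core candidates from the sampling phase: (label, speedup, extra watts).
core_candidates = [
    [("none", 1.00, 0.0), ("spec-L2", 1.03, 0.5), ("dvfs+1", 1.05, 3.0)],
    [("none", 1.00, 0.0), ("clear", 1.06, 1.2)],
]
POWER_MARGIN_W = 2.0   # power harvested by deconfiguring overprovisioned units

def fairness_weighted_throughput(speedups):
    # Geometric mean rewards balanced speedups over lopsided ones.
    prod = 1.0
    for s in speedups:
        prod *= s
    return prod ** (1.0 / len(speedups))

best = max(
    (cfg for cfg in itertools.product(*core_candidates)
     if sum(c[2] for c in cfg) <= POWER_MARGIN_W),
    key=lambda cfg: fairness_weighted_throughput([c[1] for c in cfg]),
)
print("chosen per-core boosts:", [c[0] for c in best])
```

An exhaustive product over per-core candidates is feasible for 4 cores but grows exponentially with core count, which is why the Future Work slide points to heuristics such as simulated annealing for larger CMPs.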
Global vs Local Optimization
• 100 4-core configurations, random errors and random SPEC CPU2000 benchmarks
[Chart: average speedup of 22.2% with global optimization vs. 10.0% with local-only decisions]
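One intuition for the gap (a toy contrast with made-up numbers, not the paper's experiment): a purely local policy divides the power margin evenly and each core decides alone, so power can be stranded on a core that cannot exploit it, while a global policy lets the margin flow to whichever core benefits most.

```python
# Toy contrast (made-up numbers) between local and global power allocation.
# Local: each core gets an equal share of the margin and decides alone.
# Global: the margin can flow to whichever core benefits most.
margin = 2.0
cores = [
    {"name": "memory-bound", "boost_cost": 2.0, "speedup": 1.15},
    {"name": "low-ILP",      "boost_cost": 2.0, "speedup": 1.02},
]

def local_alloc():
    share = margin / len(cores)
    return [c["speedup"] if c["boost_cost"] <= share else 1.0 for c in cores]

def global_alloc():
    # Spend the whole margin on the core with the best return.
    best = max(cores, key=lambda c: c["speedup"])
    return [c["speedup"] if c is best else 1.0 for c in cores]

for name, alloc in (("local", local_alloc()), ("global", global_alloc())):
    print(name, "per-core speedups:", alloc)
```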
Diversity of Boosting Techniques
• 100 4-core configurations, random errors and random SPEC CPU2000 benchmarks
[Chart: average speedup of 22.2% with the full mix of boosting techniques vs. 6.3% with a single technique]
Power Transfer Runtime Manager
• 100 4-core configurations, random errors and random SPEC CPU2000 benchmarks
[Chart: average speedups of 6.3%, 10.0%, 15.3%, and 22.2% across the evaluated schemes, with the full runtime manager reaching 22.2%]
Conclusions
• We proposed a technique to increase performance under a given power budget in the presence of hard faults
• Exploited the deconfiguration capabilities already built into microprocessors
• Demonstrated that pipeline imbalances and additional deconfiguration are application-dependent
• Proposed several boosting techniques
• Demonstrated the potential for substantial performance gains for a 4-core CMP
Future Work
• Heuristic approaches to scale this problem to many cores
  • Simulated Annealing, Genetic Algorithm
• Pareto-optimal fronts to reduce the number of combinations
• Hierarchical optimization