Dynamic Power Redistribution in Failure-Prone CMPs
Paula Petrica, Jonathan A. Winter* and David H. Albonesi
Cornell University   *Google, Inc.
Motivation
• Hardware failures expected to become prominent in future generations
[Figure: a core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Motivation
• Deconfiguration tolerates defects at the expense of performance
• Pipeline imbalance: units correlated with the deconfigured one may become overprovisioned
• Power inefficiencies
• Application specific
[Figure: a core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ); "?" marks potentially overprovisioned units]
Research Goal
• Given a CMP with a set of failures and a power budget:
  • Eliminate power inefficiencies
  • Improve performance
Outline
• Motivation
• Architecture
• Power Harnessing
• Performance Boosting
• Power Transfer Runtime Manager
• Conclusions and future work
Architecture
• Two-step approach: harness power, then transfer power
[Figure: Core 1 and Core 2, each with Front End (FE), Back End (BE), and Load-Store Queue (LSQ); power harnessed in one core is transferred to the other]
Power Harnessing
[Figure: core pipeline showing the Front End (FE: I-Cache, BPred, FQ, Decode/Rename, Dispatch), the Back End (BE: IQ, Select, RF, ROB), and the Load-Store Queue (LSQ) with the D-Cache]
Pipeline Imbalance
[Chart: performance loss and power saved when additional units are deconfigured]
Performance Boosting
• Distribute the accumulated margin of power to boost performance
• Temporarily enable a previously dormant feature
• Requirements:
  • Small area and fast power-up
  • Small PPR (Power-Performance Ratio); see the sketch below
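To make the PPR requirement concrete, here is a minimal sketch (not from the presentation; the mechanism names and numbers are made up) of how a runtime policy might rank candidate boosting mechanisms by power spent per unit of performance gained and spend a harvested power margin on the cheapest performance first.

```python
# Minimal sketch (assumed, not from the talk): rank boosting mechanisms by
# PPR = extra power / performance gain, then greedily spend a harvested
# power margin on the cheapest performance first.
from dataclasses import dataclass

@dataclass
class Booster:
    name: str
    extra_power_w: float      # additional power drawn when enabled
    perf_gain: float          # fractional speedup, e.g. 0.05 = 5%

    @property
    def ppr(self) -> float:   # power-performance ratio (lower is better)
        return self.extra_power_w / self.perf_gain

def pick_boosters(candidates, power_margin_w):
    chosen = []
    for b in sorted(candidates, key=lambda b: b.ppr):
        if b.extra_power_w <= power_margin_w:
            chosen.append(b)
            power_margin_w -= b.extra_power_w
    return chosen

# Example numbers are illustrative only.
candidates = [
    Booster("speculative L1->L2", 0.5, 0.03),
    Booster("parallel L2 tag/data", 0.8, 0.04),
    Booster("DVFS +1 step", 3.0, 0.05),
]
print([b.name for b in pick_boosters(candidates, power_margin_w=2.0)])
```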
Performance Boosting Techniques
• Speculative Cache Access
  • Speculatively send requests to the L2 cache before the L1 miss is known
  • Speculatively access both tag and data in the L2 cache at the same time (rather than serially)
  • Turned on independently or in combination
  • Approximately linear power-performance relationship
  • Benefits applications limited by L1 cache capacity (see the latency sketch below)
[Figure: a load probes the L1 cache; on an L1 miss the request goes to the L2 cache, where tag and data can be accessed serially or in parallel]
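A back-of-the-envelope latency model (illustrative only; the cycle counts below are assumptions, not figures from the talk) shows why both variants help: issuing the L2 request in parallel with the L1 lookup hides the L1 miss-detection time, and probing L2 tag and data together replaces a serial sum with a max.

```python
# Illustrative latency model (all cycle counts are assumptions, not measured):
# compare baseline serial access against the two speculative variants.
L1_LOOKUP = 3          # cycles to detect an L1 hit/miss
L2_TAG    = 6          # cycles for the L2 tag check
L2_DATA   = 8          # cycles for the L2 data array read

def l2_latency(parallel_tag_data: bool) -> int:
    # Parallel tag/data replaces tag + data with max(tag, data).
    return max(L2_TAG, L2_DATA) if parallel_tag_data else L2_TAG + L2_DATA

def miss_latency(speculative_l1_to_l2: bool, parallel_tag_data: bool) -> int:
    # Without speculation the L2 access starts only after the L1 miss is known.
    start = 0 if speculative_l1_to_l2 else L1_LOOKUP
    return start + l2_latency(parallel_tag_data)

for spec in (False, True):
    for par in (False, True):
        print(f"spec L1->L2={spec!s:5} parallel tag/data={par!s:5} "
              f"-> L1-miss latency {miss_latency(spec, par)} cycles")
```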
Performance Boosting Techniques
• Boosting main memory performance
  • CLEAR [N. Kirman et al., HPCA 2005]
  • Predict and speculatively retire long-latency loads
  • Supply predicted values to destination registers
  • Free processor resources for non-dependent instructions (see the sketch below)
  • Linear power-performance relationship
  • Benefits memory-bound applications
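The idea can be illustrated with a toy last-value predictor (a heavy simplification of the actual CLEAR mechanism described in the HPCA 2005 paper): a predicted value is supplied so dependent instructions can proceed while the load is outstanding, and the prediction is checked when the real value returns.

```python
# Toy sketch of value prediction for long-latency loads (a simplification,
# not the CLEAR implementation): a last-value predictor supplies a guess so
# dependents can run; the guess is verified when the real value arrives, and
# a mismatch would trigger recovery/rollback.
last_value = {}            # predictor state: address -> last loaded value

def predict(addr):
    return last_value.get(addr, 0)          # guess; 0 if never seen

def verify(addr, real_value):
    correct = (predict(addr) == real_value)
    last_value[addr] = real_value           # train the predictor
    return correct

# Simulated stream of (address, value actually returned by memory).
loads = [(0x40, 7), (0x40, 7), (0x40, 9), (0x80, 3)]
for addr, real in loads:
    guess = predict(addr)
    ok = verify(addr, real)
    print(f"load {addr:#x}: predicted {guess}, real {real}, "
          f"{'commit speculative work' if ok else 'misprediction -> recover'}")
```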
Performance Boosting Techniques
• DVFS
  • Scale up voltage and frequency
  • Already built in
  • Cubic power cost for a linear performance benefit (worked out below)
  • Benefits high-IPC applications
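The cubic cost follows from textbook CMOS scaling: dynamic power is roughly P ≈ C · V² · f, and since voltage must scale roughly in proportion to frequency, P grows roughly as f³ while performance improves only about linearly with f. A small worked example (generic reasoning, not numbers from the talk):

```python
# Generic CMOS scaling reasoning, not figures from the presentation:
# dynamic power P ~ C * V^2 * f, with V scaled roughly in proportion to f,
# so P ~ f^3 while performance improves roughly linearly with f.
def relative_power(freq_scale: float) -> float:
    return freq_scale ** 3

for boost in (1.05, 1.10, 1.20):
    print(f"+{(boost - 1) * 100:.0f}% frequency -> "
          f"~{(relative_power(boost) - 1) * 100:.0f}% more dynamic power")
# +5% frequency  -> ~16% more dynamic power
# +10% frequency -> ~33% more dynamic power
# +20% frequency -> ~73% more dynamic power
```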
Comparison of Boosting Techniques
[Chart: performance improvement achieved by each boosting technique]
Architecture (revisited)
• Two-step approach: after harnessing power, transfer it to boost other cores
[Figure: Core 1 and Core 2, each with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Power Transfer Runtime Manager
• Periodically coordinate chip-wide effort to relocate power among cores
  • Obtain current local hardware deconfiguration status (due to faults)
  • Determine additional components to be deconfigured
  • Transfer power to one or more mechanisms that make best use of it
Power Transfer Runtime Manager
• Sampling phase
  • Local decisions: sample deconfigurations, choose additional deconfiguration, sample performance boosting
  • Global decisions: compute global throughput with fairness, choose the best 4-core configuration, apply DVFS (greedy)
• Steady phase
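A minimal, self-contained sketch of one manager epoch under stated assumptions (the candidate configurations, the fairness metric, and all numbers are placeholders, not the paper's algorithm): each core samples its candidate boosting configurations locally, then a global step picks the per-core combination that maximizes a fairness-weighted throughput metric within the harvested power margin.

```python
# Structural sketch of one manager epoch (names and numbers are placeholders):
# local sampling produces per-core candidates, then a global step picks the
# combination that maximizes fairness-weighted throughput under the margin.
import itertools

# Per-core candidates from the sampling phase: (label, speedup, extra watts).
core_candidates = [
    [("none", 1.00, 0.0), ("spec-L2", 1.03, 0.5), ("dvfs+1", 1.05, 3.0)],
    [("none", 1.00, 0.0), ("clear", 1.06, 1.2)],
]
POWER_MARGIN_W = 2.0   # power harvested by deconfiguring overprovisioned units

def fairness_weighted_throughput(speedups):
    # Geometric mean rewards balanced speedups over lopsided ones.
    prod = 1.0
    for s in speedups:
        prod *= s
    return prod ** (1.0 / len(speedups))

best = max(
    (cfg for cfg in itertools.product(*core_candidates)
     if sum(c[2] for c in cfg) <= POWER_MARGIN_W),
    key=lambda cfg: fairness_weighted_throughput([c[1] for c in cfg]),
)
print("chosen per-core boosts:", [c[0] for c in best])
```

An exhaustive product over per-core candidates is feasible for 4 cores but grows exponentially with core count, which is why the Future Work slide points to heuristics such as simulated annealing for larger CMPs.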
Global vs Local Optimization
• 100 4-core configurations, random errors and random SPEC CPU2000 benchmarks
[Chart: average speedup of 22.2% with global optimization vs. 10.0% with local-only decisions]
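One intuition for the gap (a toy contrast with made-up numbers, not the paper's experiment): a purely local policy divides the power margin evenly and each core decides alone, so power can be stranded on a core that cannot exploit it, while a global policy lets the margin flow to whichever core benefits most.

```python
# Toy contrast (made-up numbers) between local and global power allocation.
# Local: each core gets an equal share of the margin and decides alone.
# Global: the margin can flow to whichever core benefits most.
margin = 2.0
cores = [
    {"name": "memory-bound", "boost_cost": 2.0, "speedup": 1.15},
    {"name": "low-ILP",      "boost_cost": 2.0, "speedup": 1.02},
]

def local_alloc():
    share = margin / len(cores)
    return [c["speedup"] if c["boost_cost"] <= share else 1.0 for c in cores]

def global_alloc():
    # Spend the whole margin on the core with the best return.
    best = max(cores, key=lambda c: c["speedup"])
    return [c["speedup"] if c is best else 1.0 for c in cores]

for name, alloc in (("local", local_alloc()), ("global", global_alloc())):
    print(name, "per-core speedups:", alloc)
```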
Diversity of Boosting Techniques
• 100 4-core configurations, random errors and random SPEC CPU2000 benchmarks
[Chart: average speedup of 22.2% with the full mix of boosting techniques vs. 6.3% with a single technique]
Power Transfer Runtime Manager
• 100 4-core configurations, random errors and random SPEC CPU2000 benchmarks
[Chart: average speedups of 6.3%, 10.0%, 15.3%, and 22.2% across the evaluated schemes, with the full runtime manager reaching 22.2%]
Conclusions
• We proposed a technique to increase performance under a given power budget in the presence of hard faults
• Exploited the deconfiguration capabilities already built into microprocessors
• Demonstrated that pipeline imbalances and additional deconfiguration are application-dependent
• Proposed several boosting techniques
• Demonstrated the potential for substantial performance gains for a 4-core CMP
Future Work
• Heuristic approaches to scale this problem to many cores
  • Simulated Annealing, Genetic Algorithm
• Pareto-optimal fronts to reduce the number of combinations
• Hierarchical optimization