Dagstuhl Seminar – Power-aware Computing Systems
Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)
Gaurav Singh, Sandeep K. Shukla, FERMAT Lab, Virginia Tech
Talk Outline
• CAOS
• Formalization
• Schedule
• Cost of a schedule (in terms of its power)
• Power problems (peak and dynamic)
• Low power strategies
FERMAT / Virginia Tech
Bank Account Example
[Figure: process 0 adds 1 to register x; process 1 subtracts 1 from x and adds 1 to register y; process 2 subtracts 1 from y]
• Process 0 increments register x
• Process 1 transfers a unit from register x to register y
• Process 2 decrements register y
• This is an abstraction of some real applications:
  • Bank account: 0 = deposit to checking, 1 = transfer from checking to savings, 2 = withdraw from savings
  • Packet processor: 0 = packet arrives, 1 = packet is processed, 2 = packet departs
  • …
Concurrency in the Example
[Figure: processes 0, 1 and 2 acting on registers x and y, enabled by cond0, cond1 and cond2; process priority: 2 > 1 > 0]
• Process j (j = 0, 1, 2) only updates under condition condj
• Only one process at a time can update a register
Note:
• Processes 0 and 2 can run concurrently if process 1 is not running
• Both of process 1's updates must happen "indivisibly" (otherwise the state becomes inconsistent)
• Suppose we want to prioritize process 2 over process 1 over process 0 (process priority: 2 > 1 > 0)
Find the Error
[Figure: processes 0, 1 and 2 acting on registers x and y, enabled by cond0, cond1 and cond2; process priority: 2 > 1 > 0]
Solution 1:
    always @(posedge CLK) begin
      if (!cond2 || cond1) x <= x - 1;
      else if (cond0)      x <= x + 1;
      if (cond2)           y <= y - 1;
      else if (cond1)      y <= y + 1;
    end
Solution 2:
    always @(posedge CLK) begin
      if (!cond2 && cond1) x <= x - 1;
      else if (cond0)      x <= x + 1;
      if (cond2)           y <= y - 1;
      else if (cond1)      y <= y + 1;
    end
Which of these solutions are correct, if any?
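One way to answer the question is to compare each always block against the intended behavior for all eight combinations of conditions. The Python sketch below assumes a reference model in which process 2 fires whenever cond2 holds, process 1 fires when cond1 holds and process 2 does not (they share y), and process 0 fires when cond0 holds and process 1 does not (they share x). The function names are hypothetical; the two candidates are direct transcriptions of the Verilog, with nonblocking assignments modeled by reading only the old x and y.

```python
from itertools import product

def reference(x, y, cond0, cond1, cond2):
    """Intended behavior: priority 2 > 1 > 0, at most one writer per register."""
    fire2 = cond2
    fire1 = cond1 and not fire2   # process 1 needs y, which process 2 wins
    fire0 = cond0 and not fire1   # process 0 needs x, which process 1 wins
    x += (1 if fire0 else 0) - (1 if fire1 else 0)
    y += (1 if fire1 else 0) - (1 if fire2 else 0)
    return x, y

def solution1(x, y, cond0, cond1, cond2):
    nx, ny = x, y                 # nonblocking: both branches read the old values
    if (not cond2) or cond1:
        nx = x - 1
    elif cond0:
        nx = x + 1
    if cond2:
        ny = y - 1
    elif cond1:
        ny = y + 1
    return nx, ny

def solution2(x, y, cond0, cond1, cond2):
    nx, ny = x, y
    if (not cond2) and cond1:
        nx = x - 1
    elif cond0:
        nx = x + 1
    if cond2:
        ny = y - 1
    elif cond1:
        ny = y + 1
    return nx, ny

def mismatches(candidate, x=5, y=5):
    """(cond0, cond1, cond2) triples where the candidate differs from the reference."""
    return [conds for conds in product([False, True], repeat=3)
            if candidate(x, y, *conds) != reference(x, y, *conds)]

print("solution 1:", mismatches(solution1))
print("solution 2:", mismatches(solution2))
```

Under this reference model, solution 2 agrees in all eight cases, while solution 1 disagrees in four, for example decrementing x when all three conditions are false.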
CAOS Design
[Figure: processes 0, 1 and 2 acting on registers x and y, enabled by cond0, cond1 and cond2; process priority: 2 > 1 > 0]
    // The three processes as guarded atomic actions (Bluespec SystemVerilog rules)
    (* descending_urgency = "proc2, proc1, proc0" *)
    rule proc0 (cond0);
      x <= x + 1;
    endrule
    rule proc1 (cond1);
      y <= y + 1;
      x <= x - 1;
    endrule
    rule proc2 (cond2);
      y <= y - 1;
    endrule
Concurrent Action Oriented Specifications – CAOS
• A hardware design is described in terms of atomic actions.
• Each action consists of a guard and a body. Example:
    action proc1 (x > 1) { y = y + 1; x = x - 1; }
• The Bluespec compiler is a high-level synthesis tool for such specifications.
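A guarded atomic action can be modeled in software as a guard predicate plus a body that computes all of its updates from the current state and commits them together. The Python sketch below illustrates that assumption using proc1 from the example; the `Action` class and its `fire` method are hypothetical names, not part of any Bluespec API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]

@dataclass
class Action:
    name: str
    guard: Callable[[State], bool]
    body: Callable[[State], State]   # returns the updates, computed from the old state

    def fire(self, state: State) -> bool:
        """Apply the action atomically if its guard holds; return whether it fired."""
        if not self.guard(state):
            return False
        updates = self.body(state)   # every read sees the pre-action state
        state.update(updates)        # all updates are committed together
        return True

# action proc1 (x > 1) { y = y + 1; x = x - 1; }
proc1 = Action(
    name="proc1",
    guard=lambda s: s["x"] > 1,
    body=lambda s: {"y": s["y"] + 1, "x": s["x"] - 1},
)

state = {"x": 3, "y": 0}
while proc1.fire(state):
    pass
print(state)  # {'x': 1, 'y': 2}
```

Because the body returns its updates instead of mutating the state directly, no observer can see x decremented without y incremented, which is the indivisibility the bank-account example requires.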
Possible Schedule
[Figure: processes 0, 1 and 2 acting on registers x and y (process priority: 2 > 1 > 0), implemented by the actions proc0, proc1 and proc2 of the CAOS design with descending_urgency = "proc2, proc1, proc0"]
• Possible schedule: one action in each clock cycle.
[Figure: timeline over clock cycles, driven by cond0, cond1 and cond2, executing proc0, proc1, proc2 and proc0 in successive cycles]
CAOS Schedule
[Figure: the possible schedule (proc0, proc1, proc2, proc0, one action per clock cycle) compared with the CAOS schedule for the same conditions, in which proc2 and proc0 execute in the same clock cycle]
• Actions proc0, proc1 and proc2 are as in the CAOS design, with descending_urgency = "proc2, proc1, proc0" (process priority: 2 > 1 > 0).
• proc0 and proc2 do not conflict, so they can execute in the same clock cycle.
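The CAOS schedule can be mimicked by a simple per-cycle scheduler: walk the enabled actions in descending urgency and fire an action only if it does not conflict with one already chosen. The sketch below assumes that two actions conflict exactly when they write a common register, which is a simplification of the conflict analysis a real compiler performs; `WRITES`, `conflicts` and `schedule_cycle` are hypothetical names.

```python
from typing import Dict, List, Set

# Registers written by each action in the bank-account example.
WRITES: Dict[str, Set[str]] = {
    "proc0": {"x"},
    "proc1": {"x", "y"},
    "proc2": {"y"},
}
URGENCY = ["proc2", "proc1", "proc0"]  # descending_urgency order

def conflicts(a: str, b: str) -> bool:
    # Simplifying assumption: a conflict means both actions write the same register.
    return bool(WRITES[a] & WRITES[b])

def schedule_cycle(enabled: Set[str]) -> List[str]:
    """Pick the actions that fire in one clock cycle."""
    chosen: List[str] = []
    for act in URGENCY:
        if act in enabled and all(not conflicts(act, c) for c in chosen):
            chosen.append(act)
    return chosen

print(schedule_cycle({"proc0", "proc1", "proc2"}))  # ['proc2', 'proc0']
print(schedule_cycle({"proc0", "proc1"}))           # ['proc1']
```

With all three guards true, proc1 loses to the more urgent proc2 on y, and proc0 then fires alongside proc2 because they touch different registers.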
CAOS Semantics
• Multiple non-conflicting actions can execute concurrently as long as the concurrent behavior corresponds to at least one sequential ordering. (This may lead to high dynamic and peak power consumption.)
• Example with an anti-dependency:
    action a1 (true): x = y + 1;
    action a2 (true): y = y + 1;
• Concurrent execution corresponds to a1 followed by a2, since a2 updates y while a1 reads y (anti-dependency).
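The anti-dependency example can be checked by simulation: concurrent execution reads every value from the state at the start of the clock cycle and then commits all writes, and the result is compared with both sequential orders. A minimal Python sketch; the helper names are hypothetical, and the concurrent model assumes the actions write disjoint state (i.e., do not conflict).

```python
from itertools import permutations

# Each action maps the state it reads to the updates it writes.
actions = {
    "a1": lambda s: {"x": s["y"] + 1},  # a1: x = y + 1
    "a2": lambda s: {"y": s["y"] + 1},  # a2: y = y + 1
}

def run_concurrent(state, names):
    """All actions read the old state; their updates are committed together."""
    updates = {}
    for n in names:
        updates.update(actions[n](state))
    return {**state, **updates}

def run_sequential(state, order):
    """Actions execute one after another, each seeing the previous updates."""
    s = dict(state)
    for n in order:
        s.update(actions[n](s))
    return s

start = {"x": 0, "y": 10}
concurrent = run_concurrent(start, ["a1", "a2"])
print("concurrent:", concurrent)  # concurrent: {'x': 11, 'y': 11}
for order in permutations(["a1", "a2"]):
    print(order, run_sequential(start, order) == concurrent)
# ('a1', 'a2') True
# ('a2', 'a1') False
```

Running a2 first would let a1 see the incremented y (x = 12), so only the order a1 then a2 matches the concurrent result.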
Sequential Ordering
[Figure: original schedule of actions a1, a2, a3, a4 and a5 over clock cycles, and the corresponding sequential schedule a3 – a2 – a1 – a5 – a4]
Synthesis from CAOS
Formalization
• Consider a design with:
  • ŝ = {s1, s2, …, sk}: the set of k state elements.
  • σ(ŝ): the state of the design at some point.
  • A = {a1, a2, …, an}: the set of n actions of the design.
  • wi: the weight (power consumed) of an action ai ∈ A.
  • Dependency di,j: an action ai is dependent on an action aj if any state accessed by ai is updated by aj.
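If the state elements each action reads and writes are known, the dependency relation di,j can be computed directly. The sketch below assumes "accessed" means read or written, and uses hypothetical read/write sets taken from the a1/a2 anti-dependency example.

```python
from typing import Dict, Set, Tuple

def dependencies(reads: Dict[str, Set[str]],
                 writes: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Pairs (ai, aj), ai != aj, such that some state accessed by ai is updated by aj."""
    deps = set()
    for ai in reads:
        accessed = reads[ai] | writes.get(ai, set())  # assumption: access = read or write
        for aj, updated in writes.items():
            if ai != aj and accessed & updated:
                deps.add((ai, aj))
    return deps

# a1: x = y + 1   a2: y = y + 1
reads = {"a1": {"y"}, "a2": {"y"}}
writes = {"a1": {"x"}, "a2": {"y"}}
print(dependencies(reads, writes))  # {('a1', 'a2')}
```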
Feasible Schedule
• Consider an original schedule β = {A1, A2, …, An, …}, where Ai ⊆ A is the set of actions executed in clock cycle i.
• If Ai = {ai1, ai2, …, aim}, then:
  • for all aij, aik ∈ Ai, aij and aik do not conflict; and
  • if the concurrent execution of the actions in Ai transforms the design from a state σ(ŝ) to σ'(ŝ), then there exists a corresponding permutation (sequential ordering) of Ai that also transforms the design from σ(ŝ) to σ'(ŝ).
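For actions described by read and write sets, one sufficient way to check both conditions is: (1) no two actions in Ai write the same state element (a simplified notion of conflict), and (2) the graph that orders every reader before any writer of the same state is acyclic. A topological order of that graph is then a sequential ordering with the same effect as the concurrent execution. The Python sketch below uses `graphlib` (Python 3.9+); the function name and the extra action a3 are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import combinations
from typing import Dict, List, Optional, Set

def sequential_ordering(cycle: List[str],
                        reads: Dict[str, Set[str]],
                        writes: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return a sequential ordering equivalent to executing `cycle` concurrently,
    or None if this simple read/write-set check cannot find one."""
    # Condition 1 (simplified conflict test): no two actions write the same state.
    for a, b in combinations(cycle, 2):
        if writes[a] & writes[b]:
            return None
    # Condition 2: if a reads what b writes, a must run before b to see the old value.
    predecessors: Dict[str, Set[str]] = {a: set() for a in cycle}
    for a in cycle:
        for b in cycle:
            if a != b and reads[a] & writes[b]:
                predecessors[b].add(a)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None

reads = {"a1": {"y"}, "a2": {"y"}, "a3": {"x"}}
writes = {"a1": {"x"}, "a2": {"y"}, "a3": {"y"}}
print(sequential_ordering(["a1", "a2"], reads, writes))  # ['a1', 'a2']
print(sequential_ordering(["a1", "a3"], reads, writes))  # None
```

In the second call, a1 reads y (written by a3) and a3 reads x (written by a1), so each would have to run before the other; no sequential ordering exists, and the pair should not share a clock cycle.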
Cost of a Schedule
• Costs for a schedule β = {A1, A2, …, An, …}:
  • Peak power: $P_{peak}(\beta) = \max_{i} \sum_{a_j \in A_i} w_j$
  • Dynamic power: $P_{switch}(\beta) = \sum_{i} P_{i,i+1}$, where $P_{i,i+1}$ is the switching power expended in moving from $A_i$ to $A_{i+1}$.
• Low power goal: given β, create a new schedule α such that $P_{peak}(\alpha) < P_{peak}(\beta)$ and/or $P_{switch}(\alpha) < P_{switch}(\beta)$.
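Both costs are straightforward to evaluate once per-action weights and a switching-cost model are fixed. In the sketch below, the weights, the two schedules and the `switching_power` model (the total weight of actions that start or stop between consecutive cycles) are illustrative assumptions, not the cost model of the original work.

```python
from typing import Callable, Dict, List, Set

Schedule = List[Set[str]]

def peak_power(schedule: Schedule, w: Dict[str, float]) -> float:
    """P_peak: the largest total weight executed in any single clock cycle."""
    return max(sum(w[a] for a in cycle) for cycle in schedule)

def switching_cost(schedule: Schedule,
                   p: Callable[[Set[str], Set[str]], float]) -> float:
    """P_switch: sum of P_{i,i+1} over consecutive clock cycles."""
    return sum(p(schedule[i], schedule[i + 1]) for i in range(len(schedule) - 1))

w = {"a1": 4.0, "a2": 3.0, "a3": 2.0, "a4": 5.0, "a5": 1.0}

def switching_power(prev: Set[str], nxt: Set[str]) -> float:
    # Illustrative model: every action that starts or stops costs its weight.
    return sum(w[a] for a in prev ^ nxt)

beta = [{"a1", "a2", "a3"}, {"a4", "a5"}]
alpha = [{"a3", "a2"}, {"a1", "a5"}, {"a4"}]
print(peak_power(beta, w), peak_power(alpha, w))  # 9.0 5.0
print(switching_cost(beta, switching_power),
      switching_cost(alpha, switching_power))     # 15.0 20.0
```

In this toy model, α lowers the peak from 9 to 5 but raises switching from 15 to 20, which is why the goal is stated with "and/or": the two costs can pull in opposite directions.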
Peak Power Problem
• G: the maximal set of actions enabled in a clock cycle c.
• Ppeak: the maximum allowable peak power.
• fi = 1 if the action ai is executed in clock cycle c, otherwise fi = 0.
• Peak power minimization problem: for each clock cycle, choose which enabled actions to execute (the fi) under the constraint
  $\sum_{a_i \in G} f_i \, w_i \le P_{peak}$
Low Power Strategies
• Re-scheduling: targets power minimization in a design by re-scheduling the execution of its actions.
  • Uses the sequential ordering for re-scheduling.
• Factorizing and re-scheduling: targets power minimization by factorizing one or more actions of the design into lower-granularity parts and re-scheduling these parts for power savings.
Peak Power Reduction – 1
• Use re-scheduling: actions can be re-scheduled based on the sequential ordering to meet the peak power goal.
[Figure: original schedule vs. low power schedule of actions a1 to a5 over clock cycles]
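A simple way to re-schedule along a sequential ordering (Version 1 of the problem) is to walk the ordering and start a new clock cycle whenever adding the next action would exceed the peak power budget. Each cycle then holds a contiguous run of the ordering. This sketch assumes the actions grouped into one cycle can execute concurrently with the same effect as their order in the sequence, which it does not check. The weights, budget and function name are hypothetical.

```python
from typing import Dict, List

def reschedule(ordering: List[str], w: Dict[str, float],
               p_peak: float) -> List[List[str]]:
    """Greedily pack a sequential ordering into clock cycles under a peak power budget."""
    cycles: List[List[str]] = []
    current: List[str] = []
    load = 0.0
    for a in ordering:
        if w[a] > p_peak:
            # A single action above the budget needs factorization, not re-scheduling.
            raise ValueError(f"{a} alone exceeds the peak power budget")
        if current and load + w[a] > p_peak:
            cycles.append(current)  # close this cycle and start a new one
            current, load = [], 0.0
        current.append(a)
        load += w[a]
    if current:
        cycles.append(current)
    return cycles

w = {"a1": 4.0, "a2": 3.0, "a3": 2.0, "a4": 5.0, "a5": 1.0}
print(reschedule(["a3", "a2", "a1", "a5", "a4"], w, p_peak=6.0))
# [['a3', 'a2'], ['a1', 'a5'], ['a4']]
```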
Functional Equivalence
[Figure: original schedule and low power schedule of actions a1 to a5 over clock cycles, both corresponding to the sequential schedule a3 – a2 – a1 – a4 – a5]
Peak Power Problem – Versions
• Version 1: actions have to be chosen based on the sequential ordering.
• Version 2: any action can be chosen, with each action having the same profit.
  • Order actions based on their weights (power consumed).
• Version 3: any action can be chosen, with each action having a different profit.
  • Corresponds to the 0/1 knapsack problem, which is NP-complete.
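For a single clock cycle, Versions 2 and 3 can be sketched as follows, assuming integer weights so that Version 3 can use the textbook knapsack dynamic program (pseudo-polynomial in Ppeak). With equal profits, taking the lightest enabled actions first maximizes how many actions fit within the budget; with different profits, the dynamic program maximizes total profit. The action names, weights and profits are hypothetical.

```python
from typing import Dict, List

def version2(enabled: List[str], w: Dict[str, int], p_peak: int) -> List[str]:
    """Equal profits: take the lightest actions first (maximizes the number executed)."""
    chosen, load = [], 0
    for a in sorted(enabled, key=lambda act: w[act]):
        if load + w[a] <= p_peak:
            chosen.append(a)
            load += w[a]
    return chosen

def version3(enabled: List[str], w: Dict[str, int], profit: Dict[str, int],
             p_peak: int) -> List[str]:
    """Different profits: 0/1 knapsack by dynamic programming over the power budget."""
    n = len(enabled)
    # best[i][c] = max profit using the first i actions within budget c
    best = [[0] * (p_peak + 1) for _ in range(n + 1)]
    for i, a in enumerate(enabled, start=1):
        for c in range(p_peak + 1):
            best[i][c] = best[i - 1][c]
            if w[a] <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w[a]] + profit[a])
    # Walk back through the table to recover the chosen actions.
    chosen, c = [], p_peak
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            a = enabled[i - 1]
            chosen.append(a)
            c -= w[a]
    return list(reversed(chosen))

w = {"a1": 4, "a2": 3, "a3": 2, "a4": 5}
profit = {"a1": 6, "a2": 4, "a3": 1, "a4": 8}
enabled = ["a1", "a2", "a3", "a4"]
print(version2(enabled, w, p_peak=7))          # ['a3', 'a2']
print(version3(enabled, w, profit, p_peak=7))  # ['a1', 'a2']
```

Under a budget of 7, no three actions fit, so Version 2 returns two actions; Version 3 prefers a1 and a2 (profit 10) over a4 and a3 (profit 9).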
Factorizing an Action
• Factorization: a larger action a can be factorized into parts a1 and a2 that execute in consecutive clock cycles to meet the peak power constraint.
• Constraints:
  • Atomicity must be maintained: if a1 accesses state updated by a2, then a1 must execute before a2.
  • Dependencies with other actions must be maintained.
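The atomicity constraint can be expressed as a precedence graph over the parts of a factorized action: a part that accesses state updated by another part must run first, so that it still sees the pre-action value. The Python sketch below (Python 3.9+ for `graphlib`) derives an execution order for the parts, or reports that none exists. The part names and read/write sets are hypothetical, and the second constraint (dependencies with other actions) is not checked.

```python
from graphlib import TopologicalSorter, CycleError
from typing import Dict, List, Optional, Set

def part_order(reads: Dict[str, Set[str]],
               writes: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Order the parts of one factorized action so that each part reads state
    before any other part of the same action updates it; None if impossible."""
    parts = set(reads) | set(writes)
    preds: Dict[str, Set[str]] = {p: set() for p in parts}
    for p in parts:
        for q in parts:
            if p != q and reads.get(p, set()) & writes.get(q, set()):
                preds[q].add(p)  # p must execute before q
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None

# a: { y <= x + 1; x <= 0; } split into a-1 (y <= x + 1) and a-2 (x <= 0)
print(part_order({"a-1": {"x"}, "a-2": set()},
                 {"a-1": {"y"}, "a-2": {"x"}}))  # ['a-1', 'a-2']

# swap: { x <= y; y <= x; } cannot be split this way: each part needs the other's old value
print(part_order({"a-1": {"y"}, "a-2": {"x"}},
                 {"a-1": {"x"}, "a-2": {"y"}}))  # None
```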
Peak Power Reduction – 2
• Use factorization: the factorized parts can be re-scheduled in consecutive clock cycles based on the dependency constraints.
[Figure: original schedule vs. low power schedule in which action a1 is factorized into parts a1-1 and a1-2 that execute in consecutive clock cycles]
Low Power Synthesis from CAOS
Main issue: how can actions be re-scheduled efficiently in real hardware?
Dynamic Power Problem
• Dynamic power minimization problem:
  • Select the most power-efficient ordering of execution of actions.
• NP-complete (Traveling Salesman Problem): given a weighted directed graph G = (V, E), find a path with the least weight that includes every vertex of V exactly once.
  • This is a sub-problem of the dynamic power problem.
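Because the underlying problem is a minimum-weight Hamiltonian path, exact search is only practical for a handful of actions. The sketch below enumerates all orderings with `itertools.permutations` and keeps the cheapest. The pairwise switching costs are made-up numbers, and a real flow would also have to respect the dependency constraints, which this sketch ignores.

```python
from itertools import permutations
from typing import Dict, List, Tuple

def best_ordering(actions: List[str],
                  cost: Dict[Tuple[str, str], float]) -> Tuple[List[str], float]:
    """Exhaustively find the ordering with the least total switching cost
    (minimum-weight Hamiltonian path); O(n!) time, so only for small n."""
    best, best_cost = None, float("inf")
    for order in permutations(actions):
        total = sum(cost[(a, b)] for a, b in zip(order, order[1:]))
        if total < best_cost:
            best, best_cost = list(order), total
    return best, best_cost

# Hypothetical switching cost of executing b right after a (not symmetric).
cost = {
    ("a1", "a2"): 1.0, ("a2", "a1"): 4.0,
    ("a1", "a3"): 5.0, ("a3", "a1"): 2.0,
    ("a2", "a3"): 3.0, ("a3", "a2"): 6.0,
}
print(best_ordering(["a1", "a2", "a3"], cost))  # (['a3', 'a1', 'a2'], 3.0)
```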
Low Dynamic Power – Re-scheduling
• Re-scheduling of actions:
  • Actions are re-scheduled so that switching at the inputs of the functional units is minimized.
  • Resource sharing: conflicts are created so that the same functional units can be reused to avoid switching.
Low Dynamic Power – Operand Isolation
Isolating a single action:
    rule foo (… cond … (x < y) …);
      x <= x + z;
      …
    endrule
• Computations stay quiescent except when the action executes, i.e., when its guard is True.
[Figure: current-state registers x, y and z feed the cond logic and, through isolation controlled by Φ2, the body logic of action foo; the enable signals select whether the next-state values x', y' and z' are written back]
Operand Isolating Multiple Actions
• Isolating multiple actions of a design.
[Figure: the guards Cond1 … CondN of Rule1 … RuleN feed a scheduler that drives per-action isolation signals (Φ) for the bodies Action1 … ActionN; a data-select mux and an enable update the state register (D/Q)]
Low Dynamic Power – Register-level Clock Gating
• Register-level clock gating:
  • Registers having a common ENABLE signal can be provided the same gated clock.
  • This prevents unnecessary switching in the registers.
• In CAOS, registers updated in the body of an action are gated using the guard of the action.
• The algorithms implemented in the Bluespec compiler saved:
  • Operand isolation: up to 20% dynamic power.
  • Clock gating: up to 26% dynamic power.
Thank You! Questions?
Low Dynamic Power – Operand Isolation
• Operand isolation:
  • To save power, the computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle.
• Involves:
  • Insertion of gates at the appropriate points without affecting guards.
  • Selection of the activation signal.
  • Guards of actions are used as gating signals.
• The algorithm implemented in the Bluespec compiler saved up to 20% dynamic power [3].
Implementation (Ongoing)
• Control circuitry is needed to decide which actions execute in each clock cycle; it will consume extra power if implemented in hardware.
• How can this extra power consumption be avoided?
  • Create extra conflicts among the actions.
  • Analysis is required to decide which conflicts to add.
Side Effects – Latency
• Re-scheduling for power minimization may degrade latency.
• µ: the maximum latency degradation factor for re-scheduling.
• The corresponding average peak power constraint can be estimated as:
Operand Isolation – 1: Using a Register/Latch for Frequently Enabled Actions
• Maximum quiescence: good if actions alternate on and off, e.g., an arbiter.
• Isolation element: a phase-2 edge-triggered register or a phase-2 transmitting latch.
[Figure: current state passes through the Φ2 register/latch into the body logic of action foo, producing the next-state values; the cond logic produces the enable signals (EN)]
Operand Isolation – 2: Using an AND Gate for Infrequently Enabled Actions
• Optimal area: well suited if actions stay disabled for multiple cycles, e.g., FSM rules or opcodes in a controller/processor.
[Figure: current state passes through an AND gate into the body logic of action foo, producing the next-state values; the cond logic produces the enable signals (EN)]
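The behavioral difference between the two isolation styles can be illustrated by counting bit toggles at the inputs of an action's body logic. The sketch below assumes a 4-bit operand that changes every cycle and a given enable pattern: "hold" models the register/latch style (the body sees the last value captured while enabled) and "and" models AND gating (the body sees 0 while disabled). These are rough behavioral approximations, not gate-level power estimates, and they do not model the area advantage of the AND gate.

```python
from typing import List

def toggles(values: List[int]) -> int:
    """Total bit flips between consecutive values (sum of Hamming distances)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))

def body_inputs(operands: List[int], enable: List[bool], style: str) -> List[int]:
    """Values seen by the body logic under a given isolation style."""
    seen, held = [], 0
    for v, en in zip(operands, enable):
        if style == "none":
            seen.append(v)               # no isolation: every change propagates
        elif style == "hold":
            held = v if en else held     # register/latch keeps the last enabled value
            seen.append(held)
        elif style == "and":
            seen.append(v if en else 0)  # AND gate forces the inputs to zero
        else:
            raise ValueError(style)
    return seen

operands = [0b0101, 0b1010] * 3          # operand changes every cycle
patterns = {
    "alternating": [True, False] * 3,    # e.g. an arbiter
    "rare": [True] + [False] * 5,        # e.g. an FSM rule
}
for name, enable in patterns.items():
    counts = {s: toggles(body_inputs(operands, enable, s))
              for s in ("none", "hold", "and")}
    print(name, counts)
# alternating {'none': 20, 'hold': 0, 'and': 10}
# rare {'none': 20, 'hold': 0, 'and': 2}
```

With an alternating enable, AND gating toggles the inputs on every cycle boundary while the latch holds them steady. When the action is rarely enabled, both styles leave the inputs almost quiescent, so the smaller AND gate is the better trade-off.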
Automatic Clock Gating of Registers
• Registers having common ENABLE signals (i.e., updated by the same set of actions) can be supplied the same gated clock.
[Figure: register with data input DIN, enable EN, clock CLK and output QOUT]
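Grouping registers by the set of actions that update them identifies which registers can share one gated clock; a group's clock is enabled in a cycle when any of those actions fires. The sketch below uses the bank-account write sets plus a hypothetical action proc3 that writes two hypothetical registers a and b, so that one group contains more than one register; the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

def clock_gating_groups(writes: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Map each set of updating actions to the registers it updates.
    Registers in the same group can share one gated clock."""
    updaters: Dict[str, Set[str]] = defaultdict(set)
    for action, regs in writes.items():
        for r in regs:
            updaters[r].add(action)
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for reg, acts in updaters.items():
        groups[frozenset(acts)].append(reg)
    return {k: sorted(v) for k, v in groups.items()}

def gated_enable(group: FrozenSet[str], fired: Set[str]) -> bool:
    """The group's clock runs only if some action that updates it fires this cycle."""
    return bool(group & fired)

# proc3 and registers a, b are hypothetical additions to the example.
writes = {"proc0": {"x"}, "proc1": {"x", "y"}, "proc2": {"y"}, "proc3": {"a", "b"}}
groups = clock_gating_groups(writes)
for acts, regs in sorted(groups.items(), key=lambda kv: kv[1]):
    print(sorted(acts), regs)
# ['proc3'] ['a', 'b']
# ['proc0', 'proc1'] ['x']
# ['proc1', 'proc2'] ['y']
```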
Automatic Clock Gating of Registers (continued)
• In CAOS, the guards of the actions provide the control for gating the clocks of the registers.
[Figure: EN and CLK are combined into GATED_CLK, which clocks the register (DIN to QOUT)]
Publications
• G. Singh and S. K. Shukla, "Algorithms for Low Power Hardware Synthesis from CAOS – Concurrent Action Oriented Specifications", Special Issue of the International Journal of Embedded Systems (IJES'06).
• G. Singh and S. K. Shukla, "Low-Power Hardware Synthesis from TRS-based Specifications", MEMOCODE'06.
• G. Singh, J. Schwartz, S. Ahuja and S. K. Shukla, "Techniques for Power-aware Synthesis from Concurrent Action Oriented Specifications", submitted to DAC'07.