An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction. Yan Lin¹, Yu Hu¹, Lei He¹ and Vijay Raghunathan² (¹EE Department, UCLA; ²Purdue University). Partially supported by NSF. Address comments to lhe@ee.ucla.edu.
Outline • Background, Motivation and Problem Formulation • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Network Flow Based Vdd Level Assignment Formulation • Experimental Results • Conclusions
Background • Existing FPGAs are power inefficient compared to ASICs. • Interconnect is the dominant component of FPGA power dissipation (dynamic and leakage) [Li, TCAD’05]. • Power aware FPGA architectures and CAD algorithms have been studied extensively. • CAD algorithms to minimize power-delay product [Lamoureux, ICCAD’03] • Configuration inversion for leakage reduction [Anderson, FPGA’04] • Vdd-programmable FPGA logic blocks [Li, FPGA’04] [Li, DAC’04] • Vdd-programmable FPGA interconnects [Li, ICCAD’04] [Gayasen, FPL’04] [Anderson, ICCAD’04] [Lin, DAC’05]
Vdd Programmable Interconnect Arch. • Island style and mixed wire segment length. • Routing switch/connection block(Two PMOS power transistors M3 and M4 are inserted between the tri-state buffer and VddH, VddL power rails, respectively.) [Li, ICCAD’04] • Level converter free in routing tree(Guarantee that no VddL switch drives VddH switches.) with LEAST area and power penalty[Lin, TCAD’06].
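The level-converter-free condition above (no VddL switch drives a VddH switch) can be checked on a routing tree with a few lines of Python. This is a minimal sketch: the `parent` map, the switch names and the "H"/"L" labels are hypothetical representations chosen for illustration, not data structures from the slides.

```python
def is_level_converter_free(parent, vdd):
    """Return True if no VddL switch drives a VddH switch.

    parent: maps each switch to the switch that drives it (None for the root).
    vdd:    maps each switch to "H" (VddH) or "L" (VddL).
    Both maps are hypothetical representations of a routing tree.
    """
    for child, driver in parent.items():
        if driver is not None and vdd[driver] == "L" and vdd[child] == "H":
            return False
    return True


# Hypothetical routing tree: b1 drives b2 and b3, and b3 drives b4.
parent = {"b1": None, "b2": "b1", "b3": "b1", "b4": "b3"}

legal = {"b1": "H", "b2": "L", "b3": "H", "b4": "L"}
illegal = {"b1": "H", "b2": "L", "b3": "L", "b4": "H"}  # b3 (VddL) drives b4 (VddH)

print(is_level_converter_free(parent, legal))
print(is_level_converter_free(parent, illegal))
```

Under this condition, VddL switches always form complete subtrees at the sink side of a routing tree, so no level converter is needed inside the tree.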
Limitation of Existing Approaches • Uniform wire segment length was assumed, and the approach cannot be extended to mixed wire segments directly. • The LP based formulation is time consuming and computationally unstable. • Time consuming: runtime goes up quickly for large circuits. • Computationally unstable: some small circuits take a long runtime.
Problem Formulations [ Dual-Vdd Level Assignment Problem ] Given: placement and routing results of an FPGA design Find: a Vdd-level assignment for each interconnect switch Objective: minimize interconnect (dynamic and leakage) power Constraints: • Meet the delay target Tspec • Vdd-level converters are inserted ONLY at CLB inputs/outputs
Outline • Background, Motivation and Problem Formulation • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Interconnect Power Reduction Estimation • LP Based Vdd-level Assignment Algorithm • Network Flow Based Vdd Level Assignment Formulation • Experimental Results • Conclusions
Delay and Power Model for Interconnect • Delay Model • Intrinsic delay and effective driving resistance of each switch have been pre-characterized using SPICE. • Elmore delay is used to calculate routing delay. • Interconnect Power Model • Dynamic power: $P_d(V_{dd}) = 0.5 \cdot f_{clk} \cdot C \cdot V_{dd}^2$ • Leakage power $P_l(V_{dd})$ is pre-characterized using SPICE. • Interconnect power reduction estimation is the essential part of the dual-Vdd assignment algorithm.
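As a quick illustration of the dynamic power expression above, the sketch below evaluates it at the two supply levels used later in the experiments (1.3 V and 0.8 V). The clock frequency and capacitance are hypothetical placeholder values; because they cancel in the ratio, only the quadratic dependence on Vdd matters. Leakage is not modeled here since the slides take it from SPICE pre-characterization.

```python
def dynamic_power(f_clk, cap, vdd):
    """Dynamic power of one switch: 0.5 * f_clk * C * Vdd^2."""
    return 0.5 * f_clk * cap * vdd ** 2


F_CLK = 100e6   # Hz, hypothetical clock frequency
CAP = 50e-15    # F, hypothetical switched capacitance
VDD_H, VDD_L = 1.3, 0.8  # supply levels from the experimental setting

p_high = dynamic_power(F_CLK, CAP, VDD_H)
p_low = dynamic_power(F_CLK, CAP, VDD_L)
ratio = p_low / p_high  # equals (0.8 / 1.3)^2, independent of F_CLK and CAP

print(f"VddL/VddH dynamic power ratio: {ratio:.3f}")
```

With these supply levels, a switch moved to VddL dissipates roughly 38% of its VddH dynamic power under this model.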
Review of Vdd Level Assignment Algorithm [Lin, DAC’05] • Interconnect power reduction estimation. Steps shown: timing slack assigned at sinks, VddL possibility for switches, power reduction estimation, Vdd assignment based on estimation. • Leverage all extra slack with VddL switches [Lin, DAC’05]. • The net-level bottom-up Vdd assignment guarantees the legalization of final solutions [Lin, DAC’05]. • Remaining problem: how to calculate the VddL possibility for mixed wire segments? [Figure: routing tree with switches b1 to b4 and sink slacks S1 = 1, S2 = 3.]
VddL Possibility Calculation • Represent timing slack in number of switches: $s_i = L_i \cdot (S_i / D_i)$ • $s_i$ is the number of switches that can be turned to VddL along the path from the source to the i-th sink in the routing tree for the given timing slack $S_i$. • $L_i$ is the number of switches along this path. • VddL possibility for switch j on the path to sink i, based on load capacitance: $f(i,j) = s_i \cdot (c_{ij} / C_i)$ • Key idea: distribute timing slack to each switch based on capacitance. [Figure: routing tree with switches b1 (8x), b2 (8x), b3 (16x), b4 (16x) and sink slacks S1 = 6, S2 = 10.] Example for sink 2: L2 = 3, D2 = 12, s2 = 3 * (10/12) = 5/2; f(2,2) = 1, f(2,3) = 1, f(2,4) = 1/2.
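The two formulas on this slide can be worked through with exact fractions. The sketch below uses the slide's numbers for sink 2 (L2 = 3, S2 = 10, D2 = 12). The load values are hypothetical: the slide does not list them, so they are picked in a 2:2:1 ratio purely so that the result matches the f values shown, and C_i is assumed to be the sum of the loads along the path.

```python
from fractions import Fraction


def vddl_switch_budget(num_switches, slack, delay):
    """s_i = L_i * (S_i / D_i): timing slack expressed as a number of switches."""
    return Fraction(num_switches) * Fraction(slack, delay)


def vddl_possibility(budget, loads):
    """f(i,j) = s_i * (c_ij / C_i), with C_i assumed to be the sum of the loads."""
    total = sum(loads.values())
    return {sw: budget * Fraction(c, total) for sw, c in loads.items()}


# Slide values for sink 2.
s2 = vddl_switch_budget(num_switches=3, slack=10, delay=12)

# Hypothetical load capacitances (arbitrary units) for the switches on the path.
loads = {"b2": 2, "b3": 2, "b4": 1}
f2 = vddl_possibility(s2, loads)

print(s2)  # Fraction(5, 2)
for switch, value in f2.items():
    print(switch, value)
```

Because the shares c_ij / C_i sum to one, the possibilities along a path always add up to the budget s_i.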
Power Reduction Estimation for Mixed Wire Segments • The lower bound estimation [Y. Lin, DAC'05] for interconnect power reduction is no longer valid for mixed wire segments. • Our solution: develop an upper bound estimation of the number of VddL switches. • Consistent upper bound of power reduction • Removes the non-linear term "min" and the corresponding extra LP constraints from the lower bound estimation • Example (from the figure): a path with slack S = 2.7 has switch b1 (16x), which needs 1.8 slack, and switch b2 (8x), which needs 1.0 slack. Summing the VddL possibilities fn(i,1) = 0.9 and fn(i,2) = 0.5 gives a lower bound of 0.9 + 0.5 = 1.4 VddL switches. Assigning VddL to b2 consumes 1.0 slack and leaves 1.7, but b1 needs 1.8, so only 1.0 VddL switch can be assigned. The problem: lower bound > actual number!
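The counterexample on this slide can be reproduced by brute force. The sketch below assumes that a switch can move to VddL only if its full slack need is covered and that needs along the path simply add up; both are simplifying assumptions for illustration. Slack values are stored in tenths of a unit to avoid floating-point rounding.

```python
from itertools import combinations


def max_vddl_switches(slack, needs):
    """Largest number of switches whose summed slack needs fit in the slack."""
    best = 0
    for k in range(1, len(needs) + 1):
        for subset in combinations(needs, k):
            if sum(subset) <= slack:
                best = max(best, k)
    return best


# Slide values, in tenths of a slack unit: S = 2.7, b1 needs 1.8, b2 needs 1.0.
slack = 27
needs = [18, 10]

# "Lower bound" from the slide: fn(i,1) + fn(i,2) = 0.9 + 0.5, in tenths.
estimate_tenths = 9 + 5

actual = max_vddl_switches(slack, needs)
print(actual)                         # switches that really fit
print(estimate_tenths / 10 > actual)  # the estimate overshoots
```

Only one of the two switches fits in the available slack, so the summed estimate of 1.4 is not a valid lower bound.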
LP Formulation for Dual-Vdd Level Assignment • Objective function: dynamic power reduction upper bound plus leakage power reduction upper bound • Basic timing constraints: arrival time for primary inputs, arrival time for primary outputs, arrival time constraints • Slack constraints: slack upper bound, slack non-negative
Outline • Motivation • Problem Formulations • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Network Flow Based Vdd Level Assignment Formulation • Overview of network flow based timing slack budgeting • Primal-dual reformulation • Experimental Results • Conclusions
Network Flow Based Timing Slack Budgeting • Motivated by [Ghiasi, ICCAD’04] for logic level optimization • Step 1: Reorganize objective function: • Step 2: Eliminate timing slack variables (by substitution):
Network Flow Based Timing Slack Budgeting (cont.) • Step 3: Reorganize objective function by timing nodes: • Step 4: Generate dual-problem: [Equation annotations: constant terms are removed; constant coefficients; sums taken edge by edge and node by node.]
Link Induced Network from Timing Graph • [Figure: network induced from the timing graph. Solid segments carry flow in forward arcs, dotted segments carry flow in backward arcs, and each node i is labeled with its demand.] • No negative weight cycle exists in the induced network, so a min-cost flow can always be found. • A shortest path based algorithm is used to produce the solution for the primal problem.
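To make the idea of a shortest path based min-cost flow solver concrete, here is a minimal sketch of the generic successive-shortest-path method with Bellman-Ford. It is a textbook illustration, not the authors' implementation. The example graph, its capacities and its costs are hypothetical and unrelated to the induced network in the figure.

```python
INF = float("inf")


class MinCostFlow:
    """Successive shortest paths; assumes no negative-cost cycle in the input."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]          # edge indices per node
        self.to, self.cap, self.cost = [], [], []  # parallel edge arrays

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index, its residual twin the next odd one.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def _shortest_path(self, s):
        # Bellman-Ford over residual edges (handles negative residual costs).
        dist, prev = [INF] * self.n, [-1] * self.n
        dist[s] = 0
        for _ in range(self.n - 1):
            changed = False
            for u in range(self.n):
                if dist[u] == INF:
                    continue
                for e in self.adj[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v], prev[v], changed = dist[u] + self.cost[e], e, True
            if not changed:
                break
        return dist, prev

    def solve(self, s, t, amount):
        """Send up to `amount` units from s to t; return (flow, total cost)."""
        flow = total = 0
        while flow < amount:
            dist, prev = self._shortest_path(s)
            if dist[t] == INF:
                break  # no augmenting path left
            push, v = amount - flow, t
            while v != s:  # bottleneck capacity along the path
                push = min(push, self.cap[prev[v]])
                v = self.to[prev[v] ^ 1]
            v = t
            while v != s:  # update residual capacities
                e = prev[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total += push * dist[t]
        return flow, total


# Hypothetical 4-node network: (from, to, capacity, cost).
mcf = MinCostFlow(4)
for u, v, cap, cost in [(0, 1, 2, 1), (0, 2, 1, 2), (1, 2, 1, 1), (1, 3, 1, 3), (2, 3, 2, 1)]:
    mcf.add_edge(u, v, cap, cost)
result = mcf.solve(0, 3, 2)
print(result)  # (flow sent, total cost)
```

Bellman-Ford is used rather than Dijkstra because residual edges carry negative costs. It terminates correctly as long as the network has no negative weight cycle, which is the property stated on the slide.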
Outline • Motivation • Problem Formulations • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Network Flow Based Formulation • Experimental Results • Conclusions
Experimental Setting • Cluster-based island style FPGA structure • Size-10 cluster and size-4 LUT • 100% buffered interconnects, subset switch block • 60% length-4 and 40% length-8 wire segments • 25x buffer for length-4 and 10x buffer for length-8 • ITRS 100nm technology, 1.3 V for VddH and 0.8 V for VddL • Use VPR [Betz-Rose-Marquardt] for placement and routing • Use fpgaEva-LP2 [Lin et al, FPGA’05] for power calculation • Considers short-circuit power, glitch power and input vector • 8% average error compared to SPICE simulation • The 20 biggest sequential MCNC benchmarks are tested • Use LPsolver to solve the LP
Dual-Vdd Assignment for FPGAs with Mixed Wire Segments • Both the LP-based and the netflow-based algorithms achieve 85% VddL assignment on average.
Interconnect Power Reduction 52% total interconnect power reduction is achieved!
Runtime Comparison • The netflow based algorithm gets consistent speedup and stable runtime. • More significant speedup is expected for larger circuits.
Outline • Motivation • Problem Formulations • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments] • Network Flow Based Formulation • Experimental Results • Conclusions
Conclusions • A min-cost network flow based timing budgeting formulation speeds up the budgeting procedure and the overall design flow by up to 6000x and 20x, respectively, compared to the LP based one. • Both chip-level dual-Vdd assignment algorithms handle mixed length wire segments. Experimental results show an interconnect power reduction of 53% on average compared to single-Vdd FPGA designs.
Thank you! Q/A