
An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction

Yan Lin(1), Yu Hu(1), Lei He(1) and Vijay Raghunathan(2); (1) EE Department, UCLA; (2) Purdue University. Partially supported by NSF. Address comments to lhe@ee.ucla.edu.





Presentation Transcript


  1. An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction. Yan Lin(1), Yu Hu(1), Lei He(1) and Vijay Raghunathan(2); (1) EE Department, UCLA; (2) Purdue University. Partially supported by NSF. Address comments to lhe@ee.ucla.edu

  2. Outline • Background, Motivation and Problem Formulation • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Network Flow Based Vdd Level Assignment Formulation • Experimental Results • Conclusions

  3. Background • Existing FPGAs are power-inefficient compared to ASICs. • Interconnect is the dominant component of FPGA power dissipation (dynamic and leakage) [Li, TCAD‘05]. • Power-aware FPGA architectures and CAD algorithms have been studied extensively: • CAD algorithms to minimize the power-delay product [Lamoureux, ICCAD’03] • Configuration inversion for leakage reduction [Anderson, FPGA’04] • Vdd-programmable FPGA logic blocks [Li, FPGA’04] [Li, DAC’04] • Vdd-programmable FPGA interconnects [Li, ICCAD’04] [Gayasen, FPL’04] [Anderson, ICCAD’04] [Lin, DAC’05]

  4. Vdd-Programmable Interconnect Architecture • Island style with mixed wire segment lengths. • Routing switch/connection block: two PMOS power transistors, M3 and M4, are inserted between the tri-state buffer and the VddH and VddL power rails, respectively [Li, ICCAD’04]. • Level-converter-free routing trees (no VddL switch drives a VddH switch) with the LEAST area and power penalty [Lin, TCAD’06].
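The level-converter-free rule can be stated as a simple check on a routing tree: whenever a switch is VddL, every switch it drives must also be VddL. The sketch below is illustrative only; the `drives`/`vdd` dictionary format and the switch names are hypothetical, not the authors' data structures.

```python
def is_level_converter_free(drives, vdd):
    """Check the routing-tree rule that no VddL switch drives a VddH switch.

    drives: dict mapping each switch to the list of switches it drives (hypothetical format)
    vdd:    dict mapping each switch to "H" (VddH) or "L" (VddL)
    """
    for driver, loads in drives.items():
        if vdd[driver] == "L" and any(vdd[load] == "H" for load in loads):
            return False  # a VddL output driving a VddH input would need a level converter
    return True


if __name__ == "__main__":
    drives = {"b1": ["b2", "b3"], "b2": [], "b3": ["b4"], "b4": []}
    print(is_level_converter_free(drives, {"b1": "H", "b2": "L", "b3": "H", "b4": "L"}))  # True
    print(is_level_converter_free(drives, {"b1": "L", "b2": "L", "b3": "H", "b4": "L"}))  # False
```

A consequence of this rule is that VddL switches always form downstream-closed subtrees, which is why level converters are needed only at CLB inputs/outputs.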

  5. Limitation of Existing Approaches • A uniform wire segment length was assumed, and the approaches cannot be extended to mixed wire segments directly. • The LP-based formulation is time-consuming and computationally unstable. • Time-consuming: runtime grows quickly for large circuits. • Computationally unstable: even some small circuits need a long runtime.

  6. Problem Formulation [Dual-Vdd Level Assignment Problem] • Given: placement and routing results of an FPGA design • Find: a Vdd-level assignment for each interconnect switch • Objective: minimize interconnect (dynamic and leakage) power • Constraints: • Meet the delay target Tspec • Vdd-level converters are inserted ONLY at CLB inputs/outputs

  7. Outline • Background, Motivation and Problem Formulation • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Interconnect Power Reduction Estimation • LP Based Vdd-level Assignment Algorithm • Network Flow Based Vdd Level Assignment Formulation • Experimental Results • Conclusions

  8. Delay and Power Model for Interconnect • Delay model: the intrinsic delay and effective driving resistance of each switch are pre-characterized using SPICE; Elmore delay is used to calculate routing delay. • Interconnect power model: dynamic power $P_d(Vdd_j) = 0.5\, f_{clk}\, C\, Vdd_j^2$; leakage power $P_l(Vdd_j)$ is pre-characterized using SPICE. • Interconnect power reduction estimation is the essential part of the dual-Vdd assignment algorithm.
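The Elmore delay step can be illustrated on a small RC tree: the delay to a node is the sum, over every resistance on the path from the driver, of that resistance times the total capacitance downstream of it. This is a generic sketch, not the authors' timing engine; the `RCNode` class is hypothetical, switch intrinsic delays are left out (they would simply be added per stage), and the values are unit-free placeholders rather than SPICE-characterized data.

```python
from dataclasses import dataclass, field


@dataclass
class RCNode:
    """One node of an RC routing tree (hypothetical structure for illustration)."""
    name: str
    r_from_parent: float  # resistance of the edge feeding this node (driver resistance at the root)
    cap: float            # capacitance lumped at this node
    children: list = field(default_factory=list)


def subtree_cap(node: RCNode) -> float:
    """Total capacitance at this node and everything downstream of it."""
    return node.cap + sum(subtree_cap(child) for child in node.children)


def elmore_delays(root: RCNode) -> dict:
    """Elmore delay to every node: sum over path resistances of R * downstream C."""
    delays = {}

    def walk(node: RCNode, upstream_delay: float) -> None:
        delay = upstream_delay + node.r_from_parent * subtree_cap(node)
        delays[node.name] = delay
        for child in node.children:
            walk(child, delay)

    walk(root, 0.0)
    return delays


if __name__ == "__main__":
    tree = RCNode("src", 1.0, 1.0, [RCNode("a", 2.0, 1.0), RCNode("b", 1.0, 2.0)])
    print(elmore_delays(tree))  # {'src': 4.0, 'a': 6.0, 'b': 6.0}
```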

  9. Review of Vdd Level Assignment Algorithm [Lin, DAC'05] • Flow: timing slack assigned at sinks → VddL possibility for switches → interconnect power reduction estimation → Vdd assignment based on the estimation. [Figure: example routing tree with switches b1 to b4 and sink slacks S1 = 1, S2 = 3] • The net-level bottom-up Vdd assignment guarantees that the final solutions are legal [Lin, DAC’05]. • All extra slack is leveraged with VddL switches [Lin, DAC’05]. • Remaining problem: how to calculate the VddL possibility for mixed wire segments?

  10. VddL Possibility Calculation • Represent timing slack as a number of switches: $s_i = L_i \cdot (S_i / D_i)$, where $s_i$ is how many switches can be turned to VddL along the source-to-sink-$i$ path for the given timing slack $S_i$, $L_i$ is the number of switches along this path, and $D_i$ is the delay of this path. • VddL possibility for switch $j$ at sink $i$, based on load capacitance: $f(i,j) = s_i \cdot (c_{ij} / C_i)$. • Key idea: distribute the timing slack to each switch based on its capacitance. • Example [Figure: routing tree with switches b1 (8x), b2 (8x), b3 (16x), b4 (16x) and sink slacks S1 = 6, S2 = 10]: for sink 2, $L_2 = 3$ and $D_2 = 12$, so $s_2 = 3 \cdot (10/12) = 5/2$, giving $f(2,2) = 1$, $f(2,3) = 1$, $f(2,4) = 1/2$.
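The two formulas above can be checked numerically. The sketch assumes that $C_i$ is the sum of the $c_{ij}$ along the source-to-sink-$i$ path, so the possibilities on a path add up to $s_i$; the slide does not define $C_i$ explicitly. The load capacitances in the example are hypothetical values in a 2:2:1 ratio, chosen only so that the result matches the slide's $f(2,2)$, $f(2,3)$, $f(2,4)$.

```python
def switch_slack_budget(num_switches, slack, path_delay):
    """s_i = L_i * (S_i / D_i): timing slack on the path expressed in switches."""
    return num_switches * slack / path_delay


def vddl_possibility(s_i, load_caps):
    """f(i, j) = s_i * (c_ij / C_i) for every switch j on the source-to-sink-i path.

    Assumes C_i is the sum of the c_ij on that path, so the f(i, j) add up to s_i.
    """
    total_cap = sum(load_caps.values())
    return {switch: s_i * cap / total_cap for switch, cap in load_caps.items()}


if __name__ == "__main__":
    s2 = switch_slack_budget(num_switches=3, slack=10, path_delay=12)
    print(s2)  # 2.5
    # Hypothetical load capacitances in a 2:2:1 ratio.
    print(vddl_possibility(s2, {"b2": 2.0, "b3": 2.0, "b4": 1.0}))  # {'b2': 1.0, 'b3': 1.0, 'b4': 0.5}
```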

  11. Power Reduction Estimation for Mixed Wire Segments • The lower-bound estimation of interconnect power reduction [Y. Lin, DAC'05] is no longer valid for mixed wire segments. • Example [Figure: a path with total slack S = 2.7 containing switch b1 (16x, needs 1.8 slack to become VddL) and switch b2 (8x, needs 1.0 slack)]: summing the VddL possibilities gives fn(i,1) = 0.9 and fn(i,2) = 0.5, i.e., a lower bound of 0.9 + 0.5 = 1.4 VddL switches. In reality, assigning b2 to VddL consumes 1.0 slack and leaves 1.7, but b1 needs 1.8, so only 1.0 VddL switch can be assigned. Problem here: the lower bound is greater than the actual number! • Our solution: develop an upper-bound estimation of the number of VddL switches • Gives a consistent upper bound of power reduction • Removes the nonlinear "min" term and the corresponding extra LP constraints of the lower-bound estimation
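The counterexample above can be reproduced by computing the true maximum number of VddL switches on a single path. Since every switch counts as one, taking the switches that need the least slack first maximizes the count. This is a minimal sketch under that per-path view; it ignores which specific switch the level-converter-free rule would allow, and `max_vddl_switches` is a hypothetical helper, not part of the authors' flow.

```python
def max_vddl_switches(slack_needed, slack):
    """Largest number of switches on one path that can be set to VddL.

    slack_needed[k] is the extra delay switch k adds when moved to VddL.
    Taking the cheapest switches first maximizes the count (each switch counts as 1).
    """
    count = 0
    remaining = slack
    for need in sorted(slack_needed):
        if need > remaining:
            break
        remaining -= need
        count += 1
    return count


if __name__ == "__main__":
    actual = max_vddl_switches([1.8, 1.0], slack=2.7)  # b1 needs 1.8, b2 needs 1.0
    estimate = 0.9 + 0.5                              # summed VddL possibilities
    print(actual)             # 1
    print(estimate > actual)  # True: the "lower bound" overshoots
```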

  12. LP Formulation for Dual-Vdd Level Assignment • Objective function: built from the dynamic power reduction upper bound and the leakage power reduction upper bound • Basic timing constraints: arrival time for primary inputs, arrival time for primary outputs, arrival time constraints • Slack constraints: slack upper bound, slack non-negative
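The arrival-time and slack quantities that the LP constrains are the same ones ordinary static timing analysis computes for a fixed design. As a hedged illustration (it does not solve the LP), the sketch below computes arrival times, required times, and per-edge slack on a small DAG timing graph. The edge-list format and node names are hypothetical, primary inputs are assumed to arrive at time 0, and primary outputs are assumed to be required by Tspec.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def edge_slacks(edges, t_spec):
    """Static timing analysis on a DAG timing graph.

    edges:  list of (u, v, delay) tuples
    t_spec: delay target; nodes without fan-out get required time t_spec,
            nodes without fan-in get arrival time 0.
    Returns {(u, v): slack} with slack = required(v) - arrival(u) - delay.
    """
    preds, succs, nodes = defaultdict(list), defaultdict(list), set()
    for u, v, d in edges:
        preds[v].append((u, d))
        succs[u].append((v, d))
        nodes.update((u, v))

    # Topological order: every node appears after all of its fan-in nodes.
    order = list(TopologicalSorter({n: [u for u, _ in preds[n]] for n in nodes}).static_order())

    arrival = {}
    for n in order:  # forward pass: latest arrival time
        arrival[n] = max((arrival[u] + d for u, d in preds[n]), default=0.0)

    required = {}
    for n in reversed(order):  # backward pass: earliest required time
        required[n] = min((required[v] - d for v, d in succs[n]), default=t_spec)

    return {(u, v): required[v] - arrival[u] - d for u, v, d in edges}


if __name__ == "__main__":
    graph = [("a", "c", 2.0), ("b", "c", 3.0), ("c", "d", 4.0)]
    print(edge_slacks(graph, t_spec=10.0))  # {('a', 'c'): 4.0, ('b', 'c'): 3.0, ('c', 'd'): 3.0}
```

An LP-based budget would instead treat the per-edge slacks as variables bounded by these arrival-time constraints and distribute them to maximize the estimated power reduction.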

  13. Outline • Motivation • Problem Formulations • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Network Flow Based Vdd Level Assignment Formulation • Overview of network flow based timing slack budgeting • Primal-dual reformulation • Experimental Results • Conclusions

  14. Network Flow Based Timing Slack Budgeting • Motivated by [Ghiasi, ICCAD’04] for logic-level optimization • Step 1: Reorganize the objective function • Step 2: Eliminate the timing slack variables by substitution

  15. Network Flow Based Timing Slack Budgeting (cont.) • Step 3: Reorganize the objective function by timing nodes; constant terms are removed, leaving constant coefficients • Step 4: Generate the dual problem: terms organized node by node in the primal become edge-by-edge terms in the dual, and edge-by-edge terms become node-by-node terms
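Steps 2 to 4 can be written out for a generic weighted slack objective. This is a sketch under assumptions, not the slide's exact formulas (whose objective is built from the power reduction upper bounds): $a_v$ is the arrival time at timing node $v$, $d_{uv}$ the delay of timing edge $(u,v)$, and the per-edge weights $w_{uv}$ and slack bounds $U_{uv}$ are hypothetical. Each timing edge yields a forward arc $(u,v)$ with cost $c_{uv} = d_{uv} + U_{uv}$ and a backward arc $(v,u)$ with cost $c_{vu} = -d_{uv}$; the primary-input and primary-output arrival-time bounds can be written the same way through a virtual node.

```latex
% Objective in terms of edge slacks
\max \sum_{(u,v)\in E} w_{uv}\, s_{uv}
\quad \text{s.t.} \quad s_{uv} = a_v - a_u - d_{uv},\;\; 0 \le s_{uv} \le U_{uv}

% Step 2: substitute s_{uv}
\sum_{(u,v)\in E} w_{uv}\,(a_v - a_u - d_{uv})
  = \sum_{v\in V} \beta_v\, a_v \;-\; \sum_{(u,v)\in E} w_{uv}\, d_{uv},
\qquad
\beta_v = \sum_{(u,v)\in E} w_{uv} - \sum_{(v,x)\in E} w_{vx}

% Step 3: drop the constant term; every constraint is a difference constraint on arcs A
\max \sum_{v\in V} \beta_v\, a_v
\quad \text{s.t.} \quad a_j - a_i \le c_{ij} \;\; \forall (i,j)\in A

% Step 4: LP dual = min-cost flow with demand \beta_v at node v
\min \sum_{(i,j)\in A} c_{ij}\, f_{ij}
\quad \text{s.t.} \quad
\sum_{(i,v)\in A} f_{iv} - \sum_{(v,j)\in A} f_{vj} = \beta_v,\;\; f_{ij} \ge 0
```

Because every edge weight $w_{uv}$ appears once with a plus sign and once with a minus sign, $\sum_v \beta_v = 0$, so the node demands are balanced as a flow problem requires.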

  16. Link Induced Network from Timing Graph [Figure: network induced from the timing graph, with flow in forward arcs (solid segments), flow in backward arcs (dotted segments), and a demand at each node i] • No negative-weight cycle exists in the induced network, so a min-cost flow can always be found. • A shortest-path-based algorithm is used to produce the solution of the primal problem.
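The primal variables are arrival times, and every constraint has the form $a_j - a_i \le c_{ij}$, so shortest-path distances from a virtual source give a feasible assignment, and a negative cycle signals infeasibility. The sketch below is the classical Bellman-Ford solution for such a system of difference constraints; it only illustrates the shortest-path step. The slide does not give the authors' exact procedure (recovering the optimal primal usually uses shortest paths in the residual network of the optimal flow), and `solve_difference_constraints` is a hypothetical helper.

```python
def solve_difference_constraints(nodes, constraints):
    """Find x with x[j] - x[i] <= c for every (i, j, c) in constraints.

    Bellman-Ford from a virtual source joined to every node by a weight-0 arc.
    Returns a dict of node values, or None if a negative cycle makes it infeasible.
    """
    dist = {v: 0.0 for v in nodes}  # distances after relaxing the virtual source arcs
    for _ in range(len(nodes)):
        changed = False
        for i, j, c in constraints:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            return dist
    # Still relaxing after |V| passes means a negative-weight cycle exists.
    for i, j, c in constraints:
        if dist[i] + c < dist[j]:
            return None
    return dist


if __name__ == "__main__":
    # Timing edge a -> b with delay 2 and slack at most 3: 2 <= x_b - x_a <= 5.
    arcs = [("b", "a", -2.0), ("a", "b", 5.0)]
    print(solve_difference_constraints(["a", "b"], arcs))  # {'a': -2.0, 'b': 0.0}
```

Since only differences matter, the result can be shifted so that the virtual reference (or a primary input) sits at time 0.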

  17. Outline • Motivation • Problem Formulations • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments, Hu et al, DAC’06] • Network Flow Based Formulation • Experimental Results • Conclusions

  18. Experimental Setting • Cluster-based island-style FPGA structure • Cluster size 10 and 4-input LUTs • 100% buffered interconnects, subset switch block • 60% length-4 and 40% length-8 wire segments • 25x buffers for length-4 and 10x buffers for length-8 segments • ITRS 100nm technology, 1.3V for VddH and 0.8V for VddL • VPR [Betz-Rose-Marquardt] is used for placement and routing • fpgaEva-LP2 [Lin et al, FPGA’05] is used for power calculation • Considers short-circuit power, glitch power and input vectors • 8% average error compared to SPICE simulation • The 20 biggest sequential MCNC benchmarks are tested • LPsolver is used to solve the LP
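Plugging the two supply voltages of this setting into the dynamic power model $P_d = 0.5\, f_{clk}\, C\, Vdd^2$ gives the per-switch dynamic saving of moving one switch from VddH to VddL. This is a back-of-the-envelope sketch assuming the same $f_{clk}$ and switched capacitance at both levels; it ignores leakage, short-circuit and glitch power, so it is not a prediction of the chip-level results.

```python
VDD_H, VDD_L = 1.3, 0.8  # volts, from the experimental setting


def dynamic_power(f_clk, cap, vdd):
    """P_d = 0.5 * f_clk * C * Vdd^2 (dynamic power model of the interconnect)."""
    return 0.5 * f_clk * cap * vdd ** 2


def vddl_dynamic_saving(f_clk=1.0, cap=1.0):
    """Fraction of a switch's dynamic power saved by moving it from VddH to VddL.

    f_clk and cap cancel out; they are kept only to mirror the formula.
    """
    high = dynamic_power(f_clk, cap, VDD_H)
    low = dynamic_power(f_clk, cap, VDD_L)
    return 1.0 - low / high


if __name__ == "__main__":
    print(round(vddl_dynamic_saving(), 3))  # 0.621, i.e. 1 - (0.8 / 1.3) ** 2
```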

  19. Dual-Vdd Assignment for FPGAs with Mixed Wire Segments • Both the LP-based and the network-flow-based algorithms achieve 85% VddL assignment on average.

  20. Interconnect Power Reduction • A 52% total interconnect power reduction is achieved!

  21. Runtime Comparison • The network-flow-based algorithm achieves a consistent speedup and stable runtime. • An even larger speedup is expected for larger circuits.

  22. Outline • Motivation • Problem Formulations • Chip-level Vdd-level Assignment Algorithm [for mixed length wire segments] • Network Flow Based Formulation • Experimental Results • Conclusions

  23. Conclusions • A min-cost network flow based timing budgeting formulation speeds up the budgeting procedure and the overall design flow by up to 6000x and 20x, respectively, compared to the LP-based formulation. • Both chip-level dual-Vdd assignment algorithms handle mixed-length wire segments. Experimental results show an interconnect power reduction of 53% on average compared to single-Vdd FPGA designs.

  24. Thank you! Q/A
