
Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction

Yan Lin, Fei Li and Lei He, EE Department, UCLA. Partially supported by NSF grant CCR-0306682. Address comments to lhe@ee.ucla.edu.


Presentation Transcript


  1. Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction Yan Lin, Fei Li and Lei He EE Department, UCLA Partially supported by NSF grant CCR-0306682. Address comments to lhe@ee.ucla.edu.

  2. Outline • Review and Motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • Conclusions and Ongoing Work

  3. Power Limitation of FPGAs • Existing FPGAs are HIGHLY power inefficient (> 100X more than ASIC) • E.g. [Kusse, ISLPED’98] • Power is likely the largest limitation for FPGAs

  4. FPGA Power Reduction • Power aware FPGA CAD algorithms for existing FPGA architectures • CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD’03] • Configuration inversion for leakage reduction [Anderson et al, FPGA’04] • Power efficient FPGA circuits and architectures • Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA’04][Li et al, DAC’04] • Vdd-programmable FPGA interconnects • [Li et al, ICCAD’04] • [Anderson et al, ICCAD’04]

  5. Overall FPGA Structure • Cluster-based Island Style FPGA Structure • Logic blocks are embedded into routing resources • Wire segment connectivity is programmable

  6. FPGA Routing Structure • Subset Programmable switch block • An incoming track can be connected to different outgoing tracks with the same track number • Programmable connection block

  7. Vdd-programmable Interconnects [Li et al, ICCAD’04] • Conventional routing switch • Vdd-programmable switch • Vdd selection for used switch • Power-gating unused switch • Configurable Vdd-level conversion • Avoid excessive leakage when low Vdd switch drives high Vdd switches [Figure label: Power transistor]

  8. Limitation of Vdd-programmable Interconnects [Li et al, ICCAD’04] • Fine-grained Vdd-level converter insertion • Area overhead • 54% area overhead for circuit s38584 • Leakage overhead • 36% leakage overhead for circuit s38584 • SRAM cell overhead • 300% SRAM cell overhead for each switch • Area/SRAM efficient low-power interconnects are needed

  9. Outline • Review and Motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • Conclusions and Ongoing Work

  10. Low Utilization Rate of Interconnects • 78.15% of total power is consumed by global interconnect [Li et al, DAC’04] • 47% of global interconnect power is leakage • Why? • Extremely low utilization rate (~12% w/ minimum array)
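As a rough worked example of what the two percentages on slide 10 imply together (an illustrative calculation with our own variable names, not a figure quoted on the slides): if 78.15% of total power goes to global interconnect and 47% of that is leakage, then interconnect leakage alone is a bit over a third of total FPGA power.

```python
# Illustrative arithmetic only; the variable names are ours, not the authors'.
interconnect_share = 0.7815  # fraction of total power in global interconnect (slide 10)
leakage_fraction = 0.47      # fraction of interconnect power that is leakage (slide 10)

# Share of total FPGA power that is interconnect leakage.
interconnect_leakage_share = interconnect_share * leakage_fraction
print(f"{interconnect_leakage_share:.1%} of total power")
```

This is the portion of total power that power-gating unused interconnects targets.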

  11. Interconnect Utilization Rate is Intrinsically Low • Programmable switch block • no more than 25% • Programmable connection block • Only one is used (for 64 tracks) • Power-gating unused interconnects is necessary

  12. Vdd-gateable Routing Switch • Conventional routing switch • Vdd-gateable routing switch • Only two states for a routing switch • High Vdd • Power-gating • Enable power-gating capability w/o extra SRAM cells [Figure label: Power transistor]

  13. Vdd-Gateable Connection Block • Conventional connection block • Vdd-gateable connection block • Enable power-gating capability w/ only one extra SRAM for a connection block • Only n+1 SRAM cells for 2^n connection switches • A low leakage decoder is needed
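The SRAM count on slide 13 can be checked with a short sketch. It assumes that a decoder-based connection block uses n SRAM bits to select one of up to 2^n switches, and that the Vdd-gateable version adds exactly one more bit to gate the whole block. The function names are hypothetical.

```python
def sram_cells_conventional(num_switches: int) -> int:
    """Assumed baseline: n decoder bits select one of up to 2**n switches."""
    if num_switches < 1:
        raise ValueError("need at least one connection switch")
    # Smallest n with 2**n >= num_switches, computed without floating point.
    return (num_switches - 1).bit_length()


def sram_cells_gateable(num_switches: int) -> int:
    """Vdd-gateable block: the decoder bits plus one extra SRAM cell (slide 13)."""
    return sram_cells_conventional(num_switches) + 1


# Slide 11 mentions a connection block serving 64 tracks.
print(sram_cells_conventional(64), sram_cells_gateable(64))
```

For 64 = 2^6 connection switches this gives 6 cells for the assumed baseline and 7 for the gateable block, matching the n+1 count.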

  14. Power and Delay of Vdd-gateable Switch • Vdd-gateable switch compared to conventional switch • Dynamic power is almost the same • >300X leakage power reduction • ~6% delay increase

  15. Power Reduction by Power-gating Unused Interconnects [Figure: comparison of Vdd-programmable interconnects and Vdd-gateable interconnects]

  16. Outline • Review and motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • FPGA fabrics and algorithms • Design flow and quantitative evaluation • Conclusions and Ongoing Work

  17. Pre-Defined Dual-Vdd Routing Architecture • Reduce dynamic power with dual-Vdd by making use of timing slack • Partition routing channel into VddH and VddL regions • Vdd-gateable interconnect switch is used • Ratio of VddH/VddL track is an architectural parameter

  18. Ratio of VddH to VddL Track • Determine ratio using dual-Vdd assignment profile without considering layout constraint • Sensitivity-based dual-Vdd assignment • Assignment unit --- a routing tree • Power sensitivity --- ΔP/ ΔVdd • Power difference for a routing tree between VddH and VddL • Greedy algorithm --- sensitivity based • Initial: uniform VddH assignment • Procedure: assign VddL to routing tree with largest power sensitivity (but without increasing critical delay)
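The greedy procedure on slide 18 can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each routing tree carries its own independent timing slack and a known delay increase at VddL, whereas a real flow would re-run timing analysis after each assignment. The `RoutingTree` fields and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoutingTree:
    name: str
    power_vddh: float      # power when the tree runs at VddH
    power_vddl: float      # power when the tree runs at VddL
    slack: float           # assumed per-tree timing slack
    delay_increase: float  # assumed extra delay if moved to VddL
    vdd: str = "VddH"      # initial state: uniform VddH assignment

    @property
    def sensitivity(self) -> float:
        # Power difference for the tree between VddH and VddL (slide 18).
        return self.power_vddh - self.power_vddl


def assign_dual_vdd(trees: list[RoutingTree]) -> list[RoutingTree]:
    """Give VddL to trees in order of decreasing power sensitivity,
    skipping any tree whose slowdown would exceed its slack."""
    for tree in sorted(trees, key=lambda t: t.sensitivity, reverse=True):
        if tree.delay_increase <= tree.slack:
            tree.vdd = "VddL"
    return trees


trees = [
    RoutingTree("a", power_vddh=10.0, power_vddl=5.0, slack=1.0, delay_increase=0.5),
    RoutingTree("b", power_vddh=8.0, power_vddl=6.0, slack=0.1, delay_increase=0.4),
    RoutingTree("c", power_vddh=4.0, power_vddl=3.0, slack=2.0, delay_increase=0.3),
]
assignment = {t.name: t.vdd for t in assign_dual_vdd(trees)}
print(assignment)
```

In this toy input, tree "b" stays at VddH because moving it would consume more slack than it has. The fraction of trees that end up at VddL is the kind of profile slide 18 uses to choose the VddH/VddL track ratio.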

  19. Profile of Dual-Vdd Assignment • Assignment with no critical path delay increase (VddH:VddL=1.5v:1.0v) • Set the ratio of VddH/VddL track to 1:1

  20. Level Converter is NOT Needed [Figure labels: A, B] • Wire segment can only be connected to another wire segment with the same track number via a subset switch block

  21. Level Converter is NOT Needed [Figure labels: A, B] • Wire segment can only be connected to another wire segment with the same track number via a subset switch block • No level converter is needed in switch block
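The argument on slides 20 and 21 can be illustrated with a small sketch. It assumes the Vdd region of a track is a function of its track number alone, applied identically in every channel, with the lower-numbered tracks as VddH (the ordering is our assumption). Under that assumption, a subset switch block, which only connects equal track numbers, can never join a VddL segment to a VddH segment. All names here are hypothetical.

```python
def track_vdd(track: int, vddh_tracks: int) -> str:
    """Assumed partition: tracks [0, vddh_tracks) are VddH, the rest VddL."""
    return "VddH" if track < vddh_tracks else "VddL"


def subset_switch_targets(track: int) -> list[tuple[str, int]]:
    """Subset switch block: an incoming track reaches only the same
    track number on the other sides of the switch block."""
    return [(side, track) for side in ("north", "east", "south")]


def needs_level_converter(channel_width: int, vddh_tracks: int) -> bool:
    """True if any switch-block connection would join different Vdd regions."""
    for track in range(channel_width):
        source_vdd = track_vdd(track, vddh_tracks)
        for _side, target_track in subset_switch_targets(track):
            if track_vdd(target_track, vddh_tracks) != source_vdd:
                return True
    return False


# 1:1 VddH/VddL ratio as on slide 19, with an example channel width of 8.
print(needs_level_converter(channel_width=8, vddh_tracks=4))
```

Because every target shares the source's track number, the check returns False for any split point, which is the reason no level converter is needed in the switch block.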

  22. Layout Constraint Due to Dual-Vdd • Dual-Vdd introduces performance degradation due to layout constraint • Insufficient routing resources for Vdd-matched routing trees • May introduce detours • Solutions • Vdd-programmable interconnects [Li et al, ICCAD’04] • Provide sufficient routing tracks for Vdd-matched routing trees • Control leakage by power-gating unused interconnects

  23. Design Flow for Dual-Vdd Interconnects [Flowchart boxes: Tech Mapped Netlist (Single-Vdd); Arch Spec; Delay/Power Model (dual-Vdd); Timing Driven Layout (Single-Vdd); Dual-Vdd Assignment for Routing Trees; Double Channel Width; Timing Driven Layout (Dual-Vdd); Power-gating Unused Switches; Delay/Power Estimation; outputs: Delay, Power]

  24. Dual-Vdd Routing Algorithm • Based on the maze routing algorithm in VPR • Modify the cost function • TotalCost(n): the cost of routing tree T through wire segment n to the target sink j • PathCostDv(n): the cost of the path from the current partial routing tree to wire segment n • ExpectedDv(n,j): the estimated cost from wire segment n to the target sink j • Matched(T,n): boolean function describing Vdd-matching status
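The cost formula itself did not survive in this transcript; only the term definitions did. The sketch below shows one plausible shape for a Vdd-aware maze-routing cost, built from the four terms named on slide 24. It is not the authors' cost function: the additive form and the `mismatch_penalty` parameter are assumptions made purely for illustration.

```python
def matched(tree_vdd: str, segment_vdd: str) -> bool:
    """Vdd-matching status: does wire segment n lie in the Vdd region
    assigned to routing tree T?"""
    return tree_vdd == segment_vdd


def total_cost(
    path_cost_dv: float,
    expected_dv: float,
    tree_vdd: str,
    segment_vdd: str,
    mismatch_penalty: float = 1e6,  # hypothetical knob, not from the slides
) -> float:
    """Assumed form: path cost so far plus estimated cost to the sink,
    with a large penalty when the segment's Vdd does not match the tree's."""
    base = path_cost_dv + expected_dv
    if matched(tree_vdd, segment_vdd):
        return base
    return base + mismatch_penalty


cost_matched = total_cost(2.0, 3.0, tree_vdd="VddL", segment_vdd="VddL")
cost_mismatched = total_cost(2.0, 3.0, tree_vdd="VddL", segment_vdd="VddH")
print(cost_matched, cost_mismatched)
```

With a penalty of this kind the router still explores mismatched segments when nothing else is available, but strongly prefers Vdd-matched tracks, which is consistent with the detours described on slide 22.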

  25. Outline • Review and motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • FPGA fabrics and algorithms • Quantitative evaluation • Conclusions and Ongoing Work

  26. Comparison of Low Power Architectures [Figure: power (watt) vs. clock frequency (MHz) for circuit s38584; curves for arch-SV, arch-PV, arch-PV+PG and arch-DV+PG (1.5W), with points labeled by Vdd setting, e.g. 1.5v, 1.3v, 1.0v, 0.9v and dual-Vdd pairs such as 1.5v/0.8v, 1.3v/1.0v, 1.3v/0.9v, 1.0v/0.8v, 0.9v/0.8v] • Dual-Vdd interconnects with fine-grained power gating • May have performance degradation due to layout constraint • Can reduce more power than purely power-gating unused switches • Achieve 9.78% interconnect dynamic power reduction, 38.68% total power saving with 1.5W channel width • W is the nominal routing channel width in single-Vdd FPGA

  27. Impact of Routing Channel Width [Figure: power saving (%) and normalized clock frequency vs. channel width, 1.0W to 2.0W; labeled power savings 34.86%, 38.68%, 45.00%; labeled normalized clock frequencies 0.743, 0.838, 0.955] • We get the power reduction percentage at the maximum clock frequency achieved by dual-Vdd interconnects • Channel width increases from 1.0W to 2.0W • Power saving increases from 34.86% to 45% • Normalized clock frequency increases from 0.743 to 0.955

  28. Area Overhead of Vdd-gateable Interconnects • Device area is dominant • Area overhead is mainly due to power transistors for power-gating capability • Track duplication with power-gating vs Vdd-programmable interconnects [Li et al, ICCAD’04] • More power reduction (45% vs 25%) & less area overhead • Mainly due to Vdd-level converter removal • High Vdd interconnects with power gating is BEST considering area

  29. Outline • Review and motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • Conclusions and Ongoing Work

  30. Conclusions and Ongoing Work • Conclusions • Developed power-gateable interconnects w/ virtually no extra SRAM cell • Achieved 38.18% total power reduction using Vdd-gateable interconnects • Achieved 24.78% interconnect dynamic power reduction, 45.00% total power reduction with duplicated (2W) channel width • Ongoing work • Power-ground design to support dual-Vdd • Optimal mix of Vdd-programmable and Vdd-gateable interconnects • Architecture evaluation considering Vdd programmability [Lin et al, to appear in FPGA’05]
