Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction Yan Lin, Fei Li and Lei He EE Department, UCLA Partially supported by NSF grant CCR-0306682. Address comments to lhe@ee.ucla.edu.
Outline • Review and Motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • Conclusions and Ongoing Work
Power Limitation of FPGAs • Existing FPGAs are HIGHLY power inefficient (> 100X more than ASIC) • E.g. [Kusse, ISLPED’98] • Power is likely the largest limitation for FPGAs
FPGA Power Reduction • Power aware FPGA CAD algorithms for existing FPGA architectures • CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD’03] • Configuration inversion for leakage reduction [Anderson et al, FPGA’04] • Power efficient FPGA circuits and architectures • Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA’04][Li et al, DAC’04] • Vdd-programmable FPGA interconnects • [Li et al, ICCAD’04] • [Anderson et al, ICCAD’04]
Overall FPGA Structure • Cluster-based Island Style FPGA Structure • Logic blocks are embedded into routing resources • Wire segment connectivity is programmable
FPGA Routing Structure • Subset Programmable switch block • An incoming track can be connected to different outgoing tracks with the same track number • Programmable connection block
Vdd-programmable Interconnects [Li et al, ICCAD’04] • Conventional routing switch • Vdd-programmable switch • Vdd selection for used switch • Power-gating unused switch • Configurable Vdd-level conversion • Avoid excessive leakage when low Vdd switch drives high Vdd switches [Figure label: power transistor]
Limitation of Vdd-programmable Interconnects [Li et al, ICCAD’04] • Fine-grained Vdd-level converter insertion • Area overhead • 54% area overhead for circuit s38584 • Leakage overhead • 36% leakage overhead for circuit s38584 • SRAM cell overhead • 300% SRAM cell overhead for each switch • Area/SRAM efficient low-power interconnects are needed
Outline • Review and Motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • Conclusions and Ongoing Work
Low Utilization Rate of Interconnects • 78.15% of total power is consumed by global interconnects [Li et al, DAC’04] • 47% of global interconnect power is leakage • Why? • Extremely low utilization rate (~12% w/ minimum array)
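As a back-of-the-envelope check, the two percentages on this slide can be combined to estimate what share of total FPGA power is interconnect leakage. The variable names are illustrative; only the two input percentages come from the slide.

```python
# Share of total power consumed by global interconnects (from the slide).
interconnect_share = 0.7815
# Share of that interconnect power that is leakage (from the slide).
leakage_share_of_interconnect = 0.47

# Interconnect leakage expressed as a fraction of total power.
interconnect_leakage_of_total = interconnect_share * leakage_share_of_interconnect

print(f"{interconnect_leakage_of_total:.1%}")  # roughly 36.7% of total power
```

Under these figures, a bit more than a third of total power is leakage in the interconnect, which is why gating unused switches is the focus of this part of the talk.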
Interconnect Utilization Rate is Intrinsically Low • Programmable switch block • no more than 25% • Programmable connection block • Only one is used (for 64 tracks) • Power-gating unused interconnects is necessary
Vdd-gateable Routing Switch • Conventional routing switch • Vdd-gateable routing switch • Only two states for a routing switch • High Vdd • Power-gating • Enable power-gating capability w/o extra SRAM cells [Figure label: power transistor]
Vdd-Gateable Connection Block • Conventional connection block • Vdd-gateable connection block • Enable power-gating capability w/ only one extra SRAM cell for a connection block • Only n+1 SRAM cells for 2^n connection switches • A low leakage decoder is needed
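The SRAM count can be illustrated with a small behavioral model of the decoder. This is a sketch only, assuming the switch count is 2^n (n select bits fanned out by a decoder) and that the single extra SRAM cell gates the whole connection block. The function names and the polarity of the gating bit are hypothetical, not taken from the slides.

```python
def connection_block_enables(select_bits, gate_bit):
    """Return per-switch enables for 2**n switches.

    select_bits: list of n ints (0 or 1), most significant bit first.
    gate_bit: 1 means the whole connection block is unused and is
              power-gated (assumed polarity).
    """
    num_switches = 2 ** len(select_bits)
    if gate_bit:
        # Unused connection block: every switch is power-gated.
        return [0] * num_switches
    # Decode the n select bits into the index of the single active switch.
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return [1 if i == index else 0 for i in range(num_switches)]


def sram_cells_needed(n):
    """n select bits plus one gating bit."""
    return n + 1


# Example: 3 select bits drive 8 switches; switch 5 is the one in use.
enables = connection_block_enables([1, 0, 1], gate_bit=0)
```

For the 64-track case mentioned earlier, n = 6, so this model needs 7 SRAM cells rather than one cell per switch.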
Power and Delay of Vdd-gateable Switch • Vdd-gateable switch compared to conventional switch • Dynamic power is almost the same • >300X leakage power reduction • ~6% delay increase
Power Reduction by Power-gating Unused Interconnects [Figure: results compared for Vdd-programmable interconnects and Vdd-gateable interconnects]
Outline • Review and motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • FPGA fabrics and algorithms • Design flow and quantitative evaluation • Conclusions and Ongoing Work
Pre-Defined Dual-Vdd Routing Architecture • Reduce dynamic power with dual-Vdd by making use of timing slack • Partition routing channel into VddH and VddL regions • Vdd-gateable interconnect switch is used • Ratio of VddH to VddL tracks is an architectural parameter
Ratio of VddH to VddL Track • Determine ratio using dual-Vdd assignment profile without considering layout constraint • Sensitivity-based dual-Vdd assignment • Assignment unit --- a routing tree • Power sensitivity --- ΔP/ ΔVdd • Power difference for a routing tree between VddH and VddL • Greedy algorithm --- sensitivity based • Initial: uniform VddH assignment • Procedure: assign VddL to routing tree with largest power sensitivity (but without increasing critical delay)
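The greedy procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `RoutingTree` fields, the path-based timing model, and the example numbers are all assumptions made for the sketch. Because the Vdd step is the same for every tree, ranking by the power difference between VddH and VddL gives the same order as ranking by ΔP/ΔVdd.

```python
from dataclasses import dataclass


@dataclass
class RoutingTree:
    power_high: float  # power when the tree runs at VddH
    power_low: float   # power when the tree runs at VddL
    delay_high: float  # delay contribution at VddH
    delay_low: float   # delay contribution at VddL (slower)


def critical_delay(paths, trees, use_low):
    """Longest path delay, where each path is a list of tree names (toy model)."""
    return max(
        sum(trees[t].delay_low if t in use_low else trees[t].delay_high
            for t in path)
        for path in paths
    )


def assign_dual_vdd(trees, paths):
    """Greedy, sensitivity-based VddL assignment that never raises critical delay."""
    use_low = set()                                   # initial: all trees at VddH
    target = critical_delay(paths, trees, use_low)    # baseline critical delay
    # Visit trees from largest to smallest power saving.
    order = sorted(trees,
                   key=lambda t: trees[t].power_high - trees[t].power_low,
                   reverse=True)
    for name in order:
        trial = use_low | {name}
        if critical_delay(paths, trees, trial) <= target:
            use_low = trial                           # keep VddL for this tree
    return use_low


# Hypothetical example: path a-b is critical, path c-d has slack.
trees = {
    "a": RoutingTree(10.0, 6.0, 2.0, 3.0),
    "b": RoutingTree(8.0, 5.0, 2.0, 3.0),
    "c": RoutingTree(6.0, 4.0, 1.0, 1.5),
    "d": RoutingTree(3.0, 2.0, 1.0, 1.5),
}
paths = [["a", "b"], ["c", "d"]]
low_trees = assign_dual_vdd(trees, paths)
```

In this example the trees on the critical path stay at VddH, while the two trees with timing slack move to VddL.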
Profile of Dual-Vdd Assignment • Assignment with no critical path delay increase (VddH:VddL = 1.5V:1.0V) • Set the ratio of VddH to VddL tracks to 1:1
Level Converter is NOT Needed • Wire segment can only be connected to another wire segment with the same track number via a subset switch block • No level converter is needed in switch block
Layout Constraint Due to Dual-Vdd • Dual-Vdd introduces performance degradation due to layout constraint • Insufficient routing resources for Vdd-matched routing trees • May introduce detours • Solutions • Vdd-programmable interconnects [Li et al, ICCAD’04] • Provide sufficient routing tracks for Vdd-matched routing trees • Control leakage by power-gating unused interconnects
Design Flow for Dual-Vdd Interconnects [Flowchart] • Inputs: Tech Mapped Netlist (Single-Vdd), Arch Spec, Double Channel Width, Delay/Power Model (dual-Vdd) • Timing Driven Layout (Single-Vdd) • Dual-Vdd Assignment for Routing Trees • Timing Driven Layout (Dual-Vdd) • Power-gating Unused Switches • Delay/Power Estimation • Outputs: Delay, Power
Dual-Vdd Routing Algorithm • Based on the maze routing algorithm in VPR • Modify the cost function • TotalCost(n): the cost of routing tree T through wire segment n to the target sink j • PathCostDv(n): the cost of the path from the current partial routing tree to wire segment n • ExpectedDv(n,j): the estimated cost from wire segment n to the target sink j • Matched(T,n): boolean function describing Vdd-matching status
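The cost formula itself did not survive in this transcript, so the sketch below is an assumption rather than the authors' equation. It follows the usual VPR pattern of path cost plus a weighted expected cost, and applies a hypothetical multiplicative penalty when Matched(T, n) is false. The parameter names `astar_fac` and `mismatch_penalty` and their default values are illustrative.

```python
def total_cost(path_cost_dv, expected_dv, matched,
               astar_fac=1.2, mismatch_penalty=10.0):
    """Cost of extending routing tree T through wire segment n toward sink j.

    path_cost_dv: PathCostDv(n), cost from the partial tree to segment n.
    expected_dv:  ExpectedDv(n, j), estimated cost from n to the sink.
    matched:      Matched(T, n), True if n's Vdd level matches the tree's.
    The penalty form is an assumption made for this sketch.
    """
    base = path_cost_dv + astar_fac * expected_dv
    return base if matched else base * mismatch_penalty


# A Vdd-matched segment wins over a slightly shorter but mismatched one.
cost_matched = total_cost(2.0, 1.0, matched=True, astar_fac=1.0)
cost_mismatched = total_cost(1.5, 1.0, matched=False, astar_fac=1.0)
```

With a penalty like this, the maze router explores Vdd-matched tracks first and falls back to detours only when matched tracks are exhausted, which is the layout constraint the preceding slides describe.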
Outline • Review and motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • FPGA fabrics and algorithms • Quantitative evaluation • Conclusions and Ongoing Work
Comparison of Low Power Architectures [Figure: power (watt) vs. clock frequency (MHz) for circuit s38584, with curves for arch-SV, arch-PV, arch-PV+PG and arch-DV+PG (1.5W) at several Vdd settings] • Dual-Vdd interconnects with fine-grained power gating • May have performance degradation due to layout constraint • Can reduce more power than purely power-gating unused switches • Achieve 9.78% interconnect dynamic power reduction, 38.68% total power saving with 1.5W channel width • W is the nominal routing channel width in single-Vdd FPGA
Impact of Routing Channel Width [Figure: power saving (%) and normalized clock frequency vs. channel width from 1.0W to 2.0W; labeled power savings 34.86%, 38.68%, 45.00%; labeled normalized clock frequencies 0.743, 0.838, 0.955] • We get the power reduction percentage at the maximum clock frequency achieved by dual-Vdd interconnects • Channel width increases from 1.0W to 2.0W • Power saving increases from 34.86% to 45% • Normalized clock frequency increases from 0.743 to 0.955
Area Overhead of Vdd-gateable Interconnects • Device area is dominant • Area overhead is mainly due to power transistors for power-gating capability • Track duplication with power-gating vs Vdd-programmable interconnects [Li et al, ICCAD’04] • More power reduction (45% vs 25%) & less area overhead • Mainly due to Vdd-level converter removal • High Vdd interconnects with power gating are BEST considering area
Outline • Review and motivation • Interconnect Leakage Power Reduction using Power-gating • Interconnect Dynamic Power Reduction using Dual-Vdd • Conclusions and Ongoing Work
Conclusions and Ongoing Work • Conclusions • Developed power-gateable interconnects w/ virtually no extra SRAM cell • Achieved 38.18% total power reduction using Vdd-gateable interconnects • Achieved 24.78% interconnect dynamic power reduction, 45.00% total power reduction with duplicated (2W) channel width • Ongoing work • Power-ground design to support dual-Vdd • Optimal mix of Vdd-programmable and Vdd-gateable interconnects • Architecture evaluation considering Vdd programmability [Lin et al, to appear in FPGA’05]