Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays
Marvin Tom
University of British Columbia
Department of Electrical and Computer Engineering
Vancouver, BC, Canada
Contributions
• Two new FPGA benchmark circuit “suites”
  • Meta Circuit: mimics “System-on-Chip” design by randomly “stitching” real designs
  • Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
• Two new FPGA CAD flows
  • DHPack: Design Hierarchy Packing
    • Identify congested IP blocks ==> depopulate ==> reduced interconnect demand
    • Conference paper: “Logic Block Clustering…”, published at DAC 2005
  • Un/DoPack: UnPack and DoPack
    • Find “local” interconnect congestion ==> depopulate ==> reduced interconnect demand
    • Conference paper, submitted to DAC 2006
• Discoveries…
  • “Non-uniform” depopulation limits area inflation
  • “BLE limiting” gives better interconnect controllability than “Input limiting”
  • “Interconnect variation” is important for area inflation and FPGA architecture design
  • “Routing closure” achieved by re-clustering and incremental place & route
  • UNROUTABLE circuits made ROUTABLE ==> buy an FPGA with MORE LOGIC!!!
Mesh-Based FPGA Architecture
[Figure: two mesh FPGAs built from logic blocks (L), one with 16 blocks and one with 9]
• 16 logic blocks
  • 4 wires per channel
  • 4*4=16 total horizontal tracks
• 9 logic blocks
  • 4 wires per channel
  • 3*4=12 total horizontal tracks
• Larger FPGAs have more “aggregate” interconnect
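The track arithmetic on this slide can be sketched in a few lines. This is only an illustration: it assumes a square array and counts one horizontal channel per row of logic blocks, which is the convention the slide's 4*4=16 and 3*4=12 figures follow. The function name is made up for the example.

```python
def total_horizontal_tracks(array_side: int, wires_per_channel: int) -> int:
    """Aggregate horizontal tracks in a square mesh FPGA.

    ASSUMPTION: one horizontal channel is counted per row of logic blocks,
    matching the slide's 4*4=16 and 3*4=12 examples.
    """
    return array_side * wires_per_channel


for side in (3, 4):
    blocks = side * side
    tracks = total_horizontal_tracks(side, wires_per_channel=4)
    print(f"{blocks} logic blocks: {tracks} total horizontal tracks")
```

With the channel width held at 4, the larger array still ends up with more tracks in total, which is the “aggregate” interconnect point made above.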
Logic Utilization vs. Channel Width
[Figure: FPGA 1 and FPGA 2, trading CLB count for channel width]
• Trade off logic utilization for channel width
• User can always buy more logic… (not more wires)
• But… can we achieve lower Total Area ( = SIZE * CLB Count )?
  • No! But we can break even!
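To make the break-even question concrete, here is a rough sketch of the Total Area = SIZE * CLB Count relation. The linear tile-area model and both of its constants are assumptions introduced purely for illustration; the slides do not give an area model.

```python
def tile_size(channel_width, logic_area=1.0, area_per_track=0.02):
    """Area of one CLB tile (the slide's SIZE).

    ASSUMPTION: a fixed logic area plus routing area that grows linearly
    with channel width. Both constants are hypothetical.
    """
    return logic_area + area_per_track * channel_width


def total_area(clb_count, channel_width):
    """Total Area = SIZE * CLB Count, as defined on the slide."""
    return tile_size(channel_width) * clb_count


def break_even_clb_count(clb_count, old_width, new_width):
    """CLB count at which the narrower FPGA matches the original total area."""
    return clb_count * tile_size(old_width) / tile_size(new_width)


# Example with made-up numbers: 1000 CLBs at channel width 95 versus width 75.
budget = break_even_clb_count(1000, old_width=95, new_width=75)
print(f"break-even CLB count at width 75: {budget:.0f}")
```

Under this toy model, a narrower channel makes each tile smaller, so depopulation can spend some extra CLBs before the total area exceeds the original.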
Logic Element: BLE and CLB
[Figure: a CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
• Basic Logic Element (BLE)
  • ‘k’-input LUT + FF
• Configurable Logic Block (CLB)
  • ‘N’ BLEs, ‘N’ outputs
  • ‘I’ shared inputs
• Note: I < k*N
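The three architecture parameters and the I < k*N note can be captured in a small record type. This is a sketch only: the class and method names are invented for the example, N=16 and I=51 are taken from the results slide later in the deck, and k=6 is an assumed LUT size that the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClbArchitecture:
    k: int  # inputs per LUT (one LUT + FF per BLE)
    n: int  # BLEs per CLB, which is also the number of CLB outputs
    i: int  # shared CLB inputs

    def inputs_are_shared(self) -> bool:
        """True when I < k*N, i.e. the BLEs cannot all have private inputs."""
        return self.i < self.k * self.n


# k=6 is an ASSUMPTION; N=16 and I=51 come from the results slide.
arch = ClbArchitecture(k=6, n=16, i=51)
print(arch, "shared inputs:", arch.inputs_are_shared())
```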
CLB Depopulation
[Figure: a CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
• General Approach
  • Use existing clustering tools
  • Do not fill CLB while clustering
• Input-Limited
  • E.g. maximum 67% input utilization per CLB
  • Might use all BLEs
• BLE-Limited
  • E.g. maximum 60% BLE utilization per CLB
  • Might use all inputs
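The two depopulation limits above amount to capping one resource while leaving the other free. A minimal sketch follows, assuming the percentage cap is rounded down to a whole number of BLEs or inputs (the slides do not say how rounding is handled). The function names are hypothetical, and N=16, I=51 are the values quoted on the results slide.

```python
def capped(capacity: int, max_percent: int) -> int:
    """Usable amount of a CLB resource under a utilization cap.

    ASSUMPTION: the cap is rounded down; integer math avoids float error.
    """
    return capacity * max_percent // 100


def cluster_fits(bles_used, inputs_used, n, i, ble_percent=100, input_percent=100):
    """Check a candidate cluster against BLE and input caps for one CLB."""
    return (bles_used <= capped(n, ble_percent)
            and inputs_used <= capped(i, input_percent))


N, I = 16, 51
print("BLE-limited (60%):", capped(N, 60), "BLEs, all", I, "inputs allowed")
print("Input-limited (67%):", capped(I, 67), "inputs, all", N, "BLEs allowed")
```

A clustering tool would apply a check like `cluster_fits` before adding another BLE to a cluster, so an existing packer only needs tighter capacity numbers, not a new algorithm.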
Reducing Channel Width: Results (max cluster size 16, max num inputs 51)
• Input-Limited
  • No channel width control
• BLE-Limited
  • (Almost) monotonically increasing ==> good channel width control
Meta Benchmark Circuit Creation
• Mimic process of creating large designs
  • “IP Blocks” <==> MCNC Circuits
  • SoC <==> Randomly integrate/stitch together “IP Blocks”
  • IP Blocks have varied interconnect needs
• Considered 3 stitching schemes…
  • Independent
    • IP Blocks are not connected to each other
  • Pipeline
    • Outputs of one IP block connected to inputs of next IP block
  • Clique
    • Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
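The three stitching schemes can be sketched as a function that produces block-to-block wires. This is an illustrative reading of the bullets, not the actual benchmark generator: the block names, the I/O counts, the round-robin interpretation of “uniformly distributed”, and the rule that each block input is driven at most once are all assumptions.

```python
import random


def stitch(blocks, scheme, seed=0):
    """Return wires as (src_block, src_output, dst_block, dst_input) tuples.

    blocks maps a block name to (num_inputs, num_outputs).
    """
    rng = random.Random(seed)
    order = list(blocks)
    rng.shuffle(order)  # "randomly" choose the integration order
    free_inputs = {name: list(range(blocks[name][0])) for name in order}
    wires = []

    def connect(src, out_idx, dst):
        # ASSUMPTION: an input is driven at most once; extra outputs stay unconnected.
        if free_inputs[dst]:
            wires.append((src, out_idx, dst, free_inputs[dst].pop(0)))

    if scheme == "independent":
        pass  # IP blocks are not connected to each other
    elif scheme == "pipeline":
        for src, dst in zip(order, order[1:]):
            for out_idx in range(blocks[src][1]):
                connect(src, out_idx, dst)
    elif scheme == "clique":
        for src in order:
            others = [name for name in order if name != src]
            if not others:
                continue
            for out_idx in range(blocks[src][1]):
                # Round-robin spreads outputs evenly over all other blocks.
                connect(src, out_idx, others[out_idx % len(others)])
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return wires


# Hypothetical IP blocks with 4 inputs and 2 outputs each.
ip_blocks = {"A": (4, 2), "B": (4, 2), "C": (4, 2)}
for scheme in ("independent", "pipeline", "clique"):
    print(scheme, len(stitch(ip_blocks, scheme)), "wires")
```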
DHPack: Meta Circuit P&R
• Use VPR FPGA tools from University of Toronto
• Observation 1
  • VPR placer successfully groups IP blocks from random initial placement
• Observation 2
  • VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }
DHPack: Meta Circuit P&R Results
[Plots: routed channel width vs. channel width constraint; normalized area vs. channel width constraint]
• Clique MetaCircuit
• P&R channel width results closely match “constraints”
• Shrink channel width
  • by ~20% (from 95 to 75): NO AREA INCREASE
  • by ~50% (from 95 to 50): 1.7x area increase
Meta Circuits vs. Stdev Circuits
• Meta Circuit Drawbacks
  • Design hierarchy boundaries not well-defined
    • Coarse-grained IP block boundary
  • Stitching unrealistic
    • Flip-flop placed at every output
    • Connections only have a fanout of 1 (FO1)
• Stdev Circuits (created using GNL)
  • Synthetic clone of Meta circuits
  • Hierarchical: specify Rent parameter of each partition
    • Root: # I/Os, # IP blocks
    • Second level: 20 IP blocks, # LEs, Rent parameter
Stdev Circuits: Rent Parameters
• 7 benchmark circuits
• 240/120 primary inputs/outputs, approx. 52,000 CLBs
• Rent parameter: average 0.62, Stdev varied from 0.0 to 0.12
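One way to picture how a benchmark in this suite gets its per-block Rent parameters is sketched below. It is not how GNL or the original scripts work: drawing from a normal distribution, rescaling the sample to hit the target mean and standard deviation exactly, and using 20 blocks per circuit (the second-level IP block count mentioned earlier) are all assumptions for illustration.

```python
import random
import statistics


def sample_rent_parameters(num_blocks=20, mean=0.62, stdev=0.0, seed=0):
    """Draw one Rent parameter per IP block with the requested mean and stdev.

    ASSUMPTION: values are normally distributed, then standardized so the
    sample's (population) mean and stdev match the targets exactly.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(num_blocks)]
    mu = statistics.fmean(raw)
    sigma = statistics.pstdev(raw)
    return [mean + stdev * (x - mu) / sigma for x in raw]


for target in (0.0, 0.06, 0.12):
    params = sample_rent_parameters(stdev=target)
    print(f"stdev={target:.2f}: min={min(params):.2f} max={max(params):.2f}")
```

With a stdev of 0.0 every block gets the same Rent parameter, so interconnect demand is uniform; larger values spread the blocks out while the average demand stays fixed.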
Un/DoPack Flow
• Iterative non-uniform cluster depopulation tool
• Step 1: Traditional SIS/VPR
• Step 2: UnPack
  • Congestion Calculator
• Step 3: DoPack
  • Incremental Re-Cluster
• Steps 4, 5: Fast Place/Route
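The five steps can be read as a loop that repeats until the routed channel width fits the target. The sketch below shows only that control flow. Every callable it takes (`cluster`, `place`, `route`, `unpack`, `dopack`), the stopping rule, and the iteration cap are hypothetical stand-ins, not the actual tool interfaces.

```python
def un_do_pack(netlist, width_limit, cluster, place, route, unpack, dopack,
               max_iterations=50):
    """Iterate UnPack/DoPack until the routed channel width meets the limit.

    All five callables are HYPOTHETICAL interfaces supplied by the caller.
    Returns (clustering, routed_width, iterations_used).
    """
    clustering = cluster(netlist)               # Step 1: traditional flow
    placement = place(clustering, None)
    width, congestion = route(placement)
    iterations = 0
    while width > width_limit and iterations < max_iterations:
        region = unpack(congestion)              # Step 2: find congestion
        clustering = dopack(clustering, region)  # Step 3: incremental re-cluster
        placement = place(clustering, placement) # Step 4: fast placement
        width, congestion = route(placement)     # Step 5: route
        iterations += 1
    return clustering, width, iterations


# Toy stand-ins: each depopulation round lowers the routed width by 5 tracks.
result = un_do_pack(
    "toy-netlist", 90,
    cluster=lambda netlist: 0,
    place=lambda clustering, previous: clustering,
    route=lambda placement: (100 - 5 * placement, None),
    unpack=lambda congestion: "hot region",
    dopack=lambda clustering, region: clustering + 1,
)
print(result)
```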
Un/DoPack Flow: SIS/VPR
• Step 1: Traditional SIS/VPR
Un/DoPack Flow: UnPack
• Step 2: UnPack
  • Generate Congestion Map
  • CLB label = largest channel width (CW) occupancy in the 4 adjacent channels
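The labeling rule can be sketched as follows. The data layout is an assumption made for the example: an M x M CLB array with M+1 rows of horizontal channel segments and M+1 columns of vertical channel segments, each storing how many tracks the router used.

```python
def clb_labels(horiz, vert):
    """Label each CLB with the largest occupancy among its 4 adjacent channels.

    ASSUMED layout for an M x M CLB array:
      horiz[r][c]: channel above CLB (r, c); horiz[r + 1][c] is the one below.
      vert[r][c]:  channel left of CLB (r, c); vert[r][c + 1] is the one right.
    """
    m = len(vert)
    return [
        [max(horiz[r][c], horiz[r + 1][c], vert[r][c], vert[r][c + 1])
         for c in range(m)]
        for r in range(m)
    ]


# 2 x 2 example with made-up occupancies.
horiz = [[1, 2], [3, 4], [5, 6]]   # 3 rows of horizontal channel segments
vert = [[7, 1, 2], [0, 9, 1]]      # 3 columns of vertical channel segments
print(clb_labels(horiz, vert))     # → [[7, 4], [9, 9]]
```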
Un/DoPack Flow: UnPack (M x M array)
• Step 2: UnPack
  • Depop center = CLB with the largest label
  • Depop radius = M/4
  • Depop amount: 1 new row/column in the array
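A sketch of how the depopulation region could be selected from the CLB labels. Several details are assumptions, since the slides give only the three rules: ties go to the first CLB in row-major order, M/4 is rounded down, the region is a square (Chebyshev distance) clipped to the array, and the depopulation amount is expressed as the number of CLB slots gained by growing the array from M x M to (M+1) x (M+1).

```python
def depop_region(labels):
    """Pick the depopulation center, radius, region, and extra CLB slots.

    labels is an M x M grid of per-CLB congestion labels.
    """
    m = len(labels)
    cells = [(r, c) for r in range(m) for c in range(m)]
    # Depop center: CLB with the largest label (first one wins on ties).
    center = max(cells, key=lambda rc: labels[rc[0]][rc[1]])
    radius = m // 4  # ASSUMPTION: M/4 rounded down
    # ASSUMPTION: square region measured with Chebyshev distance.
    region = [(r, c) for r, c in cells
              if max(abs(r - center[0]), abs(c - center[1])) <= radius]
    # One new row and column: (M+1)^2 - M^2 = 2M + 1 additional CLB slots.
    extra_slots = (m + 1) ** 2 - m ** 2
    return center, radius, region, extra_slots


grid = [[0] * 8 for _ in range(8)]
grid[5][2] = 9  # a single made-up hot spot
center, radius, region, extra_slots = depop_region(grid)
print(center, radius, len(region), extra_slots)  # → (5, 2) 2 25 17
```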
Un/DoPack Flow: DoPack
• Step 3: DoPack
  • Incremental Re-Cluster
Un/DoPack Flow: Fast P&R
• Steps 4, 5: Fast Place/Route
  • Fast Placement
    • UBC Incremental Placer (under development)
    • VPR “-fast” option
  • Router
    • Use full routed solution
    • Slow but reliable
Peak / Avg / Stddev
• Before: 120 / 79 / 27
• After: 100 / 79 / 20
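Summary figures of this kind can be computed from a list of per-channel occupancies, as in the short sketch below. The input numbers are made up, and the use of the population standard deviation is an assumption, since the slides do not say which variant was reported.

```python
import statistics


def congestion_summary(occupancies):
    """Return (peak, average, stddev) of channel occupancies.

    ASSUMPTION: population standard deviation (statistics.pstdev).
    """
    return (max(occupancies),
            statistics.fmean(occupancies),
            statistics.pstdev(occupancies))


peak, avg, stddev = congestion_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(peak, avg, stddev)  # → 9 5.0 2.0
```

In the before/after figures above the average is unchanged while the peak and the standard deviation both drop, which is the signature of spreading congestion out rather than removing demand.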
Interconnect Variation: Impact on FPGA Architecture Design
• High variation circuits require wide channel width