Marvin Tom University of British Columbia Department of Electrical and Computer Engineering

Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays
Marvin Tom
University of British Columbia
Department of Electrical and Computer Engineering
Vancouver, BC, Canada

Contributions
• Two new FPGA benchmark circuit “suites”
  • Meta Circuit: mimics “System-on-Chip” design by randomly “stitching” real designs
  • Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
• Two new FPGA CAD flows
  • DHPack: Design Hierarchy Packing
    • Identify congested IP blocks → depopulate → reduced interconnect demand
    • Conference paper: “Logic Block Clustering…”, published at DAC 2005
  • Un/DoPack: UnPack and DoPack
    • Find “local” interconnect congestion → depopulate → reduced interconnect demand
    • Conference paper, submitted to DAC 2006
• Discoveries…
  • “Non-uniform” depopulation limits area inflation
  • “BLE limiting” gives better interconnect controllability than “Input limiting”
  • “Interconnect variation” is important for area inflation and FPGA architecture design
  • “Routing closure” achieved by re-clustering and incremental place & route
  • UNROUTABLE circuits made ROUTABLE: buy an FPGA with MORE LOGIC!

Mesh-Based FPGA Architecture
[Figure: two mesh FPGAs drawn as grids of logic blocks (L), one with 16 blocks and one with 9]
• 16 logic blocks
  • 4 wires per channel
  • 4*4=16 total horizontal tracks
• 9 logic blocks
  • 4 wires per channel
  • 3*4=12 total horizontal tracks
• Larger FPGAs have more “aggregate” interconnect
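The track arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch of the slide's own counting convention (one horizontal channel per row of logic blocks); the function name is hypothetical, and a real device may also have perimeter channels that this count ignores.

```python
def total_horizontal_tracks(rows: int, wires_per_channel: int) -> int:
    """Aggregate horizontal tracks, counting one channel per row of logic blocks.

    Assumption: this mirrors the slide's arithmetic (rows * wires per channel),
    not the exact channel count of any particular FPGA.
    """
    return rows * wires_per_channel


if __name__ == "__main__":
    # Square arrays from the slide: 3x3 = 9 blocks and 4x4 = 16 blocks.
    for side in (3, 4):
        tracks = total_horizontal_tracks(side, wires_per_channel=4)
        print(f"{side * side} logic blocks: {tracks} horizontal tracks")
```

With the channel width held at 4, moving from the 9-block array to the 16-block array adds a whole channel of tracks, which is the sense in which a larger FPGA has more aggregate interconnect.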

Logic Utilization vs. Channel Width
[Figure: FPGA 1 and FPGA 2, two logic block arrays of different sizes]
• Trade off logic utilization for channel width
• User can always buy more logic… (not more wires)
• Trade-off: CLB count for channel width
• But… can we achieve lower total area? (= SIZE * CLB count)
  • No! But we can break even!

Logic Element: BLE and CLB
[Figure: a CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
• Basic Logic Element (BLE)
  • ‘k’-input LUT + FF
• Configurable Logic Block (CLB)
  • ‘N’ BLEs, ‘N’ outputs
  • ‘I’ shared inputs
• Note: I < k*N
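The relationship between k, N and I can be written down as a small data structure. This is an illustrative sketch only: the class and method names are hypothetical, N=16 and I=51 are the cluster parameters quoted on the results slide, and k=6 is an assumed LUT size that the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClbArchitecture:
    k: int  # inputs per LUT (one LUT + FF per BLE)
    n: int  # BLEs per CLB, which is also the number of CLB outputs
    i: int  # inputs shared by all BLEs in the CLB

    def worst_case_ble_inputs(self) -> int:
        """Inputs needed if no two BLE input pins shared a signal."""
        return self.k * self.n

    def inputs_are_shared(self) -> bool:
        """True when the CLB relies on input sharing, i.e. I < k*N."""
        return self.i < self.worst_case_ble_inputs()


if __name__ == "__main__":
    # k=6 is an assumption for illustration; N and I come from the results slide.
    arch = ClbArchitecture(k=6, n=16, i=51)
    print(arch.worst_case_ble_inputs(), arch.inputs_are_shared())
```

Because I is smaller than k*N, a CLB can run out of inputs before it runs out of BLEs, which is why the two resources can be limited separately.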

CLB Depopulation
[Figure: a CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]
• General Approach
  • Use existing clustering tools
  • Do not fill CLB while clustering
• Input-Limited
  • E.g. maximum 67% input utilization per CLB
  • Might use all BLEs
• BLE-Limited
  • E.g. maximum 60% BLE utilization per CLB
  • Might use all inputs
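The two limiting styles can be expressed as a single admission check applied while a clustering tool fills a CLB. The sketch below is not the clustering tool's actual code: the function name is hypothetical, and rounding the percentage caps down to whole BLEs or inputs is an assumption.

```python
import math


def can_add_ble(used_bles: int, used_inputs: int, new_inputs: int, *,
                n: int, i: int,
                ble_util: float = 1.0, input_util: float = 1.0) -> bool:
    """Return True if one more BLE fits under the depopulation caps.

    n, i        -- BLEs and shared inputs physically available in the CLB
    ble_util    -- fraction of BLEs that may be used (BLE-limited mode)
    input_util  -- fraction of inputs that may be used (input-limited mode)
    new_inputs  -- extra distinct CLB inputs the candidate BLE would need
    """
    ble_cap = math.floor(n * ble_util)        # assumption: round down
    input_cap = math.floor(i * input_util)    # assumption: round down
    return used_bles + 1 <= ble_cap and used_inputs + new_inputs <= input_cap


if __name__ == "__main__":
    # BLE-limited at 60% of 16 BLEs: the cap is 9 BLEs, all 51 inputs usable.
    print(can_add_ble(8, 30, 2, n=16, i=51, ble_util=0.60))
    print(can_add_ble(9, 30, 2, n=16, i=51, ble_util=0.60))
    # Input-limited at 67% of 51 inputs: the cap is 34 inputs, all BLEs usable.
    print(can_add_ble(10, 33, 1, n=16, i=51, input_util=0.67))
    print(can_add_ble(10, 33, 2, n=16, i=51, input_util=0.67))
```

Setting only one of the two fractions below 1.0 gives the pure BLE-limited or input-limited behaviour described on the slide.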

Reducing Channel Width: Results (max cluster size 16, max num inputs 51)
• Input-Limited
  • No channel width control
• BLE-Limited
  • (almost) monotonically increasing → good channel width control

Meta Benchmark Circuit Creation
• Mimic process of creating large designs
  • “IP Blocks” <==> MCNC Circuits
  • SoC <==> Randomly integrate/stitch together “IP Blocks”
  • IP Blocks have varied interconnect needs
• Considered 3 stitching schemes…
  • Independent
    • IP Blocks are not connected to each other
  • Pipeline
    • Outputs of one IP block connected to inputs of next IP block
  • Clique
    • Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
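The three stitching schemes differ only in which IP blocks feed which. A block-level sketch follows; the function names are hypothetical, and the round-robin assignment used for the clique case is just one possible reading of “uniformly distributed”.

```python
def stitch(num_blocks: int, scheme: str) -> list[tuple[int, int]]:
    """Return (source block, destination block) pairs for a stitching scheme."""
    if scheme == "independent":
        return []  # IP blocks are not connected to each other
    if scheme == "pipeline":
        # Outputs of block b drive inputs of block b + 1.
        return [(b, b + 1) for b in range(num_blocks - 1)]
    if scheme == "clique":
        # Every block drives every other block.
        return [(s, d) for s in range(num_blocks)
                for d in range(num_blocks) if s != d]
    raise ValueError(f"unknown stitching scheme: {scheme}")


def distribute_outputs(src: int, num_outputs: int, num_blocks: int) -> dict[int, int]:
    """Map each output pin of `src` to a destination block for the clique scheme.

    Assumption: round-robin over the other blocks, so each destination
    receives an (almost) equal share of the outputs.
    """
    others = [b for b in range(num_blocks) if b != src]
    return {pin: others[pin % len(others)] for pin in range(num_outputs)}


if __name__ == "__main__":
    print(stitch(3, "pipeline"))
    print(len(stitch(4, "clique")))
    print(distribute_outputs(0, 6, 4))
```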

DHPack: Meta Circuit P&R
• Use VPR FPGA tools from University of Toronto
• Observation 1
  • VPR placer successfully groups IP blocks from random initial placement
• Observation 2
  • VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

DHPack: Meta Circuit P&R Results
[Figure: routed channel width and normalized area, each plotted against the channel width constraint]
• Clique MetaCircuit
• P&R channel width results closely match “constraints”
• Shrink channel width
  • by ~20% (from 95 to 75): NO AREA INCREASE
  • by ~50% (from 95 to 50): 1.7x area increase

Meta Circuits vs. Stdev Circuits
• Meta Circuit Drawbacks
  • Design hierarchy boundaries not well-defined
    • Coarse-grained IP block boundary
  • Stitching unrealistic
    • Flip-flop placed at every output
    • Connections only have FO1
• Stdev Circuits (created using GNL)
  • Synthetic clone of Meta circuits
  • Hierarchical → specify Rent parameter of each partition
    • Root → # I/Os, # IP blocks
    • Second Level → 20 IP blocks, # LEs, Rent parameter

Stdev Circuits: Rent Parameters
• 7 benchmark circuits
• 240/120 primary inputs/outputs, approx. 52,000 CLBs
• Rent parameter: average 0.62, vary Stdev 0.0 to 0.12
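To make the “average 0.62, vary Stdev” idea concrete, the sketch below draws one Rent parameter per IP block and evaluates Rent's rule, T = t * B^p, for a block of B logic elements. It is an illustration only: drawing from a normal distribution, the seed, and the function names are all assumptions, and the slides do not say how the per-block values were actually chosen or passed to GNL.

```python
import random


def sample_rent_parameters(num_blocks: int, mean: float, stdev: float,
                           seed: int = 0) -> list[float]:
    """Draw one Rent parameter per IP block (assumed normally distributed)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mean, stdev) for _ in range(num_blocks)]


def rent_terminals(blocks: int, p: float, t: float = 1.0) -> float:
    """Rent's rule: expected terminals T = t * B**p for a partition of B blocks."""
    return t * blocks ** p


if __name__ == "__main__":
    # 20 IP blocks per circuit, as on the previous slide; stdev values are examples.
    for stdev in (0.0, 0.06, 0.12):
        params = sample_rent_parameters(20, mean=0.62, stdev=stdev)
        print(f"stdev={stdev:.2f} min p={min(params):.2f} max p={max(params):.2f}")
```

A larger spread in p means some IP blocks demand far more interconnect than others, even though the average demand is unchanged.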

Un/DoPack Flow
• Iterative non-uniform cluster depopulation tool
• Step 1: Traditional SIS/VPR
• Step 2: UnPack
  • Congestion Calculator
• Step 3: DoPack
  • Incremental Re-Cluster
• Step 4, 5: Fast Place/Route

Un/DoPack Flow: SIS/VPR
• Step 1: Traditional SIS/VPR

Un/DoPack Flow: UnPack
• Step 2: UnPack
  • Generate Congestion Map
  • CLB Label = Largest CW occupancy in 4 adjacent channels

Un/DoPack Flow: UnPack
[Figure: M x M array]
• Step 2: UnPack
  • Depop Center = Largest CLB label

Un/DoPack Flow: UnPack
[Figure: M x M array]
• Step 2: UnPack
  • Depop Radius = M/4
  • Depop Amt: 1 new row/col in array
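The three UnPack slides above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the channel indexing, the row-major tie-break for the centre, and the use of a square (Chebyshev) neighbourhood for the radius are all assumptions, and every function name is hypothetical.

```python
def clb_labels(h_occ: list[list[int]], v_occ: list[list[int]]) -> list[list[int]]:
    """Label each CLB with the largest occupancy among its 4 adjacent channels.

    h_occ has M+1 rows of M horizontal channel segments (below and above each row);
    v_occ has M rows of M+1 vertical channel segments (left and right of each column).
    """
    m = len(v_occ)
    return [[max(h_occ[r][c], h_occ[r + 1][c], v_occ[r][c], v_occ[r][c + 1])
             for c in range(m)] for r in range(m)]


def depop_center(labels: list[list[int]]) -> tuple[int, int]:
    """Pick the CLB with the largest label (first one in row-major order on ties)."""
    m = len(labels)
    cells = ((r, c) for r in range(m) for c in range(m))
    return max(cells, key=lambda rc: labels[rc[0]][rc[1]])


def depop_region(center: tuple[int, int], m: int) -> list[tuple[int, int]]:
    """CLBs within radius M/4 of the centre (assumed square neighbourhood)."""
    radius = m / 4
    cr, cc = center
    return [(r, c) for r in range(m) for c in range(m)
            if max(abs(r - cr), abs(c - cc)) <= radius]


if __name__ == "__main__":
    m = 4
    h_occ = [[0] * m for _ in range(m + 1)]
    v_occ = [[0] * (m + 1) for _ in range(m)]
    h_occ[2][1] = 9  # one congested horizontal channel segment
    labels = clb_labels(h_occ, v_occ)
    center = depop_center(labels)
    print(center, len(depop_region(center, m)))
```

The CLBs returned by `depop_region` are the candidates a re-clustering step would depopulate; how many BLEs to move out of them is governed by the “1 new row/col” rule on the slide, which this sketch does not model.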

Un/DoPack Flow: DoPack
• Step 3: DoPack
  • Incremental Re-Cluster

Un/DoPack Flow: Fast P&R
• Step 4, 5: Fast Place/Route
  • Fast Placement
    • UBC Incremental Placer (under development)
    • VPR “-fast” option
  • Router
    • Use full routed solution
    • Slow but reliable

Before: 120 / 79 / 27 (Peak / Avg / Stddev)
After: 100 / 79 / 20 (Peak / Avg / Stddev)

Normalized Area of GNL Benchmarks

Absolute Area of GNL Benchmarks

Interconnect Variation: Impact on FPGA Architecture Design
• High-variation circuits require wide channel width

Contributions
• Two new FPGA benchmark circuit “suites”
  • Meta Circuit: mimics “System-on-Chip” design by randomly “stitching” real designs
  • Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
• Two new FPGA CAD flows
  • DHPack: Design Hierarchy Packing
    • Identify congested IP blocks → depopulate → reduced interconnect demand
    • Conference paper: “Logic Block Clustering…”, published at DAC 2005
  • Un/DoPack: UnPack and DoPack
    • Find “local” interconnect congestion → depopulate → reduced interconnect demand
    • Conference paper, submitted to DAC 2006
• Discoveries…
  • “Non-uniform” depopulation limits area inflation
  • “BLE limiting” gives better interconnect controllability than “Input limiting”
  • “Interconnect variation” is important for area inflation and FPGA architecture design
  • “Routing closure” achieved by re-clustering and incremental place & route
  • UNROUTABLE circuits made ROUTABLE: buy an FPGA with MORE LOGIC!

End of Talk
