ACME LAB
Exploration of Pipelined FPGA Interconnect Structures
Scott Hauck, Akshay Sharma, Carl Ebeling (University of Washington)
Katherine Compton (University of Wisconsin - Madison)
PipeRoute
• FPGA 2003: Pipelining-aware Router for FPGAs
• Architecture-adaptive, based on Pathfinder
• Uses optimal 2-terminal, 1-delay router
• Greedy formulation for multi-delay, multi-terminal routing
[Figure: signal routed from source S to sinks T1 and T2]
RaPiD
• Coarse-grained, 1D, 16-bit, with DSP units
• Carl Ebeling @ UW-CSE
• Pipelined interconnect via Bus Connectors (BCs)
[Figure: RaPiD cell made up of GPR, RAM, MULT, and ALU units]
Pipelined Routing Results
• Area expansion due to pipelining
• Normalized to unpipelined circuit area
Ave: 75% cost
Contributions
• Optimized PipeRoute
  • Support multiple delays per BC (greedy preprocessor)
  • Timing-driven (Pathfinder's formulation, worst-case criticality across the signal)
  • RouteCost = criticality * delay_cost + (1 - criticality) * area_cost
• Arch. Exploration of RaPiD Pipelined Interconnects
  • Registered logic block (input/output/none)
  • BC track length
  • Delays per register/BC
  • BC/non-BC routing mix
  • Register-only logic blocks
• Goal: More efficient support of pipelined interconnects
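The RouteCost blend above can be sketched in a few lines. This is a minimal illustration, not PipeRoute's code: the function names, the assumption that criticality lies in [0, 1], and the reading of "worst-case criticality across the signal" as the maximum over a signal's sinks are all assumptions made for the example.

```python
def signal_criticality(sink_criticalities):
    """Assumed reading of 'worst-case criticality across signal':
    the signal takes the largest criticality among its sinks."""
    return max(sink_criticalities)


def route_cost(criticality, delay_cost, area_cost):
    """RouteCost = criticality * delay_cost + (1 - criticality) * area_cost.

    criticality is assumed to lie in [0, 1]: 1 means the route is judged
    purely on delay, 0 means it is judged purely on area (congestion).
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must be in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


# Hypothetical numbers: a signal with three sinks and one candidate route.
crit = signal_criticality([0.25, 0.5, 0.125])
cost = route_cost(crit, delay_cost=4.0, area_cost=8.0)
```

With these inputs the signal's criticality is 0.5, so delay and area are weighted equally in the cost.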
Methodology
• Benchmarks: retimed, not C-slowed
• Graphs
  • Increase arch to fit (cells, tracks/cell)
  • Variation around local minima
Registers in Logic Blocks
• Output Registers
• No Registers
• Input Registers
5%, 20%, 23%
Delays per Register/BC
• 1 Delay/BC
• 2 Delays/BC
15%, 20%, 30%
BC Track Length
• Length 16 BC wires
• Length 8 BC wires
17%, 64%, 69%
Routing Resource Mix (BC vs. non-BC)
• 5/7
• 7/7
19%, 17%, 18%
GPRs per Cell
• GPR roles:
  • Registers from computation
  • Passthrough for changing tracks
• 6 per cell
• 9 per cell
6%, 23%, 22%
Overall – vs. RaPiD-I
• RaPiD-I
  • 1 BC / cell (13 LBs long)
  • 5/7 BC tracks
  • 3 registers / BC
  • 6 GPRs / cell
  • Registered outputs
• Post-Explore
  • 1 BC / cell (16 LBs long)
  • 5/7 BC tracks
  • 3 registers / BC
  • 9 GPRs / cell
  • Registered inputs
Ave: 1%, 18%, 19%
Overall – Pipelining Cost
Ave: 18% cost
Conclusions
• Router for arbitrary pipelined architectures
  • Timing-driven
  • Supports multiple delays at each register site
  • Good quality: <18% above the pseudo-lower bound (non-pipelined) area
• Architecture Exploration of RaPiD
  • Parameters:
    • Registered inputs on functional units
    • Length 16 wires
    • 3 delays per BC/register
    • 2/7 non-registered, 5/7 registered wires
    • 9 GPRs/cell to improve flexibility
  • Delay: spacing of registers is CRITICAL; too close is better than too far
  • 19% area*delay improvement over RaPiD-I (primarily delay)
1-Delay Two Terminal
• Can do optimal routing for 1-delay routes via BFS
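One way to read the 1-delay BFS idea is as a breadth-first search over (node, delays so far) states, so that the first time the sink is reached with exactly one delay the route has the fewest hops. The sketch below is an illustration under assumptions, not PipeRoute's actual algorithm: it assumes unit-cost edges, a directed adjacency-dict graph, that every register node on the path adds exactly one delay, and that source and sink are not registers. The name `one_delay_route` and the example graph are hypothetical. As a conservative simplification it rejects a result that visits a node twice instead of searching for an alternative.

```python
from collections import deque


def one_delay_route(graph, registers, source, sink):
    """Fewest-hop path from source to sink crossing exactly one register.

    graph: dict mapping node -> iterable of successor nodes (assumed format).
    registers: set of nodes that each add one delay when entered.
    Returns a list of nodes, or None if no such route is found.
    """
    start = (source, 0)          # state = (node, delays picked up so far)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, delays = state
        if node == sink:
            if delays != 1:
                continue         # reached too early; never route through the sink
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            path.reverse()
            # Conservative check: a real route may not reuse a node.
            return path if len(set(path)) == len(path) else None
        for nxt in graph.get(node, ()):
            nxt_delays = delays + (1 if nxt in registers else 0)
            nxt_state = (nxt, nxt_delays)
            if nxt_delays <= 1 and nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


# Hypothetical routing graph: S -> a -> T is shorter but carries no register.
example_graph = {
    "S": ["a", "r1"],
    "a": ["T", "b"],
    "b": ["r2"],
    "r1": ["c"],
    "c": ["T"],
    "r2": ["T"],
}
example_route = one_delay_route(example_graph, {"r1", "r2"}, "S", "T")
```

Each node can be visited once per delay count, which is what lets the search pass over the shorter unregistered route and still terminate.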
N-Delay Two Terminal
• Greedy Approximation via 1-Delay Router
  • Find 1-delay route
  • While not enough delay on route
    • Replace any 0-delay segment with cheapest 1-delay replacement
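The greedy loop above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes unit-cost edges, a directed adjacency-dict graph, one delay per register node, and measures "cheapest" as the fewest extra hops. The helper `one_delay_route` is a simple state-space BFS written for this example, and all names and the example graph are hypothetical.

```python
from collections import deque


def one_delay_route(graph, registers, src, dst, blocked=frozenset()):
    """Fewest-hop src->dst path with exactly one register strictly inside it."""
    start = (src, 0)                       # state = (node, delays so far)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, delays = state
        if node == dst:
            if delays != 1:
                continue                   # never route through the endpoint
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            path.reverse()
            # Conservative: reject a path that reuses a node.
            return path if len(set(path)) == len(path) else None
        for nxt in graph.get(node, ()):
            if nxt in blocked:
                continue
            # The endpoint itself never counts as the added delay.
            is_delay = nxt in registers and nxt != dst
            nxt_state = (nxt, delays + (1 if is_delay else 0))
            if nxt_state[1] <= 1 and nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


def n_delay_route(graph, registers, src, dst, n):
    """Greedy approximation: grow a 1-delay route one register at a time."""
    route = one_delay_route(graph, registers, src, dst)
    if route is None:
        return None
    delays = 1
    while delays < n:
        # 0-delay segments lie between consecutive anchors:
        # the source, each register already on the route, and the sink.
        last = len(route) - 1
        anchors = [0] + [i for i in range(1, last) if route[i] in registers] + [last]
        best = None
        for lo, hi in zip(anchors, anchors[1:]):
            # Keep the replacement off the rest of the existing route.
            blocked = set(route[:lo]) | set(route[hi + 1:])
            detour = one_delay_route(graph, registers, route[lo], route[hi], blocked)
            if detour is None:
                continue
            extra_hops = len(detour) - (hi - lo + 1)
            if best is None or extra_hops < best[0]:
                best = (extra_hops, lo, hi, detour)
        if best is None:
            return None                    # no segment can absorb another delay
        _, lo, hi, detour = best
        route = route[:lo] + detour + route[hi + 1:]
        delays += 1
    return route


# Hypothetical graph: the 1-delay route S, r1, c, T can pick up r2 after c.
example_graph = {"S": ["a", "r1"], "a": ["T"], "r1": ["c"], "c": ["T", "r2"], "r2": ["T"]}
two_delay = n_delay_route(example_graph, {"r1", "r2"}, "S", "T", 2)
```

Because each step commits to the cheapest single replacement, the result is an approximation: a route that is costlier after the first delay could still be cheaper overall for larger N.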