ACME LAB. PipeRoute: A Pipelining-Aware Router for FPGAs. Akshay Sharma, Carl Ebeling* and Scott Hauck. Electrical Engineering / *Computer Science & Engineering, University of Washington, Seattle, WA 98195.
Pipelined FPGA Architectures • FPGAs offer flexible computing • But what about maximum clock frequency? • Examples of pipelined FPGAs: RaPiD (Ebeling et al., 1996), HSRA (Tsu et al., 1999), UCSB (Singh et al., 2001) • A few prominent features: • Some (or all) switch points are registered • LUT inputs are registered • Netlists are heavily pipelined and retimed
Pipelined Routing • PipeRoute routes netlists on pipelined FPGAs • The pipelined netlist specifies how many registers must separate each source from each of its sinks • The FPGA routing graph consists of R-nodes (unregistered routing resources) and D-nodes (registers) • Using an R-node or D-node in a route costs the same as in Pathfinder • The pipelined routing problem therefore differs from normal FPGA routing: each route must also pass through the required number of registers
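The slide reuses Pathfinder's costs without restating them. Pathfinder (McMurchie and Ebeling, 1995) is usually summarized as pricing each routing node $n$ as shown below. In this formula, $b_n$ is the node's base cost and $h_n$ is a history term that grows in each iteration the node is overused. $p_n$ is a present-congestion term that grows with the number of other nets currently using the node. This is a general summary only. The exact update schedules for $h_n$ and $p_n$ vary between implementations and are not taken from PipeRoute.

$$c_n = (b_n + h_n)\,p_n$$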
Normal Routing – Two Terminal • Dijkstra’s shortest-path algorithm for two-terminal routing
Pipelined Routing – Two Terminal • Find the shortest route that passes through exactly N registers (hereafter, “registers” are called “delays”) • Similar to the Traveling Salesman Problem (find the shortest route that visits every node in a graph), which is NP-complete
Two-Terminal 1-Delay Router • 1-delay routes can be found optimally with Dijkstra’s algorithm
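The slide does not say how Dijkstra is applied. Below is a minimal sketch under a few assumptions: Pathfinder-style node costs and a routing graph stored as an adjacency dict. It runs Dijkstra over (node, delays-so-far) states. A path can move from layer 0 to layer 1 only by entering a D-node, and the search stops when the sink is reached in layer 1. The function name `shortest_route`, the `dnodes` set and the toy graph are hypothetical. The sketch also does not stop the 0-delay and 1-delay halves of a route from sharing a node, which a real pipelined router would likely need to rule out.

```python
import heapq
from typing import Dict, List, Set, Tuple

INF = float("inf")


def shortest_route(
    graph: Dict[str, List[str]],  # hypothetical adjacency list of the routing graph
    cost: Dict[str, float],       # assumed Pathfinder-style cost of entering each node
    dnodes: Set[str],             # D-nodes (registers); every other node is an R-node
    src: str,
    dst: str,
    delays: int = 1,
) -> Tuple[float, List[str]]:
    """Cheapest src->dst route that enters exactly `delays` (0 or 1) D-nodes.

    Dijkstra runs over (node, delays_so_far) states: layer 0 has not yet
    entered a D-node, layer 1 has. Cost is the sum of cost[v] over every node
    entered after src. Returns (INF, []) when no such route exists.
    """
    assert delays in (0, 1), "sketch handles 0- and 1-delay routes only"
    dist = {(src, 0): 0.0}
    prev: Dict[Tuple[str, int], Tuple[str, int]] = {}
    heap = [(0.0, src, 0)]
    while heap:
        d, u, k = heapq.heappop(heap)
        if d > dist.get((u, k), INF):
            continue  # stale heap entry
        if u == dst:
            if k == delays:  # reached the sink with the required delay count
                path, state = [u], (u, k)
                while state in prev:
                    state = prev[state]
                    path.append(state[0])
                return d, path[::-1]
            continue  # the sink may end a route but is never passed through
        for v in graph.get(u, []):
            nk = k + (1 if v in dnodes else 0)  # entering a D-node adds a delay
            nd = d + cost[v]
            if nk <= delays and nd < dist.get((v, nk), INF):
                dist[(v, nk)] = nd
                prev[(v, nk)] = (u, k)
                heapq.heappush(heap, (nd, v, nk))
    return INF, []


# Toy graph (hypothetical): S is the source, T the sink, D1/D2 are D-nodes.
graph = {"S": ["a", "D1"], "a": ["D2", "T", "b"], "D1": ["b"], "b": ["T", "D2"], "D2": ["T"]}
cost = {"S": 1, "a": 1, "b": 1, "D1": 1, "D2": 3, "T": 1}
print(shortest_route(graph, cost, {"D1", "D2"}, "S", "T", delays=1))  # (3.0, ['S', 'D1', 'b', 'T'])
print(shortest_route(graph, cost, {"D1", "D2"}, "S", "T", delays=0))  # (2.0, ['S', 'a', 'T'])
```

In the toy graph, the cheapest unregistered route S→a→T costs 2. Forcing one delay moves the route through D1 at a cost of 3.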
Two-Terminal N-Delay Router • Greedy approximation built on the 1-delay router • Find a 1-delay route • While the route has fewer than N delays: • Over all 0-delay segments of the route, substitute the cheapest 1-delay replacement
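Below is a minimal sketch of the greedy loop, using the same assumed model: node costs, an adjacency dict and hypothetical names. The route is split at its source, its D-nodes and its sink into 0-delay segments. Each pass recomputes every segment as a cheapest 1-delay route between the same two endpoints, and keeps the one that raises the total cost least. Delays at a segment's own endpoints are not counted, so each pass adds exactly one delay. Like any greedy method, it may return a costlier route than the optimum, or fail where a route exists.

```python
import heapq
from typing import Dict, List, Set, Tuple

INF = float("inf")


def shortest_route(graph, cost, dnodes, src, dst, delays):
    """Cheapest src->dst route with exactly `delays` (0 or 1) D-nodes strictly
    between the endpoints, via Dijkstra over (node, delays_so_far) states."""
    dist, prev, heap = {(src, 0): 0.0}, {}, [(0.0, src, 0)]
    while heap:
        d, u, k = heapq.heappop(heap)
        if d > dist.get((u, k), INF):
            continue  # stale heap entry
        if u == dst:
            if k == delays:
                path, state = [u], (u, k)
                while state in prev:
                    state = prev[state]
                    path.append(state[0])
                return d, path[::-1]
            continue  # dst only ends a route
        for v in graph.get(u, []):
            nk = k + (1 if v in dnodes and v != dst else 0)  # endpoint D-nodes not counted
            nd = d + cost[v]
            if nk <= delays and nd < dist.get((v, nk), INF):
                dist[(v, nk)], prev[(v, nk)] = nd, (u, k)
                heapq.heappush(heap, (nd, v, nk))
    return INF, []


def greedy_n_delay_route(graph, cost, dnodes, src, dst, n) -> Tuple[float, List[str]]:
    """Hypothetical greedy N-delay router: start from a 1-delay route, then
    repeatedly upgrade the 0-delay segment whose 1-delay replacement is cheapest."""
    if n == 0:
        return shortest_route(graph, cost, dnodes, src, dst, 0)
    _, path = shortest_route(graph, cost, dnodes, src, dst, 1)
    while path and sum(v in dnodes for v in path[1:-1]) < n:
        # 0-delay segments run between consecutive anchors: src, each D-node, dst
        anchors = [i for i, v in enumerate(path) if i in (0, len(path) - 1) or v in dnodes]
        best = None
        for a, b in zip(anchors, anchors[1:]):
            old = sum(cost[v] for v in path[a + 1 : b + 1])
            new, seg = shortest_route(graph, cost, dnodes, path[a], path[b], 1)
            if seg and (best is None or new - old < best[0]):
                best = (new - old, a, b, seg)
        if best is None:
            return INF, []  # no segment can absorb another delay
        _, a, b, seg = best
        path = path[:a] + seg + path[b + 1 :]
    return (sum(cost[v] for v in path[1:]), path) if path else (INF, [])


# Toy graph (hypothetical): S is the source, T the sink, D1/D2 are D-nodes.
graph = {"S": ["a", "D1"], "a": ["D2", "T", "b"], "D1": ["b"], "b": ["T", "D2"], "D2": ["T"]}
cost = {"S": 1, "a": 1, "b": 1, "D1": 1, "D2": 3, "T": 1}
DN = {"D1", "D2"}
print(greedy_n_delay_route(graph, cost, DN, "S", "T", 2))  # (6, ['S', 'D1', 'b', 'D2', 'T'])
```

With N = 2, the first pass finds that S→D1 cannot absorb a delay. It replaces D1→b→T with D1→b→D2→T instead.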
Normal Routing – Multi-Terminal • Route each sink with two-terminal routing • Use all previous routing for the net as the source for the next route
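The "use all previous routing as the source" rule can be sketched as a multi-source Dijkstra. Every node already in the net's routing tree starts at distance 0, so a new sink can branch off anywhere on the existing tree. The function name, node-cost model and toy graph are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Set, Tuple


def route_net(
    graph: Dict[str, List[str]],  # hypothetical adjacency list of the routing graph
    cost: Dict[str, float],       # assumed cost of entering each node
    source: str,
    sinks: List[str],
) -> Tuple[float, Set[str]]:
    """Route sinks one at a time, seeding each Dijkstra search with the whole
    routing tree built so far (every tree node starts at distance 0)."""
    tree = {source}
    total = 0.0
    for sink in sinks:
        dist = {n: 0.0 for n in tree}
        prev: Dict[str, str] = {}
        heap = [(0.0, n) for n in tree]
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            if u == sink:
                break
            for v in graph.get(u, []):
                nd = d + cost[v]
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        else:
            raise ValueError(f"sink {sink!r} is unreachable")
        total += dist[sink]
        node = sink
        while node not in tree:  # graft the new branch onto the tree
            tree.add(node)
            node = prev[node]
    return total, tree


graph = {"S": ["a", "b"], "a": ["T1", "c"], "b": ["T2"], "c": ["T2"]}
cost = {"S": 1, "a": 1, "b": 3, "c": 1, "T1": 1, "T2": 1}
total, tree = route_net(graph, cost, "S", ["T1", "T2"])
print(total)         # 4.0
print(sorted(tree))  # ['S', 'T1', 'T2', 'a', 'c']
```

Here T2 costs 2 because it branches off node a, which the route to T1 already uses. Routing T2 from S alone would cost 3.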
Multi-Terminal Router • Sinks are routed in increasing order of delay separation • e.g., T1 is 2 delays away from S, and T2 is 3 delays away from S • Accumulate 1 delay at a time • When routing for delay I, start from all existing routing at delays I and I-1
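The last rule can be sketched for the slide's example, once T1 (2 delays) is routed and T2 (3 delays) comes next. Existing routing at delay I-1 seeds the search still needing one D-node; existing routing at delay I seeds it needing none. Several things are assumptions made for illustration, not taken from PipeRoute:

- the `tree` map from node to delay level;
- the name `extend_to_level`;
- the toy graph.

The sketch does not show how the full router reaches intermediate delay levels. It also does not stop the new branch from reusing a tree node at a different delay.

```python
import heapq
from typing import Dict, List, Set, Tuple


def extend_to_level(
    graph: Dict[str, List[str]],
    cost: Dict[str, float],
    dnodes: Set[str],
    tree: Dict[str, int],  # node -> D-nodes between the net source and the node (inclusive)
    level: int,
    target: str,
) -> Tuple[float, List[str], Dict[str, int]]:
    """Reach `target` at delay `level` starting from existing routing.

    Tree nodes at level-1 start in layer 0 (one D-node still needed);
    tree nodes at `level` start in layer 1 (no more D-nodes allowed).
    """
    inf = float("inf")
    dist: Dict[Tuple[str, int], float] = {}
    prev: Dict[Tuple[str, int], Tuple[str, int]] = {}
    heap = []
    for node, lv in tree.items():
        if lv in (level - 1, level):
            k = lv - (level - 1)  # 0 = one delay still needed, 1 = delay satisfied
            dist[(node, k)] = 0.0
            heap.append((0.0, node, k))
    heapq.heapify(heap)
    while heap:
        d, u, k = heapq.heappop(heap)
        if d > dist.get((u, k), inf):
            continue  # stale heap entry
        if u == target:
            if k == 1:
                path, state = [u], (u, k)
                while state in prev:
                    state = prev[state]
                    path.append(state[0])
                path.reverse()
                new_tree, lv = dict(tree), tree[path[0]]
                for v in path[1:]:  # tag new nodes with their delay level
                    lv += 1 if v in dnodes else 0
                    new_tree[v] = lv
                return d, path, new_tree
            continue  # target only ends a route
        for v in graph.get(u, []):
            nk = k + (1 if v in dnodes else 0)
            nd = d + cost[v]
            if nk <= 1 and nd < dist.get((v, nk), inf):
                dist[(v, nk)], prev[(v, nk)] = nd, (u, k)
                heapq.heappush(heap, (nd, v, nk))
    return inf, [], dict(tree)


graph = {"S": ["D1"], "D1": ["D2"], "D2": ["T1", "D3", "w"], "D3": ["T2"], "w": ["T2"]}
cost = {n: 1 for n in ["S", "D1", "D2", "D3", "w", "T1", "T2"]}
dnodes = {"D1", "D2", "D3"}
tree = {"S": 0, "D1": 1, "D2": 2, "T1": 2}  # T1 already routed at 2 delays
c, path, new_tree = extend_to_level(graph, cost, dnodes, tree, 3, "T2")
print(c, path)          # 2.0 ['D2', 'D3', 'T2']
print(new_tree["T2"])  # 3
```

The search starts from D2 and T1, both at delay 2. It rejects the cheaper-looking 0-delay branch through w, because T2 would arrive there with only 2 delays.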
Benchmark Architecture • Modified RaPiD architecture • 1-D datapath of 16-bit ALUs, multipliers, registers and memories • Pipelined interconnect structure • Long and short tracks • Bus connectors are used to pick up delays
Testing • Benchmark RaPiD netlists • Pipelining-aware placement tool • For each netlist: • Treat the netlist as unpipelined and determine the smallest RaPiD architecture that routes it (Zl) • Determine the smallest RaPiD architecture needed to route the pipelined netlist (Zp) • Pipelining cost = Zp/Zl
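The pipelining-cost metric is a plain ratio. Below is a minimal sketch of it. The function names are hypothetical, and the architecture sizes in the usage lines are made up rather than benchmark data. The slides report an average without saying how it is taken, so the arithmetic mean here is an assumption.

```python
from statistics import mean
from typing import Iterable, Tuple


def pipelining_cost(z_l: float, z_p: float) -> float:
    """Zp / Zl: smallest architecture for the pipelined netlist divided by the
    smallest architecture for the same netlist treated as unpipelined."""
    if z_l <= 0:
        raise ValueError("Zl must be positive")
    return z_p / z_l


def average_pipelining_cost(sizes: Iterable[Tuple[float, float]]) -> float:
    """Arithmetic mean of per-netlist Zp/Zl ratios (averaging method assumed)."""
    return mean(pipelining_cost(z_l, z_p) for z_l, z_p in sizes)


print(pipelining_cost(10, 17))                     # 1.7
print(average_pipelining_cost([(10, 15), (8, 16)]))  # 1.75
```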
Results • Average pipelining cost incurred = 1.74
Results • Effect of netlist size on pipelining cost • Normalized to unpipelined netlist area
Results • Effect of the percentage of pipelined signals on pipelining cost • Normalized to unpipelined circuit area
The Future • Delay-driven PipeRoute • Currently under development • Sophisticated pipelining-aware placement algorithms • Fast pipelined routing algorithms • Use PipeRoute to explore pipelined FPGA architectures • Number and location of registered switch points