CS137: Electronic Design Automation
Day 2: January 6, 2006
Spatial Routing
Today • Idea • Challenges • Path Selection • Victimization • Allocation • Methodology • Quality, Timing • Parallelism • Mesh • FPGA Implementation
Global/Detail (from CS137a: Day 22) • With limited switching (e.g. FPGA) • can represent the routing graph exactly
Pathfinder Review • Key step: find shortest path from src to sink • Mark links by usage • Used links cost the most • Shortest-path search tries to avoid them • Negotiated Congestion with History • Increase cost of congested nodes • Adaptive cost makes historically congested nodes expensive, so later searches try to avoid them
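As a refresher, here is a minimal sketch of how negotiated congestion prices a routing resource. The cost formula and weighting factors are generic Pathfinder-style assumptions, not the specific constants used in the course.

```python
def node_cost(base_cost, occupancy, capacity, history, p_fac, h_fac):
    """Pathfinder-style negotiated-congestion cost for one routing resource.

    base_cost : intrinsic cost (e.g. delay) of using this node
    occupancy : nets currently routed through the node
    capacity  : nets the node can legally carry
    history   : accumulated over-use from earlier routing iterations
    p_fac     : present-sharing weight (typically grows each iteration)
    h_fac     : history weight (makes chronically congested nodes expensive)
    """
    present = 1 + p_fac * max(0, occupancy + 1 - capacity)
    return (base_cost + h_fac * history) * present


def update_history(history, occupancy, capacity):
    """After each iteration, remember how over-subscribed the node was."""
    return history + max(0, occupancy - capacity)
```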
Slow? • Why is routing slow? • Each route: • search all possible paths from source to sink • Number of paths expands as distance² • Graph of the network is MBs large • Large, complicated data structure to walk • Won't all fit in cache • Number of nets = Number of edges • Perform many iterations to converge
Parallelism? • Search all paths in parallel for a single route • Search routes for multiple nets in parallel • Don’t overlap • Overlap?
Initial Key Ideas • Augment existing static network structure to route itself • Use hardware to exploit parallelism in routing • Search all paths in parallel • Route multiple nets in parallel • Avoid walking irregular graph • Specialized/pipelined hardware at each switch • Hardware can perform a route trial in 10s of cycles vs. 10K-100K cycles for software
Hardware Route Search in Action
Path Search Hardware • Idea: existing paths are already allocated • Drive a one into all search paths • All free paths pass the one up
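A software analogue of the search step, purely illustrative: the "drive a one into search paths" operation amounts to asking which candidate paths contain no already-allocated switch.

```python
def free_search_paths(candidate_paths, allocated_switches):
    """Return the candidate paths a search 'one' would propagate through.

    candidate_paths    : list of paths, each a list of switch ids
    allocated_switches : set of switch ids already claimed by existing routes
    In hardware all paths are probed in parallel in one pass; here we just
    filter out any path that touches an allocated switch.
    """
    return [p for p in candidate_paths
            if not any(sw in allocated_switches for sw in p)]
```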
Challenges • How to select among paths? • What if there are no free paths? • Can we work without Pathfinder's history? • How to handle fanout? • How to handle allocation and victimization?
Select Among Paths? • Easy: randomly • Use a PRNG at the crossover (xover) switchbox • Otherwise, need to represent costs…
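The slides leave the PRNG unspecified; one cheap hardware option would be an LFSR per crossover switchbox. A sketch, assuming a standard 16-bit maximal-length LFSR (taps 16, 14, 13, 11):

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return ((state >> 1) | (bit << 15)) & 0xFFFF


def pick_random_path(free_paths, state):
    """Use the LFSR state to pick one of the free paths at the crossover."""
    state = lfsr16_step(state)
    return free_paths[state % len(free_paths)], state
```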
No Paths? • Try stealing a path (rip-up): victimize an existing path • Which one? • Randomly select a victim • History-free Pathfinder suggests: • CountCost: the victim whose path shares the fewest links with other routes • CountNet: the victim which intersects the fewest distinct existing nets
CountNet vs. CountCost • Example (see figure): CountCost = 6 • CountNet = 1
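To make the two policies concrete, here is an illustrative software rendering; the per-switch usage map is an assumed data structure, not how the hardware tracks it.

```python
def count_cost(path, usage):
    """CountCost: total occupancies (shared-link count) along the candidate path.
    usage[switch] = set of nets currently routed through that switch."""
    return sum(len(usage.get(sw, ())) for sw in path)


def count_net(path, usage):
    """CountNet: number of distinct existing nets the candidate path intersects."""
    nets = set()
    for sw in path:
        nets.update(usage.get(sw, ()))
    return len(nets)


def pick_victim(candidate_paths, usage, policy=count_net):
    """Victimize the candidate path that disturbs the least under the chosen policy."""
    return min(candidate_paths, key=lambda p: policy(p, usage))
```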
Implement Counting? • Idea: delay congested signals • Free paths are not delayed • The least congested signal arrives at the xover first
CountNet Approximation • Keeping track of which net uses each switch would require much more state and complexity • Approximate CountNet by delaying only at conflicting switches
Implement CountNet Approximation • Allow the search signal to pass without delay if it agrees with the existing switch setting
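A rough software model of the delay trick behind the CountNet approximation: only switches already set in a conflicting way add delay, so the first search signal to reach the crossover is (approximately) the one crossing the fewest conflicts. The one-cycle-per-switch baseline is an assumption.

```python
def arrival_time(path, switch_setting, needed_setting):
    """Cycles for a search signal to traverse `path` under the CountNet approximation.

    switch_setting[sw] : current setting of switch sw, or None if the switch is free
    needed_setting[sw] : setting this path would require at switch sw
    Free switches, and switches whose setting already agrees, pass the signal
    without penalty; a conflicting switch delays it by one extra cycle.
    """
    cycles = len(path)  # assumed baseline: one cycle per switch
    for sw in path:
        current = switch_setting.get(sw)
        if current is not None and current != needed_setting[sw]:
            cycles += 1  # conflict: delay the search signal
    return cycles
```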
Cost Is Max of Sides • Also note: the actual cost is max(src→xover, sink→xover) rather than the sum
Algorithm Comparison – Random Netlist (plot: Total Channels vs. HSRA Array Size)
How to Improve? • Apologize for lack of history? • Exploit fast route time: try multiple starts and exploit randomness • Like multiple starts of FM
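Multiple starts are just a restart loop around the randomized router, keeping the best result; `route_once` and its return values are illustrative names.

```python
def best_of_n_starts(route_once, netlist, n_starts, base_seed=1):
    """Run the randomized router n_starts times and keep the cheapest solution.

    route_once(netlist, seed) -> (solution, channels_used) is assumed; in the
    hardware router, re-seeding the per-switchbox PRNGs plays the same role.
    """
    best_solution, best_channels = None, None
    for i in range(n_starts):
        solution, channels = route_once(netlist, base_seed + i)
        if best_channels is None or channels < best_channels:
            best_solution, best_channels = solution, channels
    return best_solution, best_channels
```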
CountNet, best of 20 starts (plot).
Hypergraphs (Fanout)
• Sequentially route each two-point net, trying to re-use as much as possible from existing allocated paths
• Add a state bit at every switch
  • Set when the switch is allocated during the current net's search
  • Cleared when we begin to route a new net
• Order the destinations associated with a single source
• For each destination:
  • Search from the sink as before (only from the sink)
  • If, at a switch, the state bit is set and the sink side is congestion free, we have found an available path
  • Otherwise, drive ones into all available source paths and allocate a new path, like a standard route search
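A software sketch of the state-bit reuse for multi-terminal nets; the helper names and the ordering of destinations are assumptions for illustration.

```python
def route_fanout_net(sinks, search_from_sink, allocate):
    """Route one multi-terminal net, reusing switches already claimed for this net.

    search_from_sink(sink, in_current_net) -> path is assumed to search from the
    sink only, and may terminate early at a switch whose state bit is set
    (i.e. already part of this net's tree) when the sink side is congestion free.
    """
    in_current_net = set()      # the per-switch state bit, cleared for each new net
    for sink in sinks:          # destinations of this source, in some chosen order
        path = search_from_sink(sink, in_current_net)
        allocate(path)                  # claim the (possibly partial) new path
        in_current_net.update(path)     # set state bits so later sinks can reuse it
    return in_current_net               # the switches forming this net's route tree
```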
High-Fanout Nets • Victimizing a high-fanout net causes considerable re-route work • Might want to penalize victimizing high-fanout nets • CountNetFanout? • Requires more state… expensive… • Simple hack: lock high-fanout nets against victimization • What counts as high fanout? >10?
So Far • All about quality • …haven't dealt with all the performance details • Had a basis for confidence in performance • Wanted to make sure the approach was worthwhile first
Hardware Allocation • Idea: send a one down the selected path

Add all nets to R
While nets in R > 0 and routeTrial < RTmax
  For each unrouted net
    Find all possible routes
    If found possible routes
      Randomly select and allocate a route
    Else
      Select a route to victimize and allocate the route
  Endfor
  Adjust R
Endwhile
With Victimization

Add all nets to R
While nets in R > 0 and routeTrial < RTmax
  For each unrouted net
    Find all possible routes
    If found possible routes
      Randomly select and allocate a route
    Else
      Randomly select a route to victimize and allocate the route
  Endfor
  Adjust R
Endwhile
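The same loop in Python-flavored form, with the victimization bookkeeping spelled out; all helper names are illustrative, not the actual implementation.

```python
import random

def route_all(nets, candidate_paths, is_free, occupants, allocate, rip_up, rt_max):
    """Allocation loop with random victimization (a sketch).

    candidate_paths(net) -> all topologically possible paths for the net
    is_free(path)        -> True if no switch on the path is already occupied
    occupants(path)      -> nets currently using switches on the path
    allocate(net, path)  -> claim the path for the net
    rip_up(net)          -> release a victim net's resources
    """
    unrouted = set(nets)           # R in the slide's pseudocode
    trial = 0
    while unrouted and trial < rt_max:
        trial += 1
        for net in list(unrouted):
            paths = candidate_paths(net)
            free = [p for p in paths if is_free(p)]
            if free:
                chosen = random.choice(free)
            else:
                chosen = random.choice(paths)   # randomly pick a route to victimize
                for victim in occupants(chosen):
                    rip_up(victim)
                    unrouted.add(victim)        # victims go back into R
            allocate(net, chosen)
            unrouted.discard(net)
    return unrouted    # any nets left here failed to route within rt_max trials
```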
Analysis Methodology • Sequential version that does effectively the same thing (perhaps inefficiently) • Count key operations/variables • Number of net searches • Number of victims • Timing model for key operations • Calculate Performance under various timing assumptions
Timing Models • Hardware timing: • Tpath = length of path ≈ log(N) • Tallocate ≈ Tpath • Tvictim ≈ 4·Tpath • Software timing: • Tallocate ≈ Npaths_sw·(Tm + Tc + Twb + Ta) • Tvictim ≈ Npaths_sw·(Tm + Tc) + V·Talloc • Tm = main memory reference; Tc = cache reference; Twb = write buffer; Ta = bit allocation; V = number of victims
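Plugging illustrative numbers into the two models; the per-reference cycle counts (Tm, Tc, Twb, Ta) below are assumptions chosen only to show how the comparison is made, not measurements from the slides.

```python
from math import log2

def hw_times(n_endpoints):
    """Hardware model: search propagates over a path of length ~ log(N) cycles."""
    t_path = log2(n_endpoints)
    return {"t_path": t_path, "t_allocate": t_path, "t_victim": 4 * t_path}


def sw_times(n_paths_sw, victims=0, t_m=100, t_c=2, t_wb=4, t_a=1):
    """Software model from the slide with assumed per-reference costs (in cycles)."""
    t_allocate = n_paths_sw * (t_m + t_c + t_wb + t_a)
    t_victim = n_paths_sw * (t_m + t_c) + victims * t_allocate
    return {"t_allocate": t_allocate, "t_victim": t_victim}


# Example: a 1024-endpoint array vs. a software search touching 500 path nodes.
print(hw_times(1024))            # t_path = 10 cycles, t_victim = 40 cycles
print(sw_times(500, victims=1))  # tens of thousands of cycles
```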
Route Time • Ntry – number of route starts • NRT – number of path searches • NRO – number of rip-ups • NFO – number of fanout searches • NFOA – number of fanout allocations
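One plausible way these counts combine with the timing model above; the exact expression used for the analysis is not shown on the slide, so treat this purely as an illustration of the bookkeeping.

```python
def estimated_route_time(n_rt, n_ro, n_fo, n_foa, t_path, t_allocate, t_victim):
    """Illustrative total: each path search costs ~t_path + t_allocate, each
    rip-up ~t_victim, each fanout search ~t_path, and each fanout allocation
    ~t_allocate.  (Assumed combination, not the slide's exact formula.)"""
    return (n_rt * (t_path + t_allocate)
            + n_ro * t_victim
            + n_fo * t_path
            + n_foa * t_allocate)
```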
Making Comparisons • There is a quality/time tradeoff • Want to compare at iso-quality
More Parallelism • Only exploiting parallelism in path search • Subtrees are independent • Route root • Then route next two channels in parallel • Then route next 4…
Still Not Exploiting • Multiple path searches in parallel that overlap routing resources…
Extension to Mesh Networks • No well-defined crossover point • The path back to the source is not implied directly by the topology of the routing network • Paths have different lengths, and non-minimal-length paths may be important components of a good solution
Mesh Approach • Single-ended search from the source • Larger delay on congestion allows non-minimal-length paths • Breadcrumb approach: leave state in switches pointing back to the source
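A sketch of the single-ended, congestion-delayed search with breadcrumbs; the grid representation, delay rule, and shortest-path formulation are assumptions for illustration.

```python
import heapq

def mesh_search(source, sink, neighbors, congestion_delay):
    """Single-ended search from source over a mesh.

    neighbors(node)        -> adjacent switch nodes
    congestion_delay(node) -> extra cycles if the node is congested (0 if free);
    this is what lets non-minimal-length paths win when the short ones are busy.
    Breadcrumbs: came_from records, at each reached switch, the neighbor that
    points back toward the source, so the path can be traced once the sink is hit.
    """
    dist = {source: 0}
    came_from = {source: None}
    frontier = [(0, source)]
    while frontier:
        t, node = heapq.heappop(frontier)
        if node == sink:
            break
        if t > dist[node]:
            continue                      # stale queue entry
        for nxt in neighbors(node):
            nt = t + 1 + congestion_delay(nxt)
            if nt < dist.get(nxt, float("inf")):
                dist[nxt] = nt
                came_from[nxt] = node     # leave a breadcrumb back to the source
                heapq.heappush(frontier, (nt, nxt))
    if sink not in came_from:
        return None                       # no path reachable
    path, node = [], sink
    while node is not None:               # trace breadcrumbs from sink to source
        path.append(node)
        node = came_from[node]
    return list(reversed(path))
```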
Extension to Mesh Networks – Results (simulator too slow to run larger arrays)
BFT FPGA Implementation • 21 4-LUTs to implement switch logic + 9 4-LUTs to manage PRNG/allocation = 30 4-LUTs per T-switch • 13/3 switches per PE per domain → 130 4-LUTs per PE per domain • C = 10 → 1300 4-LUTs per PE
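The resource arithmetic from the slide, spelled out (the numbers are the slide's own; only the variable names are illustrative):

```python
# BFT FPGA resource estimate, step by step
luts_per_t_switch = 21 + 9            # switch logic + PRNG/allocation = 30 4-LUTs
switches_per_pe_per_domain = 13 / 3   # ~4.33 T-switches per PE per domain
luts_per_pe_per_domain = luts_per_t_switch * switches_per_pe_per_domain  # = 130
domains = 10                          # C = 10
luts_per_pe = luts_per_pe_per_domain * domains                           # = 1300
```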