Enforcing Long-Path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals Keith So University of New South Wales, Sydney, Australia Feb 25 @ FPGA’08
Outline • Problem Statement • Related Work • SpiralRoute Overview • Budget Generation • Clamped Lexicographic Search • Some Performance Optimizations • Experiments • Conclusions and Future Work
Problem Statement & Assumptions Long-Path Timing-Driven Detailed Routing • Given: Placed circuit mapped into RR Graph + Timing Requirement D • Find: Mutually RR-vertex disjoint routing trees s.t. Max. Long-Path Comb. Delay <= D Assumptions • D is achievable under given placement • Buffered switching (delays summable)
Related Work • [F’92] Iterative slack allocation • [AR’95] Criticality bin + Steiner/Arbor. • [ME’95] Negotiated Congestion • [BR’97] VPR • [LW’03] Lagrangian Rel. Weighting • [ANC+’04] Auto. Constraint Gen. • [FBC’04] RCV
SpiralRoute Overview • Negotiated Congestion Routing over A* • Paths are lexicographic-costed [S’07,ISPD07] Major Deltas • Optimal delay upper bound generation for FPGA routing domain • Minimum-congestion bounded-delay searching (vs tradeoff using weights) • Provable timing closure at completion
Connection Budget Generation – Optimization Component Weighted Budget Distribution Problem [Ghiasi et al., ICCAD’04] Given: DAG G=(V,A), min. delays dij, weights wij, long-path constraint T Find: delay budgets bij such that: • (dij+bij) summed along every path is at most T • Sum of (wij·bij) over all edges is maximised Transforms into a min-cost flow problem; budgets are recovered from the dual of the flow solution.
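The problem statement above can be made concrete with a small checker. The following is a minimal sketch, not the min-cost flow solver from the slide: it only tests whether a candidate budget assignment meets the long-path constraint and reports its weighted objective. The function names, the 4-arc example DAG, and the requirement that budgets be non-negative are assumptions made for illustration.

```python
from collections import defaultdict

def longest_path_delay(edges, delay):
    """Return the maximum summed delay over all paths in a DAG.

    edges: list of (u, v) arcs; delay: dict mapping (u, v) -> arc delay.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    # Kahn's algorithm: relax arcs in topological order.
    ready = [n for n in nodes if indeg[n] == 0]
    arrival = {n: 0 for n in nodes}
    while ready:
        u = ready.pop()
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[(u, v)])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(arrival.values())

def check_budgets(edges, d, b, w, T):
    """Return (feasible, objective) for a candidate budget assignment b."""
    total = {e: d[e] + b[e] for e in edges}
    # Assumption: budgets are non-negative slack on top of the minimum delays.
    feasible = all(b[e] >= 0 for e in edges) and longest_path_delay(edges, total) <= T
    objective = sum(w[e] * b[e] for e in edges)
    return feasible, objective

# Hypothetical DAG with two paths: s -> a -> t and s -> c -> t.
edges = [("s", "a"), ("a", "t"), ("s", "c"), ("c", "t")]
d = {("s", "a"): 2, ("a", "t"): 3, ("s", "c"): 1, ("c", "t"): 1}
w = {e: 1 for e in edges}
b = {("s", "a"): 0, ("a", "t"): 1, ("s", "c"): 2, ("c", "t"): 2}
print(check_budgets(edges, d, b, w, T=6))  # (True, 5)
```

Both paths sum to exactly 6 here, so no budget can be raised without violating T=6.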
Connection Budget Generation – Mapping to FPGA Routing • Represent LE’s and pads as edges (split clocked LE’s) • Form super-DAG • dij = min connection delay (from congestion-oblivious routing) • Set T = D • Set wij = 1 for real edges, 0 for virtuals • Solved (dij+bij) is the maximum delay for each edge in our routing
Search Design – n-Lex. Search • [1-Line A* search: f(v)=g(v)+h(v), expand v with minimum f(v) until t is reached] • 2-component lexicographic search used for routability router (conceptually a·∞ + b) • Need n components and custom comparison functions (proofs needed to avoid ∞^k values!) Theorem: A* over n-lexicographic costs is admissible if all components are totally ordered monoids with order-preserving addition • Monoids helpful to avoid clutter from max()
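As an illustration of the 2-component case, here is a minimal sketch of A* in which g, h and f are tuples added component-wise and compared lexicographically, so no large "∞" weight is needed. It is not the SpiralRoute implementation; the names `lex_astar` and `add`, the toy graph, and the zero heuristic are assumptions for the example. The first tuple component plays the role of congestion, the second of delay.

```python
import heapq

def add(a, b):
    """Component-wise addition of two cost vectors (tuples)."""
    return tuple(x + y for x, y in zip(a, b))

def lex_astar(graph, source, target, h, zero):
    """A* where costs are tuples compared lexicographically.

    graph: dict node -> list of (neighbor, cost_tuple)
    h: function node -> lower-bound cost tuple to target (assumed consistent)
    zero: identity cost, e.g. (0, 0)
    Returns (cost_tuple, path), or None if target is unreachable.
    """
    best = {source: zero}
    parent = {source: None}
    frontier = [(add(zero, h(source)), source)]
    closed = set()
    while frontier:
        _, v = heapq.heappop(frontier)
        if v in closed:
            continue  # stale queue entry
        closed.add(v)
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return best[target], path[::-1]
        for u, cost in graph.get(v, []):
            if u in closed:
                continue
            g = add(best[v], cost)
            if u not in best or g < best[u]:
                best[u] = g
                parent[u] = v
                heapq.heappush(frontier, (add(g, h(u)), u))
    return None

# Hypothetical graph; each arc cost is (congestion, delay).
graph = {
    "s": [("a", (1, 1)), ("b", (0, 3))],
    "a": [("t", (0, 1))],
    "b": [("t", (0, 3))],
}
result = lex_astar(graph, "s", "t", h=lambda v: (0, 0), zero=(0, 0))
print(result)  # ((0, 6), ['s', 'b', 't'])
```

The search prefers the congestion-free path through b even though its delay is three times that of the path through a, which is the behaviour the a·∞ + b intuition describes.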
Search Design – Clamping Component • 3-component vector • Delay, with pivot (x < y iff x <= T and y > T) • Congestion, regular < • Delay, regular < Ex: f(w2)=[0,2,2]; f(x1)=[1,0,4]; f(w3)=[0,1,3] Assumption: h(v) is at least close to h*(v) for the clamping component
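One way to read the example vectors above is that the first component records only which side of the pivot T the delay falls on. The sketch below follows that reading; the helper name `clamped_key` and the value T = 3 are assumptions, with T chosen so that the keys reproduce the three vectors on the slide.

```python
def clamped_key(delay, congestion, budget):
    """Build a 3-component comparison key for a path cost.

    Component 1 is the pivot: 0 if the delay is within budget, 1 otherwise.
    Components 2 and 3 are congestion and delay under the regular ordering.
    """
    return (int(delay > budget), congestion, delay)

T = 3  # hypothetical budget, picked to match the slide's example vectors
candidates = {
    "w2": clamped_key(delay=2, congestion=2, budget=T),  # (0, 2, 2)
    "x1": clamped_key(delay=4, congestion=0, budget=T),  # (1, 0, 4)
    "w3": clamped_key(delay=3, congestion=1, budget=T),  # (0, 1, 3)
}
order = sorted(candidates, key=candidates.get)
print(order)  # ['w3', 'w2', 'x1']
```

Under this ordering x1 loses to both alternatives despite having zero congestion, because it is the only candidate over budget; among the budget-compliant candidates, the less congested w3 wins.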
Search Design – Timing Closure • Delay pivot element splits congestion-identical paths by budget • Will always choose a budget-compliant path (a sum of finite congestion costs is finite) • Over all connections => successful routing always yields timing closure!
Performance – [Low-Hanging] Optimizations • Original implementation was about 2-2.5x slower than the current one • Introduced some low-hanging speed & quality optimizations • Index structure for lexicographic costs • Greedy tree mgmt. to ameliorate pin-ordering • A high-hanging optimization left for future work is congestion schedule handling (but many promising leads from global routers in ICCAD’07)
Trie-of-Stacks Index Structure • Replaces f(v) index structure • Exploits FPGA routing symmetry • Index operations independent of size • Reduces runtimes by ~15 %
Experiments - Setup • Run against VPR4.30 on architecture similar to single-segment “challenge” arch. • (Researcher timing constraints) • Routability comparison with unclamped lex-search • Route at the placement-allowed Fmax • VPR pres_fac=1.5/1.1
Future Work • Runtime improvements • Schedule improvement • Performance tuning • Multi-CLB segments (see backup slide) • Multi-objective routing • Other domains (e.g. standard cell global)
Conclusions • Extended lexicographic search to timing-driven routing • New budgeting component • Clamped search design • Supporting techniques for runtime • Timing closure is guaranteed on routing success • Solution quality is good, but further runtime improvement is needed to be viable
Acknowledgements • J. Rose, V. Betz, A. Marquardt (Toronto) – VPR4.30 source & benchmarks • Australian Centre for Advanced Computing and Communications (ac3) – High Performance Computing Support • Advisor*: Dr. Aleks Ignjatovic
Question Time… To Backup Slides
Issues with h(v) ~/~ h*(v) • “Node locking” occurs when g(v)+h(v) <= D but in fact g(v)+h*(v) > D • Expansion downstream will be truncated • But a subpath with less delay and more congestion cannot expand into it • But if we re-expand on a shorter delay, the backtrace will ignore congestion – not locally decidable! • Quick fix: precompute h*(v) (only needed for sink pins t) – only bounding components need the accuracy • Fancy on-the-fly handling?
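The quick fix can be sketched as a backward Dijkstra pass from each sink pin, which yields the exact remaining delay h*(v) for every vertex that can reach it. This is a minimal illustration under stated assumptions, not the deck's implementation: the function name, the arc-list representation, and the toy graph are hypothetical, and arc delays are assumed non-negative.

```python
import heapq
from collections import defaultdict

def exact_delay_to_sink(arcs, sink):
    """Return h*(v): the minimum delay from each vertex v to `sink`.

    arcs: list of (u, v, delay) directed arcs with non-negative delay.
    Vertices that cannot reach the sink are absent from the result.
    """
    # Index arcs by head vertex so the search can walk them backwards.
    reverse = defaultdict(list)
    for u, v, delay in arcs:
        reverse[v].append((u, delay))
    h_star = {sink: 0}
    frontier = [(0, sink)]
    while frontier:
        dist, v = heapq.heappop(frontier)
        if dist > h_star[v]:
            continue  # stale queue entry
        for u, delay in reverse[v]:
            cand = dist + delay
            if cand < h_star.get(u, float("inf")):
                h_star[u] = cand
                heapq.heappush(frontier, (cand, u))
    return h_star

# Hypothetical routing fragment with sink pin "t".
arcs = [("s", "a", 1), ("a", "t", 4), ("s", "b", 2), ("b", "t", 2), ("a", "b", 1)]
h_star = exact_delay_to_sink(arcs, "t")
print(h_star["s"], h_star["a"], h_star["b"])  # 4 3 2
```

With h*(v) in hand for the delay component, the test g(v)+h(v) <= D is exact, so the locking case described above cannot arise for that component.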