
Keith So University of New South Wales, Sydney, Australia Feb 25 @ FPGA’08

Enforcing Long-Path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals


Presentation Transcript


  1. Enforcing Long-Path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals Keith So University of New South Wales, Sydney, Australia Feb 25 @ FPGA’08

  2. Outline • Problem Statement • Related Work • SpiralRoute Overview • Budget Generation • Clamped Lexicographic Search • Some Performance Optimizations • Experiments • Conclusions and Future Work

  3. Problem Statement & Assumptions Long-Path Timing-Driven Detailed Routing • Given: a placed circuit mapped onto the routing-resource (RR) graph, plus a timing requirement D • Find: mutually RR-vertex-disjoint routing trees such that the maximum long-path combinational delay <= D Assumptions • D is achievable under the given placement • Buffered switching, so delays add along a path
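The long-path condition in the Find clause is a standard static-timing check: the longest source-to-sink delay in the circuit's timing DAG must not exceed D. A minimal Python sketch, assuming the routed timing graph is supplied as a hypothetical list of `(u, v, delay)` edges (not SpiralRoute's actual data structures) and that delays add along a path, as the buffered-switching assumption states:

```python
from collections import defaultdict, deque

def max_path_delay(edges):
    """Longest source-to-sink delay in a DAG given as (u, v, delay) edges.

    Assumes delays are additive along a path (buffered switching).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    # Kahn's algorithm visits vertices in topological order; arrival[v]
    # ends up as the latest arrival time over all paths reaching v.
    arrival = {n: 0.0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v, d in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(nodes):
        raise ValueError("graph has a cycle; combinational paths must form a DAG")
    return max(arrival.values(), default=0.0)

def meets_timing(edges, D):
    """True if the maximum long-path combinational delay is <= D."""
    return max_path_delay(edges) <= D
```

For example, with edges a→b (2), b→c (3), and a→c (4), the critical path is a→b→c at 5, so `meets_timing` holds for D = 5 but not for D = 4.9.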

  4. Related Work • [F’92] Iterative slack allocation • [AR’95] Criticality bins + Steiner/arborescence trees • [ME’95] Negotiated congestion (PathFinder) • [BR’97] VPR • [LW’03] Lagrangian-relaxation weighting • [ANC+’04] Automatic constraint generation • [FBC’04] RCV

  5. SpiralRoute Overview • Negotiated-congestion routing over A* • Paths are costed lexicographically [S’07, ISPD’07] Key Differences • Optimal delay upper-bound (budget) generation for the FPGA routing domain • Minimum-congestion, bounded-delay searching (rather than trading delay off against congestion with weights) • Provable timing closure on completion
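The negotiated-congestion layer follows PathFinder [ME’95], in which each routing-resource node's cost combines a base cost, an accumulated history cost, and a penalty for present over-use. A minimal sketch of such a PathFinder-style cost; the class, field, and parameter names (`RRNode`, `pres_fac`, `hist_fac`) and the exact update rules are illustrative assumptions, not SpiralRoute's or VPR's code:

```python
from dataclasses import dataclass

@dataclass
class RRNode:
    """One routing-resource node (hypothetical fields for illustration)."""
    base_cost: float = 1.0
    capacity: int = 1
    occupancy: int = 0
    history: float = 0.0

def present_penalty(node: RRNode, pres_fac: float) -> float:
    # Penalty for using the node now: grows with how far one more user
    # would push occupancy beyond capacity.
    overuse = max(0, node.occupancy + 1 - node.capacity)
    return 1.0 + pres_fac * overuse

def congestion_cost(node: RRNode, pres_fac: float) -> float:
    # PathFinder-style cost: (base + history) * present-congestion penalty.
    return (node.base_cost + node.history) * present_penalty(node, pres_fac)

def update_history(nodes, hist_fac: float) -> None:
    # After each routing iteration, nodes that remain over-used become
    # permanently more expensive, so nets negotiate away from them.
    for n in nodes:
        n.history += hist_fac * max(0, n.occupancy - n.capacity)
```

In SpiralRoute this congestion cost would occupy its own lexicographic component rather than being blended with delay through a weight, which is the "minimum-congestion, bounded-delay" distinction above.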

  6. Connection Budget Generation – Optimization Component Weighted Budget Distribution Problem [Ghiasi et al., ICCAD’04] Given: DAG G=(V,A), minimum delays dij, weights wij, long-path constraint T Find: delay budgets bij such that: • (dij+bij) summed along every path is at most T • The sum of (wij·bij) over all edges is maximised The problem transforms into a min-cost flow problem; budgets are recovered from the dual of the flow solution.
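One standard way to write this problem as a linear program introduces an arrival-time variable a_v for each vertex (a_v is a symbol added here, not from the slide), so the exponentially many path constraints collapse into one constraint per edge. This is a sketch of the formulation; the exact form used in [Ghiasi et al., ICCAD’04] may differ:

```latex
\begin{aligned}
\max_{b,\,a}\quad & \sum_{(i,j)\in A} w_{ij}\, b_{ij} \\
\text{s.t.}\quad  & a_j - a_i \ge d_{ij} + b_{ij} && \forall (i,j)\in A \\
                  & 0 \le a_v \le T              && \forall v \in V \\
                  & b_{ij} \ge 0                 && \forall (i,j)\in A
\end{aligned}
```

Summing the edge constraints along any path v_0, ..., v_k telescopes to a total of (d_ij + b_ij) that is at most a_{v_k} - a_{v_0} <= T. Conversely, with non-negative d_ij and b_ij, setting a_v to the longest (d + b) arrival time at v satisfies every constraint, so the two forms are equivalent. Constraints of this difference form give the LP a network structure, which is what makes the min-cost flow (dual) treatment on the slide possible.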

  7. Connection Budget Generation – Mapping to FPGA Routing • Represent LEs and pads as edges (split clocked LEs) • Form a super-DAG • dij = minimum connection delay (from congestion-oblivious routing) • Set T = D • Set wij = 1 for real edges, 0 for virtual edges • The solved (dij+bij) is the maximum delay allowed for each edge in our routing
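A minimal sketch of the mapping step, assuming a hypothetical input format: real connections are given as `{(i, j): d_ij}`, and virtual source and sink vertices (named `SRC` and `SNK` here for illustration) tie the timing start and end points into one super-DAG. The sketch does not solve the weighted optimization. It only builds the super-DAG with the slide's weights and computes each edge's slack, meaning T minus the longest minimum-delay path through that edge. Because all d_ij and b_ij are non-negative, this slack bounds b_ij from above in any feasible budget assignment.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

SRC, SNK = "__src__", "__snk__"  # hypothetical names for the virtual vertices

def build_super_dag(connections, sources, sinks):
    """Return {(i, j): (d_ij, w_ij)} for the super-DAG.

    connections: {(i, j): d_ij}, the minimum (congestion-oblivious) delay of
    each real edge. Real edges get weight 1; virtual edges from SRC and to
    SNK get delay 0 and weight 0.
    """
    dag = {e: (d, 1) for e, d in connections.items()}
    for s in sources:
        dag[(SRC, s)] = (0.0, 0)
    for t in sinks:
        dag[(t, SNK)] = (0.0, 0)
    return dag

def edge_slacks(dag, T):
    """T minus the longest minimum-delay path through each edge."""
    preds, succs = defaultdict(list), defaultdict(list)
    for (i, j), (d, _w) in dag.items():
        preds[j].append((i, d))
        succs[i].append((j, d))
    vertices = set(preds) | set(succs)
    order = list(TopologicalSorter(
        {v: [i for i, _ in preds[v]] for v in vertices}).static_order())
    arr = {}   # longest delay from SRC to v
    for v in order:
        arr[v] = max((arr[i] + d for i, d in preds[v]), default=0.0)
    tail = {}  # longest delay from v to SNK
    for v in reversed(order):
        tail[v] = max((d + tail[j] for j, d in succs[v]), default=0.0)
    return {(i, j): T - arr[i] - d - tail[j] for (i, j), (d, _w) in dag.items()}
```

With real edges a→b (2), b→c (3), a→c (1), source a, sink c, and T = 8, the critical path a→b→c leaves a slack of 3 on each of its edges, while the off-critical edge a→c has a slack of 7.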

  8. Comparison with Iterative Minimax PERT (clma runtime ~20 min)

  9. Search Design – n-Lexicographic Search • [One-line A* search: f(v) = g(v) + h(v); expand the v with minimum f(v) until t is reached] • A 2-component lexicographic search was used for the routability router (conceptually a·∞ + b) • Need n components and custom comparison functions (proofs needed to avoid ∞^k values!) Theorem: A* over an n-lexicographic search is admissible if all components are totally ordered monoids with order-preserving addition • Monoids help avoid clutter from max()
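Python tuples already compare lexicographically, so a minimal n-component A* can keep g, h, and f as tuples and add them component-wise. This sketch assumes every component is a real number with ordinary addition. That is a totally ordered monoid in which addition preserves lexicographic order, so the theorem above applies. The graph format and the function names `add` and `lex_astar` are hypothetical, and custom per-component comparisons (such as the clamping pivot) are not handled here:

```python
import heapq
from itertools import count

def add(a, b):
    """Component-wise addition of two cost vectors (tuples)."""
    return tuple(x + y for x, y in zip(a, b))

def lex_astar(graph, s, t, h):
    """A* where g, h and f are n-component tuples compared lexicographically.

    graph: {v: [(w, cost_tuple), ...]}; h(v) returns a tuple of the same
    length that never overestimates (lexicographically) the cost to t.
    Returns (g(t), path from s to t).
    """
    g = {s: tuple(0 for _ in h(s))}
    parent = {s: None}
    tie = count()  # tie-breaker so the heap never compares vertices
    frontier = [(add(g[s], h(s)), next(tie), s, g[s])]
    while frontier:
        _f, _, v, gv = heapq.heappop(frontier)
        if gv != g[v]:
            continue  # stale entry: v was reached more cheaply later
        if v == t:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return g[t], path[::-1]
        for w, c in graph.get(v, []):
            gw = add(gv, c)
            if w not in g or gw < g[w]:  # tuple < is lexicographic
                g[w] = gw
                parent[w] = v
                heapq.heappush(frontier, (add(gw, h(w)), next(tie), w, gw))
    raise ValueError("t is unreachable from s")
```

With (congestion, delay) costs, a path of cost (0, 6) beats one of cost (1, 2): the first component dominates regardless of the second, which is the "a·∞ + b" behaviour.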

  10. Search Design – Clamping Component • 3-component vector • Delay, with pivot order (x < y iff x <= T and y > T) • Congestion, regular < • Delay, regular < Example: f(w2)=[0,2,2]; f(x1)=[1,0,4]; f(w3)=[0,1,3] Assumption: h(v) is close to h*(v) for the clamping component
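Under the pivot order, any two within-budget delays compare as equal in the first component, and so do any two over-budget delays. The first component therefore reduces to a 0/1 flag, which matches the 0 and 1 values in the slide's example vectors. A minimal sketch, assuming `congestion` and `delay` are the summed f(v) totals (the clamp must be applied to the summed delay, not added component-wise). The budget T = 3 in the usage lines is an assumed value chosen to reproduce the example:

```python
def clamped_vector(congestion, delay, T):
    """3-component clamped cost [delay-with-pivot, congestion, delay].

    Pivot order: x < y iff x <= T and y > T, so the first component is
    0 (within budget T) or 1 (over budget). Congestion and delay then use
    the usual order.
    """
    return (int(delay > T), congestion, delay)

# Usage (T = 3 is an assumed budget consistent with the slide's example):
candidates = {"w2": (2, 2), "x1": (0, 4), "w3": (1, 3)}  # name -> (congestion, delay)
order = sorted(candidates, key=lambda n: clamped_vector(*candidates[n], T=3))
print(order)  # ['w3', 'w2', 'x1']
```

x1 has the lowest congestion but sorts last because it exceeds the budget. This is the mechanism behind the timing-closure argument on the next slide.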

  11. Search Design – Timing Closure • The delay pivot element splits congestion-identical paths by budget • The search will always choose a budget-compliant path (a sum of finite congestion costs is finite) • Applied over all connections, a successful routing always yields timing closure!

  12. Performance – [Low-Hanging] Optimizations • The original implementation was about 2–2.5x slower than the current one • Introduced some low-hanging speed and quality optimizations • Index structure for lexicographic costs • Greedy tree management to ameliorate pin-ordering effects • A high-hanging optimization for future work is congestion-schedule handling (with many promising leads from the global routers at ICCAD’07)

  13. Trie-of-Stacks Index Structure • Replaces the f(v) index structure • Exploits FPGA routing symmetry • Index operations are independent of size • Reduces runtime by ~15%
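The slide gives no implementation details, so what follows is only one plausible reading of a trie-of-stacks, written as a hypothetical sketch rather than the author's structure. Each trie level branches on one cost component, and vertices with identical cost vectors share a single LIFO stack at the leaf. Identical costs are common in a regular FPGA fabric, which is one way to read "exploits routing symmetry". In this version, heap work scales with the number of distinct values per component, not with the number of queued vertices. The class names `_Level` and `TrieOfStacks` are illustrative:

```python
import heapq

class _Level:
    """One trie level: children keyed by one cost component."""
    __slots__ = ("children", "keys")

    def __init__(self):
        self.children = {}  # value -> _Level, or a list used as a stack at the last level
        self.keys = []      # min-heap of the component values present in children

class TrieOfStacks:
    """Hypothetical min-priority index for n-component lexicographic costs."""

    def __init__(self, n_components):
        self.n = n_components
        self.root = _Level()
        self.size = 0

    def push(self, cost, vertex):
        node = self.root
        for i, k in enumerate(cost):
            last = i == self.n - 1
            if k not in node.children:
                node.children[k] = [] if last else _Level()
                heapq.heappush(node.keys, k)
            if last:
                node.children[k].append(vertex)  # equal costs share one stack
            else:
                node = node.children[k]
        self.size += 1

    def pop_min(self):
        if self.size == 0:
            raise IndexError("pop from empty index")
        trail = []  # (level, key) pairs visited on the way down
        node = self.root
        for _ in range(self.n):
            k = node.keys[0]  # smallest value at this level
            trail.append((node, k))
            node = node.children[k]
        vertex = node.pop()   # node is now the leaf stack (LIFO)
        cost = tuple(k for _, k in trail)
        # Prune emptied branches bottom-up so every keys[0] stays valid.
        child_empty = not node
        for level, k in reversed(trail):
            if not child_empty:
                break
            del level.children[k]
            heapq.heappop(level.keys)  # k is this level's heap minimum
            child_empty = not level.children
        self.size -= 1
        return cost, vertex
```

Popping returns the lexicographically smallest cost. Among equal costs it returns the most recently pushed vertex, which keeps equal-cost expansion depth-first.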

  14. Tree Topology Maintenance

  15. Experiments - Setup • Run against VPR4.30 on an architecture similar to the single-segment “challenge” architecture • (Researcher timing constraints) • Routability comparison with unclamped lexicographic search • Route at the Fmax allowed by the placement • VPR pres_fac=1.5/1.1

  16. Routed Solution Timing Quality

  17. Runtime Comparison

  18. Effects of Budget Quality

  19. Future Work • Runtime improvements • Schedule improvement • Performance tuning • Multi-CLB segments (see backup slide) • Multi-objective routing • Other domains (e.g., standard-cell global routing)

  20. Conclusions • Extended lexicographic search to timing-driven routing • New budgeting component • Clamped search design • Supporting techniques for runtime • Timing closure is guaranteed on routing success • Solution quality is good, but runtime needs further improvement for the router to be viable

  21. Acknowledgements • J. Rose, V. Betz, A. Marquardt (Toronto) – VPR4.30 source & benchmarks • Australian Centre for Advanced Computing and Communications (ac3) – High Performance Computing Support • Advisor*: Dr. Aleks Ignjatovic

  22. Question Time… To Backup Slides

  23. Issues with h(v) ≉ h*(v) • “Node locking” occurs when g(v)+h(v) <= D but actually g(v)+h*(v) > D • Downstream expansion will be truncated • But a subpath with less delay but more congestion cannot expand into the locked node • And if we re-expand on shorter delay, the backtrace will ignore congestion – not locally decidable! • Quick fix: precompute h*(v) (only needed for sink pins t) – only the bounding components need the accuracy • Fancy on-the-fly handling?
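The quick fix can be sketched as one reverse Dijkstra search per sink pin t over delay. This assumes RR-graph edge delays are non-negative and additive (the buffered-switching assumption); the edge-list format and the helper names are hypothetical. With the exact h*(v) in place of h(v), the clamping component of f(v) is evaluated exactly, so the locking case g(v)+h(v) <= D < g(v)+h*(v) cannot arise:

```python
import heapq
from collections import defaultdict
from itertools import count

def exact_delay_to_sink(edges, t):
    """h*(v): exact minimum remaining delay from each vertex v to sink t.

    edges: iterable of (u, v, delay) with non-negative delays. Runs Dijkstra
    from t over reversed edges; vertices that cannot reach t are omitted.
    """
    rev = defaultdict(list)
    for u, v, d in edges:
        rev[v].append((u, d))
    dist = {t: 0.0}
    tie = count()  # tie-breaker so the heap never compares vertices
    heap = [(0.0, next(tie), t)]
    while heap:
        d_v, _, v = heapq.heappop(heap)
        if d_v > dist[v]:
            continue  # stale heap entry
        for u, d in rev[v]:
            nd = d_v + d
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, next(tie), u))
    return dist

def is_node_locked(g_delay, h_delay, h_star_delay, D):
    """The slide's 'node locking' case: looks within budget, but is not."""
    return g_delay + h_delay <= D < g_delay + h_star_delay
```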
