Taming the Complexity of Coordinated Place and Route. By Jin Hu, Myung-Chul Kim and Igor Markov. Presented by: Alvin Li. EECS 527: Layout Synthesis and Optimization.
Taming the Complexity of Coordinated Place and Route: Outline • Introduction • Background • LIRE: Routing Estimation • Congestion Relief • Coordinated Place and Route • Empirical Validation • Comparison to Prior Art • Conclusions
1. Introduction: Interconnects • Earlier designs: 3 layers, uniform pitch • Current designs: more than 3 layers, non-uniform pitch
1. Introduction • Interconnect complexity has increased since the 1980s • Metal stacks grew from 3 layers to 9-12 layers (non-uniform pitch) • Longer routing times • Lower IC quality • Figure: Interconnects (from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)
1. Introduction • Interconnects Dominate • IC Performance • Power Dissipation • Size • Signal Integrity
1. Introduction: Significance of the Paper • Global placement and global routing: standalone or integrated? • Interconnect effects such as signal integrity and coupling capacitance: a set of individual optimizations or one simultaneous optimization? • Streamlined system: Coordinated Place-and-Route (CoPR) • Routing estimation during placement • Placement technique that addresses three types of routing congestion • Interface to congestion elimination
2. Background – Dijkstra’s Algorithm • The basis of maze routing in global routers • Finds the shortest path from a source node to a target node • Requires a graph with non-negative edge weights
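The bullets above can be made concrete with a minimal sketch of Dijkstra's algorithm on a routing grid. The grid model (one cost per cell, 4-neighbor moves, cost paid when a cell is entered) and the name `dijkstra_grid` are assumptions for illustration, not taken from the paper.

```python
import heapq

def dijkstra_grid(cost, source, target):
    """Return (path_cost, nodes_expanded) for a cheapest 4-neighbor path.

    cost[r][c] is the non-negative cost of entering cell (r, c).
    This grid model is an illustrative assumption.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {source: 0}
    heap = [(0, source)]            # priority queue keyed by cost from source
    expanded = 0
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist.get((r, c), float("inf")):
            continue                # stale queue entry, already improved
        expanded += 1
        if (r, c) == target:
            return d, expanded
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf"), expanded
```

The priority queue in this sketch is the component that the later slides describe as complex and slow.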
2. Background – A* Search Algorithm • Extension of Dijkstra’s algorithm that typically visits fewer nodes • Estimates the remaining distance to the target • Node priority: the node's Group 2 label from Dijkstra’s algorithm + the distance estimate, including vias, to the target node • Example: 31 nodes visited by Dijkstra’s algorithm vs. 6 nodes visited by A*
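A minimal sketch of A* search on a 2D grid follows. It assumes a single routing layer (so the estimate ignores vias), one cost per cell, and a Manhattan-distance estimate scaled by the cheapest cell cost so that it never overestimates. The name `astar_grid` is hypothetical.

```python
import heapq

def astar_grid(cost, source, target):
    """Return (path_cost, nodes_expanded) using A* with a Manhattan estimate.

    cost[r][c] is the non-negative cost of entering cell (r, c).
    Single-layer grid, so vias are not modeled (an illustrative assumption).
    """
    rows, cols = len(cost), len(cost[0])
    min_cost = min(min(row) for row in cost)

    def estimate(node):
        # Lower bound on the remaining cost: cheapest cell times steps left.
        return min_cost * (abs(node[0] - target[0]) + abs(node[1] - target[1]))

    best_g = {source: 0}
    heap = [(estimate(source), 0, source)]   # (priority f = g + h, g, node)
    expanded = 0
    while heap:
        f, g, (r, c) = heapq.heappop(heap)
        if g > best_g.get((r, c), float("inf")):
            continue                         # stale queue entry
        expanded += 1
        if (r, c) == target:
            return g, expanded
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = g + cost[nxt[0]][nxt[1]]
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(heap, (ng + estimate(nxt), ng, nxt))
    return float("inf"), expanded
```

On a uniform 5x5 grid with source and target in the same row, this sketch expands only the five cells between them, because every other cell has a strictly larger priority.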
2. Coordinated Place-and-Route • Proposed improvement over A*-search-based routing: a streamlined system, Coordinated Place-and-Route (CoPR) • Cache-friendly routing primitives to estimate routing congestion • Leverages incrementality in routing and congestion updates • New categorization of congestion • New congestion-relief techniques
3. LIRE: Routing Estimation • LIRE: Lightweight Incremental Routing Estimator • Produces congestion maps like a global router • Routes 75K nets per second (can trade off quality against runtime)
3.1 Faster Routing • Traditional global routing uses maze routing: the priority queue is complex and slow; history-based costs are large; it lacks incrementality • Proposed: linear-time, cache-friendly routing based on the Bellman-Ford algorithm: avoid priority-queue-based approaches; avoid pointers to improve the cache hit rate
3.1 Faster Routing – Bellman-Ford Algorithm (1958) • Slower than Dijkstra’s algorithm in the worst case • Each pass performs E relaxation steps of O(1) time each (E = number of edges) • Goes through all nodes • Relaxes all edges in every pass, instead of greedily selecting the minimum-weight unprocessed node to relax • Repeats the pass up to (N-1) times (N = number of vertices) • May visit nodes in an arbitrary order
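For reference, a minimal sketch of the textbook Bellman-Ford algorithm on an edge list. The function name and the `(u, v, weight)` edge representation are illustrative choices; note that it needs no priority queue and only scans a flat list.

```python
def bellman_ford(num_nodes, edges, source):
    """Return shortest distances from source; edges are (u, v, weight) tuples."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):       # at most N-1 passes
        changed = False
        for u, v, w in edges:            # relax every edge, in list order
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                  # a full pass without change: converged
            break
    return dist
```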
3.1 Faster Routing – Bellman-Ford Algorithm: Monotonic Routing with One Linear-Time BF Pass • Consider only forward edges • Consider only the space bounded by S and T • Visit nodes in order, going through each node once • Runtime complexity is O(N) (N = number of nodes in the space bounded by S and T)
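A minimal sketch of a single monotonic pass, assuming a 2D grid with one cost per cell (paid on entry) and a hypothetical function name `monotonic_route_cost`. Each node in the bounding box of S and T is visited once, in order from S toward T, and only edges pointing toward T are relaxed.

```python
def monotonic_route_cost(cost, s, t):
    """Cheapest monotonic path cost from s to t inside their bounding box.

    cost[r][c] is the cost of entering cell (r, c); this grid model is an
    illustrative assumption. One pass, O(N) for N cells in the box.
    """
    (sr, sc), (tr, tc) = s, t
    dr = 1 if tr >= sr else -1           # row direction from S toward T
    dc = 1 if tc >= sc else -1           # column direction from S toward T
    INF = float("inf")
    best = {}
    for r in range(sr, tr + dr, dr):
        for c in range(sc, tc + dc, dc):
            if (r, c) == s:
                best[(r, c)] = 0
                continue
            # Only the two predecessors that lie closer to S can reach (r, c).
            from_row = best.get((r - dr, c), INF)
            from_col = best.get((r, c - dc), INF)
            best[(r, c)] = cost[r][c] + min(from_row, from_col)
    return best[t]
```

When the only cheap path leaves the bounding box or steps away from T, this pass cannot find it, which is the limitation the non-monotonic variants address.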
3.1 Faster Routing – Bellman-Ford Algorithm: Non-monotonic Routing with One Linear-Time BF Pass • Duplex-edge relaxation: relax each edge in both directions • Echo-relaxation: propagate a smaller cost through all recently relaxed edges incident to the point • Effective for detouring short nets (the majority of nets are short)
3.1 Faster Routing • Bellman-Ford with Yen’s improvement (BFY, 1970) • J. Y. Yen suggested reversing the node ordering between BF passes • Reduces the number of passes required to find an optimal path • BFY finds optimal paths faster than A*-search for most nets in the experiment (Theorem 1)
3.1 Faster Routing – Bellman-Ford Algorithm: Bellman-Ford with Yen’s improvement • First forward pass finds the optimal monotonic path • Backward pass finds a detour • Second forward pass finds the optimal path
3.1 Faster Routing • Bellman-Ford with Yen’s improvement (1970) • With m passes, runtime complexity is O(mN) (N = number of nodes in the space bounded by S and T) • Limiting m reduces runtime at a small loss of optimality • Focus on incremental calls to BFY • Incremental routing with BFY records partial costs along an existing route, which makes rip-up-and-reroute and repeated invocations of LIRE during placement faster
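The alternating passes can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the routing region is the whole grid passed in, each cell has one entry cost, forward sweeps relax only right/down edges in row-major order, and backward sweeps relax only left/up edges in reverse order. The names `bfy_route_cost` and `max_passes` are hypothetical.

```python
def bfy_route_cost(cost, s, t, max_passes=3):
    """Path cost from s to t after at most max_passes alternating sweeps.

    cost[r][c] is the cost of entering cell (r, c). Each sweep touches every
    cell once, so m sweeps take O(m * N) time for N cells.
    """
    rows, cols = len(cost), len(cost[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    dist[s[0]][s[1]] = 0
    order = [(r, c) for r in range(rows) for c in range(cols)]
    idle = 0                                  # consecutive sweeps with no update
    for p in range(max_passes):
        forward = (p % 2 == 0)
        sweep = order if forward else reversed(order)
        # Forward sweeps relax edges to later cells, backward to earlier ones.
        steps = ((0, 1), (1, 0)) if forward else ((0, -1), (-1, 0))
        changed = False
        for r, c in sweep:
            if dist[r][c] == INF:
                continue
            for dr, dc in steps:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    nd = dist[r][c] + cost[nr][nc]
                    if nd < dist[nr][nc]:
                        dist[nr][nc] = nd
                        changed = True
        idle = 0 if changed else idle + 1
        if idle == 2:                         # one forward and one backward sweep idle
            break
    return dist[t[0]][t[1]]
```

With a column of expensive cells between S and T, one pass returns the best monotonic cost and a second pass already finds the cheaper detour, which mirrors the trade-off of limiting m.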
3.1 Faster Routing – Bellman-Ford Algorithm: Incremental Routing with BFY • Start from an initial route found with BFY • Through relaxation, BFY preserves part of the route and finds a better partial segment
4. Congestion Relief • Main goal: increase the porosity of placement regions with high routing congestion • How? • After global placement: shift cell locations and use congestion-driven detailed placement • During global placement: inflate cells based on early congestion estimates and pin density
4. Congestion Relief • Traditional approaches are insufficient • After global placement (shifting cells, congestion-driven detailed placement): must preserve the structure of the resulting placement or risk unacceptable growth in interconnect length • During global placement (inflating cells based on early congestion estimates and pin density): when inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause
4. Congestion Relief – Further Analysis • Three types of routing congestion: • Cell-based congestion, caused by cell-to-cell proximity • Local layout-based congestion, caused by static design properties such as blockages and reduced routing capacities • Remotely-induced layout-based congestion, attributed to non-local factors such as long nets
4. Congestion Relief – Further Analysis • Cell-based congestion (caused by cell-to-cell proximity): mitigated by cell inflation, applied only to the top 5% most congested GCells to avoid exhausting whitespace • Local layout-based congestion (caused by static design properties such as blockages and reduced routing capacities): locally inject whitespace (move cells out of the congested region) • Remotely-induced layout-based congestion (attributed to non-local factors such as long nets): enforce a non-uniform target density by (i) creating a packing peanut (fixed cell) at the center of every GCell and (ii) modifying its size based on congestion
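The top-5% rule for cell inflation can be sketched as a simple selection step. The congestion measure (any number per GCell, larger meaning more congested), the rounding up, and the name `gcells_to_inflate` are assumptions for illustration; the slides do not specify them.

```python
def gcells_to_inflate(congestion, top_percent=5):
    """Return the ids of the most congested GCells (top_percent of all GCells).

    congestion maps a GCell id to a congestion value; how that value is
    computed is left open here (illustrative assumption).
    """
    n = len(congestion)
    if n == 0:
        return set()
    k = max(1, -(-n * top_percent // 100))    # ceiling of n * top_percent / 100
    ranked = sorted(congestion, key=congestion.get, reverse=True)
    return set(ranked[:k])
```

Capping the selection this way keeps inflation from spreading to every mildly congested GCell, which is the whitespace concern raised above.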
5. Coordinated Place and Route: Integration of Routing and Placement • Incremental placement updates • After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection • In the next invocation, a connection whose endpoints remain the same is left unchanged • This has a pronounced effect in later iterations and during detailed placement, when locations have stabilized
5. Coordinated Place and Route: Integration of Routing and Placement • Incremental routing updates • When invoked for the first time, LIRE generates routes from scratch • After that, it reuses existing routes where possible • Nets whose terminals relocated to different GCells are rerouted using the original net ordering • The remaining nets are checked for congested routes, which are mitigated by single incremental BFY passes • Replicates the accuracy of a maze router with better runtime
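The incremental update policy above can be sketched as a classification of nets between two invocations. The data layout (dicts keyed by net name, terminals and routes given as tuples of GCell ids) and the name `classify_nets` are hypothetical.

```python
def classify_nets(old_terminals, new_terminals, routes, congested):
    """Split nets into (reroute, repair, keep) lists between two invocations.

    old_terminals / new_terminals: net -> tuple of terminal GCell ids.
    routes: net -> GCell ids traversed by the net's existing route.
    congested: set of congested GCell ids.
    All of these structures are illustrative assumptions.
    """
    reroute, repair, keep = [], [], []
    for net in old_terminals:                 # dict order = original net ordering
        if new_terminals[net] != old_terminals[net]:
            reroute.append(net)               # terminals moved to other GCells
        elif any(g in congested for g in routes[net]):
            repair.append(net)                # candidate for an incremental pass
        else:
            keep.append(net)                  # existing route is reused as-is
    return reroute, repair, keep
```

As placement stabilizes, most nets fall into the last group, which is why the savings grow in later iterations.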
6. Empirical Validation: Verifying Results • CoPR is implemented in C++ using the OpenMP library and compiled with g++ 4.7.0 • Global placer derived from SimPL • SimPL was used by three of the top four teams at the ICCAD 2012 Contest • Results reported on the ICCAD 2012 benchmarks by IBM researchers
6. Empirical Validation • At the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics • It is 5.7× faster than another contestant at the same quality • With respect to the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner
7. Comparisons to Prior Art • Fast routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh • Replaces A*-search with fast linear-time routing algorithms that exploit a different notion of monotonic routes • Uses multiple passes to find non-monotonic routes and does not claim optimality • Does not consider CPU cache effects or the connection with BFY • Was not used to drive a competitive global placer, in contrast to the successful coordinated place-and-route results of CoPR • CoPR’s authors completed their work before this paper was published or made available
7. Comparisons to Prior Art • Fast routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen • Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms • Uses more memory for advanced data structures and requires significant up-front setup • In contrast, the single-threaded version of LIRE takes <15% of the runtime of the entire place-and-route flow • CoPR’s authors avoided sophisticated routing algorithms and data structures