Taming the Complexity of Coordinated Place and Route

EECS 527. Layout Synthesis and Optimization • By Jin Hu, Myung-Chul Kim and Igor Markov • Presented by: Alvin Li

Outline • Introduction • Background • LIRE: Routing Estimation • Congestion Relief • Coordinated Place and Route • Empirical Validation • Comparison to Prior Art • Conclusions

1. Introduction Interconnects • 3 layers • Uniform pitch • More than 3 layers • Non-uniform pitch

1. Introduction • Interconnect complexity has increased since the 1980s • From 3 layers to 9-12 layers (non-uniform pitch) • Longer routing times • Lower quality of ICs • Interconnects (from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)

1. Introduction • Interconnects Dominate • IC Performance • Power Dissipation • Size • Signal Integrity

1. Introduction: Significance of the Paper • Global Placement & Global Routing • Standalone vs. integrated: a set of individual optimizations or one simultaneous optimization? • Signal integrity and coupling capacitances in interconnect • Streamlined system: Coordinated Place-and-Route (CoPR) • Routing estimation during placement • Placement technique that addresses three types of routing congestion • Interface to congestion elimination

2. Background – Dijkstra’s Algorithm • Known as maze routing in the routing context • Finds the shortest path from a source node to a target node • Requires a graph with non-negative edge weights
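
As a rough illustration of the bullets above, here is a minimal sketch of Dijkstra-style maze routing on a 2D grid. The function name `dijkstra_grid`, the 4-connected grid model, and the convention that `cost[r][c]` is the cost of entering a cell are assumptions made for this example, not details from the paper.

```python
import heapq

def dijkstra_grid(cost, source, target):
    """Cheapest path cost from source to target on a 4-connected grid.

    cost[r][c] >= 0 is the (assumed) cost of entering cell (r, c).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {source: 0}
    heap = [(0, source)]          # priority queue keyed by tentative distance
    done = set()                  # nodes whose distance is final
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue              # stale queue entry
        done.add(node)
        if node == target:
            return d
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf")

# The expensive middle cell forces a detour through the bottom row.
grid = [[1, 9, 1],
        [1, 1, 1]]
print(dijkstra_grid(grid, (0, 0), (0, 2)))  # 4
```

The priority queue is what later slides identify as the slow, cache-unfriendly part of this approach.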


2. Background – A* Search Algorithm • Extension of Dijkstra’s Algorithm that typically visits fewer nodes • Estimates the distance to the target • Node priority: the Group 2 label from Dijkstra’s Algorithm + a distance estimate, including vias, to the target node • Example: 31 nodes visited vs. 6 nodes visited (Dijkstra’s vs. A*)
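
To make the node-priority rule concrete, the sketch below runs the same search with and without a distance estimate and reports how many nodes each variant expands. It is a 2D simplification that ignores vias; `grid_search`, the Manhattan-distance estimate scaled by the cheapest cell, and the uniform test grid are assumptions for illustration only.

```python
import heapq

def grid_search(cost, source, target, use_heuristic=True):
    """Return (path cost, nodes expanded) on a 4-connected grid.

    cost[r][c] >= 0 is the (assumed) cost of entering cell (r, c).
    With use_heuristic=False the search behaves like Dijkstra's algorithm.
    """
    rows, cols = len(cost), len(cost[0])
    min_cost = min(min(row) for row in cost)

    def estimate(node):
        # Admissible estimate: Manhattan distance times the cheapest cell cost.
        if not use_heuristic:
            return 0
        return min_cost * (abs(node[0] - target[0]) + abs(node[1] - target[1]))

    best = {source: 0}
    heap = [(estimate(source), 0, source)]   # (priority, cost so far, node)
    expanded = set()
    while heap:
        _, g, node = heapq.heappop(heap)
        if node in expanded:
            continue
        expanded.add(node)
        if node == target:
            return g, len(expanded)
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = g + cost[nxt[0]][nxt[1]]
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(heap, (ng + estimate(nxt), ng, nxt))
    return float("inf"), len(expanded)

grid = [[1] * 5 for _ in range(5)]
print(grid_search(grid, (2, 0), (2, 4), use_heuristic=True))   # (4, 5)
print(grid_search(grid, (2, 0), (2, 4), use_heuristic=False))  # same cost, more nodes expanded
```

Both variants return the same path cost; the estimate only changes the order in which nodes are taken from the queue.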

2. Background – Key Characteristics of A* Search Algorithm

2. Coordinated Place-and-Route • Proposed improvement over A* search: a streamlined system, Coordinated Place-and-Route (CoPR) • Cache-friendly routing primitives to estimate routing congestion • Leverages incrementality in routing and congestion updates • New categorization of congestion • New congestion-relief techniques

3. LIRE: Routing Estimation • Lightweight Incremental Routing Estimator • Produces congestion maps like a global router • Routes 75K nets per second (can trade off quality against runtime)
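
For readers unfamiliar with congestion maps, the following sketch shows one simple way such a map could be derived from a set of routes. It is not LIRE's actual data structure: the per-GCell usage count, the `capacity` dictionary, and the name `congestion_map` are assumptions, and real global routers usually track usage on the edges between GCells rather than on the GCells themselves.

```python
from collections import Counter

def congestion_map(routes, capacity):
    """Map each used GCell to usage / capacity.

    routes:   list of routes, each a list of GCell coordinates (hypothetical format)
    capacity: dict from GCell coordinate to its routing capacity
    """
    usage = Counter(gcell for route in routes for gcell in route)
    return {gcell: usage[gcell] / capacity[gcell] for gcell in usage}

routes = [[(0, 0), (0, 1)],
          [(0, 1), (1, 1)]]
capacity = {(0, 0): 2, (0, 1): 2, (1, 1): 2}
print(congestion_map(routes, capacity))
# {(0, 0): 0.5, (0, 1): 1.0, (1, 1): 0.5}
```

A ratio above 1.0 would indicate that a GCell is over capacity.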


3.1 Faster Routing • Traditional global routing: maze routing • Priority queue → complex and slow • Large history-based cost • Lacks incrementality • Linear-time cache-friendly routing • Avoid priority-queue-based approaches • Avoid pointers to improve cache hit rate • Instead: Bellman-Ford Algorithm

3.1 Faster Routing – Bellman-Ford Algorithm (1958) • Asymptotically slower than Dijkstra’s Algorithm • Each pass performs E relaxation steps, each O(1) • Goes through all nodes • Relaxes all edges instead of greedily selecting the unprocessed node with minimum weight to relax • Updates all path costs and repeats up to N-1 times (N = number of vertices) • Can visit nodes in arbitrary order
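
The textbook form of the algorithm described above can be sketched as follows. The edge-list representation and the early exit when a pass changes nothing are choices made for this example; the slides do not say how the paper stores the graph.

```python
def bellman_ford(num_nodes, edges, source):
    """Single-source shortest distances by repeated relaxation of all edges.

    edges: list of (u, v, weight) directed edges over nodes 0..num_nodes-1.
    Assumes there is no negative cycle reachable from source.
    """
    dist = [float("inf")] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):        # at most N-1 passes
        changed = False
        for u, v, w in edges:             # E relaxation steps, each O(1)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                   # no update in a full pass: converged
            break
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford(4, edges, 0))  # [0, 3, 1, 4]
```

There is no priority queue and the edges are scanned in a fixed order, which is the property the following slides exploit.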


3.1 Faster Routing – Bellman-Ford Algorithm • Monotonic Routing with One Linear-Time BF Pass • Consider only forward edges • Consider only the space bounded by S and T • Visit nodes in order, going through each node once → runtime complexity is O(N) (N = number of nodes in the space bounded by S and T)
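
A single forward pass restricted to the S-T bounding box amounts to a simple dynamic program. The sketch below assumes a 2D grid where `cost[r][c]` is the cost of entering a cell; the name `monotonic_route_cost` and the dictionary used for storage are illustrative (a flat array would be closer to the cache-friendly intent).

```python
def monotonic_route_cost(cost, s, t):
    """Cheapest monotonic (never moving away from t) path cost from s to t.

    Visits each cell of the bounding box of s and t exactly once: O(N).
    """
    (sr, sc), (tr, tc) = s, t
    dr = 1 if tr >= sr else -1            # row direction toward t
    dc = 1 if tc >= sc else -1            # column direction toward t
    best = {}
    for r in range(sr, tr + dr, dr):
        for c in range(sc, tc + dc, dc):
            if (r, c) == (sr, sc):
                best[(r, c)] = 0
                continue
            candidates = []
            if r != sr:                   # forward edge from the previous row
                candidates.append(best[(r - dr, c)])
            if c != sc:                   # forward edge from the previous column
                candidates.append(best[(r, c - dc)])
            best[(r, c)] = min(candidates) + cost[r][c]
    return best[(tr, tc)]

grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
print(monotonic_route_cost(grid, (0, 0), (2, 2)))  # 4
```

Because only forward edges are considered, the result is optimal among monotonic routes only; a cheaper route that leaves the bounding box or doubles back would be missed.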

3.1 Faster Routing – Bellman-Ford Algorithm • Non-monotonic Routing with One Linear-Time BF Pass • Duplex-edge relaxation: relaxation in both directions • Echo-relaxation: propagate a smaller cost through all recently relaxed edges incident to the point • Effective in detouring short nets (the majority of nets are short)
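
The sketch below is one loose interpretation of relaxing edges in both directions during a single sweep; it is not the paper's definition. For each cell it first relaxes the edges coming in from the up and left neighbors, then pushes the improved cost back along those same edges. How the paper orders or bounds these relaxations is not stated on the slides, and `duplex_pass_cost` is a hypothetical name.

```python
def duplex_pass_cost(cost, source, target):
    """One row-major sweep that relaxes each grid edge in both directions.

    cost[r][c] >= 0 is the (assumed) cost of entering cell (r, c).
    The result is an upper bound on the true shortest-path cost.
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0
    for r in range(rows):
        for c in range(cols):
            preds = [(pr, pc) for pr, pc in ((r - 1, c), (r, c - 1))
                     if pr >= 0 and pc >= 0]
            # Forward direction: predecessor -> current cell.
            for pr, pc in preds:
                dist[r][c] = min(dist[r][c], dist[pr][pc] + cost[r][c])
            # Reverse direction: current cell -> predecessor, using the
            # cost that was just improved above.
            for pr, pc in preds:
                dist[pr][pc] = min(dist[pr][pc], dist[r][c] + cost[pr][pc])
    return dist[target[0]][target[1]]

grid = [[1, 9, 1],
        [1, 1, 1]]
print(duplex_pass_cost(grid, (0, 0), (0, 2)))  # 4
```

On this small grid a forward-only pass would report cost 10 through the expensive cell, while the reverse relaxation lets a single sweep pick up the short detour through the bottom row.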


3.1 Faster Routing • Bellman-Ford with Yen’s improvement (BFY, 1970) • J.Y. Yen suggested reversing the node ordering between BF passes • Reduces the number of passes required to find an optimal path • BFY finds optimal paths faster than A*-search for most nets in the experiment (Theorem 1)

3.1 Faster Routing – Bellman Ford Algorithm Bellman-Ford with Yen’s improvement • First forward pass finds optimal monotonic path

3.1 Faster Routing – Bellman Ford Algorithm Bellman-Ford with Yen’s improvement • Backward pass finds a detour

3.1 Faster Routing – Bellman Ford Algorithm Bellman-Ford with Yen’s improvement • Second forward pass finds optimal path

3.1 Faster Routing • Bellman-Ford with Yen’s improvement (1970) • With m passes, runtime complexity is O(mN) (N = number of nodes in the space bounded by S and T) • Limit m to reduce runtime • Small loss of optimality • Focus on incremental calls to BFY • Incremental Routing with BFY • Records partial costs along an existing route to reduce runtime (useful for rip-up-and-reroute and repeated invocations of LIRE during placement) • Faster!
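
The alternating passes and the pass limit m can be sketched as below. This is a simplified 2D version under stated assumptions: it sweeps the whole grid rather than only the S-T bounding box, `cost[r][c]` is the cost of entering a cell, and `bfy_route_cost` and `max_passes` are names chosen for this example.

```python
def bfy_route_cost(cost, source, target, max_passes=3):
    """Bellman-Ford with the sweep direction reversed between passes.

    Even passes sweep in row-major order and relax edges from the up/left
    neighbors; odd passes sweep in reverse and relax from the down/right
    neighbors. With m passes over N cells the work is O(mN).
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0
    for p in range(max_passes):
        step = 1 if p % 2 == 0 else -1
        row_order = range(rows) if step == 1 else range(rows - 1, -1, -1)
        col_order = range(cols) if step == 1 else range(cols - 1, -1, -1)
        changed = False
        for r in row_order:
            for c in col_order:
                for pr, pc in ((r - step, c), (r, c - step)):
                    if 0 <= pr < rows and 0 <= pc < cols:
                        cand = dist[pr][pc] + cost[r][c]
                        if cand < dist[r][c]:
                            dist[r][c] = cand
                            changed = True
        if not changed:                   # converged before using all passes
            break
    return dist[target[0]][target[1]]

grid = [[1, 9, 1],
        [1, 1, 1]]
print(bfy_route_cost(grid, (0, 0), (0, 2), max_passes=1))  # 10
print(bfy_route_cost(grid, (0, 0), (0, 2), max_passes=2))  # 4
```

With one pass only the monotonic route is found; the backward pass discovers the detour, which illustrates the runtime-versus-optimality trade-off of limiting m.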

3.1 Faster Routing – Bellman Ford Algorithm Incremental Routing with BFY • Initial route with BFY

3.1 Faster Routing – Bellman-Ford Algorithm • Incremental Routing with BFY • Through relaxation, BFY preserves part of the route and finds a better partial segment

4. Congestion Relief • Main goal: increase the porosity of placement regions with high routing congestion • How? • After global placement, shift cell locations and use congestion-driven detailed placement • During global placement, inflate cells based on early congestion estimates and pin density

4. Congestion Relief • Traditional approaches are insufficient: • After global placement, shift cell locations and use congestion-driven detailed placement • Must preserve the structure of the resulting placement or risk unbearable deterioration of interconnect length • During global placement, inflate cells based on early congestion estimates and pin density • When inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause

4. Congestion Relief – Further Analysis • 3 types of routing congestion: • Cell-based congestion caused by cell-to-cell proximity • Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities • Remotely-induced layout-based congestion attributed to non-local factors such as long nets

4. Congestion Relief – Further Analysis • Cell-based congestion caused by cell-to-cell proximity • Mitigated by cell inflation (only in the top 5% most congested GCells, to avoid exhausting whitespace) • Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities • Locally inject whitespace (move cells out of the congested region) • Remotely-induced layout-based congestion attributed to non-local factors such as long nets • Enforce non-uniform target density by: • i) Creating a packing peanut (fixed cell) at the center of every GCell • ii) Modifying its size based on congestion
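
The first mitigation, inflating cells only in the most congested GCells, might look roughly like the sketch below. The 5% threshold comes from the slide; the inflation `factor`, the data layout, and the name `inflate_cells` are hypothetical and are not taken from the paper.

```python
def inflate_cells(cell_widths, cell_gcell, congestion, top_fraction=0.05, factor=1.2):
    """Widen cells that sit in the most congested GCells.

    cell_widths: dict cell name -> width
    cell_gcell:  dict cell name -> GCell id containing the cell
    congestion:  dict GCell id -> congestion estimate
    factor:      hypothetical inflation multiplier (an assumption)
    """
    ranked = sorted(congestion, key=congestion.get, reverse=True)
    hot_count = max(1, int(len(ranked) * top_fraction))
    hot = set(ranked[:hot_count])         # top fraction of GCells by congestion
    return {cell: width * factor if cell_gcell[cell] in hot else width
            for cell, width in cell_widths.items()}

congestion = {g: g / 20 for g in range(20)}      # GCell 19 is the most congested
widths = {"a": 2.0, "b": 2.0}
location = {"a": 19, "b": 0}
print(inflate_cells(widths, location, congestion))
```

Restricting inflation to a small fraction of GCells keeps the total added area bounded, which matches the stated goal of not exhausting whitespace.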

5. Coordinated Place and Route • Integration of Routing and Placement • Incremental placement updates • After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection • On the next invocation, a connection whose endpoints remain the same is left unchanged • Has a pronounced effect in later iterations and during detailed placement, when locations have stabilized
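
The reuse rule described above can be sketched with a small cache keyed by connection. Everything here is illustrative: the class `IncrementalEstimator`, the simple L-shaped `l_route` stand-in for the actual router, and the `reroutes` counter are assumptions, not LIRE's interface.

```python
def l_route(src, dst):
    """Stand-in router: horizontal run followed by a vertical run of GCells."""
    (r0, c0), (r1, c1) = src, dst
    step_c = 1 if c1 >= c0 else -1
    step_r = 1 if r1 >= r0 else -1
    path = [(r0, c) for c in range(c0, c1 + step_c, step_c)]
    path += [(r, c1) for r in range(r0 + step_r, r1 + step_r, step_r)]
    return path

class IncrementalEstimator:
    """Keeps routes between invocations and reroutes only moved connections."""

    def __init__(self, route_fn):
        self.route_fn = route_fn
        self.cache = {}                   # connection id -> (endpoints, route)
        self.reroutes = 0                 # how many routes were actually computed

    def update(self, connections):
        """connections: dict id -> (source GCell, target GCell)."""
        for conn_id, endpoints in connections.items():
            cached = self.cache.get(conn_id)
            if cached is not None and cached[0] == endpoints:
                continue                  # endpoints unchanged: keep the old route
            self.cache[conn_id] = (endpoints, self.route_fn(*endpoints))
            self.reroutes += 1
        return {conn_id: self.cache[conn_id][1] for conn_id in connections}

est = IncrementalEstimator(l_route)
est.update({"n1": ((0, 0), (2, 3)), "n2": ((1, 1), (1, 4))})
est.update({"n1": ((0, 0), (2, 3)), "n2": ((1, 1), (3, 4))})  # only n2 moved
print(est.reroutes)  # 3
```

As placement stabilizes, fewer endpoints change GCells between invocations, so most connections hit the cache.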

5. Coordinated Place and Route • Integration of Routing and Placement • Incremental routing updates • When invoked for the first time, LIRE generates routes from scratch • After that, it reuses existing routes where possible • Nets whose terminals relocated to different GCells are rerouted using the original net ordering • Remaining nets are checked for congested routes, and congestion is mitigated by single incremental BFY passes • Replicates the accuracy of a maze router, but with a better runtime

6. Empirical Validation • Verifying results • CoPR is implemented in C++ using the OpenMP library, compiled with g++ 4.7.0 • Global placer derived from SimPL • Used by three of the top four teams at the ICCAD 2012 Contest • Results reported on the ICCAD 2012 benchmarks by IBM researchers

6. Empirical Validation • At the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics. It is 5.7× faster than another contestant at the same quality. • With respect to the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner.

7. Comparisons to Prior Art • Fast Routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh • Replaces A*-search with fast linear-time routing algorithms that exploit a different notion of monotonic routes • Uses multiple passes to find non-monotonic routes and does not claim optimality • Does not consider CPU cache effects or the connection with BFY • Not used to drive a competitive global placer, in contrast to the successful results for coordinated place-and-route by CoPR • CoPR’s authors completed their work before this paper was published or made available

7. Comparisons to Prior Art • Fast Routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen • Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms • Uses more memory for advanced data structures and requires significant up-front set-up • In contrast, the single-threaded version of LIRE takes <15% of the runtime of the entire place-and-route flow • CoPR’s authors avoided sophisticated routing algorithms and data structures
