310 likes | 320 Views
This study examines the optimality, scalability, and stability of partitioning and placement algorithms in VLSI design. The authors present construction examples with known upper bounds and explore the room for further improvement. The study also compares existing heuristics and algorithms to understand their suboptimality.
E N D
Optimality, Scalability and Stability study of Partitioning and Placement Algorithms Jason Cong, Michail Romesis, Min Xie UCLA Computer Science Department This work is partially supported by Semiconductor Research Corporation and National Science Foundation
Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work
Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work
Motivation Partitioning • Significant progress in partitioning during the mid-to-late 90’s • No significant improvement in the last 5 years • Have we reached a plateau?
Motivation Placement • Lack of significant progress in wirelength reduction • Rate of reduction is about 5-10% every 2-3 years • Latest developments in placement differ mainly in runtime • Capo [A. Caldwell et al, 2000] • Dragon [M. Wang et al, 2000] • Mongrel [S. Hur et al, 2000] • mPL [T. Chan et al, 2000] • mPG [C. Chang et al, 2002] • How much is the room for further improvement?
Motivation • Most work compare only with known heuristics • Use real design based benchmarks • ISPD98 [C. Alpert 1998] • WSI [D. Ghosh et al, 1997] • Use synthetic benchmarks • circ and gen [M. D. Hutton et al, 1998] • gnl [D. Stroobandt et al, 2000] • Little understanding about the divergence from the optimal
x x x x x x x x x x Related Work • Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995] • Construct scaled instance with known upperbound from an initial problem ? • Over 10% area suboptimality in TimberWolf • Notable wirelength suboptimality in GORDIAN-L • Significant improvement was possible for placement and partitioning • But test cases are small, the largest netlist is less than 40K
Related Work • Optimality and Scalability of Existing Placement Algorithms [C. Chang et al, 2003] • Construct instances with known optimal using the characteristic of the original problem ? • Existing placement algorithms can be 70% to 150% away from the optimal • Average solution quality deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10 • All the connections are local, no global connections
Overview • Motivation and related work • Our contribution • Construction of Partitioning Examples with Known Upper bound • Construction of Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work
P1 P2 • Create two partitions of size 8 • Generate 9 2-pin nets that do not cross the partition line A B • Generate 3 2-pin nets that cross the partition line • Generate 6 3-pin nets that do not cross the partition line • Generate 2 3-pin nets that cross the partition line C D BEKU Construction Example Input: t = 16, D={12,8} B = 5 • Cutsize improved to 4 after FM • Cutsize = 5
Construction of Multiway Partitioning Examples with Known Upper Bounds (MEKU) • Divide the nodes into m partitions of equal size • Create B nets that cross at least two partitions. The remaining nets stay in one partition • Improve by multiway FM
BEKU and MEKU Suite • 2-way partitions occupy 45-55% of the total area • 8-way partitions occupy 11.8-13.3% of the total area URL : http://cadlab.cs.ucla.edu/~pubbench/partitioning/
Tested three State-of-the-Art Partitioning Tools • hMetis [G. Karypis et al, 1997] • Based on multilevel framework • MHEC and FC clustering algorithms • Variations of FM for refinement at each level • MLPart [A. Caldwell et al, 2000] • Based on multilevel framework • Different algorithms for coarsening (PinEC) and refinement (VRW) • Flare [J. Cong et al, 2000] • Two-level hierarchy created by the ESC clustering algorithm • Based on the LR bipartitioning engine and the PM multiway partitioning framework
Experimental Results on BEKU • MLPart produces the best results (very close to our estimated upper bound), and Flare the worst • The value of the bound (as a percentage of nets) influences the quality of hMetis and Flare
Experimental Results on BEKU • The runtime scale well (almost linearly) • Flare runs out of memory when problem size exceeds 1M nodes
Experimental Results on MEKU • hMetis is worse by only 2% when the initial bound is 30%, but the gap increases to 18% for a bound of 35% • MLPart does not support multiway partitioning
Placement Examples with Global Connections • Produced by Dragon on ISPD98 • The wirelength contribution from global connections can be significant! • Need to consider the impact of global connections
Placement Examples with Global Connections only • Each net connects either a row or column • Obvious upper bound • Sum the length of each row and column • Similar to datapath examples
Placement Examples with Non-local Connections • Extend PEKO [ C.Chang 2003] by introducing non-local nets to mimic global connections • All the modules are of equal size, and there is no space between rows and adjacent modules • For nets of degree i, *diof them are generated by randomly conneting i modules, the rest are generated optimally as in PEKO
Generate 28 2-pin optimally Generate 6 2-pin randomly Generate 16 3-pin optimally Generate 4 3-pin randomly Generate 6 4-pin randomly Generate 1 4-pin randomly Generate 4 5-pin optimally Generate 2 6-pin optimally Generate 1 7-pin optimally Placement Examples with Non-local Connections Input : t = 64, D = {d2=34,d3=20,d4=7,d5=4,d6=2, d7=1} =0.2 Total WL = 160
G-PEKU Suite • Module number extracted from ISPD98 URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm
PEKU Suite • Module number t and NDVs extracted from ISPD98 • Remove connections with pads • Vary from 0 to 10% • 15% white space by expanding one dimension of the chip
PEKU Suite URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm
Tested four State-of-the-Art Placers • Capo [A. Caldwell et al, 2000] • Based on multilevel partitioner • Aims to enhance the routability • Dragon [M. Wang et al, 2000] • Uses hMetis for initial partition • SA with bin-based swapping • mPL [T. Chan et al, 2000] • Nonlinear programming on the coarsest level • Goto based relaxation • mPG [C. Chang et al, 2002] • Uses FC clustering and hierarchical density control • Incremental A-tree for routability
Experimental Results on G-PEKU • The gap between their solutions and the upper bound varies between 79% and 102% in the worst case • Another validation that there is significant room for improvement for the placement problem
Experimental Results on PEKU • mPL’s QR increases when is increased from 0 to 0.75%, while for the other three placers, QRs are steadily decreasing • Absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario
Overview • Motivation and related work • Our contribution • Partitioning Examples with Known Upper bound • Placement Examples with Known Upper bound • Optimality, Scalability and Stability study • Conclusions and future work
Conclusions • Bipartitioning techniques seem fairly mature • The best available algorithms perform and scale very well on examples by our construction • The best available multiway partitioning algorithms do not perform equally well • The worst divergence from upperbound is 18% by hMetis • There is still significant room for improvement in circuit placement • Existing placement algorithms may produce solutions far away from the optimal (or upper bound) • Their effectiveness depends much on the characteristic of circuits
Future Work • Construction of more synthetic examples • Measure routability optimality • Measure timing optimality • Understand the deficiencies of existing algorithms using these examples • Guide the development of new VLSI CAD algorithms
Acknowledgement • Prof. I. Markov for providing Capo’s latest version • Prof. S. Lim for providing Flare’s latest version • X. Yuan for providing the data of mPG • J. Shinnerl and K. Sze for providing the experimental data of mPL
THE END THANK YOU