Constraint-Driven Large Scale Circuit Placement Algorithms

Explore scalable circuit placement algorithms focusing on optimality, scalability, and routability to enhance mixed-size placement and white space allocation. Discuss diverse applications and future works.


Presentation Transcript


  1. Constraint-Driven Large Scale Circuit Placement Algorithms • Advisor: Prof. Jason Cong • Student: Min Xie • September 2006

  2. Outline • Chapter 1. Introduction • Chapter 2. Optimality and scalability study of existing placement algorithms • Chapter 3. Routability driven multilevel global placement and white space allocation • Chapter 4. A robust legalization scheme for mixed-size placement • Chapter 5. Applications of mixed-size placement legalization • Chapter 6. “Global” localized preprocessing for detailed placement • Chapter 7. Heterogeneous placement for FPGAs • Chapter 8. Conclusions and future works UCLA VLSICAD LAB

  3. Publication List • Cong J., Xie M., and Zhang Y., “An Enhanced Multilevel Routing System,” Proceedings of ICCAD, pp. 51-58, 2002. • Chang C., Cong J., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” Proceedings of ASPDAC, pp. 621-627, 2003. • Cong J., Romesis M., and Xie M., “Optimality, Scalability and Stability Study of Existing Partitioning and Placement Algorithms,” Proceedings of ISPD, pp. 88-94, 2003. • Cong J., Romesis M., and Xie M., “Optimality and Stability Study of Timing-driven Placement Algorithms,” Proceedings of ICCAD, pp. 472-478, 2003. • Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large-Scale Circuit Placement: Gap and Promise,” Proceedings of ICCAD, pp. 883-890, 2003. • Chang C., Cong J., Romesis M., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” IEEE TCAD, vol. 23, no. 4, pp. 537-549, 2004.

  4. Publication List • Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” Proceedings of ICCAD, pp. 883-890, 2004. • Cong J., Fang J., Xie M., and Zhang Y., “MARS - A Multilevel Full-Chip Gridless Routing System,” IEEE TCAD, vol. 24, no. 3, pp. 382-394, March 2005. • Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large Scale Circuit Placement,” ACM TODAES, vol. 10, no. 2, pp. 389-430, April 2005. • Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” IEEE TCAD, to appear. • Chan T., Cong J., Romesis M., Shinnerl J., Sze K., and Xie M., “mPL6: A Robust Multilevel Mixed-size Placement Engine,” Proceedings of ISPD, pp. 227-229, April 2005. • Cong J. and Xie M., “A Robust Detailed Placement Algorithm for Mixed-size IC Designs,” Proceedings of ASPDAC, pp. 188-194, 2006. • Cong J., Chan T., Shinnerl J., Sze K., and Xie M., “mPL6: Enhanced Multilevel Mixed-size Placement,” Proceedings of ISPD, pp. 212-214, April 2006.

  5. A Brief History of mPL • mPL 1.0 [ICCAD00]: Recursive ESC clustering; NLP at coarsest level; Goto discrete relaxation; Slot Assignment legalization; Domino detailed placement • mPL 1.1: FC-Clustering; added partitioning to legalization • mPL 2.0: RDFL relaxation; primal-dual netlist pruning • mPL 3.0 [ICCAD 03]: QRS relaxation; AMG interpolation; multiple V-cycles; cell-area fragmentation • mPL 4.0: improved DP; better coarsening; backtracking V-cycle • mPL5, mPL6: Multilevel Force-Directed (Figure: relative wirelength by year, 2000 to 2004, with the versions grouped under UNIFORM CELL SIZE and NON-UNIFORM CELL SIZE)

  6. Multiscale Optimization Framework • Explores different scales of the solution space at different levels • Supports VERY FAST and SCALABLE methods • Supports inclusion of complicated objectives and constraints • Successful across MANY DIVERSE applications (Figure: starting from the given problem, coarsening (clustering) decreases the problem size; interpolation and relaxation (optimization) follow)

  7. mPL6 – Generalized Force Directed Refinement • Objective: logsum wirelength • Equality constraint: average bin density = utilization ratio • a13(v7) = fractional area of cell v7 in bin B13 (Figure: cells v1 to v7 placed on a grid of bins)
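The two quantities named on this slide can be sketched in a few lines. The slide's formulas did not survive extraction, so the code below assumes the common log-sum-exp form of smooth half-perimeter wirelength; the function names, the smoothing parameter `gamma`, and the rectangle layout `(x0, y0, x1, y1)` are illustrative choices, not taken from mPL6.

```python
import math

def smooth_max(values, gamma):
    """Log-sum-exp approximation of max(values); tighter as gamma shrinks."""
    m = max(values)  # shift by the maximum for numerical stability
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))

def logsum_wirelength(pins, gamma):
    """Smooth half-perimeter wirelength of one net; pins is a list of (x, y)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # smooth max minus smooth min, where smooth min(v) = -smooth_max(-v)
    width = smooth_max(xs, gamma) + smooth_max([-x for x in xs], gamma)
    height = smooth_max(ys, gamma) + smooth_max([-y for y in ys], gamma)
    return width + height

def fractional_area(cell, bin_rect):
    """Fraction of a cell's area that falls inside a bin (assumed layout x0, y0, x1, y1)."""
    w = min(cell[2], bin_rect[2]) - max(cell[0], bin_rect[0])
    h = min(cell[3], bin_rect[3]) - max(cell[1], bin_rect[1])
    overlap = max(w, 0.0) * max(h, 0.0)
    return overlap / ((cell[2] - cell[0]) * (cell[3] - cell[1]))

net = [(0.0, 0.0), (10.0, 5.0)]
print(logsum_wirelength(net, gamma=0.1))            # close to the exact HPWL of 15
print(fractional_area((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.25
```

A larger `gamma` gives a smoother but looser estimate; the exact half-perimeter is recovered only in the limit of small `gamma`.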

  8. mPL6 – Iterative Flow • Bestchoice clustering [Alpert et al, ISPD05] • AMG declustering [Chen et al, DAC03, Chan et al ICCAD03] • Multiple V cycle with distance based reclustering [Chan et al, ICCAD03] (Figure: V-cycles across Level 1, Level 2, and Level 3, with steps labeled C, I, and C+I)

  9. Outline • Chapter 1. Introduction • Chapter 2. Optimality and scalability study of existing placement algorithms • Chapter 3. Routability driven multilevel global placement and white space allocation • Motivation and previous work • Routability-driven multilevel placement • Experiment results • Conclusions and future work • Chapter 4. A robust legalization scheme for mixed-size placement • Chapter 5. Applications of mixed-size placement legalization • Chapter 6. “Global” localized preprocessing for detailed placement • Chapter 7. Heterogeneous placement for FPGAs • Chapter 8. Conclusions and future works

  10. Motivation • mPL does not consider routing congestion • Aggressive HPWL minimization != routability • Routability-driven placement • Routability modeling • Routability optimization

  11. Previous Work -- Routability Modeling • Topology-free methods • Dragon [Yang et al., TCAD03] • Sparse [Hu et al., ICCAD02] • BonnPlace [Brenner & Rohe, ISPD02] • Topology-based methods • [Mayrhofer & Lauther, ICCAD90] • mPG [Chang et al., ISPD02]

  12. Previous Work -- Routability Optimization • Cell weighting • Cell inflation based on congestion • Constructive and iterative methods • Dragon [Yang et al, TCAD03] • BonnPlace [Brenner & Rohe, ISPD02] • Net weighting • Translate into bin weights and optimize weighted wirelength • Iterative methods • Sparse [Hu & Sadowska, ICCAD02] • mPG [Chang et al, ISPD02]

  13. Routability-Driven Multilevel Placement • Global placement • Congestion estimation by a fast LZ router • Congestion-driven cell re-placement based on weighted wirelength • Hierarchical top-down white space allocation • Geometric-based slicing tree • Congestion estimation on tree • Cutline adjustment

  14. mPL-R Congestion Estimation with LZ Router • Use LZ-Router [Chang et al., ISPD02] for fast congestion analysis on each level • Binary search on V-stem (or H-stem) • Initialize left region and right region to cover bounding box • Repeat • Query wire usage on both regions • Select region with less congestion (Figure: VHV and HVH route shapes; a left region and a right region, one less congested and one more congested)
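The binary search on the V-stem can be sketched as follows. This is a simplified illustration rather than the LZ-Router itself: the `usage` grid, the use of average usage as the congestion measure, and the function names are all assumptions made for the example.

```python
def region_usage(usage, col_lo, col_hi, row_lo, row_hi):
    """Average wire usage over the bins in an inclusive column/row range."""
    bins = [usage[r][c]
            for r in range(row_lo, row_hi + 1)
            for c in range(col_lo, col_hi + 1)]
    return sum(bins) / len(bins)

def find_v_stem(usage, col_lo, col_hi, row_lo, row_hi):
    """Binary search for the V-stem column inside a net's bounding box.

    The box is split into a left and a right region; the less congested
    half is kept until a single column remains.
    """
    while col_lo < col_hi:
        mid = (col_lo + col_hi) // 2
        left = region_usage(usage, col_lo, mid, row_lo, row_hi)
        right = region_usage(usage, mid + 1, col_hi, row_lo, row_hi)
        if left <= right:
            col_hi = mid          # keep the left region
        else:
            col_lo = mid + 1      # keep the right region
    return col_lo

# Hypothetical 2 x 4 usage map; the bounding box covers the whole grid.
usage = [[5, 5, 1, 4],
         [5, 5, 0, 4]]
print(find_v_stem(usage, 0, 3, 0, 1))  # 2
```

Each step halves the candidate range, so the number of region queries grows only logarithmically with the width of the bounding box. The same search applies to an H-stem with rows and columns swapped.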

  15. mPL-R Congestion-Driven Re-Placement • Pick cells whose incident nets cross congested regions to move • Start from the optimal location for HPWL • Search adjacent bins within certain window • Choose the bin based on weighted WL (Figure: bins weighted 2.0, 0.5, and 1.2; candidate locations with WLc = 15.5 and WLc = 9.2)

  16. White Space Allocation -- Slicing Tree Construction • Recursively bipartition chip region from top to bottom. • Group cells into children nodes according to location relative to cutline. • Estimate congestion on leaf nodes. Congestion on other nodes can be computed from bottom to top. (Figure: chip regions A to H and the corresponding slicing tree with root and leaves A to H; each node records cut direction, cut location, node area, and congestion)
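The bottom-up step can be sketched on a small tree. The `Node` class and `aggregate` function are hypothetical names. The leaf values reuse the cell area/congestion labels shown on the cutline adjustment slides, paired so that their sums match the labeled parents (62/19 with 54/9, and 58/34 with 66/26); that pairing is inferred from the numbers, not stated in the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    area: float = 0.0                 # total cell area under this node
    congestion: float = 0.0           # estimated congestion under this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def aggregate(node):
    """Fill in area and congestion of internal nodes from the leaves upward."""
    if node.left is None and node.right is None:
        return node.area, node.congestion      # leaf: values are given
    left_area, left_cong = aggregate(node.left)
    right_area, right_cong = aggregate(node.right)
    node.area = left_area + right_area
    node.congestion = left_cong + right_cong
    return node.area, node.congestion

root = Node(
    left=Node(left=Node(62, 19), right=Node(54, 9)),
    right=Node(left=Node(58, 34), right=Node(66, 26)),
)
print(aggregate(root))  # (240, 88)
```

Summing the children reproduces the labels 116/28, 124/60, and 240/88 that appear in the slides' tree.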

  17. White Space Allocation – Cutline Adjustment • Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion. • Example (labels are cell area/congestion): root 240/88, left child 116/28, right child 124/60 • Assuming chip area of root = 300 • Total WS area = 300 – 240 = 60 • WS area for left child = 60*28/(28+60) = 19.1 • WS area for right child = 60 – 19.1 = 40.9 • Chip area for left child = 116+19.1 = 135.1 • Chip area for right child = 124+40.9 = 164.9 (Figure: regions A to H before and after the cutline moves)
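The arithmetic on this slide can be reproduced with a short function. The name `split_white_space` and the tuple layout `(cell_area, congestion)` are illustrative, and the even split used when both children report zero congestion is an assumption the slides do not cover.

```python
def split_white_space(chip_area, left, right):
    """Divide a node's chip area between its two children.

    left and right are (cell_area, congestion) pairs. White space is shared
    in proportion to congestion; returns (left_chip_area, right_chip_area).
    """
    total_ws = chip_area - (left[0] + right[0])
    total_congestion = left[1] + right[1]
    # Assumed fallback: split evenly when neither child is congested.
    share = left[1] / total_congestion if total_congestion > 0 else 0.5
    left_ws = total_ws * share
    return left[0] + left_ws, right[0] + (total_ws - left_ws)

left_area, right_area = split_white_space(300, (116, 28), (124, 60))
print(round(left_area, 1), round(right_area, 1))  # 135.1 164.9
```

Applying the same function to each child in turn, with the child's new chip area as `chip_area`, carries the adjustment from the root down to the leaves.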

  18. White Space Allocation – Cutline Adjustment • Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion. (Figure: regions A to H and the slicing tree labeled with cell area/congestion: root 240/88; second level 116/28, 124/60; third level 62/19, 58/34, 54/9, 66/26)

  19. White Space Allocation – Cutline Adjustment • Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion. (Figure: regions A to H and the slicing tree labeled with cell area/congestion: root 240/88; second level 116/28, 124/60)

  20. Experiment Setup • 16 IBM version 2 examples • 5% to 15% white space • Three state-of-the-art routability-driven placers • Dragon-fd 3.01 [Yang et al, TCAD03] • Simulated annealing with bin swapping • Two-step white space allocation • Capo 10.0 [Roy et al, ISPD06] • Fast Steiner tree approximation • Congestion-based cutline shifting • Fengshui 5.1 [Agnihotri et al, ISPD05] • Recursive bisection • Similar white space allocation method incorporated • Magma router for evaluation

  21. Routability-Driven Placement Tools Comparison • mPL-R+WSA is the only flow whose placements all route successfully • mPL-R+WSA produces the shortest wirelength

  22. Routability Optimization Techniques Comparison • mPL • Latest pure WL-driven version • No consideration of routing congestion • mPL-R • mPL-I • Cell inflation + dummy density assignment • Highest quality in ISPD06 contest [Nam ISPD06] • Density target set as utilization • mPL+WSA • mPL-R+WSA

  23. Routability Optimization Techniques Comparison • mPL-I with heuristic penalty term does not perform very well • Both mPL-R and WSA improve routability significantly • Combined workflow gives the highest completion rate

  24. Outline • Chapter 1. Introduction • Chapter 2. Optimality and scalability study of existing placement algorithms • Chapter 3. Routability driven multilevel global placement and white space allocation • Chapter 4. A robust legalization scheme for mixed-size placement • Chapter 5. Applications of mixed-size placement legalization • Enhancement for macro legalization algorithm • Additional experiment results • Chapter 6. “Global” localized preprocessing for detailed placement • Chapter 7. Heterogeneous placement for FPGAs • Chapter 8. Conclusions and future works

  25. Enhancement for Macro Legalization • Constraint graph reduction • Original constraint graph • One edge for each pair of macros • O(n^2) edges in total • Reduced constraint graph • Edge inserted only when the constraint is not already implied by transitivity • Significant reduction of memory consumption (Figure: macros A, C, and B)
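One way to picture the reduction is as a transitive reduction of the constraint DAG: an edge is kept only when no longer path already enforces the same ordering. The sketch below assumes the constraints are given as `(u, v)` pairs meaning "u must precede v" and that they form an acyclic graph; the function name and the A, C, B ordering are illustrative, and the thesis may build the reduced graph incrementally rather than by pruning a full one.

```python
def reduce_constraint_graph(edges):
    """Return the edges of a DAG that are not implied by a longer path."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    def implied(u, v):
        # Depth-first search from u's other successors; reaching v means
        # a path of length two or more already enforces u before v.
        stack = [w for w in succ[u] if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            if w == v:
                return True
            for x in succ[w]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return False

    return sorted((u, v) for u, v in set(edges) if not implied(u, v))

# Macros A, C, B in left-to-right order: the pairwise graph has three edges.
full = [("A", "C"), ("C", "B"), ("A", "B")]
print(reduce_constraint_graph(full))  # [('A', 'C'), ('C', 'B')]
```

For n macros in a single row the pairwise graph has n(n-1)/2 edges while the reduced one has n-1, which is the kind of memory saving the slide refers to.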

  26. Experiment Result with ICCAD04-MS • 84% reduction of constraint edges • No degradation of solution quality

  27. Enhancement for Macro Legalization • Used in ISPD 2006 placement contest (Figure labels: fij, x, Hij)

  28. ISPD05 Examples • Bigger problem size • Suitable to test scalability

  29. Scalability Comparison on ISPD05 -- Global Placements by APlace • XDP produces 1% longer WL, but is 10X faster

  30. Scalability Comparison on ISPD05 -- Global Placements by mPL • XDP can be 10X faster with comparable quality

  31. Impact of Gradual Macro Legalization – ISPD05 • 12% WL reduction possible when macros are movable

  32. Outline • Chapter 1. Introduction • Chapter 2. Optimality and scalability study of existing placement algorithms • Chapter 3. Routability driven multilevel global placement and white space allocation • Chapter 4. A robust legalization scheme for mixed-size placement • Chapter 5. Applications of mixed-size placement legalization • Chapter 6. “Global” localized preprocessing for detailed placement • Chapter 7. Heterogeneous placement for FPGAs • Motivation and previous works • Multilevel heterogeneous placement – mPL-H • Experiment results • Conclusions and future work • Chapter 8. Conclusions and future works

  33. Motivation • Popularity of FPGAs • Ease of use • Low cost for small to medium production • Modern FPGA placement imposes heterogeneous constraints • Memory blocks of different capacities, DSP blocks • Each block should only be placed on sites of the same type

  34. Example FPGA Chip (Figure taken from Altera Stratix Handbook)

  35. Previous Works -- Academia • Simulated annealing • VPR [Betz & Rose, FPL97, Marquardt et al, FPGA00] • PATH [Kong, ICCAD02] • SPCD [Chen & Cong, FPL04, FPGA05] • Partitioning • PPFF [Maidee et al, DAC03] • Graph embedding • CAPRI [Gopalakrishnan et al, DAC06] • Multilevel • Ultrafast-VPR [Sankar & Rose, FPGA99] • mPG-ms [Cong & Yuan, ASPDAC03] • None of them handles heterogeneous constraints

  36. Previous Works -- Industry • Quartus II by Altera Corporation • Stratix, Stratix II, etc. • ISE by Xilinx Corporation • Virtex II, Virtex II Pro, etc. • Both handle heterogeneous constraints • Only for proprietary chip architectures • Algorithms and techniques not publicly documented

  37. Multilevel Heterogeneous Placement – mPL-H • Based on multilevel generalized force directed placement • Multi-layered placement to handle heterogeneous placement • Filler cells to enhance quality and stability • Gradual carry chain legalization

  38. Limitations of mPL for Heterogeneous Placement • Does not consider heterogeneous constraints • Any block can be placed anywhere • Requires density to be uniform everywhere • Penalizes wirelength for low utilization

  39. mPL-H -- Global Placement (I) • Multiple layers, one layer per resource type • DSP layer • M-RAM layer • LAB layer • M4K layer • M512 layer • Forbidden regions blocked by obstacles • Uniform wirelength computation (Figure: DSP, M-RAM, and LAB layers)

  40. mPL-H -- Global Placement (II) • Filler cell • Occupy the residual capacity • Transform inequality into equality • Density computed independently on each layer • Granularity may not be fine enough
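The filler-cell idea can be illustrated with a small calculation: on each layer, fillers absorb whatever capacity the real blocks leave unused, so the constraint "used area <= capacity" becomes "used area + filler area = capacity". The function name, the dictionary layout, and the numbers below are made up for the example.

```python
def filler_area(capacity, used):
    """Filler area needed on each layer so that used + filler == capacity."""
    fillers = {}
    for layer, cap in capacity.items():
        residual = cap - used.get(layer, 0)
        if residual < 0:
            raise ValueError(f"layer {layer} is over capacity")
        fillers[layer] = residual
    return fillers

# Hypothetical capacities and usage, counted in sites per layer.
capacity = {"DSP": 10, "M4K": 40, "LAB": 100}
used = {"DSP": 4, "M4K": 40, "LAB": 85}
print(filler_area(capacity, used))  # {'DSP': 6, 'M4K': 0, 'LAB': 15}
```

Because each layer is handled independently, this matches the slide's point that density is computed separately per layer.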

  41. mPL-H -- Legalization (I) • DSP and memory blocks • Domains do not overlap • Legalized independently • Uniform size for the same type • Linear assignment, O(n^3) • Cost as distance (Figure: cells matched to sites)
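Assigning same-type blocks to sites at minimum total distance is a linear assignment problem. The sketch below uses a textbook O(n^3) Hungarian method with Manhattan distance as the cost; the slides do not say which solver or distance metric mPL-H uses, so both are assumptions, as are the function names and coordinates.

```python
def linear_assignment(cost):
    """Hungarian method; cost[i][j] for n rows and m >= n columns.

    Returns (total_cost, assignment) where assignment[i] is the column of row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    match = [0] * (m + 1)        # match[j] = row assigned to column j (1-based)
    way = [0] * (m + 1)
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:                # augment along the alternating path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if match[j]:
            assignment[match[j] - 1] = j - 1
    return sum(cost[i][assignment[i]] for i in range(n)), assignment

def legalize_blocks(blocks, sites):
    """Assign each block to a distinct site, minimizing total Manhattan distance."""
    cost = [[abs(bx - sx) + abs(by - sy) for sx, sy in sites] for bx, by in blocks]
    return linear_assignment(cost)

print(legalize_blocks([(1, 1), (4, 4)], [(0, 0), (5, 5), (1, 2)]))  # (3, [2, 1])
```

Since the domains of different block types do not overlap, one such assignment can be solved per type, as the slide notes.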

  42. mPL-H -- Legalization (II) • Carry chains • Vary in length • Legalized in descending order of length • Partition each column into segments of the same size • Assign chains of the same length using linear assignment

  43. mPL-H -- Legalization (III) • Column-wise rearrangement of carry chains • P(n,m) is the minimum perturbation of assigning chains (v1,…,vn) to sites (s1,s2,…,sm) • P(1,j) = d(1,j), where d(1,j) is the perturbation of assigning v1 to site sj • P(i,j) = min{P(i-1, j-hi), P(i, j-1)} • Can be solved more efficiently for some special cases • Quadratic distance • No site constraint
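A dynamic program in the spirit of this recurrence can be sketched as follows. The slide's formula shows no cost term in the first branch, so the sketch assumes the intended form is P(i,j) = min{P(i-1, j-hi) + d(i,j), P(i, j-1)}; it also assumes that chains keep their order within the column, that hi is the height of chain vi in sites, and that d(i,j) is the distance between the chain's new and desired top site. All of these are interpretations, and the names are hypothetical.

```python
def min_perturbation(heights, desired_top, num_sites):
    """Minimum total displacement when packing ordered chains into one column.

    heights[i] is the number of sites chain i occupies, desired_top[i] the
    site index where its top would ideally sit. P[i][j] is the best cost of
    placing the first i chains inside the first j sites.
    """
    INF = float("inf")
    n = len(heights)
    P = [[INF] * (num_sites + 1) for _ in range(n + 1)]
    for j in range(num_sites + 1):
        P[0][j] = 0                      # no chains placed, no perturbation
    for i in range(1, n + 1):
        h = heights[i - 1]
        for j in range(1, num_sites + 1):
            best = P[i][j - 1]           # leave site j-1 unused by chain i
            if j >= h and P[i - 1][j - h] < INF:
                top = j - h              # chain i occupies sites top .. j-1
                d = abs(top - desired_top[i - 1])
                best = min(best, P[i - 1][j - h] + d)
            P[i][j] = best
    return P[n][num_sites]

# Two chains of heights 2 and 1 that both want to start at site 1.
print(min_perturbation([2, 1], [1, 1], 4))  # 2
```

The table has (n+1) x (m+1) entries and each is filled in constant time, so the sketch runs in O(nm) for n chains and m sites.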

  44. Experiment Setting (Flow diagram: Verilog netlist → Quartus_map → clustered .vqm netlist → Quartus_fitter or mPL-H → .qsf placement → Quartus_router; other inputs shown: architecture description XML, chip type)

  45. QUIP Suite

  46. Wirelength Comparison • mPL-H is 3% better in HPWL, and 2% better in routed WL than Quartus II v5.0

  47. Runtime Comparison • mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large

  48. Optimality Study of mPL-H • PEKO-H construction • Populate all sites with corresponding resource type • Generate each net with optimal wirelength • Extract the netlist in the end

  49. Experiment Results with PEKO-H • mPL-H produces HPWL 34% longer than the optimum

  50. Displacement of PEKO-H13
