
VLSI Placement (I)

VLSI Placement (I). Prof. Lei He, http://eda.ee.ucla.edu. Thanks to Chris Chu, Jason Cong, Paul Villarrubia and David Pan for contributions to slides.

Presentation Transcript


  1. VLSI Placement (I) • Prof. Lei He • http://eda.ee.ucla.edu • Thanks to Chris Chu, Jason Cong, Paul Villarrubia and David Pan for contributions to slides

  2. Problem formulation • Input: • Blocks (standard cells and macros) B1, ... , Bn • Shapes and pin positions for each block Bi • Nets N1, ... , Nm • Output: • Coordinates (xi, yi) for each block Bi • The total wire length is minimized • The area of the resulting layout is minimized, or a fixed die is given • Other considerations: timing, routability, clock, buffering and interaction with physical synthesis

  3. Placement Can Make a Difference • MCNC benchmark circuit e64 (contains 230 4-LUTs), placed onto an FPGA [Figure panels: Random Initial Placement; Final Placement; After Detailed Routing]

  4. Importance of Placement • Placement is a fundamental problem for physical design • Glue of the physical synthesis • Becomes very active again in recent years: • 9 new academic placers for WL min. since 2000 • Many other publications to handle timing, routability, etc. • Reasons: • Serious interconnect issues (delay, routability, noise) in deep-submicron design • Placement determines interconnect to the first order • Need placement information even in early design stages (e.g., logic synthesis) • Need to have a good placement solution • Placement problem becomes significantly larger • Cong et al. [ASPDAC-03, ISPD-03, ICCAD-03] point out that existing placers are far from optimal, not scalable, and not stable

  5. Placement Topic in Context • Note that this course is on selected research topics, so we cover placement at a fairly high level, with some technical details • More fundamentals of placement will be covered in detail in a core physical design course such as CS258F • Or a new core physical design course may be offered next year as EE298 (depending on faculty recruiting)

  6. ISPD-2003 Benchmarking for Large-Scale Placement and Beyond S. N. Adya, M. C. Yildiz, I. L. Markov, P. G. Villarrubia, P. N. Parakh, P. H. Madden

  7. Design Types • ASICs • Lots of fixed I/Os, few macros, millions of standard cells • Placement densities: 40-80% (IBM) • Flat and hierarchical designs • SoCs • Many more macro blocks, cores • Datapaths + control logic • Can have very low placement densities: < 20% • Microprocessor (μP) Random Logic Macros (RLM) • Hierarchical partitions are placement instances (5-30K) • High placement densities: 80%-98% (low whitespace) • Many fixed I/Os, relatively few standard cells • Recall “Partitioning w/ Terminals”: DAC '99, ISPD '99, ASPDAC '00

  8. Requirements for Placers • Must handle 4-10M cells, 1000s of macros • 64 bits + near-linear asymptotic complexity • Scalable/compact design database (OpenAccess) • Accept fixed ports/pads/pins + fixed cells • Place macros, esp. with variable aspect ratios • Non-trivial heights and widths (e.g., height = 2 rows) • Honor targets and limits for net length • Respect floorplan constraints • Handle a wide range of placement densities (from < 25% to 100% occupied), ICCAD '02

  9. Placement Footprints: [Figure panels: Standard Cell; Data Path; IP - Floorplanning]

  10. Placement Footprints: [Figure panels: Reserved areas; Mixed Data Path & sea of gates. Figure labels: Core, Control, IO]

  11. Placement Footprints: [Figure panels: Perimeter IO; Area IO]

  12. Unconstrained Placement

  13. Floorplanned Placement

  14. VLSI Global Placement Examples [Figure panels: bad placement; good placement]

  15. Major Placement Techniques • Simulated Annealing • Timberwolf package [JSSC-85, DAC-86] • Dragon [ICCAD-00] • Partitioning-Based Placement • Capo [DAC-00] • Fengshui [DAC-2001] • Analytical Placement • Gordian [TCAD-91] • Kraftwerk [DAC-98] • FastPlace [ISPD-04] • Hall’s Quadratic Placement • Genetic Algorithm

  16. Outline • Wire length driven placement • Main methods • Simulated Annealing • Gate-Array: Timberwolf package • Standard-Cell: Timberwolf package, Dragon • Partition-based methods • Analytical methods • Timing and congestion consideration • Newer trends

  17. Simulated Annealing Based Placement (I) • “The TimberWolf Placement and Routing Package”, Sechen, Sangiovanni-Vincentelli; IEEE Journal of Solid-State Circuits, vol. SC-20, no. 2 (1985), 510-522 • “TimberWolf3.2: A New Standard Cell Placement and Global Routing Package”, Sechen, Sangiovanni-Vincentelli; 23rd DAC, 1986, 432-439 • TimberWolf • Stage 1 • Modules are moved between different rows as well as within the same row • Module overlaps are allowed • When the temperature is reduced below a certain value, stage 2 begins • Stage 2 • Remove overlaps • Annealing process continues, but only interchanges adjacent modules within the same row

  18. Solution Space • All possible arrangements of modules into rows, possibly with overlaps [Figure: rows of modules with overlaps marked]

  19. Neighboring Solutions • Three types of moves: • M1: Displace a module to a new location • M2: Interchange two modules • M3: Change the orientation of a module [Figure: a module with corners labeled 1-4, reflected about its axes of reflection]

  20. Move Selection • TimberWolf first tries to select a move between M1 and M2 • Prob(M1) = 4/5 • Prob(M2) = 1/5 • If a move of type M1 is chosen (for a certain module) and it is rejected, then a move of type M3 (for the same module) will be chosen with probability 1/10 • Restrictions on: • How far a module can be displaced • What pairs of modules can be interchanged • (M1: Displacement, M2: Interchange, M3: Reflection)
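
The move-selection probabilities on this slide can be sketched in a few lines of Python. This is only an illustration of the stated probabilities (4/5, 1/5, and 1/10 after a rejected M1); the function names and the use of a seeded `random.Random` are assumptions made for the example, not part of TimberWolf.

```python
import random

def select_move(rng):
    """Pick the primary move type: M1 with probability 4/5, M2 with 1/5."""
    return "M1" if rng.random() < 4 / 5 else "M2"

def fallback_after_rejected_m1(rng):
    """After a rejected M1, try M3 on the same module with probability 1/10."""
    return "M3" if rng.random() < 1 / 10 else None

# Hypothetical usage: tally primary moves over many draws.
rng = random.Random(0)
counts = {"M1": 0, "M2": 0}
for _ in range(10_000):
    counts[select_move(rng)] += 1
```

Over many draws, roughly four out of five primary moves should be displacements (M1).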

  21. Move Restriction • Range Limiter: moves are restricted to a rectangular window R • At the beginning, R is very large, big enough to contain the whole chip • Window size shrinks slowly as the temperature decreases. In fact, height and width of R ∝ log(T) • Stage 2 begins when the window size is so small that no inter-row module interchanges are possible
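
A minimal sketch of a range limiter whose window dimensions scale with log(T) is given below. The slide only states the proportionality, so the normalization (the window equals the whole chip at a starting temperature `T0 > 1` and is clipped at zero) and all names are assumptions for illustration.

```python
import math

def window_dims(T, T0, chip_w, chip_h):
    """Window width/height proportional to log(T); full chip at T = T0.

    Assumed normalization: requires T0 > 1 and T > 0; the fraction is
    clipped to [0, 1] so the window never exceeds the chip.
    """
    frac = max(0.0, min(1.0, math.log(T) / math.log(T0)))
    return chip_w * frac, chip_h * frac

def in_window(cx, cy, x, y, w, h):
    """True if target (x, y) lies inside a w-by-h window centered at (cx, cy)."""
    return abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2

# Hypothetical usage: a 300 x 120 chip cooled from T0 = 1000.
full_w, full_h = window_dims(1000.0, 1000.0, 300.0, 120.0)
small_w, small_h = window_dims(10.0, 1000.0, 300.0, 120.0)
```

Once the window height drops below the row pitch, no inter-row interchange can pass the `in_window` check, which corresponds to the stage 2 condition on the slide.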

  22. Cost Function • $\Psi = C_1 + C_2 + C_3$ • $C_1 = \sum_i (\alpha_i w_i + \beta_i h_i)$, where $w_i$ and $h_i$ are the width and height of the bounding box of net $i$ • $\alpha_i$, $\beta_i$ are horizontal and vertical weights, respectively • $\alpha_i = 1$, $\beta_i = 1$ gives 1/2 • perimeter of the bounding box • Critical nets: increase both $\alpha_i$ and $\beta_i$ • Double metal technology: over-the-cell routing is possible, so fewer feed-through cells are needed • Vertical wirings are “cheaper” than horizontal wirings, so use smaller vertical weights, i.e. $\beta_i < \alpha_i$
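
The wire-length term C1 can be sketched as a weighted bounding-box sum. In this illustration, `alpha` and `beta` stand for the horizontal and vertical weights of a net, and the representation of a net as a list of `(x, y)` pin coordinates is an assumption made for the example.

```python
def net_cost(pins, alpha=1.0, beta=1.0):
    """Weighted bounding-box cost of one net: alpha * width + beta * height."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return alpha * width + beta * height

def c1(nets, weights=None):
    """Sum of net costs; weights[i] is an optional (alpha, beta) pair for net i."""
    total = 0.0
    for i, pins in enumerate(nets):
        alpha, beta = weights[i] if weights else (1.0, 1.0)
        total += net_cost(pins, alpha, beta)
    return total

# Hypothetical usage: one three-pin net with a 3 x 4 bounding box.
example_net = [(0, 0), (3, 4), (1, 2)]
half_perimeter = net_cost(example_net)         # unit weights
cheap_vertical = net_cost(example_net, 2.0, 0.5)  # vertical wiring weighted less
```

With unit weights the result is the half-perimeter of the bounding box; lowering `beta` relative to `alpha` models cheaper vertical wiring.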

  23. Cost Function (Cont'd) • $C_2$: Penalty function for module overlaps: $C_2 = \sum_{i \neq j} \left( O(i,j) + \alpha \right)^2$ • $O(i,j)$ = amount of overlap in the X-dimension between modules $i$ and $j$ • $\alpha$: offset parameter to ensure $C_2 \to 0$ when $T \to 0$ • $C_3$: Penalty function that controls the row lengths: $C_3 = \beta \sum_r \left| l(r) - d(r) \right|$ • Desired row length = $d(r)$ • $l(r)$ = sum of the widths of the modules in row $r$
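
The two penalty terms can be sketched as follows. Several details are assumptions for the sake of a runnable example: modules are `(x_left, width)` pairs within one row, each unordered pair is counted once, only pairs that actually overlap contribute to C2, and C3 uses the absolute deviation from the desired row length.

```python
from itertools import combinations

def x_overlap(a, b):
    """Overlap length in X of two (x_left, width) modules; 0 if disjoint."""
    lo = max(a[0], b[0])
    hi = min(a[0] + a[1], b[0] + b[1])
    return max(0.0, hi - lo)

def c2(row_modules, alpha):
    """Overlap penalty: sum of (O(i, j) + alpha)^2 over overlapping pairs."""
    total = 0.0
    for a, b in combinations(row_modules, 2):
        overlap = x_overlap(a, b)
        if overlap > 0:  # assumption: non-overlapping pairs add nothing
            total += (overlap + alpha) ** 2
    return total

def c3(rows, desired_lengths, beta):
    """Row-length penalty: beta * sum over rows of |l(r) - d(r)|."""
    return beta * sum(
        abs(sum(width for _, width in row) - d)
        for row, d in zip(rows, desired_lengths)
    )

# Hypothetical usage: the first two modules overlap by 1 unit.
row = [(0, 4), (3, 4), (10, 2)]
overlap_penalty = c2(row, alpha=1.0)
length_penalty = c3([row], [12], beta=2.0)
```

Here the row holds modules of total width 10 against a desired length of 12, so the length penalty is 2 · |10 − 12|.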

  24. Annealing Schedule • $T_k = r(k) \cdot T_{k-1}$, k = 1, 2, 3, … • r(k) increases from 0.8 to a maximum value of 0.94 and then decreases to 0.1 • At each temperature, a total of K•n attempts are made • n = number of modules • K = user-specified constant
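
The schedule above can be sketched as a short loop. The slide gives only the endpoints of r(k) (0.8, 0.94, 0.1), so the piecewise-linear shape, the fixed number of temperature steps, and the position of the peak are all assumptions chosen for illustration.

```python
def cooling_rate(k, n_steps, peak_frac=0.5):
    """Hypothetical piecewise-linear r(k): 0.8 up to 0.94, then down to 0.1."""
    peak = max(1, int(n_steps * peak_frac))
    if k <= peak:
        return 0.8 + (0.94 - 0.8) * (k - 1) / max(1, peak - 1)
    return 0.94 + (0.1 - 0.94) * (k - peak) / max(1, n_steps - peak)

def temperatures(T0, n_steps):
    """Apply T_k = r(k) * T_{k-1} for k = 1..n_steps, starting from T0."""
    T = T0
    schedule = []
    for k in range(1, n_steps + 1):
        T = cooling_rate(k, n_steps) * T
        schedule.append(T)
    return schedule

def attempts_per_temperature(K, n_modules):
    """Number of move attempts at each temperature: K * n."""
    return K * n_modules

# Hypothetical usage: ten temperature steps starting from T0 = 1000.
temps = temperatures(1000.0, 10)
```

Because r(k) stays below 1, the temperature decreases at every step; it falls slowly while r(k) is near 0.94 and quickly once r(k) approaches 0.1.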

  25. Dragon2000: Standard-Cell Placement Tool for Large Industry Circuits M. Wang, X. Yang, and M. Sarrafzadeh, ICCAD-2000 pages 260-263

  26. Main Idea • Simulated annealing based • 1.9x faster than iTools 1.4.0 (commercial version of TimberWolf) • Comparable wirelength to iTools (i.e., very good) • Performs better for larger circuits • Still very slow compared with other approaches • Also shown to have good routability • Top-down hierarchical approach • hMetis to recursively quadrisect into 4^h bins at level h • Swapping of bins at each level by SA to minimize WL • Terminates when each bin contains < 7 cells • Then swap single cells locally to further minimize WL • Detailed placement is done by a greedy algorithm
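
As a rough worked example of the top-down hierarchy, the sketch below estimates how many quadrisection levels are needed before bins hold fewer than 7 cells. It assumes cells are spread evenly over the 4^h bins, which real partitions only approximate; the function name is hypothetical.

```python
def levels_needed(n_cells, max_cells_per_bin=7):
    """Smallest level h such that n_cells / 4**h < max_cells_per_bin.

    Assumes an even spread of cells over the 4**h bins at level h.
    """
    h = 0
    while n_cells >= max_cells_per_bin * 4 ** h:
        h += 1
    return h

# Hypothetical usage: a one-million-cell design.
levels = levels_needed(1_000_000)
bins_at_last_level = 4 ** levels
```

Under this assumption a one-million-cell design needs 9 levels, since 4^8 bins would still average more than 7 cells each.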

  27. Outline • Wire length driven placement • Main methods • Simulated Annealing • Gate-Array: Timberwolf package • Standard-Cell: Timberwolf package, Grover, Dragon • Partition-based methods • Analytical methods • Timing and congestion consideration • Newer trends

  28. Partition based methods • Partitioning methods • FM • Multilevel techniques, e.g., hMetis • Two academic open source placement tools • Capo (UCLA/UCSD/Michigan): multilevel FM • Feng-shui (SUNY Binghamton): use hMetis • Pros and cons • Fast • Not stable

  29. Partitioning-based Approach • Try to group closely connected modules together. • Repeatedly divide a circuit into subcircuits such that the cut value is minimized. • Also, the placement region is partitioned (by cutlines) accordingly. • Each subcircuit is assigned to one partition of the placement region. • Note: Also called the min-cut placement approach.

  30. An Example [Figure panels: Circuit; Placement. Figure label: Cutline]

  31. Variations • There are many variations in the partitioning-based approach. They are different in: • The objective function used. • The partitioning algorithm used. • The selection of cutlines.

  32. Partitioning: • Objective: Given a set of interconnected blocks, produce two sets of equal size such that the number of nets connecting the two sets is minimized.

  33. FM Partitioning: [Figure panels: Initial Random Placement; After Cut 1; After Cut 2]
      list_of_sets = entire_chip;
      while (any_set_has_2_or_more_objects(list_of_sets)) {
          for_each_set_in(list_of_sets) {
              partition_it();
          }
          /* each time through this loop the number of sets in the list doubles. */
      }
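
The slide's loop can be written as a small runnable Python sketch. The partitioner used here, `split_in_half`, is a stand-in that simply halves a list; it is an assumption for illustration and does not perform FM. A real flow would plug an FM or multilevel partitioner into the `partition` argument.

```python
def split_in_half(objs):
    """Placeholder partitioner: halve the list (NOT FM, illustration only)."""
    mid = len(objs) // 2
    return objs[:mid], objs[mid:]

def recursive_bisection(objs, partition=split_in_half):
    """Repeatedly bisect every set with 2 or more objects.

    Returns the final list of sets and the number of passes over the list.
    """
    sets = [list(objs)]
    passes = 0
    while any(len(s) >= 2 for s in sets):
        next_sets = []
        for s in sets:
            if len(s) >= 2:
                a, b = partition(s)
                next_sets += [a, b]
            else:
                next_sets.append(s)  # single objects are already placed
        sets = next_sets
        passes += 1
    return sets, passes

# Hypothetical usage: eight objects need three passes (8 -> 4 -> 2 -> 1).
final_sets, n_passes = recursive_bisection(list("abcdefgh"))
```

With balanced splits the number of sets doubles on every pass, so the loop runs about log2(n) times.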

  34. FM Partitioning: • Moves are made based on object gain. • Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition • Each object is assigned a gain • Objects are put into a sorted gain list • The object with the highest gain from the smaller of the two sides is selected and moved • The moved object is "locked" • Gains of "touched" objects are recomputed • Gain lists are resorted [Figure: example netlist with each object annotated with its gain]
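
Object gain as defined on this slide can be illustrated by brute force: move one object to the other side and count how many fewer nets are cut. The netlist, the cell names, and the convention that a positive gain means fewer cut nets are assumptions for the example. Practical FM implementations update gains incrementally with bucket lists rather than recounting the cut for every object.

```python
def cut_size(nets, side):
    """Number of nets that have cells on both sides of the partition."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)

def gains(nets, side):
    """Gain of each cell: reduction in cut size if that cell alone is moved."""
    base = cut_size(nets, side)
    result = {}
    for cell in side:
        flipped = dict(side)
        flipped[cell] = 1 - side[cell]  # move the cell to the other side
        result[cell] = base - cut_size(nets, flipped)
    return result

# Hypothetical four-cell netlist: a and b on side 0, c and d on side 1.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
initial_cut = cut_size(nets, side)
cell_gains = gains(nets, side)
```

In this example moving `c` removes one cut net (gain +1), while moving `d` adds one (gain -1).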

  35-48. FM Partitioning: [Figure sequence: the example netlist after successive moves, with the object gains updated at each step]

  49. Breuer’s Cutline Selection Schemes M.A. Breuer, “Min-Cut Placement”, J. Design Automation and Fault-Tolerant Computing 1(4):343-382, Oct. 1977. M.A. Breuer, “A Class of Min-Cut Placement Algorithms”, DAC 1977, pages 284-290.

  50. # of Nets Across a Cutline • For any cutline c, let v(c) be the total number of nets cut by c. • v(c) gives a lower bound on the number of tracks along cutline c. • Useful in standard-cell or gate-array layout. [Figure: a cutline c with v(c) = 2]
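
Counting v(c) for a placed design can be sketched as below. The example assumes a vertical cutline at `x = cut_x` and nets given as lists of `(x, y)` pin coordinates; a net counts as cut when it has pins strictly on both sides of the line. These representations are assumptions for illustration.

```python
def nets_cut(nets, cut_x):
    """v(c) for a vertical cutline x = cut_x.

    A net is cut if its leftmost pin is left of the line and its
    rightmost pin is right of the line.
    """
    count = 0
    for pins in nets:
        xs = [x for x, _ in pins]
        if min(xs) < cut_x < max(xs):
            count += 1
    return count

# Hypothetical usage: three nets, two of which cross x = 3.
placed_nets = [
    [(0, 0), (4, 1)],
    [(1, 2), (2, 3)],
    [(2, 0), (5, 5), (6, 1)],
]
v_c = nets_cut(placed_nets, cut_x=3)
```

Each cut net needs at least one wire crossing the cutline, which is why v(c) bounds the number of tracks from below.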
