An Analytic Placer for Mixed-Size Placement and Timing-Driven Placement. Andrew B. Kahng and Qinke Wang UCSD CSE Department {abk, qiwang}@cs.ucsd.edu Work partially supported by the MARCO Gigascale Systems Research Center, NSF MIP-9987678 and the Semiconductor Research Corporation.
Motivation • Mixed-size placement • design productivity increasingly requires IP reuse • processing / interface cores, embedded memories, etc. • “boulders and dust” challenge: sizes of placeable objects can vary by factors of 10,000 or more • placement is particularly complex in fixed-die context • Timing-driven placement • more critical with device and interconnect scaling
Our Work • APlace [Kahng/Wang ISPD04]: an analytic placer for wirelength-driven standard-cell placement • [Naylor et al., US Patent 6301693, 2001] • superior wirelength quality compared to Cadence QPlace, Dragon and Capo • strong extensibility: congestion-directed placement, I/O-core co-placement, constraint handling for mixed-signal, etc. • poor scalability: on average 13.2x slower than Capo • This work: extend APlace to address mixed-size placement and timing-driven placement
Outline • APlace Background • Extension to Mixed-Size Placement • Extension to Timing-Driven Placement • Conclusions and Ongoing Work
Outline • APlace Background • Formulations • wirelength minimization • cell spreading = density control • Implementation • Extension to Mixed-Size Placement • Extension to Timing-Driven Placement • Conclusion and Ongoing Work
Wirelength Formulation • Placement objective: HPWL • Smooth approximation [Naylor et al., US Patent 6301693, 2001] • log-sum-exp formula: pick the most dominant terms among pin coordinates • α: smoothing parameter • closer to HPWL when α → 0 • precise • strictly convex • continuously differentiable
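The formulas on this slide did not survive extraction. As a minimal sketch, assuming the usual log-sum-exp form, a net's smooth wirelength can be written as α·(ln Σ exp(x_i/α) + ln Σ exp(−x_i/α)) plus the same two terms in y. The function names and the pin representation below are illustrative, not taken from APlace.

```python
import math

def hpwl(pins):
    """Half-perimeter wirelength of one net; pins is a list of (x, y) tuples."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def _smooth_max(values, alpha):
    """alpha * ln(sum(exp(v / alpha))), shifted by max(values) to avoid overflow."""
    m = max(values)
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))

def lse_wirelength(pins, alpha):
    """Log-sum-exp approximation of HPWL (assumed form, see lead-in)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # smooth max(x) + smooth max(-x) approximates max(x) - min(x); same for y
    return (_smooth_max(xs, alpha) + _smooth_max([-x for x in xs], alpha)
            + _smooth_max(ys, alpha) + _smooth_max([-y for y in ys], alpha))

pins = [(0.0, 0.0), (3.0, 4.0), (1.0, 2.0)]
exact = hpwl(pins)
coarse = lse_wirelength(pins, alpha=1.0)    # loose upper bound
tight = lse_wirelength(pins, alpha=0.01)    # approaches HPWL as alpha shrinks
```

The approximation always lies at or above HPWL and tightens as α decreases, which matches the "closer to HPWL when α → 0" bullet.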
Density Control • Common strategy • divide the placement area into grids • equalize the total cell area in each grid • Penalty of an uneven cell distribution • not smooth or differentiable • difficult to optimize
Cell Potential Function • Bell-shaped cell potential function [Naylor et al., US Patent 6301693, 2001] • Cell c has potential(c, g) with respect to grid g • Cell c at (x, y) has area A • Grid point g = (x', y') • p(d) : bell-shaped function, with $p(d) = 1 - 2d^2/r^2$ for $0 \le d \le r/2$ and $p(d) = 2(r-d)^2/r^2$ for $r/2 \le d \le r$ • r : the radius of cells' potential • C : a proportionality factor, s.t.
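The bell-shaped function can be sketched as below. Two things are assumptions here, because the slide's potential formula and the condition on C are not preserved: the potential is taken as a product of p over the x and y distances, and C is chosen so that a cell's potentials over all grid points sum to its area A. The function names are illustrative.

```python
def bell(d, r):
    """Bell-shaped p(d): 1 - 2d^2/r^2 near the center, 2(r-d)^2/r^2 in the tail."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)
    return 0.0  # no influence beyond the radius

def cell_potentials(x, y, area, grid_points, r):
    """Potential of a cell at (x, y) with respect to each grid point (x', y').

    Assumed form: C * p(|x - x'|) * p(|y - y'|), with C normalizing the
    total potential to the cell area.
    """
    raw = {g: bell(x - g[0], r) * bell(y - g[1], r) for g in grid_points}
    total = sum(raw.values())
    c = area / total if total > 0 else 0.0
    return {g: c * v for g, v in raw.items()}

grid = [(gx, gy) for gx in range(5) for gy in range(5)]
pot = cell_potentials(2.3, 1.7, area=4.0, grid_points=grid, r=2.0)
```

Both branches of p(d) give 1/2 at d = r/2 and have matching slopes there, so the function is continuously differentiable.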
Implementation • Cells are spread by minimizing the smooth density penalty function • APlace combines the above two objectives and optimizes the following function using a Conjugate Gradient optimizer: • Density term drives cell spreading • Wirelength term draws connected components back toward each other
Wirelength vs. Density Objectives • Density weight: fixed • a larger density weight spreads cells out hastily, without good wirelength • Wirelength weight: variable • a larger wirelength weight contracts cells together and prevents them from spreading out • initially set to be large • repeat until all cells are spread out evenly: • execute conjugate-gradient solver until convergence • reduce the wirelength weight by half • Objective:
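The weight schedule above can be sketched as follows. The objective is assumed to be a weighted sum, wirelength weight × wirelength + density weight × density penalty, since the slide's formula is not preserved. The plain gradient-descent routine is only a stand-in for the conjugate-gradient solver, and the two-cell 1-D toy functions are hypothetical, included so the loop runs end to end.

```python
def gradient_descent(f, p, step=0.01, iters=3000, h=1e-6):
    """Stand-in solver (NOT conjugate gradient): descent with a numeric gradient."""
    p = list(p)
    for _ in range(iters):
        grad = []
        for i in range(len(p)):
            hi = p[:i] + [p[i] + h] + p[i + 1:]
            lo = p[:i] + [p[i] - h] + p[i + 1:]
            grad.append((f(hi) - f(lo)) / (2 * h))
        p = [pi - step * gi for pi, gi in zip(p, grad)]
    return p

def spread_cells(positions, wirelength, density, is_spread, solve,
                 density_weight=1.0, wl_weight=16.0, max_rounds=40):
    """Solve, check the spreading, then halve the wirelength weight and repeat."""
    for _ in range(max_rounds):
        def objective(p, w=wl_weight):
            return w * wirelength(p) + density_weight * density(p)
        positions = solve(objective, positions)
        if is_spread(positions):
            break
        wl_weight /= 2.0  # density weight stays fixed
    return positions, wl_weight

# Hypothetical toy: two unit-width cells on a line, connected by one net.
def toy_wirelength(p):
    return (p[1] - p[0]) ** 2

def toy_density(p):
    return max(0.0, 1.0 - abs(p[1] - p[0])) ** 2  # penalize overlap

def toy_is_spread(p):
    return abs(p[1] - p[0]) >= 0.9

final_positions, final_wl_weight = spread_cells(
    [0.0, 0.1], toy_wirelength, toy_density, toy_is_spread, gradient_descent)
```

Starting with a large wirelength weight keeps connected cells together first; each halving lets the density term push them a little further apart.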
Outline • APlace Background • Extension to Mixed-Size Placement • Density control for macros • Legalization • Experimental results • Extension to Timing-Driven Placement • Conclusion and Ongoing Work
Previous Works • Capo flow: a three-stage placement-floorplanning-placement flow that uses Capo [Adya et al., ISPD02, ICCAD03] • mPG-MS: a simulated-annealing-based multi-level placer [Chang et al., ASPDAC03] • Feng Shui: a recursive-bisection-based placement tool using fractional cuts [Khatkhate et al., ISPD04]
Potential Function for Macros (I) • Each module has a potential or influence with respect to nearby grids • APlace seeks to equalize the total module potential at each grid • rm is the radius of module’s potential • Standard-cell placement: rm is a constant r • Mixed-size placement: rm changes according to the module's dimension • A larger block will have potential with respect to more nearby grids
Potential Function for Macros (II) • [Figure: bell-shaped p(d) with branches labeled 1 - a*d^2 and b*(r-d)^2; axis ticks at d = w/2 + r/2 and d = w/2 + r on both sides of the center] • p(d) : potential function • d : distance from module to grid • Radius rm = w/2 + r for a block with width w • Convex curve: d < w/2 + r/2 • Concave curve: w/2 + r/2 < d < w/2 + r • p(d) is smooth at d = w/2 + r/2
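The slide does not give values for a and b. As a sketch under assumptions, they can be derived by taking the outer branch as b·(rm − d)² with rm = w/2 + r, and requiring both branches to agree in value and slope at d = w/2 + r/2. This yields a = 1/((w/2 + r/2)·rm) and b = 2/(r·rm); these are derived here, not quoted from the source.

```python
def macro_potential(d, w, r):
    """Bell-shaped potential for a block of width w (coefficients derived, see lead-in)."""
    d = abs(d)
    inner = w / 2 + r / 2      # junction between the two branches
    outer = w / 2 + r          # radius rm of the module's potential
    a = 1.0 / (inner * outer)  # from value and slope continuity at d = inner
    b = 2.0 / (r * outer)
    if d <= inner:
        return 1.0 - a * d * d
    if d <= outer:
        return b * (outer - d) ** 2
    return 0.0

# A block of width 4 with r = 2 influences grids up to distance 4.
p_junction = macro_potential(3.0, w=4.0, r=2.0)
# With w = 0 the function reduces to the standard-cell p(d).
p_cell = macro_potential(0.5, w=0.0, r=2.0)
```

Setting w = 0 recovers a = b = 2/r², so the standard-cell function is the special case of a zero-width module.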
Legalization • Simplified Tetris algorithm [Hill, US Patent 6370673, 2002] • sort modules based on a linear combination of vertical coordinate and width • search the current nearest available position for each module • Pros and cons • fast • larger blocks are fixed at a position ahead of nearby small cells • best applied when modules are distributed evenly • may fail if the global placement has many overlaps among macros
APlace-MS Results • Ten ISPD02 Mixed-Size Benchmarks (10K-70K cells) • Average wirelength increase after legalization: 6.5% • Detailed placement by Feng Shui: 3.5% avg. WL improvement

          |          APlace-MS           |    detailed placement
circuit   |  WL     WL_l   inc. (%)  CPU |  WL_dp   impr. (%)  CPU
ibm01     |  0.20   0.24     18.5     15 |  0.23      5.7        1
ibm02     |  0.51   0.52      0.7     45 |  0.50      2.5        3
ibm03     |  0.70   0.74      6.2     56 |  0.72      3.5        3
ibm04     |  0.81   0.85      4.8     48 |  0.83      2.8        4
ibm05     |  1.01   1.00     -0.5     15 |  0.98      2.0        5
ibm06     |  0.65   0.71      9.6     76 |  0.68      4.4        5
ibm07     |  1.03   1.09      5.8     98 |  1.05      3.7        8
ibm08     |  1.49   1.50      0.6    128 |  1.46      2.7        8
ibm09     |  1.25   1.45     15.7    113 |  1.38      5.2        9
ibm10     |  2.97   3.07      3.3    206 |  3.00      2.2       11
HPWL Comparison • Average HPWL improvement of APlace-MS over each placer (range in parentheses) • Capo flow [ICCAD03]: 26.0% (11.5% ~ 34.0%) • mPG-MS [ASPDAC03]: 24.7% (9.9% ~ 40.1%) • Feng Shui [ISPD04]: 4.0% (-7.3% ~ 20.0%) • Runtime • Xeon server (2.4GHz CPU, double-threaded) • much slower than Feng Shui
Outline • APlace Background • Extension to Mixed-Size Placement • Extension to Timing-Driven Placement • Slack-derived edge weights • Timing-driven placement flow • Experimental results • Conclusion and Ongoing Work
Timing-Driven Approaches • Path based methods • consider all or a subset of paths directly • maintain an accurate timing view during optimization • complexity is relatively high • Net based methods • transform timing constraints or requirements into either net weight or net length (or delay) constraints
Net Based Methods • Delay budgeting • distribute slacks from the end-points to constituent nets along the path • may severely over-constrain the problem without consideration of physical feasibility • Net weighting • assign weights to nets based on timing criticality • low complexity, strong flexibility and easy implementation • more attractive as circuit sizes increase and timing constraints become more complex
Slack-Derived Edge Weights • Net weighting in TD-APlace • β: timing criticality exponent • slack(π) : the slack of path π • T : longest path delay • Heavy net weights are assigned to: • timing-critical nets, through an exponential function of criticality [Marquardt et al. 2000] • nets included in many critical paths [Kong ICCAD02]
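The weighting formula itself is not preserved in this transcript. As a hedged illustration only, one form consistent with the symbols listed is to give each path a criticality (1 − slack(π)/T)^β and sum it over every path that contains the net. This form, the function name and the input layout are assumptions, not the TD-APlace definition.

```python
def net_weights(paths, longest_delay, beta):
    """Accumulate per-net weights from path slacks.

    paths: list of (nets_on_path, slack) pairs.
    Assumed criticality: (1 - slack / T) ** beta, clamped at 0 for slack > T.
    """
    weights = {}
    for nets, slack in paths:
        criticality = max(0.0, 1.0 - slack / longest_delay) ** beta
        for net in nets:
            # nets shared by many critical paths accumulate more weight
            weights[net] = weights.get(net, 0.0) + criticality
    return weights

paths = [(["n1", "n2"], 0.0),   # zero slack: fully critical path
         (["n2", "n3"], 5.0)]   # half the longest delay in slack
w = net_weights(paths, longest_delay=10.0, beta=2)
```

A larger β widens the gap between critical and non-critical paths, and a net such as n2 that lies on both paths ends up heavier than either path's criticality alone.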
Timing-Driven Placement Flow • Final placement stage • TrialRoute (SoC Encounter v3.2): a fast global and detailed routing • Extract RC • Pearl (SE v5.4): static timing analysis (STA) • Import critical path delays to decide net weights • Minimize weighted WL objective
Timing Results: Indust1 Testcase • Indust1: ~ 7k cells • Xeon 2.4GHz CPU, double-threaded • Minimum cycle time • measures quality of TD placements • initially decreases with criticality exponent • gradually deteriorates as criticality exponent continues to increase Results with varying criticality exponents (β)
Comparison vs. Industry Placers (I) • Two industry placers • QPlace (SE v5.4) • amoebaPlace (SoC Encounter v3.2) • Six industry circuits • 7k ~ 40k cells • two from the ISPD 2001 Circuit Benchmarks • Experimental flow • TD or non-TD placements • WarpRoute (SoC Encounter v3.2) : timing-driven routing • Extract RC • Pearl (SE v5.4): static timing analysis (STA)
Comparison vs. Industry Placers (II) • Comparison to TD-QPlace and TD-amoebaPlace • Final HPWL • TD-QPlace: 7.2% (-1.2% ~ 7.1%) • TD-amoebaPlace: 6.5% (-11.1% ~ 23.2%) • Min Cycle • TD-QPlace: 9.6% (-1.2% ~ 14.8%) • TD-amoebaPlace: 8.5% (-0.8% ~ 28.5%) • APlace: 2% (0.1% ~ 3.8%)
Conclusions • APlace analytic placement framework extended to address mixed-size and timing-driven placement • Mixed-size placement • HPWL outperforms mPG-MS, Feng Shui and the Capo flow respectively by 24.7%, 4.0% and 26.0% on average • Timing-driven placement • Minimum cycle time outperforms that of TD-QPlace and TD-amoebaPlace respectively by 9.6% and 8.5% • Routed WL outperforms that of TD-QPlace and TD-amoebaPlace respectively by 7.2% and 6.5%
Ongoing Work • Scalability issue • APlace currently does not scale to large instances • control scheme for larger circuits • Augmented Lagrangian method for constrained nonlinear optimization • multigrid algorithm • Extension to low power or IR drop directed placement • Extension to 3D or thermal-aware placement
Acknowledgments • We thank Brent Gregory, Will Naylor and Synopsys, Inc. for a research and educational license pertaining to U.S. Patents 6282693, 6662348, 6301693, 6671859 and 6665851.
HPWL Results Comparison • Comparison of our results with the Capo flow, mPG-MS and Feng Shui • Comparison (HPWL) • the Capo flow [ICCAD03]: 26.0% (11.5% ~ 34.0%) • mPG-MS [ASPDAC03]: 24.7% (9.9% ~ 40.1%) • Feng Shui [ISPD04]: 4.0% (-7.3% ~ 20.0%) • Comparison (Running Time) • Xeon server (2.4GHz CPU, double-threaded) • much slower than Feng Shui