Analytical Minimization of Signal Delay in VLSI Placement Andrew B. Kahng and Igor L. Markov UCSD, Univ. of Michigan http://www.eecs.umich.edu/~imarkov IBM technical contact: Paul Villarrubia
Outline • Background: Global Placement for VLSI • wirelength minimization • delay minimization • Contribution • minimization objective • “generic” minimization algorithm: outer loop and inner loop • empirical results • Futures
VLSI Global Placement • Find locations for standard cells • Standard cells placed in rows, without overlap • Minimize wirelength, “routing congestion” • Minimize clock cycle • Key abstractions: • standard cells → rectangular outlines • netlist → weighted hypergraph (signal nets → hyperedges) • signal delay → function of cell locations (interconnect dominates)
A VLSI Global Placement Example • [Figure: a bad placement compared with a good placement]
Netlist Hypergraph and Timing Graph • Two signal nets: 3 pins (light blue), and 4 pins (light green) • Ovals: hyperedges • Red edges: timing graph edges
Top-Down Global Placement • Placement blocks represent cells and layout area • single block at the start, driven by recursive (min-cut) bipartitioning • each pass: number of blocks doubles, size of blocks halves • end case: several cells in a tiny region etc. • Intuition: many cells can operate in parallel. • Partitioning finds “independent” groups of cells
Analytical Global Placement • Find a continuous placement (locations == reals) • Efficient optimizations when nonconvex constraints are relaxed (e.g., cells are allowed to overlap) • Represent multi-pin hyperedges by sets of edges • Minimize total weighted “wirelength” of all edges • Popular objectives (contribution of one edge with weight $w_{12}$ between vertices $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$): • Linear (Manhattan): $WL = w_{12} \, (|x_1 - x_2| + |y_1 - y_2|)$ • Quadratic (“squared”): $WL = w_{12} \, ((x_1 - x_2)^2 + (y_1 - y_2)^2)$ • Constraints: fixed vertices and/or “region constraints”
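The two objectives above can be evaluated for a given placement with a few lines of Python. This is only an illustrative sketch: the names `linear_wirelength`, `quadratic_wirelength`, `pos`, and `edges` are hypothetical, and the example represents a 3-pin net as a clique of three 2-pin edges, which is one possible reading of "sets of edges" and not necessarily the model used in the talk.

```python
def linear_wirelength(edges, pos):
    """Total weighted Manhattan length; edges are (i, j, weight) triples."""
    total = 0.0
    for i, j, w in edges:
        (xi, yi), (xj, yj) = pos[i], pos[j]
        total += w * (abs(xi - xj) + abs(yi - yj))
    return total


def quadratic_wirelength(edges, pos):
    """Total weighted squared Euclidean length over the same edge list."""
    total = 0.0
    for i, j, w in edges:
        (xi, yi), (xj, yj) = pos[i], pos[j]
        total += w * ((xi - xj) ** 2 + (yi - yj) ** 2)
    return total


# Assumed example: one 3-pin net expanded into a clique of 2-pin edges.
pos = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (3.0, 0.0)}
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0)]
```

Both functions are sums over edges, so they stay differentiable (quadratic) or piecewise linear (Manhattan) in the cell coordinates, which is what makes them convenient for analytical placers.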
Analytical Placement Alone is Not Enough • Many cells overlap • Must “spread” the placement • IBM CPlace and XQ • Remove overlap (comp. geometry) • CPlace combines min-cut with analytical techniques
Timing-Driven Placement • Cycle time is determined by the maximum path delay, not the total path delay (!) • max(x,y,...) is not differentiable • framework: pin-based timing graph • Analytical approaches allow cell overlaps • Cell overlaps are resolved later • Main difficulty: cannot enumerate signal paths • Signal paths implicitly defined by device types • signal path sources, sinks == I/O pins and storage elements • Timing constraints also implicitly defined • “actual arrival times” (AATs) at sources • “required arrival times” (RATs) at sinks • source-sink path constraint: path delay ≤ RAT@sink - AAT@source
Implicit Analysis of Path Constraints • Static Timing Analysis (STA) methodology • forward topological traversal in timing graph → AAT@every_pin • similar backward traversal → RAT@every_pin • slack@pin is given by RAT@pin - AAT@pin • negative slacks ↔ violated timing constraints • STA-based and STA-inspired placement methods • slacks → net weights for HPWL minimization • top-down placement to maximize negative slack (Marek-Sadowska/Lin 86) • note: STA requires edge delays (e.g., from placement) • delay budgets • zero-slack (Hauge, Nair and Yoffa 86) • iterative min-max (Shragowitz et al. 90/92) • limit-bumping (Frankle 92)
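The two traversals can be sketched in Python on a small timing graph. This is a minimal illustration under stated assumptions, not the analyzer used in the talk: the function name `static_timing`, the dictionary-based graph encoding, and the defaults (AAT 0 for vertices without predecessors, unbounded RAT for vertices without successors) are all hypothetical choices.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def static_timing(edges, aat_src, rat_sink):
    """edges: {(u, v): delay}; aat_src: {source: AAT}; rat_sink: {sink: RAT}."""
    preds, succs = {}, {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)
        preds.setdefault(u, [])
        succs.setdefault(u, []).append(v)
        succs.setdefault(v, [])
    order = list(TopologicalSorter(preds).static_order())

    # Forward traversal: latest arrival time at every pin.
    aat = {}
    for v in order:
        arrivals = [aat[u] + edges[(u, v)] for u in preds[v]]
        if v in aat_src:
            arrivals.append(aat_src[v])
        aat[v] = max(arrivals, default=0.0)  # assumed default for unconstrained sources

    # Backward traversal: earliest required time at every pin.
    rat = {}
    for v in reversed(order):
        required = [rat[s] - edges[(v, s)] for s in succs[v]]
        if v in rat_sink:
            required.append(rat_sink[v])
        rat[v] = min(required, default=float("inf"))  # assumed default for unconstrained sinks

    slack = {v: rat[v] - aat[v] for v in order}
    return aat, rat, slack
```

Each edge is visited once per traversal, so the cost is linear in the size of the timing graph even though the number of source-sink paths can be exponential.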
Motivations For Novelty • Many promising techniques available • net reweighting • delay budgeting • others • Existing frameworks have weaknesses • speed/scalability • loss or ignorance of input information • delay budgeting algorithms tend to ignore fixed locations, obstacles • optimization of “wrong” global objectives (e.g., average wirelength)
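The Dimensionless Path-Timing Objective • For each path $\pi$, consider every edge $e \in \pi$ • Dimensionless Path-Timing Objective: $\mathrm{DPO} = \max_{\pi} \{ t_\pi / c_\pi \} = \max_{\pi} \{ (\sum_{e \in \pi} d_e) / c_\pi \}$ • Where • $c_\pi$ is the path constraint • $t_\pi$ is the path delay • $d_e = d_{ij}(x_i, y_i, x_j, y_j)$ is the edge delay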
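On a toy timing graph the definition can be checked directly by enumerating source-sink paths. The sketch below does exactly that; it is for illustration only, since path enumeration does not scale to real designs. The function name `dpo_by_enumeration` and the per-(source, sink) `constraint` table are hypothetical.

```python
def dpo_by_enumeration(edges, sources, sinks, constraint):
    """Max over source-sink paths of (path delay / path constraint).

    edges: {(u, v): delay}; constraint: {(source, sink): allowed delay}.
    Assumes an acyclic timing graph small enough to enumerate.
    """
    succs = {}
    for (u, v), d in edges.items():
        succs.setdefault(u, []).append((v, d))

    best = 0.0

    def walk(src, node, delay):
        nonlocal best
        if node in sinks:
            best = max(best, delay / constraint[(src, node)])
        for nxt, d in succs.get(node, []):
            walk(src, nxt, delay + d)

    for s in sources:
        walk(s, s, 0.0)
    return best
```

For example, with edges a→b (2), b→d (3), a→c (1), c→d (1) and a constraint of 4 between a and d, the two paths have delays 5 and 2, so the objective evaluates to 5/4: the constraint is violated by 25%.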
DPO: Properties • $\mathrm{DPO} = \max_{\pi} \{ t_\pi / c_\pi \} = \max_{\pi} \{ (\sum_{e \in \pi} d_e) / c_\pi \}$ • $\mathrm{DPO} \le 1 \Leftrightarrow$ all timing constraints are satisfied • Convex when edge delay models are convex • Min DPO $\Leftrightarrow$ max slack when all $c_\pi$ are equal • Max slack can be reduced to min DPO • add two new vertices: the source and the sink • connect the source to former sources • connect the sink to former sinks • use constant edge delay models
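The equivalence with slack maximization for equal constraints follows in one line. Writing $\pi$ for a path, $t_\pi$ for its delay, and assuming every path has the same constraint $c > 0$ (so that the slack of a path is $c - t_\pi$), the worst slack is an affine, decreasing function of the DPO:

```latex
\min_{\pi} \,(c - t_\pi)
  \;=\; c - \max_{\pi} t_\pi
  \;=\; c \left( 1 - \max_{\pi} \frac{t_\pi}{c} \right)
  \;=\; c \,(1 - \mathrm{DPO})
```

Hence any placement that minimizes the DPO also maximizes the worst slack, and the worst slack is nonnegative exactly when the DPO is at most 1.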
Criticalities: “Multiplicative Slacks” • By analogy with slack, define criticalities • $\kappa_i = \max_{\pi \ni v} \{ t_\pi / c_\pi \}$ for vertex $v = v_i$ • $\kappa_{ij} = \max_{\pi \ni e} \{ t_\pi / c_\pi \}$ for edge $e = e_{ij}$ • Criticalities are multiplicative versions of slack • DPO and criticalities quickly computable • STA + postprocessing • Vertex criticalities → cells on critical paths • can be used by the proposed top-down timing-driven placement flow
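One way to read "STA + postprocessing" is sketched below for the simplest case, where every source-sink path shares a single constraint `c`. That uniform constraint is an assumption made here to keep the example short; the general case with per-path constraints is not covered. The name `edge_criticalities` is hypothetical. For each edge, the longest path through it is the longest delay into its tail, plus the edge delay, plus the longest delay out of its head.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def edge_criticalities(edges, c):
    """edges: {(u, v): delay}; c: one constraint shared by all paths (assumed)."""
    preds, succs = {}, {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)
        preds.setdefault(u, [])
        succs.setdefault(u, []).append(v)
        succs.setdefault(v, [])
    order = list(TopologicalSorter(preds).static_order())

    # Longest delay from any source (vertex without predecessors) to v.
    fwd = {}
    for v in order:
        fwd[v] = max((fwd[u] + edges[(u, v)] for u in preds[v]), default=0.0)

    # Longest delay from v to any sink (vertex without successors).
    bwd = {}
    for v in reversed(order):
        bwd[v] = max((bwd[s] + edges[(v, s)] for s in succs[v]), default=0.0)

    # Postprocessing: longest path through each edge, divided by the constraint.
    return {(u, v): (fwd[u] + d + bwd[v]) / c for (u, v), d in edges.items()}
```

The largest edge criticality equals the DPO, and edges whose criticality attains it lie on a critical path.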
Generic Minimization of DPO • Reduce DPO to a simpler objective: $\max_{ij} w_{ij} d_{ij}$ • maximal weighted edge delay • use “reweighting iterations” • One reweighting iteration • assume a placement • compute edge criticalities $\kappa_{ij} = \max_{\pi \ni e_{ij}} \{ t_\pi / c_\pi \}$ • compute new edge weights $w'_{ij}$ • minimize $\max_{ij} w'_{ij} d_{ij}$ • (New weights: $w'_{ij} = \kappa_{ij} / d_{ij}$, where $\mu = \max_{ij} w_{ij} d_{ij}$)
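The four steps of one iteration can be traced on a deliberately tiny 1-D instance: a single movable cell at coordinate `x`, connected by one edge to each fixed pad, where every edge is its own path with its own constraint and delay equals length. Everything about this setup is an assumption chosen for illustration, including the reading of the weight update as criticality divided by current edge delay, the ternary search used as the inner minimizer, and the names `dpo` and `reweighting_iteration`.

```python
def dpo(x, pads, constraints):
    """Max of (edge delay / constraint); each edge is its own path here."""
    return max(abs(x - p) / c for p, c in zip(pads, constraints))


def reweighting_iteration(x, pads, constraints):
    # Step 1: assume a placement (x must not coincide with a pad).
    delays = [abs(x - p) for p in pads]
    # Step 2: edge criticalities; with one-edge paths this is d / c.
    kappa = [d / c for d, c in zip(delays, constraints)]
    # Step 3: new weights, assumed to be criticality / current delay.
    weights = [k / d for k, d in zip(kappa, delays)]

    # Step 4: minimize the maximal weighted edge delay (convex in x),
    # here with a plain ternary search between the outermost pads.
    def objective(t):
        return max(w * abs(t - p) for w, p in zip(weights, pads))

    lo, hi = min(pads), max(pads)
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0
```

With pads at 0 and 10, constraints 2 and 8, and a starting position of 5, the starting objective is 2.5; one iteration moves the cell to about 2, where both edges meet their constraints exactly.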
Properties of Reweighting • Theorem 1. If $\mu = \max_{ij} w_{ij} d_{ij}$ does not increase at a particular iteration, all timing constraints must be satisfied. • Theorem 2. A re-weighting iteration either decreases DPO, or leaves it unchanged. • Reweighting upper-bounds $d_{ij}$ because $w_{ij} d_{ij} \le \mu$ • can interpret reweighting as delay rebudgeting • Youssef and Shragowitz used $w_{ij} = \kappa_{ij}$ (the edge criticality) in 1990/92 • [interpretation of their iterative MiniMax] • no iterations with placement: ignore fixed pad locations
Optimization of Maximal Edge Delay • Must consider particular edge delay models • popular choices: linear and quadratic • Theorem 3. 2-dim max edge delay can be reduced to 1-dim case with double #vertices • [“Inlined” implementation: no new graph] • Linear: $\max_{km} a_{km} |t_k - t_m|$ • Quadratic: $\max_{km} b_{km} (t_k - t_m)^2$ • Theorem 4. Let $b_{km} = a_{km}^2$ $\Rightarrow$ minimizers coincide • Linear and quadratic WL are numerically equivalent!
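The reason the linear and quadratic minimizers coincide can be written out in one step. Assuming nonnegative coefficients $a_{km} \ge 0$ and $b_{km} = a_{km}^2$, the quadratic objective is simply the square of the linear one:

```latex
\max_{k,m} \; b_{km} \,(t_k - t_m)^2
  \;=\; \max_{k,m} \; \bigl( a_{km} \,|t_k - t_m| \bigr)^2
  \;=\; \Bigl( \max_{k,m} \; a_{km} \,|t_k - t_m| \Bigr)^2
```

The last equality holds because $s \mapsto s^2$ is increasing for $s \ge 0$, so it commutes with the maximum. For the same reason, a placement minimizes one objective exactly when it minimizes the other. Note that this argument relies on the MAX form and does not carry over to sums of edge lengths.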
Top-Down Placement Framework • Top-down placement done in passes • In one pass • split every previously existing block • Cell-to-block assignments • viewed as region constraints • gradually refine, converge to cell locs • Assume we analytically minimized signal delay • have cell locations → can compute edge delays • can perform Static Timing Analysis • know which cells lie on critical paths • Use delay-minimizing cell locs when splitting blocks
Empirical Validation • We combined min-max placement with recursive min-cut bisection (Capo → CapoT) • Implemented minimization of edge delay objectives: • Length as delay • Squared length as delay • Quadratic RC delay • MST-based Elmore delay (using • Evaluated • Internal evaluators (after placement): sanity check • Industry timing analyzer • Compared to an industry placer on 4 test-cases • Won on three test-cases (by slack computed with industry STA)
Conclusions and Ongoing Work • New timing-driven placement framework • can potentially be combined with budgeting or reweighting • expected to be successful enough on its own • leverages min-cut placement • relies on a novel analytical delay minimization • Dimensionless Path-Timing Objective (DPO) • novel global timing objective; generalizes slack optimization • New minimization algorithms • reweighting iteration: reduction to simpler MAX-based objective • MAX-based objective can be minimized very quickly • Ongoing work in the context of timing-driven flows
Future Work • Observation (how the proposed method works) • a classic placement approach is split into stages • a new timing optimization is performed between those stages • most critical wires/gates are found first (traditionally: placement is found first) • Try other types of optimizations during placement • routing of timing-critical nets • better delay estimation • early cross-talk detection? • sizing of timing-critical drivers • buffer insertion for timing-critical nets • early detection of dangerous cross-talk • Faster and cheaper ICs