420 likes | 502 Views
Topics in “Middle-Down” Placement. John Lillis Karthik Kalpat Devang Jariwala Sung-Woo Hur UIC. Background/Overview. Current Target: Wire-Length Driven Cell Placement Origins: Mongrel (ICCAD 2000) Key Concepts “Middle-Down” flow Relaxation Based Local Search
E N D
Topics in “Middle-Down” Placement John Lillis Karthik Kalpat Devang Jariwala Sung-Woo Hur UIC
Background/Overview • Current Target: Wire-Length Driven Cell Placement • Origins: Mongrel (ICCAD 2000) • Key Concepts • “Middle-Down” flow • Relaxation Based Local Search • Novel Legalization Procedures • Detailed Optimization Techniques: Interleaving • Extensions (works in progress) • Multi-Level Strategies (both in target grid and source netlist) • Placement-Based Strategies: Re-clustering, tie-breaking • Two Row Interleaving
Themes • Constraint Relaxation (flow/qplace for global placement) • Constraint Imposition (e.g., interleaving) • Search • Clustering • Scalability • Escape From Local Optima • Inducing Uniformity
Wire-Length Driven Placement • Given: netlist of a circuit, # rows • Placement Problem: |E| minlen(ei) over all legal placement P i = 1
Detailed Placement Optimal Interleaving... Global Placement Relaxation-Based Local Search Mongrel Middle-Down Flow (2-level)
Why Middle-Down? • Better correlation with final result • min-cut bets a lot on 1st cut • other objectives (e.g., timing) are meaningless in strict top-down • Simplification of constraints (vs. flat) • Enables “Search” More than most Analytical Methods • Moves (later) More Global Than Traditional Search
Key Concepts In Mongrel • Global Placement • RBLS: Relaxation Based Local Search (Network flow based) • Max-Gain Based Legalization • Detailed Placement • Dynamic Clustering • Optimal Interleaving
Global Placement • Given n and m, place cells on an n X m grid Total cell area • Bin capacity C = (n x m) • Bin Capacity Constraint:Each bin B(i,j) has lower and upper bound to allow cells to be placed (1 - ) C Sv (1 + ) C vB(I,j) • Legal Global Placement: Given , every bin satisfies bin capacity constraint and row size constraint • Row Size Constraint: Every row size should be within specified tolerance
Philosophy of Constraint Relaxation - Relax constraints (e.g., allow cell overlap) - Solve relaxed formulation optimally (and efficiently) - Resolve infeasibility (cell overlap) heuristically
k c P initial placement k k - 1 while k > 0 P New placement Improved k c Overview of Relaxation Based Local Search Input : M and c Extract set of M mobile nodes Find optimal relaxed location for mobile nodes P Legalized placement Local optimization using partitioning technique Yes No
c c c a a b b b e e e g g g d d d h h f,h a,f f e e e e e e e e e e e e e e e e e e e e e 1 3 4 7 4 6 6 7 7 6 4 2 2 3 3 1 2 1 5 5 5 Projected sub-circuits Sub-circuit Concept
Relaxed Placement Solvers • Linear Program --> Network Flow Solution • Measures Half-Perimeter • Combinatorial Algorithm • Quadratic Placement • Can Only Appx. Half-Perimeter • Quadratic Objective: • Measures “wrong” thing • Can give more favorable cell distribution • Current Implementation Faster than Flow • Preliminary Data: Qplace better for early in optimization (large sub-ckt size); Flow better for small sub-ckt size.
Philosophy of Optimal Interleaving • Target: Intra-row optimization • Desirable Properties: 1. Efficiency 2. Large solution space
A1 A1 B1 B1 B1 A2 A2 B2 B2 B2 B3 B3 B3 A3 A3 B4 B4 B4 Optimal Interleaving Window A1 A2 A3 Partitioning Interleaving
Two Row Interleaving • Idea: Enable More Flexibility by Considering Adjacent Rows Simultaneously. • Cells Constrained to be in Original Rows, but Enables Cells to “Move Together” (less influence of “anchors” other rows. • Formulation: • A1, B1: Subsequences of row 1 (relative order obeyed) • A2, B2: Subsequences of row 2 (relative order obeyed) • Simultaneously find optimal interleavings of (A1,B1) and (A2,B2) s.t. WL is min. • Note: only x-wire length changes.
Two Row Interleaving – Uniform Case Window a R1 R2 b • - All cells identical width (e.g., FPGAs) • Cells (e.g., a, b) can move together • Solution: Dynamic Programming
DP Idea for Uniform Case • For Separability, “scan line” must move L-to-R • Decomposition: • k: length of prefix in # cells • i1 <= k: index into subseq A1 • [ j1 = k - i1 : implicit index into B1 ] • i2 <= k: index into subseq A2 • [ j2 = k – i2: implicit index into B2 ] • S[k,i1,i2] = WL of opt. Interleaving of: • A1[1..i1], B1[1..j1] and • A2[1..i2], B2[1..j2]
Four ways to form S[k,i1,i2] k-1 { A1[i1] S[k-1,i1-1,i2-1] A2[i2] { A1[i1] S[k-1,i1-1,i2] B2[j2] cells { B1[j1] S[k-1,i1,i2-1] A2[i2] { B1[j1] S[k-1,i1,i2] B2[j2]
Uniform Case Discussion • Fill in table using recurrence • Take best of four cases by incremental wire-length calculation • Because parameterized by k, we ensure separability required for DP • Run-Time: O(N^3) for constant degree (more complex expression if pins considered) • Next: how about variable width standard cells?
Standard Cell Case 80 cells 20 cells Issues: - Decomposition by k not generally possible - Need to deal with “jaggedness” of right boundary
Dealing with “Jaggedness” • WLOG: assume left boundary identical for both rows (minor modifications can accommodate general case). • Let w(R,i,j) be the total cell width of cells A[1..i] and B[1..j] in row R (for given window). • Feasibility requirement for quadruple (i1,j1,i2,j2): • | w(R1,i1,j1) – w(R2,i2,j2) | < D • Idea: allow a certain degree of noise with “near separability”). • Nevertheless, near-uniformity desirable (large D implies large error; wide-variance: few or no feasible configurations).
Clustering for Uniformity • Clustering is performed within the chosen windows for uniformity. • A method similar to the dynamic clustering method that finds the cluster edges at points of minimum density is used for this purpose.
Graph Interpretation • [ i1 j1 i2 j2] 1001 0001 0110 0010 0000 n1 m1 n2 m2 0100 0101 1000 1010 Notes: Eight candidate edges; only feasible nodes considered.
Discussion • Feasible space much smaller than n^4. • Use hash table for this sparseness • Practical runtime: O(n^3) • With clusters, a generalization is employed: cluster flipping (pin locations preserved).
Routed WL for FPGAs VPR+2ROW 98.2 96.2 98.5 95.5 99.1 99.9 VPR 100 100 100 100 100 100 CKT ALU4 APEX2 APEX4 BIGKEY CLMA DES VPR+1ROW 99.7 99.6 99.4 99.6 99.3 99.9
Idea: use a cell’s force vector to guide decisions Primary Application: Tie-Breaking Current Applications: Interleaving Max-Gain Path (Dynamic Re-clustering) Details: Weighted Graph Model used (as in Qplace) Weighted Center of Gravity of Neighbors Calculated Notion of Force Vector
Convert to detailed placement Perform interleaving & output final result Middle-Down With Clustering Cluster the net list & place Global Placement on “coarse” grids De-cluster this circuit, increase the grid size & place NO Is the circuit completely flattened? YES
Clustering Cycle Unclustering Clustering Placement of clusters
Grid Schedule • Mongrel: 2-level • Newer Work: Multi-Level • One Data Point (ckt: IBM11): • A: 25x25 -> 40x40 -> 60x40 -> 128x40 • B: 3x3 ->10x10 -> 20x20 -> 40x40 -> 128x40 • Schedule B consistently better
Bottom-up clustering • Pair-wise merging • Selection of Node1 • Criteria: size • Selection of Node2 • Criteria: bandwidth • Note: Target Size is always maintained
Reclustering (Work In Progress) • Idea: Let Placer Help in Finding Candidate Pairs to Cluster • Run Global Placer at Certain Granularity • Partially Flatten • Run Global Placer on Flattened Netlist • Cells (clusters) in same bin now are good candidates to join. • Additionally: Use “force vector” as similarity metric.
1 2 3 4 5 6 7 8 9 a b c d e f g h i Displacement Graph • Regarding displacement graph, placement means linear placement. • P[ i ]: cell placed in location i • P-1[ v ]: location of cell v • Assume each cell has a unit size for linear placement Location: Placement: P[4] = d P-1[g] = 7 E.g.:
1 2 3 4 5 6 7 8 9 a a b b c c d d e e f f g g h h i i 2 3 3 3 -5 -5 1 -4 2 Displacement Graph (cont.) • Suppose Pg is a relatively good reference placement and Pb a mediocre one. • Displacement of Pb for each location i w.r.t. Pg is D[ i ] = Pg-1[Pb[ i ]] - i • An Example of Displacement location : Pg: Pb: D[i] :
Displacement Graph (cont.) 6000 "plateaus" displacement 4000 2000 0 -2000 14050 14125 14200 location • Test circuit: s38584 • length(reference placement) = 1247152 • length(mediocre placement) = 2698928
120 290 Plateaus-Density Correlation displacement density (of mediocre placement) Test circuit: s38584 location
600000 Wire length 550000 500000 500000 450000 200 700 1200 Time(sec) Dynamic Clustering (Linear Placement) • Test circuit: s15850 • : Flat placement • : Clustered placement
A Few Data Points • MCNC avql (1 cell-height row spacing): • Middle-Down: 5.86 • Capo: 6.207 • Feng Shui: 6.291 • DRAGON: 5.25 • IBM11 • Middle-Down: 4.21 • DRAGON: 4.08
Flexibility and Potential of Framework Timing Optimization? Modularity of Optimizers Strong results for small to medium sized circuits Near competitive results for larger circuits Scalability Unclear Speed Ongoing Work: More Careful Study of Initial Clustering Speedup Techniques (esp. gain graph) Implementation of 2-row interleaving Dynamic Re-clustering Strengths and Weaknesses