Algorithmic Techniques in VLSI CAD Shantanu Dutt University of Illinois at Chicago
Algorithms in VLSI CAD
• Divide & Conquer (D&C) [e.g., merge-sort, partition-driven placement]
• Reduce & Conquer (R&C) [e.g., multilevel techniques such as the hMetis partitioner]
• Dynamic programming [e.g., matrix multiplication, optimal buffer insertion]
• Mathematical programming: linear, quadratic, 0/1 integer programming [e.g., floorplanning, global placement]
Algorithms in VLSI CAD (contd)
• Search Methods:
  • Depth-first search (DFS): mainly used to find any solution when cost is not an issue [e.g., FPGA detailed routing---cost generally determined at the global routing phase]
  • Breadth-first search (BFS): mainly used to find a soln at min. distance from the root of the search tree [e.g., maze routing when cost = dist. from root]
  • Best-first search (BeFS): used to find optimal solutions w/ any cost function; can be used when a provable lower bound of the cost can be determined for each branching choice from the "current partial soln node" [e.g., TSP, global routing]
• Iterative Improvement: deterministic, stochastic
Divide & Conquer
• Determine if the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner:
  • D&C approach: See if the problem can be "broken up" into 2 or more smaller subproblems that can be "stitched up" to give a soln to the parent prob.
  • Do this recursively for each large subprob until subprobs are small enough for an "easy" solution technique (could be exhaustive!)
  • If the subprobs are of a similar kind to the root prob, then the breakup and stitching will also be similar (see the merge-sort sketch below)
• [Figure: root problem A divided into subprobs A1, A2, and recursively into A1,1, A1,2, A2,1, A2,2; recursion continues until subprob size is s.t. TT-based design is doable; the solns to A1 and A2 are then stitched up to form the complete soln to A]
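As a minimal illustration of the D&C template (divide, recurse on each subprob, stitch up), here is a merge-sort sketch in Python; the function names are ours, not from the slides:

```python
def merge_sort(a):
    """D&C: divide the list, conquer each half recursively,
    then stitch the two sorted halves back together."""
    if len(a) <= 1:              # small enough for an "easy" (trivial) solution
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])   # solve subproblem A1
    right = merge_sort(a[mid:])  # solve subproblem A2
    return merge(left, right)    # stitch-up function

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```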
Reduce-&-Conquer
• Reduce problem size (coarsening), solve the reduced problem, then uncoarsen and refine the solution at each level (see the skeleton below)
• Examples: Multilevel graph/hypergraph partitioning (e.g., hMetis), multilevel routing
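A hedged sketch of the multilevel (reduce-&-conquer) control flow; coarsen, initial_solve, and refine are placeholders for problem-specific routines (e.g., matching-based coarsening and FM refinement in a partitioner), not actual hMetis APIs:

```python
def multilevel_solve(problem, coarsen, initial_solve, refine, min_size):
    """Generic reduce-&-conquer: recursively coarsen until the problem
    is small, solve it directly, then project & refine back up."""
    if problem.size() <= min_size:
        return initial_solve(problem)         # small enough: solve directly
    coarse, project = coarsen(problem)        # reduce problem size
    coarse_soln = multilevel_solve(coarse, coarsen, initial_solve,
                                   refine, min_size)
    soln = project(coarse_soln)               # uncoarsen: map soln back up
    return refine(problem, soln)              # local improvement at this level
```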
Dynamic Programming (DP)
• [Figure: root problem A over subproblems A1, A2, A3, A4, combined by a stitch-up function]
• Stitch-up function f: optimal soln of root = f(optimal solns of subproblems) = f(opt(A1), opt(A2), opt(A3), opt(A4))
• The above property means that every time we optimally solve a subproblem, we can store/record the soln and reuse it every time it is part of the formulation of a higher-level problem
Dynamic Programming (contd)
• Matrix multiplication example: Most computationally efficient way to perform the series of matrix mults M = M1 x M2 x ... x Mn, where Mi is of size ri x ci w/ ri = c(i-1) for i > 1
• DP formulation: opt_seq(M) = (by defn) opt_seq(M(1,n))
  = min over i = 1 to n-1 of { opt_seq(M(1,i)) + opt_seq(M(i+1,n)) + r1*ci*cn }
  (r1*ci*cn is the cost of the final mult of the r1 x ci and ci x cn products)
• Correctness rests on the property that the optimal ways of multiplying M1 x ... x Mi and M(i+1) x ... x Mn will be used in the "min" stitch-up function to determine the optimal soln for M
  • Thus if the optimal soln involves a "cut" at Mr, then opt_seq(M(1,r)) & opt_seq(M(r+1,n)) will be part of opt_seq(M)
• Perform computation bottom-up (smallest sequences first); see the code sketch below
• Complexity: Each subseq M(j,k) appears in the above computation but is solved exactly once (irrespective of how many times it appears)
  • Time to solve M(j,k), 1 <= j <= k <= n, not counting the time to solve its subproblems (accounted for in their own complexities), is l-1 for a subseq of length l = k-j+1 (a min over l-1 different cut options is computed)
  • # of different M(j,k)'s of length l is n-l+1, 2 <= l <= n
  • Total complexity = Sum over l = 2 to n of (n-l+1)(l-1) = O(n^3) (as opposed to, say, O(2^n) using exhaustive search)
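A runnable bottom-up version of the recurrence above (a standard matrix-chain-order DP; variable names are ours):

```python
def matrix_chain_cost(r, c):
    """r[i], c[i]: rows/cols of matrix Mi (0-indexed), w/ r[i] == c[i-1].
    Returns the min # of scalar mults, computed bottom-up over
    subsequence lengths as in the opt_seq recurrence."""
    n = len(r)
    cost = [[0] * n for _ in range(n)]           # cost[j][k] = opt_seq(M(j,k))
    for l in range(2, n + 1):                    # subsequence length
        for j in range(0, n - l + 1):
            k = j + l - 1
            cost[j][k] = min(cost[j][i] + cost[i + 1][k] + r[j] * c[i] * c[k]
                             for i in range(j, k))  # min over l-1 cut points
    return cost[0][n - 1]

# M1: 10x30, M2: 30x5, M3: 5x60 -> best is (M1 x M2) x M3 = 1500 + 3000 = 4500
print(matrix_chain_cost([10, 30, 5], [30, 5, 60]))
```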
A DP Example: Simple Buffer Insertion Problem
• Given: Source and sink locations, sink capacitances and RATs (required arrival times), a buffer type, source delay rules, unit wire resistance and capacitance
• [Figure: source s0 driving a routing tree w/ sinks RAT1-RAT4 and candidate buffer locations]
Courtesy: Chuck Alpert, IBM
Simple Buffer Insertion Problem (contd)
• Find: Buffer locations and a routing tree such that the slack/RAT at the source is maximized
• [Figure: the same net w/ buffers inserted along the tree]
Courtesy: Chuck Alpert, IBM
Slack/RAT Example
• Slack at a sink = RAT - delay; slack/RAT at the source = min over sinks
• Unbuffered: sink w/ RAT = 400, delay = 600 (slack = -200); sink w/ RAT = 500, delay = 400 (slack = +100) => source slack/RAT = -200
• Buffered: sink w/ RAT = 400, delay = 300 (slack = +100); sink w/ RAT = 500, delay = 350 (slack = +150) => source slack/RAT = +100
Courtesy: Chuck Alpert, IBM
Elmore Delay
• [Figure: RC ladder from node A through resistance R1 to node B (cap C1), then through R2 to node C (cap C2)]
• Elmore delay from A to C = R1*(C1 + C2) + R2*C2 (each resistance times the total capacitance downstream of it)
Courtesy: Chuck Alpert, IBM
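A small sketch of the Elmore computation for a path of RC segments (our own helper, assuming each entry pairs a segment resistance w/ the capacitance hanging at its downstream node):

```python
def elmore_delay(segments):
    """segments: list of (R, C) along the path, source to sink.
    Elmore delay = sum over segments of R_i * (total cap downstream of R_i)."""
    delay = 0.0
    downstream = sum(c for _, c in segments)  # all cap is downstream of R1
    for r, c in segments:
        delay += r * downstream               # R_i sees all remaining downstream cap
        downstream -= c                       # this node's cap is upstream of the next R
    return delay

# Ladder from the figure: R1 -> node B (C1), R2 -> node C (C2)
# delay(A->C) = R1*(C1 + C2) + R2*C2
print(elmore_delay([(100.0, 1e-13), (100.0, 2e-13)]))  # 5e-11
```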
DP Example: Van Ginneken Buffer Insertion Algorithm [ISCAS'90]
• Associate each leaf node/sink with two metrics (Ct, Tt)
  • Downstream loading capacitance (Ct) and RAT (Tt)
• DP-based alg propagates potential solutions bottom-up [Van Ginneken, 90]:
  • Add a wire (cap Cw, resistance Rw) to a soln (Cn, Tn): Ct = Cn + Cw, Tt = Tn - Rw*(Cw/2 + Cn) (Elmore delay of the wire)
  • Add a buffer to a soln (Cn, Tn): Ct = buffer input cap, Tt = Tn - dbuf, where dbuf is the buffer's delay driving load Ln = Cn (intrinsic delay + output resistance times Ln)
  • Merge two solutions: For each Zn = (Cn, Tn), Zm = (Cm, Tm) soln vectors in the 2 subtrees, create a soln vector Zt = (Ct, Tt) where Ct = Cn + Cm and Tt = min(Tn, Tm)
Courtesy: UCLA
DP Example (contd)
• Add a wire to each merged solution Zt (same cap & delay change formulation as before)
• Add a buffer to each Zt
• Delete all dominated solutions Zd: Zd = (Cd, Td) is dominated if there exists a Zr = (Cr, Tr) s.t. Cd >= Cr and Td <= Tr (i.e., both metrics are no better)
• The remaining soln vectors are all "optimal" solns for this subtree, and one of them will be part of the optimal solution at the root/driver of the net---this is the DP feature of this algorithm (see the sketch below)
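A hedged Python sketch of these candidate-list operations, w/ solutions as (C, T) tuples; the parameter names are ours, and this simplified version handles a single merge point rather than a full tree:

```python
def add_wire(solns, Rw, Cw):
    """Propagate each (C, T) through a wire: Elmore delay Rw*(Cw/2 + C)."""
    return [(C + Cw, T - Rw * (Cw / 2 + C)) for C, T in solns]

def add_buffer(solns, Cb, db, Rb):
    """Option of inserting a buffer: load becomes Cb, delay db + Rb*C."""
    return solns + [(Cb, T - db - Rb * C) for C, T in solns]

def merge(solns1, solns2):
    """Merge two subtrees: caps add, RAT is the min of the two branches."""
    return [(C1 + C2, min(T1, T2)) for C1, T1 in solns1 for C2, T2 in solns2]

def prune(solns):
    """Keep only non-dominated (C, T): scanning by rising C (best T first
    on ties), a soln survives only if its T strictly beats all lighter loads."""
    best, best_T = [], float("-inf")
    for C, T in sorted(solns, key=lambda s: (s[0], -s[1])):
        if T > best_T:
            best.append((C, T))
            best_T = T
    return best
```

For instance, prune(add_buffer([(30, 250)], Cb=5, db=30, Rb=0)) keeps both (30, 250) and (5, 220), matching the first step of the example on the next slides (the buffer's load-dependent delay is folded into db here).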
Van Ginneken Example
• Start at the sink: candidate soln (20, 400)
• Add a wire (C=10, d=150): (30, 250); also the option of a buffer (C=5, d=30): (5, 220)
• Add another wire (C=15; d=200 for the (30, 250) branch, d=120 for the lighter-load (5, 220) branch): (45, 50) and (20, 100)
• Add a buffer (C=5; d=50 and d=30 resp., since the delay depends on the load driven): (5, 0) and (5, 70)
• Candidate set at this point: (45, 50), (5, 0), (20, 100), (5, 70)
Courtesy: Chuck Alpert, IBM
Van Ginneken Example Cont'd
• Prune dominated solns: (5, 0) is inferior to (5, 70); (45, 50) is inferior to (20, 100)
• Add a wire (C=10) to the survivors (20, 100) and (5, 70): (30, 10) and (15, -10)
• At the driver, pick the solution with the largest slack, then follow the arrows (recorded choices) back down the tree to recover where the buffers go
Courtesy: Chuck Alpert, IBM
Mathematical Programming
• Linear programming (LP): e.g., Obj: Min 2x1 - x2 + x3 w/ constraints x1 + x2 <= a, x1 - x3 <= b --- solvable in polynomial time
• Quadratic programming (QP): e.g., Min x1^2 - x2*x3 w/ linear constraints --- solvable in polynomial (cubic) time w/ equality constraints
• Some vars are integers:
  • Mixed integer linear prog (ILP) --- NP-hard
  • Mixed integer quad. prog (IQP) --- NP-hard
• Some vars are in {0,1}:
  • Mixed 0/1 integer linear prog (0/1 ILP) --- NP-hard
  • Mixed 0/1 integer quad. prog (0/1 IQP) --- NP-hard
0/1 ILP/IQP Examples
• Generally useful for "assignment" problems, where objects {O1, ..., On} are assigned to bins {B1, ..., Bm}
  • 0/1 variable xi,j = 1 if object Oi is assigned to bin Bj
• Min-cut bi-partitioning for graphs G(V,E) can be modeled as a 0/1 IQP (a brute-force check of this formulation is sketched below):
  • xi,1 = 1 => ui in V1, else ui in V2
  • Edge (ui, uj) is in the cutset iff xi,1*(1 - xj,1) + (1 - xi,1)*xj,1 = 1
  • Objective function: Min Sum over (ui,uj) in E of c(i,j)*(xi,1*(1 - xj,1) + (1 - xi,1)*xj,1)
  • Constraint: Sum over ui of w(ui)*xi,1 <= max-size
• [Figure: partitions V1, V2 w/ an edge (ui, uj) in the cutset]
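To make the formulation concrete, here is a tiny brute-force evaluation of the 0/1 IQP objective over all assignments of a 4-node example graph (our own toy data; a real instance would go to an IQP/ILP solver):

```python
from itertools import product

# Toy instance (hypothetical data): edge costs c(i,j) and node weights w(ui)
edges = {(0, 1): 3, (1, 2): 1, (2, 3): 2, (0, 3): 1}
w = [2, 1, 1, 2]
min_size, max_size = 2, 4     # balance constraints on V1's total weight

best = None
for x in product((0, 1), repeat=4):                 # x[i] plays the role of xi,1
    size = sum(w[i] * x[i] for i in range(4))
    if not (min_size <= size <= max_size):
        continue                                     # violates balance constraint
    # quadratic cut objective: edge (ui,uj) contributes c(i,j) iff endpoints differ
    cut = sum(c * (x[i] * (1 - x[j]) + (1 - x[i]) * x[j])
              for (i, j), c in edges.items())
    if best is None or cut < best[0]:
        best = (cut, x)

print(best)   # -> (2, (0, 0, 1, 1)): min-cut cost 2 w/ V1 = {u3, u4}
```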
Search Techniques
[Figure: an example graph w/ nodes A-G, shown alongside its DFS and BFS visit orders]

Algorithm Depth_First_Search
  for each v in V
    v.mark = 0;
  for each v in V
    if v.mark = 0 then
      if G has partial soln nodes then dfs(v);
      else soln_dfs(v);

dfs(v) /* for basic graph visit or for soln finding when nodes are partial solns */
  v.mark = 1;
  for each (v,u) in E
    if (u.mark != 1) then dfs(u);

soln_dfs(v) /* used when nodes are basic elts of the problem and not partial soln nodes */
  v.mark = 1;
  if path to v is a soln, then return(1);
  for each (v,u) in E
    if (u.mark != 1) then
      soln_found = soln_dfs(u);
      if (soln_found = 1) then return(soln_found);
  end for;
  v.mark = 0; /* can visit v again to form another soln on a different path */
  return(0);
Search Techniques---Exhaustive DFS
[Figure: the example graph w/ a DFS visit order]

Algorithm Depth_First_Search
  for each v in V
    v.mark = 0;
  best_cost = infinity;
  optimal_soln_dfs(root);

optimal_soln_dfs(v) /* used when nodes are basic elts of the problem and not partial soln nodes */
begin
  v.mark = 1;
  if path to v is a soln then begin
    if cost < best_cost then begin
      best_soln = soln; best_cost = cost;
    endif
    v.mark = 0;
    return;
  endif
  for each (v,u) in E
    if (u.mark != 1) then optimal_soln_dfs(u);
  end for;
  v.mark = 0; /* can visit v again to form another soln on a different path */
end
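A runnable Python counterpart of optimal_soln_dfs, specialized (as an illustration, not the slides' code) to enumerating TSP tours w/ backtracking; the 4-city instance is our own:

```python
import math

def exhaustive_dfs(graph, start):
    """Backtracking DFS over simple paths; a 'soln' is a tour visiting
    every city and returning to start. Tracks the best cost seen."""
    n = len(graph)
    best = {"cost": math.inf, "tour": None}

    def dfs(v, visited, path, cost):
        if len(path) == n and start in graph[v]:       # path is a soln
            total = cost + graph[v][start]             # close the tour
            if total < best["cost"]:
                best["cost"], best["tour"] = total, path + [start]
            return
        for u, w in graph[v].items():
            if u not in visited:
                visited.add(u)                         # u.mark = 1
                dfs(u, visited, path + [u], cost + w)
                visited.remove(u)                      # u.mark = 0: backtrack

    dfs(start, {start}, [start], 0)
    return best

g = {"A": {"B": 9, "C": 5, "D": 4},
     "B": {"A": 9, "C": 3, "D": 5},
     "C": {"A": 5, "B": 3, "D": 1},
     "D": {"A": 4, "B": 5, "C": 1}}
print(exhaustive_dfs(g, "A"))  # {'cost': 17, 'tour': ['A', 'B', 'C', 'D', 'A']}
```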
Best-First Search
[Figure: a search tree w/ node costs 10; 12, 15, 19; 16, 18; 18, 17---nodes are expanded in cost order (1), (2), (3)]

BeFS(root)
begin
  open = {root}; /* open is the list of generated but not expanded nodes---partial solns */
  best_soln_cost = infinity;
  while open != nullset do
  begin
    curr = first(open);
    if curr is a soln then return(curr) /* curr is an optimal soln */
    else children = Expand_&_est_cost(curr); /* generate all children of curr & estimate their costs---cost(u) should be a lower bound of the cost of the best soln reachable from u */
    for each child in children do
    begin
      if child is a soln then
        delete all nodes w in open s.t. cost(w) >= cost(child);
      endif
      store child in open in increasing order of cost;
    endfor
  endwhile
end /* BeFS */

Expand_&_est_cost(Y)
begin
  children = nullset;
  for each basic elt x of problem "reachable" from Y & can be part of current partial soln Y do
  begin
    if x not in Y and if feasible then
      child = Y U {x};
      path_cost(child) = path_cost(Y) + cost(Y, x); /* cost(Y,x) is the cost of reaching x from Y */
      est(child) = lower bound on cost of best soln reachable from child;
      cost(child) = path_cost(child) + est(child);
      children = children U {child};
  endfor
end /* Expand_&_est_cost */
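A compact runnable rendering of BeFS w/ a priority queue (a generic skeleton of the pseudocode above; expand and is_soln are problem-specific callbacks we assume the caller supplies):

```python
import heapq

def befs(root, expand, is_soln):
    """Best-first search: always expand the open node of min cost.
    expand(node) -> iterable of (child, child_cost), where child_cost
    = path cost + a lower-bound estimate, so the first soln popped is optimal."""
    open_list = [(0, 0, root)]             # (cost, tie-breaker, node)
    tie = 0
    while open_list:
        cost, _, curr = heapq.heappop(open_list)
        if is_soln(curr):
            return cost, curr              # optimal by the LB argument below
        for child, child_cost in expand(curr):
            tie += 1                       # tie-breaker avoids comparing nodes
            heapq.heappush(open_list, (child_cost, tie, child))
    return None
```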
Best-First Search (contd)
• Proof of optimality when cost is a LB:
  • The current set of nodes in "open" represents a complete front of generated nodes, i.e., the rest of the nodes in the search space are descendants of "open"
  • Assuming the basic cost (cost of adding an elt to a partial soln to construct another partial soln that is closer to the soln) is non-negative, the cost is monotonic, i.e., cost of child >= cost of parent
  • If the first node curr in "open" is a soln, then cost(curr) <= cost(w) for each w in "open"
  • The cost of any node in the search space not in "open" and not yet generated is >= the cost of its ancestor in "open", and thus >= cost(curr). Thus curr is the optimal (min-cost) soln
Search techs for a TSP example
[Figure: TSP graph on cities A-F w/ edge costs, and the DFS search tree rooted at A; three solution nodes are found, w/ tour costs 27, 31, and 33]
• Exhaustive search using DFS (w/ backtrack) for finding an optimal solution
Search techs for a TSP example (contd)
[Figure: BeFS tree for the same TSP graph; each node's cost = path cost + LB est., e.g., the node for partial tour (A,E,F) has path cost 8 and est = cost of MST({F,A,B,C,D}) = 16; shown costs include 5+15, 8+16, 11+14, 14+9, 21+6, 22+9, 23+8; the optimal soln of cost 27 is found w/o exhausting the tree]
• Lower-bound cost estimate: est = cost of MST({unvisited cities} U {current city} U {start city})
• This is a LB as the structure (spanning tree) is a relaxation of the reqd soln structure (the remaining portion of any tour is a path, hence a spanning tree, over these cities): min_cost(set S) <= min_cost(set S') if S is a superset of S'
• BeFS for finding an optimal solution (an MST-based LB helper is sketched below)
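A sketch of this MST lower bound using Prim's algorithm (our own helper; dist is assumed to be a symmetric cost dict over all city pairs):

```python
def mst_cost(cities, dist):
    """Prim's algorithm: cost of a minimum spanning tree over `cities`.
    dist[(u, v)] == dist[(v, u)] is the edge cost."""
    cities = list(cities)
    in_tree = {cities[0]}
    total = 0.0
    while len(in_tree) < len(cities):
        # cheapest edge from the tree to a city not yet in it
        c, u = min((dist[(t, v)], v) for t in in_tree
                   for v in cities if v not in in_tree)
        total += c
        in_tree.add(u)
    return total

def tsp_lower_bound(path_cost, current, unvisited, start, dist):
    """cost(node) = path cost + MST({unvisited} U {current} U {start}):
    a LB since every tour completion spans exactly these cities."""
    return path_cost + mst_cost(set(unvisited) | {current, start}, dist)
```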
BeFS for 0/1 ILP Solution
• X = {x1, ..., xm} are 0/1 vars; each tree node fixes one more var and solves the LP relaxation of the rest---cost(LP) is a LB on any 0/1 soln in that subtree (see the branch-&-bound sketch below)
• [Branch-&-bound tree:]
  • x2 = 0: solve LP w/ x2 = 0; cost = cost(LP) = C1
  • x2 = 1: solve LP w/ x2 = 1; cost = cost(LP) = C2
  • x2 = 1, x4 = 0: cost = C3;  x2 = 1, x4 = 1: cost = C4
  • x2 = 1, x4 = 1, x5 = 0: cost = C5 (optimal soln);  x2 = 1, x4 = 1, x5 = 1: cost = C6
• Cost relations: C5 < C3 < C1 < C6; C2 < C1; C4 < C3
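A hedged best-first branch-&-bound sketch for a 0/1 ILP, using scipy's LP solver for the relaxations (the branching rule and tolerances are our own simplifications):

```python
import heapq
from scipy.optimize import linprog

def bb_01_ilp(c, A_ub, b_ub):
    """Min c.x s.t. A_ub x <= b_ub, x in {0,1}^m. Each node fixes some
    vars; the LP relaxation of the rest gives the node's cost (a LB)."""
    m = len(c)

    def lp(fixed):                      # LP relaxation w/ some vars fixed
        bounds = [fixed.get(i, (0, 1)) for i in range(m)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res if res.success else None

    root = lp({})
    if root is None:
        return None                     # infeasible
    open_list = [(root.fun, 0, {}, root.x)]
    tie = 0
    while open_list:
        cost, _, fixed, x = heapq.heappop(open_list)
        frac = [i for i in range(m) if i not in fixed
                and 1e-6 < x[i] < 1 - 1e-6]
        if not frac:
            return cost, x              # integral soln at min-cost node: optimal
        i = frac[0]                     # branch on a fractional var
        for val in (0, 1):
            child = {**fixed, i: (val, val)}
            res = lp(child)
            if res is not None:
                tie += 1
                heapq.heappush(open_list, (res.fun, tie, child, res.x))
    return None

# e.g., min -2*x1 - 3*x2 s.t. x1 + x2 <= 1 -> cost -3.0 w/ x = (0, 1)
print(bb_01_ilp([-2, -3], A_ub=[[1, 1]], b_ub=[1]))
```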
Iterative Improvement Techniques
• Deterministic (greedy):
  • Locally/immediately greedy: Make the move that is immediately (locally) best; until (no further impr.) [e.g., FM]
  • Non-locally greedy: Make the move that is best according to some non-immediate (non-local) metric (e.g., probability-based lookahead as in PROP); until (no further impr.)
• Stochastic (non-greedy), sketched below:
  • Make a combination of deterministic greedy moves and probabilistic moves that cause a deterioration (can help to jump out of local minima); until (stopping criteria satisfied)
  • Stopping criteria could be an upper bound on the total # of moves or iterations
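A minimal sketch of the stochastic variant in the style of simulated annealing; the cost model, move generator, and cooling schedule are hypothetical placeholders, not a specific tool's algorithm:

```python
import math, random

def stochastic_improve(soln, cost, random_move,
                       max_moves=10000, T0=10.0, alpha=0.999):
    """Mix greedy and probabilistic moves: always accept improvements;
    accept a deterioration delta w/ prob exp(-delta/T), which shrinks
    as the temperature T cools. Stopping criterion: a bound on # moves."""
    best, best_cost = soln, cost(soln)
    curr, curr_cost, T = soln, best_cost, T0
    for _ in range(max_moves):
        cand = random_move(curr)
        delta = cost(cand) - curr_cost
        if delta <= 0 or random.random() < math.exp(-delta / T):
            curr, curr_cost = cand, curr_cost + delta   # accept the move
            if curr_cost < best_cost:
                best, best_cost = curr, curr_cost       # track best seen
        T *= alpha                                       # cooling schedule
    return best, best_cost
```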