Learn about Zimmerman's Heuristic Approach for generating a parallel prefix adder of minimum size with depth constraint. Explore the advantages, disadvantages, and dynamic programming involved in constructing the fastest prefix adder under arbitrary input arrival time profiles.
CSE246 Adder – Part II Instructor: Prof. Chung-Kuan Cheng
Zimmerman’s Heuristic Approach • Problem formulation • Given a depth constraint, generate a parallel prefix adder of minimum size • Two-step heuristic • Start with a serial prefix adder • Compress to the fastest prefix structure at the cost of increased size (LSB to MSB, low level to high level) • Expand to reduce size, subject to the depth constraint (MSB to LSB, high level to low level)
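The starting point of the heuristic, a serial prefix adder, can be sketched in a few lines. This is only an illustration of that starting structure, not of the compression and expansion steps themselves. The function names, the zero carry-in, and the choice of p = a XOR b are assumptions made for the sketch.

```python
def prefix_op(hi, lo):
    """Combine (G,P) of a higher group with (G,P) of the adjacent lower group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def serial_prefix_add(a, b, n):
    """Add two n-bit integers with a serial prefix chain (carry-in assumed 0).

    Returns (sum, size, depth), where size and depth count prefix operators.
    """
    bits_a = [(a >> i) & 1 for i in range(n)]
    bits_b = [(b >> i) & 1 for i in range(n)]
    # Bit-level generate and propagate (propagate taken as XOR, an assumption).
    gp = [(x & y, x ^ y) for x, y in zip(bits_a, bits_b)]

    # Serial chain: prefix[i] = gp[i] o prefix[i-1], one operator per bit.
    prefix = [gp[0]]
    for i in range(1, n):
        prefix.append(prefix_op(gp[i], prefix[i - 1]))

    # carries[i] is the carry into bit i.
    carries = [0] + [g for g, _ in prefix]
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i
    total |= carries[n] << n

    size = n - 1   # one prefix operator per bit above the LSB
    depth = n - 1  # all operators sit on a single chain
    return total, size, depth


print(serial_prefix_add(13, 7, 4))
```

The serial structure has the smallest size (n-1 operators) and the largest depth (n-1 levels), which is why the heuristic compresses it first and then expands again under the depth constraint.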
Zimmerman’s Heuristic Approach • Local compression/expansion operation • Up/down shift
Zimmerman’s Heuristic Approach • Advantages • Simple and fast • Depth-size product is optimal in many cases • Handles non-uniform input arrival times • Disadvantage • No guarantee of optimality
Prefix Adder with arbitrary input arrival time profile • Non-uniform input arrival times are represented as real numbers • How to construct the fastest prefix adder under an arbitrary input arrival time profile?
Cont’d • Timing model • All (G,P) generators have the same delay C • Denote the output time of generator (G,P)[i:j] as t[i:j] • If, in the prefix graph, (G,P)[i:j] is generated from (G,P)[i:k] and (G,P)[k-1:j], then t[i:j] = max{t[i:k], t[k-1:j]} + C
Dynamic Programming – The idea • Imagine a full array of partial prefix results • All (G,P) signals of length i are on level i • The rightmost signals are the wanted prefix results • Generate all the (G,P) signals row by row, from lower level to higher level • For each (G,P) signal, find the scheme that leads to the best timing, i.e., for (G,P)[i:j] = (G,P)[i:k] ∘ (G,P)[k-1:j], find the partition point k such that t[i:j] = min over k of {max{t[i:k], t[k-1:j]} + C}
[Figure: triangular array of timing values. Level 1: t[n:n], t[n-1:n-1], …, t[2:2], t[1:1]. Level 2: t[n:n-1], …, t[2:1]. Level 3: t[n:n-2], …, t[3:1]. … Level n-1: t[n:2], t[n-1:1]. Level n: t[n:1].]
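Dynamic Programming • A 5-bit example
[Figure: timing array for the 5-bit example; the rightmost entry of each level is a prefix result]
Level 1: 2 (g4p4), 4 (g3p3), 3 (g2p2), 1 (g1p1), 0 (G0)
Level 2: 6, 6, 5, 3 (GP[1,0])
Level 3: 8, 7, 5 (GP[2,0])
Level 4: 8, 7 (GP[3,0])
Level 5: 8 (GP[4,0])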
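The row-by-row recurrence can be sketched as follows. The function name and the dictionary layout are assumptions made for illustration. The delay C = 2 is not stated on the slide; it is inferred here because it reproduces the numbers of the 5-bit example (for instance, GP[1,0] = max(1, 0) + 2 = 3).

```python
def fastest_prefix_timing(arrival, c):
    """Dynamic program over all (G,P)[i:j] signals.

    arrival[i] is the arrival time of input i (index 0 is the LSB).
    Returns (t, split): t[(i, j)] is the best output time of (G,P)[i:j],
    split[(i, j)] is one partition point k that achieves it.
    """
    n = len(arrival)
    t = {(i, i): arrival[i] for i in range(n)}  # level 1: the inputs
    split = {}
    for length in range(2, n + 1):              # level = signal length
        for j in range(n - length + 1):
            i = j + length - 1
            # Try every partition (G,P)[i:k] o (G,P)[k-1:j], k = j+1 .. i.
            best_k = min(
                range(j + 1, i + 1),
                key=lambda k: max(t[(i, k)], t[(k - 1, j)]),
            )
            t[(i, j)] = max(t[(i, best_k)], t[(best_k - 1, j)]) + c
            split[(i, j)] = best_k
    return t, split


# Arrival times from the 5-bit example, LSB first: G0, g1p1, g2p2, g3p3, g4p4.
arrival = [0, 1, 3, 4, 2]
t, split = fastest_prefix_timing(arrival, c=2)  # C = 2 is an inferred value
for length in range(1, 6):
    row = [t[(j + length - 1, j)] for j in reversed(range(5 - length + 1))]
    print(f"Level {length}: {row}")
```

Each printed row lists the signals of one length from the MSB side to the LSB side, so the last entry of every row is the prefix result ending at bit 0.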
Dynamic Programming • Complexity • For (G,P)[i:j], search (i-j) combinations • Overall O(n^3) • Hints for reducing complexity • For (G,P)[i:j], there might be more than one optimal partition point, but we want just one • At least one optimal partition point of (G,P)[i:j] is bounded by the optimal partition points of (G,P)[i-1:j] and (G,P)[i:j+1]
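The cubic bound follows from counting partition points over all signals. As a short worked derivation, assuming bits are indexed 1 to n and grouping the signals by their distance d = i - j (there are n - d signals with that distance):

```latex
\[
\sum_{1 \le j < i \le n} (i - j)
  \;=\; \sum_{d=1}^{n-1} d\,(n - d)
  \;=\; \frac{(n-1)\,n\,(n+1)}{6}
  \;=\; O(n^{3}).
\]
```

For n = 5 this gives 20 candidate partitions in total (4 + 6 + 6 + 4 over levels 2 to 5).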
Backward Reduction I • Some of the partial prefix results are not used and hence can be removed
[Figure: prefix array with levels 1 to 5, shown in panels (a) and (b)]
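One way to sketch this reduction is to trace back from the wanted prefix results through the chosen partition points and keep only the nodes that are reached. The function names are hypothetical. The arrival times are those of the 5-bit example, and C = 2 is an inferred value. Ties between partition points are broken toward the smallest k, which is an arbitrary choice.

```python
def dp_splits(arrival, c):
    """Compact version of the timing DP; returns one best split per signal."""
    n = len(arrival)
    t = {(i, i): arrival[i] for i in range(n)}
    split = {}
    for length in range(2, n + 1):
        for j in range(n - length + 1):
            i = j + length - 1
            k = min(range(j + 1, i + 1),
                    key=lambda k: max(t[(i, k)], t[(k - 1, j)]))
            split[(i, j)] = k
            t[(i, j)] = max(t[(i, k)], t[(k - 1, j)]) + c
    return split


def used_nodes(split, n):
    """Collect the (G,P)[i:j] nodes needed by the prefix results [i:0]."""
    used = set()
    stack = [(i, 0) for i in range(1, n)]  # wanted results: rightmost column
    while stack:
        i, j = stack.pop()
        if i == j or (i, j) in used:       # inputs need no prefix operator
            continue
        used.add((i, j))
        k = split[(i, j)]
        stack.append((i, k))               # upper operand (G,P)[i:k]
        stack.append((k - 1, j))           # lower operand (G,P)[k-1:j]
    return used


split = dp_splits([0, 1, 3, 4, 2], c=2)
kept = used_nodes(split, 5)
print(sorted(kept), f"{len(kept)} of {len(split)} nodes kept")
```

In this example only 5 of the 10 computed nodes remain after the trace.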
Backward Reduction II • Some nodes may be over-tightened and can be relaxed to reduce area
[Figure: 4-bit example shown in two panels. Input arrival times: 3 (g4p4), 6 (g3p3), 7 (g2p2), 11 (g1p1). Nodes (G,P)[2,1], (G,P)[3,1], (G,P)[4,2], and (G,P)[4,1] are annotated with timing values between 8 and 13, some of them in parentheses.]
A missing detail • Allowing (G,P) signals to overlap increases the search space: (G,P)[i:j] = (G,P)[i:k] ∘ (G,P)[l:j], l ≥ k • However, allowing overlap does not produce better timing
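Function level optimization • Carry Skip Adder • If p3,0 = p3p2p1p0 = 1, then x = cin
[Figure: three adder blocks A2, A1, A0 with inputs a11,8 b11,8, a7,4 b7,4, and a3,0 b3,0; carries c12, c8, c4, and cin; block propagate signals p11,8, p7,4, and p3,0 driving multiplexers with inputs labeled 0 and 1; signal x]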
False Path • A1 <- MUX <- A0 <- cin is a false path • If the carry comes from cin, then the block must have p3p2p1p0 = 1 • Since p3,0 = 1, g3,0 must be 0 • The carry is not generated in A0 • The carry need not propagate through A0; it goes through the MUX
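The false-path argument can be checked exhaustively on a small behavioral model of one 4-bit skip block. This is a sketch, not the circuit from the slide: the function names are hypothetical, and the propagate signal is assumed to be p = a XOR b, which is what makes g3,0 = 0 whenever p3,0 = 1.

```python
from itertools import product


def ripple_block(a, b, cin, width=4):
    """Ripple-carry block.

    Returns (sum, carry_out, group_propagate, group_generate).
    """
    carry = cin
    total = 0
    group_p = 1
    group_g = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        p, g = x ^ y, x & y              # propagate assumed to be XOR
        total |= (p ^ carry) << i
        carry = g | (p & carry)
        group_g = g | (p & group_g)      # generate of bits i..0, ignoring cin
        group_p &= p
    return total, carry, group_p, group_g


def carry_skip_block(a, b, cin, width=4):
    """Same block with a skip multiplexer on the carry output."""
    total, ripple_cout, group_p, group_g = ripple_block(a, b, cin, width)
    cout = cin if group_p else ripple_cout   # MUX selected by group propagate
    return total, cout, group_p, group_g


# Whenever the skip path is selected, the block itself generates no carry,
# so the ripple path through the block is never the one that matters.
for a, b, cin in product(range(16), range(16), range(2)):
    total, cout, group_p, group_g = carry_skip_block(a, b, cin)
    assert (cout << 4) | total == a + b + cin
    if group_p:
        assert group_g == 0
print("carry-skip block verified for all 4-bit inputs")
```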
False Path: Cycles • Cycles of false paths, e.g., 1’s complement number addition • Positive: x • Negative: (2^n - 1) - x • Addition: ((2^n - 1) - x) + ((2^n - 1) - y) = 2^n + (2^n - 1) - (x + y) - 1
[Figure: adder block with inputs A3,0 and B3,0, output S3,0, and ports Cout and Cin]
Example
• 0 + 0 = 0
    11111    0
  + 11111    0
   111110
   111111    0
• -3 - 5 = -8
    11100   -3
  + 11010   -5
   110110
   110111   -8
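The two examples can be reproduced with a small model of 1's complement addition, in which the carry out of the MSB is added back at the LSB (the end-around carry). The helper names and the 5-bit word width are assumptions for the sketch.

```python
def ones_complement_encode(value, n):
    """Encode an integer as an n-bit 1's complement word."""
    mask = (1 << n) - 1
    return value if value >= 0 else mask - (-value)   # (2^n - 1) - x


def ones_complement_decode(word, n):
    """Decode an n-bit 1's complement word (all ones decodes to 0)."""
    mask = (1 << n) - 1
    if word >> (n - 1):          # sign bit set: negative (or negative zero)
        return -(mask - word)
    return word


def ones_complement_add(u, v, n):
    """Add two n-bit 1's complement words with end-around carry."""
    mask = (1 << n) - 1
    raw = u + v
    carry_out = raw >> n         # carry out of the MSB
    return (raw + carry_out) & mask


n = 5
minus3 = ones_complement_encode(-3, n)   # 11100
minus5 = ones_complement_encode(-5, n)   # 11010
result = ones_complement_add(minus3, minus5, n)
print(format(result, "05b"), ones_complement_decode(result, n))
```

The feedback of Cout into Cin is what forms the cycle in the circuit. In this arithmetic model it appears as the single `raw + carry_out` step.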
Multi-Operand Addition • Carry save adder: a (3,2) counter
Example • Each level of (3,2) counters compresses X rows to about (2/3)X rows • The implementation is a tree structure
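The row compression can be sketched with word-level carry-save adders: each (3,2) step replaces three rows by a sum row and a shifted carry row, and the levels repeat until two rows remain. The function names and the way leftover rows are passed to the next level are assumptions made for illustration.

```python
def csa(a, b, c):
    """Word-level (3,2) counter: three rows in, sum row and carry row out."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def reduce_rows(rows):
    """Apply levels of (3,2) counters until at most two rows remain.

    Returns (remaining_rows, number_of_levels).
    """
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])    # rows not in a group of three pass on
        rows = next_rows
        levels += 1
    return rows, levels


operands = [3, 5, 7, 9, 11, 13]
final_rows, levels = reduce_rows(operands)
print(final_rows, levels, sum(final_rows) == sum(operands))
```

Six rows shrink to four, then three, then two, after which a single carry-propagate adder produces the final sum.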
Other Counters • (7,3) counter • (5,3) counter • Design of a (5,3) counter using full adders
[Figure: counter block diagrams with outputs labeled S0, S1, S2 and Ca, Cb, S0]
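A possible full-adder construction with outputs Ca, Cb, and S0 can be checked exhaustively. The wiring below (two chained full adders) is an assumption, since the slide's diagram is not recoverable from the text. The last step, which merges Ca and Cb into binary outputs S2 S1, is also only an illustrative addition.

```python
from itertools import product


def full_adder(x, y, z):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def counter_5_3(x1, x2, x3, x4, x5):
    """Assumed wiring: two chained full adders giving (Ca, Cb, S0).

    Ca and Cb both have weight 2, S0 has weight 1.
    """
    s_mid, ca = full_adder(x1, x2, x3)
    s0, cb = full_adder(s_mid, x4, x5)
    return ca, cb, s0


def to_binary(ca, cb, s0):
    """Merge the two weight-2 carries into binary outputs (S2, S1, S0)."""
    return ca & cb, ca ^ cb, s0


for bits in product((0, 1), repeat=5):
    ca, cb, s0 = counter_5_3(*bits)
    assert s0 + 2 * (ca + cb) == sum(bits)
    s2, s1, s0 = to_binary(ca, cb, s0)
    assert 4 * s2 + 2 * s1 + s0 == sum(bits)
print("(5,3) counter sketch verified for all 32 input patterns")
```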