VLSI Arithmetic
Multiplication

A = a_{n-1} a_{n-2} … a_1 a_0
B = b_{n-1} b_{n-2} … b_1 b_0

e.g.
              1 1 0 0 1 1     (A)
            × 1 0 1 0 0 1     (B)
            -------------
              1 1 0 0 1 1
            0 0 0 0 0 0
          0 0 0 0 0 0
        1 1 0 0 1 1
      0 0 0 0 0 0
    1 1 0 0 1 1

Shift and add: Area O(N), Time O(N log N). Too slow.
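The shift-and-add scheme above can be sketched in a few lines. This is only a sequential software illustration using the slide's operands; the function name `shift_and_add` is made up for this sketch, and Python integers stand in for the hardware adder.

```python
def shift_and_add(a: int, b: int) -> int:
    """Multiply two non-negative integers by adding shifted copies of a."""
    product = 0
    shift = 0
    while b:
        if b & 1:                  # bit number `shift` of B is 1
            product += a << shift  # add A shifted left by `shift` bits
        b >>= 1
        shift += 1
    return product


# Operands from the slide: A = 110011 (51), B = 101001 (41).
print(bin(shift_and_add(0b110011, 0b101001)))
```

Each loop iteration corresponds to one row of the partial-product array, which is why the sequential scheme needs N additions.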
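Partial products

    A = a_{n-1} a_{n-2} … a_1 a_0
    B = b_{n-1} b_{n-2} … b_1 b_0

    a_{n-1} a_{n-2} … a_1 a_0  ×  b_0      = B_0
    a_{n-1} a_{n-2} … a_1 a_0  ×  b_1      = B_1
    a_{n-1} a_{n-2} … a_1 a_0  ×  b_2      = B_2
    …
    a_{n-1} a_{n-2} … a_1 a_0  ×  b_{n-1}  = B_{n-1}
    ---------------------------------------------
    Sum (each B_i shifted left by i bits)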
[Figure: n processors, each 2n bits wide, one per bit b_0, b_1, …, b_{n-1}]

Algorithm
1. Broadcast A to n processors (log n time).
2. Compute B_i (i = 0, …, n-1) simultaneously.
3. Compute the sum (using redundant binary numbers).

Time O(log n), Space O(N^2)
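The three steps can be mimicked in software as a rough sketch. The helper names `partial_products` and `tree_sum` are hypothetical, and ordinary Python addition stands in for the redundant binary adders; the O(log n) time bound assumes each level of the tree takes constant time, which needs the carry-free adder rather than the plain `+` used here.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Step 2: B_i is A shifted left by i bits if b_i = 1, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def tree_sum(values: list[int]) -> tuple[int, int]:
    """Step 3: add the values pairwise and return (total, number of tree levels)."""
    levels = 0
    while len(values) > 1:
        # One level of the adder tree: all pairs are added "in parallel".
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
        levels += 1
    return values[0], levels


total, levels = tree_sum(partial_products(0b110011, 0b101001, 6))
print(total, levels)
```

For n = 6 the list shrinks 6, 3, 2, 1, so the tree has 3 levels, in line with the log n depth claimed on the slide.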
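3M multiplication (Why not 4M multiplication?)

Split each N-bit operand into halves: $X = X_1 \cdot 2^{N/2} + X_0$, $Y = Y_1 \cdot 2^{N/2} + Y_0$.

$P = X \cdot Y$
$U = (X_1 + X_0)(Y_1 + Y_0)$
$V = X_1 \cdot Y_1$
$W = X_0 \cdot Y_0$
$P = V \cdot 2^{N} + (U - V - W) \cdot 2^{N/2} + W$

A = O(N^2), T = O(log N)

[Figure: layout. Inputs X, Y enter an input distribution network; routing feeds three recursive multipliers computing X1 · Y1, (X1 + X0)(Y1 + Y0), and X0 · Y0; routing then leads to an adder (n) and the output network.]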
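A small recursive sketch of the three-multiplication identity above. The function name `mul3` and the base-case threshold of 16 are choices made for this illustration, not part of the slides; the split point `half` plays the role of N/2.

```python
def mul3(x: int, y: int) -> int:
    """Multiply non-negative integers using three half-size products per level."""
    if x < 16 or y < 16:               # small operands: multiply directly
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask       # X = X1 * 2^half + X0
    y1, y0 = y >> half, y & mask       # Y = Y1 * 2^half + Y0
    u = mul3(x1 + x0, y1 + y0)         # U = (X1 + X0)(Y1 + Y0)
    v = mul3(x1, y1)                   # V = X1 * Y1
    w = mul3(x0, y0)                   # W = X0 * Y0
    return (v << (2 * half)) + ((u - v - w) << half) + w


print(mul3(0b110011, 0b101001))
```

The middle term works because U - V - W = X1·Y0 + X0·Y1, which is what the fourth multiplication of a 4M scheme would otherwise be spent on.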
Area, Time, Period Complexity, and Optimality

Version   Lower bound     4M               3M                                2M, LABC
Area      N^2             N^2 log^2 N      N^2                               MN log N
Time      log N           log N            log N                             log N
Period    1               1                1                                 1
AP^2      N^2             N^2 log^2 N      N^2                               MN log N
AP^2T^2   N^2 log^2 N     N^2 log^4 N      N^2 log^2 N                       MN log^3 N
Remark    --              Time-optimal     Time, AP^2, and AP^2T^2 optimal   Time-optimal and regular layout
Redundant Binary Number (Signed Digit)

$A = \sum_{i=0}^{n-1} a_i 2^i$, where $a_i \in \{0, 1, \bar{1}\}$ (1̄ denotes the digit -1).

Example: 1 1̄ 0 1 1 1̄ = 2^5 - 2^4 + 2^2 + 2^1 - 2^0

1. A binary number is a redundant binary number.
2. Note that 1 = 1 1̄ (that is, 2 - 1).
3. Redundant binary number → binary number by subtraction (in log n time).
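Point 3 can be illustrated with a short sketch: collect the positive digits into one binary number, the negative digits into another, and subtract. The function name `sd_to_int` and the list-of-ints encoding (most significant digit first, -1 for the barred digit) are assumptions of this sketch.

```python
def sd_to_int(digits: list[int]) -> int:
    """Convert signed digits (MSB first, each -1, 0, or 1) to an integer."""
    positive = 0   # binary number formed by the +1 digits
    negative = 0   # binary number formed by the -1 digits
    for d in digits:
        positive = (positive << 1) | int(d == 1)
        negative = (negative << 1) | int(d == -1)
    return positive - negative   # a single binary subtraction


# 2^5 - 2^4 + 2^2 + 2^1 - 2^0 from the example above.
print(sd_to_int([1, -1, 0, 1, 1, -1]))
```

In hardware the final subtraction is an ordinary binary subtraction, which is where the log n conversion time comes from.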
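Example

    1 1̄ 1 0 1̄ = 10100 - 01001 = 01011 = (11)10

Example addition

          1 1̄ 1̄ 1̄ 0 1    (5)10
    +     1 0 0 1 1 0    (38)10
    -----------------
    S =   0 1 1̄ 0 1̄ 1    (sum)
    C = 1 1̄ 0 0 1 0      (carry, shifted one position left)
    -----------------
        1 1̄ 1 1̄ 1 1̄ 1    (43)10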
Addition (Subtraction): carry propagation is limited to one bit to the left

SD addition rule table

Type  (a_i, b_i)       Next lower position (a_{i-1}, b_{i-1})   c_i   s_i
1     (1,1)            any                                      1     0
2     (1,0), (0,1)     (1,0), (0,1), or (1,1)                   1     1̄
                       otherwise                                0     1
3     (0,0)            any                                      0     0
4     (1,1̄), (1̄,1)     any                                      0     0
5     (0,1̄), (1̄,0)     (1̄,0), (0,1̄), or (1̄,1̄)                   1̄     1
                       otherwise                                0     1̄
6     (1̄,1̄)            any                                      1̄     0

Type 2 sends carry 1 only if a carry 1 may arrive from the lower end; type 5 sends carry 1̄ only if a carry 1̄ may arrive from the lower end. Types 3 and 4 produce no carry.
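The rule table can be exercised with a short sketch. It assumes the barred digit is encoded as -1, digits are listed most significant first, and the position below the least significant digit is treated as (0, 0); the names `sd_add`, `POSITIVE_BELOW`, and `NEGATIVE_BELOW` are invented for this illustration.

```python
POSITIVE_BELOW = {(1, 0), (0, 1), (1, 1)}        # lower position may send carry +1
NEGATIVE_BELOW = {(-1, 0), (0, -1), (-1, -1)}    # lower position may send carry -1


def sd_add(a: list[int], b: list[int]) -> list[int]:
    """Add two equal-length signed-digit numbers (MSB first); returns n + 1 digits."""
    n = len(a)
    a, b = a[::-1], b[::-1]                      # work least significant digit first
    s = [0] * n                                  # interim sum digits
    c = [0] * n                                  # carries out of each position
    for i in range(n):
        below = (a[i - 1], b[i - 1]) if i > 0 else (0, 0)
        total = a[i] + b[i]
        if total == 2:                           # type 1
            c[i], s[i] = 1, 0
        elif total == 1:                         # type 2
            c[i], s[i] = (1, -1) if below in POSITIVE_BELOW else (0, 1)
        elif total == -1:                        # type 5
            c[i], s[i] = (-1, 1) if below in NEGATIVE_BELOW else (0, -1)
        elif total == -2:                        # type 6
            c[i], s[i] = -1, 0
        # total == 0 (types 3 and 4): c[i] = s[i] = 0 already
    # Each result digit is s_i plus the carry from position i - 1; no further carries arise.
    result = [s[0]] + [s[i] + c[i - 1] for i in range(1, n)] + [c[n - 1]]
    return result[::-1]


print(sd_add([1, -1, -1, -1, 0, 1], [1, 0, 0, 1, 1, 0]))
```

Every position looks only at its own digits and the digits one position below, so all positions can be computed at once, independent of the word length.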
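Hardware for multiplication: Mesh of Trees

[Figure: mesh of trees with rows R0 … R3, columns C0 … C3, and row inputs A, A1, A2, A3]

• Number of PEs = O(n*n)
• Area: Θ(n^2 log^2 n)

Multiplication A*B
• Row R_i ← A shifted by i bits if b_i ≠ 0
• Column C_i: add the bits of the column (the sum is log n bits)
• Use redundant binary to add these numbers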
Example of multiplication on mesh of trees with augmented mesh edges: A = 0111, B = 1011. Consider only the last 4 bits.

[Figure: rows R0 … R3 holding the shifted copies of A, with column trees C0 … C3]
Example of multiplication on mesh of trees

• C_i contains a sum that is at most log n bits long.
• Note that C_i starts from the i-th bit, so the k-th bit of C_i is pipelined to row i+k.
• Each bit c_i is computed at (i,i).
• The pipelined values are added one by one, using the redundant binary system, in a constant number of steps each.
• Then the number is converted to a binary number.

Total time:
    2 log n : form R_i
    2 log n : add to the (i,i) location
    2 log n : convert to binary

Column sums for the example (A = 0111, B = 1011, last 4 bits): C0 = 1, C1 = 10, C2 = 10, C3 = 10
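The column sums of the example can be checked with a small sketch. It models only the arithmetic (rows R_i and per-column bit counts), not the mesh-of-trees routing or pipelining; `column_sums` is a hypothetical helper name.

```python
def column_sums(a: int, b: int, n: int) -> list[int]:
    """Return [C_0, ..., C_{n-1}]: the number of 1 bits in each of the low n columns."""
    # Row R_i holds A shifted left by i bits when b_i = 1, otherwise zero.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    return [sum((row >> j) & 1 for row in rows) for j in range(n)]


sums = column_sums(0b0111, 0b1011, 4)
print([bin(c) for c in sums])

# Weighting each C_j by 2^j and keeping the last 4 bits gives the low bits of A*B.
low_bits = sum(c << j for j, c in enumerate(sums)) % 16
print(bin(low_bits))
```

The weighted sum of the C_j is exactly the addition that the slides perform with the redundant binary system along the diagonal.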
Integer Division
• Not as easy
• An O(log n) algorithm exists with table lookup
• Does a hardware circuit exist? => open question
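To find $1/D$, let $f(x) = \frac{1}{x} - D$.

Newton-Raphson Method
To solve f(x) = 0, iterate
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} = x_i (2 - D x_i).$$
The Newton-Raphson method converges quadratically. That is, $\varepsilon_{i+1} = c\,\varepsilon_i^2$ for some constant c, where $\varepsilon_i$ is the error after iteration i.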
E.g., when D = 4, set x_0 = 0.4:

    x_0 = 0.4         ε_0 = 0.15
    x_1 = 0.16        ε_1 = 0.09
    x_2 = 0.2176      ε_2 = 0.0324
    x_3 = 0.245801    ε_3 = 0.004199

Here ε_i = |x_i - 1/D| = |x_i - 0.25|.

To get the reciprocal of D to n digits of precision, we need log n iterations.
    1st iteration: 1 digit correct
    2nd iteration: 2 digits correct
    3rd iteration: 4 digits correct
    log n iterations: n digits correct
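The numbers above can be reproduced with a few lines of floating-point code. This sketch assumes the update x_{i+1} = x_i (2 - D x_i), which is what Newton-Raphson gives for f(x) = 1/x - D and which matches the listed iterates; the function name `reciprocal_iterates` is made up here.

```python
def reciprocal_iterates(d: float, x0: float, iterations: int) -> list[float]:
    """Return [x_0, x_1, ..., x_iterations] approximating 1/d without any division."""
    xs = [x0]
    for _ in range(iterations):
        x = xs[-1]
        xs.append(x * (2 - d * x))   # two multiplications and one subtraction
    return xs


for i, x in enumerate(reciprocal_iterates(4, 0.4, 3)):
    print(i, round(x, 6), round(abs(x - 0.25), 6))
```

Each step uses only multiplication and subtraction, which is why the cost of division reduces to a logarithmic number of multiplications.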
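Proof that the Newton-Raphson method converges quadratically.

Let X be the solution of f(x) = 0. By Taylor expansion,
$$f(X) = f(x_i) + f'(x_i)(X - x_i) + \tfrac{1}{2} f''(\xi)(X - x_i)^2,$$
where $\xi$ lies between $x_i$ and $X$. But
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.$$
Since f(X) = 0, we have
$$(X - x_i) + \frac{f(x_i)}{f'(x_i)} = -\frac{f''(\xi)}{2 f'(x_i)} (X - x_i)^2.$$
Thus,
$$X - x_{i+1} = -\frac{f''(\xi)}{2 f'(x_i)} (X - x_i)^2.$$
Since $f''(\xi)$ is bounded and $f'(x_i)$ is bounded away from zero, $|\varepsilon_{i+1}| = c\,|\varepsilon_i|^2$ for some c, where $\varepsilon_i = X - x_i$.

For $f(x) = \frac{1}{x} - D$, the factor $\frac{f''(\xi)}{2 f'(x_i)} = -\frac{x_i^2}{\xi^3}$ is bounded if $D \neq 0$.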
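For the reciprocal case the constant can be worked out exactly. The short derivation below assumes $f(x) = \frac{1}{x} - D$ and the resulting update $x_{i+1} = x_i(2 - D x_i)$, which reproduces the iterates listed for D = 4.

```latex
\begin{align*}
\frac{1}{D} - x_{i+1}
  &= \frac{1}{D} - x_i (2 - D x_i) \\
  &= \frac{1 - 2 D x_i + D^2 x_i^2}{D} \\
  &= D \left( \frac{1}{D} - x_i \right)^2
\end{align*}
```

So in this case $\varepsilon_{i+1} = D\,\varepsilon_i^2$, that is, c = D. With D = 4 this gives 4 · 0.15^2 = 0.09 and 4 · 0.09^2 = 0.0324, in agreement with the errors in the example.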
Complexity
• Each multiplication: O(log n) time
• To obtain n-digit precision: O(log n) iterations
• => O(log^2 n) complexity
• A/D => A * (1/D)
• Question: is there an O(log n) algorithm for division?