130 likes | 249 Views
Datapath Designs. CK Cheng CSE Department UC, San Diego. Prefix Adder – Well-known and Well-developed?. Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc. Prefix Adder – New Respects, New Method.
E N D
Datapath Designs CK Cheng CSE Department UC, San Diego
Prefix Adder –Well-known and Well-developed? • Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc.
Prefix Adder –New Respects, New Method • Realistic design considerations: Timing, Power and Area. • Integer Linear Programming for prefix adder: • Logic effort timing model (gate cap. + wire cap.) • Activity-statistic power model • Non-uniform signal arrival/required times Logic Levels Timing Power Area Max Fanouts Max Wire Tracks
Prefix Adder –Optimum Prefix adders • Uniform signal arrival/required times Sklansky Adder Kogge-Stone Adder Fastest depth-3 optimal prefix adder Fastest depth-4 optimal prefix adder
Prefix Adder –Optimum Prefix adders • Uniform signal arrival/required times
Prefix Adder –Optimum Prefix adders • Non-uniform signal arrival/required times Increasing Signal Arrival Times Decreasing Signal Arrival Times Convex Signal Arrival Times
0.1 1 0 1 1 0 1 0 1 0 0 1 R0=A 1 0 1 0 1 0 0 0 R1 Q1 = 0.1Q2 = 0.01Q3 = 0.000Q4 = 0.0001 1 0 1 0 0 1 0 0 R2 0 0 0 0 1 0 0 0 R3 1 0 1 0 0 1 1 0 R4 Division – Iteration effort • Pencil and paper method: (A=QB+2-nR and R<B)1 bit partial quotient per iteration, n iterations A = 0.1001, B = 0.1010; Q= A / B. + Qi: Partial Quotient Ri: Partial Remainder Ri+1 = Ri – B Qi Q = 0.1101
Division – Memory effort • Lookup table is the simplest way to obtain multiple partial quotient bits in each iteration. • SRT method: a lookup tables stores m-bit partial quotients decided by m bits of partial remainder and m bits of divisor. Table size: 22m m • STR method is limited by memory wall.
Division – Arithmetic effort • Partial quotient is calculated by arithmetic functions. • Prescaling: • Taylor expansion: • Series expansion:
Division – Solution space • Modern FPGAs contains plenty of memory and build-in multipliers, which enable high performance divider. Memory Effort Our target SRT Memory Wall Low latency Prescaling Pencil-and-paper Series Expansion Iteration Effort Taylor Expansion Arithmetic Effort Low area
Division – PST algorithm • Utilize the power of series expansion, but need a good start point. • Prescaling provide a scaled divisor close to 1. • 0-order Taylor expansion iterates to reach the final quotient
A1 = A E0 =0.1101,1000,0010 B1 = B E0 =0.1111,0001,0001 Q1 = A1 E1 =0.1110,0011 R1 = B1 – Q1 B1 =0.0000,0010,0101,1110,1101 Q2 = R1 E1 =0.1001,1111 R2 = R1 – Q2 B1 =0.0000,0001,1111,1011,0001 Q =0.1110,0011+ 0.0000,0010,0111,11= 0.1110,0101,0111,11 Division – PST algorithm B(m) =0.1100 E0 =1.0011 A =0.1011,0110 B =0.1100,1011 E1 = INV(B1(2m)) =1.0000,1110 E0 = Table (B(m)) 1/B A1 = AE0; B1 = BE0 E1 = (2 B1) INV(B1(2m)) Qi = Ri-1 E1 Ri= Ri-1 Qi B1 Q = Q + Qi
Division – FPGA Implementation • PST algorithm is suitable for high-performance division unit design in FPGAs 32-bit division with 5-cycle latency