1 / 13

Datapath Designs

Datapath Designs. CK Cheng CSE Department UC, San Diego. Prefix Adder – Well-known and Well-developed?. Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc. Prefix Adder – New Respects, New Method.

aulii
Download Presentation

Datapath Designs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Datapath Designs CK Cheng CSE Department UC, San Diego

  2. Prefix Adder –Well-known and Well-developed? • Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc.

  3. Prefix Adder –New Respects, New Method • Realistic design considerations: Timing, Power and Area. • Integer Linear Programming for prefix adder: • Logic effort timing model (gate cap. + wire cap.) • Activity-statistic power model • Non-uniform signal arrival/required times Logic Levels Timing Power Area Max Fanouts Max Wire Tracks

  4. Prefix Adder –Optimum Prefix adders • Uniform signal arrival/required times Sklansky Adder Kogge-Stone Adder Fastest depth-3 optimal prefix adder Fastest depth-4 optimal prefix adder

  5. Prefix Adder –Optimum Prefix adders • Uniform signal arrival/required times

  6. Prefix Adder –Optimum Prefix adders • Non-uniform signal arrival/required times Increasing Signal Arrival Times Decreasing Signal Arrival Times Convex Signal Arrival Times

  7. 0.1 1 0 1 1 0 1 0 1 0 0 1 R0=A 1 0 1 0 1 0 0 0 R1 Q1 = 0.1Q2 = 0.01Q3 = 0.000Q4 = 0.0001 1 0 1 0 0 1 0 0 R2 0 0 0 0 1 0 0 0 R3 1 0 1 0 0 1 1 0 R4 Division – Iteration effort • Pencil and paper method: (A=QB+2-nR and R<B)1 bit partial quotient per iteration, n iterations A = 0.1001, B = 0.1010; Q= A / B. + Qi: Partial Quotient Ri: Partial Remainder Ri+1 = Ri – B  Qi Q = 0.1101

  8. Division – Memory effort • Lookup table is the simplest way to obtain multiple partial quotient bits in each iteration. • SRT method: a lookup tables stores m-bit partial quotients decided by m bits of partial remainder and m bits of divisor. Table size: 22m m • STR method is limited by memory wall.

  9. Division – Arithmetic effort • Partial quotient is calculated by arithmetic functions. • Prescaling: • Taylor expansion: • Series expansion:

  10. Division – Solution space • Modern FPGAs contains plenty of memory and build-in multipliers, which enable high performance divider. Memory Effort Our target SRT Memory Wall Low latency Prescaling Pencil-and-paper Series Expansion Iteration Effort Taylor Expansion Arithmetic Effort Low area

  11. Division – PST algorithm • Utilize the power of series expansion, but need a good start point. • Prescaling provide a scaled divisor close to 1. • 0-order Taylor expansion iterates to reach the final quotient

  12. A1 = A  E0 =0.1101,1000,0010 B1 = B  E0 =0.1111,0001,0001 Q1 = A1 E1 =0.1110,0011 R1 = B1 – Q1 B1 =0.0000,0010,0101,1110,1101 Q2 = R1 E1 =0.1001,1111 R2 = R1 – Q2 B1 =0.0000,0001,1111,1011,0001 Q =0.1110,0011+ 0.0000,0010,0111,11= 0.1110,0101,0111,11 Division – PST algorithm B(m) =0.1100 E0 =1.0011 A =0.1011,0110 B =0.1100,1011 E1 = INV(B1(2m)) =1.0000,1110 E0 = Table (B(m))  1/B A1 = AE0; B1 = BE0 E1 = (2  B1)  INV(B1(2m)) Qi = Ri-1  E1 Ri= Ri-1 Qi  B1 Q = Q + Qi

  13. Division – FPGA Implementation • PST algorithm is suitable for high-performance division unit design in FPGAs 32-bit division with 5-cycle latency

More Related