1 / 38

Lecture 18: Datapath Functional Units

Lecture 18: Datapath Functional Units. Outline. Comparators Shifters Multi-input Adders Multipliers. Comparators. 0’s detector: A = 00…000 1’s detector: A = 11…111 Equality comparator: A = B Magnitude comparator: A < B. 1’s & 0’s Detectors. 1’s detector: N-input AND gate

jrafter
Download Presentation

Lecture 18: Datapath Functional Units

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 18: DatapathFunctionalUnits

  2. Outline • Comparators • Shifters • Multi-input Adders • Multipliers 18: Datapath Functional Units

  3. Comparators • 0’s detector: A = 00…000 • 1’s detector: A = 11…111 • Equality comparator: A = B • Magnitude comparator: A < B 18: Datapath Functional Units

  4. 1’s & 0’s Detectors • 1’s detector: N-input AND gate • 0’s detector: NOTs + 1’s detector (N-input NOR) Asymmetric design that favors the late-arriving inputs. Here, the delay from the latest bit A7 is a single gate 18: Datapath Functional Units

  5. 1’s & 0’s Detectors • Fast pseudo-nMOS or dynamic NOR detector. • Works well for words up to about 16 bits • For larger words, the gates can be split into 8–16-bit chunks to reduce the parasitic delay 18: Datapath Functional Units

  6. Equality Comparator • Check if each bit is equal (XNOR, aka equality gate) • 1’s detect on bitwise equality 18: Datapath Functional Units

  7. Magnitude Comparator • Compute B – A and look at sign • B – A = B + ~A + 1 • If there is a carry-out: A<=B. Otherwise: A>B. • zero detector indicates that the numbers are equal. 18: Datapath Functional Units

  8. Counters • An N-bit binary counter sequences through 2N outputs in binary order. • Features of counters: • Resettable: counter value is reset to 0 when RESET is asserted • Loadable: counter value is loaded with N-bit value when LOAD is asserted • Enabled: counter counts only on clock cycles when EN is asserted • Reversible: counter increments or decrements based on UP/DOWN input • Terminal Count: TC output asserted when counter overflows 18: Datapath Functional Units

  9. Binary Counter • Asynchronous ripple-carry counter • It is composed of N registers. • The falling transition of each register clocks the subsequent register. • Asynchronous circuits introduce a whole assortment of problems. • Delay is long. • Delay from clk to Q3 is long. 18: Datapath Functional Units

  10. Binary Counter • Synchronous up/down counter • Resettable • Cycle time is limited by the ripple-carry delay with reset, load, and enable 18: Datapath Functional Units

  11. Fast Binary Counter • The speed of the previous counter is limited by the adder • It can be overcome by dividing the counter into two or more segments. • A 32-bit counter could be constructed from: • a 4-bit prescalar counter • 28-bit counter 18: Datapath Functional Units

  12. Shifters • Logical Shift: • Shifts number left or right and fills with 0’s • 1011 LSR 1 = 0101 1011 LSL1 = 0110 • Arithmetic Shift: • Shifts number left or right. Rt shift sign extends • 1011 ASR1 = 1101 1011 ASL1 = 0110 • Rotate: • Shifts number left or right and fills with lost bits • 1011 ROR1 = 1101 1011 ROL1 = 0111 18: Datapath Functional Units

  13. Funnel Shifter • A funnel shifter can do all six types of shifts • creates a 2N – 1-bit input word Z from A and/or the kill values. • Selects N-bit field Y from 2N–1-bit input • Shift by k bits (0  k < N) • Logically involves N N:1 multiplexers 18: Datapath Functional Units

  14. Funnel Source Generator Rotate Right Logical Right Arithmetic Right Rotate Left Logical/Arithmetic Left 18: Datapath Functional Units

  15. Array Funnel Shifter • N N-input multiplexers • Use 1-of-N hot select signals for shift amount • nMOS pass transistor design (Vt drops!) N=4 18: Datapath Functional Units

  16. Array Funnel Shifter • Source generator consists of a 2N – 1-bit 5:1 multiplexer controlled by the shift type and direction. Z2N-2:N 18: Datapath Functional Units

  17. Multi-input Adders • Suppose we want to add k N-bit words • Ex: 0001 + 0111 + 1101 + 0010 = 10111 • Straightforward solution: k-1 N-input CPAs • Large and slow 18: Datapath Functional Units

  18. Carry Save Addition • A full adder sums 3 inputs and produces 2 outputs • Carry output has twice weight of sum output (X + Y + Z = S + 2C) • N full adders in parallel are called carry save adder • Produce N sums and N carry outs 18: Datapath Functional Units

  19. CSA Application • Use k-2 stages of CSAs • Keep result in carry-save redundant form • Final CPA computes actual result 18: Datapath Functional Units

  20. Multiplication • Multiplication is less common than addition. • Multiplier is essential for • Microprocessors • Digital signal processors • Graphics engines 18: Datapath Functional Units

  21. Multiplication • Example: • M x N-bit multiplication • Produce N M-bit partial products (PPs) • Sum these to produce M+N-bit product 18: Datapath Functional Units

  22. General Form • Multiplicand: Y = (yM-1, yM-2, …, y1, y0) • Multiplier: X = (xN-1, xN-2, …, x1, x0) • Product: 18: Datapath Functional Units

  23. Dot Diagram • Large multiplications are illustrated using dot diagrams. • Each dot represents a bit M N 18: Datapath Functional Units

  24. Multiplier design • Depends on: • Latency • Throughput • Energy • Area • Design complexity • Simple approach: • use an M + 1-bit carry-propagate adder (CPA) to add the first two partial products • then another CPA to add the third partial product to the running sum, and so forth • Requires N – 1 CPAs and is slow • Better approach: using arrays 18: Datapath Functional Units

  25. Y Y Y Y X 3 2 1 0 0 X Y Y Y Y 1 3 2 1 0 HA FA FA HA X Y Y Y Y 2 3 2 1 0 FA FA FA HA X Y Y Y Y 3 3 2 1 0 FA FA FA HA P0 P1 The Array Multiplier P6 P5 P4 P2 P3 P7

  26. HA FA FA HA Critical Path 1 FA FA FA HA Critical Path 2 FA FA FA HA Many paths of almost identical length can be identified. The MxN Array Multiplier-Critical Path ta tc tc tc ts tc ts Critical Path 1 & 2 tc ts For path 2:

  27. Carry-Save Multiplier

  28. Rectangular Array • Squash array to fit rectangular floorplan (the parallelogram shape wastes space. 18: Datapath Functional Units

  29. Improvement • The first row of CSAs adds the first partial product to a pair of 0s (Sin=0 and Cin=0). • Leads to a regular structure • but is inefficient. • The first row of CSAs can be used to add the first three partial products together. • This reduces the number of rows by two and reduces the adder propagation delay • Another improvement: replace the bottom row with a faster CPA such as a lookahead or tree adder • The critical path of an array multiplier involves N–2 CSAs and a CPA. 18: Datapath Functional Units

  30. Two’s Complement Array Multiplication • MSB of a two’s complement number has a negative weight. • Two of the PPs have negative weight and should be subtracted. • Subtraction is handled by taking the two’s complement of the terms to be subtracted, i.e., inverting the bits and adding one. 18: Datapath Functional Units

  31. Baugh-Wooley multiplier algorithm • The upper parallelogram: Multiplication of all but the MSBs of the inputs • Next row: product of the MSB (X5Y5) • Next two pairs of rows: the inversions of the terms to be subtracted. • Each term has implicit leading and trailing zeros, which are inverted to leading and trailing ones. • Extra ones must be added in the least significant column when taking the two’s complement. 18: Datapath Functional Units

  32. Modified Baugh-Wooley multiplier • Reduces the number of partial products: • precomputing the sums of the constant ones • pushing some of the terms upward into extra columns 18: Datapath Functional Units

  33. squashed into a rectangle • Almost identical to the unsigned multiplier • AND gates are replaced by NAND gates in the hatched cells. • 1s are added in place of 0s at two of the unused inputs. • Signed and unsigned arrays are so similar • A single array can be used for both • XOR gates are used to conditionally invert some of the terms depending on the mode. 18: Datapath Functional Units

  34. Fewer Partial Products • Array multiplier requires N partial products • If we looked at groups of r bits, we could form N/r partial products. • Faster and smaller? • Called radix-2r encoding • Ex: r = 2: look at pairs of bits • Form partial products of • 00  0 • 01  Y • 10  2Y simple shift • 11  3Y hard multiple, requiring a slow carry-propagate addition of Y+2Y • First three are easy, but 3Y requires adder  18: Datapath Functional Units

  35. Booth Encoding • Observe that: • 3Y = 4Y – Y • 2Y = 4Y – 2Y • 4Y in a radix-4 multiplier array is equivalent to Y in the next row 18: Datapath Functional Units

  36. Booth Encoding • Instead of 3Y, try –Y, then increment next partial product to add 4Y • Similarly, for 2Y, try –2Y + 4Y in next partial product 18: Datapath Functional Units

  37. Example 18: Datapath Functional Units

  38. Booth Hardware • Booth encoder generates control lines for each PP • Booth selectors choose PP bits 18: Datapath Functional Units

More Related