
Enhancing MIPS ALU Design and Operations

Learn about MIPS ALU design, logic operations, overflow detection, and tailoring ALU for MIPS ISA instructions. Explore full adder implementation, logic gates, and ALU control codes.



Presentation Transcript


  1. 14:332:331 Computer Architecture and Assembly Language, Fall 2003, Week 7 [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

  2. Heads Up
     • This week’s material
       • MIPS logic and multiply instructions (reading assignment: PH 4.4)
       • MIPS ALU design (reading assignment: PH 4.5)
     • Next week’s material
       • Building a MIPS datapath (reading assignment: PH 5.1-5.2)

  3. Review: MIPS Arithmetic Instructions
     [Figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result output, and 1-bit zero and ovf outputs]
     Instruction formats (bit positions 31 25 20 15 5 0):
       R-type: op | Rs | Rt | Rd | funct
       I-type: op | Rs | Rt | Immed 16
     • expand immediates to 32 bits before ALU
     • 10 operations so can encode in 4 bits
       0 add    1 addu   2 sub    3 subu   4 and
       5 or     6 xor    7 nor    a slt    b sltu
     Type   op   funct
     ADD    00   100000
     ADDU   00   100001
     SUB    00   100010
     SUBU   00   100011
     AND    00   100100
     OR     00   100101
     XOR    00   100110
     NOR    00   100111
            00   101000
            00   101001
     SLT    00   101010
     SLTU   00   101011
            00   101100

  4. Review: A 32-bit Adder/Subtractor
     • Built out of 32 full adders (FAs)
       S = A xor B xor carry_in
       carry_out = (A and B) or (A and carry_in) or (B and carry_in)   (majority function)
     • Small but slow!
     [Figure: a 1-bit FA with inputs A, B, carry_in and outputs S, carry_out; 32 such FAs chained from c0 = carry_in (where the add/subt control is shown) through c1, c2, c3, ... c31 to c32 = carry_out, with bit i taking Ai and Bi and producing Si]
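
The ripple structure on this slide can be sketched in a few lines of Python. This is a minimal model, not the course's implementation: the function names are hypothetical, and it assumes the conventional wiring in which the add/subt control both inverts each bit of B and drives c0, so that subtraction is computed as A + (not B) + 1.

```python
MASK32 = 0xFFFFFFFF

def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return s, carry_out

def ripple_add_sub(a, b, subtract=0, width=32):
    """Add or subtract two unsigned `width`-bit values with a chain of full adders.

    Assumption: when subtract is 1, every bit of b is inverted and c0 is set
    to 1, which yields a + (~b) + 1 = a - b in two's complement.
    """
    carry = subtract              # c0 = carry_in
    result = 0
    for i in range(width):        # the carry ripples from bit 0 up to bit width-1
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract
        s_i, carry = full_adder(a_i, b_i, carry)
        result |= s_i << i
    return result, carry          # carry is now c32 = carry_out

print(ripple_add_sub(7, 3))             # (10, 0)
print(hex(ripple_add_sub(3, 7, 1)[0]))  # 0xfffffffc, i.e. -4 in two's complement
```

The loop makes the "small but slow" remark concrete: bit i cannot finish until the carry from bit i-1 is known, so the delay grows linearly with the word width.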

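  5. Minimal Implementation of a Full Adder
     • Gate library: inverters, 2-input nands, or-and-inverters

       -- Entity declaration is assumed here; the slide shows only the architecture.
       library ieee;
       use ieee.std_logic_1164.all;

       entity full_adder is
         port (A, B, cin : in  std_logic;
               S, cout   : out std_logic);
       end full_adder;

       architecture concurrent_behavior of full_adder is
         signal t1, t2, t3, t4, t5 : std_logic;
       begin
         t1   <= not A after 1 ns;                             -- inverter
         t2   <= not cin after 1 ns;                           -- inverter
         t4   <= not ((A or cin) and B) after 2 ns;            -- or-and-inverter
         t3   <= not ((t1 or t2) and (A or cin)) after 2 ns;   -- A xnor cin
         t5   <= t3 nand B after 2 ns;                         -- 2-input nand
         S    <= not ((B or t3) and t5) after 2 ns;            -- A xor B xor cin
         cout <= not ((t1 or t2) and t4) after 2 ns;           -- majority(A, B, cin)
       end concurrent_behavior;

     • Can you create the equivalent schematic? Can you determine worst case delay (the worst case timing path through the circuit)?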
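
One way to approach the two questions on this slide is to model the gate network in software. The sketch below is an illustration under stated assumptions: the function and table names are hypothetical, the cout assignment is read as not((t1 or t2) and t4), and the delay figure is a plain longest-path calculation that takes the 1 ns and 2 ns values from the `after` clauses and does not rule out false paths.

```python
from itertools import product

def full_adder_gates(a, b, cin):
    """Evaluate the slide's gate network on 0/1 inputs; returns (S, cout)."""
    t1 = 1 - a                          # inverter
    t2 = 1 - cin                        # inverter
    t4 = 1 - ((a | cin) & b)            # or-and-inverter
    t3 = 1 - ((t1 | t2) & (a | cin))    # or-and-inverter
    t5 = 1 - (t3 & b)                   # 2-input nand
    s = 1 - ((b | t3) & t5)             # or-and-inverter
    cout = 1 - ((t1 | t2) & t4)         # assumed reading of the cout line
    return s, cout

# Gate delay in ns and the signals each gate reads, copied from the VHDL.
GATES = {
    "t1": (1, ["A"]),
    "t2": (1, ["cin"]),
    "t4": (2, ["A", "cin", "B"]),
    "t3": (2, ["t1", "t2", "A", "cin"]),
    "t5": (2, ["t3", "B"]),
    "S": (2, ["B", "t3", "t5"]),
    "cout": (2, ["t1", "t2", "t4"]),
}

def arrival(signal):
    """Longest-path arrival time of a signal; primary inputs arrive at 0 ns."""
    if signal not in GATES:
        return 0
    delay, inputs = GATES[signal]
    return delay + max(arrival(name) for name in inputs)

# Compare against the reference equations for all eight input combinations.
for a, b, cin in product((0, 1), repeat=3):
    expected = (a ^ b ^ cin, (a & b) | (a & cin) | (b & cin))
    assert full_adder_gates(a, b, cin) == expected

print(arrival("S"), arrival("cout"))    # 7 4
```

Under these assumptions the longest path runs through an inverter, t3, t5 and the final S gate (1 + 2 + 2 + 2 ns), while cout settles after at most 4 ns.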

  6. Logic Operations
     • Logic operations operate on individual bits of the operands.
         $t2 = 0…0 0000 1101 0000
         $t1 = 0…0 0011 1100 0000
         and $t0, $t1, $t2     $t0 =
         or  $t0, $t1, $t2     $t0 =
         xor $t0, $t1, $t2     $t0 =
         nor $t0, $t1, $t2     $t0 =
     • How do we expand our FA design to handle the logic operations (and, or, xor, nor)?
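
The four results left blank on this slide can be checked with a short Python sketch. The helper names are hypothetical; the only assumption is that registers are 32 bits wide, which matters for nor because the complement must be masked back to 32 bits.

```python
MASK32 = 0xFFFFFFFF

t2 = 0b0000_1101_0000   # low 12 bits from the slide, upper bits are zero
t1 = 0b0011_1100_0000

def mips_and(a, b):
    return a & b & MASK32

def mips_or(a, b):
    return (a | b) & MASK32

def mips_xor(a, b):
    return (a ^ b) & MASK32

def mips_nor(a, b):
    # Python integers are unbounded, so mask the complement to 32 bits.
    return ~(a | b) & MASK32

for name, op in (("and", mips_and), ("or", mips_or), ("xor", mips_xor), ("nor", mips_nor)):
    print(f"{name:3} $t0 = {op(t1, t2):032b}")
```

Each operation works bit by bit with no carry between positions, which is why a logic unit can sit next to the full adder in every ALU cell without lengthening the carry chain.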

  7. A Simple ALU Cell [Figure: 1-bit FA cell with data inputs A and B, carry_in, control inputs add/subt and op, and outputs result and carry_out]

  8. An Alternative ALU Cell [Figure: 1-bit FA cell with data inputs A and B, carry_in, control inputs s2, s1, s0, and outputs result and carry_out]

  9. The Alternative ALU Cell’s Control Codes

  10. Tailoring the ALU to the MIPS ISA
      • Need to support the set-on-less-than instruction (slt)
        • remember: slt is an arithmetic instruction
        • produces a 1 if rs < rt and 0 otherwise
        • use subtraction: (a - b) < 0 implies a < b
      • Need to support test for equality (beq)
        • use subtraction: (a - b) = 0 implies a = b
      • Need to add the overflow detection hardware
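
The two subtraction tricks on this slide can be sketched as follows. The function names are hypothetical. One step goes beyond the slide and is an assumption of this sketch: the sign bit of a - b is XORed with the overflow flag, so that the comparison stays correct when the subtraction itself overflows.

```python
MASK32 = 0xFFFFFFFF

def sub32(a, b):
    """Return (difference, overflow) for a - b on 32-bit two's complement patterns."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    sign_a, sign_b, sign_d = a >> 31, b >> 31, diff >> 31
    # A subtraction overflows when the operand signs differ and the
    # result's sign differs from the sign of a.
    overflow = int(sign_a != sign_b and sign_d != sign_a)
    return diff, overflow

def slt(a, b):
    """1 if a < b as signed values, else 0."""
    diff, overflow = sub32(a, b)
    # Assumption beyond the slide: correct the sign bit with the overflow flag.
    return (diff >> 31) ^ overflow

def beq_taken(a, b):
    """True when a - b is zero, i.e. the operands are equal."""
    diff, _ = sub32(a, b)
    return diff == 0

print(slt(3, 7), slt(7, 3), beq_taken(5, 5))   # 1 0 True
```

Without the overflow correction, comparing 0x7FFFFFFF with 0x80000000 would report "less than" because the difference wraps to a negative bit pattern.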

  11. Modifying the ALU Cell for slt [Figure: the simple ALU cell (A, B, carry_in, add/subt, op, result, carry_out) with an added less input]

  12. Modifying the ALU for slt
      • First perform a subtraction
      • Make the result 1 if the subtraction yields a negative result
      • Make the result 0 if the subtraction yields a non-negative result
      [Figure: 32 ALU cells, with cell i taking Ai, Bi and a less input and producing resulti, for i = 0 ... 31]

  13. Modifying the ALU for Zero
      • First perform subtraction
      • Insert additional logic to detect when all result bits are zero
      [Figure: the 32-cell ALU with op and add/subt controls; the less inputs of cells 1 through 31 are tied to 0, and cell 31 produces a set output]

  14. Review: Overflow Detection
      • Overflow: the result is too large to represent in the number of bits allocated
      • Overflow occurs when
        • adding two positives yields a negative
        • or, adding two negatives gives a positive
        • or, subtracting a negative from a positive gives a negative
        • or, subtracting a positive from a negative gives a positive
      • On your own: Prove you can detect overflow by:
        • Carry into MSB xor Carry out of MSB

      Examples (top row shows the carries):

          0 1 1 1                  1 0
            0 1 1 1     7            1 1 0 0    –4
          + 0 0 1 1     3          + 1 0 1 1    –5
            1 0 1 0    –6            0 1 1 1     7
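
The "carry into MSB xor carry out of MSB" rule can be checked exhaustively for the 4-bit width used in the slide's examples. This sketch is an experiment rather than the proof the slide asks for, and all function names are hypothetical.

```python
def add_with_carries(a, b, width=4, carry_in=0):
    """Ripple-add two bit patterns; return (sum, carry into MSB, carry out of MSB)."""
    carry = carry_in
    carry_into_msb = 0
    result = 0
    for i in range(width):
        if i == width - 1:
            carry_into_msb = carry
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        result |= (a_i ^ b_i ^ carry) << i
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
    return result, carry_into_msb, carry

def to_signed(x, width=4):
    """Interpret a `width`-bit pattern as a two's complement number."""
    return x - (1 << width) if (x >> (width - 1)) & 1 else x

def overflow_by_carries(a, b, width=4):
    _, c_in, c_out = add_with_carries(a, b, width)
    return c_in ^ c_out

def overflow_by_range(a, b, width=4):
    """Reference check: does the true signed sum fall outside the representable range?"""
    total = to_signed(a, width) + to_signed(b, width)
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return int(total < lowest or total > highest)

# The two slide examples: 7 + 3 and (-4) + (-5).
print(overflow_by_carries(0b0111, 0b0011), overflow_by_carries(0b1100, 0b1011))  # 1 1

# Exhaustive comparison over all 256 pairs of 4-bit operands.
assert all(overflow_by_carries(a, b) == overflow_by_range(a, b)
           for a in range(16) for b in range(16))
```

The same check covers subtraction if b is replaced by its bitwise complement and carry_in is set to 1.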

  15. Modifying the ALU for Overflow
      • Modify the most significant cell to determine overflow output setting
      • Disable overflow bit setting for unsigned arithmetic
      [Figure: the 32-cell ALU with op and add/subt controls, zero and overflow outputs, and a set output from cell 31; the less inputs of cells 1 through 31 are tied to 0]

  16. Example: When do the result outputs settle at their final values for the inputs add/subt = 0, op = 000, A = 1111, B = 0001?

  17. Example: cont’d When do the result outputs settle at their final values for the inputs add/subt = 0, op = 100, A = 1111, B = 0001?

  18. Example: cont’d When do the result outputs settle at their final values for the inputs add/subt = 1, op = 101, A = 1111, B = 0001? What is the zero output for these inputs?

  19. Example: cont’d With the ALU design described in class, we assumed that a subtraction operation had to be performed as part of the beq instruction. When do the outputs settle? Is there a faster alternative?

  20. But What about Performance?
      • Critical path of n-bit ripple-carry adder is n*CP
      • Design trick – throw hardware at it (Carry Lookahead)
      [Figure: four 1-bit ALUs in a ripple chain; stage i takes Ai, Bi and CarryIni and produces Resulti and CarryOuti, with each CarryOut feeding the next stage's CarryIn, for i = 0 ... 3]

  21. Fast carry using “infinite” hardware (Parallel)
      • cout = b•cin + a•cin + a•b
        c1 = (b0+a0)•c0 + a0•b0
           = a0•b0 + a0•c0 + b0•c0
        c2 = (b1+a1)•c1 + a1•b1
           = (b1+a1)•((b0+a0)•c0 + a0•b0) + a1•b1
           = a1•a0•b0 + a1•a0•c0 + b1•a0•c0 + b1•a0•b0 + a1•b0•c0 + b1•b0•c0 + b1•a1
        c3 = a2•a1•a0•b0 + a2•a1•a0•c0 + a2•b1•a0•c0 + a2•b1•a0•b0 + a2•a1•b0•c0 + a2•b1•b0•c0 + a2•b1•a1 + …
        …
      • Outputs settle much faster
        • D_c3 = 2*D_and + D_or (best case)
        • …
        • D_c31 = 5*D_and + D_or (best case)
      • Problem: Prohibitively expensive

  22. Hierarchical Solution I
      • Hierarchical solution I
        • Group 32 bits into eight 4-bit groups
        • Within each group, use carry lookahead
        • Use the 4-bit groups as building blocks and connect them in ripple-carry fashion

  23. First Level: Propagate and generate
      ci+1 = (ai•bi) + (ai+bi)•ci
      gi = ai•bi
      pi = (ai+bi)
      ci+1 = gi + pi•ci
      • ci+1 = 1 if
        • gi = 1, or
        • pi and ci = 1
      • c1 = g0 + (p0•c0)
        c2 = g1 + (p1•g0) + (p1•p0•c0)
        c3 = g2 + (p2•g1) + (p2•p1•g0) + (p2•p1•p0•c0)
        c4 = g3 + (p3•g2) + (p3•p2•g1) + (p3•p2•p1•g0) + (p3•p2•p1•p0•c0)
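
The flattened equations for c1 through c4 can be checked against the recurrence ci+1 = gi + pi•ci with a small Python sketch. The function names are hypothetical; gi and pi follow the slide's definitions (AND and OR of the operand bits).

```python
def lookahead_carries(a, b, c0=0):
    """Return [c0, c1, c2, c3, c4] for 4-bit operands using the flattened equations."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate:  gi = ai AND bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]   # propagate: pi = ai OR bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]

def ripple_carries(a, b, c0=0):
    """Same carries computed one after another with ci+1 = gi + pi*ci."""
    carries = [c0]
    for i in range(4):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        carries.append((a_i & b_i) | ((a_i | b_i) & carries[-1]))
    return carries

# Every carry depends only on the operand bits and c0, never on another carry.
assert all(lookahead_carries(a, b, c) == ripple_carries(a, b, c)
           for a in range(16) for b in range(16) for c in (0, 1))

print(lookahead_carries(0b0111, 0b0011))   # [0, 1, 1, 1, 0]
```

In hardware the four lookahead expressions are evaluated in parallel as two-level logic, which is where the speedup over the ripple chain comes from.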

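  24. Hierarchical Solution I (16 bit)
      Delay = 4 * Delay (4-bit carry look-ahead ALU)
      [Figure: 4-bit carry-lookahead ALUs chained in ripple fashion; ALU0 takes A0-A3, B0-B3 and c0 = carry_in and produces result 0-3; c4 is the carry_in of ALU1, which takes A4-A7, B4-B7 and produces result 4-7; and so on]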

  25. Hierarchical Solution II
      • Hierarchical solution I
        • Group 32 bits into eight 4-bit groups
        • Within each group, use carry lookahead
        • Use the 4-bit groups as building blocks and connect them in ripple-carry fashion
      • Hierarchical solution II
        • Group 32 bits into eight 4-bit groups
        • Within each group, use carry lookahead
        • Use another level of carry lookahead to connect these 4-bit groups

  26. Hierarchical Solution II
      • Input a0-a15, b0-b15
      • Calculate P0-P3, G0-G3
      • Calculate C1-C4
      • Each 4-bit ALU calculates its results
      [Figure: four 4-bit ALUs producing result 0-3, 4-7, 8-11 and 12-15 from A0-A3/B0-B3 through A12-A15/B12-B15; each ALU sends its group propagate and generate signals (P0/G0 ... P3/G3) to a carry-lookahead unit, which takes cin and returns the group carries C1, C2, C3 and cout]

  27. Fast Carry using the second level abstraction
      • P0 = p3.p2.p1.p0
        P1 = p7.p6.p5.p4
        P2 = p11.p10.p9.p8
        P3 = p15.p14.p13.p12
      • G0 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0)
        G1 = g7 + (p7.g6) + (p7.p6.g5) + (p7.p6.p5.g4)
        G2 = g11 + (p11.g10) + (p11.p10.g9) + (p11.p10.p9.g8)
        G3 = g15 + (p15.g14) + (p15.p14.g13) + (p15.p14.p13.g12)
      • C1 = G0 + (P0•c0)
        C2 = G1 + (P1•G0) + (P1•P0•c0)
        C3 = G2 + (P2•G1) + (P2•P1•G0) + (P2•P1•P0•c0)
        C4 = G3 + (P3•G2) + (P3•P2•G1) + (P3•P2•P1•G0) + (P3•P2•P1•P0•c0)
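
A minimal sketch of the second-level equations for a 16-bit adder is shown below. The function names are hypothetical, and the sketch reads the last product terms of G3 as p15.p14.g13 and p15.p14.p13.g12 and the first term of C1 as G0, which is what the pattern of the other groups implies. The group carries are compared with those obtained from ordinary integer addition.

```python
def group_signals(a, b, k):
    """Group propagate P_k and group generate G_k for bits 4k .. 4k+3."""
    bits = range(4 * k, 4 * k + 4)
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in bits]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in bits]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def group_carries(a, b, c0=0):
    """Return [C1, C2, C3, C4], the carries out of the four 4-bit groups."""
    (P0, G0), (P1, G1), (P2, G2), (P3, G3) = [group_signals(a, b, k) for k in range(4)]
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1)
          | (P3 & P2 & P1 & G0) | (P3 & P2 & P1 & P0 & c0))
    return [C1, C2, C3, C4]

def reference_carries(a, b, c0=0):
    """Carries out of bits 3, 7, 11 and 15, taken from plain integer addition."""
    carries = []
    for k in range(1, 5):
        mask = (1 << (4 * k)) - 1
        carries.append(((a & mask) + (b & mask) + c0) >> (4 * k))
    return carries

print(group_carries(0xFFFF, 0x0001))   # [1, 1, 1, 1]
print(group_carries(0x00FF, 0x0001))   # [1, 1, 0, 0]
```

Each group carry is a two-level function of the P and G signals and c0, so the four 4-bit ALUs no longer wait on one another in sequence.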

  28. Shift Operations
      • Also need operations to pack and unpack 8-bit characters into 32-bit words
      • Shifts move all the bits in a word left or right
          sll $t2, $s0, 8    # $t2 = $s0 << 8 bits
          srl $t2, $s0, 8    # $t2 = $s0 >> 8 bits

          op     rs    rt    rd    shamt funct
          000000 00000 10000 01010 01000 000000   (sll)
          000000 00000 10000 01010 01000 000010   (srl)
      • Such shifts are logical because they fill with zeros
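
The two bit patterns on this slide can be reproduced by packing the six R-type fields into a 32-bit word. The helper names below are hypothetical; the register numbers 16 for $s0 and 10 for $t2 follow the standard MIPS register convention.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6, 5, 5, 5, 5 and 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def field_string(word):
    """Format a 32-bit word as 'op rs rt rd shamt funct' bit groups."""
    bits = format(word, "032b")
    return " ".join([bits[0:6], bits[6:11], bits[11:16],
                     bits[16:21], bits[21:26], bits[26:32]])

S0, T2 = 16, 10   # register numbers of $s0 and $t2

# For shifts the source register goes in rt, the destination in rd, and rs is unused.
sll_word = encode_r_type(0, 0, S0, T2, 8, 0b000000)   # sll $t2, $s0, 8
srl_word = encode_r_type(0, 0, S0, T2, 8, 0b000010)   # srl $t2, $s0, 8

print(field_string(sll_word))   # 000000 00000 10000 01010 01000 000000
print(field_string(srl_word))   # 000000 00000 10000 01010 01000 000010
```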

  29. Shift Operations, cont’d
      • An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value; a number shifted left should be 2 times its original value)
        • so sra uses the most significant bit (sign bit) as the bit shifted in
        • note that there is no need for a sla when using two’s complement number representation
          sra $t2, $s0, 8    # $t2 = $s0 >> 8 bits (sign bit shifted in)

          op     rs    rt    rd    shamt funct
          000000 00000 10000 01010 01000 000011   (sra)
      • The shift operation is implemented by hardware (usually a barrel shifter) outside the ALU
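
The difference between the logical and arithmetic right shifts can be seen on a negative value. This is a minimal sketch with hypothetical helper names that model the three shifts on 32-bit patterns; it relies on Python's `>>` being an arithmetic shift for negative integers.

```python
MASK32 = 0xFFFFFFFF

def sll(value, shamt):
    """Shift left logical: zeros enter on the right, high bits are dropped."""
    return (value << shamt) & MASK32

def srl(value, shamt):
    """Shift right logical: zeros enter on the left."""
    return (value & MASK32) >> shamt

def sra(value, shamt):
    """Shift right arithmetic: copies of the sign bit enter on the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32               # reinterpret the pattern as a negative integer
    return (value >> shamt) & MASK32   # Python's >> keeps the sign for negative ints

minus_256 = -256 & MASK32              # 0xFFFFFF00
print(hex(srl(minus_256, 8)))          # 0xffffff: no longer negative
print(hex(sra(minus_256, 8)))          # 0xffffffff, i.e. -1 = -256 / 256
print(hex(sll(-1 & MASK32, 1)))        # 0xfffffffe, i.e. -2 = -1 * 2
```

The last line illustrates why no sla is needed: shifting a two's complement value left with zero fill already doubles it, as long as the result still fits in 32 bits.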

  30. Multiplication
      • More complicated than addition
        • accomplished via shifting and addition

              0010   (multiplicand)
          x   1011   (multiplier)
              0010
             0010    (partial product
            0000      array)
           0010
          00010110   (product)

      • Double precision product produced
      • More time and more area to compute
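
The partial product array on this slide corresponds to a simple shift-and-add loop. The sketch below is a software illustration with a hypothetical function name, not a model of any particular multiplier hardware; it handles unsigned operands only.

```python
def shift_add_multiply(multiplicand, multiplier, width=4):
    """Multiply two unsigned `width`-bit values by shifting and adding.

    Returns (product, partial_products); the product may need 2 * width bits.
    """
    product = 0
    partial_products = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # Each multiplier bit selects either the shifted multiplicand or zero.
        partial = (multiplicand << i) if bit else 0
        partial_products.append(partial)
        product += partial
    return product, partial_products

product, partials = shift_add_multiply(0b0010, 0b1011)
print(format(product, "08b"))                 # 00010110
print([format(p, "08b") for p in partials])
```

A 4-bit by 4-bit multiply needs up to 8 result bits, which is the "double precision product" the slide refers to.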

  31. MIPS Multiply Instruction
          mult $s0, $s1    # hi||lo = $s0 * $s1

          op     rs    rt    rd    shamt funct
          000000 10000 10001 00000 00000 011000
      • Low-order word of the product is left in processor register lo and the high-order word is left in register hi
      • Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
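
How the 64-bit product is split across hi and lo can be sketched as follows. The function names are hypothetical; the sketch treats both operands as signed 32-bit values, as mult does, and returns the two halves as unsigned 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs_value, rt_value):
    """Signed 32 x 32 -> 64-bit multiply; returns (hi, lo) as 32-bit patterns."""
    product = (to_signed32(rs_value) * to_signed32(rt_value)) & MASK64
    hi = (product >> 32) & MASK32   # what mfhi would copy to a register
    lo = product & MASK32           # what mflo would copy to a register
    return hi, lo

print(mult(3, 4))                                   # (0, 12)
print(mult(0x10000, 0x10000))                       # (1, 0): the product needs 33 bits
print([hex(w) for w in mult(-1 & MASK32, 2)])       # ['0xffffffff', '0xfffffffe']
```

When hi holds anything other than the sign extension of lo, the product does not fit in a single 32-bit register.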

  32. Review: MIPS ISA, so far

  33. Review: MIPS ISA, so far cont’d

  34. Review: MIPS ISA, so far cont’d
