Lecture 4: Arithmetic for Computers (Part 3)

Lecture 4: Arithmetic for Computers(Part 3) CS 447 Jason Bakos

Chapter 4 Review • So far, we’ve covered the following topics for this chapter • Binary representation of signed integers • 16 to 32 bit signed conversion • Binary addition/subtraction • Overflow detection/overflow exception handling • Shift and logical operations • Parts of the CPU • AND, OR, XOR, and inverter gates • Multiplexor (mux) and full adder • Sum-of-products logic equations (truth tables) • Logic minimization techniques • Don’t cares and Karnaugh Maps

1-bit ALU Design • A 1-bit ALU can be constructed • Components • AND, OR, and adder • 4-to-1 mux • “Binverter” (inverter and 2-to-1 mux) • Interface • Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less • Outputs: CarryOut and Result • Digital functions are performed in parallel and the outputs are routed into a mux • The mux will also accept a Less input which we’ll accept from outside the 1-bit ALU • The select lines of the mux make up the “operation” input to the ALU

32-bit ALU • In order to create a multi-bit ALU, array 32 1-bit ALUs • Connect the CarryOut of each bit to the CarryIn of the next bit • A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B • The Result outputs of each 1-bit ALU will form the 32-bit result • We need to add an SLT unit and connect the output to the least significant 1-bit ALU’s Less input • Hardwire the other “Less” inputs to 0 • We need to add an Overflow unit • We need to add a Zero detection unit

SLT Unit • To compute SLT, we need to make sure that when the 1-bit ALU’s Operation is set to 11, a subtract operation is also being computed • With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and Result

Overflow Unit • When doing signed arithmetic, we need to follow this table, as we covered previously… • How do we implement this in hardware?

Overflow Unit • We need a truth table… • Since we’ll be computing the logic equation with SOP, we only need the rows where the output is 1

Zero Detection Unit • “Or” together all the 1-bit ALU outputs – the result is the Zero output to the ALU

32-bit ALU Operation • We need a 3-bit ALU Operation input into our 32-bit ALU • The two least significant bits can be routed into all the 1-bit ALUs internally • The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs

32-bit ALU Operation • Here’s the final ALU Operation table:

32-bit ALU • In the end, our ALU will have the following interface: • Inputs: • A and B (32 bits each) • ALU Operation (3 bits) • Outputs: • CarryOut (1 bit) • Zero (1 bit) • Result (32 bits) • Overflow (1 bit)

Carry Lookahead • The adder architecture we previously looked at requires n*2 gate delays to compute its result (worst case) • The longest path that a digital signal must propagate through is called the “critical path” • This is WAAAYYYY too slow! • There other ways to build an adder that require lg n delay • Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic) • Obviously, in the case of a 64-input system, the resulting design will be too big and too complex

Carry Lookahead • For example, we can easily see that the CarryIn for bit 1 is computed as: • c1=(a0b0)+(a0c0)+(b0c0) • c2=(a1b1)+(a1c1)+(b1c1) • Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays • c2=(a1b1)+(a1a0b0)+(a1a0c0)+(a1b0c0)+(b1a0b0)+(b1a0c0)+(b1b0c0) • I used the logical distributive law to compute this • As you can see, the CarryIn logic gets bigger and bigger for consecutive bits

Carry Lookahead • Carry Lookahead adders are faster than ripple-carry adders • Recall: • ci+1=(aibi)+(aici)+(bici) • ci can be factored out… • ci+1=(aibi)+(ai+bi)ci • So… • c2=(a1b1)+(a1+b1)((a0b0)+(a0+b0)c0)

Carry Lookahead • Note the repeated appearance of (aibi) and (ai+bi) • They are called generate (gi) and propagate (pi) • gi=aibi, pi=ai+bi • ci+1=gi+pici • This means if gi=1, a CarryOut is generated • If pi=1, a CarryOut is propagated from CarryIn

Carry Lookahead • c1=g0+(p0c0) • c2=g1+(p1g0)+(p1p0c0) • c3=g2+(p2g1)+(p2p1g0)+(p2p1p0c0) • c4=g3+(p3g2)+(p3p2g1)+(p3p2p1g0)+(p3p2p1p0c0) • …This system will give us an adder with 5 gate delays but it is still too complex

Carry Lookahead • To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super”-propagate and generate logic • The superpropagate is only true if all the bits propagate a carry • P0=p0p1p2p3 • P1=p4p5p6p7 • P2=p8p9p10p11 • P3=p12p13p14p15

Carry Lookahead • The supergenerate follows a similar equation: • G0=g3+(p3g2)+(p2p2g1)+(p3p2p1g0) • G1=g7+(p7g6)+(p7p6g5)+(p7p6p5g4) • G2=g11+(p11g10)+(p11p10g9)+(p11p10p9g8) • G3=g15+(p15g14)+(p15p14g13)+(p15p14p13g12) • The supergenerate and superpropagate logic for the 4-4 bit Carry Lookahead adders is contained in a Carry Lookahead Unit • This yields a worst-case delay of 7 gate delays • Reason?

Carry Lookahead • We’ve covered all ALU functions except for the shifter • We’ll talk after the shifter later

Lecture 4: Arithmetic for Computers (Part 3)