Computing Systems Designing a basic ALU claudio.talarico@mail.ewu.edu
Let's start designing a processor!
[Figure: ALU block with 32-bit inputs a and b, an operation control input, and a 32-bit result output]
• Almost ready to move into chapter 5 and start building a processor
• First, let's review Boolean logic and build the ALU we'll need (material from Appendix B)
Review: Boolean algebra and gates
Problem:
• Consider a logic function with three inputs: A, B, and C.
• Output D is true if at least one input is true
• Output E is true if exactly two inputs are true
• Output F is true only if all three inputs are true
• Show the truth table for these three functions.
• Show the Boolean equations for these three functions.
• Show an implementation consisting of inverters, AND, and OR gates.
Solution:
D = A + B + C
E = A'.B.C + A.B'.C + A.B.C'
F = A.B.C
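The problem asks for a truth table that the solution does not list. A minimal Python sketch that derives it from the three equations; the names `functions` and `truth_table` are illustrative and not part of the slides:

```python
from itertools import product

def functions(a, b, c):
    """Return (D, E, F) for inputs a, b, c given as 0 or 1."""
    na, nb, nc = 1 - a, 1 - b, 1 - c             # inverters
    d = a | b | c                                # D = A + B + C
    e = (na & b & c) | (a & nb & c) | (a & b & nc)  # E = A'.B.C + A.B'.C + A.B.C'
    f = a & b & c                                # F = A.B.C
    return d, e, f

def truth_table():
    """All eight rows as tuples (A, B, C, D, E, F)."""
    return [(a, b, c) + functions(a, b, c)
            for a, b, c in product((0, 1), repeat=3)]

if __name__ == "__main__":
    print("A B C | D E F")
    for row in truth_table():
        print("{} {} {} | {} {} {}".format(*row))
```

Each row can be checked against the word definitions: E is 1 only on the three rows with exactly two inputs set.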
One-bit adder
[Figure: one-bit adder with inputs a, b, and carry in, and outputs sum and carry out]
• It takes three input bits and generates two output bits
• Multiple bits can be cascaded
One-bit adder: Boolean algebra
c_i = carry in, c_o = carry out = c_{i+1}, s = sum
s = a'.b'.c_i + a'.b.c_i' + a.b'.c_i' + a.b.c_i = a xor b xor c_i
c_{i+1} = a.b + (a + b).c_i
When both a and b are 1, c_{i+1} is 1 no matter what c_i is.
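The two equations can be checked exhaustively against ordinary integer addition. A small sketch, with the function names `full_adder` and `sum_sop` chosen here for illustration:

```python
from itertools import product

def full_adder(a, b, ci):
    """One-bit adder: returns (s, co) for input bits a, b and carry in ci."""
    s = a ^ b ^ ci                    # s = a xor b xor c_i
    co = (a & b) | ((a | b) & ci)     # c_{i+1} = a.b + (a + b).c_i
    return s, co

def sum_sop(a, b, ci):
    """Sum bit written as the four-term sum-of-products from the slide."""
    na, nb, nc = 1 - a, 1 - b, 1 - ci
    return (na & nb & ci) | (na & b & nc) | (a & nb & nc) | (a & b & ci)

if __name__ == "__main__":
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        # The two output bits together encode the arithmetic sum a + b + ci.
        assert a + b + ci == s + 2 * co
        assert sum_sop(a, b, ci) == s
    print("one-bit adder equations agree with integer addition")
```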
Ripple adder
[Figure: chain of n one-bit adders; stage i takes a_i, b_i, and c_i and produces s_i and c_{i+1}, starting from c_0 and ending with c_n]
Ripple adder
g_i = a_i.b_i
p_i = a_i'.b_i.c_i + a_i.b_i'.c_i = c_i.(a_i xor b_i)
c_{i+1} = g_i + p_i.c_i = g_i + c_i.c_i.(a_i xor b_i) = g_i + c_i.(a_i xor b_i) = g_i + c_i.p_i*, where p_i* = a_i xor b_i
or alternatively: c_{i+1} = g_i + p_i**.c_i, with p_i** = a_i + b_i
s_i = p_i* xor c_i
but: s_i = p_i xor c_i is not true! s_i = p_i** xor c_i is not true!
• It is more convenient to use p_i* or p_i** than p_i
• xor gates can be very fast if designed using pass transistors
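The claims about the three propagate definitions can be confirmed by brute force over all eight input combinations. In this sketch the labels `p`, `p_star`, and `p_2star` stand for p_i, p_i*, and p_i**; the function name is illustrative:

```python
from itertools import product

def check_propagate_forms():
    """For each propagate definition, report whether it yields the right
    carry (g + p.c) and the right sum (p xor c) on every input."""
    ok = {"carry_p": True, "carry_p_star": True, "carry_p_2star": True,
          "sum_p": True, "sum_p_star": True, "sum_p_2star": True}
    for a, b, c in product((0, 1), repeat=3):
        carry, s = divmod(a + b + c, 2)      # reference values
        g = a & b
        forms = {"p": c & (a ^ b),           # p_i   = c_i.(a_i xor b_i)
                 "p_star": a ^ b,            # p_i*  = a_i xor b_i
                 "p_2star": a | b}           # p_i** = a_i + b_i
        for name, p in forms.items():
            ok["carry_" + name] &= (g | (p & c)) == carry
            ok["sum_" + name] &= (p ^ c) == s
    return ok

if __name__ == "__main__":
    for name, value in check_propagate_forms().items():
        print(name, value)
```

All three definitions give the correct carry, but only p_i* also gives the correct sum, which is why it is the convenient choice when the same signal feeds both circuits.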
Ripple adder timing
[Figure: chain of n one-bit adders; stage i takes a_i, b_i, and c_i and produces s_i and c_{i+1}, starting from c_0 and ending with c_n]
c_{i+1} = g_i + p_i*.c_i
s_i = p_i* xor c_i
Worst case: t_adder = (n-1) × t_carry + t_sum
where: t_carry = delay from c_i to c_{i+1}, and t_sum = delay from c_{n-1} to s_{n-1}
This assumes the sum circuit is slower than the carry circuit; otherwise simply t_adder = n × t_carry
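The two timing cases collapse into a single expression, since the last stage contributes whichever of its two outputs settles later. A sketch with made-up delay values (the units and numbers are assumptions for illustration, not figures from the slides):

```python
def ripple_adder_delay(n, t_carry, t_sum):
    """Worst-case delay of an n-bit ripple adder.

    The carry ripples through the first n-1 stages; the last stage then
    needs max(t_sum, t_carry) to settle both s_{n-1} and c_n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) * t_carry + max(t_sum, t_carry)

if __name__ == "__main__":
    # Hypothetical delays in arbitrary gate-delay units.
    print(ripple_adder_delay(32, t_carry=2, t_sum=3))  # sum slower: 31*2 + 3
    print(ripple_adder_delay(32, t_carry=2, t_sum=1))  # carry slower: 32*2
```

The delay grows linearly with n, which is the motivation for looking at faster carry schemes.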
Problem: carry ripple adder is slow
• Is there more than one way to do addition?
• Two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c_1 = b_0.c_0 + a_0.c_0 + a_0.b_0
c_2 = b_1.c_1 + a_1.c_1 + a_1.b_1, expanded: c_2 = ...
c_3 = b_2.c_2 + a_2.c_2 + a_2.b_2, expanded: c_3 = ...
c_4 = b_3.c_3 + a_3.c_3 + a_3.b_3, expanded: c_4 = …
Sum-of-products is not feasible! Why? We would need "infinite" hardware!
Carry-lookahead adder
• An approach in between the two extremes (sum-of-products and ripple adders)
• Motivation:
  • If we didn't know the value of carry-in, what could we do?
  • When would we always generate a carry? g_i = a_i.b_i
  • When would we propagate the carry? p_i = a_i + b_i
• How to get rid of the ripple?
  • For each bit in an n-bit adder: c_{i+1} = f(a_i, b_i, c_i) = g_i + p_i.c_i
  • The dependency between c_{i+1} and c_i can be eliminated by expanding c_i (i.e., compute c_i for each stage in parallel instead of waiting for the carry from the previous stage)
Carry-lookahead adder
c_1 = g_0 + p_0.c_0
c_2 = g_1 + p_1.c_1 = g_1 + p_1.g_0 + p_1.p_0.c_0
c_3 = g_2 + p_2.c_2 = g_2 + p_2.g_1 + p_2.p_1.g_0 + p_2.p_1.p_0.c_0
c_4 = g_3 + p_3.c_3 = g_3 + p_3.g_2 + p_3.p_2.g_1 + p_3.p_2.p_1.g_0 + p_3.p_2.p_1.p_0.c_0
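The expanded equations depend only on g, p, and c_0, so every carry can be computed without waiting for its predecessor. The sketch below evaluates them for a 4-bit adder and compares the result with a stage-by-stage ripple; the helper names `bits`, `cla_carries`, and `ripple_carries` are illustrative:

```python
def bits(value, n=4):
    """Bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]

def cla_carries(a, b, c0):
    """Carries c_1..c_4 from the expanded lookahead equations.
    a and b are lists of four bits, least significant first."""
    g = [x & y for x, y in zip(a, b)]   # g_i = a_i.b_i
    p = [x | y for x, y in zip(a, b)]   # p_i = a_i + b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def ripple_carries(a, b, c0):
    """Same carries computed one stage at a time: c_{i+1} = g_i + p_i.c_i."""
    carries, c = [], c0
    for x, y in zip(a, b):
        c = (x & y) | ((x | y) & c)
        carries.append(c)
    return carries

if __name__ == "__main__":
    for a in range(16):
        for b in range(16):
            for c0 in (0, 1):
                assert cla_carries(bits(a), bits(b), c0) == \
                    ripple_carries(bits(a), bits(b), c0)
    print("lookahead carries match ripple carries for all 4-bit inputs")
```

Note how the number of product terms grows with each carry: c_4 already needs a five-input AND and a five-input OR.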
Building bigger adders
• Can't build a 16-bit CLA adder ... (too big)
• Solution: use the CLA principle recursively
• We could use ripple carry of 4-bit CLA adders
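The last bullet, a ripple of 4-bit CLA blocks, can be sketched as follows. This shows only that option, not the recursive (second-level lookahead) one; `cla4` and `add16` are hypothetical names:

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead block on integers 0..15.
    Returns (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    c = [c0,
         g[0] | (p[0] & c0),
         g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0),
         g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0),
         g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0)]
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i   # s_i = a_i xor b_i xor c_i
    return s, c[4]

def add16(a, b, c0=0):
    """16-bit adder: four cla4 blocks, with the carry rippling between blocks."""
    result, carry = 0, c0
    for block in range(4):
        shift = 4 * block
        s, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        result |= s << shift
    return result, carry

if __name__ == "__main__":
    total, carry_out = add16(0xFFFF, 0x0001)
    print(hex(total), carry_out)
```

The carry now ripples across four blocks instead of sixteen bit positions, while each block keeps its gate fan-in at the 4-bit level.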
An ALU (arithmetic logic unit)
[Figure: ALU block with inputs a and b, an operation control, and a result output; truth table with columns op, a, b, res]
• Let's build an ALU to support add, and, or instructions
  • we'll just build a 1-bit ALU, and use 32 of them
• Possible implementation (sum-of-products):
• Not easy to decide the "best" way to build something
  • Don't want too many inputs to a single gate
  • Don't want to have to go through too many gates
  • for our purposes, ease of comprehension is important
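One way to picture "build a 1-bit ALU and use 32 of them" is the sketch below. The operation encoding (0 = and, 1 = or, 2 = add) and the names `alu1` and `alu32` are assumptions made for this example:

```python
def alu1(a, b, carry_in, operation):
    """1-bit ALU slice. All three functions are computed in parallel;
    the operation code only selects which one reaches the output."""
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ carry_in
    carry_out = (a & b) | ((a | b) & carry_in)
    result = (and_out, or_out, sum_out)[operation]   # acts as the multiplexor
    return result, carry_out

def alu32(a, b, operation):
    """32 slices chained through their carries (a ripple-carry arrangement)."""
    result, carry = 0, 0
    for i in range(32):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result

if __name__ == "__main__":
    print(alu32(5, 3, 0), alu32(5, 3, 1), alu32(5, 3, 2))
```

The carry chain is wired up for every operation, but it only affects the result when the multiplexor selects the adder output.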
What about subtraction (a – b)?
• Two's complement approach: just negate b and add.
• How do we negate b? Invert b and add 1 through the carry in (c_in)
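In code, the trick amounts to feeding the adder the inverted bits of b with the carry in set to 1. A minimal sketch on 32-bit values, with `add32` and `sub32` as illustrative names:

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def add32(a, b, carry_in=0):
    """32-bit adder with an explicit carry in."""
    return (a + b + carry_in) & MASK

def sub32(a, b):
    """a - b computed as a + (not b) + 1, reusing the adder."""
    return add32(a, ~b & MASK, carry_in=1)

if __name__ == "__main__":
    print(sub32(7, 5))        # small positive difference
    print(hex(sub32(5, 7)))   # negative result in two's complement
```

No separate subtractor is needed: one inverter per bit of b plus control of the first carry in is enough.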
Adding the NOR instruction
• Can also choose to invert a
• How do we get a nor b? By De Morgan: (a + b)' = a'.b', so invert both a and b and use the existing AND gate
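De Morgan's identity is easy to confirm exhaustively for single bits, and the same idea carries over to 32-bit words. A short sketch; `nor_via_and` and `nor32` are names chosen for this example:

```python
from itertools import product

MASK = 0xFFFFFFFF

def nor_via_and(a, b):
    """a nor b for single bits, using only inverters and an AND gate."""
    return (1 - a) & (1 - b)

def nor32(a, b):
    """Bitwise NOR of two 32-bit words via inverted inputs and AND."""
    return (~a & MASK) & (~b & MASK)

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        assert nor_via_and(a, b) == 1 - (a | b)
    print(hex(nor32(0x0000FFFF, 0x00FF00FF)))
```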
Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
  • remember: slt is an arithmetic instruction
  • produces a 1 if rs < rt and 0 otherwise
  • use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, label)
  • use subtraction: (a-b) = 0 implies a = b
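Both tests reduce to inspecting the output of one subtraction. The sketch below works on 32-bit two's complement bit patterns and assumes the subtraction does not overflow, so the sign bit alone decides slt; `to_u32`, `slt`, and `equal` are illustrative names:

```python
MASK = 0xFFFFFFFF

def to_u32(x):
    """32-bit two's complement bit pattern of a Python int."""
    return x & MASK

def sub32(a, b):
    """a - b as a + (not b) + 1 on 32-bit patterns."""
    return (a + (~b & MASK) + 1) & MASK

def slt(a, b):
    """1 if a < b, read from the sign bit of a - b.
    Assumption: a - b does not overflow."""
    return (sub32(a, b) >> 31) & 1

def equal(a, b):
    """1 if a == b, i.e. a - b is all zeros (the beq test)."""
    return int(sub32(a, b) == 0)

if __name__ == "__main__":
    print(slt(to_u32(-3), to_u32(4)), slt(to_u32(4), to_u32(-3)))
    print(equal(to_u32(9), to_u32(9)), equal(to_u32(9), to_u32(8)))
```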
Supporting slt and detecting overflow
Can we figure out the idea?
[Figure: 1-bit ALU slice labelled "Use this ALU for all other bits"]
[Figure: 1-bit ALU slice labelled "Use this ALU for most significant bit"]
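The figures are not reproduced here, so the following is only one common way to detect overflow at the most significant bit, not necessarily the circuit the slide shows: signed overflow occurs exactly when the carry into the top bit differs from the carry out of it. The function name is hypothetical:

```python
def add32_with_overflow(a, b, carry_in=0):
    """Add two 32-bit patterns; return (result, overflow flag).
    Assumed rule: overflow = carry into bit 31 XOR carry out of bit 31."""
    low = (a & 0x7FFFFFFF) + (b & 0x7FFFFFFF) + carry_in   # bits 0..30
    carry_into_msb = low >> 31
    msb_sum = (a >> 31) + (b >> 31) + carry_into_msb       # bit 31 only
    carry_out = msb_sum >> 1
    result = ((msb_sum & 1) << 31) | (low & 0x7FFFFFFF)
    return result, carry_into_msb ^ carry_out

if __name__ == "__main__":
    print(add32_with_overflow(0x7FFFFFFF, 1))   # largest positive + 1
    print(add32_with_overflow(0xFFFFFFFF, 1))   # -1 + 1
```

Only the most significant slice needs this extra logic, which is consistent with the slide using a different ALU for that bit.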
Test for equality
Notice control lines:
0000 = and
0001 = or
0010 = add
0110 = subtract
0111 = slt
1100 = nor
Note: zero is a 1 when the result is zero!
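The six control codes can be tied together in a small behavioural model. This sketch assumes the four control bits decode, from left to right, as invert a, invert b (which also sets the carry in), and a 2-bit operation select; that decoding matches the listed codes but is an assumption, and slt here ignores overflow:

```python
MASK = 0xFFFFFFFF

def alu(a, b, control):
    """Behavioural 32-bit ALU. Returns (result, zero)."""
    a_invert = (control >> 3) & 1        # assumed meaning of bit 3
    b_negate = (control >> 2) & 1        # assumed: invert b and set carry in
    operation = control & 0b11           # 00 and, 01 or, 10 add, 11 slt
    x = ~a & MASK if a_invert else a & MASK
    y = ~b & MASK if b_negate else b & MASK
    total = (x + y + b_negate) & MASK    # adder output (a + b or a - b)
    if operation == 0:
        result = x & y                   # and; with both inverted: nor
    elif operation == 1:
        result = x | y
    elif operation == 2:
        result = total
    else:
        result = (total >> 31) & 1       # slt: sign bit of a - b
    zero = int(result == 0)              # zero is 1 when the result is zero
    return result, zero

if __name__ == "__main__":
    for name, code in [("and", 0b0000), ("or", 0b0001), ("add", 0b0010),
                       ("subtract", 0b0110), ("slt", 0b0111), ("nor", 0b1100)]:
        print(name, alu(6, 3, code))
```

For beq, the ALU is given the subtract code and the branch decision is read from the zero output.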
Conclusion
• We can build an ALU to support the MIPS instruction set
  • key idea: use a multiplexor to select the output we want
  • we can efficiently perform subtraction using two's complement
  • we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  • all of the gates are always working
  • the speed of a gate is affected by the number of inputs to the gate
  • the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
• Our primary focus is comprehension; however,
  • clever changes to organization can improve performance (similar to using better algorithms in software)
  • we saw this in multiplication and addition
Conclusion
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!