ECE 369 Chapter 3
Let's Build a Processor: Introduction to Instruction Set Architecture [Figure: a 32-bit ALU block with 32-bit inputs a and b, an operation control input, and a 32-bit result output] • First Step Into Your Project!!! • How could we build a 1-bit ALU for add, and, or? • Need to support the set-on-less-than instruction (slt) • slt is an arithmetic instruction • produces a 1 if a < b and 0 otherwise • use subtraction: (a − b) < 0 implies a < b • Need to support test for equality (beq $t5, $t6, Label) • use subtraction: (a − b) = 0 implies a = b • How could we build a 32-bit ALU? • Must Read Appendix
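The 1-bit ALU question above can be sketched in software: a multiplexor selects among AND, OR, and the adder's sum, while the adder's carry-out is always produced. A minimal sketch using the control values from these slides; the function name is illustrative.

```python
def alu_1bit(a, b, cin, op):
    """1-bit ALU slice: op selects AND (000), OR (001), or ADD (010)."""
    s = a ^ b ^ cin                          # adder sum bit
    cout = (a & b) | (a & cin) | (b & cin)   # adder carry-out
    mux = {0b000: a & b, 0b001: a | b, 0b010: s}
    return mux[op], cout                     # (selected result, carry-out)
```

For example, `alu_1bit(1, 1, 0, 0b010)` returns `(0, 1)`: 1 + 1 is 10 in binary, so the sum bit is 0 and the carry out is 1.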
One-bit adder • Takes three input bits and generates two output bits • Multiple bits can be cascaded • cout = a·b + a·cin + b·cin • sum = a ⊕ b ⊕ cin
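The two equations above can be written directly, and the cascading is just feeding each stage's carry-out into the next stage's carry-in. A sketch (names illustrative):

```python
def full_adder(a, b, cin):
    """One-bit adder: three input bits in, (sum, carry-out) out."""
    s = a ^ b ^ cin                          # sum = a XOR b XOR cin
    cout = (a & b) | (a & cin) | (b & cin)   # cout = a.b + a.cin + b.cin
    return s, cout

def cascade(a, b, bits=4):
    """Chain full adders: stage i's carry-out is stage i+1's carry-in."""
    result, carry = 0, 0
    for i in range(bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```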
What about subtraction (a − b)? • Two's complement approach: just negate b and add • How do we negate? • A very clever solution: invert each bit of b and set the adder's carry-in to 1, since −b = b̄ + 1 • Control lines: 000 = and, 001 = or, 010 = add, 110 = subtract
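That trick can be sketched in 32-bit arithmetic: invert b and let the carry-in supply the +1 (the mask and name are illustrative, modeling 32-bit registers).

```python
MASK = 0xFFFFFFFF  # model 32-bit registers

def alu_sub(a, b):
    """a - b computed as a + NOT(b) + 1: invert b, add, carry-in = 1."""
    return (a + (~b & MASK) + 1) & MASK
```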
Supporting slt • Can we figure out the idea? • Control lines: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt
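The idea can be sketched as: subtract, then route the sign bit of the difference to bit 0 of the result. This simplified model ignores overflow, which real hardware must also account for.

```python
def slt(a, b):
    """set-on-less-than: compute a - b and output its sign bit (sketch;
    assumes the subtraction does not overflow)."""
    diff = (a - b) & 0xFFFFFFFF   # 32-bit two's complement difference
    return (diff >> 31) & 1       # sign bit is 1 iff a < b
```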
Test for equality • Notice control lines: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt • Note: Zero is a 1 if the result is zero!
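The Zero output used by beq can be sketched the same way: subtract and check that every result bit is 0.

```python
def zero_flag(a, b):
    """Zero output: 1 iff a - b is all zeros, i.e. a == b (sketch).
    In hardware this is a NOR across all 32 result bits."""
    diff = (a - b) & 0xFFFFFFFF
    return int(diff == 0)
```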
How about "a nor b"? • Control lines: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt
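NOR falls out of De Morgan's law, assuming the ALU inputs can be inverted: a NOR b = NOT(a OR b) = (NOT a) AND (NOT b), so invert both inputs and select the existing AND. A one-bit sketch:

```python
def nor_1bit(a, b):
    """a NOR b via input inverters plus the existing AND gate (De Morgan)."""
    return (a ^ 1) & (b ^ 1)   # (NOT a) AND (NOT b) == NOT(a OR b)
```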
Conclusion • We can build an ALU to support an instruction set • Key idea: use a multiplexor to select the output we want • We can efficiently perform subtraction using two's complement • We can replicate a 1-bit ALU to produce a 32-bit ALU • Important points about hardware • All of the gates are always working • The speed of a gate is affected by the number of inputs to the gate • The speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic") • Our primary focus is comprehension; however, • Clever changes to organization can improve performance (similar to using better algorithms in software) • How about my instruction smt (set if more than)???
ALU Summary • We can build an ALU to support addition • Our focus is on comprehension, not performance • Real processors use more sophisticated techniques for arithmetic • Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
Multiplication • More complicated than addition • Accomplished via shifting and addition • More time and more area
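A sketch of unsigned shift-and-add multiplication, one conditional add per multiplier bit (a simplified software model, not the exact hardware datapath):

```python
def multiply(a, b, bits=32):
    """Unsigned multiplication by shifting and adding."""
    product = 0
    for _ in range(bits):
        if b & 1:        # low multiplier bit set: add current multiplicand
            product += a
        a <<= 1          # shift multiplicand left
        b >>= 1          # shift multiplier right
    return product
```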
Division • Even more complicated • Can be accomplished via shifting and addition/subtraction • More time and more area • Negative numbers: even more difficult • There are better techniques; we won't look at them
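Unsigned restoring division can be sketched as shift-and-subtract: shift in dividend bits one at a time and subtract the divisor whenever it fits. A simplified model; negative operands need extra sign handling, as the slide notes.

```python
def divide(dividend, divisor, bits=32):
    """Unsigned restoring division: returns (quotient, remainder)."""
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # shift in next bit
        quotient <<= 1
        if remainder >= divisor:   # divisor fits: subtract, set quotient bit
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```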
Floating point (a brief look) • We need a way to represent • Numbers with fractions, e.g., 3.1416 • Very small numbers, e.g., 0.000000001 • Very large numbers, e.g., 3.15576 x 10^9 • Representation: • Sign, exponent, fraction: (−1)^sign x fraction x 2^exponent • More bits for fraction gives more accuracy • More bits for exponent increases range • IEEE 754 floating point standard: • single precision: 8-bit exponent, 23-bit fraction • double precision: 11-bit exponent, 52-bit fraction
IEEE 754 floating-point standard • 1.f x 2^e • 1.s1s2s3s4…sn x 2^e • Leading "1" bit of significand is implicit • Exponent is "biased" to make sorting easier • All 0s is the smallest exponent, all 1s is the largest • Bias of 127 for single precision and 1023 for double precision
Single Precision • Summary: (−1)^sign x (1 + significand) x 2^(exponent − bias) • Example (decimal analogy): 11/100 = 11/10^2 = 0.11 = 1.1 x 10^-1 • Decimal: −.75 = −3/4 = −3/2^2 • Binary: −.11 = −1.1 x 2^-1 • IEEE single precision: 1 01111110 10000000000000000000000 • exponent − bias = −1 ⇒ exponent = 126 = 01111110
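The slide's encoding of −0.75 can be checked with Python's struct module (assuming 'f' packs an IEEE 754 single, which it does on standard CPython builds):

```python
import struct

# Pack -0.75 as a 32-bit IEEE 754 single and pull the fields apart.
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
sign = bits >> 31              # 1: negative
exponent = (bits >> 23) & 0xFF # 126 = 0b01111110 (bias 127, true exponent -1)
fraction = bits & 0x7FFFFF     # 0b100...0: the .1 after the hidden leading 1
```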
Opposite Way (decoding a bit pattern) • Sign bit 1 ⇒ negative • Exponent field 129 ⇒ exponent = 129 − 127 = 2 • Fraction: 0 x 2^-1 + 1 x 2^-2 = 0.25 • Value = −(1 + 0.25) x 2^2
Floating point addition • Example: 1.610 x 10^-1 + 9.999 x 10^1 • Align exponents: 0.01610 x 10^1 + 9.999 x 10^1 • Add significands: 10.015 x 10^1 • Normalize: 1.0015 x 10^2 • Round to four digits: 1.002 x 10^2
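The four decimal steps above — align, add, normalize, round — can be sketched directly, keeping 4 significant digits as in the example (names illustrative):

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Decimal floating-point add mirroring the example's steps."""
    # 1. Align: shift the smaller-exponent significand right.
    if exp_a < exp_b:
        sig_a, exp_a = sig_a / 10 ** (exp_b - exp_a), exp_b
    else:
        sig_b, exp_b = sig_b / 10 ** (exp_a - exp_b), exp_a
    # 2. Add significands.
    sig, exp = sig_a + sig_b, exp_a
    # 3. Normalize to one digit before the point.
    while sig >= 10:
        sig, exp = sig / 10, exp + 1
    # 4. Round to the available precision.
    return round(sig, digits - 1), exp
```

`fp_add(1.610, -1, 9.999, 1)` reproduces the slide's result, (1.002, 2).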
Floating point multiply • To multiply two numbers • Add the two exponents (remember excess-127 notation) • Produce the result sign as the XOR of the two signs • Multiply significand portions • Result will be 1x.xxxx… or 01.xxxx… • In the first case, shift the result right and adjust the exponent • Round off the result • This may require another normalization step
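The sign and exponent handling can be sketched on the biased (excess-127) fields: adding two biased exponents counts the bias twice, so one bias must be subtracted (names illustrative).

```python
BIAS = 127  # single-precision excess-127

def mul_fields(sign_a, exp_a, sign_b, exp_b):
    """Sign and biased-exponent handling for a floating-point multiply."""
    sign = sign_a ^ sign_b      # XOR of the two operand signs
    exp = exp_a + exp_b - BIAS  # remove the double-counted bias
    return sign, exp
```

For example, multiplying two numbers with true exponent −1 (field value 126) yields true exponent −2 (field value 125).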
Floating point divide • To divide two numbers • Subtract divisor’s exponent from the dividend’s exponent (remember access 127 notation) • Produce the result sign as exor of two signs • Divide dividend’s significand by divisor’s significand portions • Results will be 1.xxxxx… or 0.1xxxx…. • In the second case shift result left and adjust exponent • Round off the result • This may require another normalization step
Floating point complexities • Operations are somewhat more complicated (see text) • In addition to overflow we can have "underflow" • Accuracy can be a big problem • IEEE 754 keeps two extra bits, guard and round • Four rounding modes • Positive divided by zero yields "infinity" • Zero divided by zero yields "not a number" • Other complexities • Implementing the standard can be tricky • Not using the standard can be even worse • See text for description of 80x86 and Pentium bug!
Chapter Three Summary • Computer arithmetic is constrained by limited precision • Read pages 213-215 • Bit patterns have no inherent meaning but standards do exist • two’s complement • IEEE 754 floating point • Operations are somewhat more complicated (see text) • In addition to overflow we can have “underflow” • Implementing the standard can be tricky • Not using the standard can be even worse (3.10) • See text for description of 80x86 and Pentium bug!
Problem: Ripple carry adder is slow! • Is a 32-bit ALU as fast as a 1-bit ALU? • Is there more than one way to do addition? • Can you see the ripple? How could you get rid of it? • c1 = a0b0 + a0c0 + b0c0 • c2 = a1b1 + a1c1 + b1c1 • c3 = a2b2 + a2c2 + b2c2 • c4 = a3b3 + a3c3 + b3c3 • Try substituting each carry equation into the next (expand c2, c3, c4 in terms of a's, b's, and c0) • Not feasible! Why?
Generate/Propagate • a b → g p • 0 0 → 0 0 • 0 1 → 0 1 • 1 0 → 0 1 • 1 1 → 1 1
Carry-look-ahead adder • Motivation: if we didn't know the value of carry-in, what could we do? • When would we always generate a carry? gi = ai·bi • When would we propagate the carry? pi = ai + bi • Did we get rid of the ripple? • c1 = g0 + p0c0 • c2 = g1 + p1c1 = g1 + p1g0 + p1p0c0 • c3 = g2 + p2c2 = g2 + p2g1 + p2p1g0 + p2p1p0c0 • c4 = g3 + p3c3 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0 • Feasible! Why? Each carry depends only on the g's, the p's, and c0 — not on the previous carry • Compare with ripple carry: c1 = a0b0 + a0c0 + b0c0, c2 = a1b1 + a1c1 + b1c1, c3 = a2b2 + a2c2 + b2c2, c4 = a3b3 + a3c3 + b3c3
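The four carry equations above compute every carry directly from the g's, p's, and c0, with no carry waiting on another — a 4-bit sketch:

```python
def cla_4bit(a, b, c0):
    """4-bit carry-lookahead adder: all carries in parallel from g, p, c0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: ai.bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: ai + bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carry = [c0, c1, c2, c3]
    s = sum((((a >> i) ^ (b >> i) ^ carry[i]) & 1) << i for i in range(4))
    return s, c4
```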
A 4-bit carry look-ahead adder • Generate g and p term for each bit • Use g’s, p’s and carry in to generate all C’s • Also use them to generate block G and P • CLA principle can be used recursively
Gate Delay for a 16-bit Adder • [Figure: gate delays accumulate as 1, 1+2, and 1+2+2 through successive levels of the lookahead logic]