Chapter 3 Arithmetic for Computers


Taxonomy of Computer Information
• Information
  • Instructions
  • Data
    • Numeric
      • Fixed-point
        • Signed: sign-magnitude, 2’s complement
        • Unsigned (ordinal)
      • Floating-point: single-precision, double-precision
    • Non-numeric: character (ASCII), Boolean, other
  • Addresses
ELEC 5200/6200 From Patterson/Hennessey Slides

Number Format Considerations
• Type of numbers (integer, fraction, real, complex)
• Range of values
  • the span between the smallest and largest values
  • wider in floating-point formats
• Precision of values (max. accuracy)
  • usually related to the number of bits allocated: n bits represent $2^n$ values/levels
  • value/weight of the least-significant bit
• Cost of hardware to store & process numbers (some formats are more difficult to add, multiply/divide, etc.)

Unsigned Integers
• Positional number system: $a_{n-1}a_{n-2} \ldots a_2a_1a_0 = a_{n-1} \times 2^{n-1} + a_{n-2} \times 2^{n-2} + \cdots + a_2 \times 2^2 + a_1 \times 2^1 + a_0 \times 2^0$
• Range = $[(2^n - 1) \ldots 0]$
• Carry out of MSB has weight $2^n$
• Fixed-point fraction: $0.a_{-1}a_{-2} \ldots a_{-n} = a_{-1} \times 2^{-1} + a_{-2} \times 2^{-2} + \cdots + a_{-n} \times 2^{-n}$

Signed Integers
• Sign-magnitude format (n-bit values): $A = S\,a_{n-2} \ldots a_2a_1a_0$ (S = sign bit)
  • Range = $[-(2^{n-1} - 1) \ldots +(2^{n-1} - 1)]$
  • Addition/subtraction difficult (multiply easy)
  • Redundant representations of 0
• 2’s complement format (n-bit values): $-A$ represented by $2^n - A$
  • Range = $[-(2^{n-1}) \ldots +(2^{n-1} - 1)]$
  • Addition/subtraction easier (multiply harder)
  • Single representation of 0

Computing the 2’s Complement
To compute the 2’s complement of A:
• Let $A = a_{n-1}a_{n-2} \ldots a_2a_1a_0$
• $2^n - A = 2^n - 1 + 1 - A = (2^n - 1) - A + 1$
• $2^n - 1$ is the n-bit pattern of all 1s, so subtracting A from it complements every bit:
      1        1       …  1    1    1
  -   a_{n-1}  a_{n-2} …  a_2  a_1  a_0
  -------------------------------------
      a’_{n-1} a’_{n-2} … a’_2 a’_1 a’_0
• Therefore $2^n - A = a'_{n-1}a'_{n-2} \ldots a'_2a'_1a'_0 + 1$ (one’s complement + 1)
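The derivation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `twos_complement` and the 8-bit width used in the example are assumptions chosen for the demonstration.

```python
def twos_complement(a: int, n: int) -> int:
    """Return the n-bit pattern that represents -a, as (one's complement of a) + 1."""
    mask = (1 << n) - 1                    # 2^n - 1: the n-bit pattern of all 1s
    ones_complement = mask - (a & mask)    # subtracting from all 1s flips every bit
    return (ones_complement + 1) & mask    # add 1 and discard any carry out of the MSB


# Example (assumed width n = 8): the pattern for -5
print(format(twos_complement(5, 8), "08b"))  # prints 11111011
```

Masking the final sum to n bits mirrors discarding the carry of weight $2^n$, which is why the 2’s complement of 0 comes out as 0.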

2’s Complement Arithmetic
• Let $(2^{n-1} - 1) \ge A \ge 0$ and $(2^{n-1} - 1) \ge B \ge 0$
• Case 1: A + B
  • $(2^n - 2) \ge (A + B) \ge 0$
  • Since the result is $< 2^n$, there is no carry out of the MSB
  • Valid result if $(A + B) < 2^{n-1}$
    • MSB (sign bit) = 0
  • Overflow if $(A + B) \ge 2^{n-1}$
    • MSB (sign bit) = 1 if result $\ge 2^{n-1}$
    • Carry into MSB

2’s Complement Arithmetic
• Case 2: A - B
  • Compute by adding: A + (-B)
  • 2’s complement: $A + (2^n - B)$
  • $-2^{n-1} < \text{result} < 2^{n-1}$ (no overflow possible)
  • If $A \ge B$: $2^n + (A - B) \ge 2^n$
    • Weight of adder carry output = $2^n$
    • Discard carry ($2^n$), keeping (A - B), which is $\ge 0$
  • If $A < B$: $2^n + (A - B) < 2^n$
    • Adder carry output = 0
    • Result is $2^n - (B - A)$ = 2’s complement representation of -(B - A)

2’s Complement Arithmetic
• Case 3: -A - B
  • Compute by adding: (-A) + (-B)
  • 2’s complement: $(2^n - A) + (2^n - B) = 2^n + 2^n - (A + B)$
  • Discard carry ($2^n$), making the result $2^n - (A + B)$ = 2’s complement representation of -(A + B)
  • $0 \ge -(A + B) \ge -(2^n - 2)$
  • Overflow if $-(A + B) < -2^{n-1}$
    • MSB (sign bit) = 0 if $2^n - (A + B) < 2^{n-1}$
    • no carry into MSB
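The three cases can be checked with a small simulation. The sketch below is an illustration added here, not taken from the slides; it assumes the usual summary of the cases, namely that overflow occurs exactly when the carry into the MSB differs from the carry out of the MSB. The names `add_signed` and `to_signed` are hypothetical.

```python
def add_signed(x: int, y: int, n: int):
    """Add two n-bit 2's complement patterns.

    Returns (result_pattern, carry_out, overflow).
    """
    mask = (1 << n) - 1
    x &= mask
    y &= mask
    total = x + y
    carry_out = total >> n                        # carry out of the MSB (weight 2^n)
    low_mask = (1 << (n - 1)) - 1                 # all bits below the MSB
    carry_into_msb = ((x & low_mask) + (y & low_mask)) >> (n - 1)
    overflow = carry_into_msb != carry_out        # assumed rule: the two carries disagree
    return total & mask, carry_out, overflow


def to_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's complement value."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


# Case 1 with overflow (n = 8): 100 + 28 does not fit in [-128, 127]
print(add_signed(100, 28, 8))
# Case 2 (n = 8): 5 - 3, computed as 5 + (2^8 - 3); the carry is discarded
result, carry, ovf = add_signed(5, (-3) & 0xFF, 8)
print(to_signed(result, 8), carry, ovf)
```

In Case 1 the overflow shows up as a carry into the MSB with no carry out; in Case 3 it shows up as a carry out with no carry into the MSB, so a single comparison covers both.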

Relational Operators
Compute A - B and test the ALU “flags” to compare A vs. B:
• ZF = result zero
• OF = 2’s complement overflow
• SF = sign bit of result
• CF = adder carry output
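As an illustration of how the flags are used, the sketch below computes A - B as $A + (2^n - B)$ and derives the four flags. The slide does not spell out the flag combinations; the two rules used here (signed A < B when SF differs from OF, unsigned A < B when the adder carry output is 0) are standard results stated as assumptions of this example, and all function names are hypothetical.

```python
def compare_flags(a: int, b: int, n: int = 8) -> dict:
    """Compute A - B on n-bit patterns and return the ALU flags."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    total = a + ((~b) & mask) + 1          # A + (one's complement of B) + 1 = A + (2^n - B)
    result = total & mask
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    sf = result >> (n - 1)                 # sign bit of the result
    return {
        "ZF": int(result == 0),            # result is zero
        "SF": sf,
        "OF": int(sign_a != sign_b and sf != sign_a),  # 2's complement overflow of A - B
        "CF": total >> n,                  # adder carry output
    }


def signed_less_than(a: int, b: int, n: int = 8) -> bool:
    flags = compare_flags(a, b, n)
    return flags["SF"] != flags["OF"]


def unsigned_less_than(a: int, b: int, n: int = 8) -> bool:
    return compare_flags(a, b, n)["CF"] == 0   # no carry out means a borrow occurred


print(signed_less_than((-3) & 0xFF, 2))    # -3 < 2 as signed values
print(unsigned_less_than((-3) & 0xFF, 2))  # 253 < 2 as unsigned values
```

The same bit patterns compare differently under the two interpretations, which is why separate signed and unsigned comparison instructions exist.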

MIPS Overflow Detection
• An exception (interrupt) occurs when overflow is detected for add, addi, sub
  • Control jumps to a predefined address for the exception
  • Interrupted address is saved for possible resumption
• Details based on software system / language
  • example: flight control vs. homework assignment
• Don't always want to detect overflow: new MIPS instructions addu, addiu, subu
  • note: addiu still sign-extends!
  • note: sltu, sltiu for unsigned comparisons

Designing the Arithmetic & Logic Unit (ALU)
• Provide arithmetic and logical functions as needed by the instruction set
• Consider tradeoffs of area vs. performance
(Material from Appendix B)
[Figure: ALU block with 32-bit inputs a and b, an operation control input, and a 32-bit result output]

Different Implementations
• Not easy to decide the “best” way to build something
  • Don't want too many inputs to a single gate (fan-in)
  • Don’t want to have to go through too many gates (delay)
  • For our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
  cout = a b + a cin + b cin
  sum = a xor b xor cin
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?

Building a 32 bit ALU

What about subtraction (a - b)?
• Two's complement approach: just negate b and add (invert every bit of b and set the carry-in of the least-significant bit to 1).

Adding a NOR function
• Can also choose to invert a. How do we get “a NOR b”?
[Figure: 1-bit ALU with inputs a, b, Ainvert, Binvert and CarryIn; a multiplexor controlled by Operation selects the AND (0), OR (1) or adder (2) output as Result; the adder also produces CarryOut]

Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
  • remember: slt is an arithmetic instruction
  • produces a 1 if rs < rt and 0 otherwise
  • use subtraction: (a - b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, label)
  • use subtraction: (a - b) = 0 implies a = b

Supporting slt
[Figure, left: 1-bit ALU used for all other bits, with inputs a, b, Ainvert, Binvert, CarryIn and Less; the Operation multiplexor selects input 0, 1, 2 (adder) or 3 (Less) as Result; CarryOut leaves the adder]
[Figure, right: 1-bit ALU for the most significant bit, with the same inputs plus a Set output and an overflow detection block producing Overflow. Use this ALU for the most significant bit.]

Supporting slt
[Figure: 32-bit ALU built from 1-bit ALUs ALU0 … ALU31. Each ALUi has inputs a_i, b_i, Less and CarryIn and outputs Result_i and CarryOut; Ainvert, Binvert and Operation are shared control lines; ALU31 also produces Set and Overflow]

Test for equality
• Notice control lines:
  0000 = and
  0001 = or
  0010 = add
  0110 = subtract
  0111 = slt
  1100 = NOR
• Note: zero is a 1 when the result is zero!
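A behavioural sketch of the 32-bit ALU can make the control encoding concrete. It is an illustration under stated assumptions, not the slides' gate-level design: the 4-bit control word is decoded as bit 3 = Ainvert, bit 2 = Binvert (which also drives the carry-in of bit 0), bits 1-0 = Operation; and the slt path corrects the sign bit with the overflow signal, a refinement beyond the "(a - b) < 0" rule quoted earlier. The name `alu` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def alu(a: int, b: int, control: int):
    """Behavioural model of a 32-bit ALU; returns (result, zero)."""
    ainvert = (control >> 3) & 1
    binvert = (control >> 2) & 1           # assumed to double as the carry-in of bit 0
    operation = control & 0b11
    x = (~a & MASK32) if ainvert else (a & MASK32)
    y = (~b & MASK32) if binvert else (b & MASK32)
    adder = (x + y + binvert) & MASK32     # with binvert = 1 this computes a - b
    if operation == 0:
        result = x & y                     # AND (NOR when both inputs are inverted)
    elif operation == 1:
        result = x | y                     # OR
    elif operation == 2:
        result = adder                     # add / subtract
    else:
        # slt: sign of (a - b), corrected when the subtraction overflows (assumption)
        sign = adder >> 31
        overflow = (x >> 31) == (y >> 31) and (adder >> 31) != (x >> 31)
        result = sign ^ int(overflow)
    zero = int(result == 0)                # zero is 1 when the result is zero
    return result, zero


print(alu(5, 5, 0b0110))                   # subtract: equal operands set zero
print(alu(0b1100, 0b1010, 0b1100)[0] == (~(0b1100 | 0b1010) & MASK32))  # NOR via inverted AND
```

The NOR line relies on De Morgan's law: inverting both inputs and selecting AND yields NOT (a OR b), so no extra gate type is needed.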

Conclusion
• We can build an ALU to support the MIPS instruction set
  • key idea: use a multiplexor to select the output we want
  • we can efficiently perform subtraction using two’s complement
  • we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  • all of the gates are always working
  • the speed of a gate is affected by the number of inputs to the gate
  • the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus is comprehension; however,
  • clever changes to organization can improve performance (similar to using better algorithms in software)
  • we saw this in multiplication; let’s look at addition now

Problem: ripple carry adder is slow
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
  • two extremes: ripple carry and sum-of-products
• Can you see the ripple? How could you get rid of it?
  c1 = b0c0 + a0c0 + a0b0
  c2 = b1c1 + a1c1 + a1b1    c2 = (left blank on the slide)
  c3 = b2c2 + a2c2 + a2b2    c3 = (left blank on the slide)
  c4 = b3c3 + a3c3 + a3b3    c4 = (left blank on the slide)
• Not feasible! Why?

One-bit Full-Adder Circuit
[Figure: full adder FAi with inputs a_i, b_i and c_i; two XOR gates produce sum_i, and two AND gates feeding an OR gate produce c_{i+1}]

32-bit Ripple-Carry Adder
[Figure: chain of full adders FA0, FA1, FA2, …, FA31; FAi takes a_i and b_i and produces sum_i; c0 enters FA0 and each carry output feeds the next stage]

How Fast is Ripple-Carry Adder?
• Longest delay path (critical path) runs from cin to sum31.
• Suppose the delay of a full adder is 100 ps.
• Critical path delay = 32 × 100 ps = 3,200 ps
• Clock rate cannot be higher than $10^{12}/3{,}200$ Hz = 312.5 MHz.
• Must use more efficient ways to handle carry.
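The ripple and its cost can be sketched as follows. This is a behavioural illustration added here, assuming the full-adder equations given earlier (sum = a xor b xor cin, cout = a b + a cin + b cin) and the 100 ps per-stage delay from the slide; the function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a: int, b: int, cin: int = 0, n: int = 32):
    """Chain n full adders; each stage waits for the carry of the previous one."""
    total = 0
    carry = cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Delay estimate from the slide: 32 stages of 100 ps each
FA_DELAY_PS = 100
critical_path_ps = 32 * FA_DELAY_PS
max_clock_mhz = 1e12 / critical_path_ps / 1e6

print(ripple_carry_add(0xFFFFFFFF, 1))   # worst case: the carry ripples through all 32 stages
print(critical_path_ps, max_clock_mhz)
```

The loop makes the dependency explicit: stage i cannot finish until stage i-1 has produced its carry, which is the chain the faster adder designs try to shorten.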

Fast Adders
• In general, any output of a 32-bit adder can be evaluated as a logic expression in terms of all 65 inputs.
• Levels of logic in the circuit can be reduced to $\log_2 N$ for an N-bit adder. Ripple-carry has N levels.
• More gates are needed, about $\log_2 N$ times that of the ripple-carry design.
• Fastest design is known as the carry lookahead adder.

N-bit Adder Design Options
Reference: J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, San Francisco, California, 1990.

Carry-lookahead adder
• An approach in-between our two extremes
• Motivation:
  • If we didn't know the value of carry-in, what could we do?
  • When would we always generate a carry? $g_i = a_i b_i$
  • When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple?
  c1 = g0 + p0c0
  c2 = g1 + p1c1    c2 = g1 + p1g0 + p1p0c0
  c3 = g2 + p2c2    c3 = …
  c4 = g3 + p3c3    c4 = …
• Feasible! Why?
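A 4-bit sketch shows the lookahead idea at work. It is an illustration added here, not the slides' circuit: the expressions for c3 and c4 are obtained by repeating the substitution shown for c2 (the slide leaves them open), and the name `cla4` is hypothetical.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry-lookahead addition; returns (sum, c4)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    g = [abit[i] & bbit[i] for i in range(4)]   # generate:  g_i = a_i b_i
    p = [abit[i] | bbit[i] for i in range(4)]   # propagate: p_i = a_i + b_i

    # Every carry is written directly in terms of g, p and c0 (no carry waits on another).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (abit[i] ^ bbit[i] ^ carries[i]) << i
    return total, c4


print(cla4(0b1011, 0b0110))
```

Each carry expression is two levels of logic deep regardless of bit position, but the number and width of the product terms grow with every bit, which limits how far the flat expansion can be pushed.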

Use principle to build bigger adders
• Can’t build a 16 bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!

Carry-Select Adder
[Figure: a 16-bit ripple carry adder adds a0-a15 and b0-b15 with cin, producing sum0-sum15. Two more 16-bit ripple carry adders both add a16-a31 and b16-b31, one with carry-in 0 and one with carry-in 1. A multiplexer selects between their outputs (inputs 0 and 1) to produce sum16-sum31.]
This is known as a carry-select adder.
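The carry-select organization can be sketched behaviourally. In this illustration (not from the slides) each 16-bit block is modelled with ordinary integer addition rather than a gate-level ripple adder, and the multiplexer is assumed to be steered by the carry out of the lower block; `add16` and `carry_select_add32` are hypothetical names.

```python
def add16(a: int, b: int, cin: int):
    """Model of one 16-bit adder block; returns (sum, carry_out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + cin
    return total & 0xFFFF, total >> 16


def carry_select_add32(a: int, b: int, cin: int = 0):
    """32-bit carry-select addition; returns (sum, carry_out)."""
    low, c16 = add16(a, b, cin)                    # lower half, real carry-in
    high0, cout0 = add16(a >> 16, b >> 16, 0)      # upper half assuming carry-in = 0
    high1, cout1 = add16(a >> 16, b >> 16, 1)      # upper half assuming carry-in = 1
    # Multiplexer: pick the precomputed upper half that matches the actual carry.
    high, cout = (high1, cout1) if c16 else (high0, cout0)
    return (high << 16) | low, cout


print(hex(carry_select_add32(0x0000FFFF, 0x00000001)[0]))
```

In hardware the three blocks work at the same time, so the delay is roughly one 16-bit addition plus the multiplexer, at the cost of a duplicated upper adder.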

ALU Summary
• We can build an ALU to support MIPS addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!

Multiplication
• More complicated than addition
  • accomplished via shifting and addition
• More time and more area
• Let's look at 3 versions based on a grade-school algorithm
      0010  (multiplicand)
    x 1011  (multiplier)
• Negative numbers: convert and multiply
  • there are better techniques; we won’t look at them
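The grade-school algorithm amounts to a loop of tests, additions and shifts. The sketch below is a software illustration of that idea for unsigned operands, not a model of any particular hardware version from the slides; the name `shift_add_multiply` is hypothetical.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Unsigned multiplication of two n-bit values by shifting and adding."""
    product = 0
    for _ in range(n):
        if multiplier & 1:             # low multiplier bit is 1: add the multiplicand
            product += multiplicand
        multiplicand <<= 1             # next partial product has twice the weight
        multiplier >>= 1               # expose the next multiplier bit
    return product


# The slide's operands: 0010 x 1011 (2 x 11)
print(format(shift_add_multiply(0b0010, 0b1011), "08b"))  # prints 00010110
```

The product of two n-bit values needs up to 2n bits, which is why the example is printed as an 8-bit result.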

Multiplication: Implementation
[Figure: datapath and control]

Final Version
• Multiplier starts in right half of product
[Figure: final multiplier datapath, with one block labelled “What goes here?”]

Multiplying Signed Numbers with Booth’s Algorithm
• Consider A x B where A and B are signed integers (2’s complement format)
• Decompose B into the sum $B_1 + B_2 + \cdots + B_n$
  $A \times B = A \times (B_1 + B_2 + \cdots + B_n) = (A \times B_1) + (A \times B_2) + \cdots + (A \times B_n)$
• Let each $B_i$ be a single string of 1’s embedded in 0’s: …0011…1100…
• Example:
  0110010011100 = 0110000000000
                + 0000010000000
                + 0000000011100

Booth’s Algorithm
• Scanning from right to left, bit number u is the first 1 bit of the string and bit v is the first 0 left of the string:
  $B_i$ = 0 … 0 1 … 1 0 … 0    (1’s in bits v-1 down to u)
        = 0 … 0 1 … 1 1 … 1    ($2^v - 1$)
        - 0 … 0 0 … 0 1 … 1    ($2^u - 1$)
        = $(2^v - 1) - (2^u - 1) = 2^v - 2^u$

Booth’s Algorithm
• Decomposing B:
  $A \times B = A \times (B_1 + B_2 + \cdots) = A \times [(2^{v_1} - 2^{u_1}) + (2^{v_2} - 2^{u_2}) + \cdots] = (A \times 2^{v_1}) - (A \times 2^{u_1}) + (A \times 2^{v_2}) - (A \times 2^{u_2}) + \cdots$
• A x B can be computed by adding and subtracting shifted values of A:
  • Scan bits right to left, shifting A once per bit
  • When the bit string changes from 0 to 1, subtract shifted A from the current product: $P - (A \times 2^u)$
  • When the bit string changes from 1 to 0, add shifted A to the current product: $P + (A \times 2^v)$
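The scanning rule can be sketched directly in Python. This is an illustration under stated assumptions rather than a hardware description: the product is kept as an unbounded Python integer instead of a fixed-width register, a 0 is assumed to the right of bit 0, and the name `booth_multiply` is hypothetical.

```python
def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply a by b, where b is a signed value that fits in n bits (2's complement)."""
    pattern = b & ((1 << n) - 1)       # n-bit 2's complement pattern of b
    product = 0
    previous = 0                       # assumed 0 to the right of bit 0
    for i in range(n):                 # scan right to left; a << i is A shifted i times
        bit = (pattern >> i) & 1
        if bit == 1 and previous == 0:
            product -= a << i          # string of 1s starts at u = i: P - (A x 2^u)
        elif bit == 0 and previous == 1:
            product += a << i          # string of 1s ended, v = i:    P + (A x 2^v)
        previous = bit
    return product


print(booth_multiply(2, -3, 4))        # B = 1101
print(booth_multiply(-7, 6, 4))        # B = 0110
```

For a negative B the last string of 1s runs through the sign bit and is never closed by an addition, which leaves the net term $-2^u$ that the 2’s complement weighting of the sign bit requires.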

Floating Point Numbers
• We need a way to represent a wide range of numbers
  • numbers with fractions, e.g., 3.1416
  • large number: 976,000,000,000,000 = $9.76 \times 10^{14}$
  • small number: 0.0000000000000976 = $9.76 \times 10^{-14}$
• Representation:
  • sign, exponent, significand: $(-1)^{sign} \times significand \times 2^{exponent}$
  • more bits for significand gives more accuracy
  • more bits for exponent increases range

Scientific Notation
• Scientific notation:
  • $0.525 \times 10^5 = 5.25 \times 10^4 = 52.5 \times 10^3$
  • $5.25 \times 10^4$ is in normalized scientific notation.
    • position of decimal point fixed
    • leading digit non-zero
• Binary numbers
  • $5.25_{ten} = 101.01_{two} = 1.0101_{two} \times 2^2$
• Binary point
  • multiplying a value by 2 moves the point one place to the right
  • dividing a value by 2 moves the point one place to the left
  • so moving the point left by k places is compensated by a factor of $2^k$
• Known as floating point format.

Binary to Decimal Conversion
• Binary: $(-1)^S \times (1.b_1b_2b_3b_4) \times 2^E$
• Decimal: $(-1)^S \times (1 + b_1 \times 2^{-1} + b_2 \times 2^{-2} + b_3 \times 2^{-3} + b_4 \times 2^{-4}) \times 2^E$
• Example: $-1.1100 \times 2^{-2}$ (binary) = $-(1 + 2^{-1} + 2^{-2}) \times 2^{-2}$ = -(1 + 0.5 + 0.25)/4 = -1.75/4 = -0.4375 (decimal)
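The conversion formula translates into a short loop. The following is a minimal sketch added for illustration; the function name `binary_fp_to_decimal` and the choice to pass the fraction bits as a string are assumptions of this example.

```python
def binary_fp_to_decimal(sign: int, fraction_bits: str, exponent: int) -> float:
    """Evaluate (-1)^S x (1.b1 b2 b3 ...) x 2^E for a normalized binary number."""
    significand = 1.0                                  # the leading 1 before the point
    for i, bit in enumerate(fraction_bits, start=1):
        significand += int(bit) * 2.0 ** -i            # bit b_i has weight 2^-i
    return (-1) ** sign * significand * 2.0 ** exponent


# The slide's example: -1.1100 x 2^-2
print(binary_fp_to_decimal(1, "1100", -2))  # prints -0.4375
```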

IEEE Std. 754 Floating-Point Format
Single-Precision (one 32-bit word)
  bit 31       S
  bits 23-30   E: 8-bit exponent
  bits 0-22    F: 23-bit fraction
Double-Precision (two 32-bit words)
  first word:
    bit 31       S
    bits 20-30   E: 11-bit exponent
    bits 0-19    F: first 20 bits of the 52-bit fraction
  second word:
    bits 0-31    continuation of the 52-bit fraction

IEEE 754 floating-point standard
• Represented value = $(-1)^{sign} \times (1 + F) \times 2^{exponent - bias}$
• Exponent is “biased” (excess-K format) to make sorting easier
  • bias of 127 for single precision and 1023 for double precision
  • E values in [1 .. 254] (0 and 255 reserved)
  • Range = $2^{-126}$ to $2^{+127}$ (about $10^{-38}$ to $10^{+38}$)
• Significand in sign-magnitude, normalized form
  • Significand = (1 + F) = $1.b_{-1}b_{-2}b_{-3} \ldots b_{-23}$
  • Suppress storage of leading 1
• Overflow: the exponent is too large to fit in the 8-bit exponent field. The number can be positive or negative.
• Underflow: the exponent is too small (too negative) to fit in the 8-bit exponent field, so the magnitude is too close to zero to represent. The number can be positive or negative.

IEEE 754 floating-point standard
• Example:
  • Decimal: -5.75 = -(4 + 1 + ½ + ¼)
  • Binary: -101.11 = $-1.0111 \times 2^2$
  • Floating point: exponent = 2 + 127 = 129 = 10000001
  • IEEE single precision: 1 10000001 01110000000000000000000 (sign, 8-bit exponent, 23-bit fraction)

Examples
Biased exponent (0-255); bias 127 (01111111) to be subtracted.
  Binary value                      S  E         F                        Decimal value
  $1.1010001 \times 2^{10100}$      0  10010011  10100010000000000000000  $1.6328125 \times 2^{20}$
  $-1.1010001 \times 2^{10100}$     1  10010011  10100010000000000000000  $-1.6328125 \times 2^{20}$
  $1.1010001 \times 2^{-10100}$     0  01101011  10100010000000000000000  $1.6328125 \times 2^{-20}$
  $-1.1010001 \times 2^{-10100}$    1  01101011  10100010000000000000000  $-1.6328125 \times 2^{-20}$
Fraction: .1010001 (binary) = 0.5 + 0.125 + 0.0078125 = 0.6328125
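The worked encodings can be checked against a machine's own single-precision format. The sketch below uses Python's standard `struct` module to expose the bit fields; the helper names `float_to_fields` and `fields_to_value` are hypothetical, and the decoder handles normalized numbers only (E in 1..254).

```python
import struct


def float_to_fields(x: float):
    """Split the IEEE 754 single-precision encoding of x into (S, E, F)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # 32-bit pattern of x
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 23-30, biased by 127
    fraction = bits & 0x7FFFFF         # bits 0-22
    return sign, exponent, fraction


def fields_to_value(sign: int, exponent: int, fraction: int) -> float:
    """Evaluate (-1)^S x (1 + F) x 2^(E - 127) for a normalized number."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


s, e, f = float_to_fields(-5.75)
print(s, format(e, "08b"), format(f, "023b"))
print(fields_to_value(s, e, f))
```

Running it on -5.75 reproduces the sign, exponent and fraction fields derived by hand, and the same helpers can be applied to the $\pm 1.6328125 \times 2^{\pm 20}$ rows.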

Numbers in 32-bit Formats
• Two’s complement integers
  • expressible numbers: $-2^{31}$ to $2^{31} - 1$
• Floating point numbers (regions along the number line, left to right)
  • negative overflow: below $-(2 - 2^{-23}) \times 2^{127}$
  • expressible negative numbers: $-(2 - 2^{-23}) \times 2^{127}$ to $-2^{-127}$
  • negative underflow: between $-2^{-127}$ and 0
  • zero
  • positive underflow: between 0 and $2^{-127}$
  • expressible positive numbers: $2^{-127}$ to $(2 - 2^{-23}) \times 2^{127}$
  • positive overflow: above $(2 - 2^{-23}) \times 2^{127}$
• Ref: W. Stallings, Computer Organization and Architecture, Sixth Edition, Upper Saddle River, NJ: Prentice-Hall.

IEEE 754 Special Codes
Zero: S 00000000 00000000000000000000000
• Read with the normalized formula, this pattern would be $\pm 1.0 \times 2^{-127}$, the smallest magnitude in the single-precision IEEE 754 format.
• It is interpreted as positive/negative zero.
• An exponent less than -127 is underflow (regard as zero).
Infinity: S 11111111 00000000000000000000000
• Read with the normalized formula, this pattern would be $\pm 1.0 \times 2^{128}$, the largest magnitude in the single-precision IEEE 754 format.
• It is interpreted as ± ∞.
• If the true exponent = 128 and the fraction ≠ 0, the pattern does not represent a number. It is called “not a number” or NaN, and marks the result of an invalid operation such as 0/0.

Addition and Subtraction
• Addition/subtraction of two floating-point numbers:
  Example:              Align mantissas:
      2 x 10^3              0.2 x 10^4
    + 3 x 10^4            + 3   x 10^4
                          -----------
                            3.2 x 10^4
• General case:
  $m_1 \times 2^{e_1} \pm m_2 \times 2^{e_2} = (m_1 \pm m_2 \times 2^{e_2 - e_1}) \times 2^{e_1}$ for $e_1 > e_2$
  $m_1 \times 2^{e_1} \pm m_2 \times 2^{e_2} = (m_1 \times 2^{e_1 - e_2} \pm m_2) \times 2^{e_2}$ for $e_2 > e_1$
• Shift the mantissa of the number with the smaller exponent right by $|e_1 - e_2|$ bits to align the mantissas.

Addition/Subtraction Algorithm
0. Zero check
   - Change the sign of the subtrahend
   - If either operand is 0, the other is the result
1. Significand alignment: right shift the smaller significand until the two exponents are identical.
2. Addition: add the significands and report an exception if overflow occurs.
3. Normalization
   - Shift significand bits to normalize.
   - Report overflow or underflow if the exponent goes out of range.
4. Rounding

FP Add/Subtract (PH Text Figs. 3.16/17)

Example
• Subtraction: $0.5_{ten} - 0.4375_{ten}$
• Step 0: Floating point numbers to be added: $1.000_{two} \times 2^{-1}$ and $-1.110_{two} \times 2^{-2}$
• Step 1: Significand of lesser exponent is shifted right until exponents match: $-1.110_{two} \times 2^{-2} \rightarrow -0.111_{two} \times 2^{-1}$
• Step 2: Add significands, $1.000_{two} + (-0.111_{two})$. Result is $0.001_{two} \times 2^{-1}$
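The steps of the algorithm can be traced in code. The sketch below is a simplified illustration, not the hardware of the referenced figures: it uses exact rational arithmetic from Python's `fractions` module, so the rounding step and exponent-range checks are deliberately left out, and the name `fp_add` is hypothetical. It also carries the example one step further by normalizing the sum.

```python
from fractions import Fraction


def fp_add(m1: Fraction, e1: int, m2: Fraction, e2: int):
    """Add m1 x 2^e1 and m2 x 2^e2 (signed significands); return normalized (m, e)."""
    # Step 0: zero check
    if m1 == 0:
        return m2, e2
    if m2 == 0:
        return m1, e1
    # Step 1: align, shifting the significand of the smaller exponent right
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2 / 2 ** (e1 - e2)
    # Step 2: add the significands
    m, e = m1 + m2, e1
    if m == 0:
        return Fraction(0), 0
    # Step 3: normalize so that 1 <= |m| < 2
    while abs(m) >= 2:
        m, e = m / 2, e + 1
    while abs(m) < 1:
        m, e = m * 2, e - 1
    return m, e                        # Step 4 (rounding) omitted: arithmetic is exact here


# 0.5 - 0.4375: 1.000two x 2^-1 plus -1.110two x 2^-2 (-1.110two = -7/4)
m, e = fp_add(Fraction(1), -1, Fraction(-7, 4), -2)
print(m, e, float(m) * 2.0 ** e)
```

After alignment the second significand is -7/8 (that is, $-0.111_{two}$) and the raw sum is 1/8 (that is, $0.001_{two}$), matching Steps 1 and 2 above; normalization then shifts the sum left three places and lowers the exponent by 3.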
