COMPUTER ARCHITECTURE

COMPUTER ARCHITECTURE Computer Arithmetic (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)

COURSE CONTENTS • Introduction • Instructions • Computer Arithmetic • Performance • Processor: Datapath • Processor: Control • Pipelining Techniques • Memory • Input/Output Devices

COMPUTER ARITHMETIC Arithmetic Logic Unit (ALU) Fast Adder

Foundation Knowledge • Decimal, Binary, Octal, & Hexadecimal Numbers • Signed & Unsigned Numbers • 2’s Complement Representation • 2’s Complement Negation, Addition, & Subtraction • Overflow • Sign Extension • ASCII vs Binary • Boolean Algebra • Logic Design • Assembly Language

Numbers • Bits are just bits (no inherent meaning) • Conventions define relationship between bits and numbers • Binary numbers (base 2) • 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001... • decimal: 0 . . . 2n – 1 • Of course it gets more complicated: • Numbers are finite (overflow) • Fractions and real numbers • Negative numbers • E.g., no MIPS subi instruction; addi can add a negative number • How do we represent negative numbers? • I.e., which bit patterns will represent which numbers?

Possible Representations • Three representations • Sign Magnitude: One's Complement Two's Complement 000 = +0 000 = +0 000 = +0 001 = +1 001 = +1 001 = +1 010 = +2 010 = +2 010 = +2 011 = +3 011 = +3 011 = +3 100 = -0 100 = -3 100 = -4 101 = -1 101 = -2 101 = -3 110 = -2 110 = -1 110 = -2 111 = -3 111 = -0 111 = -1 • Issues: balance, number of zeros, ease of operations • Which one is best? Why?

MIPS • 32 bit signed numbers: 0000 0000 0000 0000 0000 0000 0000 0000two = 0ten0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten...0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten...1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten • maxint: + 2,147,483,647ten • minint: – 2,147,483,648ten

Two’s Complement Operations • Negating a two’s complement number: invert all bits and add 1 • Remember: “negate” and “invert” are quite different! • Converting n bit numbers into numbers with more than n bits: • MIPS 16 bit immediate gets converted to 32 bits for arithmetic • Copy the most significant bit (the sign bit) into the other bits 0010 -> 0000 0010 1010 -> 1111 1010 • “sign extension” (lbu vs. lb)

Additional MIPS Instructions • Character transferlbu $s1, 100($s2) # $s1  memory [$s2+100] (load byte unsigned) sb $s1, 100($s2) # memory [$s2+100]  $s1 (store byte) • Conditions sltu $s2, $s3, $s4 # if ($s3) < ($s4) then $s2  1; # else $s2  0 (set on less than, unsigned numbers) # Note that slt works on 2’ complement numbers • Arithmeticon unsigned numbersaddu $s1, $s2, $s3 # $s1  $s2 + $s3 (no overflow detection)subu $s1, $s2, $s3 # $s1  $s2 - $s3 (no overflow detection) addiu $s1, $s2, 100 # $s1  $s2 + 100 (no overflow detection) • MIPS detects overflow with an exception (interrupt), which is an unscheduled procedure call. MIPS includes a register, called exception program counter (EPC) to contain the address of the instruction that caused the exception mfc0 $s1, $epc # $s1  $epc (move from special registers)

Op=0 rs=0 rt=16 rd=10 shamt=8 funct=0 Additional MIPS Instructions • Shift operationssll $t2, $s0, 8 # $t2  $s0<<8 (shift left by constant) srl $s1, $s2, 10 # $s1  $s2>>10 (shift right by constant) Fill the emptied bits with 0’s • Logical operations and $s1, $s2, $s3 # $s1  $s2 and $s3 (bit-by-bit and) or $s1, $s2, $s3 # $s1  $s2 or $s3 (bit-by-bit or) andi $s1, $s2, 100 # $s1  $s2 and 100 ori $s1, $s2, 100 # $s1  $s2 or 100 sll $t2, $s0, 8

ALU operation A 32 Zero Control Funct Result 000 and A and B Result 32 001 or A or B 010 add A + B Overflow 110 sub A - B B 111 slt 1 if A<B 32 Carryout ALU: Arithmetic Logic Unit • Performs arithmetic (e.g. add) & logical operations (e.g. and) in CPU

C a r r y I n a S u m b a r r y O u t C ALU Building Blocks • 1-bit adder • Gates, multiplexor cout = a b + a cin + b cin sum = a  b  cin Note: Cin is carryin, cout is carryout

C a r r y I n O p e r a t i o n a 0 C a r r y I n O p e r a t i o n R e s u l t 0 C a r r y I n A L U 0 b 0 C a r r y O u t a 0 a 1 C a r r y I n 1 R e s u l t R e s u l t 1 A L U 1 b 1 C a r r y O u t 2 b a 2 C a r r y I n C a r r y O u t R e s u l t 2 A L U 2 b 2 C a r r y O u t a 3 1 C a r r y I n R e s u l t 3 1 A L U 3 1 b 3 1 A Simple ALU A 1-bit ALU that performs AND, OR, and addition (shown below) Building a 32-bit ALU (shown right)

B i n v e r t O p e r a t i o n C a r r y I n a 0 1 R e s u l t 0 b 2 1 C a r r y O u t ALU: Subtraction • Two's complement approach: just negate b and add. • How do we negate? • By selecting Binvert = 1, and setting CarryIn =1 in the least significant bit of ALU, we get 2’s complement subtraction a - b

ALU: Additional Operations • To support set-on-less-than instruction (slt) • slt is an arithmetic instruction • produces a 1 if rs < rt and 0 otherwise • use subtraction: (a-b) < 0 implies a < b • use a Set & a Less signal to indicate result • To support test for equality (beq) • use subtraction: (a-b) = 0 implies a = b

B i n v e r t O p e r a t i o n C a r r y I n a 0 1 R e s u l t b 0 2 1 L e s s 3 C a r r y O u t B i n v e r t O p e r a t i o n C a r r y I n a 0 1 R e s u l t b 0 2 1 L e s s 3 S e t(sign) O v e r f l o w O v e r f l o w d e t e c t i o n ALU: Additional Operations • The ALU for the most significant bit: Set is used for slt instruction, it is connected to Less of lsb (see 32-bit ALU next slide) Overflow detection needed on msb • A 1-bit ALU that performs AND, OR, add, subtract: Less is used for slt instruction (see 32-bit ALU next slide)

B i n v e r t C a r r y I n O p e r a t i o n a 0 C a r r y I n b 0 R e s u l t 0 A L U 0 L e s s t C a r r y O u C a r r y I n a 1 A L U 1 b 1 R e s u l t 1 L e s s 0 C a r r y O u t C a r r y I n a 2 A L U 2 b 2 R e s u l t 2 L e s s 0 C a r r y O u t C a r r y I n C a r r y I n a 3 1 R e s u l t 3 1 A L U 3 1 S e t(sign) b 3 1 L e s s 0 O v e r f o w l CarryOut A 32-bit ALU • A 32-bit ALU that performs AND, OR, add, & subtract For subtract, set Binvert = 1 and CarryIn =1 (for add or logical operations, both set to 0) Can combine Binvert & CarryIn to Bnegate Set and Less, together with subtraction, can be used for slt

1 bit 3 bits 2 bits B n e g a t e O p e r a t i o n a 0 C a r r y I n R e s u l t 0 b 0 A L U 0 L e s s C a r r y O u t a 1 C a r r y I n R e s u l t 1 b 1 A L U 1 0 L e s s Z e r o C a r r y O u t a 2 C a r r y I n R e s u l t 2 b 2 A L U 2 0 L e s s C a r r y O u t R e s u l t 3 1 a 3 1 C a r r y I n S e t b 3 1 A L U 3 1 (Sign) 0 L e s s O v e r f l o w CarryOut A Final 32-bit ALU • Add a zero detector to test for zero results or equality (e.g. in beq instruction) • Control lines (Operation) (3-bit):000 = and001 = or010 = add110 = subtract111 = slt bit1 & bit0 to multiplexors in ALU bit2 to Bnegate • Note: zero is a 1 when the result is zero!

ALU Design: Summary • Select building blocks: adders, gates • Use multiplexors to select the output we want • Perform subtraction using two’s complement • Replicate a 1-bit ALU to produce a 32-bit ALU --> regularity • Need circuit to detect conditions e.g. zero result, overflow, sign, carry out • Shift instructions: Done outside the ALU by barrel shifter, which can shift from 1 to 31 bits in no more time than it takes to add two 32 bit numbers using carry lookahead adders • Important points about hardware • all of the gates are always working • the speed of a gate is affected by the number of inputs to the gate • the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”) • Our primary focus: comprehension, however, • Clever changes to organization can improve performance (similar to using better algorithms in software)

+ + + + Carry Lookahead Adder • Ripple carry adder is just too slow: • The sequential chain reaction is too slow for time-critical hardware • Is a 32-bit ALU as fast as a 1-bit ALU? • Is there more than one way to do addition? • two extremes: ripple carry and sum-of-products • Can you see the ripple? How could you get rid of it?

Carry Lookahead Adder • Carry lookahead adder (CLA): an approach in-between our two extremes • Motivation: • If we didn't know the value of carry-in, what could we do? • When would we always generate a carry? gi = ai bi • When would we propagate the carry? pi = ai + bi • Did we get rid of the ripple? ci+1 = gi + pici c1 = g0 + p0c0 c2 = g1 + p1c1 = g1 + p1g0 + p1p0c0 c3 = g2 + p2c2 = g2 + p2g1 + p2p1g0 + p2p1p0c0 c4 = g3 + p3c3 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0 • Carry lookahead!

C a r r y I n a 0 C a r r y I n b 0 R e s u l t 0 - - 3 a 1 b 1 A L U 0 a 2 p i P 0 b 2 g i G 0 a 3 b 3 C a r r y - l o o k a h e a d u n i t C 1 c i + 1 a 4 C a r r y I n b 4 R e s u l t 4 - - 7 a 5 b 5 A L U 1 a 6 p i + 1 P 1 b 6 g i + 1 G 1 a 7 b 7 C 2 c i + 2 C a r r y I n a 8 b 8 R e s u l t 8 - - 1 1 a 9 b 9 A L U 2 a 1 0 p i + 2 P 2 b 1 0 g i + 2 G 2 a 1 1 b 1 1 C 3 c i + 3 C a r r y I n a 1 2 b 1 2 R e s u l t 1 2 - - 1 5 a 1 3 b 1 3 A L U 3 a 1 4 p i + 3 P 3 g i + 3 b 1 4 G 3 a 1 5 C 4 c i + 4 b 1 5 C a r r y O u t Building Bigger Adders • Can’t build a 16 bit adder using the gi & pi CLA method --> too big • Could use ripple carry of 4-bit CLA adders • Better: use the CLA principle again! (see left figure)

Summary • Review number system • Additional MIPS instructions • The design of an ALU • Carry lookahead adder

COMPUTER ARCHITECTURE

COMPUTER ARCHITECTURE

Presentation Transcript

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture

Computer Architecture