COMPUTER ARCHITECTURE Computer Arithmetic (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)
COURSE CONTENTS
• Introduction
• Instructions
• Computer Arithmetic
• Performance
• Processor: Datapath
• Processor: Control
• Pipelining Techniques
• Memory
• Input/Output Devices
COMPUTER ARITHMETIC Arithmetic Logic Unit (ALU) Fast Adder
Foundation Knowledge
• Decimal, Binary, Octal, & Hexadecimal Numbers
• Signed & Unsigned Numbers
• 2’s Complement Representation
• 2’s Complement Negation, Addition, & Subtraction
• Overflow
• Sign Extension
• ASCII vs Binary
• Boolean Algebra
• Logic Design
• Assembly Language
Numbers
• Bits are just bits (no inherent meaning)
  • Conventions define the relationship between bits and numbers
• Binary numbers (base 2)
  • 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
  • decimal: 0 ... $2^n - 1$ (with n bits)
• Of course it gets more complicated:
  • Numbers are finite (overflow)
  • Fractions and real numbers
  • Negative numbers
    • E.g., no MIPS subi instruction; addi can add a negative number
• How do we represent negative numbers?
  • I.e., which bit patterns will represent which numbers?
Possible Representations
• Three representations (3-bit examples):

  Bits   Sign Magnitude   One's Complement   Two's Complement
  000    +0               +0                 +0
  001    +1               +1                 +1
  010    +2               +2                 +2
  011    +3               +3                 +3
  100    -0               -3                 -4
  101    -1               -2                 -3
  110    -2               -1                 -2
  111    -3               -0                 -1

• Issues: balance, number of zeros, ease of operations
• Which one is best? Why?
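The table above can be reproduced with a short Python sketch. The function names and the default width of 3 bits are illustrative assumptions, not part of the slides; the functions return strings so that the two zeros (+0 and -0) stay visible.

```python
def sign_magnitude(bits: int, width: int = 3) -> str:
    """Top bit is the sign; the remaining bits are the magnitude."""
    sign = bits >> (width - 1)
    magnitude = bits & ((1 << (width - 1)) - 1)
    return ("-" if sign else "+") + str(magnitude)


def ones_complement(bits: int, width: int = 3) -> str:
    """A negative value is the bitwise inverse of its magnitude."""
    if bits >> (width - 1):
        magnitude = ~bits & ((1 << width) - 1)
        return "-" + str(magnitude)
    return "+" + str(bits)


def twos_complement(bits: int, width: int = 3) -> str:
    """The top bit carries weight -(2**(width-1))."""
    value = bits - (1 << width) if bits >> (width - 1) else bits
    return f"{value:+d}"


if __name__ == "__main__":
    for pattern in range(8):
        print(f"{pattern:03b}",
              sign_magnitude(pattern),
              ones_complement(pattern),
              twos_complement(pattern))
```

Only two's complement has a single zero, at the cost of one extra negative value (the "balance" issue).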
MIPS
• 32-bit signed numbers:
    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
    0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten
    1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten
    1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
• maxint: + 2,147,483,647ten
• minint: – 2,147,483,648ten
Two’s Complement Operations
• Negating a two’s complement number: invert all bits and add 1
  • Remember: “negate” and “invert” are quite different!
• Converting n-bit numbers into numbers with more than n bits:
  • MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  • Copy the most significant bit (the sign bit) into the other bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  • “sign extension” (lbu vs. lb: lb sign-extends the loaded byte, lbu zero-extends it)
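Both operations on this slide fit in a few lines of Python. This is a minimal sketch: the names `negate` and `sign_extend` are hypothetical, and inputs are assumed to be non-negative integers that already fit in the stated width.

```python
def negate(x: int, width: int = 32) -> int:
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << width) - 1
    inverted = x ^ mask          # "invert": flip every bit
    return (inverted + 1) & mask  # "negate": invert, then add 1


def sign_extend(x: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide value into the new upper bits."""
    sign = (x >> (from_bits - 1)) & 1
    if sign:
        upper_ones = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return x | upper_ones
    return x


if __name__ == "__main__":
    print(f"{negate(0b0010, 4):04b}")             # bit pattern of -2 in 4 bits
    print(f"{sign_extend(0b0010, 4, 8):08b}")
    print(f"{sign_extend(0b1010, 4, 8):08b}")
```

The two `sign_extend` calls correspond to the slide's examples 0010 -> 0000 0010 and 1010 -> 1111 1010.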
Additional MIPS Instructions
• Character transfer
    lbu   $s1, 100($s2)    # $s1 ← memory[$s2+100] (load byte unsigned)
    sb    $s1, 100($s2)    # memory[$s2+100] ← $s1 (store byte)
• Conditions
    sltu  $s2, $s3, $s4    # if ($s3) < ($s4) then $s2 ← 1;
                           # else $s2 ← 0 (set on less than, unsigned numbers)
                           # Note that slt works on 2’s complement numbers
• Arithmetic on unsigned numbers
    addu  $s1, $s2, $s3    # $s1 ← $s2 + $s3 (no overflow detection)
    subu  $s1, $s2, $s3    # $s1 ← $s2 - $s3 (no overflow detection)
    addiu $s1, $s2, 100    # $s1 ← $s2 + 100 (no overflow detection)
• MIPS detects overflow with an exception (interrupt), which is an unscheduled procedure call. MIPS includes a register, called the exception program counter (EPC), to contain the address of the instruction that caused the exception
    mfc0  $s1, $epc        # $s1 ← $epc (move from special registers)
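The difference between the signed and unsigned variants can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions: registers are modeled as 32-bit unsigned integers, the function names simply mirror the mnemonics, and the overflow exception is modeled with Python's `OverflowError` rather than the EPC mechanism.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a: int, b: int) -> int:
    """Set on less than, two's complement comparison."""
    return 1 if to_signed(a) < to_signed(b) else 0


def sltu(a: int, b: int) -> int:
    """Set on less than, unsigned comparison of the same bit patterns."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


def addu(a: int, b: int) -> int:
    """Add with no overflow detection: the result simply wraps."""
    return (a + b) & MASK32


def add(a: int, b: int) -> int:
    """Add with overflow detection (stand-in for the MIPS exception)."""
    result = to_signed(a) + to_signed(b)
    if not -(1 << 31) <= result <= (1 << 31) - 1:
        raise OverflowError("signed 32-bit overflow")
    return result & MASK32
```

For example, the pattern 0xFFFFFFFF is -1 to `slt` but the largest value to `sltu`, so `slt(0xFFFFFFFF, 1)` and `sltu(0xFFFFFFFF, 1)` disagree.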
Additional MIPS Instructions
• Shift operations
    sll  $t2, $s0, 8       # $t2 ← $s0 << 8 (shift left by constant)
    srl  $s1, $s2, 10      # $s1 ← $s2 >> 10 (shift right by constant)
  Fill the emptied bits with 0’s
  Machine encoding of sll $t2, $s0, 8:
    op=0  rs=0  rt=16  rd=10  shamt=8  funct=0
• Logical operations
    and  $s1, $s2, $s3     # $s1 ← $s2 and $s3 (bit-by-bit and)
    or   $s1, $s2, $s3     # $s1 ← $s2 or $s3 (bit-by-bit or)
    andi $s1, $s2, 100     # $s1 ← $s2 and 100
    ori  $s1, $s2, 100     # $s1 ← $s2 or 100
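The field values listed for sll $t2, $s0, 8 can be packed into a 32-bit instruction word. The sketch below assumes the standard MIPS R-format field widths (6, 5, 5, 5, 5, and 6 bits); the helper name `encode_r_format` is hypothetical.

```python
def encode_r_format(op: int, rs: int, rt: int, rd: int,
                    shamt: int, funct: int) -> int:
    """Pack R-format fields into one 32-bit word.

    Assumed layout, from the most significant bit down:
    op(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
    """
    return ((op & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16 |
            (rd & 0x1F) << 11 | (shamt & 0x1F) << 6 | (funct & 0x3F))


if __name__ == "__main__":
    # sll $t2, $s0, 8 with the field values from the slide
    word = encode_r_format(op=0, rs=0, rt=16, rd=10, shamt=8, funct=0)
    print(f"{word:#010x}")  # 0x00105200
```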
ALU: Arithmetic Logic Unit
• Performs arithmetic (e.g. add) & logical operations (e.g. and) in CPU
[Figure: ALU block. Inputs: A (32 bits), B (32 bits), ALU operation. Outputs: Result (32 bits), Zero, Overflow, CarryOut]

  Control   Function   Result
  000       and        A and B
  001       or         A or B
  010       add        A + B
  110       sub        A - B
  111       slt        1 if A < B
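The control table can be read as the specification of a function. The following Python sketch is a behavioral model only, not the gate-level design developed on later slides; the return order (result, zero, overflow, carry_out) and the choice to leave Overflow and CarryOut at 0 for and, or, and slt are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000


def to_signed(x: int) -> int:
    return x - (1 << 32) if x & SIGN else x


def alu(control: int, a: int, b: int):
    """Behavioral 32-bit ALU: returns (result, zero, overflow, carry_out)."""
    a &= MASK32
    b &= MASK32
    overflow = carry_out = 0
    if control == 0b000:        # and
        result = a & b
    elif control == 0b001:      # or
        result = a | b
    elif control in (0b010, 0b110):  # add / sub
        subtracting = control == 0b110
        operand = (~b & MASK32) if subtracting else b
        total = a + operand + (1 if subtracting else 0)
        result = total & MASK32
        carry_out = total >> 32
        # Signed overflow: the adder inputs share a sign the result lacks.
        overflow = int((a & SIGN) == (operand & SIGN)
                       and (result & SIGN) != (a & SIGN))
    elif control == 0b111:      # slt
        result = int(to_signed(a) < to_signed(b))
    else:
        raise ValueError("unsupported control code")
    zero = int(result == 0)
    return result, zero, overflow, carry_out
```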
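ALU Building Blocks
• 1-bit adder
• Gates, multiplexor
[Figure: 1-bit adder with inputs a, b, CarryIn and outputs Sum, CarryOut]
    $c_{out} = a \cdot b + a \cdot c_{in} + b \cdot c_{in}$
    $\mathit{sum} = a \oplus b \oplus c_{in}$
Note: $c_{in}$ is carry-in, $c_{out}$ is carry-out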
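The two 1-bit adder equations can be checked exhaustively against ordinary integer addition. A minimal Python sketch follows; the function name `full_adder` is an illustrative choice.

```python
def full_adder(a: int, b: int, c_in: int):
    """1-bit full adder: returns (sum, c_out) from the gate equations."""
    c_out = (a & b) | (a & c_in) | (b & c_in)
    total = a ^ b ^ c_in
    return total, c_out


if __name__ == "__main__":
    # Truth table: all eight input combinations.
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                print(a, b, c_in, full_adder(a, b, c_in))
```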
A Simple ALU
• A 1-bit ALU that performs AND, OR, and addition
  [Figure: 1-bit ALU. Inputs a, b, CarryIn; Operation selects among the AND, OR, and adder outputs (multiplexor inputs 0, 1, 2); outputs Result, CarryOut]
• Building a 32-bit ALU
  [Figure: 32 1-bit ALUs (ALU0 ... ALU31) with inputs a0..a31 and b0..b31 and outputs Result0..Result31; the CarryOut of each ALU feeds the CarryIn of the next; Operation is shared by all]
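The replication idea can be sketched in Python by chaining 32 copies of a 1-bit ALU. The names `alu1` and `alu32_simple` are hypothetical, and the Operation encoding (0 = AND, 1 = OR, 2 = add) follows the multiplexor input numbering in the figure.

```python
def alu1(a: int, b: int, carry_in: int, operation: int):
    """1-bit ALU: the multiplexor picks AND (0), OR (1), or the adder sum (2)."""
    adder_sum = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, adder_sum)[operation]
    return result, carry_out


def alu32_simple(a: int, b: int, operation: int):
    """Chain 32 one-bit ALUs; each CarryOut feeds the next CarryIn."""
    carry = 0
    result = 0
    for i in range(32):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry  # carry is the CarryOut of ALU31
```

Note that the adder in every slice runs regardless of Operation; the multiplexor only chooses which output is visible.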
ALU: Subtraction
• Two's complement approach: just negate b and add.
• How do we negate?
  • By selecting Binvert = 1 and setting CarryIn = 1 in the least significant bit of the ALU, we get 2’s complement subtraction: a - b = a + (NOT b) + 1
[Figure: 1-bit ALU with a Binvert-controlled multiplexor that selects b or NOT b as the second adder input]
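The subtraction trick amounts to one XOR mask and one extra carry. A minimal Python sketch, assuming a 32-bit width and a hypothetical helper `add_sub` whose single `binvert` flag also drives the least significant CarryIn:

```python
def add_sub(a: int, b: int, binvert: int, width: int = 32) -> int:
    """Adder with a Binvert input.

    binvert = 0: a + b
    binvert = 1: a + (NOT b) + 1, which is a - b in two's complement
    """
    mask = (1 << width) - 1
    b_selected = (b ^ mask) if binvert else b  # multiplexor: b or NOT b
    carry_in = binvert                          # CarryIn of the lsb
    return (a + b_selected + carry_in) & mask


if __name__ == "__main__":
    print(add_sub(7, 5, binvert=1))        # 2
    print(hex(add_sub(5, 7, binvert=1)))   # 0xfffffffe, the pattern for -2
```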
ALU: Additional Operations
• To support the set-on-less-than instruction (slt)
  • slt is an arithmetic instruction
  • produces a 1 if rs < rt and 0 otherwise
  • use subtraction: (a-b) < 0 implies a < b
  • use a Set & a Less signal to indicate the result
• To support test for equality (beq)
  • use subtraction: (a-b) = 0 implies a = b
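Both tests reduce to inspecting the result of a - b. The Python sketch below uses hypothetical helper names and makes one assumption explicit: taking the sign bit of a - b as the slt answer is only valid when the subtraction does not overflow, so the example inputs are chosen accordingly.

```python
MASK32 = 0xFFFFFFFF


def subtract(a: int, b: int) -> int:
    """a - b computed as a + (NOT b) + 1 on 32 bits."""
    return (a + (~b & MASK32) + 1) & MASK32


def slt(a: int, b: int) -> int:
    """1 if a < b, read from the sign bit of a - b.

    Assumes a - b does not overflow the signed 32-bit range.
    """
    return (subtract(a, b) >> 31) & 1


def is_equal(a: int, b: int) -> int:
    """1 if a == b, i.e. if a - b is zero (the condition beq tests)."""
    return int(subtract(a, b) == 0)
```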
ALU: Additional Operations
• A 1-bit ALU that performs AND, OR, add, subtract: Less is used for the slt instruction (see the 32-bit ALU on the next slide)
  [Figure: 1-bit ALU with inputs a, b, Less, Binvert, CarryIn, and Operation (multiplexor inputs 0 to 3); outputs Result, CarryOut]
• The ALU for the most significant bit: Set is used for the slt instruction; it is connected to Less of the lsb (see the 32-bit ALU on the next slide). Overflow detection is needed on the msb.
  [Figure: 1-bit ALU for the msb, as above but with Set (sign) and Overflow outputs produced by an overflow detection block]
A 32-bit ALU
• A 32-bit ALU that performs AND, OR, add, & subtract
  • For subtract, set Binvert = 1 and CarryIn = 1 (for add or logical operations, both are set to 0)
  • Can combine Binvert & CarryIn into Bnegate
  • Set and Less, together with subtraction, can be used for slt
[Figure: 32-bit ALU built from ALU0 ... ALU31 with shared Binvert, CarryIn, and Operation lines; carries ripple from each ALU to the next; Less of ALU1..ALU31 is tied to 0; Set (sign) of ALU31 feeds Less of ALU0; ALU31 also produces Overflow and CarryOut]
A Final 32-bit ALU
• Add a zero detector to test for zero results or equality (e.g. in the beq instruction)
• Control lines (Operation) (3-bit):
    000 = and
    001 = or
    010 = add
    110 = subtract
    111 = slt
  • bit1 & bit0 go to the multiplexors in the ALU (2 bits)
  • bit2 goes to Bnegate (1 bit)
• Note: Zero is a 1 when the result is zero!
[Figure: final 32-bit ALU built from ALU0 ... ALU31 with Bnegate and Operation inputs; Less of ALU1..ALU31 tied to 0; Set (sign) of ALU31 feeds Less of ALU0; outputs Result0..Result31, Zero, Overflow, CarryOut]
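The final ALU can be modeled slice by slice in Python. This is a sketch, not the slides' circuit: the function names are hypothetical, and overflow is computed as carry-into-msb XOR carry-out-of-msb, a standard rule that the slides do not spell out. As in the figure, Set is taken directly from the msb adder output, so slt is only reliable when a - b does not overflow.

```python
def alu_slice(a, b, less, binvert, carry_in, operation):
    """One bit of the ALU: returns (result, adder_sum, carry_out)."""
    b ^= binvert                       # select b or NOT b
    adder_sum = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, adder_sum, less)[operation]
    return result, adder_sum, carry_out


def alu32(control: int, a: int, b: int):
    """Returns (result, zero, overflow) for a 3-bit control code."""
    bnegate = (control >> 2) & 1       # bit2 drives Binvert and lsb CarryIn
    operation = control & 0b11         # bit1, bit0 drive the multiplexors
    carry = bnegate
    bits = []
    for i in range(32):
        carry_in = carry
        bit, adder_sum, carry = alu_slice(
            (a >> i) & 1, (b >> i) & 1, 0, bnegate, carry_in, operation)
        bits.append(bit)
    # After the loop, adder_sum, carry_in, and carry belong to ALU31.
    if operation == 3:
        bits[0] = adder_sum            # Set (sign) feeds Less of ALU0
    overflow = carry_in ^ carry        # meaningful for add and subtract only
    result = sum(bit << i for i, bit in enumerate(bits))
    zero = int(result == 0)            # zero detector over all result bits
    return result, zero, overflow
```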
ALU Design: Summary
• Select building blocks: adders, gates
• Use multiplexors to select the output we want
• Perform subtraction using two’s complement
• Replicate a 1-bit ALU to produce a 32-bit ALU (regularity)
• Need circuits to detect conditions, e.g. zero result, overflow, sign, carry out
• Shift instructions: done outside the ALU by a barrel shifter, which can shift from 1 to 31 bits in no more time than it takes to add two 32-bit numbers using carry lookahead adders
• Important points about hardware
  • all of the gates are always working
  • the speed of a gate is affected by the number of inputs to the gate
  • the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus is comprehension; however, clever changes to organization can improve performance (similar to using better algorithms in software)
Carry Lookahead Adder
• Ripple carry adder is just too slow:
  • The sequential chain reaction is too slow for time-critical hardware
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
  • two extremes: ripple carry and sum-of-products
• Can you see the ripple? How could you get rid of it?
[Figure: chain of adder blocks (+)]
Carry Lookahead Adder
• Carry lookahead adder (CLA): an approach in between our two extremes
• Motivation:
  • If we didn't know the value of carry-in, what could we do?
  • When would we always generate a carry? $g_i = a_i b_i$
  • When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple?
    $c_{i+1} = g_i + p_i c_i$
    $c_1 = g_0 + p_0 c_0$
    $c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0$
    $c_3 = g_2 + p_2 c_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
    $c_4 = g_3 + p_3 c_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$
• Carry lookahead!
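The expanded carry equations translate directly into code: every carry is computed from the g, p, and c0 inputs alone, with no dependence on a previous carry. A minimal Python sketch of a 4-bit adder built this way (the function name `cla4` is hypothetical):

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry lookahead adder: returns (sum, c4)."""
    ai = [(a >> i) & 1 for i in range(4)]
    bi = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(ai, bi)]   # generate:  g_i = a_i b_i
    p = [x | y for x, y in zip(ai, bi)]   # propagate: p_i = a_i + b_i

    # Each carry uses only g, p, and c0 (two gate levels, no ripple).
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & c0)

    carries = [c0, c1, c2, c3]
    total = sum((ai[i] ^ bi[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

In Python `&` binds more tightly than `|`, so each expression reads as the sum-of-products form in the equations.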
Building Bigger Adders
• Can’t build a 16-bit adder using the $g_i$ & $p_i$ CLA method alone: the logic gets too big
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!
[Figure: four 4-bit ALUs (ALU0 ... ALU3) covering bits 0-3, 4-7, 8-11, and 12-15 with outputs Result0-3, Result4-7, Result8-11, Result12-15; each block supplies a block propagate (P0..P3) and block generate (G0..G3) to a carry-lookahead unit, which produces C1..C4; C4 is the CarryOut]
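Applying the CLA principle a second time means treating each 4-bit block as one "super bit" with its own propagate and generate signals. The Python sketch below assumes the usual block definitions, P = p3 p2 p1 p0 and G = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0, which the slide's figure names but does not write out. The function names are hypothetical, and the sum inside each block uses integer addition for brevity rather than gate-level logic.

```python
def block_pg(a4: int, b4: int):
    """Block propagate P and block generate G for one 4-bit slice."""
    g = [((a4 >> i) & 1) & ((b4 >> i) & 1) for i in range(4)]
    p = [((a4 >> i) & 1) | ((b4 >> i) & 1) for i in range(4)]
    P = p[0] & p[1] & p[2] & p[3]
    G = g[3] | p[3] & g[2] | p[3] & p[2] & g[1] | p[3] & p[2] & p[1] & g[0]
    return P, G


def cla16(a: int, b: int, c0: int = 0):
    """16-bit adder with a second-level lookahead unit: returns (sum, C4)."""
    blocks = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    P, G = zip(*(block_pg(x, y) for x, y in blocks))

    # Same equations as the 4-bit case, one level up.
    C1 = G[0] | P[0] & c0
    C2 = G[1] | P[1] & G[0] | P[1] & P[0] & c0
    C3 = G[2] | P[2] & G[1] | P[2] & P[1] & G[0] | P[2] & P[1] & P[0] & c0
    C4 = (G[3] | P[3] & G[2] | P[3] & P[2] & G[1]
          | P[3] & P[2] & P[1] & G[0] | P[3] & P[2] & P[1] & P[0] & c0)

    result = 0
    for k, ((x, y), carry_in) in enumerate(zip(blocks, (c0, C1, C2, C3))):
        result |= ((x + y + carry_in) & 0xF) << (4 * k)
    return result, C4
```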
Summary
• Review number system
• Additional MIPS instructions
• The design of an ALU
• Carry lookahead adder