Computer Architecture: The Anatomy of Modern Processors
Processor Organization (Part 2)
John Morris
[Figure: datapath with MAR, MDR, PC, MIR, register H, ALU with flags c, n, z, v, shifter and main memory, plus a new Instruction Fetch Unit (IFU)]
Speeding it up
• Observe that several operations are common to every instruction:
  • Fetch the next instruction
  • Increment the PC
  • See the microcode in Tanenbaum for actual examples
• Introduce an Instruction Fetch Unit (IFU)
  • Add an additional adder
  • An incrementer is slightly simpler than a general-purpose adder
• Some operations can now be performed in parallel:
  • Execute one instruction
  • Fetch the next
• The microcode is shorter
  • PC = PC + 1 is not needed: the IFU does it!
[Figure: the same datapath with a third bus, the A bus, added alongside the B and C buses]
Speeding it up
• Provide 3 data buses
  • More flexibility for instructions
  • No 'move-only' cycle is needed to bring an operand into H
• Question:
  • What effect does adding the A bus have on the microcode word width?
[Figure: the datapath with pipeline registers added for the fetched instruction, the ALU operands and the ALU result]
Speeding it up
• Add registers!
• Pipelined machine, with registers for the:
  • Fetched instruction
  • ALU operands
  • ALU result
• Latency of an instruction?
  • 4-stage pipeline, including write back to memory
  • 4 clock cycles
• Throughput
  • 1 instruction completes per clock cycle
  • Up to 4 instructions 'in flight' at any time
  • Long-latency instructions, e.g. memory fetches, reduce this
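A rough cycle count makes the latency/throughput distinction concrete. This is a minimal sketch assuming an ideal 4-stage pipeline in which every stage takes exactly one clock cycle and nothing stalls; `pipelined_cycles` and `unpipelined_cycles` are hypothetical helper names, and the unpipelined comparison assumes each instruction runs all four steps before the next one starts.

```python
def pipelined_cycles(n_instructions: int, stages: int = 4) -> int:
    """Cycles for n instructions on an ideal pipeline: one stage per cycle, no stalls (assumption)."""
    if n_instructions <= 0:
        return 0
    # The first instruction needs `stages` cycles (its latency); after that
    # one instruction completes every clock cycle (the throughput).
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions: int, stages: int = 4) -> int:
    """Cycles if each instruction must finish all stages before the next one starts."""
    return stages * max(n_instructions, 0)


for n in (1, 4, 100):
    print(f"{n:>3} instructions: pipelined {pipelined_cycles(n):>3}, "
          f"unpipelined {unpipelined_cycles(n):>3}")
```

For 100 instructions this gives 103 cycles pipelined against 400 unpipelined: the latency of each instruction is unchanged, but almost one instruction completes per cycle.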
Speeding it up
• There are many more things …
• SOFTENG 363 covers the most important tricks learnt in 40 years of computer architecture research!
[Figure: 32-bit operands a and b combined into c by 32 OR gates, one per bit position (bits 0, 1, …, 31)]
Computational Elements
• How does a circuit compute?
• Simple Boolean operations, e.g. c = a OR b, are straightforward
  • Each bit ci depends only on ai and bi
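Because each output bit depends only on the matching input bits, a 32-bit OR is just 32 independent gates working side by side. The sketch below models that bit by bit, assuming 32-bit unsigned operands (`or_gates` is a hypothetical name), and gives the same result as Python's built-in `|` operator.

```python
WIDTH = 32  # assumed word size


def or_gates(a: int, b: int) -> int:
    """Model 32 independent OR gates: bit i of c depends only on bit i of a and b."""
    c = 0
    for i in range(WIDTH):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        c |= (a_i | b_i) << i  # one OR gate per bit position
    return c


a, b = 0x0000_F0F0, 0x0F0F_0000
print(hex(or_gates(a, b)))  # 0xf0ff0f0
```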
Computational Elements - Adders
• How does a circuit compute?
• Something more complex: e.g. c = a + b
• Adders are crucial!
  • Programs: ~25% of instructions
    • Arithmetic
    • Array addressing
    • String indexing
    • …
  • Program counter
    • PC = PC + 1 (logically)
    • Actually PC = PC + 4 in a 32-bit machine (byte addresses, 4-byte instructions)
  • Relative jump
    • Jump to PC + n instructions
Computational Elements - Adders
• Adders
  • Start by adding two bits:
    0 + 0 = 0
    0 + 1 = 1
    1 + 0 = 1
    1 + 1 = 10
  • Observe we need a carry from 1 + 1
• Circuit block [Figure: block with inputs ai, bi and outputs ci, couti]
  • What happens to couti?
  • It is fed to block i+1
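The two-bit addition table is exactly what a half adder computes: the sum bit is a xor b and the carry is a and b. A minimal sketch, with `half_adder` as a hypothetical helper name, returning (carry, sum) so each printed result reads like a two-digit binary number.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two single bits; return (carry, sum)."""
    s = a ^ b      # sum bit: 1 when exactly one input is 1
    carry = a & b  # carry out: only 1 + 1 produces a carry
    return carry, s


for a in (0, 1):
    for b in (0, 1):
        carry, s = half_adder(a, b)
        print(f"{a} + {b} = {carry}{s}")  # last line: 1 + 1 = 10
```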
Computational Elements - Adders
• Adder circuit block [Figure: block with inputs ai, bi and cini (= couti-1), outputs sumi and couti (= cini+1)]
  • This is known as a Full Adder
  • A Half Adder doesn't have a carry in
[Figure: 32-bit adder built from a chain of full adders (FA); inputs a0 … a31, b0 … b31 and cin; outputs sum0 … sum31 and carry]
Computational Elements - Adders
• 32-bit adder
  • Note there's a carry out for the overflow bit
  • What do we do with carry in?
• First solution:
  • Use a half adder for bit 0: it doesn't have one!
• Second (usual) solution:
  • Set it to zero
  • Why use a more complex full adder when a half adder will do?
  • See later
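Chaining 32 full adders, with the carry in of bit 0 tied to zero, gives a complete 32-bit adder. This is a minimal sketch assuming unsigned 32-bit operands; `full_adder` and `add32` are hypothetical names, and the full-adder cell uses the standard equations sum = a xor b xor cin and carry = (a and b) or (a and cin) or (b and cin).

```python
WIDTH = 32  # assumed word size


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One full adder (FA) cell; returns (cout, sum)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


def add32(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Chain 32 FA cells; the carry in of bit 0 is set to zero by default."""
    carry = cin
    total = 0
    for i in range(WIDTH):
        # couti of cell i becomes cini+1 of the next cell
        carry, s = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry  # carry is the carry out of bit 31


print(add32(5, 7))            # (12, 0)
print(add32(0xFFFF_FFFF, 1))  # (0, 1): the sum wraps and the carry out is set
```

For unsigned operands the final carry out signals that the true sum does not fit in 32 bits.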
Full Adder
• Truth table
• Observe that cout|sum, read as a 2-bit binary number, counts the number of input bits that are 1
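Since cout|sum is just the count of 1 inputs written in binary, the full-adder truth table can be generated from that observation alone. A minimal sketch; `full_adder` and `truth_table` are hypothetical names.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder defined by counting: returns (cout, sum)."""
    ones = a + b + cin          # number of input bits that are 1 (0..3)
    return ones // 2, ones % 2  # the count written as a 2-bit binary number


truth_table = [(a, b, cin, *full_adder(a, b, cin))
               for a, b, cin in product((0, 1), repeat=3)]

print("a b cin | cout sum")
for a, b, cin, cout, s in truth_table:
    print(f"{a} {b}  {cin}  |  {cout}    {s}")
```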
[Figure: gate-level full adder with inputs a, b, cin and outputs sum, cout]
Full Adder
• Logic equations
  • sum = a xor b xor cin
  • carry = (a and b) or (a and cin) or (b and cin)
• Implementation
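The two logic equations can be checked exhaustively against ordinary arithmetic, since a full adder has only 8 input combinations. A minimal sketch; `full_adder_gates` is a hypothetical name, and Python's bitwise `^`, `&`, `|` on single bits stand in for XOR, AND and OR gates.

```python
from itertools import product


def full_adder_gates(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built from the logic equations; returns (cout, sum)."""
    s = a ^ b ^ cin                         # sum = a xor b xor cin
    cout = (a & b) | (a & cin) | (b & cin)  # carry = 1 when at least two inputs are 1
    return cout, s


# Check the gate-level equations against plain arithmetic for all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder_gates(a, b, cin)
    assert 2 * cout + s == a + b + cin, (a, b, cin)
print("equations match a + b + cin for all 8 inputs")
```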
[Figure: 32-bit adder as a chain of full adders (FA), with the carry passing from bit 0 up to bit 31]
Adder - Performance
• 32-bit adder
  • For FAi, cin is the cout of FAi-1
  • So FAi can't produce a result until FAi-1 has settled
• Long tpd
  • tpd(n bits) = n * tpd(full adder)
• This is known as a Ripple Carry Adder
  • Simple, regular, but sloooooooooooow ….
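The ripple effect can be modelled by tracking when each carry becomes valid. This is a simplified timing sketch, assuming every full adder has the same delay `t_fa` from any input to either output and all a, b and cin inputs arrive at t = 0; `ripple_settle_times` is a hypothetical name.

```python
def ripple_settle_times(n_bits: int, t_fa: float = 1.0) -> list[float]:
    """Time at which each sum bit of an n-bit ripple carry adder is guaranteed valid."""
    times = []
    carry_ready = 0.0              # cin of FA0 is valid at t = 0
    for _ in range(n_bits):
        done = carry_ready + t_fa  # FAi settles one FA delay after its carry in
        times.append(done)
        carry_ready = done         # and its cout then feeds FAi+1
    return times


times = ripple_settle_times(32)
print(times[0], times[-1])  # 1.0 32.0, i.e. tpd(32 bits) = 32 * tpd(full adder)
```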
Adder - Performance
• 32-bit adder
  • A ripple carry adder has a long propagation delay!
• Adders are crucial
  • Improving adders can make a big difference
  • 40+ years of intense research
  • Just in binary arithmetic!!
[Figure: carry select adder. A 4-bit ripple carry adder adds a0-3, b0-3 and cin, producing sum0-3 and cout3. Two more 4-bit ripple carry adders add a4-7 and b4-7 with carry in fixed at 0 and at 1, producing sum⁰4-7, cout⁰7 and sum¹4-7, cout¹7. Two multiplexors controlled by cout3 output sum4-7 and carry.]
Carry Select Adder
• 'Standard' n-bit ripple carry adders (n = any suitable value)
• Here we build an 8-bit adder from 4-bit blocks
Carry Select Adder
• The low-order block adds the 4 low-order bits
  • After 4*tpd it will produce a carry out, cout3
• The two high-order blocks 'speculate' on the value of cout3
  • One assumes it will be 0, the other assumes 1
Carry Select Adder
• After 4*tpd we will have:
  • sum0-3 (final sum bits) and cout3 (from the low-order block)
  • sum⁰4-7 and cout⁰7 (from the block assuming cin = 0)
  • sum¹4-7 and cout¹7 (from the block assuming cin = 1)
Carry Select Adder
• cout3 selects the correct sum4-7 and carry out
• All 8 bits + carry are available after 4*tpd(FA) + tpd(multiplexor)
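The 8-bit carry select adder can be modelled functionally: three 4-bit ripple carry blocks, two of them speculating on cout3, and a final selection step standing in for the multiplexors. A minimal sketch; `ripple_add` and `carry_select_add8` are hypothetical names, and the code models only the logic, not the timing (in hardware the three blocks run at the same time).

```python
def ripple_add(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """n-bit ripple carry adder built from full-adder equations; returns (sum, cout)."""
    carry, total = cin, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return total, carry


def carry_select_add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit carry select adder from three 4-bit ripple carry blocks; returns (sum, carry)."""
    a_lo, b_lo = a & 0xF, b & 0xF
    a_hi, b_hi = (a >> 4) & 0xF, (b >> 4) & 0xF
    sum_lo, cout3 = ripple_add(a_lo, b_lo, cin, 4)  # low-order block
    sum0_hi, cout0_7 = ripple_add(a_hi, b_hi, 0, 4)  # speculates cout3 = 0
    sum1_hi, cout1_7 = ripple_add(a_hi, b_hi, 1, 4)  # speculates cout3 = 1
    # Multiplexors: cout3 picks the correct high-order sum bits and carry out.
    sum_hi, carry = (sum1_hi, cout1_7) if cout3 else (sum0_hi, cout0_7)
    return (sum_hi << 4) | sum_lo, carry


print(carry_select_add8(0x3C, 0x15))  # (81, 0): 60 + 21 = 81
print(carry_select_add8(0xF0, 0x20))  # (16, 1): 240 + 32 = 272 = 256 + 16
```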
Carry Select Adder
• This scheme can be generalized to any number of bits
  • Select a suitable block size (e.g. 4, 8)
  • Replicate all blocks except the first:
    • One with cin = 0
    • One with cin = 1
  • Use the final cout from the preceding block to select the correct set of outputs for the current block
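The generalized scheme can be sketched with the block size `k` as a parameter, together with a rough delay estimate. This assumes `n` is a multiple of `k` and uses a simplified timing model (assumption): all blocks settle in parallel after k * tpd(FA), then the selected carry passes through one multiplexor per remaining block. `ripple_add`, `carry_select_add` and `carry_select_delay` are hypothetical names, and the unit delays are illustrative only.

```python
def ripple_add(a: int, b: int, cin: int, k: int) -> tuple[int, int]:
    """k-bit ripple carry block built from full-adder equations; returns (sum, cout)."""
    carry, total = cin, 0
    for i in range(k):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return total, carry


def carry_select_add(a: int, b: int, n: int = 32, k: int = 4, cin: int = 0) -> tuple[int, int]:
    """n-bit carry select adder from k-bit blocks (assumes n is a multiple of k)."""
    mask = (1 << k) - 1
    result, carry = 0, cin
    for j in range(n // k):
        aj, bj = (a >> (j * k)) & mask, (b >> (j * k)) & mask
        if j == 0:
            s, carry = ripple_add(aj, bj, carry, k)     # first block: real carry in
        else:
            s0, c0 = ripple_add(aj, bj, 0, k)           # replica assuming cin = 0
            s1, c1 = ripple_add(aj, bj, 1, k)           # replica assuming cin = 1
            s, carry = (s1, c1) if carry else (s0, c0)  # preceding cout selects
        result |= s << (j * k)
    return result, carry


def carry_select_delay(n: int, k: int, t_fa: float = 1.0, t_mux: float = 1.0) -> float:
    """Rough worst-case delay under the simplified model described above."""
    return k * t_fa + (n // k - 1) * t_mux


for k in (2, 4, 8, 16):
    print(k, carry_select_delay(32, k))  # a 32-bit ripple carry adder needs 32.0
```

With these unit delays, 4-bit and 8-bit blocks both give 11 units for 32 bits, while very small or very large blocks lose much of the benefit; this is the trade-off behind "select a suitable block size".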
Fast Adders
• Many other fast adder schemes have been proposed, e.g.
  • Carry-skip
  • Manchester
  • Carry-save
  • Carry look-ahead
• If implementing an adder (e.g. in programmable logic)
  • do a little research first!
What about that carry in?
• In an ALU, we usually need to do more than just add!
• Subtractions are also common
• Observe:
  • c = a - b is equivalent to
  • c = a + (-b)
• So we can use an adder for subtractions if we can negate the 2nd operand
• Negation in 2's complement arithmetic?
Adder / Subtractor
• Negation in 2's complement arithmetic?
• Rule:
  • Complement each bit
  • Add 1
• e.g.
                Binary   Decimal
                0001      1
    Complement  1110
    Add 1       1111     -1

                0110      6
    Complement  1001
    Add 1       1010     -6
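The complement-and-add-1 rule is easy to check in code on 4-bit patterns like the examples. A minimal sketch, assuming a fixed word width (default 4 bits); `negate` and `to_signed` are hypothetical names.

```python
def negate(x: int, width: int = 4) -> int:
    """2's complement negation: complement each bit, then add 1 (modulo 2**width)."""
    mask = (1 << width) - 1
    complemented = ~x & mask          # complement each bit
    return (complemented + 1) & mask  # add 1, dropping any carry out of the top bit


def to_signed(x: int, width: int = 4) -> int:
    """Interpret a width-bit pattern as a 2's complement number."""
    return x - (1 << width) if x >> (width - 1) else x


for x in (0b0001, 0b0110):
    n = negate(x)
    print(f"{x:04b} ({to_signed(x)}) -> {n:04b} ({to_signed(n)})")
# 0001 (1) -> 1111 (-1)
# 0110 (6) -> 1010 (-6)
```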
[Figure: adder/subtractor built from full adders (FA), with inputs a and b, an add/subtract control that also drives cin, and outputs c and carry]
Adder / Subtractor
• Using an adder
  • Complement each bit of the 2nd operand using an inverter
  • Use the carry in to add 1!
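This is the answer to "why use a full adder for bit 0": its carry in supplies the "+1" of 2's complement negation for free. A minimal sketch of the adder/subtractor, assuming 8-bit operands; `add_sub` is a hypothetical name, and the conditional inversion stands in for the inverters selected by the add/subtract control.

```python
WIDTH = 8  # assumed word size for the sketch


def add_sub(a: int, b: int, subtract: int, width: int = WIDTH) -> tuple[int, int]:
    """subtract = 0: a + b; subtract = 1: a + (complement of b) + 1 = a - b. Returns (result, carry)."""
    mask = (1 << width) - 1
    b_in = (~b & mask) if subtract else b  # select b or its bitwise complement
    carry = subtract                       # cin = 1 adds the 1 when subtracting
    result = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b_in >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry


print(add_sub(9, 5, 0))  # (14, 0)
print(add_sub(9, 5, 1))  # (4, 1)
print(add_sub(5, 9, 1))  # (252, 0): 252 is the 8-bit pattern for -4
```

When subtracting unsigned numbers with this circuit, the carry out is 1 exactly when a >= b, i.e. when no borrow is needed.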