14 Course Review Kai Bu kaibu@zju.edu.cn http://list.zju.edu.cn/kaibu/comparch2018
Email, LinkedIn, Twitter, Weibo... Keep in touch :)
Chapters 1 & 5, Appendices A-D. Disclaimer :) Exam questions may be beyond the coverage of review slides.
Lectures 02-03 Fundamentals of Computer Design
Classes of Parallel Architectures according to the parallelism in the instruction and data streams called for by the instructions: SISD, SIMD, MISD, MIMD
Instruction Set Architecture (ISA)
• the actual programmer-visible instruction set
• the boundary between software and hardware
ISA: Class
• Most are general-purpose register architectures, with operands in either registers or memory locations
• Two popular versions:
  register-memory ISA, e.g., 80x86: many instructions can access memory
  load-store ISA, e.g., ARM, MIPS: only load or store instructions can access memory
ISA: Memory Addressing
• Byte addressing: supports accessing individual bytes of data rather than only larger units called words
• Aligned address: an object of width s bytes at address A is aligned if A mod s = 0
ISA: Addressing Modes
Specify the address of a memory object
• Register: Add R2, R1; R2 <- R2 + R1
• Immediate: Add R2, #3; R2 <- R2 + 3
• Displacement: Add R2, 100(R1); R2 <- R2 + M[100 + R1]
Measuring Performance
• Execution time: the time between the start and the completion of an event
• Throughput: the total amount of work done in a given time
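The alignment rule can be checked directly. The following is a minimal sketch; the function name `is_aligned` and the sample addresses are chosen here for illustration and do not come from the slides.

```python
def is_aligned(address: int, width: int) -> bool:
    """Return True if an object of `width` bytes at `address` is aligned,
    i.e. A mod s == 0 in the slide's notation."""
    return address % width == 0


# Illustrative addresses (assumed, not from the slides).
print(is_aligned(8, 4))  # 4-byte word at address 8: aligned
print(is_aligned(6, 4))  # 4-byte word at address 6: misaligned
print(is_aligned(6, 2))  # 2-byte half word at address 6: aligned
```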
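The three modes can be mimicked with a toy machine state. This is only a sketch: the register values, the memory contents, and the helper names are hypothetical and exist purely to make the three register-transfer statements concrete.

```python
# Hypothetical machine state, for illustration only.
regs = {"R1": 4, "R2": 10}
mem = {104: 7}  # the word at address 100 + R1 when R1 = 4


def add_register(regs, dst, src):
    """Add R2, R1      ; R2 <- R2 + R1"""
    regs[dst] = regs[dst] + regs[src]


def add_immediate(regs, dst, imm):
    """Add R2, #3      ; R2 <- R2 + 3"""
    regs[dst] = regs[dst] + imm


def add_displacement(regs, mem, dst, disp, base):
    """Add R2, 100(R1) ; R2 <- R2 + M[100 + R1]"""
    regs[dst] = regs[dst] + mem[disp + regs[base]]


add_register(regs, "R2", "R1")                 # R2 = 10 + 4
add_immediate(regs, "R2", 3)                   # R2 = 14 + 3
add_displacement(regs, mem, "R2", 100, "R1")   # R2 = 17 + M[104]
print(regs["R2"])  # 24
```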
Measuring Performance • Computer X and Computer Y • X is n times faster than Y
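In the usual textbook convention, "X is n times faster than Y" compares execution times, with performance taken as the reciprocal of execution time:

```latex
\[
n \;=\; \frac{\text{Execution time}_Y}{\text{Execution time}_X}
  \;=\; \frac{\text{Performance}_X}{\text{Performance}_Y}
\]
```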
Quantitative Principles
• Parallelism
• Locality
  temporal locality: recently accessed items are likely to be accessed in the near future
  spatial locality: items whose addresses are near one another tend to be referenced close together in time
Quantitative Principles • Amdahl’s Law
Quantitative Principles
• Amdahl's Law: two factors
  1. Fraction_enhanced: e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
  2. Speedup_enhanced: e.g., 5/2 if the enhanced portion takes 2 seconds where it originally took 5 seconds
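The two factors combine as Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). The sketch below plugs in the two example values from the slide; using them together in a single calculation is an assumption made here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the original
    execution time benefits from a local speedup of `speedup_enhanced`."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Slide values, combined here for illustration (an assumption):
# 20 of 60 seconds can be enhanced, and that part runs 5/2 times faster.
overall = amdahl_speedup(20 / 60, 5 / 2)
print(f"{overall:.2f}")  # 1.25, i.e. 60 s shrinks to 40 s + 8 s = 48 s
```

Even with a 2.5x local speedup, the untouched two thirds of the program caps the overall gain at 1.25x.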
Quantitative Principles • The Processor Performance Equation
IC_i: the number of times instruction i is executed in a program
CPI_i: the average number of clocks per instruction for instruction i
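With these definitions, CPU clock cycles are the sum of IC_i × CPI_i over all instruction types, and CPU time is that sum multiplied by the clock cycle time. A minimal sketch follows; the instruction mix and the 1 ns clock cycle are made-up numbers, not values from the course.

```python
def cpu_clock_cycles(mix):
    """Sum of IC_i * CPI_i over all instruction types.
    `mix` is a list of (IC_i, CPI_i) pairs."""
    return sum(ic * cpi for ic, cpi in mix)


def cpu_time(mix, clock_cycle_time):
    """CPU time = CPU clock cycles * clock cycle time (in seconds)."""
    return cpu_clock_cycles(mix) * clock_cycle_time


# Hypothetical instruction mix: (IC_i, CPI_i) for ALU, load/store, branch.
mix = [(50_000, 1.0), (30_000, 2.0), (20_000, 3.0)]

cycles = cpu_clock_cycles(mix)                    # 170000 cycles
overall_cpi = cycles / sum(ic for ic, _ in mix)   # 1.7 clocks per instruction
seconds = cpu_time(mix, clock_cycle_time=1e-9)    # assumed 1 GHz clock

print(cycles, overall_cpi, seconds)
```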
Lecture 04 Instruction Set Principles
ISA Classification
• Classification basis: the type of internal storage (stack, accumulator, or register)
• ISA classes:
  stack architecture
  accumulator architecture
  general-purpose register architecture (GPR)
ISA Classes: Stack Architecture
• implicit operands on the Top Of the Stack (TOS)
• C = A + B
  Push A
  Push B
  Add
  Pop C
• On Add, the first operand is removed from the stack and the second operand is replaced by the result
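The stack sequence for C = A + B can be traced with a toy interpreter. This is a sketch under stated assumptions: the tuple encoding of instructions, the function name `run_stack`, and the values of A and B are invented here for illustration.

```python
def run_stack(program, memory):
    """Interpret a tiny stack-machine program; operands are implicit on the stack."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(memory[args[0]])   # memory -> top of stack
        elif op == "Pop":
            memory[args[0]] = stack.pop()   # top of stack -> memory
        elif op == "Add":
            first = stack.pop()             # first operand removed from the stack
            stack[-1] = stack[-1] + first   # second operand replaced by the result
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}  # assumed sample values
program = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
run_stack(program, memory)
print(memory["C"])  # 5
```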
ISA Classes: Accumulator Architecture
• one implicit operand: the accumulator; one explicit operand: a memory location
• C = A + B
  Load A
  Add B
  Store C
• The accumulator is both an implicit input operand and the result
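The same computation on an accumulator machine needs only one explicit operand per instruction. A minimal sketch, again with an invented instruction encoding and sample values:

```python
def run_accumulator(program, memory):
    """Interpret a tiny accumulator-machine program.
    Every instruction names one memory location; the accumulator is implicit."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = memory[addr]          # accumulator <- memory
        elif op == "Add":
            acc = acc + memory[addr]    # accumulator is both input and result
        elif op == "Store":
            memory[addr] = acc          # memory <- accumulator
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


acc_memory = {"A": 2, "B": 3, "C": 0}  # assumed sample values
acc_program = [("Load", "A"), ("Add", "B"), ("Store", "C")]
run_accumulator(acc_program, acc_memory)
print(acc_memory["C"])  # 5
```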
ISA Classes: General-Purpose Register Arch
• Only explicit operands: registers or memory locations
• Operand access: directly from memory, or loaded into temporary storage first
ISA Classes: General-Purpose Register Arch
Two Classes:
• register-memory architecture: any instruction can access memory
• load-store architecture: only load and store instructions can access memory
ISA Classes: General-Purpose Register Arch
Two Classes:
• register-memory architecture: any instruction can access memory
• C = A + B
  Load R1, A
  Add R3, R1, B
  Store R3, C
ISA Classes: General-Purpose Register Arch
Two Classes:
• load-store architecture: only load and store instructions can access memory
• C = A + B
  Load R1, A
  Load R2, B
  Add R3, R1, R2
  Store R3, C
Addressing Modes
• How instructions specify the addresses of objects to access
• Types: constant, register, memory location (effective address)
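Both GPR variants compute the same result; they differ in which instructions may touch memory. The sketch below runs the register-memory and load-store sequences on one toy interpreter. The encoding, the rule that names starting with "R" are registers, and the sample values are all assumptions made for illustration.

```python
def run_gpr(program, memory):
    """Interpret a tiny general-purpose-register program.
    Add operands may be registers or memory locations, so the same
    interpreter accepts both the register-memory and load-store sequences."""
    regs = {}

    def value(operand):
        # Assumption of this sketch: names starting with "R" are registers,
        # everything else is a memory location.
        return regs[operand] if operand.startswith("R") else memory[operand]

    for op, *args in program:
        if op == "Load":
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "Store":
            reg, addr = args
            memory[addr] = regs[reg]
        elif op == "Add":
            dst, src1, src2 = args
            regs[dst] = value(src1) + value(src2)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


register_memory = [("Load", "R1", "A"), ("Add", "R3", "R1", "B"), ("Store", "R3", "C")]
load_store = [("Load", "R1", "A"), ("Load", "R2", "B"),
              ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]

result_rm = run_gpr(register_memory, {"A": 2, "B": 3, "C": 0})
result_ls = run_gpr(load_store, {"A": 2, "B": 3, "C": 0})
print(result_rm["C"], result_ls["C"])  # 5 5
```

The load-store version needs one more instruction, but its Add only ever reads registers.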
Lectures 05-07 Pipelining
Pipelining: start executing one instruction before completing the previous one
Pipelined Laundry
[Figure: four tasks A, B, C, D in task order, each with stages of 30, 40, and 20 minutes; pipelined timeline of 30 + 40 + 40 + 40 + 40 + 20 minutes = 3.5 hours]
Observations
• No speedup for an individual task; e.g., A still takes 30 + 40 + 20 = 90 minutes
• But speedup in average task execution time; e.g., 3.5 * 60 / 4 = 52.5 < 30 + 40 + 20 = 90
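The laundry numbers can be reproduced in a few lines. This sketch assumes four identical tasks with stages of 30, 40, and 20 minutes, and that once the pipeline is full a task finishes every 40 minutes (the slowest stage); the variable names are illustrative.

```python
stage_minutes = [30, 40, 20]  # the three stages of one laundry task
num_tasks = 4                 # tasks A, B, C, D

# Without pipelining, every task runs start to finish on its own.
sequential_total = num_tasks * sum(stage_minutes)

# With pipelining, the first task takes the full 90 minutes and each
# later task completes one slowest-stage interval after the previous one.
pipelined_total = sum(stage_minutes) + (num_tasks - 1) * max(stage_minutes)

average_per_task = pipelined_total / num_tasks

print(sequential_total)   # 360 minutes
print(pipelined_total)    # 210 minutes = 3.5 hours
print(average_per_task)   # 52.5 minutes, versus 90 for a single task
```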
MIPS Instruction • at most 5 clock cycles per instruction • IF ID EX MEM WB
MIPS Instruction: IF (instruction fetch)
IR ← Mem[PC];
NPC ← PC + 4;
MIPS Instruction: ID (instruction decode / register fetch)
A ← Regs[rs];
B ← Regs[rt];
Imm ← sign-extended immediate field of IR (lower 16 bits)
MIPS Instruction: EX (execution / effective address)
• Memory reference: ALUOutput ← A + Imm;
• Register-register ALU: ALUOutput ← A func B;
• Register-immediate ALU: ALUOutput ← A op Imm;
• Branch: ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0);
MIPS Instruction: MEM (memory access / branch completion)
• Load: LMD ← Mem[ALUOutput];
• Store: Mem[ALUOutput] ← B;
• Branch: if (Cond) PC ← ALUOutput;
MIPS Instruction: WB (write-back)
• Register-register ALU: Regs[rd] ← ALUOutput;
• Register-immediate ALU: Regs[rt] ← ALUOutput;
• Load: Regs[rt] ← LMD;
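The five stages can be followed for a single load instruction using the same temporaries (NPC, A, Imm, ALUOutput, LMD). This is a simplified sketch, not a model of the real datapath: the instruction arrives already decoded as `rs`, `rt`, and `imm`, instruction memory and IR are not modeled, and the function name and sample state are invented here.

```python
def execute_load(regs, mem, pc, rs, rt, imm):
    """Trace a load of the form `LW rt, imm(rs)` through the five stages.
    Returns the next PC. Assumes the instruction is already decoded."""
    # IF: fetch the instruction (not modeled) and compute the next PC.
    npc = pc + 4
    # ID: read the base register; the immediate is assumed already sign-extended.
    a = regs[rs]
    # EX: effective address, ALUOutput <- A + Imm.
    alu_output = a + imm
    # MEM: LMD <- Mem[ALUOutput].
    lmd = mem[alu_output]
    # WB: Regs[rt] <- LMD.
    regs[rt] = lmd
    return npc


# Hypothetical state: register 1 holds a base address, memory holds one word.
load_regs = {1: 100, 2: 0}
load_mem = {108: 42}
next_pc = execute_load(load_regs, load_mem, pc=0, rs=1, rt=2, imm=8)
print(load_regs[2], next_pc)  # 42 4
```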
MIPS Instruction Demo • Prof. Gurpur Prabhu, Iowa State Univ http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/DLXimplem.html • Load, Store • Register-register ALU • Register-immediate ALU • Branch