
EECS 470



  1. EECS 470 Computer Architecture Lecture 2 Coverage: Chapters 1-2

  2. A Quantitative Approach • Hardware systems performance is generally easy to quantify • Machine A is 10% faster than Machine B • Of course Machine B’s advertising will show the opposite conclusion • Example: Pentium 4 vs. AMD Hammer • Many software systems tend to have much more subjective performance evaluations.

  3. Measuring Performance • Use Total Execution Time: (1/n) · Σ_{i=1}^{n} Time_i • A is 3 times faster than B for programs P1, P2 • Issue: emphasizes long-running programs

  4. Measuring Performance • Weighted Execution Time: Arithmetic mean (AM) = Σ_{i=1}^{n} Weight_i · Time_i, where Σ_{i=1}^{n} Weight_i = 1 • What if P1 is executed far more frequently?
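A minimal C sketch of the weighted arithmetic mean above; the function name and arguments are illustrative, not from the slides:

    /* Weighted arithmetic mean of execution times.
       Assumes the weights sum to 1.0. */
    double weighted_am(const double *time, const double *weight, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += weight[i] * time[i];
        return sum;
    }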

  5. Measuring Performance • Normalized Execution Time: • Compare machine performance to a reference machine and report a ratio. • SPEC ratings measure relative performance to a reference machine.

  6. Example using execution times • Conclusion: B is faster than A • It is 1001/110 = 9.1 times faster

  7. Averaging Performance Over Benchmarks • Arithmetic mean (AM) = (1/n) · Σ_{i=1}^{n} Time_i • Geometric mean (GM) = (Π_{i=1}^{n} Time_i)^{1/n} • Harmonic mean (HM) = n / Σ_{i=1}^{n} (1/Rate_i)
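The three means as a C sketch (names illustrative); the GM works in logarithms so the product doesn't overflow, and all inputs are assumed positive:

    #include <math.h>

    /* Arithmetic mean of execution times. */
    double am(const double *time, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += time[i];
        return sum / n;
    }

    /* Geometric mean: nth root of the product, computed via logs. */
    double gm(const double *time, int n) {
        double logsum = 0.0;
        for (int i = 0; i < n; i++) logsum += log(time[i]);
        return exp(logsum / n);
    }

    /* Harmonic mean of rates: n over the sum of reciprocals. */
    double hm(const double *rate, int n) {
        double recip = 0.0;
        for (int i = 0; i < n; i++) recip += 1.0 / rate[i];
        return n / recip;
    }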

  8. Which is the right Mean? • Arithmetic when dealing with execution time • Harmonic when dealing with rates • flops • MIPS • Hertz • Geometric mean gives an “equi-weighted” average

  9. Use Harmonic Mean with Rates • Notice that the total-time ordering is preserved in the HM of the rates (mflops) from the table above

  10. Normalized Times • Don't take the AM of normalized execution times – the ranking changes depending on which machine you normalize to • GM doesn't track total execution time

  11. Notes & Benchmarks • AM ≥ GM • GM(Xi) / GM(Yi) = GM(Xi / Yi) • The GM is unaffected by normalizing – it just doesn't track execution time • Why does SPEC use it? • SPEC – System Performance Evaluation Cooperative • http://www.specbench.org/ • EEMBC – benchmarks for embedded applications: Embedded Microprocessor Benchmark Consortium • http://www.eembc.org/

  12. Amdahl's Law • Rule of thumb: make the common case faster • Execution time_new = Execution time_old × [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced] • Attack the longest-running part until it is no longer the longest; repeat
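A one-function C sketch of the formula, with a hypothetical worked example in the comment:

    /* Amdahl's Law: new execution time given the old time, the
       fraction of it that is enhanced, and the speedup of that part. */
    double amdahl_new_time(double time_old, double frac, double speedup) {
        return time_old * ((1.0 - frac) + frac / speedup);
    }

    /* Example (hypothetical numbers): speeding up 80% of a program by 4x
       gives 1.0 * (0.2 + 0.8/4.0) = 0.4, i.e., a 2.5x overall speedup. */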

  13. Instruction Set Design • Software Systems: named variables; complex semantics. • Hardware systems: tight timing requirements; small storage structures; simple semantics • Instruction set: the interface between very different software and hardware systems

  14. Design decisions • How much “state” is in the microarchitecture? • Registers; Flags; IP/PC • How is that state accessed/manipulated? • Operand encoding • What commands are supported? • Opcode; opcode encoding

  15. Design Challenges: or why is architecture still relevant? • Clock frequency is increasing • This changes the number of levels of gates that can be completed each cycle, so old designs don't work • It also tends to increase the ratio of time spent on wires (fixed speed of light) • Power • Faster chips are hotter; bigger chips are hotter

  16. Design Challenges (cont) • Design Complexity • More complex designs to fix frequency/power issues lead to increased development/testing costs • Failures (design or transient) can be difficult to understand (and fix) • We seem far less willing to live with hardware errors (e.g., FDIV) than software errors • which are often dealt with through upgrades – that we pay for!

  17. Techniques for Encoding Operands • Explicit operands: • Includes a field to specify which state data is referenced • Example: register specifier • Implicit operands: • All state data can be inferred from the opcode • Example: function return (CISC-style)

  18. Accumulator • Architectures with one implicit register • Acts as source and/or destination • One other source explicit • Example: C = A + B • Load A // (Acc)umulator ← A • Add B // Acc ← Acc + B • Store C // C ← Acc • Ref: “Instruction Level Distributed Processing: Adapting to Shifting Technology”

  19. Stack • Architectures with an implicit “stack” • Acts as source(s) and/or destination • Push and Pop operations have 1 explicit operand • Example: C = A + B • Push A // Stack = {A} • Push B // Stack = {A, B} • Add // Stack = {A+B} • Pop C // C ← A+B ; Stack = {} • Compact encoding; may require more instructions though
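A minimal stack-machine sketch in C for the C = A + B sequence above; the operations and values are illustrative, not a real ISA:

    #include <stdio.h>

    double stack[16];
    int sp = 0;                        /* next free slot */

    void push(double v) { stack[sp++] = v; }
    double pop(void)    { return stack[--sp]; }
    void add(void)      { double b = pop(), a = pop(); push(a + b); }

    int main(void) {
        push(1.0);                     /* Push A   Stack = {A}    */
        push(2.0);                     /* Push B   Stack = {A, B} */
        add();                         /* Add      Stack = {A+B}  */
        printf("C = %g\n", pop());     /* Pop C    Stack = {}     */
        return 0;
    }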

  20. Registers • Most general (and common) approach • Small array of storage • Explicit operands (register file index) • Example: C = A + B
  Load/store:           Register-memory:
  Load R1, A            Load R1, A
  Load R2, B            Add R3, R1, B
  Add R3, R1, R2        Store R3, C
  Store R3, C

  21. Memory • Big array of storage • More complex ways of indexing than registers • Build addressing modes to support efficient translation of software abstractions • Uses less space in the instruction than a 32-bit immediate field • A[i]: use base (A) + displacement (i) (scaled?) • a.ptr: use base (a) + displacement (offset of ptr)
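A C sketch of how a compiler might lower these accesses to base + displacement; the names and types are illustrative:

    struct S { int x; int *ptr; };

    int A[100];

    int  load_elem(int i)       { return A[i]; }   /* base A + (i scaled by 4) */
    int *load_ptr(struct S *s)  { return s->ptr; } /* base s + offset of ptr   */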

  22. Addressing modes
  Register            Add R4, R3
  Immediate           Add R4, #3
  Base/Displacement   Add R4, 100(R1)
  Register Indirect   Add R4, (R1)
  Indexed             Add R4, (R1+R2)
  Direct              Add R4, (1001)
  Memory Indirect     Add R4, @(R3)
  Autoincrement       Add R4, (R2)+

  23. Other Memory Issues • What is the size of each element in memory?
  Byte:       0 – 255
  Half word:  0 – 65535
  Word:       0 – ~4 billion

  24. Other Memory Issues • Big-endian or little-endian? • Store 0x114488FF at address 0x000:
  Big-endian (address points to the most significant byte):      11 44 88 FF
  Little-endian (address points to the least significant byte):  FF 88 44 11
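A runtime byte-order check in C, using the slides' value 0x114488FF:

    #include <stdio.h>

    int main(void) {
        unsigned int word = 0x114488FF;
        unsigned char *p = (unsigned char *)&word;
        /* Big-endian: the first byte is 0x11 (most significant).
           Little-endian: the first byte is 0xFF (least significant). */
        printf("%s-endian\n", (*p == 0x11) ? "big" : "little");
        return 0;
    }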

  25. Other Memory Issues • Non-word loads? • ldb R3, (000) • Memory at 0x000: 11 44 88 FF • R3 ← 0x00000011

  26. Other Memory Issues • Non-word loads? • ldb R3, (003) • The byte at 0x003 is FF • R3 ← 0xFFFFFFFF (sign extended)

  27. Other Memory Issues • Non-word loads? • ldbu R3, (003) • The byte at 0x003 is FF • R3 ← 0x000000FF (zero filled)
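The ldb/ldbu distinction above, mimicked in C with signed vs. unsigned char:

    #include <stdio.h>

    int main(void) {
        signed char   sb = (signed char)0xFF;  /* like ldb:  sign-extended */
        unsigned char ub = 0xFF;               /* like ldbu: zero-filled   */
        printf("ldb  -> 0x%08X\n", (unsigned)(int)sb);  /* 0xFFFFFFFF */
        printf("ldbu -> 0x%08X\n", (unsigned)ub);       /* 0x000000FF */
        return 0;
    }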

  28. Other Memory Issues • Alignment? • Word accesses: only addresses ending in 00 • Half-word accesses: only addresses ending in 0 • Byte accesses: any address • ldw R3, (002) is illegal! • Why is it important to be aligned? How can it be enforced?
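An alignment check like the one hardware (or a simulator) might apply; a sketch assuming 4-byte words:

    #include <stdint.h>

    int word_aligned(uint32_t addr) { return (addr & 0x3) == 0; }  /* ends in 00 */
    int half_aligned(uint32_t addr) { return (addr & 0x1) == 0; }  /* ends in 0  */

    /* word_aligned(0x002) == 0, so ldw R3, (002) would trap. */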

  29. Techniques for Encoding Operators • Opcode is translated to control signals that • direct data (MUX control) • select the operation for the ALU • set read/write selects for register/memory/PC • Tradeoff between how flexible the control is and how compact the opcode encoding is • Microcode – direct control of signals (Improv) • Opcode – compact representation of a set of control signals • You can make decode easier with careful opcode selection (as done in HW1)
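A decode sketch in C for a fixed 32-bit encoding; the field positions are a MIPS-like assumption for illustration, not from the slides:

    #include <stdint.h>

    typedef struct { uint32_t opcode, rs, rt, rd; } Decoded;

    Decoded decode(uint32_t instr) {
        Decoded d;
        d.opcode = (instr >> 26) & 0x3F;  /* top 6 bits select the operation */
        d.rs     = (instr >> 21) & 0x1F;  /* 5-bit register specifiers       */
        d.rt     = (instr >> 16) & 0x1F;
        d.rd     = (instr >> 11) & 0x1F;
        return d;
    }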

  30. Handling Control Flow • Conditional branches (short range) • Unconditional branches (jumps) • Function calls • Returns • Traps (OS calls and exceptions) • Predicates (conditional retirement)

  31. Encoding branch targets • PC-relative addressing • Makes linking code easier • Indirect addressing • Jumps into shared libraries, virtual functions, case/switch statements • Some unusual modes to simplify target address calculation • (segment offset) or (trap number)
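A PC-relative target calculation in C; the 16-bit word offset relative to PC+4 is a MIPS-style assumption, not from the slides:

    #include <stdint.h>

    uint32_t branch_target(uint32_t pc, int16_t offset) {
        /* Sign-extend the offset, scale to bytes, add to PC+4. */
        return pc + 4 + (int32_t)offset * 4;
    }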

  32. Condition codes • Flags • Implicit: flag(s) specified in opcode (bgt) • Flag(s) set by earlier instructions (compare, add, etc.) • Register • Uses a register; requires explicit specifier • Comparison operation • Two registers with compare operation specified in opcode.

  33. Higher Level Semantics: Functions • Function call semantics • Save the PC of the next instruction (the return address) • Manage parameters • Allocate space on stack • Jump to function • Simple approach: • Use a jump instruction + other instructions • Complex approach: • Build implicit operations into a new “call” instruction

  34. Role of the Compiler • Compilers make the complexity of the ISA (from the programmer’s point of view) less relevant • Non-orthogonal ISAs are more challenging • State allocation (register allocation) is better left to compiler heuristics • Complex semantics lead to more global optimization – easier for a machine to do • People are good at optimizing 10 lines of code. Compilers are good at optimizing 10M lines.

  35. Next time • Compiler optimizations • Interaction between compilers and architectures • Higher level machine codes (Java VM) • Starting Pipelining: Appendix A
