CpE 442 Introduction to Computer Architecture The Role of Performance

CpE 442Introduction to Computer ArchitectureThe Role of Performance Instructor: H. H. Ammar

Overview of Today’s Lecture: The Role of Performance • Review from Last Lecture • Definition and Measures of Performance • Summarizing Performance and Performance Pitfalls

Review: What is "Computer Architecture" ° Co-ordination of levels of abstraction Application Operating System Compiler Instruction Set Architecture Instr. Set Proc. I/O system Digital Design Circuit Design ° Under a set of rapidly changing Forces

Review: Levels of Representation temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; High Level Language Program lw $15, 0($2) lw $16, 4($2) sw $16, 0($2) sw $15, 4($2) Compiler Assembly Language Program Assembler 0000 1001 1100 0110 1010 1111 0101 1000 1010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111 Machine Language Program Machine Interpretation Control Signal Specification

Review: Levels of Organization SPARCstation 20 Computer SPARC Processor Memory Devices Control Input Datapath Output

Review: Summary from Last Lecture • All computers consist of five components • Processor: (1) datapath and (2) control • (3) Memory • (4) Input devices and (5) Output devices • Not all “memory” are created equally • Cache: fast (expensive) memory are placed closer to the processor • Main memory: less expensive memory--we can have more • Input and output (I/O) devices has the messiest organization • Wide range of speed: graphics vs. keyboard • Wide range of requirements: speed, standard, cost ... etc. • Least amount of research (so far)

Processor Performance

Metrics of performance Answers per month Operations per second Application Programming Language Compiler (millions) of Instructions per second – MIPS (millions) of (F.P.) operations per second – MFLOP/s ISA Datapath Megabytes per second Control Function Units Cycles per second (clock rate) Transistors Wires Pins

Relating Processor Metrics • CPU execution time = CPU clock cycles/pgm X clock cycle time • or CPU execution time = CPU clock cycles/pgm ÷ clock rate • CPU clock cycles/pgm = Instructions/pgm X CPI the avg. clock cycles per instruction • or CPI = CPU clock cycles/pgm ÷ Instructions/pgm • CPI tells us something about the Instruction Set Architecture, the Implementation of that architecture, and the program measured

CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Aspects of CPU Performance instr. count CPI clock rate Program Compiler Instr. Set Arch. Organization Technology

CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Aspects of CPU Performance instr count CPI clock rate Program X (x) Compiler X (x) Instr. Set. X X Organization X X Technology X

Organizational Trade-offs Application Programming Language Compiler ISA Instruction Mix Datapath CPI Control Function Units Transistors Wires Pins Cycle Time

CPI “Average cycles per instruction” Invest Resources where time is Spent! • CPI = (CPU Time * Clock Rate) / Instruction Count • = Clock Cycles / Instruction Count n CPU time = ClockCycleTime * S CPI * I i i i = 1 n "instruction frequency" CPI = S CPI * F where F = I i i i i i = 1 Instruction Count

Example Base Machine (Reg / Reg) Op Freq(Fi) CPI(i) % Time ALU 50% 1 .5 33% Load 20% 2 .4 27% Store 10% 2 .2 13% Branch 20% 2 .4 27% 1.5 Typical Mix The CPI = 1.5 cycles per instruction Assignment 1: Turn in the solution of the following problems from the text book By Thursday September 4, Chapter 2, Exercises Section, problems number 2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15

Assume a program of 1 million instructions, Compare the performance of Base Machine (B) with the above CPI, 1 GHZ clock, and Enhanced Machine (E) with 1.333 GHZ and a one cycle increase for L/S And branch instructions Enhanced Machine (Reg / Reg) Op Freq CPI(i) % Time ALU 50% 1 .5 25% Load 20% 3 .6 30% Store 10% 3 .3 15% Branch20% 3 .6 30% 2.0

Perf. of machine X = 1 / exec. Time of prog on machine X Perf. of E / Perf. of B = exec. Time of B / exec. Time of E = 1.5 * 1 / 2 * 0.75 = 1 Performance of B is similar to that of E, No gain in performance

Marketing Metrics • MIPS = Instruction Count / (Time * 10^6) • = Clock Rate / (CPI * 10^6) • machines with different instruction sets ? • programs with different instruction mixes ? • dynamic frequency of instructions • uncorrelated with performance • MFLOP/S= FP Operations / (Time * 10^6) • machine dependent • often not where time is spent

Example showing why MIPS can failCompare performance with Compilers 1 and 2 for a given program on a given machine Instruction Count in Billion for instruction classes A B CCompiler 1 5 1 1Compiler 2 10 1 1clock cycles 1 2 3Clock cycles using compiler1 = 10 BillionClock cycles using compiler2 = 15 Billionassuming 1GHZ clockCPU Time 1 = 10 secsCPU Time 2 = 15 secsyet the MIPS rating isMIPS 1 = (instr. Count/cpu time in sec x 10^6) = 700MIPS 2 = 800

Why Do Benchmarks? • How we evaluate differences • Different systems • Changes to a single system • Provide a target • Benchmarks should represent large class of important programs • Improving benchmark performance should help many programs • For better or worse, benchmarks shape a field • Good ones accelerate progress • good target for development • Bad benchmarks hurt progress • help real programs v. sell machines/papers? • Inventions that help real programs don’t help benchmark

Programs to Evaluate Processor Performance • (Toy) Benchmarks • 10-100 line • e.g.,: sieve, puzzle, quicksort • Synthetic Benchmarks • attempt to match average frequencies of real workloads • e.g., Whetstone, dhrystone • Kernels • Time critical excerpts Real programs • e.g., gcc, spice

Successful Benchmark: SPEC • EE Times + 5 companies band together to perform Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC • Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O

SPEC first round • First round 1989; 10 programs, single number to summarize performance • One program: 99% of time in single line of code • New front-end compiler could improve dramatically

SPEC second round, SPEC95 • 8 integer benchmarks in C and 10 floating pt benchmarks in Fortran

Amdahl's Law Speedup due to enhancement E: ExTime w/o E Performance w/ E Speedup(E) = -------------------- = --------------------- ExTime w/ E Performance w/o E Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected then, ExTime(with E) = ((1-F) + F/S) X ExTime(without E) Speedup(with E) = ExTime(without E) ÷ ((1-F) + F/S) X ExTime(without E) <= 1/(1-F) speed up is bounded by this factor

CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Performance Evaluation Summary • Time is the measure of computer performance! • Good products created when have: • Good benchmarks • Good ways to summarize performance • If not good benchmarks and summary, then choice between improving product for real programs vs. improving product to get more sales=> sales almost always wins • Remember Amdahl’s Law: Speedup is limited by unimproved part of program

CpE 442 Introduction to Computer Architecture The Role of Performance

CpE 442 Introduction to Computer Architecture The Role of Performance

Presentation Transcript

CpE 442 Computer Architecture and Engineering Designing Single Cycle Control

CpE 442 Computer Architecture and Engineering MIPS Instruction Set Architecture

CpE 442 Intro. To Computer Architecure

Introduction To Computer Architecture

CpE 442 Introduction To Computer Architecture Lecture 1

Introduction To Computer Architecture

Introduction to Computer Architecture

CS455/CpE 442 Computer Architecture and Engineering MIPS Instruction Set Architecture

CPE 323 Introduction to Embedded Computer Systems: The MSP430X Architecture

CpE 442 Virtual Memory

CPE 323 Introduction to Embedded Computer Systems: The MSP430 System Architecture

Computer Architecture Chapter 2 The Role of Performance

Introduction to Computer Architecture

Introduction To Computer Architecture

CpE 442 Memory System

CpE 442 Introduction to Computer Architecture The Role of Performance

CpE 442 Virtual Memory

CS455/CpE 442 Computer Architecture and Engineering MIPS Instruction Set Architecture

CpE 442 Memory System

Introduction To Computer Architecture