
Performance Computer Architecture – CS401 Erkay Savas Sabanci University


Presentation Transcript


  1. Performance Computer Architecture – CS401 Erkay Savas Sabanci University Erkay Savas

  2. Performance • What is performance? • How do we measure performance? • Performance metrics • Performance evaluation • Why does some hardware perform better than other hardware on different programs? • Which hardware factors are related to overall system performance? • How does the machine's instruction set affect performance?

  3. Airplane Analogy

     Airplane          Passenger capacity   Range (miles)   Speed (m.p.h.)   Passenger throughput (passengers x m.p.h.)
     Boeing 777        375                  4630            610              228750
     Boeing 747        470                  4150            610              286700
     Airbus A3xx       656                  8400            600              393600
     Concorde          132                  4000            1350             178200
     Douglas DC-8-50   146                  8720            544              79424

     • Which of these airplanes has the best performance?
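
The answer to the airplane question depends on which metric is chosen. The following minimal Python sketch ranks the airplanes by each metric; the figures are taken from the slide's table, while the names `AIRPLANES`, `metrics`, and `best_by` are illustrative assumptions rather than part of the original slides.

```python
# Airplane data from the slide: (passenger capacity, range in miles, speed in m.p.h.)
AIRPLANES = {
    "Boeing 777": (375, 4630, 610),
    "Boeing 747": (470, 4150, 610),
    "Airbus A3xx": (656, 8400, 600),
    "Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}


def metrics(name):
    """Return all metrics for one airplane, including derived throughput."""
    capacity, range_miles, speed = AIRPLANES[name]
    return {
        "capacity": capacity,
        "range": range_miles,
        "speed": speed,
        # Passenger throughput = passengers x m.p.h.
        "throughput": capacity * speed,
    }


def best_by(metric):
    """Return the airplane with the largest value of the given metric."""
    return max(AIRPLANES, key=lambda name: metrics(name)[metric])


if __name__ == "__main__":
    for metric in ("capacity", "range", "speed", "throughput"):
        print(f"best {metric}: {best_by(metric)}")
```

Different metrics pick different winners, which is the point of the analogy: "best performance" is undefined until the metric is fixed.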

  4. Computer Performance • Response time (latency): How long does it take for my job to run? How long does it take to execute a program? How long must I wait for a database query? • Throughput: How many jobs can the machine run at once? What is the average execution rate? How much work is getting done? • If we upgrade a machine with a new processor, which of the two do we improve? • If we add a new machine, which of the two do we improve?

  5. Which Time to Measure? • Elapsed time (wall-clock time, response time): counts everything (disk and memory accesses, I/O, operating system overhead, work on other processes); useful, but not always good for comparison purposes • CPU (execution) time: the time the CPU spends computing for the user task; it does not include time spent waiting for I/O or running other programs • User CPU time: CPU time spent within the program • System CPU time: CPU time spent in the operating system performing tasks on behalf of the program

  6. CPU Time • The Unix time command reflects this breakdown; for example, it may report: 90.7u 12.9s 2:39 65% • Interpretation: user CPU time is 90.7 s • System CPU time is 12.9 s • Elapsed time is 2 min 39 s = 159 s • CPU time is 65% of the total elapsed time: (90.7 + 12.9)/159 ≈ 0.65
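
The interpretation of the time report can be checked with a short Python sketch. It assumes the four-field layout shown on the slide (user seconds with a `u` suffix, system seconds with an `s` suffix, elapsed time as `minutes:seconds`, and a percentage); the function name `parse_time_report` is hypothetical, and real shells may format their output differently.

```python
def parse_time_report(report):
    """Parse a report such as '90.7u 12.9s 2:39 65%' into seconds.

    Assumes exactly the four-field layout from the slide; this is an
    illustrative parser, not a general one for every shell's output.
    """
    user_field, system_field, elapsed_field, _percent_field = report.split()
    user = float(user_field.rstrip("u"))
    system = float(system_field.rstrip("s"))
    minutes, seconds = elapsed_field.split(":")
    elapsed = 60 * int(minutes) + float(seconds)
    return {
        "user": user,
        "system": system,
        "elapsed": elapsed,
        # Fraction of the elapsed time during which the CPU worked on this job
        "cpu_fraction": (user + system) / elapsed,
    }


if __name__ == "__main__":
    result = parse_time_report("90.7u 12.9s 2:39 65%")
    print(f"elapsed: {result['elapsed']:.0f} s")
    print(f"CPU share of elapsed time: {result['cpu_fraction']:.1%}")
```

The remaining share of the elapsed time was spent waiting for I/O or running other programs.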

  7. A Definition of Performance • For some program running on machine X: $\text{Performance}_X = 1/\text{Execution\_time}_X$ • Machine X is said to be “n times faster” than machine Y if $\text{Performance}_X/\text{Performance}_Y = \text{Execution\_time}_Y/\text{Execution\_time}_X = n$ • Example: Machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds. How much faster is A than B?
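
The example can be worked through with a small Python sketch of the definition. The function name `times_faster` is an illustrative assumption; the execution times are the ones given on the slide.

```python
def times_faster(time_x_s, time_y_s):
    """Return n such that machine X is 'n times faster' than machine Y.

    Since performance = 1 / execution time, the performance ratio
    Performance_X / Performance_Y equals Execution_time_Y / Execution_time_X.
    """
    return time_y_s / time_x_s


if __name__ == "__main__":
    # Machine A: 10 s, machine B: 15 s (values from the slide)
    n = times_faster(10.0, 15.0)
    print(f"A is {n} times faster than B")
```

With these numbers the ratio is 15/10, so A is 1.5 times faster than B.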

  8. Metrics of Performance • “Time to execute a program” is the ultimate metric of performance • However, it is convenient to inspect other metrics as well when we examine the details of a machine • Computers use a clock that runs at a constant rate and determines when events take place in hardware • These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods) • Clock rate (frequency) is the inverse of the clock period

  9. Clock Cycles • [Figure: clock signal along a time axis; events often start on the rising edge of the clock] • Clock “ticks” indicate when to start activities • Instead of reporting execution time in seconds, we often use cycles

  10. Clock Cycle • Cycle time (CT) = time between ticks = seconds per cycle • Cycle count (CC): the number of clock cycles needed to execute a program • Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/s) • A 200 MHz clock has a cycle time of $1/(200 \times 10^6)$ s = ? ns • A 4 GHz clock has a cycle time of $1/(4 \times 10^9)$ s = ? ns
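
The two cycle-time questions follow directly from the fact that cycle time is the inverse of clock rate. A minimal Python sketch, with the helper name `cycle_time_ns` chosen for illustration:

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds: the inverse of the clock rate."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return seconds_per_cycle * 1e9  # 1 s = 10^9 ns


if __name__ == "__main__":
    print(f"200 MHz: {cycle_time_ns(200e6):.2f} ns")
    print(f"4 GHz:   {cycle_time_ns(4e9):.2f} ns")
```

This gives 5 ns for the 200 MHz clock and 0.25 ns for the 4 GHz clock.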

  11. CPI • CPI: clock cycles per instruction • The average number of cycles spent on an instruction • $\text{CC} = \text{IC} \times \text{CPI}$ • Hard to compute • It is useful when comparing the performance of two machines with the same ISA (why?) • Example: two machines with the same ISA. For a certain program we have • Machine A: CPI = 2.0 • Machine B: CPI = 1.2 • Which machine is faster? • What if machine A uses a 250 ps cycle time and machine B a 500 ps cycle time?
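
Because both machines share an ISA, the same program has the same instruction count (IC) on each, so the comparison reduces to CPI times cycle time. The sketch below uses the slide's numbers; the function name is an illustrative assumption.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


if __name__ == "__main__":
    # Same ISA and same program, so IC is identical and cancels in the ratio.
    machine_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
    machine_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)
    print(f"A: {machine_a:.0f} ps per instruction")
    print(f"B: {machine_b:.0f} ps per instruction")
    print(f"A is {machine_b / machine_a:.2f} times faster than B")
```

With equal cycle times B would win on CPI alone; with 250 ps against 500 ps, A needs 500 ps per instruction and B needs 600 ps, so A is 1.2 times faster.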

  12. Improving Performance • So, to improve performance: • Increase the clock frequency (i.e., decrease the clock period) • Reduce the number of clock cycles per program ($\text{IC} \times \text{CPI}$)

  13. Instruction = Cycle? • No! • The number of cycles per instruction depends on how the instructions are implemented in hardware • The number differs from processor to processor (even with the same ISA)

  14. The Reason • Different operations take different numbers of cycles • Multiplication takes longer than addition • Floating-point operations take longer than integer operations • Access to a register is much faster than access to main memory

  15. Simple Formulae for CPU Time • $\text{CPU execution time} = \text{CPU clock cycles for a program} \times \text{clock cycle time}$ (i.e., $\text{CC} \times \text{CT}$) • $\text{CPU execution time} = \text{CPU clock cycles for a program} / \text{clock rate}$ • We can write $\text{CPU clock cycles for a program} = \text{IC} \times \text{CPI}$ • Then $\text{CPU execution time} = (\text{IC} \times \text{CPI}) / \text{clock rate}$

  16. Example • Computer A has an 800 MHz clock • It runs our favorite program in 15 s • Our goal: design computer B with the same ISA so that it runs the same program in 8 s • We will use a new technology that can increase the clock rate; however, it will also increase CPI by a factor of 1.25 • What clock rate should we aim for?
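
One way to work this example is to apply the CPU time formula twice. The Python sketch below assumes that "increase CPI by 1.25" means CPI is multiplied by 1.25 (the instruction count is unchanged because the ISA is the same); the function name is hypothetical.

```python
def required_clock_rate_hz(rate_a_hz, time_a_s, target_time_b_s, cpi_factor):
    """Clock rate machine B needs to hit the target execution time.

    Assumption: cpi_factor is multiplicative, and IC is the same on both
    machines because they share an ISA and run the same program.
    """
    # CPU time = cycles / clock rate  =>  cycles_A = time_A x rate_A
    cycles_a = time_a_s * rate_a_hz
    # cycles = IC x CPI, so scaling CPI scales the cycle count by the same factor
    cycles_b = cpi_factor * cycles_a
    return cycles_b / target_time_b_s


if __name__ == "__main__":
    rate_b = required_clock_rate_hz(800e6, 15.0, 8.0, 1.25)
    print(f"Machine B needs about {rate_b / 1e9:.3f} GHz")
```

Under that assumption, machine A executes 12 billion cycles, machine B needs 15 billion, and finishing them in 8 s requires 1.875 GHz.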

  17. Performance • Performance is determined by execution time (CPU time) • We also have other indicators • # of cycles to execute the program • # of instructions in the program (IC) • # of cycles per second • Average # of cycles per instruction (CPI) • Average # of instructions per second • Common pitfall: thinking that one of these variables alone is indicative of performance when it really isn’t

  18. Number of Instructions Example • A compiler designer has the following two alternatives for generating a certain piece of code with instructions A (1 cycle), B (2 cycles), and C (3 cycles): • $2 \times 10^6$ of A, $10^6$ of B, and $2 \times 10^6$ of C ($\text{IC} = 5 \times 10^6$) • $4 \times 10^6$ of A, $10^6$ of B, and $10^6$ of C ($\text{IC} = 6 \times 10^6$) • Which code sequence is faster?
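
The two code sequences can be compared by counting cycles rather than instructions. A minimal Python sketch using the slide's instruction mixes; the names `CYCLES_PER_CLASS`, `total_cycles`, and `average_cpi` are illustrative assumptions.

```python
# Cycles per instruction class, as given on the slide
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix):
    """Total clock cycles for an instruction mix {class: instruction count}."""
    return sum(count * CYCLES_PER_CLASS[cls] for cls, count in mix.items())


def average_cpi(mix):
    """Average cycles per instruction: total cycles / instruction count."""
    return total_cycles(mix) / sum(mix.values())


if __name__ == "__main__":
    sequence_1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
    sequence_2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}
    for label, mix in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
        print(f"{label}: {total_cycles(mix):,} cycles, CPI = {average_cpi(mix):.2f}")
```

Sequence 2 executes more instructions (6 million against 5 million) but needs fewer cycles (9 million against 10 million), so it is the faster one on the same clock.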

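  19. MIPS • Millions of Instructions Per Second: $\text{MIPS} = \text{IC}/(\text{Execution\_time} \times 10^6)$ • $\text{MIPS} = \text{IC}/(\text{\# of clock cycles} \times \text{cycle time} \times 10^6)$ • $\text{MIPS} = (\text{IC} \times \text{clock rate})/(\text{IC} \times \text{CPI} \times 10^6)$ • $\text{MIPS} = \text{clock rate}/(\text{CPI} \times 10^6)$ • For a fixed instruction count, a faster machine has a higher MIPS rating: $\text{Execution\_time} = \text{IC}/(\text{MIPS} \times 10^6)$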

  20. A MIPS Example • A computer with 500 MHz clock • Three different classes of instructions: • A (1 cycle), B (2 cycles), C (3 cycles) • Two compilers used to produce code for a large piece of software. • Compiler 1: • 5 billion A, 1 billion B, and 1 billion C instructions. • Compiler 2: • 10 billion A, 1 billion B, and 1 billion C instructions. • Which sequence will be faster according to execution time? • Which sequence will be faster according to MIPS?
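
Both questions can be answered by computing execution time and MIPS side by side. The Python sketch below uses the slide's clock rate and instruction mixes; the function names are illustrative assumptions.

```python
CLOCK_RATE_HZ = 500_000_000  # 500 MHz, from the slide
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def execution_time_s(mix):
    """CPU time = total clock cycles / clock rate."""
    cycles = sum(count * CYCLES_PER_CLASS[cls] for cls, count in mix.items())
    return cycles / CLOCK_RATE_HZ


def mips(mix):
    """MIPS = instruction count / (execution time x 10^6)."""
    instruction_count = sum(mix.values())
    return instruction_count / (execution_time_s(mix) * 1e6)


if __name__ == "__main__":
    compiler_1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
    compiler_2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
    for label, mix in (("compiler 1", compiler_1), ("compiler 2", compiler_2)):
        print(f"{label}: {execution_time_s(mix):.0f} s, {mips(mix):.0f} MIPS")
```

Compiler 1's code finishes in 20 s at 350 MIPS, while compiler 2's code takes 30 s at 400 MIPS: the code with the higher MIPS rating is the slower one.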

  21. Problems of MIPS • MIPS specifies the instruction execution rate • MIPS does not take into account the capabilities of the instructions • Thus, MIPS cannot be used to compare computers with different ISAs • MIPS is not constant even on a single machine; it depends on the application • As we saw in the previous example, MIPS can vary inversely with performance

  22. CPI Example • CPI • Machine A: CPI = 10/7 = 1.43 • Machine B: CPI = 15/12 = 1.25 • CPU time • $\text{CPU time} = (\text{IC} \times \text{CPI}) / \text{clock rate}$ • Let us assume both machines use a 200 MHz clock

  23. Overview • A given program will require • Some number of instructions • Some number of clock cycles • Some number of seconds • Vocabulary • Cycle time: (micro- or nano-) seconds per cycle • Clock rate (frequency): cycles per second • CPI: clock cycles per instruction • MIPS: millions of instructions per second • MFLOPS: millions of floating-point operations per second

  24. Performance • Performance is ultimately determined by execution time • Is any of the following metrics, by itself, a good measure of performance? Why? • # of cycles to execute a program • # of instructions in a program • # of cycles per second • Average # of cycles per instruction • Average # of instructions per second

  25. Question • Assuming two machines have the same ISA, which of the following quantities are identical? • Clock rate • CPI • Execution time • # of instructions • MIPS

  26. Program Performance

     HW or SW component     Affects what?             How?
     Algorithm              IC, possibly CPI
     Programming language   IC, CPI
     Compiler               IC, CPI
     ISA                    IC, clock rate, CPI

  27. Benchmarks • Programs specifically chosen to measure performance • They must reflect the user's typical workload • Benchmark types • Real applications • Small benchmarks • Benchmark suites • Synthetic benchmarks

  28. Real Applications • Workload: the set of programs a typical user runs day in and day out • Using these real applications as benchmarks is a direct way of comparing the execution time of the workload on two machines • Using real applications as benchmarks has certain drawbacks: • They are usually big • They take time to port to different machines • They take considerable time to execute • It is hard to observe the effect of a particular improvement technique

  29. Comparing & Summarizing Performance • A is 100 times faster than B for program 1 • B is 10 times faster than A for program 2 • For total performance, the arithmetic mean of the execution times over the $n$ programs is used: $\text{AM} = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$

  30. Arithmetic Mean • If the programs in the workload do not run equally often, we have to use the weighted arithmetic mean • Suppose that program 1 runs 10 times as often as program 2. Which machine is faster?
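
The effect of weighting can be illustrated with a short Python sketch. The transcript does not preserve the execution times, so the values below are hypothetical, chosen only to be consistent with "A is 100 times faster for program 1" and "B is 10 times faster for program 2"; the function name `weighted_mean` is also an assumption.

```python
def weighted_mean(times_s, weights):
    """Weighted arithmetic mean of execution times.

    Weights are relative run frequencies; they are normalized here,
    so they do not need to sum to 1.
    """
    total_weight = sum(weights)
    return sum(t * w for t, w in zip(times_s, weights)) / total_weight


if __name__ == "__main__":
    # HYPOTHETICAL execution times in seconds for (program 1, program 2).
    # They only respect the stated ratios; the slide's real table is lost.
    machine_a = (1.0, 1000.0)
    machine_b = (100.0, 100.0)

    equal = (1, 1)          # plain arithmetic mean
    ten_to_one = (10, 1)    # program 1 runs 10 times as often as program 2

    for label, weights in (("equal weights", equal), ("10:1 weights", ten_to_one)):
        mean_a = weighted_mean(machine_a, weights)
        mean_b = weighted_mean(machine_b, weights)
        print(f"{label}: A = {mean_a:.1f} s, B = {mean_b:.1f} s")
```

With these made-up times, B looks better under the plain mean (100 s against 500.5 s), while A comes out ahead once program 1 is weighted 10:1 (about 91.8 s against 100 s). The ranking depends on both the times and the weights, so other numbers could give a different answer.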
