2010 R&E Computer System Education & Research
Lecture 7. Performance
Prof. Taeweon Suh
Computer Science Education, Korea University
Response Time and Throughput
• Response time (execution time)
  • Time between the start and the completion of a task
  • Important to individual users
• Throughput
  • Total amount of work done in a given time
  • Important to data center managers
• Different kinds of systems need different performance metrics
  • Embedded computers and PCs are more focused on response time
  • Servers are more focused on throughput
Response Time vs Throughput Example
• Laundry example
  • Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
  • “Washer” takes 30 minutes
  • “Dryer” takes 40 minutes
  • “Folder” takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight with task order A, B, C, D; each load takes 30 + 40 + 20 minutes and starts only after the previous load has finished]
• Response time: 90 mins
• Throughput: 0.67 tasks / hr (= 90 mins/task; 6 hours for 4 loads)
Pipelined Laundry: Start work ASAP
[Figure: timeline starting at 6 PM with task order A, B, C, D; the stage durations along the time axis are 30, 40, 40, 40, 40, 20 minutes]
• Response time: 90 mins
• Throughput: 1.14 tasks / hr (= 52.5 mins/task; 3.5 hours for 4 loads)
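The totals on the two laundry slides can be checked with a short calculation. The following is a minimal Python sketch, assuming each stage handles one load at a time and a load enters a stage as soon as that stage is free; the function names are illustrative and do not come from the lecture.

```python
STAGES = [30, 40, 20]  # washer, dryer, folder times in minutes (from the slides)

def sequential_makespan(stages, loads):
    """Total minutes when each load finishes every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_makespan(stages, loads):
    """Total minutes when a load enters a stage as soon as that stage is free."""
    free_at = [0] * len(stages)  # time at which each stage next becomes free
    finish = 0
    for _ in range(loads):
        t = 0  # assumption: every load is ready at time 0 (6 PM)
        for s, duration in enumerate(stages):
            start = max(t, free_at[s])  # wait for the stage if it is busy
            t = start + duration
            free_at[s] = t
        finish = t
    return finish

def throughput_per_hour(makespan_minutes, loads):
    """Loads completed per hour over the whole schedule."""
    return loads / (makespan_minutes / 60)

seq = sequential_makespan(STAGES, 4)
pipe = pipelined_makespan(STAGES, 4)
print(seq, round(throughput_per_hour(seq, 4), 2))    # 360 0.67
print(pipe, round(throughput_per_hour(pipe, 4), 2))  # 210 1.14
```

The 40-minute dryer is the slowest stage, so in this model the pipelined schedule takes 30 + 4×40 + 20 = 210 minutes.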
Pipelining Lessons
[Figure: pipelined laundry timeline for loads A, B, C, D]
• Pipelining doesn’t help the latency (response time) of a single task
• Pipelining helps the throughput of the entire workload
  • Multiple tasks operate simultaneously
• We are going to talk in detail about pipelining in chapter 4
• The term project is to implement a CPU with pipelining
Relative Performance
• To maximize performance, we want to minimize the execution time (response time) for a task X
$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$
• If X is n times faster than Y, then
$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$
Relative Performance Example
• Computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds. How much faster is A than B?
We know that A is n times faster than B if
$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = n$$
The performance ratio is
$$\frac{15}{10} = 1.5$$
So, A is 1.5 times faster than B
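The same ratio can be written as a one-line function. This is a small illustrative sketch; the name `speedup` is an assumption, not a term from the slides.

```python
def speedup(time_x, time_y):
    """Return n such that X is n times faster than Y (times in the same unit)."""
    return time_y / time_x

# Computer A takes 10 s and computer B takes 15 s for the same program.
print(speedup(10, 15))  # 1.5
```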
Measuring Execution Time
• Program execution time (elapsed time, wall-clock time) is measured in seconds per program
  • Total response time includes all aspects: disk access, memory access, I/O activities, OS overhead
  • Determines system performance
• CPU time
  • Time the CPU spent processing a given job
  • Does not include time spent waiting for I/O or running other programs
CPU Clock
• Let’s use a different metric to measure performance
• Virtually all computers are constructed in sync with a clock
• Discrete time intervals are called clock cycles
[Figure: clock waveform with clock cycles 0 through 6 marked]
• Clock period (T): duration of a clock cycle
  • e.g. 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
• Clock frequency (f): cycles per second (1/T)
  • e.g. 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
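The two examples on this slide are the same clock seen from both sides, since f = 1/T. A minimal sketch of the conversion, with hypothetical helper names:

```python
def period_to_frequency(period_seconds):
    """Clock frequency in Hz for a clock period in seconds (f = 1/T)."""
    return 1.0 / period_seconds

def frequency_to_period(frequency_hz):
    """Clock period in seconds for a clock frequency in Hz (T = 1/f)."""
    return 1.0 / frequency_hz

f = period_to_frequency(250e-12)  # 250 ps period
t = frequency_to_period(4.0e9)    # 4.0 GHz clock
print(f"{f / 1e9:.2f} GHz")  # 4.00 GHz
print(f"{t * 1e12:.1f} ps")  # 250.0 ps
```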
Reminder: Clock Oscillators in Digital Systems
• Virtually all digital systems are essentially synchronous to the clock
CPU Time
• Express CPU time in terms of the clock
$$\text{CPU Time} = \text{CPU clock cycles} \times T = \frac{\text{CPU clock cycles}}{f}$$
where T is the clock cycle time and f is the clock frequency
• The formula shows that performance improves by
  • Reducing the number of clock cycles
  • Increasing the clock frequency
• Hardware designers must often trade off clock frequency against cycle count
CPU Time Example
• Computer A, running at a 2 GHz clock, requires 10 seconds of CPU time to run your program
• Let’s design a new Computer B
  • Aim for 6 seconds of CPU time to run the same program
  • The new design needs 1.2× as many clock cycles as Computer A
• How fast should Computer B’s clock be?
How many clock cycles does Computer A need?
$$\text{CPU clock cycles}_A = 10\,\text{s} \times 2\,\text{GHz} = 20\text{G cycles}$$
How many clock cycles does Computer B need?
$$\text{CPU clock cycles}_B = 1.2 \times 20\text{G cycles} = 24\text{G cycles}$$
Computer B must run the program in 6 seconds:
$$6\,\text{s} = 24\text{G cycles} \times T_B = \frac{24\text{G cycles}}{f_B}$$
$$f_B = \frac{24\text{G cycles}}{6\,\text{s}} = 4\,\text{GHz}$$
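The steps of this example translate directly into code. The sketch below uses hypothetical function names and works in hertz and seconds; it illustrates the arithmetic and is not part of the lecture material.

```python
def clock_cycles(cpu_time_s, clock_hz):
    """CPU clock cycles = CPU time x clock frequency."""
    return cpu_time_s * clock_hz

def required_clock_hz(cycles, target_time_s):
    """Clock frequency needed to finish `cycles` in `target_time_s` seconds."""
    return cycles / target_time_s

cycles_a = clock_cycles(10, 2e9)       # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a              # Computer B needs 1.2x as many cycles
f_b = required_clock_hz(cycles_b, 6)   # target: 6 s of CPU time

print(f"{cycles_a / 1e9:.0f}G cycles on A, {cycles_b / 1e9:.0f}G cycles on B")
print(f"Computer B needs a {f_b / 1e9:.1f} GHz clock")
```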
Instruction Count and CPI
• The performance equation so far does not refer to the number of instructions needed to run a program
• Since a computer executes instructions to run programs, the execution time must depend on the number of instructions executed
• Execution time equals the number of instructions executed multiplied by the average time per instruction
$$\text{CPU Time} = \text{CPU clock cycles} \times T$$
$$\text{CPU clock cycles} = \text{\# insts} \times \text{CPI} \quad (\text{CPI: average clock cycles per instruction})$$
$$\text{CPU Time} = \text{\# insts} \times \text{CPI} \times T = \frac{\text{\# insts} \times \text{CPI}}{f}$$
Instruction Count and CPI
• # insts
  • Determined by the program, ISA, and compiler
• CPI
  • Determined by your CPU design (hardware)
$$\text{CPU Time} = \text{\# insts} \times \text{CPI} \times T = \frac{\text{\# insts} \times \text{CPI}}{f}$$
CPI Example
• Computer A has a clock cycle time of 250 ps and a CPI of 2.0 when running a program
• Computer B has a clock cycle time of 500 ps and a CPI of 1.2 when running the same program
• Both computers implement the same ISA, so they execute the same number of instructions
• Which is faster, and by how much?
$$\text{CPU Time} = \text{\# insts} \times \text{CPI} \times T = \frac{\text{\# insts} \times \text{CPI}}{f}$$
What is the execution time of the program on Computer A?
$$\text{\# insts} \times 2.0 \times 250\,\text{ps} = \text{\# insts} \times 500\,\text{ps}$$
What is the execution time of the program on Computer B?
$$\text{\# insts} \times 1.2 \times 500\,\text{ps} = \text{\# insts} \times 600\,\text{ps}$$
So, A is faster. By how much?
$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{600\,\text{ps}}{500\,\text{ps}} = 1.2$$
Computer A is 1.2 times (20%) faster than Computer B
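The comparison can be reproduced numerically. In the sketch below the instruction count `N` is a hypothetical value chosen only for illustration; because both computers run the same ISA it cancels out of the ratio. The function name is also an assumption.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds: # insts x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps

N = 1_000_000  # hypothetical instruction count, identical on both machines

time_a = cpu_time_ps(N, 2.0, 250)  # Computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time_ps(N, 1.2, 500)  # Computer B: CPI 1.2, 500 ps cycle

ratio = time_b / time_a  # performance_A / performance_B
print(f"A is {ratio:.2f}x faster than B")  # A is 1.20x faster than B
```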
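CPI in More Detail
• If different instructions take different numbers of cycles (assume that we have n different instruction classes), the cycle count is a sum over the classes
$$\text{CPU Time} = \text{CPU clock cycles} \times T$$
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{\# insts}_i\right)$$
• Weighted average CPI
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{\# insts}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{\# insts}_i}{\text{\# insts}}\right)$$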
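CPI Example
• A compiler writer is trying to decide between two code sequences for a computer
• The hardware designer supplied the CPI of each instruction class
[Table: CPI per instruction class and the instruction counts of each class in the two code sequences]
• Which code sequence is faster?
Sequence 1:
• Clock cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0
Sequence 2:
• Clock cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5
Sequence 2 is faster: it takes 9 clock cycles instead of 10, even though it executes 6 instructions instead of 5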
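The weighted-average calculation can be sketched as follows. The per-class CPI values (1, 2, 3) and the instruction counts of each sequence are read off the arithmetic shown on the slide, not from the original table, and the function names are illustrative.

```python
CPI_PER_CLASS = [1, 2, 3]   # cycles per instruction for the three classes
SEQUENCE_1 = [2, 1, 2]      # instruction counts per class, sequence 1
SEQUENCE_2 = [4, 1, 1]      # instruction counts per class, sequence 2

def total_cycles(counts, cpis):
    """Sum of (instruction count x CPI) over all instruction classes."""
    return sum(n * c for n, c in zip(counts, cpis))

def average_cpi(counts, cpis):
    """Weighted average CPI: total cycles divided by total instructions."""
    return total_cycles(counts, cpis) / sum(counts)

for name, seq in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
    print(name, total_cycles(seq, CPI_PER_CLASS), average_cpi(seq, CPI_PER_CLASS))
# Sequence 1 10 2.0
# Sequence 2 9 1.5
```

With the same clock on both runs, fewer total cycles means less CPU time, so the sequence with the lower cycle count wins even if it has more instructions.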
Performance Summary
$$\text{CPU Time} = \text{\# insts} \times \text{CPI} \times T = \frac{\text{\# insts} \times \text{CPI}}{f}$$
• Performance depends on
  • Algorithm: affects the instruction count
  • Programming language: affects instruction count, CPI
  • Compiler: affects instruction count, CPI
  • Instruction set architecture: affects instruction count, CPI, T
SPEC CPU Benchmark
• Programs used to measure performance
  • Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  • Develops benchmarks for CPU, I/O, Web, …
  • http://www.spec.org/
• SPEC CPU2006
  • Elapsed time to execute a selection of programs
  • Negligible I/O, so focuses on CPU performance
  • Normalized relative to a reference machine
  • CINT2006 (integer) and CFP2006 (floating-point)
Chapter 2
• How programs written in C, for example, are translated into machine language
• We’ll study the machine language (assembly language) of MIPS in detail
Some Basics
• Kilobyte (KB): $2^{10}$ or 1,024 bytes
• Megabyte (MB): $2^{20}$ or 1,048,576 bytes
• Gigabyte (GB): $2^{30}$ or 1,073,741,824 bytes
• Terabyte (TB): $2^{40}$ or 1,099,511,627,776 bytes
• Petabyte (PB): $2^{50}$ or 1024 terabytes
• Exabyte (EB): $2^{60}$ or 1024 petabytes
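Each unit in this list is 1024 times the previous one, which a short loop makes explicit. This is an illustrative sketch; the helper name `bytes_in` is hypothetical.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]

def bytes_in(unit):
    """Number of bytes in one binary unit: 2^(10 * position in UNITS)."""
    exponent = 10 * (UNITS.index(unit) + 1)
    return 2 ** exponent

for unit in UNITS:
    print(f"1 {unit} = {bytes_in(unit):,} bytes")
```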