Evaluating Computer Performance Edward L. Bosworth, Ph.D. Computer Science Department Columbus State University
The IBM Stretch • The IBM 7030, called the “Stretch” was intended to be 100 times as fast as existing IBM models, such as the 704 and 705. • Question: How was the performance to be measured?
Typical Comments • 1. I want a better computer. • 2. I want a faster computer. • 3. I want a computer or network of computers that does more work. • 4. I have the latest “World of Warcraft” game and want a computer that can play it. • QUESTION: What does “better” mean? • What does “faster” really mean, beyond the obvious? • What does it mean for a computer to do more work?
The Need for Performance • What applications require such computational power? • Weather modeling. We would like to have weather predictions that are accurate up to two weeks after the prediction. This requires a great deal of computational power and data storage. It was not until the 1990s that the models included the Appalachian Mountains. • We would like to model the flight characteristics of an airplane design before we actually build it. We have a set of equations for the flow of air over the wings; at present these would take tens of years to solve. • Some AI applications, such as the IBM Watson.
The Job Mix • What is our speed measure: the time to run one job or the number of jobs run per unit time? • The World of Warcraft example is a good illustration of a job mix. Here the job mix is simple: run this game. • Most computers run a mix of jobs. To assess the suitability of a computer for such an environment, one needs a proper “job mix” that represents the computing needs of one’s organization. • These days, few organizations have the time to specify a good job mix. The more common option is to use the results of commonly available benchmarks, which are job mixes tailored to common applications.
What Is Performance? • In many applications, especially the old “batch mode” computing, the measure was the number of jobs per unit time. The more user jobs that could be processed, the better. • For a single computer running spreadsheets, the speed might be measured in the number of calculations per second. • For computers that support process monitoring, the requirement is that the computer correctly assesses the process and takes corrective action (possibly to include warning a human operator) within the shortest possible time.
Two “Big and Fast” Computer Types • Here we mention two classes of large computer systems and the performance measures appropriate for each class. • A supercomputer, such as the Cray XT5, is designed to work one big and very complex problem at a time. Its performance measure: the time to solve each problem. • An enterprise computer, such as the IBM z9, is designed to process a very large number of simple transactions. Its performance measure: the number of transactions per second.
Hard Real Time • Some systems, called “hard real time” systems, are those in which there is a fixed time interval during which the computer must produce an answer or take corrective action. • As an example, consider a process-monitoring computer with a required response time of 15 seconds. There are two performance measures of interest. • 1. Can it respond within the required 15 seconds? If not, it cannot be used. • 2. How many processes can be monitored while guaranteeing the required 15-second response time for each process being monitored? The more, the better. • A fire alarm with a two-day response time is unusable.
Statistics: The Average and Median • The problem starts with a list of N numbers (A1, A2, …, AN), with N ≥ 2. • The average is computed by the formula (A1 + A2 + … + AN) / N. • The median is the “value in the middle”: half of the values are larger and half are smaller. For a small even number of values, there might be 2 candidate medians. • The average and median are often close, but may be wildly different. Consider Bill Gates’s high school class. The average income is above one million dollars, thanks to Bill and another Microsoft founder. The median is about $40,000.
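The pull that one outlier exerts on the average but not the median is easy to demonstrate with Python’s statistics module; the incomes below are hypothetical stand-ins for the class example.

```python
from statistics import mean, median

# Hypothetical incomes for a 20-person class: 19 members near $40,000
# and one billionaire outlier, echoing the "Bill Gates's class" example.
incomes = [40_000] * 19 + [1_000_000_000]

print(mean(incomes))    # 50038000.0 -- dragged far upward by one value
print(median(incomes))  # 40000.0 -- unaffected by the outlier
```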
The Weighted Average • In certain averages, one might want to pay more attention to some values than others. For example, in assessing an instruction mix, one might want to give a weight to each instruction that corresponds to its percentage in the mix. • Each of our numbers (A1, A2, …, AN), with N ≥ 2, has an associated weight. So we have (A1, A2, …, AN) and (W1, W2, …, WN). The weighted average is given by the formula (W1·A1 + W2·A2 + … + WN·AN) / (W1 + W2 + … + WN). • NOTE: If all the weights are equal, this value becomes the arithmetic mean.
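A minimal sketch of the weighted-average formula, using a hypothetical three-class instruction mix (the cycle counts and percentages are illustrative, not measured values):

```python
def weighted_average(values, weights):
    # (W1*A1 + W2*A2 + ... + WN*AN) / (W1 + W2 + ... + WN)
    assert len(values) == len(weights)
    return sum(w * a for w, a in zip(weights, values)) / sum(weights)

# Assumed cycle counts per instruction class, weighted by percentage
# of the mix: 60% ALU, 30% load/store, 10% branch.
cycles = [1, 2, 4]
percentages = [60, 30, 10]
print(weighted_average(cycles, percentages))  # 1.6 cycles on average
```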
Other Averages • Geometric Mean • The geometric mean is the Nth root of the product of the values: (A1·A2·…·AN)^(1/N). It is generally applied only to positive numbers, as we are considering here. • Some of the SPEC benchmarks (discussed later) report the geometric mean. • Harmonic Mean • The harmonic mean is N / ( (1/A1) + (1/A2) + … + (1/AN) ). • This is more useful for averaging rates or speeds. As an example, suppose that you drive at 40 miles per hour for half the distance and 60 miles per hour for the other half. Your average speed is 48 miles per hour.
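A short sketch of the geometric-mean formula (the ratios below are made-up benchmark speedups, not SPEC data):

```python
import math

def geometric_mean(values):
    # N-th root of the product of N positive values.
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical per-benchmark speedups of 2x, 8x, and 4x:
# the geometric mean is (2 * 8 * 4)**(1/3), approximately 4.0.
print(geometric_mean([2.0, 8.0, 4.0]))
```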
More on the Harmonic Mean • Are we averaging by time or distance driven? • By distance: Drive 300 miles at 40 mph. Time is 7.5 hours. Drive 300 miles at 60 mph. Time is 5.0 hours. • You have covered 600 miles in 12.5 hours, for an average speed of 48 mph. • But: Drive 1 hour at 40 mph. You cover 40 miles. Drive 1 hour at 60 mph. You cover 60 miles. That is 100 miles in 2 hours; 50 mph.
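The by-distance case is exactly what the harmonic mean computes; a small sketch:

```python
def harmonic_mean(values):
    # N divided by the sum of reciprocals; correct for averaging
    # speeds when each value applies over an equal distance.
    return len(values) / sum(1.0 / v for v in values)

# Equal distances at 40 mph and 60 mph: about 48 mph, as in the slide.
print(harmonic_mean([40, 60]))
# Equal *times* call for the plain arithmetic mean instead: 50 mph.
print((40 + 60) / 2)  # 50.0
```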
Performance & Execution Time • Whatever it is, performance is inversely related to execution time. The longer the execution time, the less the performance. • Again, we assess a computer’s performance by measuring the time to execute either a single program or a mix of computer programs. • Wall-clock time is easiest to measure, but may include time not spent on the target program or job mix. • CPU execution time is the time the CPU actually spends executing the target program. It is harder to measure.
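The two clocks can be compared directly in Python: perf_counter() tracks wall-clock time, while process_time() counts only CPU time charged to the process, so time spent sleeping (or waiting on I/O) shows up in one but not the other. A sketch, not a benchmarking harness:

```python
import time

wall_start, cpu_start = time.perf_counter(), time.process_time()
total = sum(i * i for i in range(100_000))  # some actual CPU work
time.sleep(0.1)                             # waiting, not computing
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# The sleep inflates wall-clock time but not CPU execution time.
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```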
MIPS, MFLOPS, etc. • Here are some commonly used performance measures. • The first is MIPS (Million Instructions Per Second). • Another measure is the FLOPS sequence, commonly used to specify the performance of supercomputers, which tend to use floating–point math fairly heavily. The sequence is: • MFLOPS Million Floating Point Operations Per Second • GFLOPS Billion Floating Point Operations Per Second (The term “giga” is the standard prefix for 10^9.) • TFLOPS Trillion Floating Point Operations Per Second (The term “tera” is the standard prefix for 10^12.) • Note that these measures are for test programs. The actual sustained performance will be somewhat lower.
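These figures are simple ratios of operation count to time; a sketch with made-up numbers:

```python
def mflops(flop_count, seconds):
    # Millions of floating-point operations per second.
    return flop_count / seconds / 1e6

# Hypothetical run: 2 billion floating-point operations in 4 seconds.
rate = mflops(2_000_000_000, 4.0)
print(rate)        # 500.0 (MFLOPS)
print(rate / 1e3)  # 0.5   (GFLOPS)
```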
MIPS as a Performance Measure • The term “MIPS” had its origin in the marketing departments of IBM and DEC, to sell the IBM 370/158 and VAX–11/780. • One wag has suggested that the term “MIPS” stands for “Meaningless Indicator of Performance for Salesmen”. • A more significant reason for the decline in the popularity of the term “MIPS” is the fact that it just measures the number of instructions executed and not what these do.
MIPS vs. Work Done • Consider the high-level language instruction A[K++] = B. How many assembly language instructions are used to realize this one line? • A VAX-11/780 would require 1 instruction. • A MIPS processor would require at least two: A[K] = B, followed by K = K + 1. • We want to measure the time to solve the problem, not just some instruction count.
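The pitfall can be made concrete with invented numbers: a machine that posts the higher MIPS rating can still finish the same program later if its instruction set needs more instructions for the job.

```python
def exec_time_seconds(instruction_count, mips_rating):
    # Time = instructions / (MIPS rating * 10^6 instructions per second).
    return instruction_count / (mips_rating * 1e6)

# Hypothetical: the same program compiles to 1.0M instructions on a
# dense-ISA machine rated 1.0 MIPS, and to 2.0M instructions on a
# simpler-ISA machine rated 1.5 MIPS.
time_dense = exec_time_seconds(1_000_000, 1.0)   # 1.00 s
time_simple = exec_time_seconds(2_000_000, 1.5)  # about 1.33 s
print(time_dense, time_simple)  # higher MIPS, yet slower on the problem
```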
Understanding Performance • Algorithm • Determines number of operations executed • Programming language, compiler, architecture • Determine number of machine instructions executed per operation • Processor and memory system • Determine how fast instructions are executed • I/O system (including OS) • Determines how fast I/O operations are executed [Chapter 1 — Computer Abstractions and Technology]
Defining Performance (§1.4 Performance) • Which airplane has the best performance?
Response Time and Throughput • Response time • How long it takes to do a task • Throughput • Total work done per unit time • e.g., tasks/transactions/… per hour • How are response time and throughput affected by • Replacing the processor with a faster version? • Adding more processors? • We’ll focus on response time for now…
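A toy calculation of the distinction (the server count and task time are assumed values): adding a second identical server doubles throughput but leaves each task’s response time unchanged.

```python
task_seconds = 2.0  # assumed time for one task on one server
servers = 2

response_time = task_seconds         # each task still runs on one server
throughput = servers / task_seconds  # independent tasks finished per second

print(response_time)  # 2.0 seconds, unchanged
print(throughput)     # 1.0 tasks/second, doubled from 0.5
```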
CPU Clocking • Operation of digital hardware is governed by a constant-rate clock. [Timing diagram: each clock cycle spans one clock period; data transfer and computation occur during the cycle, and state is updated at its end.] • Clock period: duration of a clock cycle • e.g., 250 ps = 0.25 ns = 250×10^–12 s • Clock frequency (rate): cycles per second • e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
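Period and frequency are reciprocals, which the slide’s own numbers confirm:

```python
# f = 1 / T: a 250 ps clock period corresponds to a 4.0 GHz clock rate.
period_s = 250e-12        # 250 ps
freq_hz = 1.0 / period_s
print(freq_hz / 1e9)      # approximately 4.0 (GHz)

# And back again: 4.0 GHz gives a 250 ps period.
print(1.0 / 4.0e9 / 1e-12)  # approximately 250.0 (ps)
```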
CPU Time • Performance improved by • Reducing number of clock cycles • Increasing clock rate • Hardware designer must often trade off clock rate against cycle count
Performance Summary • Performance depends on • Algorithm: affects IC (instruction count), possibly CPI (cycles per instruction) • Programming language: affects IC, CPI • Compiler: affects IC, CPI • Instruction set architecture: affects IC, CPI, Tc (clock period) The BIG Picture
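These factors combine in the classic CPU-time equation, CPU time = IC × CPI × Tc, or equivalently IC × CPI / clock rate; a sketch with assumed values:

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    # CPU time = IC * CPI / clock rate.
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 10^9 instructions, average CPI of 2.0, 4 GHz clock.
print(cpu_time_seconds(1e9, 2.0, 4e9))  # 0.5 (seconds)
```

Any of the three factors can be attacked independently, which is why the slide lists which design choices affect IC, CPI, and Tc.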
Games People Play with Benchmarks • Synthetic benchmarks (Whetstone, Linpack, and Dhrystone) are convenient, but easy to fool. These problems arise directly from the commercial pressures to quote good benchmark numbers. • Manufacturers can equip their compilers with special switches to emit code that is tailored to optimize a given benchmark at the cost of slower performance on a more general job mix. “Just get us some good numbers!” • The benchmarks are usually small enough to be run out of cache memory.This says nothing of the efficiency of the entire memory system, which mustinclude cache memory, main memory, and support for virtual memory. • A 1995 Intel special compiler was designed only to excel in the SPEC integer benchmark. Its code was fast, but incorrect. • Small benchmarks often give overly optimistic results.
SPEC Benchmarks • The SPEC (Standard Performance Evaluation Corporation) was founded in 1988 by a consortium of computer manufacturers. • As of 2007, the current SPEC benchmarks were: • 1. CPU2006 measures CPU throughput, cache and memory access speed, and compiler efficiency. This has two components: • SPECint2006 to test integer processing, and SPECfp2006 to test floating point processing. • 2. SPEC MPI 2007 measures the performance of parallel computing systems and clusters running MPI (Message–Passing Interface). • 3. SPECweb2005 is a set of benchmarks for web servers. • 4. SPEC JBB2005 is a set of benchmarks for server–side Java performance. • 5. SPEC JVM98 is a set of benchmarks for client–side Java performance. • 6. SPEC MAIL 2001 is a set of benchmarks for mail servers.
Concluding Remarks on Benchmarks • First, remember the great temptation to manipulate benchmark results for commercial advantage. As the Romans said, “Caveat emptor”. • Do not forget Amdahl’s Law, which computes the improvement in overall system performance due to the improvement in a specific component. • This law was formulated by Gene Amdahl in 1967. One formulation of the law is given in the following equation: S = 1 / [ (1 – f) + (f / K) ] • S is the speedup of the overall system, f is the fraction of the work handled by the improved component, and K is the speedup of that component.
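Amdahl’s formula translates directly into code; the 40%/10× figures below are an invented example:

```python
def amdahl_speedup(f, k):
    # S = 1 / ((1 - f) + f / k): overall speedup when a fraction f
    # of the work is accelerated by a factor of k.
    return 1.0 / ((1.0 - f) + f / k)

# Speed up a component that handles 40% of the work by a factor of 10:
# the overall speedup is only about 1.5625, far short of 10x.
print(amdahl_speedup(0.4, 10))
```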
Amdahl’s Law for Multiprocessors • Consider a computer system with K CPUs. • Let f represent the fraction of the job mix that can be executed in parallel. Then (1 – f) represents the fraction of sequential code. • Amdahl’s law remains the same: S = 1 / [ (1 – f) + (f / K) ] • As the number of processors becomes large, this approaches a limiting value S = 1 / (1 – f).
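The limiting behavior is easy to tabulate (the 95% parallel fraction below is assumed): speedup climbs toward, but never reaches, 1 / (1 – 0.95) = 20 no matter how many processors are added.

```python
def amdahl_speedup(f, k):
    # S = 1 / ((1 - f) + f / k), with k processors and parallel fraction f.
    return 1.0 / ((1.0 - f) + f / k)

f = 0.95  # assumed fraction of the job mix that runs in parallel
for k in (10, 100, 1000, 10_000):
    print(k, round(amdahl_speedup(f, k), 2))
# The printed speedups approach the limit 1 / (1 - 0.95) = 20.
```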