Computer Architecture

Computer Architecture Part I-C: Performance

What does faster mean? • Response time • The time spent to complete an event • Also referred to as execution time or latency • Throughput • Amount of work done in a given time • Also referred to as bandwidth • In general, faster response time means an improvement in throughput

Execution Time and Performance • Quantitatively, execution time is inversely proportional to performance. • improve performance = increase performance • improve execution time = decrease execution time • X is n times faster than Y means
$$\frac{\text{ExTime}_Y}{\text{ExTime}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} = n$$
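A minimal sketch of the "n times faster" comparison, assuming the usual definition that n is the ratio of Y's execution time to X's. The function name `times_faster` and the timings are hypothetical, chosen only for illustration.

```python
def times_faster(ex_time_x: float, ex_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is taken as 1 / execution time, so
    n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y).
    """
    if ex_time_x <= 0 or ex_time_y <= 0:
        raise ValueError("execution times must be positive")
    return ex_time_y / ex_time_x


# Hypothetical timings: X finishes the task in 10 s, Y in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

A value of n below 1 would mean X is actually the slower machine.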

Make the Common Case Fast • A rule of thumb in computer design is to make the more frequent event faster • In making a design trade-off, favor the frequent case over the infrequent case • In general, this should increase overall performance

Amdahl’s Law • The performance improvement to be gained from using some faster mode of operations is limited by the fraction of time that faster mode can be used. • Speedup due to enhancement E (for an entire task):
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Factors Affecting the Speedup • The fraction of computation time in the original machine that can be converted to take advantage of the enhancement • The improvement gained by the enhanced execution mode, i.e. how much faster the task would run if the enhanced mode were used for the entire program.

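Applying Amdahl’s Law
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$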

Using Amdahl’s Law: An Example • Suppose that we are considering an enhancement that runs 10 times faster than the original machine but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement? • Answer: $\text{Fraction}_{enhanced} = 0.4$ and $\text{Speedup}_{enhanced} = 10$, so
$$\text{Speedup}_{overall} = \frac{1}{0.6 + (0.4/10)} = \frac{1}{0.64} \approx 1.56$$
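The worked example can be checked with a short sketch of Amdahl's Law. The function name `amdahl_speedup` is an illustrative choice, not something from the slides; the inputs are the 40% fraction and the 10x enhancement from the example.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up.

    fraction_enhanced: share of the original execution time that can use
                       the enhancement (between 0 and 1).
    speedup_enhanced:  how much faster that share runs with the enhancement.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Values from the example: usable 40% of the time, 10 times faster.
overall = amdahl_speedup(0.4, 10)
print(f"Overall speedup: {overall:.2f}")  # Overall speedup: 1.56

# Even an arbitrarily fast enhancement cannot push the overall speedup
# past 1 / (1 - 0.4), because 60% of the time is left untouched.
print(f"Upper bound: {1 / (1 - 0.4):.2f}")
```

The second print illustrates the limit stated by the law: the unenhanced 60% of the execution time caps the achievable gain.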

Measuring CPU Processing Speed: The Clock • A circuit which generates a signal that defines regular time intervals or cycles during which basic CPU steps are performed • Provides control as to when each step of the instruction cycle takes place

Clock Cycles • One clock pulse is the burst of current when the clock output is equal to 1 • A clock cycle is the interval from the beginning of one pulse to the beginning of the next • Clock rate is measured in Hertz (Hz), a unit of frequency: 1 Hz = 1 cycle/second • Basic unit of CPU speed = 1 million Hz, or 1 MHz
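Since 1 Hz is one cycle per second, the length of a clock cycle is the reciprocal of the clock rate. The sketch below converts a clock rate in MHz to a cycle time in nanoseconds; the helper name `cycle_time_ns` and the 500 MHz figure are hypothetical examples.

```python
def cycle_time_ns(clock_rate_mhz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds.

    1 MHz = 1,000,000 cycles per second, so one cycle lasts
    1 microsecond = 1000 ns; faster clocks give shorter cycles.
    """
    if clock_rate_mhz <= 0:
        raise ValueError("clock rate must be positive")
    return 1_000.0 / clock_rate_mhz


# Hypothetical 500 MHz processor.
print(f"{cycle_time_ns(500):.1f} ns per cycle")  # 2.0 ns per cycle
```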

Locality of Reference • Programs tend to reuse data and instructions they have used recently. • A program may spend 90% of its execution time in only 10% of the code. • Based on a program’s recent past, one can predict with reasonable accuracy what instructions and data it will use in the near future.

Two Types of Locality • Temporal Locality • recently accessed items are likely to be accessed in the near future • Spatial Locality • items whose addresses (or location) are near one another tend to be referenced close together in time
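One way to see both kinds of locality is to replay address streams through a toy cache model and compare hit rates. The sketch below assumes a direct-mapped cache with 64 lines of 16 bytes each; these sizes, the function name `hit_rate`, and the three access patterns are all hypothetical choices made for illustration, not parameters from the slides.

```python
def hit_rate(addresses, num_lines=64, block_size=16):
    """Fraction of accesses that hit in a toy direct-mapped cache."""
    lines = [None] * num_lines  # block number currently held by each line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_size   # which memory block holds this address
        index = block % num_lines    # the single line that block may occupy
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block     # miss: load the block, evicting the old one
        total += 1
    return hits / total if total else 0.0


# Spatial locality: walk through consecutive addresses.
sequential = list(range(4096))
# No locality: every access lands in a new block that maps to the same line.
strided = [i * 1024 for i in range(4096)]
# Temporal locality: keep revisiting the same 8 addresses (one per block).
repeated = list(range(0, 128, 16)) * 512

print(f"sequential: {hit_rate(sequential):.4f}")  # sequential: 0.9375
print(f"strided:    {hit_rate(strided):.4f}")     # strided:    0.0000
print(f"repeated:   {hit_rate(repeated):.4f}")
```

In the sequential stream each block misses once and then serves its 15 neighbouring addresses; in the repeated stream only the first pass misses. The strided stream has neither property, so the cache never helps.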

Metrics of Performance • Levels of the system, from top to bottom: Application; Programming Language; Compiler; ISA; Datapath; Control; Function Units; Transistors, Wires, Pins • Metrics, from the highest level to the lowest: answers per month; operations per second; (millions of) instructions per second (MIPS); (millions of) floating-point operations per second (MFLOP/s); megabytes per second; cycles per second (clock rate)

MIPS Benchmark • Millions of Instructions Per Second • Easy to understand and straightforward • Dependent on instruction set • Varies between programs on the same computer • MIPS can vary inversely with performance!
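The last point can be illustrated with the usual definition, MIPS = instruction count / (execution time × 10^6). The instruction counts and timings below are invented for illustration: they describe two hypothetical versions of the same program on the same machine, where the version with the higher MIPS rating is the slower one.

```python
def mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions executed per second."""
    if exec_time_s <= 0:
        raise ValueError("execution time must be positive")
    return instruction_count / (exec_time_s * 1e6)


# Hypothetical version A: fewer, more complex instructions.
a_instructions, a_time = 100e6, 1.0
# Hypothetical version B: many simple instructions, longer run time.
b_instructions, b_time = 300e6, 2.0

print(f"A: {mips(a_instructions, a_time):.0f} MIPS in {a_time:.1f} s")  # A: 100 MIPS in 1.0 s
print(f"B: {mips(b_instructions, b_time):.0f} MIPS in {b_time:.1f} s")  # B: 150 MIPS in 2.0 s
```

Version B scores higher on MIPS yet takes twice as long to finish, which is why execution time, not MIPS, is the measure to compare.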

MFLOPS Benchmark • Millions of Floating-point Operations Per Second (MegaFLOPS) • Intended to measure floating-point operations but some programs don’t use any • Floating-point operations are not consistent across machines • MFLOPS ratings for the same machine may differ depending on instruction mix

Programs as Evaluators • Four types (in decreasing order of accuracy): • Real programs • Kernels • Toy Benchmarks • Synthetic Benchmarks

Synthetic Benchmarks • Programs which try to match the average number and frequency of operations of a typical workload, e.g. Dhrystone, Whetstone, etc. • Not real programs, so they may not reflect program behavior for factors not measured • Compiler and hardware optimizations can artificially inflate results

Toy Benchmarks • Small, simple programs • Produce a result the user already knows • Examples: quicksort, Sieve of Eratosthenes, etc.
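As an illustration of a toy benchmark, the sketch below times a Sieve of Eratosthenes and first checks it against a result known in advance (there are 25 primes below 100). The problem size of 1,000,000 and the use of `time.perf_counter` are choices made for this example; the measured time will vary by machine and says little about real workloads.

```python
import time


def sieve(limit: int) -> list[int]:
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


# The answer is known beforehand, so correctness is easy to confirm.
assert len(sieve(100)) == 25

start = time.perf_counter()
primes = sieve(1_000_000)
elapsed = time.perf_counter() - start
print(f"Found {len(primes)} primes up to 1,000,000 in {elapsed:.3f} s")
```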

Kernel Benchmarks • Small, key pieces from real programs put together to evaluate machine performance • Examples: Linpack, Livermore Loops, etc. • No user would run kernel programs because they exist solely for performance evaluation • Best used to isolate performance of individual features of machines to explain the reasons for differences in real programs

Real Programs • Common programs like compilers (e.g. C), word processors (e.g. TeX, MS Word), computer-aided design tools (e.g. Spice), etc. • Real programs have the input, output, and options that a user can select.

When Benchmarks Disagree • What is MMX’s real speed? • [Figure not reproduced] Source: adapted from Byte, April 1998

Popular Benchmarks • BAPCo SYSmark - application, tests system • BYTEmark - synthetic, tests processor • Intel Media - synthetic, tests processor (multimedia, uses MMX instructions) • CaffeineMark - synthetic, tests JVM • SPEC CPU95 - synthetic, tests processor (two suites: integer and floating-point) • SPEC GLperf - synthetic, tests 3-D graphics • SPEC Viewperf - application, tests 3-D graphics • Norton Multimedia - synthetic, tests system (multimedia, uses MMX instructions)

Popular Benchmarks • TPC-C (Transaction Processing Performance Council) - database application, tests transaction-processing performance • TPC-D - database application, tests decision support and data-warehousing performance • ZDBOp (Ziff-Davis Benchmark Operation): • BrowserComp - application, tests browsers • CPUmark32 - synthetic, tests processor • NetBench - application, tests network performance • ServerBench - application, tests server performance • WebBench - application, tests web server • WinBench - application, tests component subsystems • Winstone - application, tests system

Programs as Evaluators • Companies may design features that make their machines run faster on the benchmarks than on real programs • A standard set of programs is hard to agree on because each program runs differently on each machine, and companies want to use programs that run fast on their own machines
