Recap • Technology trends • Cost/performance
Measuring and Reporting Performance • What does it mean to say “computer X is faster than computer Y”? • E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s. • Which is true: • A is 50% faster than B? • A is 33% faster than B?
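Performance • Performance is the reciprocal of execution time: $\text{Performance}_X = 1 / \text{Execution time}_X$ • H&P’s definition: “X is n times faster than Y” means $n = \text{Execution time}_Y / \text{Execution time}_X = \text{Performance}_X / \text{Performance}_Y$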
Example • E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s. • Which is true: • A is 50% faster than B? • A is 33% faster than B? • Answer: 1) A is 50% faster than B, since n = 15s / 10s = 1.5
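The comparison in this example can be checked with a few lines of Python. This is a minimal sketch: the function name `times_faster` is made up for illustration, and the only inputs are the 10s and 15s figures from the slide. It also shows where the tempting 33% figure comes from: it is the reduction in execution time, not the gain in performance.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_y / time_x)."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


time_a, time_b = 10.0, 15.0  # seconds, from the example

n = times_faster(time_a, time_b)
percent_faster = (n - 1.0) * 100.0                        # performance gain of A over B
percent_less_time = (time_b - time_a) / time_b * 100.0    # time saved relative to B

print(f"A is {n:.2f} times as fast as B ({percent_faster:.0f}% faster)")
print(f"A takes {percent_less_time:.0f}% less time than B")
```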
Performance • Response time? • Throughput?
Measuring Performance • Focus on execution time of real programs • Measuring execution time? • Wall clock time (elapsed time) • CPU time (excludes I/O and other processes) • User CPU time • System CPU time
iota:~$ time gcc -g tmpcnv.s -o tmpcnv
real 0m3.352s
user 0m0.367s
sys 0m0.468s
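In the gcc run above, user plus system CPU time is 0.835s of the 3.352s that elapsed; the remainder is time spent waiting on I/O or on other processes. The same distinction can be observed from inside a program. The sketch below uses Python's standard `time.perf_counter` (wall clock) and `time.process_time` (user plus system CPU time of the current process, combined); the helper names `measure`, `busy` and `idle` are invented for illustration.

```python
import time


def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall clock) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def busy():
    # CPU-bound work: wall time and CPU time should be close.
    sum(i * i for i in range(200_000))


def idle():
    # Sleeping stands in for waiting on I/O: wall time passes, CPU time does not.
    time.sleep(0.2)


busy_wall, busy_cpu = measure(busy)
idle_wall, idle_cpu = measure(idle)

print(f"busy: wall={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
print(f"idle: wall={idle_wall:.3f}s cpu={idle_cpu:.3f}s")
```

Exact figures vary from run to run and between machines, which is one reason reported measurements should state what was timed and how.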
Choosing Programs to Measure Performance • Real Programs • Compilers, text-processing, CAD tools, etc. • Modified applications • Scripted or modified for portability • Kernels • Attempt to extract key sections from real programs (Livermore loops, Linpack) • Toy Benchmarks • Short examples (e.g. Sieve of Eratosthenes) • Synthetic Benchmarks • Whetstone, Dhrystone
Benchmarking • H&P: car magazines are more scientific about reporting performance than many CS journals!
Benchmark Suites • Collections of benchmarks • E.g. SPEC CPU2000 (INT and FP) • 25 real FORTRAN/C/C++ programs, modified for portability • Specific graphics benchmarks
Server Benchmarks • SPEC also has server benchmarks • File server • Web server • TPC: Transaction Processing Performance Council • Various transaction processing benchmarks
Embedded Benchmarks • Much less well developed • Tend to use Dhrystone! • EEMBC • Recent development • 34 benchmarks (mainly kernels) in five application areas
Summarising Performance Measurements • Complex area • Weighted arithmetic mean • Geometric mean • Normalised results • …
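To see why summarising is a complex area, the sketch below compares an arithmetic mean and a geometric mean of normalised results. The execution times are illustrative numbers chosen for this sketch, not measurements from the lecture, and the function names are made up. With these numbers, the arithmetic mean of normalised times makes each machine look about five times slower than the other, depending on which one is used as the reference, while the geometric mean gives the same answer either way.

```python
import math


def weighted_arithmetic_mean(values, weights):
    """Weighted arithmetic mean; weights need not sum to 1."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be non-empty and positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Illustrative execution times in seconds for two programs (assumed values).
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Normalise each machine's times to the other machine.
b_vs_a = [b / a for a, b in zip(times_a, times_b)]   # [10.0, 0.1]
a_vs_b = [a / b for a, b in zip(times_a, times_b)]   # [0.1, 10.0]

am_b_vs_a = weighted_arithmetic_mean(b_vs_a, [1, 1])
am_a_vs_b = weighted_arithmetic_mean(a_vs_b, [1, 1])
gm_b_vs_a = geometric_mean(b_vs_a)
gm_a_vs_b = geometric_mean(a_vs_b)

print(f"arithmetic mean, B normalised to A: {am_b_vs_a:.2f}")
print(f"arithmetic mean, A normalised to B: {am_a_vs_b:.2f}")
print(f"geometric mean,  B normalised to A: {gm_b_vs_a:.2f}")
print(f"geometric mean,  A normalised to B: {gm_a_vs_b:.2f}")
```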
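1.6 Quantitative Principles • Make the common case fast! • E.g. addition: focus on “normal” addition, not overflow situations • Amdahl’s Law • Quantifies improvements gained by focussing on one aspect of a design • $\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \text{Fraction}_{\text{enhanced}} / \text{Speedup}_{\text{enhanced}}}$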
Example • We are considering an enhancement that is 10 times faster than the original, but is only used 40% of the time. • Overall speedup = 1 / (0.6 + 0.4/10) = 1 / 0.64 ≈ 1.56
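Amdahl's Law for this example can be sketched in Python as follows. The function name and argument names are chosen here for illustration; the formula is the standard one, overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of time the enhancement is used and s is the speedup of the enhanced part. The loop shows that however fast the enhancement becomes, the overall speedup stays below 1 / (1 - f).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The example: enhancement is 10x faster but used only 40% of the time.
overall = amdahl_speedup(0.4, 10.0)
upper_bound = 1.0 / (1.0 - 0.4)   # limit as the enhancement becomes infinitely fast

print(f"overall speedup: {overall:.4f}")
print(f"upper bound:     {upper_bound:.4f}")

for s in (2, 10, 100, 1000):
    print(f"enhancement {s:>4}x -> overall {amdahl_speedup(0.4, s):.4f}")
```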
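CPU Performance • CPU time is related to clock speed: $\text{CPU time} = \text{Clock cycles} \times \text{Clock cycle time} = \text{Clock cycles} / \text{Clock rate}$ • Period (e.g. 1ns) • Rate (e.g. 1GHz) • Also interested in Cycles Per Instruction (CPI): $\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$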
Three Equal Factors • Clock rate (technology) • CPI (architecture) • Instruction count (architecture and compiler)
Measuring IC & CPI • Many modern processors include hardware counters for instructions and clock cycles • Simulations can give even more detail • Time consuming, but can be very accurate
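The relationship between the three factors, and the way CPI is derived from counter readings, can be sketched as below. The counter values are hypothetical numbers picked for this sketch (they do not come from any real processor), and the function names are illustrative. The last three lines show why the factors are called equal: a 10% improvement in any one of them gives the same reduction in CPU time.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_counters(cycles: int, instructions: int) -> float:
    """CPI computed from cycle and instruction counter readings."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical counter readings for one program run on a 1 GHz machine.
cycles = 3_000_000_000
instructions = 2_000_000_000
clock_rate = 1e9

cpi = cpi_from_counters(cycles, instructions)
baseline = cpu_time_seconds(instructions, cpi, clock_rate)

# Improve each factor by 10% in turn.
faster_clock = cpu_time_seconds(instructions, cpi, clock_rate / 0.9)
lower_cpi = cpu_time_seconds(instructions, cpi * 0.9, clock_rate)
fewer_instructions = cpu_time_seconds(instructions * 0.9, cpi, clock_rate)

print(f"CPI = {cpi:.2f}, baseline CPU time = {baseline:.2f}s")
print(f"10% shorter cycle:      {faster_clock:.2f}s")
print(f"10% lower CPI:          {lower_cpi:.2f}s")
print(f"10% fewer instructions: {fewer_instructions:.2f}s")
```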
Another Principle: Locality • Locality of Reference • “90/10 Rule” • Also applies to data • Two aspects: • Temporal locality • Spatial locality
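The effect of the two kinds of locality can be illustrated with a toy direct-mapped cache simulation. This is a sketch under stated assumptions: the cache size (64 lines of 8 words) and the three access patterns are hypothetical, and real caches are considerably more complicated. Sequential access benefits from spatial locality, a small loop benefits from temporal locality, and a stride equal to the line size benefits from neither.

```python
def hit_rate(addresses, num_lines: int = 64, line_size: int = 8) -> float:
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    tags = [None] * num_lines          # block number currently held in each line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // line_size      # which memory block the word belongs to
        index = block % num_lines      # which cache line that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block        # miss: load the block into the line
        total += 1
    return hits / total if total else 0.0


# Spatial locality: consecutive words share a cache line.
sequential = hit_rate(range(4096))

# No locality: every access touches a new line.
strided = hit_rate(range(0, 4096 * 8, 8))

# Temporal locality: the same 16 words are reused 100 times.
hot_loop = hit_rate([a for _ in range(100) for a in range(16)])

print(f"sequential: {sequential:.3f}")
print(f"strided:    {strided:.3f}")
print(f"hot loop:   {hot_loop:.5f}")
```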
Taking Advantage of Parallelism • Key principle for improving performance • Examples: • System level: parallel processing, disk arrays, etc. • Processor level: pipelining • Digital design: caches, ALU adders, etc.
1.7 Putting It All Together: Performance & Price/Performance • Measure performance and performance/cost for three categories • Desktop (SPEC INT and FP) • TP Servers (TPC-C) • Embedded Processors (EEMBC)
Desktop • Integer: • Performance/cost tracks performance • FP: • Not as closely related • Pentium 4 much better than Pentium III • AMD Athlon very good value for money
Servers • Twelve systems • Six top performers • Six best price-performance • Multiprocessors • From 3 to 280 Pentium III processors • Cost: • $131,000 to $15 million
Embedded Processors • Difficult to assess • Benchmarks very new • Designs very application-specific • Power a major constraint • Cost difficult to quantify (are support chips required?)
Embedded Processors • Range: • 500MHz AMD K6 ($78) and IBM PowerPC ($94) used for network switches, etc. • 167MHz NEC VR 5432 ($25) popular in colour laser printers • 180MHz NEC VR 4122 ($33) popular in PDAs (low power)
1.8 Another View: Power Consumption and Efficiency • Embedded processors from previous example: power ranged from 700mW to 9600mW • Fig. 1.27: Performance/watt • NEC VR 4122 huge leader
1.9 Fallacies and Pitfalls • Fallacy: Relative performance of two similar processors can be judged by clock rate or by a single benchmark • Factors such as pipeline structure and memory system have major impact • E.g. Pentium III vs. Pentium 4 (Fig. 1.28)
Fallacies and Pitfalls • Fallacy: Benchmarks remain valid indefinitely • Optimisations change • Perhaps deliberately! • Even real programs are affected by changes in technology • E.g. gcc: increasing percentage is “system time” • SPEC has adapted considerably
Fallacies and Pitfalls • Pitfall: Comparing hand-coded assembly and compiled high-level language performance • E.g. embedded processor benchmarks • Hand-coded is 5 – 87 times faster!