Computer Architecture Slide Sets
WS 2011/2012
Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt

Part 6: Fundamentals in Performance Evaluation
Why performance evaluation?
• Comparison of computers
• Selection of a computer
• Changes in the configuration of an existing computer (tuning)
• Design of computers
• Verification or validation of design decisions

Methods for performance evaluation:
(1) analytical methods
(2) measurements
Aspects for evaluation
(figure: overview of the aspects considered in performance evaluation)
Analytical methods
• Performance measures (hypothetical maximum performance!):
  • MIPS (Millions of Instructions per Second)
  • MFLOPS (Millions of Floating-Point Operations per Second)
• Mix (also calculated, not measured):
  • In a mix, the average execution time for each instruction class is calculated and scaled by a characteristic weight (see the sketch below).
• Core programs:
  • Typical application programs, written for the evaluated computer
  • No measurements: the overall execution time is calculated from the execution times of the single machine instructions
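As a small illustration of the mix idea, the following sketch computes an average instruction time as a weighted sum of per-class execution times. The instruction classes, weights and times are invented for illustration and do not come from any published mix.

```c
/* Sketch of a (purely calculated) instruction mix: the average instruction
   time is the weighted sum of per-class execution times.  All numbers are
   assumed values, chosen only to demonstrate the calculation. */
#include <stdio.h>

int main(void)
{
    const char  *class_name[] = { "load/store", "ALU", "branch", "float" };
    const double weight[]     = { 0.30, 0.40, 0.20, 0.10 };  /* must sum to 1 */
    const double time_ns[]    = { 2.0, 1.0, 1.5, 4.0 };      /* per class     */

    double avg = 0.0;
    for (int i = 0; i < 4; i++) {
        avg += weight[i] * time_ns[i];
        printf("%-10s weight %.2f  time %.1f ns\n",
               class_name[i], weight[i], time_ns[i]);
    }
    printf("average instruction time: %.2f ns\n", avg);  /* 1.70 ns here */
    return 0;
}
```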
Performance measures

• Runtime:
  runtime = #clock cycles · clock period
• CPI (cycles per instruction):
  CPI = #clock cycles / instruction count
• IPC (instructions per cycle):
  IPC = 1 / CPI
• MIPS (million instructions per second):
  MIPS = instruction count / (runtime · 10^6)
       = (instruction count · clock frequency) / (#clock cycles · 10^6)
       = clock frequency / (CPI · 10^6)
       = (clock frequency · IPC) / 10^6
• MFLOPS (million floating-point operations per second):
  MFLOPS = #executed floating-point instructions / (runtime · 10^6)
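A minimal sketch of these formulas in C; the instruction count, cycle count, clock frequency and floating-point instruction count are assumed example values, not measurements.

```c
/* Sketch: evaluating the performance measures above from assumed counts. */
#include <stdio.h>

int main(void)
{
    double instruction_count = 400e6;   /* assumed values, for illustration */
    double clock_cycles      = 600e6;
    double clock_frequency   = 2.0e9;   /* 2 GHz */
    double fp_instructions   = 80e6;

    double runtime = clock_cycles / clock_frequency;       /* cycles * period */
    double cpi     = clock_cycles / instruction_count;
    double ipc     = 1.0 / cpi;
    double mips    = instruction_count / (runtime * 1e6);  /* = f * IPC / 1e6 */
    double mflops  = fp_instructions / (runtime * 1e6);

    printf("runtime = %.3f s, CPI = %.2f, IPC = %.2f\n", runtime, cpi, ipc);
    printf("MIPS = %.0f, MFLOPS = %.0f\n", mips, mflops);
    return 0;
}
```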
Drawbacks of performance measures
• CPI, IPC, MIPS and MFLOPS depend on the instruction set.
• CPI, IPC, MIPS and MFLOPS depend on the program.
• CPI, IPC, MIPS and MFLOPS depend on the microarchitecture.

Conclusions:
• Higher MIPS or MFLOPS ratings do not necessarily mean more performance (see the worked example below)!
• It is of vital importance to choose well-suited test applications (benchmarks)!
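A worked example with assumed numbers: the same source program is compiled for two machines with different instruction sets. Both need the same runtime, yet their MIPS ratings differ by a factor of five.

```c
/* Worked example (all numbers assumed): same program, two instruction sets.
   Machine A executes many simple instructions, machine B few complex ones.
   Runtime is identical, but the MIPS ratings differ widely. */
#include <stdio.h>

static void report(const char *name, double instr, double cpi, double f_hz)
{
    double runtime = instr * cpi / f_hz;          /* cycles / frequency */
    double mips    = instr / (runtime * 1e6);
    printf("%s: runtime = %.3f s, MIPS = %.0f\n", name, runtime, mips);
}

int main(void)
{
    report("A (many simple instructions)", 10e6, 1.0, 500e6);  /* 500 MIPS */
    report("B (few complex instructions)",  2e6, 5.0, 500e6);  /* 100 MIPS */
    return 0;   /* both finish in 0.020 s despite a 5x MIPS difference */
}
```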
Measurements
• Benchmarks:
  • Use of existing or synthetic programs to measure the performance
  • These programs are translated and executed on the evaluated computer
  • Therefore, not only the computer hardware but also the compiler influences the outcome of a benchmark
• Monitoring:
  • Monitors are used to observe parts of the computer at run time
  • Therefore, interesting quantities inside the computer can be measured besides the overall outcome of a benchmark (e.g. cache utilization, network traffic, ...)
  • Monitoring can be done by hardware or software
Benchmark terminology
• benchmark: a test program.
• benchmark suite: a set of benchmarks.
• synthetic benchmark: a test program that is only useful as a benchmark.
• kernel benchmark: a very small synthetic benchmark. Usually a time-intensive part of a real program is chosen. Kernel benchmarks are well suited for design and simulation, but normally unsuitable for comparing complete systems.
• benchmark application: a complete program that is additionally used as a benchmark; the opposite of a synthetic benchmark.
SPEC benchmarks
• SPEC (Standard Performance Evaluation Corporation): since 1989, a consortium of different manufacturers; general-purpose computer applications, mainly to measure speed and throughput
• Several benchmark suites, e.g.:
  • SPEC95
  • SPECweb96
  • SPEC JVM98
  • SPEC JBB2000
  • SPEC CINT2006
  • SPEC CFP2006
SPECmarks
• Goal: comparable values for different systems
• But: single values do not always reflect the real relations; they are therefore only a first indication for selecting or judging a computer
• CPU performance plus cache, memory and compiler is measured; the operating system and I/O are less relevant
• Integer test programs (ANSI C)
• Floating-point test programs (Fortran 77)
• "SPECmark": this characteristic is the geometric mean of the individual program characteristics contained in the suite (see the sketch below)
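A small sketch of how such a single value is formed, assuming the usual SPEC convention of per-program runtime ratios (reference runtime divided by measured runtime); the four ratio values here are invented.

```c
/* Sketch: a SPECmark-style figure as the geometric mean of per-program
   ratios.  The ratios are assumed example values, not real SPEC results. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double ratio[] = { 12.1, 9.8, 15.3, 11.0 };   /* assumed SPEC ratios */
    int    n       = sizeof(ratio) / sizeof(ratio[0]);

    double product = 1.0;
    for (int i = 0; i < n; i++)
        product *= ratio[i];

    /* geometric mean = n-th root of the product of all ratios */
    printf("geometric mean: %.2f\n", pow(product, 1.0 / n));
    return 0;
}
```

The geometric mean is used rather than the arithmetic mean so that the resulting figure is independent of which machine serves as the reference.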
SPEC CINT2006: 12 integer test programs (C, C++)
(table: the individual benchmark programs)
SPEC CFP2006: 17 floating-point test programs (C, C++, Fortran)
(table: the individual benchmark programs)
More popular benchmark suites
• Basic Linear Algebra Subprograms (BLAS):
  • For numerical applications
  • Core of the LINPACK software package for solving linear equation systems
  • Basis of the TOP500 list of the fastest parallel computers
  • (see the kernel sketch below)
• Whetstone benchmark:
  • Developed in the seventies; a single program with a lot of floating-point calculations
• Dhrystone benchmark:
  • An improvement of Whetstone, developed in the eighties
• Powerstone benchmark suite:
  • For comparing the energy consumption of microprocessors and microcontrollers
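For illustration, a minimal version of DAXPY (y = a·x + y), the level-1 BLAS vector kernel that dominates the inner loop of the LINPACK solver:

```c
/* Illustration: DAXPY (y = a*x + y), the BLAS level-1 kernel at the heart
   of LINPACK's inner loop.  Minimal version with a tiny test vector. */
#include <stdio.h>

static void daxpy(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    double x[4] = { 1, 2, 3, 4 };
    double y[4] = { 4, 3, 2, 1 };

    daxpy(4, 2.0, x, y);
    for (int i = 0; i < 4; i++)
        printf("%g ", y[i]);          /* prints: 6 7 8 9 */
    printf("\n");
    return 0;
}
```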
Powerstone benchmark suite
(table: the individual benchmark programs)
Monitoring
Monitors are components that record the states of a system during its normal operation. The contents of registers, flags and buffers as well as the traffic in data paths are recorded. Monitors are used to observe and debug systems.
Monitoring
Generally, monitors can be classified into:

a) Hardware monitors
A hardware monitor is a separate component which is physically connected to the locations of the target system where the measurements take place. Hardware monitors typically consist of comparators and counters to generate measurement data, memories to store it, and buses to transport it. Thus, hardware monitors use their own resources.
b) Software monitors
A software monitor is a program implemented to collect measurement data through interfaces provided by the operating system, the programming language, or the application program. A software monitor uses the resources of the observed system to collect, transport and store data (see the sketch below).

c) Hybrid monitors
A hybrid monitor is a mixed hardware and software monitor. Often, simple elements like counters and memories are implemented in hardware, while more complex observation functions are implemented in software.
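As a concrete sketch of a software monitor using an operating-system interface, the following Linux-only C program reads the CPU-cycle counter around a measured region via the perf_event_open system call; the workload loop and all parameter choices are illustrative. Note that the monitor consumes CPU time, memory and a file descriptor of the observed system, exactly as stated above.

```c
/* Sketch of a software monitor on Linux: read the hardware cycle counter
   around a measured region via perf_event_open (no glibc wrapper exists,
   so syscall() is used).  Workload and parameters are illustrative. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type           = PERF_TYPE_HARDWARE;
    attr.size           = sizeof(attr);
    attr.config         = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled       = 1;          /* start disabled, enable explicitly */
    attr.exclude_kernel = 1;

    /* pid = 0: this process, cpu = -1: any CPU */
    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile long sum = 0;                     /* workload under observation */
    for (long i = 0; i < 10000000; i++) sum += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long cycles = 0;
    read(fd, &cycles, sizeof(cycles));         /* counter value, 8 bytes */
    printf("cycles in measured region: %lld\n", cycles);
    close(fd);
    return 0;
}
```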
Monitoring constraints

1. Accessing information
Ideally, monitoring is integrated into the hardware and software components of a system during its design. Software monitors are cheaper than hardware monitors, but they may influence the system's run-time behavior.

2. Reactionless (non-intrusive) monitoring
Hardware monitors and most hybrid monitors store the recorded data in their own memories. Software monitors have to use the memories of the observed system. Thus, hardware monitors perturb the observed system less than software monitors.
3. Amount of recorded data and its further processing
Most purposes, especially debugging, require observations with high resolution: for the accurate analysis of program errors, the causing machine instruction has to be identified. For other purposes, e.g. a global performance analysis, a coarser resolution is sufficient. Although it often seems necessary to record observable data at the level of machine instruction execution, this would generate traces much larger than the memory usage of the observed application itself. Thus, the cost of storing this large amount of data and the general difficulty of processing the trace data prohibit a complete recording of traces at machine instruction level.
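A back-of-envelope sketch of the trace volume; the execution rate and record size are assumed values.

```c
/* Back-of-envelope sketch of why full instruction-level traces are
   prohibitive.  Execution rate and record size are assumed values. */
#include <stdio.h>

int main(void)
{
    double instr_per_sec   = 1e9;   /* assumed execution rate            */
    double bytes_per_entry = 8.0;   /* assumed trace record size         */
    double seconds         = 60.0;  /* one minute of recording           */

    double gib = instr_per_sec * bytes_per_entry * seconds
                 / (1024.0 * 1024.0 * 1024.0);
    printf("trace size for 60 s: about %.0f GiB\n", gib);  /* ~447 GiB */
    return 0;
}
```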
Instrumentation
One way of software monitoring is to insert measuring commands, e.g. loop or time counters, into the program code. This is called instrumentation. Instrumentation can be performed by the user, the compiler, the class library or the operating system.
(figure: the instrumented program runs on the measured computer system and delivers both the program results and the measurement results)
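A minimal hand-instrumented C program, with the inserted loop counter and time counters marked in comments; the workload itself is arbitrary.

```c
/* Minimal sketch of instrumentation: timing and counting commands are
   inserted by hand around the code region of interest. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    long loop_count = 0;                     /* inserted loop counter */
    clock_t start = clock();                 /* inserted time counter */

    double sum = 0.0;
    for (int i = 1; i <= 1000000; i++) {     /* program under test    */
        sum += 1.0 / i;
        loop_count++;                        /* instrumentation       */
    }

    clock_t end = clock();                   /* inserted time counter */
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    printf("iterations: %ld, CPU time: %f s, sum: %f\n",
           loop_count, seconds, sum);
    return 0;
}
```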
Monitoring overview

method                  | system state        | tools                         | accuracy
------------------------|---------------------|-------------------------------|-------------
direct measurement      | hardware            | hardware monitor              | very high
instrumentation         | hardware            | instrumented program          | high
trace-driven simulation | hardware + software | simulation program + "trace"  | satisfactory
simulation              | software            | simulation program            | sufficient
Typical load-dependent parameters
• throughput: the average number of jobs completed per time unit. A job may be the execution of an instruction or a program, saving a data block, or sending a message.
• utilization: the throughput (average number of jobs completed) divided by the maximum possible throughput.
• utilization ratio: the time spent working on jobs divided by the whole operating time.
• response time: the average time needed to complete a job.
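A small computation sketch for these parameters from an assumed measurement interval; all observation values are invented.

```c
/* Sketch: computing the load-dependent parameters from assumed
   observations over one measurement interval. */
#include <stdio.h>

int main(void)
{
    double jobs_completed = 1200.0;  /* assumed observations             */
    double interval_s     = 60.0;    /* length of measurement interval   */
    double busy_time_s    = 45.0;    /* time actually spent on jobs      */
    double max_throughput = 40.0;    /* jobs per second, theoretical max */
    double total_job_time = 600.0;   /* sum of all job completion times  */

    double throughput  = jobs_completed / interval_s;      /* jobs/s */
    double utilization = throughput / max_throughput;
    double util_ratio  = busy_time_s / interval_s;
    double resp_time   = total_job_time / jobs_completed;  /* s/job  */

    printf("throughput = %.1f jobs/s\n", throughput);
    printf("utilization = %.2f, utilization ratio = %.2f\n",
           utilization, util_ratio);
    printf("response time = %.2f s\n", resp_time);
    return 0;
}
```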