Review of Technology Trends and Cost/Performance Ali Azarpeyvand Advanced Computer Architecture
Outline
• Cost / Price
• IC cost
• Performance?
• Amdahl’s law
• CPI
• Benchmarks
Integrated Circuits Costs
$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$
Die cost goes roughly with $(\text{die area})^{5}$.
Wafer (figure)
Real World Examples

Chip          Metal   Line   Wafer  Defects  Area   Dies/  Yield  Die
              layers  width  cost   /cm²     (mm²)  wafer         cost
386DX         2       0.90   $900   1.0      43     360    71%    $4
486DX2        3       0.80   $1200  1.0      81     181    54%    $12
PowerPC 601   4       0.80   $1700  1.3      121    115    28%    $53
HP PA 7100    3       0.80   $1300  1.0      196    66     27%    $73
DEC Alpha     3       0.70   $1500  1.2      234    53     19%    $149
SuperSPARC    3       0.70   $1700  1.6      256    48     13%    $272
Pentium       3       0.80   $1500  1.5      296    40     9%     $417

• From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
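As a quick check, the die-cost formula can be applied to the wafer cost, dies per wafer, and yield columns of the table. The sketch below is only an illustration: the function names are made up here, and the testing cost, packaging cost, and final test yield used for the IC cost are hypothetical values, not figures from the slides.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Die cost = wafer cost / (dies per wafer * die yield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


def ic_cost(die, testing_cost, packaging_cost, final_test_yield):
    """IC cost = (die cost + testing cost + packaging cost) / final test yield."""
    return (die + testing_cost + packaging_cost) / final_test_yield


# (chip, wafer cost in $, dies per wafer, die yield), copied from the table
rows = [
    ("386DX", 900, 360, 0.71),
    ("486DX2", 1200, 181, 0.54),
    ("PowerPC 601", 1700, 115, 0.28),
    ("HP PA 7100", 1300, 66, 0.27),
    ("DEC Alpha", 1500, 53, 0.19),
    ("SuperSPARC", 1700, 48, 0.13),
    ("Pentium", 1500, 40, 0.09),
]

computed = {chip: die_cost(wafer, dies, yld) for chip, wafer, dies, yld in rows}
for chip, cost in computed.items():
    print(f"{chip:12s} ${cost:7.2f}")

# Hypothetical testing/packaging costs and final test yield (assumptions).
example_ic = ic_cost(computed["386DX"], testing_cost=1.0,
                     packaging_cost=1.0, final_test_yield=0.9)
print(f"386DX IC cost under the assumed values: ${example_ic:.2f}")
```

Rounded to whole dollars, the computed die costs agree with the last column of the table.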
DRAM Prices (close to Costs) (figure)
Design for What?
• For performance
  • Supercomputer
• For cost
  • Cellular phones
• For cost / performance
  • Workstations
• Now back to performance
The Bottom Line: Performance (and Cost)
• "X is n times faster than Y" means
$$n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
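A minimal sketch of the "n times faster" definition, taking performance as the reciprocal of execution time. The two execution times are hypothetical values chosen only to make the arithmetic visible.

```python
def times_faster(ex_time_x, ex_time_y):
    """Return n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


# Hypothetical execution times in seconds (assumptions for illustration).
ex_time_x = 10.0
ex_time_y = 15.0

n = times_faster(ex_time_x, ex_time_y)

# The same ratio expressed with performance = 1 / execution time.
performance_ratio = (1.0 / ex_time_x) / (1.0 / ex_time_y)

print(f"X is {n:.2f} times faster than Y")
```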
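Cycles Per Instruction
“Average Cycles per Instruction”
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$
$$\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
“Instruction Frequency”
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \quad \text{where } F_i = \frac{I_i}{\text{Instruction Count}}$$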
Example: Calculating CPI
Base Machine (Reg / Reg), typical mix

Op      Freq  Cycles  CPI(i)  (% Time)
ALU     50%   1       0.5     (33%)
Load    20%   2       0.4     (27%)
Store   10%   2       0.2     (13%)
Branch  20%   2       0.4     (27%)
Total                 1.5
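The CPI of this instruction mix can be recomputed as the frequency-weighted sum of per-class cycle counts. The sketch below uses the frequencies and cycle counts from the example; the variable names are illustrative only.

```python
# Instruction class -> (frequency, cycles), taken from the example mix
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI = sum over classes of CPI_i * F_i
cpi = sum(freq * cycles for freq, cycles in mix.values())

# Share of execution time spent in each class = (CPI_i * F_i) / CPI
time_share = {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}

print(f"CPI = {cpi:.2f}")
for op, share in time_share.items():
    print(f"{op:6s} {share:.0%} of time")
```

Although ALU operations make up half of the instructions, they account for only about a third of the cycles, because each of the other classes takes two cycles.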
Measurement Tools
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
  • ISA, RT, Gate, Circuit
• Rules of Thumb
• Fundamental “Laws”/Principles
Applications for Measuring Performance
• Real applications
  • gcc, MS Word, Photoshop
• Modified (or scripted) applications
  • Enhance portability
  • Emphasize the criterion of interest (for example, replacing interactive I/O with scripts when CPU performance is being measured)
• Kernels
  • Small, key pieces from real programs
• Toy benchmarks
  • Small programs with no practical use of their own, such as quicksort
• Synthetic benchmarks
Performance: What to measure
• Usually rely on benchmarks vs. real workloads
• To increase predictability, collections of benchmark applications (benchmark suites) are popular
• SPECCPU: popular desktop benchmark suite
  • CPU only, split between integer and floating point programs
  • SPECint2000 has 12 integer programs, SPECfp2000 has 14 floating point programs
  • SPECCPU2006 (12 integer, 17 FP)
  • SPECSFS (NFS file server) and SPECWeb (web server) added as server benchmarks
• Embedded (EEMBC)
  • 34 kernels
• Transaction Processing Performance Council (TPC) measures server performance and cost-performance for databases
  • TPC-C: complex query for Online Transaction Processing
  • TPC-H: models ad hoc decision support
  • TPC-W: a transactional web benchmark
  • TPC-App: application server and web services benchmark
SPEC: System Performance Evaluation Cooperative
• First Round 1989
  • 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
  • SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
• Third Round 1995
  • New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point programs)
  • “benchmarks useful for 3 years”
• SPEC CPU 2000
• SPEC CPU 2006
SPEC CPU2000 (figure)
CINT 2006

Benchmark       Language  Description
400.perlbench   C         Perl programming language
401.bzip2       C         Compression
403.gcc         C         C compiler
429.mcf         C         Combinatorial optimization
445.gobmk       C         Artificial intelligence: Go
456.hmmer       C         Search gene sequence
458.sjeng       C         Artificial intelligence: chess
462.libquantum  C         Physics: quantum computing
464.h264ref     C         Video compression
471.omnetpp     C++       Discrete event simulation
473.astar       C++       Path-finding algorithms
483.xalancbmk   C++       XML processing
CFP 2006

Benchmark      Language   Description
410.bwaves     Fortran    Fluid dynamics
416.gamess     Fortran    Quantum chemistry
433.milc       C          Physics: quantum chromodynamics
434.zeusmp     Fortran    Physics/CFD
435.gromacs    C/Fortran  Biochemistry/molecular dynamics
436.cactusADM  C/Fortran  Physics/general relativity
437.leslie3d   Fortran    Fluid dynamics
444.namd       C++        Biology/molecular dynamics
447.dealII     C++        Finite element analysis
450.soplex     C++        Linear programming, optimization
453.povray     C++        Image ray-tracing
454.calculix   C/Fortran  Structural mechanics
459.GemsFDTD   Fortran    Computational electromagnetics
465.tonto      Fortran    Quantum chemistry
470.lbm        C          Fluid dynamics
481.wrf        C/Fortran  Weather prediction
482.sphinx3    C          Speech recognition
Means (formulas)
Weighted Means (formulas)
Relations among Means
• Equality holds if and only if all the elements are identical.
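The formulas on these three slides did not survive extraction. For reference, the standard textbook definitions are sketched below for positive values $x_1, \dots, x_n$ and weights $w_i$ with $\sum_i w_i = 1$. The symbols are assumptions chosen here and may differ from the notation on the original slides.

```latex
% Unweighted means of positive values x_1, ..., x_n
\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\mathrm{GM} = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
\qquad
\mathrm{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}

% Weighted means, with weights w_i >= 0 summing to 1
\mathrm{WAM} = \sum_{i=1}^{n} w_i x_i
\qquad
\mathrm{WGM} = \prod_{i=1}^{n} x_i^{\,w_i}
\qquad
\mathrm{WHM} = \frac{1}{\sum_{i=1}^{n} \frac{w_i}{x_i}}

% Relation among the means; equality iff x_1 = x_2 = ... = x_n
\mathrm{AM} \;\ge\; \mathrm{GM} \;\ge\; \mathrm{HM}
```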
Summarizing Performance

System  Rate (Task 1)  Rate (Task 2)
A       10             20
B       20             10

Which system is faster?
… depends who’s selling

Average throughput
System  Rate (Task 1)  Rate (Task 2)  Average
A       10             20             15
B       20             10             15

Throughput relative to B
System  Rate (Task 1)  Rate (Task 2)  Average
A       0.50           2.00           1.25
B       1.00           1.00           1.00

Throughput relative to A
System  Rate (Task 1)  Rate (Task 2)  Average
A       1.00           1.00           1.00
B       2.00           0.50           1.25
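The three "Average" columns can be reproduced from the two task rates. The sketch below also computes the geometric mean of the normalized rates, which is a swapped-in comparison rather than something shown on this slide, to illustrate that it does not depend on which system is used as the reference. All function names are illustrative.

```python
import math

# Task rates for each system, from the table
rates = {"A": (10, 20), "B": (20, 10)}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))


def normalized(system, reference):
    """Rates of `system` divided task-by-task by the rates of `reference`."""
    return [r / ref for r, ref in zip(rates[system], rates[reference])]


avg_a = arithmetic_mean(rates["A"])              # average throughput of A
avg_b = arithmetic_mean(rates["B"])              # average throughput of B
a_rel_b = arithmetic_mean(normalized("A", "B"))  # A looks faster
b_rel_a = arithmetic_mean(normalized("B", "A"))  # B looks faster
gm_a_rel_b = geometric_mean(normalized("A", "B"))
gm_b_rel_a = geometric_mean(normalized("B", "A"))

print(avg_a, avg_b, a_rel_b, b_rel_a, gm_a_rel_b, gm_b_rel_a)
```

With the arithmetic mean, whichever system is not the reference appears 25% faster; the geometric mean of the normalized rates rates the two systems as equal under either reference.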
Power and Energy
• Energy to complete operation (Joules)
  • Corresponds approximately to battery life
  • (Battery energy capacity actually depends on rate of discharge)
• Peak power dissipation (Watts = Joules/second)
  • Affects packaging (power and ground pins, thermal design)
• di/dt, peak change in supply current (Amps/second)
  • Affects power supply noise (power and ground pins, decoupling capacitors)
Peak Power versus Lower Energy
• System A has higher peak power, but lower total energy
• System B has lower peak power, but higher total energy
• Integrate power curve to get energy
(figure: power vs. time curves for systems A and B, with Peak A and Peak B marked)
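To make "integrate the power curve" concrete, the sketch below applies the trapezoidal rule to two sampled power traces. The traces are hypothetical numbers invented for illustration, not data from the figure: system A peaks higher but finishes sooner, system B peaks lower but runs longer.

```python
def trapezoid_energy(times_s, power_w):
    """Approximate energy in joules by trapezoidal integration of power over time."""
    energy = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        energy += 0.5 * (power_w[k] + power_w[k - 1]) * dt
    return energy


# Hypothetical sampled power traces (assumptions for illustration only)
times = [0, 1, 2, 3, 4, 5, 6]        # seconds
power_a = [0, 10, 10, 0, 0, 0, 0]    # watts: high peak, short run
power_b = [0, 6, 6, 6, 6, 0, 0]      # watts: lower peak, longer run

energy_a = trapezoid_energy(times, power_a)
energy_b = trapezoid_energy(times, power_b)

print(f"A: peak {max(power_a)} W, energy {energy_a} J")
print(f"B: peak {max(power_b)} W, energy {energy_b} J")
```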
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
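Amdahl’s Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$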
Amdahl’s Law
• Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$
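The floating point example can be checked with a short function implementing the overall-speedup formula. This is a sketch; the function name is illustrative, and the upper-bound calculation (letting the FP speedup grow without limit) is an added observation that follows directly from the formula.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is sped up by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# FP instructions run 2x faster, but only 10% of instructions are FP
fp_speedup = amdahl_speedup(0.10, 2.0)

# Even an infinitely fast FP unit cannot beat 1 / (1 - fraction)
upper_bound = 1.0 / (1.0 - 0.10)

print(f"Overall speedup: {fp_speedup:.3f}")
print(f"Upper bound with unlimited FP speedup: {upper_bound:.3f}")
```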
“Make The Common Case Fast”
• All instructions require an instruction fetch, only a fraction require a data fetch/store
  • Optimize instruction access over data access
• Programs exhibit locality
  • Spatial Locality
    • Items with addresses near one another tend to be referenced close together in time
  • Temporal Locality
    • Recently accessed items are likely to be accessed in the near future
• Access to small memories is faster
  • Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories.
(figure: storage hierarchy of Reg's, Cache, Memory, Disk / Tape)
Metrics of Performance
Levels of the system (top to bottom): Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins
Metrics (top to bottom):
• Answers per month
• Operations per second
• (Millions) of instructions per second: MIPS
• (Millions) of (FP) operations per second: MFLOP/s
• Megabytes per second
• Cycles per second (clock rate)
Basics of Performance (figure)
Details of CPI (figure)
Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

              Inst Count  CPI  Clock Rate
Program       X
Compiler      X           (X)
Inst. Set     X           X
Organization              X    X
Technology                     X
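The three-factor equation can be explored numerically. In the sketch below all machine parameters are hypothetical values chosen for illustration; it compares a baseline with a variant in which a compiler change lowers the instruction count but raises CPI slightly, matching the table's note that the compiler affects instruction count and, to a lesser extent, CPI.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions * cycles/instruction * seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical baseline (assumptions): 1e9 instructions, CPI 1.5, 1 GHz clock
baseline = cpu_time(1_000_000_000, 1.5, 1_000_000_000)

# Hypothetical compiler change: 10% fewer instructions, CPI rises to 1.6
recompiled = cpu_time(900_000_000, 1.6, 1_000_000_000)

print(f"Baseline:   {baseline:.2f} s")
print(f"Recompiled: {recompiled:.2f} s")
print(f"Speedup:    {baseline / recompiled:.3f}")
```

Under these assumed numbers the lower instruction count outweighs the higher CPI, so the recompiled program is faster; with different numbers the trade-off could go the other way.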
Summary
• Cost / Price
• Integrated Circuits Costs
• Measurements
• SPEC: System Performance Evaluation Cooperative
• Amdahl's Law: Make common case fast
• Aspects of CPU Performance