Performance and Cost
• From a purchasing perspective
  – Given a collection of machines, which one has the
    – best performance?
    – least cost?
    – best performance / cost?
• From a design perspective
  – Faced with several design options, which one has the
    – best performance improvement?
    – least cost?
    – best performance / cost?
• Both require
  – a basis for comparison
  – a metric for evaluation
• Our goal is to understand the cost and performance implications of architectural choices
Two notions of “performance”

Plane              DC to Paris   Speed      Passengers   Throughput (pmph, passenger-miles per hour)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

Which one has the higher performance?
° Time to do the task (execution time): execution time, response time, latency
° Tasks per day, hour, week, second, ns, ... (performance): performance, throughput, bandwidth
Response time and throughput are often in opposition. Why?
Definition of Performance
• Performance is measured in units of things per second
  – bigger is better
• If we are primarily concerned with response time:
  $$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
• "X is n times faster than Y" means
  $$n = \frac{\text{performance}(X)}{\text{performance}(Y)} = \frac{\text{execution\_time}(Y)}{\text{execution\_time}(X)}$$
• When is throughput more important than execution time?
• When is execution time more important than throughput?
Performance Examples
• Flight time of the Concorde vs. the Boeing 747?
  – The Concorde is 1350 mph / 610 mph = 2.2 times faster
  – = 6.5 hours / 3 hours
• Throughput of the Concorde vs. the Boeing 747?
  – The Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
  – The Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
• The Boeing is 1.6 times (“60%”) faster in terms of throughput
• The Concorde is 2.2 times (“120%”) faster in terms of flight time
• When discussing processor performance, we focus on the execution time of a single job. Why?
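The two comparisons above can be reproduced with a few lines of Python. This is only a sketch: the function names `throughput_pmph` and `times_faster` are made up for illustration, and the figures are the ones from the plane table.

```python
def throughput_pmph(passengers: int, speed_mph: float) -> float:
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return passengers * speed_mph


def times_faster(perf_x: float, perf_y: float) -> float:
    """Return n such that X is n times faster than Y, given two performance values."""
    return perf_x / perf_y


# Figures taken from the plane table.
boeing = {"hours": 6.5, "mph": 610, "passengers": 470}
concorde = {"hours": 3.0, "mph": 1350, "passengers": 132}

# Performance as 1 / execution_time: the Concorde wins on flight time.
latency_ratio = times_faster(1 / concorde["hours"], 1 / boeing["hours"])

# Performance as throughput: the Boeing 747 wins.
throughput_ratio = times_faster(
    throughput_pmph(boeing["passengers"], boeing["mph"]),
    throughput_pmph(concorde["passengers"], concorde["mph"]),
)

print(f"Concorde is {latency_ratio:.2f}x faster by flight time")
print(f"Boeing 747 is {throughput_ratio:.2f}x faster by throughput")
```

The same `times_faster` helper is used for both metrics, which makes the point of the slide visible: the answer to "which is faster" depends on which performance value is plugged in.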
Understanding Performance
• How does each of the following affect response time and throughput?
  – Increasing the processor clock rate.
  – Increasing the number of jobs in the system (e.g., one computer serving multiple users).
  – Increasing the number of processors in a multiprocessor system (e.g., a network of ATMs).
• If a Pentium III runs a program in 8 seconds and a PowerPC runs the same program in 10 seconds, how many times faster is the Pentium III?
  n = 10 / 8 = 1.25 times faster (or 25% faster)
Definitions of Time
• There are several definitions of time, depending on what we are measuring:
  – Response time: the total time to complete a task, including time spent executing on the CPU, accessing disk and memory, waiting for I/O and other processes, and operating system overhead.
  – CPU execution time: the total time the CPU spends completing a given task (not including I/O time or time spent running other programs). Also known simply as CPU time.
  – User CPU time: the CPU time spent in the program itself.
  – System CPU time: the CPU time spent in the operating system performing tasks on behalf of the program.
• For example, a program may have a system CPU time of 22 sec., a user CPU time of 90 sec., a CPU execution time of 112 sec. (90 + 22), and a response time of 162 sec.
Computer Clocks
• A computer clock runs at a constant rate and determines when events take place in hardware.
[Figure: clock signal (Clk) with one clock period marked]
• The clock cycle time is the amount of time for one clock period to elapse (e.g., 5 ns).
• The clock rate is the inverse of the clock cycle time.
• For example, if a computer has a clock cycle time of 5 ns, the clock rate is:
  $$\frac{1}{5 \times 10^{-9}\ \text{sec}} = 200\ \text{MHz}$$
Computing CPU time
• The time to execute a given program can be computed as
    CPU time = CPU clock cycles x clock cycle time
  Since clock cycle time and clock rate are reciprocals,
    CPU time = CPU clock cycles / clock rate
• The number of CPU clock cycles can be determined by
    CPU clock cycles = (instructions/program) x (clock cycles/instruction) = Instruction count x CPI
  which gives
    CPU time = Instruction count x CPI x clock cycle time
    CPU time = Instruction count x CPI / clock rate
• The units for this are
  $$\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{clock cycle}}$$
Example of Computing CPU time
• If a computer has a clock rate of 50 MHz, how long does it take to execute a program with 1,000 instructions, if the CPI for the program is 3.5?
• Using the equation
    CPU time = Instruction count x CPI / clock rate
  gives
  $$\text{CPU time} = \frac{1000 \times 3.5}{50 \times 10^{6}} = 70\ \mu\text{s}$$
• If a computer’s clock rate increases from 200 MHz to 250 MHz and the other factors remain the same, how many times faster will the computer be?
  $$\frac{\text{CPU time}_{old}}{\text{CPU time}_{new}} = \frac{\text{clock rate}_{new}}{\text{clock rate}_{old}} = \frac{250\ \text{MHz}}{200\ \text{MHz}} = 1.25$$
• What simplifying assumptions did we make?
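As a quick check of the two calculations above, here is a minimal Python sketch of the CPU time equation. The helper names `clock_rate_hz` and `cpu_time_seconds` are illustrative, not part of the slides.

```python
def clock_rate_hz(clock_cycle_time_s: float) -> float:
    """Clock rate is the reciprocal of the clock cycle time."""
    return 1.0 / clock_cycle_time_s


def cpu_time_seconds(instruction_count: int, cpi: float, rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / rate_hz


# First question: 1,000 instructions, CPI of 3.5, 50 MHz clock.
t = cpu_time_seconds(1_000, 3.5, 50e6)
print(f"CPU time: {t * 1e6:.1f} microseconds")

# Second question: only the clock rate changes, from 200 MHz to 250 MHz.
t_old = cpu_time_seconds(1_000, 3.5, 200e6)
t_new = cpu_time_seconds(1_000, 3.5, 250e6)
speedup = t_old / t_new
print(f"Speedup from faster clock: {speedup:.2f}")

# A 5 ns clock cycle time corresponds to a 200 MHz clock.
print(f"Clock rate for a 5 ns cycle: {clock_rate_hz(5e-9) / 1e6:.0f} MHz")
```

The speedup calculation holds instruction count and CPI fixed, which is exactly the simplifying assumption the slide asks about: in practice a faster clock can raise the CPI, for example when memory does not speed up with the processor.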
Factors affecting CPU Performance
  $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
• Which factors are affected by each of the following?

                    instr. count   CPI   clock rate
Program
Compiler
Instr. Set Arch.
Organization
Technology
Computing CPI
• The CPI is the average number of cycles per instruction.
• If, for each instruction type, we know its frequency and the number of cycles needed to execute it, we can compute the overall CPI as follows:
  $$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$$
• For example

Op       F_i    CPI_i   CPI_i x F_i   % Time
ALU      50%    1       0.5           23%
Load     20%    5       1.0           45%
Store    10%    3       0.3           14%
Branch   20%    2       0.4           18%
Total    100%           2.2           100%
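The table above can be recomputed from just the frequency and cycle columns. The sketch below assumes a simple dictionary layout for the instruction mix; `average_cpi` and `time_share` are illustrative names.

```python
# Instruction mix from the table: op -> (frequency F_i, cycles CPI_i).
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Overall CPI = sum over instruction types of CPI_i x F_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction type."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


print(f"Overall CPI: {average_cpi(MIX):.1f}")
for op, share in time_share(MIX).items():
    print(f"{op:<7} {share:.0%} of execution time")
```

Note how loads are only 20% of the instructions but account for the largest share of the time, which is the kind of observation the "% Time" column is meant to surface.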
Performance Summary
• The two main measures of performance are
  – execution time: time to do the task
  – throughput: number of tasks completed per unit time
• Performance and execution time are reciprocals. Increasing performance decreases execution time.
• The time to execute a given program can be computed as:
    CPU time = Instruction count x CPI x clock cycle time
    CPU time = Instruction count x CPI / clock rate
• These factors are affected by compiler technology, the instruction set architecture, the machine organization, and the underlying technology.
• When trying to improve performance, look at what occurs frequently => make the common case fast.
Computer Benchmarks
• A benchmark is a program or set of programs used to evaluate computer performance.
• Benchmarks allow us to make performance comparisons based on execution times.
• Benchmarks should
  – be representative of the type of applications run on the computer
  – not be overly dependent on one or two features of a computer
• Benchmarks can vary greatly in terms of their complexity and their usefulness.
Types of Benchmarks
• Actual Target Workload
  – Pros: representative
  – Cons: very specific; non-portable; difficult to run or measure; hard to identify cause
• Full Application Benchmarks (e.g., SPEC benchmarks)
  – Pros: portable; widely used; improvements useful in reality
  – Cons: less representative
• Small “Kernel” Benchmarks
  – Pros: easy to run, early in design cycle
  – Cons: does not measure memory system
• Microbenchmarks
  – Pros: identify peak capability and potential bottlenecks
  – Cons: “peak” may be a long way from application performance
SPEC: System Performance Evaluation Cooperative
• The SPEC benchmarks are widely used to report workstation and PC performance.
• First Round: SPEC CPU89
  – 10 programs yielding a single number
• Second Round: SPEC CPU92
  – SPEC CINT92 (6 integer programs) and SPEC CFP92 (14 floating point programs)
  – Compiler flags can be set differently for different programs
• Third Round: SPEC CPU95
  – New set of programs: SPEC CINT95 (8 integer programs) and SPEC CFP95 (10 floating point programs)
  – Single compiler flag setting for all programs
• Fourth Round: SPEC CPU2000
  – New set of programs: SPEC CINT2000 (12 integer programs) and SPEC CFP2000 (14 floating point programs)
  – Single compiler flag setting for all programs
• Value reported is the SPEC ratio
  – CPU time of reference machine / CPU time of measured machine
Other SPEC Benchmarks
• JVM98:
  – Measures the performance of Java Virtual Machines
• SFS97:
  – Measures the performance of network file server (NFS) protocols
• Web99:
  – Measures the performance of World Wide Web applications
• HPC96:
  – Measures the performance of large, industrial applications
• APC, MEDIA, OPC:
  – Measure the performance of graphics applications
• For more information about the SPEC benchmarks see: http://www.spec.org.
Example of the SPEC95 Benchmarks
• Below are the SPEC ratios for the Pentium and Pentium Pro (Pentium+) processors.
• What can we learn from this information?
Summarizing Performance
• The method used to summarize performance over several benchmarks depends on the kind of measurement.
• For a set of execution times, T1 to TN, use the arithmetic mean (AM) or the weighted arithmetic mean (WAM), where the weights W1 to WN sum to 1.
  – AM = (T1 + T2 + … + TN) / N
  – WAM = (W1*T1 + W2*T2 + … + WN*TN)
• For a set of normalized execution time ratios, R1 to RN, use the geometric mean (GM).
  – GM = (R1 * R2 * … * RN)^(1/N) /* the Nth root of the product */
• The geometric mean of execution time ratios is not proportional to the total execution time.
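The three means can be written down directly from their definitions. This is a minimal sketch: the function names are illustrative, and the sample times and weights are hypothetical values chosen only to exercise the formulas, not measurements from the slides.

```python
import math


def arithmetic_mean(times: list) -> float:
    """AM = (T1 + ... + TN) / N."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times: list, weights: list) -> float:
    """WAM = W1*T1 + ... + WN*TN, with the weights summing to 1."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


def geometric_mean(ratios: list) -> float:
    """GM = (R1 * ... * RN)^(1/N), for normalized execution time ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution times in seconds and workload weights.
times = [2.0, 8.0]
weights = [0.75, 0.25]

print(arithmetic_mean(times))
print(weighted_arithmetic_mean(times, weights))
print(geometric_mean(times))
```

The weight check is a design choice of this sketch: it enforces the convention that the weights express each program's share of the workload.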
Example of Summarizing Performance
• Two programs, P1 and P2, are run on computers A and B. The table shows several ways of summarizing performance.
[Table: execution times of P1 and P2 on A and B, with columns "Normalized to A" and "Normalized to B"; values not recoverable]
• What are the advantages and disadvantages of using the geometric mean to summarize normalized execution times?
• For the SPEC benchmarks, would you use the arithmetic or the geometric mean? Why?
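Because the table values did not survive, the sketch below uses hypothetical execution times (not the slide's figures) to illustrate the trade-off the questions point at. It normalizes each machine's times to the other and compares the arithmetic and geometric means of the resulting ratios.

```python
import math

# HYPOTHETICAL execution times in seconds, chosen for illustration only.
TIMES = {
    "A": {"P1": 1.0, "P2": 1000.0},
    "B": {"P1": 10.0, "P2": 100.0},
}


def normalized(machine: str, reference: str) -> list:
    """Execution times of `machine` divided by those of `reference`, per program."""
    return [TIMES[machine][p] / TIMES[reference][p] for p in ("P1", "P2")]


def arithmetic_mean(values: list) -> float:
    return sum(values) / len(values)


def geometric_mean(values: list) -> float:
    return math.prod(values) ** (1.0 / len(values))


b_vs_a = normalized("B", "A")  # B's times normalized to A
a_vs_b = normalized("A", "B")  # A's times normalized to B

# The arithmetic mean of ratios depends on the reference machine:
# each machine looks slower when the other one is the reference.
am_b_vs_a = arithmetic_mean(b_vs_a)
am_a_vs_b = arithmetic_mean(a_vs_b)

# The geometric mean gives the same verdict whichever machine is the reference.
gm_b_vs_a = geometric_mean(b_vs_a)
gm_a_vs_b = geometric_mean(a_vs_b)

# Total execution time tells a different story from the geometric mean.
total_a = sum(TIMES["A"].values())
total_b = sum(TIMES["B"].values())

print(am_b_vs_a, am_a_vs_b)
print(gm_b_vs_a, gm_a_vs_b)
print(total_a, total_b)
```

With these made-up numbers the geometric mean rates the two machines as equal under either normalization, while the total time for B is far smaller than for A. That is both the advantage (consistency across reference machines) and the disadvantage (no link to total execution time) of the geometric mean.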
Poor Performance Measures
• Two measures often used in marketing to rate computer performance are MIPS and MFLOPS.
• MIPS: millions of instructions per second
  – MIPS = instruction count / (execution time x 10^6)
  – For example, a program that executes 3 million instructions in 2 seconds runs at 1.5 MIPS.
  – Advantage: easy to understand and measure
  – Disadvantage: does not reflect actual performance, because simple instructions execute faster and inflate the rating.
• MFLOPS: millions of floating point operations per second
  – MFLOPS = floating point operations / (execution time x 10^6)
  – For example, a program that executes 4 million floating point operations in 5 seconds runs at 0.8 MFLOPS.
  – Advantage: easy to understand and measure
  – Disadvantage: same as MIPS, and it only measures floating point
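Both ratings are a one-line calculation, as the following sketch shows. The function names `mips` and `mflops` are illustrative; the inputs are the two examples from the slide.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mflops(fp_operations: float, execution_time_s: float) -> float:
    """MFLOPS = floating point operations / (execution time x 10^6)."""
    return fp_operations / (execution_time_s * 1e6)


# 3 million instructions in 2 seconds.
print(f"{mips(3e6, 2.0):.1f} MIPS")

# 4 million floating point operations in 5 seconds.
print(f"{mflops(4e6, 5.0):.1f} MFLOPS")
```

Neither function knows how much work each instruction or operation does, which is why the ratings can mislead when two machines or compilers use different instruction mixes.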
MIPS
• Example 2: Impact of an optimizing compiler
  Assume the following program makeup:

Operation   Freq   Clock Cycles
ALU         43%    1
Load        21%    2
Store       12%    2
Branch      24%    2

  Assume a 20 ns clock, and that the optimizing compiler eliminates 50% of all ALU operations.
MIPS (cont.)
• Answer
  Not optimized:
    Ave CPI = 0.43x1 + 0.21x2 + 0.12x2 + 0.24x2 = 1.57
    MIPS = 50 MHz / (1.57 x 10^6) = 31.8
  Optimized:
    Ave CPI = (0.43/2x1 + 0.21x2 + 0.12x2 + 0.24x2) / (1 - 0.43/2) = 1.73
    MIPS = 50 MHz / (1.73 x 10^6) = 28.9
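The answer can be checked, and extended with the execution time, using the sketch below. It assumes the instruction mix from the example and a 20 ns clock; `summarize` is an illustrative helper, and execution time is expressed per instruction of the original (unoptimized) program. The code uses the unrounded CPI, so its MIPS figure for the optimized program differs slightly from one computed with the CPI rounded to 1.73.

```python
CLOCK_CYCLE_NS = 20.0

# Instruction mix of the unoptimized program: op -> (frequency, clock cycles).
MIX = {
    "ALU": (0.43, 1),
    "Load": (0.21, 2),
    "Store": (0.12, 2),
    "Branch": (0.24, 2),
}

# The optimizing compiler removes half of the ALU operations.
OPTIMIZED_MIX = dict(MIX)
OPTIMIZED_MIX["ALU"] = (0.43 / 2, 1)


def summarize(mix: dict, cycle_ns: float) -> dict:
    """Return average CPI, MIPS rating, and time per original instruction (ns)."""
    instructions = sum(freq for freq, _ in mix.values())
    cycles = sum(freq * clocks for freq, clocks in mix.values())
    cpi = cycles / instructions
    clock_mhz = 1e3 / cycle_ns  # a 20 ns cycle is a 50 MHz clock
    return {"cpi": cpi, "mips": clock_mhz / cpi, "time_ns": cycles * cycle_ns}


base = summarize(MIX, CLOCK_CYCLE_NS)
opt = summarize(OPTIMIZED_MIX, CLOCK_CYCLE_NS)

for name, result in (("not optimized", base), ("optimized", opt)):
    print(f"{name}: CPI={result['cpi']:.2f} MIPS={result['mips']:.1f} "
          f"time={result['time_ns']:.1f} ns per original instruction")
```

The optimized program finishes sooner yet earns a lower MIPS rating, because the instructions that were removed were the cheap single-cycle ones. This is the sense in which MIPS is a poor performance measure.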
Amdahl's Law
The speedup due to an enhancement is defined as:
$$\text{Speedup} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{\text{Performance}_{new}}{\text{Performance}_{old}}$$
Suppose the enhancement accelerates a fraction $\text{Fraction}_{enhanced}$ of the task by a factor $\text{Speedup}_{enhanced}$. Then
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
Example of Amdahl's Law
• Floating point instructions are improved to run twice as fast, but only 10% of the execution time is spent on these instructions. How much faster is the new machine?
  $$\text{Speedup} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
  $$\text{Speedup} = \frac{1}{(1 - 0.1) + 0.1/2} = 1.053$$
• The new machine is 1.053 times faster, or 5.3% faster.
• If floating point instructions become 100 times faster, what is the speedup of the new machine?
  $$\text{Speedup} = \frac{1}{(1 - 0.1) + 0.1/100} = 1.110$$
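A short Python sketch of the formula makes it easy to try other fractions and speedups. The function name `amdahl_speedup` and the input validation are choices of this sketch, not something taken from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S) for enhanced fraction F sped up by S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Floating point takes 10% of the time and becomes 2x, then 100x, faster.
twice = amdahl_speedup(0.1, 2.0)
hundredfold = amdahl_speedup(0.1, 100.0)

# Upper bound when the enhanced part takes no time at all: 1 / (1 - F).
limit = 1.0 / (1.0 - 0.1)

print(f"2x floating point:   {twice:.3f}")
print(f"100x floating point: {hundredfold:.3f}")
print(f"upper bound:         {limit:.3f}")
```

Going from a 2x to a 100x improvement barely moves the overall result, because the 90% of the time that is not enhanced caps the achievable speedup.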
Estimating Performance Improvement
• Suppose a processor currently needs 10 seconds to execute a program, and processor performance improves by 50% per year.
• By what factor does performance improve in 5 years?
    (1 + 0.5)^5 = 7.59
• How long will the processor take to execute the program after 5 years?
    ExTime_new = 10 / 7.59 = 1.32 seconds
• What assumptions were made in solving this problem?
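The projection is plain compound growth, sketched below. The function names are illustrative, and the code assumes a constant annual improvement rate that applies fully to this program.

```python
def projected_speedup(annual_improvement: float, years: int) -> float:
    """Compound performance growth: (1 + rate)^years."""
    return (1.0 + annual_improvement) ** years


def projected_time(current_time_s: float, annual_improvement: float, years: int) -> float:
    """Execution time after `years`, assuming time scales inversely with performance."""
    return current_time_s / projected_speedup(annual_improvement, years)


factor = projected_speedup(0.5, 5)
new_time = projected_time(10.0, 0.5, 5)

print(f"Performance improves by a factor of {factor:.2f}")
print(f"Execution time after 5 years: {new_time:.2f} seconds")
```

The assumptions baked into these two functions (a steady rate, and a program whose execution time is limited only by the processor) are the ones the last question on the slide is asking about.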
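Performance Example
• Computers M1 and M2 implement the same instruction set.
• The clock rate of M1 is 50 MHz and that of M2 is 75 MHz.
• For a given program, the CPI of M1 is 2.8 and the CPI of M2 is 3.2.
• How many times faster is M2 than M1 for this program?
  The instruction counts cancel because both machines run the same program with the same instruction set:
  $$\frac{\text{ExTime}_{M1}}{\text{ExTime}_{M2}} = \frac{IC_{M1} \times CPI_{M1} / \text{Clock Rate}_{M1}}{IC_{M2} \times CPI_{M2} / \text{Clock Rate}_{M2}} = \frac{2.8/50}{3.2/75} = 1.31$$
• What clock rate would M1 need for the two execution times to be equal?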
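Both questions can be worked through with the CPU time equation. The sketch below assumes the two machines execute the same number of instructions (same program, same instruction set), so the instruction count is set to an arbitrary placeholder value; all names are illustrative.

```python
def execution_time(instruction_count: float, cpi: float, clock_rate_mhz: float) -> float:
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / (clock_rate_mhz * 1e6)


# Placeholder instruction count; it cancels out of every ratio below.
IC = 1_000_000

CPI_M1, CLOCK_M1_MHZ = 2.8, 50.0
CPI_M2, CLOCK_M2_MHZ = 3.2, 75.0

time_m1 = execution_time(IC, CPI_M1, CLOCK_M1_MHZ)
time_m2 = execution_time(IC, CPI_M2, CLOCK_M2_MHZ)

# First question: how many times faster is M2 than M1?
m2_speedup = time_m1 / time_m2

# Second question: solve CPI_M1 / f = CPI_M2 / CLOCK_M2 for the M1 clock rate f.
required_m1_clock_mhz = CPI_M1 * CLOCK_M2_MHZ / CPI_M2

print(f"M2 is {m2_speedup:.2f} times faster than M1")
print(f"M1 needs a {required_m1_clock_mhz:.3f} MHz clock to match M2")
```

Under these assumptions M1 would need a clock of about 65.6 MHz: its lower CPI means it can match M2 without reaching M2's 75 MHz.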
Summary of Performance Evaluation
• Good benchmarks, such as the SPEC benchmarks, provide an accurate way to evaluate and compare computer performance.
• For execution times use the arithmetic mean, but for normalized execution time ratios use the geometric mean.
• MIPS and MFLOPS are easy to use, but they give an inaccurate picture of performance.
• Amdahl's Law is the tool for determining the speedup due to an enhancement.