This guide explains the importance of benchmarking and provides an overview of different types of benchmarks. It includes a list of industry-standard and non-industry-standard benchmarks, as well as synthetic, hybrid, and real-world benchmarks. Discover how benchmarking can help evaluate performance across various devices and systems.
Benchmarks Asztalos Olivér (HWSW.hu) Fall semester 2017 (Ver. 1.0)
What is this all about? • Measure: Measuring with a test suite that includes a set of properly selected benchmarks gives results that allow a proper comparison. • Compare: Based on the results, you can see the relative performance, and some other specifics, of different (hardware) components or complete systems compared with each other. This gives decision makers, media, channel buyers, consultants, and system and component designers and manufacturers an objective, easy-to-use tool to evaluate performance across the wide range of activities that a user may encounter.
Types of benchmarks (based on use case) • Internal benchmarks (confidential) • Industry-standard benchmarks • Non-industry-standard benchmarks (custom benchmarks)
Industry-standard benchmarks • BAPCo (Business Applications Performance Corporation) SYSmark 2014: “System performance benchmark that measures and compares PC performance using real world applications, featuring different workloads. Easy-to-use tool to evaluate PC performance across the wide range of activities that a user may encounter.” Results: https://results.bapco.com/results/benchmark/sysmark_2014 • SPEC (Standard Performance Evaluation Corporation): The benchmarks aim to test "real-life" situations. The SPEC CPU suites test CPU performance by measuring the run time of several programs, such as the compiler GCC, the chemistry program GAMESS, and the weather program WRF. The tasks are equally weighted; no attempt is made to weight them by perceived importance. The overall score is the geometric mean of the individual results. SPEC maintains about 20 different (active) benchmarks at the moment, and its result database is public: https://www.spec.org/results.html
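The equally weighted geometric mean mentioned for SPEC CPU can be sketched as follows. This is only an illustration: the benchmark names and ratio values are hypothetical, and each ratio is assumed to be a reference run time divided by the measured run time (higher is better).

```python
import math

def geometric_mean(values):
    """Geometric mean computed via logarithms; every value gets equal weight."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive ratios")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-benchmark ratios (reference run time / measured run time).
ratios = {"compiler": 2.0, "chemistry": 8.0, "weather": 4.0}

score = geometric_mean(list(ratios.values()))
print(f"overall score: {score:.2f}")
```

Unlike an arithmetic mean, the geometric mean is not dominated by a single very large ratio: doubling any one result raises the overall score by the same factor, whichever benchmark it is.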
TOP500 Supercomputer (quasi-industry-standard) “The benchmark used in the LINPACK Benchmark is to solve a dense system of linear equations. For the TOP500, we used that version of the benchmark that allows the user to scale the size of the problem and to optimize the software in order to achieve the best performance for a given machine. This performance does not reflect the overall performance of a given system, as no single number ever can. It does, however, reflect the performance of a dedicated system for solving a dense system of linear equations. Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance. LINPACK was chosen because it is widely used and performance numbers are available for almost all relevant systems.” https://www.top500.org/project/linpack/
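To make "solve a dense system of linear equations" concrete, here is a toy sketch of the idea in pure Python. It is not the actual benchmark code: the problem size, the random matrix, and the nominal operation count of 2/3 n^3 + 2 n^2 used to turn the run time into a rate are assumptions chosen for illustration.

```python
import random
import time

def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (in place)."""
    n = len(b)
    for col in range(n):
        # Use the row with the largest absolute value in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (b[row] - tail) / a[row][row]
    return x

n = 120  # assumed problem size; the benchmark lets the user scale this
rng = random.Random(0)
a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
# Build b so that the exact solution is a vector of ones.
b = [sum(row) for row in a]

start = time.perf_counter()
x = solve_dense(a, b)
elapsed = time.perf_counter() - start

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # assumed nominal operation count
max_error = max(abs(xi - 1.0) for xi in x)
print(f"n={n} time={elapsed:.4f} s rate={flops / elapsed / 1e6:.1f} MFLOP/s "
      f"max error={max_error:.2e}")
```

An interpreted solver like this reaches only a tiny fraction of what the hardware can do, which is why the quote stresses optimizing the software for a given machine.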
Non-industry-standard benchmarks (custom benchmarks) Fully custom-built test suites that include a wide range of software (applications and/or benchmarks) and typically (>98%) run on the latest standard Windows operating system. Usually used by the media: online review sites, video reviewers, printed magazines. (Basically no rules.)
Types of benchmarks (based on workload) • Synthetic benchmarks • Hybrid benchmarks • Real-world (or real-life) benchmarks
Synthetic benchmarks These are specially created programs that impose an artificial workload on a component or on the whole system. They do not test what a normal user would be doing on the device. They generally test the components of a computer to find their highest performance in an absolute best-case scenario, usually without much context about how those components are used. Pros: Simple to use, and the results are easy to understand and interpret (they usually give absolute values). Can be good for microarchitectural comparison. Cons: Synthetic benchmarks usually do not give a full and correct picture for comparing different vendors or different types of products (e.g. smartphone vs. PC). Recommended only as a supplement or for specific comparisons (e.g. microarchitectures). Well-known synthetic benchmarks: Geekbench (cross-platform), AnTuTu, AIDA64, AS SSD, CrystalDiskMark.
Some of these benchmarks are also easy for manufacturers to manipulate or trick, "boosting" the results their products achieve in them. [2]
AIDA64 A 20-year-old, well-known and widely recognized diagnostics tool that includes four memory (and cache), five CPU, six FPU, and twelve GPGPU benchmarks. Developed in Hungary.
CPU Queen Benchmark: A simple integer benchmark that focuses on the branch prediction capabilities and the misprediction penalties of the CPU. It finds the solutions to the classic "Queens problem" on a 10 by 10 chessboard. At the same clock speed, the processor with the shorter pipeline and the smaller misprediction penalty will theoretically attain the higher score. For example, with HyperThreading disabled, Intel Northwood core processors score higher than Intel Prescott core based ones because of their 20-stage vs. 31-stage pipelines. The CPU Queen test uses integer MMX, SSE2 and SSSE3 optimizations. [3]
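The "Queens problem" behind this test can be sketched as a backtracking search. The code below is a generic bitmask solver written for illustration, not AIDA64's implementation (which is not public); it shows why the workload is branch-heavy: almost every step is a data-dependent decision about whether a square is free.

```python
import time

def count_queens(n):
    """Count the ways to place n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # bitmask with one bit per column

    def place(cols, diag_left, diag_right):
        if cols == full:
            return 1  # a queen has been placed in every row
        total = 0
        # Squares in the current row not attacked by any earlier queen.
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free  # lowest free square in this row
            free ^= bit
            total += place(cols | bit,
                           ((diag_left | bit) << 1) & full,
                           (diag_right | bit) >> 1)
        return total

    return place(0, 0, 0)

start = time.perf_counter()
solutions = count_queens(10)
elapsed = time.perf_counter() - start
print(f"10x10 board: {solutions} solutions in {elapsed:.3f} s")
```

A benchmark built on this idea would report the elapsed time (or a score derived from it) rather than the solution count, which is the same on every machine.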
FPU Mandel Benchmark: This benchmark measures double-precision (64-bit) floating-point performance by computing several frames of the popular "Mandelbrot" fractal. The code behind this benchmark is written in assembly and is heavily optimized for every popular AMD, Intel and VIA processor core variant, using the appropriate x87, SSE2, AVX, AVX2, FMA, and FMA4 instruction set extensions. The FPU Mandel test is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware. [3]
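The kind of floating-point work involved can be sketched with the standard escape-time iteration for the Mandelbrot set. This is a plain, single-threaded illustration, not the assembly code the slide describes; the frame size, the viewed region of the complex plane, and the iteration limit are assumptions. Python floats are 64-bit doubles, so the arithmetic matches the precision named above.

```python
import time

def escape_time(cr, ci, max_iter=100):
    """Iterate z = z*z + c from z = 0 and return the step at which |z| exceeds 2."""
    zr = zi = 0.0
    for i in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:  # |z|^2 > 4 means the point has escaped
            return i
        zi = 2.0 * zr * zi + ci
        zr = zr2 - zi2 + cr
    return max_iter

def render_frame(width, height, max_iter=100):
    """Return (per-pixel iteration counts, total iterations) for one frame."""
    rows = []
    total = 0
    for py in range(height):
        ci = -1.2 + 2.4 * py / (height - 1)
        row = []
        for px in range(width):
            cr = -2.0 + 3.0 * px / (width - 1)
            n = escape_time(cr, ci, max_iter)
            row.append(n)
            total += n
        rows.append(row)
    return rows, total

start = time.perf_counter()
frame, iterations = render_frame(80, 60)
elapsed = time.perf_counter() - start
print(f"{iterations} iterations in {elapsed:.3f} s "
      f"({iterations / elapsed / 1e6:.2f} M iterations/s)")
```

Each pixel is independent of the others, which is what makes this workload easy to spread across SIMD lanes, cores, and processors.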
Hybrid benchmarks This type of benchmark combines isolated synthetic-style tests with more generalized real-world benchmarks.
Real-world benchmarks The stopwatch doesn't lie. Mostly professional applications configured for benchmarking with one or more specific real-world workloads. They can represent the real-world performance of a component or of the whole system. Pros: Gives the results closest to reality. Cons: Difficult to build and use, and results are hard to compare across different test suites.
• file archiver utilities (WinRAR, 7-Zip): compress a bunch of different files • photo or video editors (e.g. Adobe): perform some tasks (fill, rotate, color space conversion, etc.) on a photo or video • audio or video converters or compressors (x264, x265): convert or compress a file (e.g. MPEG to x264) • renderers: render a 3D animation, model, or image (e.g. 3ds Max, Maya) • antivirus: scan a bunch of different files • 3D videogames: measure the frame rate (fps) or the frame time • SSD/HDD: measure the system startup time (boot time) or the startup time of several applications launched at once, copy a file within the same drive, install an application, etc.
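The "stopwatch" approach behind the workloads listed above can be sketched as a small timing harness. Everything here is a stand-in: `time_workload` and `compress_payload` are hypothetical helper names, and compressing generated data with Python's zlib only imitates the "compress a bunch of different files" task; a real suite would drive the actual application with real files.

```python
import statistics
import time
import zlib

def time_workload(workload, runs=5):
    """Run workload() several times and return the wall-clock times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings

# Hypothetical stand-in for a set of files to compress.
payload = b"".join(f"record {i}: {i * i}\n".encode() for i in range(100_000))

def compress_payload():
    return zlib.compress(payload, 9)  # level 9 = strongest compression

timings = time_workload(compress_payload)
median = statistics.median(timings)
print(f"median {median:.3f} s, throughput {len(payload) / median / 1e6:.1f} MB/s")
```

Repeating the run and reporting the median is a design choice that makes a single disturbed run (a background task, a cold cache) less likely to distort the result.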
Building the test suite • clean OS install • install the applications (at least about 8-10) • create the workloads • validate the suite (try it!) • always create a backup image
Understanding the results • Performance summary (can be categorized) • Power consumption • Performance per watt • Price/performance
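The derived metrics listed above are simple ratios, sketched below. The two systems and all of their numbers are invented for illustration, and the `Result` class with its field names is a hypothetical helper; the score is assumed to be "higher is better".

```python
from dataclasses import dataclass

@dataclass
class Result:
    name: str
    score: float  # aggregated performance score, higher is better
    watts: float  # measured power consumption under load
    price: float  # purchase price, any single currency

    @property
    def perf_per_watt(self):
        return self.score / self.watts

    @property
    def price_per_point(self):
        # Price/performance: money paid per unit of score, lower is better.
        return self.price / self.score

# Hypothetical measurements.
a = Result("System A", score=120.0, watts=95.0, price=300.0)
b = Result("System B", score=100.0, watts=65.0, price=200.0)

for r in (a, b):
    print(f"{r.name}: relative performance {r.score / b.score:.2f}, "
          f"{r.perf_per_watt:.2f} points/W, {r.price_per_point:.2f} per point")
```

In this made-up case System A is 20% faster, yet System B is ahead on both performance per watt and price/performance, which is why the summary metrics are reported separately.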
Performance summary "No difference" is valuable information as well! [3]
AMD vs. Intel [9]
AMD vs. Nvidia [10]
References (1)
[1]: Asztalos O., Broadwell-EX: Xeonok a számok tükrében, HWSW, June 16 2016, https://www.hwsw.hu/hirek/55746/intel-broadwell-ex-xeon-e7-v4-processzor-szerver-spec-meresek-benchmark.html
[2]: VIA Nano CPUID Tricks, IXBT Labs, Aug. 26 2010, http://ixbtlabs.com/articles3/cpu/via-nano-cpuid-fake-p1.html
[3]: Megateszt: Intel CPU-k Nehalemtől Skylake-ig, Prohardver, Nov. 9 2015, https://prohardver.hu/teszt/intel_architekturak_nehalemtol_skylake-ig/a_nehalemhez_vezeto_ut.html
[4]: 21 processzor tesztje Windows 8 alatt, Prohardver, Sept. 11 2013, https://prohardver.hu/teszt/processzorok_windows_8_alatt/a_megujult_tesztkornyezet.html
[5]: Intel Skylake, avagy a Core i7-6700K és i5-6600K, Prohardver, Aug. 13 2015, https://prohardver.hu/teszt/intel_skylake_avagy_a_core_i7-6700k_es_i5-6600k/az_utolso_takk-tus.html
[6]: AMD Radeon RX Vega 56 8 GB, Tech Power Up, Aug. 14 2017, https://www.techpowerup.com/reviews/AMD/Radeon_RX_Vega_56/33.html
[7]: AMD Bulldozer – kislapát vagy munkagép?, Prohardver, Oct. 24 2011, https://prohardver.hu/teszt/amd_bulldozer_fx_8150_teszt/egy_ujabb_gorongyos_ut.html
[8]: Threadripper Launchreviews: Die Testresultate zur Anwendungs-Performance im Überblick, 3D Center, Aug. 10 2017, https://www.3dcenter.org/news/threadripper-launchreviews-die-testresultate-zur-anwendungs-performance-im-ueberblick
References (2)
[9]: Cutress I., The AMD Zen and Ryzen 7 Review: A Deep Dive on 1800X, 1700X and 1700, AnandTech, March 2 2017, https://www.anandtech.com/show/11170/the-amd-zen-and-ryzen-7-review-a-deep-dive-on-1800x-1700x-and-1700
[10]: Smith R., Oh N., The AMD Radeon RX Vega 64 & RX Vega 56 Review: Vega Burning Bright, AnandTech, Aug. 14 2017, https://www.anandtech.com/print/11717/the-amd-radeon-rx-vega-64-and-56-review