The Return of Synthetic Benchmarks

January 28, 2008
Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin)
Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin

Outline
• The Need for Synthetic Benchmarks
• BenchMaker Framework for Benchmark Synthesis
• Workload Characteristics Used in Synthesis
• Synthetic Benchmark Construction
• Evaluation of BenchMaker
• Applications
• Summary

Benchmark Spectrum
• Complete Application Code
• Application Suites, e.g. SPEC CPU
• Kernel Codes, e.g. Livermore Loops
• Synthetic Benchmarks, e.g. Dhrystone, Whetstone
• Microbenchmarks, e.g. STREAM
• Toy Benchmarks, e.g. Heap sort
Toward the toy-benchmark end: less development effort, more scalable, more maintainable, less representative.
Toward the complete-application end: more development effort, less scalable, less maintainable, more representative.

Focus on Simulation Time Reduction
• Statistical Sampling [Conte et al., ICCD’96] [Wunderlich et al., ISCA’03]
• Representative Sampling [Sherwood et al., ASPLOS’02]
• Reduced Input Set [KleinOsowski, CAN’04]
• Statistical Simulation & Synthetic Workloads [Oskin et al., ISCA’00] [Eeckhout et al., ISPASS’00] [Nussbaum et al., PACT’01] [Bell et al., ICS’05]
• Benchmark Subsetting [Eeckhout et al., PACT’02] [Vandierendonck et al., CAECW’04] [Phansalkar et al., ISPASS’05] [Eeckhout et al., IISWC’05]
• Analytical Modeling [Noonburg et al., MICRO’94] [Karkhanis et al., ISCA’04]
• Speedup Simulation [Schnarr et al., ASPLOS’98] [Loh et al., SIGMETRICS’01]

Motivation: Benchmarking Challenges
• Using Real-World Applications as Benchmarks
  - Proprietary Nature of Real-World Applications
• Single-Point Performance Characterization
  - Application Benchmarks are Rigid
• Applications Evolve Faster than Benchmarks
  - Benchmark Suites are Costly to Develop, Maintain, and Upgrade
• Studying Commercial Workload Performance
  - Early Design Stage Power/Performance Studies
Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction

Resurgence of Synthetic Benchmarks (IEEE Computer, August 2003)

Workload Synthesis: Central Idea
Just 40 workload characteristics

Modeling Real-World Applications
[Figure: three-stage flow]
• Microarchitecture-Independent Workload Profiling: Real World Proprietary Workload -> Workload Profiler (Binary Instrumentation OR Simulation)
• Modeling Workload Attributes into Synthetic Workload: Workload Profile -> Workload Synthesizer -> Synthetic Benchmark Clone
• Experiment Environment: Real Hardware, Execution Driven Simulator
Workload Profile = Workload Attributes + Distribution of Attribute Values

Workload Characteristics as ‘Knobs’

Capturing The Essence of Workloads
• Attributes to capture inherent workload behavior
  - Data Locality: dominant strides of static Load/Store instructions
  - Control Flow Predictability: branch transition rate
• Modeling Locality & Control Flow Predictability
  - Data locality of Integer, Scientific, and Embedded workloads is effectively modeled using circular streams
  - Replicating the transition rate of static branches
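As a rough illustration of the "circular stream" idea named on this slide, the Python sketch below walks a fixed-length region with a constant stride and wraps back to the start. The function name, base address, and stream length are assumptions made for illustration; the slides do not say how BenchMaker sizes or resets its streams.

```python
from itertools import islice

def circular_stream(base, stride, length):
    """Yield base, base+stride, ... and wrap around after `length` elements."""
    i = 0
    while True:
        yield base + (i % length) * stride
        i += 1

# Hypothetical stream: 3 elements, 4 bytes apart, starting at address 200.
first = list(islice(circular_stream(base=200, stride=4, length=3), 7))
print(first)
```

A synthetic load or store bound to such a generator revisits the same small set of addresses, which is one simple way a dominant stride and a bounded data footprint could be reproduced together.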

Modeling Data Access Pattern
• Identify streams of data references
• A stream: a sequence of memory addresses in an arithmetic progression
  - Elements of arrays A, B, and C form 3 streams:
        for (ii = 0; ii < N; ii++)
            A[ii] = B[ii] + C[ii];
  - Streams: 200, 204, 208, ... / 320, 324, 328, ... / 404, 408, 412, ...
  - Issuing sequence: 320, 404, 200, 324, 408, 204, ...
• Streams are interleaved and may contain noise
  - 4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28, ...
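The issuing sequence on this slide can be reproduced with a short Python sketch. It assumes, as the slide's numbers suggest, that A, B, and C start at addresses 200, 320, and 404 with 4-byte elements, and that each iteration issues the two loads (B, then C) before the store to A. The function name is hypothetical.

```python
def interleaved_references(n, base_a=200, base_b=320, base_c=404, elem_size=4):
    """Addresses issued by A[i] = B[i] + C[i]: load B, load C, then store A."""
    refs = []
    for i in range(n):
        offset = i * elem_size
        refs.extend([base_b + offset, base_c + offset, base_a + offset])
    return refs

print(interleaved_references(2))  # [320, 404, 200, 324, 408, 204]
```

Each of the three arrays contributes one arithmetic progression, and the processor sees them interleaved in program order.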

Extracting Streams
• Reference pattern of static Load/Store instructions
  - PC-correlated spatial locality: dependence on the address referenced by a nearby Ld/St (programs with pointer-chasing code)
  - PC-correlated temporal locality: dependence on the previous address generated by the same Ld/St (programs with multidimensional arrays)
• Could static Load/Store instructions be natural sources of streams?
• Profile every static Load/Store instruction
  - Number of different strides with which it accesses data
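One way to picture the per-instruction profiling step is the Python sketch below, which groups a memory trace by the PC of the static load/store and counts the strides each one produces. The trace format `(pc, address)`, the PC values, and the function name are assumptions for illustration, not BenchMaker's actual profiler interface.

```python
from collections import Counter, defaultdict

def stride_profile(trace):
    """Map each static instruction (pc) to a Counter of the strides it produced."""
    last_addr = {}
    strides = defaultdict(Counter)
    for pc, addr in trace:
        if pc in last_addr:
            strides[pc][addr - last_addr[pc]] += 1
        last_addr[pc] = addr
    return strides

# Hypothetical trace: three static memory instructions, interleaved.
trace = [(0x10, 320), (0x14, 404), (0x18, 200),
         (0x10, 324), (0x14, 408), (0x18, 204),
         (0x10, 328), (0x14, 412), (0x18, 208)]
profile = stride_profile(trace)
dominant = {pc: counts.most_common(1)[0][0] for pc, counts in profile.items()}
print(dominant)
```

Although the combined address sequence looks irregular, each static instruction here has a single stride, which is the property the slide's question points at.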

Modeling Instruction Level Parallelism
Dependency Distance
    ADD R1, R3, R4
    MUL R5, R3, R2
    ADD R5, R3, R6
    LD  R4, (R1)
    SUB R8, R2, R1
Read-After-Write dependency distance = 3 (LD reads R1, which the first ADD wrote three instructions earlier)
Measure the distribution of dependency distances: up to 1, up to 2, up to 4, up to 8, up to 16, up to 32, >32
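The dependency-distance measurement can be sketched in Python as follows. The sketch assumes the destination register is the first operand of each instruction, encodes the five instructions from the slide by hand as `(dest, sources)` pairs, and places each distance in the smallest bin that contains it; the slide does not say whether its bins are cumulative, so that binning choice is an assumption.

```python
from collections import Counter

BINS = [1, 2, 4, 8, 16, 32]  # upper bounds from the slide; larger values fall in ">32"

def raw_distances(instrs):
    """Distance from each source-register read back to that register's last writer."""
    last_writer = {}
    distances = []
    for idx, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                distances.append(idx - last_writer[reg])
        last_writer[dest] = idx
    return distances

def bin_label(distance):
    for bound in BINS:
        if distance <= bound:
            return f"up to {bound}"
    return ">32"

program = [
    ("R1", ["R3", "R4"]),  # ADD R1, R3, R4
    ("R5", ["R3", "R2"]),  # MUL R5, R3, R2
    ("R5", ["R3", "R6"]),  # ADD R5, R3, R6
    ("R4", ["R1"]),        # LD  R4, (R1)
    ("R8", ["R2", "R1"]),  # SUB R8, R2, R1
]
distances = raw_distances(program)
histogram = Counter(bin_label(d) for d in distances)
print(distances, dict(histogram))
```

The LD yields the distance of 3 shown on the slide; the SUB also reads R1, giving a second dependency at distance 4.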

Modeling Control Flow Predictability
• Capture the behavior of easy and difficult to predict branches
• Transition rate: an inherent program feature that captures branch behavior [Haungs et al., HPCA’00]
  - Transition rate = (# of taken/not-taken transitions) / (# of times executed)
• Branches with a low transition rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT
• Branches with a high transition rate (also easier to predict): TNTNTNTNTN
• Branches with a moderate transition rate (tougher to predict)
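The transition-rate formula on this slide is easy to check against its example patterns. The Python sketch below is a minimal reading of that formula, with outcomes written as strings of `T` and `N`; the function name is hypothetical.

```python
def transition_rate(outcomes):
    """Number of direction changes divided by the number of executions."""
    if not outcomes:
        return 0.0
    transitions = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return transitions / len(outcomes)

for pattern in ("TTTTTTTTTN", "NNNNNNNNNT", "TNTNTNTNTN"):
    print(pattern, transition_rate(pattern))
```

The first two patterns change direction once in ten executions (rate 0.1) and the alternating pattern changes nine times (rate 0.9). Both extremes are regular, which is why the slide groups them as easier to predict than branches with rates in between.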

Workload Synthesis (1) to (4)
[Figure, built up over four slides: a workload profile is turned into a synthetic clone]
Workload Profile
• Instruction Mix
• Register Dependency Distance
• Stride Pattern of Load/Store
• Branch Transition Rate
• Branch Transition Probabilities
Synthetic Clone Generation
• (1) Basic blocks (A, B, C, D), linked by branch transition probabilities (0.8/0.2, 1.0, 0.1/0.9 in the figure), are laid out as a sequence inside 1 big loop
• (2) Memory Access Model (strides)
• (3) Branching Model (based on transition rate)
• (4) Register Assignment; the output is C code with asm & volatile constructs
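The first synthesis step, turning branch transition probabilities into a sequence of basic blocks, can be pictured as a weighted random walk over a flow graph. The Python sketch below uses a hypothetical four-block graph whose edges and probabilities only loosely echo the numbers visible in the figure; it is not the graph from the slides, and the seeded walk is an assumption about how such a sequence might be drawn.

```python
import random

# Hypothetical flow graph: block -> [(successor, probability), ...]
FLOW_GRAPH = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [("A", 0.1), ("B", 0.9)],
}

def walk(graph, start, length, seed=0):
    """Pick `length` basic blocks by following edges with their probabilities."""
    rng = random.Random(seed)
    block, sequence = start, [start]
    for _ in range(length - 1):
        successors, weights = zip(*graph[block])
        block = rng.choices(successors, weights=weights, k=1)[0]
        sequence.append(block)
    return sequence

sequence = walk(FLOW_GRAPH, "A", 12)
print(" ".join(sequence))
```

In a synthesizer along these lines, the resulting block sequence would become the body of the single big loop, with the memory access model, branching model, and register assignment filled in afterwards.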

Evaluation of BenchMaker
• SPEC CPU2000, SPECjbb2005, and DBT2 workloads
• Validated Sim-Alpha performance model of the Alpha 21264

Performance Correlation Trade Accuracy for Flexibility – Average Error of 11%

Energy/Power Correlation Average Error of 13%

Altering Individual Program Characteristics

Interaction of Program Characteristics

Modeling Impact of Benchmark Drift
• Increase in Code Footprint (hypothetical)
• Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)

Summary
• Synthetic Benchmarks to Address Benchmarking Challenges
• Constructing Synthetic Benchmarks from Hardware-Independent Characteristics
• Applications of Synthetic Benchmarks
  - Altering Program Characteristics
  - Studying Interaction of Program Characteristics
  - Modeling Benchmark Drift

Questions? Ajay’s email: ajoshi@ece.utexas.edu
