1 / 26

Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results David J. Li

Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results David J. Lilja Electrical and Computer Engineering University of Minnesota lilja@umn.edu. The Problem. Benchmark programs. Heaps o’ data 445 446 397 226 388 3445 188 1002

donny
Download Presentation

Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results David J. Li

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results David J. Lilja Electrical and Computer Engineering University of Minnesota lilja@umn.edu

  2. The Problem Benchmark programs Heaps o’ data 445 446 397 226 388 3445 188 1002 47762 432 54 12 98 345 2245 8839 77492 472 565 999 1 34 882 545 4022 827 572 597 364 … • We can generate heaps of data • But it’s noisy • Too much to understand or use efficiently

  3. A Solution • Statistical design of experiments techniques • Compress complex benchmark results • Exploit the results in interesting ways • Extract new insights • Demonstrate using • Microarchitecture-aware floorplanning • Benchmark classification

  4. Why Do We Need Statistics? • Draw meaningful conclusions in the presence of noisy measurements • Noise filtering • Aggregate data into meaningful information • Data compression Heaps o’ data 445 446 397 226 388 3445 188 1002 47762 432 54 12 98 345 2245 8839 77492 472 565 999 1 34 882 545 4022 827 572 597 364 …

  5. Why Do We Need Statistics? • Draw meaningful conclusions in the presence of noisy measurements • Noise filtering • Aggregate data into meaningful information • Data compression Heaps o’ data 445 446 397 226 388 3445 188 1002 47762 432 54 12 98 345 2245 8839 77492 472 565 999 1 34 882 545 4022 827 572 597 364 …

  6. Design of Experiments for Data Compression • Effects of each input • A, B, C • Effects of interactions • AB, AC, BC, ABC 445 446 397 226 388 3445 188 1002 47762 432 54 12 98 345 2245 8839 77492 472 565 999 1 34 882 545 4022 827 572 597 364 …

  7. Types of Designs of Experiments • Full factorial design with replication • O(vm) experiments = O(43) • Fractional factorial designs • O(2m) experiments = O(23) • Multifactorial design (P&B) • O(m) experiments = O(3) • Main effects only – no interactions • m-factor resolution x designs • k O(2m) experiments = kO(23) • Selected interactions

  8. Example: Architecture-Aware Floor-Planner V. Nookala, S. Sapatnekar, D. Lilja, DAC’05.

  9. Layout wire Motivation • Imbalance between device and wire delays • Global wire delays > system clock cycle in nanometer technology

  10. Layout FF wire Solution • Wire-pipelining • If delay > a clock cycle → insert flip-flops along a wire • Several methods for optimal FF insertion on a wire • Li et al. [DATE 02] • Cocchini et al. [ICCAD 02] • Hassoun et al. [ICCAD 02] • But what about the performance impact of the pipeline delays?

  11. Impact on Performance Execution time = num-instr * cycles/instr (CPI) * cycle-time Wire-pipelining

  12. Impact on Performance Execution time = num-instr * cycles/instr (CPI) * cycle-time • Key idea • Some buses are critical • Some can be freely pipelined without (much) penalty Wire-pipelining

  13. Change Objective Function Execution time = num-instr * cycles/instr (CPI) * cycle-time • Traditional physical design objectives • Minimize area, total wire length, etc. • New objective • Optimize only throughput critical wires to maximize overall performance Wire-pipelining

  14. Conventional Microarchitecture Interaction with Floor Planner Simulation Methodology µ-arch Benchmarks CPI info Physical Design Frequency

  15. Microarchitecture-aware Physical Design Simulation Methodology µ-arch Benchmarks CPI info Physical Design Frequency Layout • Incorporate wire-pipelining models into the simulator • Extra pipeline stages in processor • Simulator needs to adjust operation latencies

  16. But There are Problems Simulation Methodology µ-arch Benchmarks CPI info Physical Design Frequency Layout • Simulation is too slow • 2000-3000 instructions per simulated instruction • Numerous benchmark programs to consider • Exponential search space • Thousands of combinations tried in physical design step

  17. # Simulations is linear in the number of buses (if no interactions) Design of Experiments Methodology Design of Experiments based Simulation Methodology µ-arch benchmarks MinneSPEC Reduced input sets Bus, interaction weights benchmarks Floorplanning Validation Frequency Layout

  18. Related Floorplanning Work • Simulated Annealing (SA) • CPI look up table [Liao et al, DAC 04] • Bus access ratios from simulation profiles • Minimize the weighted sum of bus latencies [Ekpanyapong et al, DAC 04] • Throughput sensitivity models for a selected few critical paths • Limited sampling for a large solution space [Jagannathan et al, ASPDAC 05] • Our approach • Design of experiments to identify criticality of each bus

  19. Microarchitecture and factors IADD1 • 22 buses → 19 factors in experimental design • Some factors model multiple buses IADD2 RUU IADD3 Fetch Decode REG IMULT LSQ BPRED FADD DL1 DTLB IL1 FMULT ITLB L2

  20. 2-level Resolution III Design • 2-levels for each factor • Lowest and highest possible values (range) • Latency range of buses • Min = 0 • Max = Chip corner-corner wire latency • 19 factors  32 simulations (nearest power of 2) • Captured by a design matrix (32x19) • 32 rows - 32 simulations • 19 columns - Factor values

  21. Experimental setup • Nine SPEC 2000 benchmarks • MinneSPEC reduced input sets • SimpleScalar simulator • Floorplanner -- PARQUET • Simulated annealing based • Objective function Minimize the weighted sum of bus latencies • Secondarily minimize aspect ratio and area

  22. Comparisons

  23. Typical Results for Single Benchmark

  24. Averaged Over All Benchmarks • Compared to acc • 3-7% point improvement • Better improvements over acc at higher frequencies • SFP-comb≈ SFP (within about 1-3% points)

  25. Summary • Use statistical design of experiments • Compress benchmark data into critical bus weights • Used by microarchitecture-aware floorplanner • Optimizes insertion of pipeline delays on wires to maximize performance • Extend methodology for other critical objectives • Power consumption • Heat distribution

  26. Collaborators and Funders • Vidyasagar Nookala • Joshua J. Yi • Sachin Sapatnekar • Semiconductor Research Corporation (SRC) • Intel • IBM

More Related