On Tuning Microarchitecture for Programs

On Tuning Microarchitecture for Programs • Daniel Crowell, Wenbin Fang, and Evan Samanas

Summary • A flexible framework for microarchitecture adaptivity that separates software policies from hardware mechanisms • Case study: adaptive cache • Evaluation: SimpleScalar / Wattch / SPEC 2000 / user programs • Conclusion: microarchitecture adaptivity is useful, and a flexible framework makes it practical

Outline • Motivation • Adaptivity Framework • Case study: Adaptive Cache • Evaluation • Conclusion

Motivation • Optimizing for all is optimizing for nothing • Software is increasingly complex, and much of it is closed source • HW/SW co-design is infeasible for legacy software

One size doesn’t fit all • Show the cache results from our preliminary benchmarking • These results back the motivation for this project • They also support our choice of the cache, rather than another component, for the case study

Three Questions for Microarchitecture Adaptivity • When to adapt? => Policy (interval? context switch? function boundary?) • What goal(s)? => Policy (performance first? performance-power ratio first?) • How to adapt? => Mechanism (e.g., the parameters of a cache include block size, number of blocks, number of sets, replacement algorithm, …)
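The split between the three questions can be sketched in a few lines of Python. This is only an illustration of the policy/mechanism separation, not the framework itself: the names `Event`, `Goal`, `CacheConfig`, `Policy`, `CacheMechanism`, and `on_event` are hypothetical, and the goal-to-configuration table is an assumed way for the mechanism to answer "how".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set


class Event(Enum):              # "When to adapt?"
    INTERVAL = "interval"
    CONTEXT_SWITCH = "context switch"
    FUNCTION_BOUNDARY = "function boundary"


class Goal(Enum):               # "What goal(s)?"
    PERFORMANCE = "performance first"
    PERF_POWER_RATIO = "performance-power ratio first"


@dataclass(frozen=True)
class CacheConfig:              # "How to adapt?": the knobs the hardware exposes
    block_size: int             # bytes per block
    num_sets: int
    ways: int                   # blocks per set
    replacement: str            # e.g. "lru"


class Policy:
    """Software side: decides when to adapt and toward which goal."""

    def __init__(self, triggers: Set[Event], goal: Goal) -> None:
        self.triggers = triggers
        self.goal = goal

    def decide(self, event: Event) -> Optional[Goal]:
        # Return the goal only at events this policy reacts to.
        return self.goal if event in self.triggers else None


class CacheMechanism:
    """Hardware side: maps a goal to a concrete configuration (assumed table)."""

    def __init__(self, table: Dict[Goal, CacheConfig], initial: CacheConfig) -> None:
        self.table = table
        self.current = initial

    def adapt(self, goal: Goal) -> None:
        self.current = self.table[goal]


def on_event(policy: Policy, mechanism: CacheMechanism, event: Event) -> CacheConfig:
    """Glue: the policy never touches cache parameters, the mechanism never sees events."""
    goal = policy.decide(event)
    if goal is not None:
        mechanism.adapt(goal)
    return mechanism.current
```

Swapping the policy (say, reacting to function boundaries instead of context switches) leaves `CacheMechanism` untouched, which is the point of the separation.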

Adaptivity Framework

Mechanism • List related work on adaptivity, e.g., adaptive cache, adaptive TLB, adaptive processor, … • List interesting findings from the course of this project, if any

Policy • Instruction 1: adapt_advise • Inspired by the “madvise” system call • Used by software: OS, compiler, user programs • Operand: performance first or performance-power ratio first • Instruction 2: adapt_setup • Privileged, used only by the OS • Operand: whether user programs are allowed to use adapt_advise
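A toy software model may help make the semantics of the two instructions concrete. It is a sketch under stated assumptions: the class `AdaptUnit`, the `privileged` flag, the reset values, and the choice to silently ignore disallowed advice (treating it as a hint, in the spirit of madvise) are all hypothetical and not taken from the slides.

```python
from enum import Enum


class Goal(Enum):
    PERFORMANCE_FIRST = 0
    PERF_POWER_RATIO_FIRST = 1


class PrivilegeError(Exception):
    """Raised when unprivileged code executes a privileged instruction."""


class AdaptUnit:
    """Toy model of the state touched by adapt_setup and adapt_advise."""

    def __init__(self) -> None:
        self.user_advise_allowed = False        # assumed reset value
        self.goal = Goal.PERFORMANCE_FIRST      # assumed default goal

    def adapt_setup(self, allow_user: bool, *, privileged: bool) -> None:
        # Privileged: only the OS may change who is allowed to give advice.
        if not privileged:
            raise PrivilegeError("adapt_setup is privileged")
        self.user_advise_allowed = allow_user

    def adapt_advise(self, goal: Goal, *, privileged: bool) -> bool:
        """Record the requested goal; return True if the advice was accepted."""
        if not privileged and not self.user_advise_allowed:
            return False                        # assumption: advice is dropped, not trapped
        self.goal = goal
        return True
```

In this model the OS can always advise, while a user program's advice takes effect only after the OS has executed `adapt_setup` with the allow operand set.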

Policy • [OS] Interval / Predicted Interval • [OS] Context switch / Application boundary • [Compiler] Function boundary • [User] User program

Case study: Adaptive Cache • According to our experimental results, the cache is a more interesting target for adaptivity than other components …

Selective Sets vs. Selective Ways • Why do we want selective sets? • Any interesting
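The difference between the two resizing schemes comes down to which factor of capacity = sets × ways × block size is reduced. The sketch below works through that arithmetic; the function names are hypothetical, and requiring a power-of-two number of active sets is an assumption made so that the index is a simple bit field.

```python
def capacity_bytes(num_sets: int, ways: int, block_size: int) -> int:
    """Total data capacity of a set-associative cache."""
    return num_sets * ways * block_size


def selective_ways(num_sets: int, ways: int, block_size: int, active_ways: int) -> dict:
    """Turn off whole ways: the set count is unchanged, associativity drops."""
    if not 1 <= active_ways <= ways:
        raise ValueError("active_ways must be between 1 and ways")
    return {
        "sets": num_sets,
        "ways": active_ways,
        "capacity": capacity_bytes(num_sets, active_ways, block_size),
    }


def selective_sets(num_sets: int, ways: int, block_size: int, active_sets: int) -> dict:
    """Turn off sets: associativity is unchanged, fewer index bits are used."""
    is_power_of_two = active_sets >= 1 and active_sets & (active_sets - 1) == 0
    if not is_power_of_two or active_sets > num_sets:
        raise ValueError("active_sets must be a power of two no larger than num_sets")
    return {
        "sets": active_sets,
        "ways": ways,
        "capacity": capacity_bytes(active_sets, ways, block_size),
        "index_bits": active_sets.bit_length() - 1,
    }
```

For a cache with 128 sets, 4 ways, and 32-byte blocks (16 KB), halving the ways and halving the sets both give 8 KB, but the first leaves a 2-way cache with 128 sets and the second a 4-way cache with 64 sets.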

Implementation detail • Block diagram of the adaptive cache to go here, in the style usual for architecture work.

Evaluation • Simulator • SimpleScalar 3.0 • Wattch • Workload • 6 programs from SPEC 2000 • 3 microbenchmark programs • Case study: Adaptive Cache

Microbenchmark • Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1(3): 311-336 (1986). Six data access patterns: • Straight Sequential (SS) References • Clustered Sequential (CS) References • Looping Sequential (LS) References • Independent Random (IR) References • Clustered Random (CR) References • Looping Hierarchical (LH) References
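To give a feel for how such access patterns stress a cache differently, here is a minimal sketch of three of them as block-number generators, together with a small fully associative LRU model. It is illustrative only: it is not the project's microbenchmark code, the function names are hypothetical, and the LRU model is a simplification chosen to keep the example short.

```python
import random
from collections import OrderedDict
from typing import Iterable, Iterator


def straight_sequential(num_blocks: int) -> Iterator[int]:
    """SS: touch every block once, in order."""
    yield from range(num_blocks)


def looping_sequential(num_blocks: int, loops: int) -> Iterator[int]:
    """LS: repeat the same sequential scan several times."""
    for _ in range(loops):
        yield from range(num_blocks)


def independent_random(num_blocks: int, count: int, seed: int = 0) -> Iterator[int]:
    """IR: each reference is drawn independently and uniformly."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randrange(num_blocks)


def lru_hits(trace: Iterable[int], capacity: int) -> int:
    """Count hits in a fully associative LRU cache holding `capacity` blocks."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = True
    return hits
```

A looping scan over 8 blocks repeated 4 times hits 24 times when the cache holds 8 blocks, but never hits when it holds 7, because LRU always evicts the block that is needed next. A straight sequential scan never hits at any size.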

Mechanism • Use 3 microbenchmark programs and 6 programs from SPEC 2000 • Use a simple policy, e.g., application boundary • Show the effectiveness of the adaptive cache • Figure 1: bar chart on performance • Figure 2: bar chart on performance-power ratio

Policy • Use 3 microbenchmark programs • Do not use SPEC 2000 because of simulator limitations, e.g., SimpleScalar does not support multiple processes • Use an idealized mechanism: the best configuration • Show the flexibility of software policies • Figure 1: bar chart on performance [x-axis: policies; y-axis: normalized performance] • Figure 2: bar chart on performance-power ratio [x-axis: policies; y-axis: normalized performance-power ratio]
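The normalized y-axes of both figures could be computed along the following lines. The slides do not define the metrics, so this sketch assumes performance is measured as IPC and the performance-power ratio as IPC per watt of average power; the function names are hypothetical.

```python
from typing import Dict


def performance(instructions: int, cycles: int) -> float:
    """Assumed performance metric: instructions per cycle (IPC)."""
    return instructions / cycles


def perf_power_ratio(instructions: int, cycles: int, avg_power_watts: float) -> float:
    """Assumed ratio: IPC divided by average power in watts."""
    return performance(instructions, cycles) / avg_power_watts


def normalize(values: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Divide every policy's value by the baseline policy's value."""
    base = values[baseline]
    return {policy: value / base for policy, value in values.items()}
```

With placeholder inputs (not measurements), `normalize({"baseline": 2.0, "policy_a": 3.0}, "baseline")` gives 1.0 for the baseline and 1.5 for `policy_a`, which is the form in which each bar would be plotted.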

Mechanism + Policy • If time allows, evaluate mechanism and policy together to complete the project.

Conclusion • Adaptivity is useful • A flexible adaptivity framework • Mechanism • Policy
