On Tuning Microarchitecture for Programs

Daniel Crowell, Wenbin Fang, and Evan Samanas

Outline Goal: Adapt the µArch at runtime to meet a program’s performance/energy requirements • Motivation • A flexible framework for µArch adaptivity • Feasibility study on different adaptive components • Case study on adaptive cache (selective-way/set) • Evaluation on adaptive cache • Conclusion

Motivation • Optimizing for all is optimizing for nothing • Software is increasingly complex, and much of it is closed source • S/W and H/W co-design is infeasible for legacy software

Three Questions for Microarchitecture Adaptivity • When to adapt? => Policy • Interval? Context switch? Function boundary? • What goal(s)? => Policy • Performance first? Performance-power ratio first? • How to adapt? => Mechanism • What technique to use to allow reconfiguration during runtime? Reference: Lee and Brooks [1], and Albonesi et al. [7]

Adaptivity Framework Reference: Lee and Brooks [1] and Albonesi et al. [7]

Policy • Instruction 1: adapt_advise • Inspired by the “madvise” OS system call • When to adapt: when this instruction is executed • What goal: an operand (performance? energy? both?) • Instruction 2: adapt_setup • Privileged, only used by the OS • Operand: whether user programs are allowed to use adapt_advise Reference: Ipek [5], and Clark [6] Adding new instructions to SimpleScalar: http://ce.et.tudelft.nl/~demid/SSIAT/
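The interaction between the two instructions can be modeled in a few lines. This is a minimal sketch, not the SimpleScalar extension itself: the class `AdaptState`, the `Goal` encodings, and the choice to treat a disallowed `adapt_advise` as a silent no-op (by analogy with an advisory hint) are all assumptions made for illustration.

```python
from enum import Enum


class Goal(Enum):
    # Hypothetical operand encodings for adapt_advise.
    PERFORMANCE = 0
    ENERGY = 1
    BOTH = 2


class AdaptState:
    """Toy model of the state touched by the two proposed instructions."""

    def __init__(self):
        self.user_allowed = False  # changed only by the privileged adapt_setup
        self.goal = None           # last goal accepted from adapt_advise

    def adapt_setup(self, allow, privileged):
        # adapt_setup is privileged: a user-mode attempt is rejected.
        if not privileged:
            raise PermissionError("adapt_setup requires privileged mode")
        self.user_allowed = bool(allow)

    def adapt_advise(self, goal):
        # Assumption: the hint is advisory, so a disallowed request is
        # ignored rather than trapping.
        if not self.user_allowed:
            return False
        self.goal = goal
        return True


state = AdaptState()
ignored = state.adapt_advise(Goal.ENERGY)       # OS has not enabled hints yet
state.adapt_setup(allow=True, privileged=True)  # OS enables user hints
accepted = state.adapt_advise(Goal.ENERGY)
print(ignored, accepted, state.goal)
```

In this model the "when" is simply the point at which `adapt_advise` executes, and the "what goal" is its operand; the hardware mechanism that acts on `state.goal` is left out.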

Policy • Application boundary (OS) [3] • Time interval (OS) [1][2] • Context switching (OS) [4] • User program (Compiler / User program)

Feasibility study • Back up motivation: What should be configured? • Ideal configuration differs by workload • L1 Data Cache, TLB, Branch Predictor • SimpleScalar, Wattch • 6 programs from SPEC2000 Int

Feasibility study (TLB)

Feasibility study (TLB cont.)

Feasibility Study (Branch Predictor)

Feasibility Study (Cache)

What We Learned • TLB • Variability with # entries • Fully-associative better • Branch Predictor: Combined better • Cache: Variability in both size and associativity • Size Variability > Assoc. Variability • Cache most interesting • Lots of Literature

Selective set (Yang et al. 2001) • Adjust size (# of sets) of L1 cache • Double size • Shrink by half • Goal: Decrease static power by reducing leakage • Adjust by miss rate threshold • Size-bound • Focus on I-Cache
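The resize rule on this slide (double, halve, miss-rate threshold, size bound) can be sketched as a single decision function. This is an illustrative reading of the bullets, not the published implementation; the function name, parameter names, and the numbers in the example are assumptions.

```python
def next_num_sets(num_sets, miss_rate, threshold, min_sets, max_sets):
    """Return the number of sets to use in the next interval.

    Grow (double) when the miss rate exceeds the threshold, shrink (halve)
    otherwise, and never leave the [min_sets, max_sets] range. min_sets
    plays the role of the size-bound.
    """
    if miss_rate > threshold:
        return num_sets * 2 if num_sets * 2 <= max_sets else num_sets
    return num_sets // 2 if num_sets // 2 >= min_sets else num_sets


# Hypothetical per-interval miss rates for one program.
sizes = []
sets = 256
for rate in [0.005, 0.005, 0.005, 0.03]:
    sets = next_num_sets(sets, rate, threshold=0.01, min_sets=64, max_sets=512)
    sizes.append(sets)
print(sizes)
```

With these made-up inputs the cache shrinks twice, is held at the size bound on the third interval, and grows again once the miss rate crosses the threshold.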

Selective way (Albonesi 1999) • Disables “unneeded” cache ways • Reduces cache switching activity • When to disable: Extend ISA? • When to enable: Performance Degradation Threshold
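The "when to enable" rule can be sketched as a comparison against a performance degradation threshold. This is a minimal sketch under assumptions: the function name, the use of IPC as the performance measure, and re-enabling one way at a time are illustrative choices, not details taken from the slide.

```python
def ways_after_check(enabled_ways, total_ways, baseline_ipc, current_ipc, pdt):
    """Re-enable one way if the slowdown versus all-ways-on exceeds pdt.

    baseline_ipc is the IPC measured with every way enabled; pdt is the
    performance degradation threshold expressed as a fraction (0.04 = 4%).
    """
    degradation = (baseline_ipc - current_ipc) / baseline_ipc
    if degradation > pdt and enabled_ways < total_ways:
        return enabled_ways + 1
    return enabled_ways


# Hypothetical measurements on a 4-way cache with 2 ways enabled.
too_slow = ways_after_check(2, 4, baseline_ipc=1.50, current_ipc=1.41, pdt=0.04)
acceptable = ways_after_check(2, 4, baseline_ipc=1.50, current_ipc=1.47, pdt=0.04)
print(too_slow, acceptable)
```

A 6% slowdown exceeds the assumed 4% threshold, so a way is re-enabled; a 2% slowdown leaves the configuration unchanged.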

Evaluation • Simulator • SimpleScalar 3.0 • Wattch • Workload • 6 programs from SPEC 2000 • Case study: Adaptive Cache

SimpleScalar changes Two methods used: • SimpleScalar implementation of selective sets • Uses a timer with a miss counter to determine which sets to disable • Powers down portions of the cache and selectively flushes dirty data • Scripting-based method • Can use the same design for both selective sets and selective ways • Completely replaces the cache when resized, flushing all values at each interval
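The "selectively flush dirty data" step of the first method can be illustrated as follows. This is a sketch under assumptions, not the modified SimpleScalar code: the `Line` type, the list-of-lists cache layout, and the rule that the highest-indexed sets are the ones powered down are all hypothetical. It also assumes lines in the surviving sets remain valid after the resize.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    dirty: bool


def shrink_sets(sets, new_num_sets):
    """Power down sets[new_num_sets:] and return (kept_sets, writebacks).

    Only dirty lines in the disabled sets need a write-back; clean lines
    there are simply dropped.
    """
    writebacks = []
    for index in range(new_num_sets, len(sets)):
        for line in sets[index]:
            if line.dirty:
                writebacks.append((index, line.tag))
    return sets[:new_num_sets], writebacks


cache = [
    [Line(1, True)],
    [Line(2, False)],
    [Line(3, True), Line(4, False)],
    [Line(5, False)],
]
kept, writebacks = shrink_sets(cache, 2)
print(len(kept), writebacks)
```

By contrast, the scripting-based method described on the slide flushes everything on each resize, which is simpler but discards the still-useful lines in the surviving sets.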

Application-boundary policy Configuration set at start of program, then remains unchanged [Figure: six panels titled “Selective-set Cache”] • Instructions Per Cycle vs Energy Delay • IPC: considers only performance (higher is better) • Energy Delay: considers both performance and power (lower is better) • As cache size shrinks, energy delay decreases at first but rises later • Want to choose the point where it is smallest
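Choosing the point where energy delay is smallest can be sketched as a simple search over candidate configurations. The sketch assumes energy delay is the product of energy and execution time; every number below is invented for illustration and is not a measurement from the evaluation.

```python
from typing import NamedTuple


class Config(NamedTuple):
    sets: int        # number of cache sets
    ipc: float       # instructions per cycle (higher is better)
    energy: float    # total energy, arbitrary units
    delay: float     # execution time, arbitrary units


def energy_delay(cfg):
    # Assumed definition: energy multiplied by delay (lower is better).
    return cfg.energy * cfg.delay


# Hypothetical results for one program at four cache sizes.
candidates = [
    Config(sets=1024, ipc=1.50, energy=10.0, delay=1.00),
    Config(sets=512, ipc=1.48, energy=9.0, delay=1.01),
    Config(sets=256, ipc=1.40, energy=8.4, delay=1.07),
    Config(sets=128, ipc=1.20, energy=8.0, delay=1.25),
]

best_ipc = max(candidates, key=lambda c: c.ipc)
best_ed = min(candidates, key=energy_delay)
print(best_ipc.sets, best_ed.sets)
```

In this made-up data the largest cache wins on IPC while a smaller one wins on energy delay, and shrinking further makes energy delay rise again because the longer run time outweighs the energy saved.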

Application-boundary policy Selective-way Cache • Similar tradeoffs in IPC and power to selective sets • Fewer choices • SimpleScalar limits associativity to powers of two • Unlike the number of sets, associativity does not normally need to be a power of two

Time-interval policy • Reconfiguration occurs every fixed number of CPU cycles • Why? • Good if program behavior is not known before execution • Program may require fewer/more cached data later in execution • For our cache study: relies on the cache miss rate to determine reconfiguration • Performance hit from changing too frequently • May oscillate between two roughly equivalent states • Reconfiguration requires temporarily halting, possibly flushing values from cache
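The oscillation problem can be reproduced with a toy simulation. Everything here is an assumption for illustration: the two cache sizes, their miss rates, the 1% threshold, and the per-reconfiguration penalty are invented, and the resize rule is reduced to a two-state choice.

```python
# Hypothetical miss rate of one program at two cache sizes (in sets).
MISS_RATE = {128: 0.020, 256: 0.008}
THRESHOLD = 0.01      # assumed policy choice
FLUSH_PENALTY = 500   # assumed cycles lost per reconfiguration


def simulate(intervals, start_sets=256):
    """Apply the threshold rule once per interval; return (history, overhead)."""
    sets, history, overhead = start_sets, [], 0
    for _ in range(intervals):
        rate = MISS_RATE[sets]
        # Grow when misses are above the threshold, shrink otherwise.
        target = 256 if rate > THRESHOLD else 128
        if target != sets:
            overhead += FLUSH_PENALTY
            sets = target
        history.append(sets)
    return history, overhead


history, overhead = simulate(6)
print(history, overhead)
```

Because the threshold sits between the miss rates of the two sizes, the controller flips on every interval and pays the reconfiguration penalty each time without ever settling.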

Time-interval policy Selective-set Cache [Figure: two panels labeled “Cache miss rate”] • What cache miss-rate threshold should be allowed? (1%, 2%, 3%, 4%? A policy choice) • Notice the positive energy delay on the right graph (not good!): the cache never resizes down, since the miss rate is always higher than 1% • So under those circumstances, all adaptivity adds is overhead

Time-interval policy Selective-way Cache [Figure: two panels labeled “Cache miss rate”] • Again, similar to selective sets • Differences depend on the program being executed

Cache miss rate Decreasing the number of ways or sets almost always increases the miss rate Problem mentioned earlier: Gzip and Vpr are always above 1%, which does not work well with a < 1% dynamic reconfiguration threshold

Conclusion • Adaptivity is useful • Tune for different program requirements • Save power • A flexible adaptivity framework • Mechanism • Policy • The cache is just one of many areas where this is useful

References
[1] B. C. Lee and D. Brooks. Efficiency trends and limits from comprehensive microarchitectural adaptivity. In ASPLOS, 2008.
[2] S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. Vijaykumar. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In HPCA, 2001.
[3] D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. In JILP, 2000.
[4] M. C. Huang, J. Renau, and J. Torrellas. Positional adaptation of processors: application to energy reduction. In ISCA, 2003.
[5] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core fusion: accommodating software diversity in chip multiprocessors. In ISCA, 2007.
[6] M. Clark and L. K. John. Performance evaluation of configurable hardware features on the AMD-K5. In ICCD, 1999.
[7] D. H. Albonesi, R. Balasubramonian, S. G. Dropsho, S. Dwarkadas, E. G. Friedman, M. C. Huang, V. Kursun, G. Magklis, M. L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. W. Cook, and S. E. Schuster. Dynamically tuning processor resources with adaptive processing. In Computer, 2003.

Questions?
