On Tuning Microarchitecture for Programs

Daniel Crowell, Wenbin Fang, and Evan Samanas

Outline
• Adapt the µArch at runtime to meet a program’s performance/energy requirements
• A flexible framework for µArch adaptivity
• Case study on adaptive caches (selective-way/set)
• Evaluation of the adaptive cache

Motivation
• Optimizing for all is optimizing for nothing
• Software is increasingly complex, and much of it is closed source
• S/W and H/W co-design is infeasible for legacy software

Three Questions for Microarchitecture Adaptivity
• When to adapt? => Policy
  • Interval? Context switch? Function boundary?
• What goal(s)? => Policy
  • Performance first? Performance-power ratio first?
• How to adapt? => Mechanism
  • What technique allows reconfiguration at runtime?
Reference: Lee and Brooks [1], and Albonesi et al. [7]

Adaptivity Framework
Reference: Lee and Brooks [1] and Albonesi et al. [7]

Policy
• Instruction 1: adapt_advise
  • Inspired by the “madvise” system call
  • When to adapt: when this instruction is executed
  • What goal: given as an operand (performance? energy? both?)
• Instruction 2: adapt_setup
  • Privileged, used only by the OS
  • Operand: whether user programs are allowed to use adapt_advise
Reference: Ipek et al. [5], and Clark and John [6]
Adding new instructions to SimpleScalar: http://ce.et.tudelft.nl/~demid/SSIAT/
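
The intended interaction between the two instructions can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `AdaptUnit`, the `Goal` encodings, the disabled-by-default permission bit, and the choice to silently ignore a disallowed hint are all hypothetical and are not taken from the slides or from SimpleScalar.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical encodings for the adapt_advise goal operand."""
    PERFORMANCE = 0
    ENERGY = 1
    BOTH = 2


class PrivilegeError(Exception):
    """Raised when adapt_setup is issued outside privileged mode."""


class AdaptUnit:
    """Toy model of the state touched by the two proposed instructions."""

    def __init__(self):
        self.user_advise_allowed = False  # assumed default: user hints disabled
        self.current_goal = None          # no goal until a hint is accepted

    def adapt_setup(self, allow_user_advise, privileged):
        # Privileged instruction: only the OS may change the permission bit.
        if not privileged:
            raise PrivilegeError("adapt_setup requires privileged mode")
        self.user_advise_allowed = bool(allow_user_advise)

    def adapt_advise(self, goal, privileged=False):
        # Like madvise, this is a hint. Returns True if the hint was accepted.
        if not isinstance(goal, Goal):
            raise TypeError("goal must be a Goal")
        if not privileged and not self.user_advise_allowed:
            return False  # assumed behavior: a disallowed hint is ignored
        self.current_goal = goal
        return True
```

In this model a user program's `adapt_advise(Goal.ENERGY)` has no effect until the OS has executed `adapt_setup(True, privileged=True)`.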

Policy
• Application boundary (OS) [3]
• Time interval (OS) [1][2]
• Context switching (OS) [4]
• User program (Compiler / User program)

Feasibility study
• To back up our motivation for this project
• To support our decision to do the case study on adaptive caches rather than on other components
• Wait for Evan’s figure

Feasibility study (Cont.) • You may have more figures to show …

Case study: Adaptive Cache
• According to our experimental results, the cache is more interesting than other components …

Selective set • What is selective set (may need more than one slide)

Selective way • What is selective way (may need more than one slide)

Selective set vs Selective way • Pros and cons?

Evaluation
• Simulator
  • SimpleScalar 3.0
  • Wattch
• Workload
  • 6 programs from SPEC 2000
• Case study: Adaptive Cache

SimpleScalar changes
Two methods used:
• SimpleScalar implementation of selective sets
  • Uses a timer with a miss counter to determine which sets to disable
  • Powers down portions of the cache and selectively flushes dirty data
• Script-based method
  • The same design works for both selective sets and selective ways
  • Completely replaces the cache when resized, flushing all values at each interval
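
The "selectively flush dirty data" step of the first method can be illustrated with a minimal sketch. It is not the SimpleScalar code: the names `Line` and `shrink_sets` are hypothetical, and each line stores its full block address (an assumed simplification) so that lines in the surviving sets stay valid without resizing any tag bits.

```python
from dataclasses import dataclass


@dataclass
class Line:
    block_addr: int = 0   # full block address (assumed simplification)
    valid: bool = False
    dirty: bool = False


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def shrink_sets(sets, new_num_sets, write_back):
    """Power down sets[new_num_sets:], writing back only their dirty lines.

    sets: list of sets, each a list of Line objects.
    write_back: callback that receives the block address of each dirty line.
    Returns the number of lines written back.
    """
    old_num_sets = len(sets)
    if not (is_power_of_two(old_num_sets) and is_power_of_two(new_num_sets)):
        raise ValueError("set counts must be powers of two")
    if new_num_sets > old_num_sets:
        raise ValueError("shrink_sets cannot grow the cache")
    written = 0
    for disabled_set in sets[new_num_sets:]:
        for line in disabled_set:
            if line.valid and line.dirty:
                write_back(line.block_addr)  # only dirty data must be saved
                written += 1
            line.valid = False               # clean lines are simply dropped
            line.dirty = False
    del sets[new_num_sets:]
    return written
```

Because the new set count divides the old one, a block that indexed a surviving set before the resize still indexes the same set afterwards, so those sets need no flushing in this model.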

Application-boundary policy
Configuration is set at the start of the program, then remains unchanged
[Figure: Selective-set Cache, six panels]
• Instructions Per Cycle (IPC) vs. Energy Delay
  • IPC: considers only performance (higher is better)
  • Energy Delay: considers both performance and power (lower is better)
• Smaller cache size
  • Energy delay decreases at first, but rises later
  • Want to choose the point where it is smallest
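
Choosing the point where energy delay is smallest amounts to a minimum search over the candidate configurations. The sketch below assumes energy delay means the energy-delay product (energy multiplied by execution time); the function names and the numbers in the example are placeholders for illustration only and are not measurements from this study.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def pick_min_edp(configs):
    """configs maps a configuration label to (energy_joules, delay_seconds).

    Returns the label with the smallest energy-delay product.
    """
    return min(configs, key=lambda name: energy_delay_product(*configs[name]))


# Placeholder values, invented to show the "decreases, then rises" shape.
example = {
    "1024 sets": (10.0, 1.00),
    "512 sets": (8.0, 1.05),
    "256 sets": (7.0, 1.40),
}
print(pick_min_edp(example))  # 512 sets
```

With these placeholder values the middle configuration wins: shrinking further saves a little more energy but costs enough extra time that the product rises again.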

Application-boundary policy
[Figure: Selective-way Cache]
• Similar tradeoffs in IPC and power to selective sets
• Fewer choices
  • SimpleScalar limits associativity to powers of two
  • Unlike the number of sets, associativity does not normally need to be a power of two

Time-interval policy
• Reconfigurations occur every N CPU cycles
• Why?
  • Good if program behavior is not known before execution
  • Program may need less/more cached data later in execution
• For our cache study: relies on the cache miss rate (%) to determine reconfiguration
• Performance hit from changing too frequently
  • May oscillate between two roughly equivalent states
  • Reconfiguration requires temporarily halting, possibly flushing values from the cache
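
One way to picture the per-interval decision is the sketch below. It is an assumed controller, not the one used in the study: the function name, the rule of halving or doubling the set count, and the `hysteresis` dead band (one possible way to damp the oscillation noted above) are all hypothetical.

```python
def next_num_sets(num_sets, accesses, misses, threshold,
                  min_sets, max_sets, hysteresis=0.0):
    """Pick the set count for the next interval from this interval's counters.

    threshold and hysteresis are miss-rate fractions (0.01 means 1%).
    """
    if accesses == 0:
        return num_sets        # nothing observed: leave the cache alone
    miss_rate = misses / accesses
    if miss_rate < threshold - hysteresis and num_sets > min_sets:
        return num_sets // 2   # cache looks underused: power down half the sets
    if miss_rate > threshold + hysteresis and num_sets < max_sets:
        return num_sets * 2    # too many misses: re-enable sets
    return num_sets            # inside the dead band or at a size limit
```

Under this rule a program whose miss rate stays above the threshold at full size never shrinks the cache, so the only effect of adaptivity is the cost of evaluating the decision each interval.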

Time-interval policy
[Figure: Selective-set Cache, two graphs; axis label: cache miss rate]
• What is the minimum allowed cache miss rate? (1%, 2%, 3%, 4%? This is a policy choice)
• Notice the positive energy delay on the right graph (not good!)
  • The cache never resizes down, since the miss rate is always higher than 1%
  • So under those circumstances all adaptivity adds is overhead

Time-interval policy
[Figure: Selective-way Cache, two graphs; axis label: cache miss rate]
• Again, similar to selective sets
• Differences depend on the program being executed

Cache miss rate
Decreasing the number of ways or sets almost always increases the miss rate
Problem mentioned earlier: the miss rates of Gzip and Vpr are always higher than 1%, which does not work well with a < 1% dynamic reconfiguration level
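
The effect of removing ways can be reproduced with a very small LRU cache model. This is a sketch for illustration, not the SimpleScalar cache: the function name is hypothetical, the trace is synthetic rather than taken from SPEC 2000, and LRU replacement is an assumption.

```python
from collections import OrderedDict


def miss_rate(block_addrs, num_sets, num_ways):
    """Miss rate of an LRU set-associative cache over a block-address trace."""
    if not block_addrs:
        return 0.0
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in block_addrs:
        lru = sets[addr % num_sets]      # low-order bits select the set
        if addr in lru:
            lru.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= num_ways:
                lru.popitem(last=False)  # evict the least recently used block
            lru[addr] = True
    return misses / len(block_addrs)


# Synthetic trace: three blocks that all map to set 0 of a 4-set cache.
trace = [0, 4, 8] * 200
print(miss_rate(trace, num_sets=4, num_ways=4))  # 0.005
print(miss_rate(trace, num_sets=4, num_ways=2))  # 1.0
```

With four ways the three blocks fit and only the first three accesses miss; with two ways LRU evicts each block just before it is reused, so every access misses.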

Conclusion
• Adaptivity is useful
  • Tune for different program requirements
  • Save power
• A flexible adaptivity framework
  • Mechanism
  • Policy
• The cache is just one of many areas where this is useful

References
[1] B. C. Lee and D. Brooks. Efficiency trends and limits from comprehensive microarchitectural adaptivity. In ASPLOS, 2008.
[2] S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. Vijaykumar. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In HPCA, 2001.
[3] D. H. Albonesi. Selective cache ways: On-demand cache resource allocation. In JILP, 2000.
[4] M. C. Huang, J. Renau, and J. Torrellas. Positional adaptation of processors: Application to energy reduction. In ISCA, 2003.
[5] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez. Core fusion: Accommodating software diversity in chip multiprocessors. In ISCA, 2007.
[6] M. Clark and L. K. John. Performance evaluation of configurable hardware features on the AMD-K5. In ICCD, 1999.
[7] D. H. Albonesi, R. Balasubramonian, S. G. Dropsho, S. Dwarkadas, E. G. Friedman, M. C. Huang, V. Kursun, G. Magklis, M. L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. W. Cook, and S. E. Schuster. Dynamically tuning processor resources with adaptive processing. In Computer, 2003.

Questions?
