Reuse-based Online Models for Caches

Presentation Transcript


  1. Reuse-based Online Models for Caches
     Rathijit Sen, David A. Wood
     ACM SIGMETRICS 2013 @ CMU, Pittsburgh, PA

  2. The Problem
     [Diagram: multicore chip with Cores and LLC banks; Miss, Fetch, DRAM]
     • Caches: power vs performance
     • Reconfigurable caches, e.g., IvyBridge
     • The Problem: Which configuration to select, e.g., to get the best energy-efficiency?

  3. Cache Performance Prediction
     • We propose a framework: h = (r · B) · φ
     • h: hit ratio
     • r: reuse-distance distribution (novel hardware support)
     • B: stochastic Binomial matrix
     • φ: hit function (LRU, PLRU, RANDOM, NMRU)
     • Case study: Energy-Delay Product (EDP) within 7% of minimum

  4. Agenda
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  5. Cache Overview
     [Diagram: Address; Tag Match? N: Miss, Y: Hit; cache array with Associativity (A) and Sets (S)]
     • Limited storage
     • Sets of (usually 64-byte) blocks
     • #blocks/set = associativity (#ways)
     • Set index + address tags identify data

  6. Workload Variation
     [Figure: Last-Level Cache (LLC) behavior per workload; labels: swim; mgrid; apache; zeus; oltp; jbb; equake, gafort, wupwise; fma3d; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions]

  7. Bad configurations hurt!
     [Figure: EDP (energy-delay product); annotations: Maximum, Minimum, 218% worse, 27% worse]

  8. Problem Summary
     [Diagram: cache array with Associativity (A) and Sets (S)]
     • Reconfigurable caches
     • Multiple replacement policies
     • Goal: Online miss-ratio prediction

  9. Indexing Assumption
     • Mapping of unique addresses to cache sets
     • Assumption: independent, uniform [Smith, 1978]
     • Unique accesses as Bernoulli trials
     • (Partial) Hashing
     • POWER4, POWER5, POWER6, Xeon
     • Simple XOR-based function [similar to Cypher, 2008]
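
The indexing assumption can be illustrated with a small sketch of an XOR-folding set-index function. This is a hypothetical example, not the function used in the talk or in any of the processors listed: `xor_set_index`, the 64-byte block size, and the 2^10-set default are all assumptions chosen for illustration.

```python
def xor_set_index(addr, set_bits=10, block_bits=6):
    """Fold the block address into a set index by XOR-ing set_bits-wide chunks.

    Hypothetical hash for illustration only; real designs differ.
    """
    block = addr >> block_bits          # drop the block-offset bits
    mask = (1 << set_bits) - 1
    index = 0
    while block:
        index ^= block & mask           # XOR the next chunk into the index
        block >>= set_bits
    return index


# A power-of-two stride that would map every block to set 0 under plain
# modulo indexing is spread over all sets by the XOR fold.
stride_addrs = [k << 16 for k in range(1024)]
plain_sets = {(a >> 6) & 1023 for a in stride_addrs}
xor_sets = {xor_set_index(a) for a in stride_addrs}
print(len(plain_sets), len(xor_sets))
```

Spreading unique addresses across sets in this way is what makes it reasonable to treat each unique access as an independent Bernoulli trial with success probability 1/S.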

  10. Agenda ✓
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  11. Temporal Locality Metrics
     [Figure: histogram r of P(URD=i) vs. i; annotation: Size?]
     • Unique Reuse Distance (URD)
     • #unique intervening addresses
     • x y z z y x : URD(x)=2
     • Stack Distance [Mattson, 1970] – 1
     • Large cache → large distances to track
     • Absolute Reuse Distance (ARD)
     • #intervening addresses
     • x y z z y x : ARD(x)=4
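
The two metrics can be checked on the slide's own trace with a short sketch. The function name `reuse_distances` is hypothetical, and the quadratic slicing approach is chosen for clarity rather than speed.

```python
def reuse_distances(trace):
    """Return (address, URD, ARD) for every reuse in the trace.

    URD = number of unique intervening addresses.
    ARD = number of intervening addresses (duplicates counted).
    """
    last_seen = {}
    result = []
    for t, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1:t]
            result.append((addr, len(set(between)), len(between)))
        last_seen[addr] = t
    return result


for addr, urd, ard in reuse_distances(list("xyzzyx")):
    print(addr, "URD =", urd, "ARD =", ard)
```

For the trace x y z z y x, the final access to x reports URD = 2 and ARD = 4, matching the slide.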

  12. Per-set Locality, r(S)
     [Diagram: accesses between two references to x, shown for #sets = S and for #sets = S′ > S; histogram r of P(URD=i) vs. i]
     • r(S) is “compressed” as S (#sets) increases
     • Less of the tail is important

  13. Agenda ✓ ✓
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  14. Estimating per-set locality
     [Figure: row vector r (P(URD=i)) times lower-triangular matrix B; entry (i, k) of B is P(k successes in i trials), i.e., P(k of i to the same set); entries with k > i are 0]
     • Generalized stochastic Binomial matrices [Strum, 1977]
     • r(S) = r(1) · B(1 – 1/S, 1/S)
     • Composition: r(S′) = r(S) · B(1 – S/S′, S/S′)
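
A minimal sketch of the transformation, assuming r is a finite list indexed by URD and that B(1 – p, p) has entry (i, k) = C(i, k) · p^k · (1 – p)^(i – k). The helper names `binomial_matrix` and `per_set_locality` are hypothetical, and the dense matrix is built only for readability.

```python
from math import comb


def binomial_matrix(n, p):
    """n x n lower-triangular matrix with B[i][k] = P(k successes in i trials)."""
    return [
        [comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
         for k in range(n)]
        for i in range(n)
    ]


def per_set_locality(r, p):
    """Row vector times matrix: r' = r . B(1 - p, p)."""
    n = len(r)
    B = binomial_matrix(n, p)
    return [sum(r[i] * B[i][k] for i in range(n)) for k in range(n)]


# All reuses have URD = 2 in the whole-cache view; with S = 2 sets, each of the
# 2 intervening unique addresses lands in x's set with probability 1/2.
r1 = [0.0, 0.0, 1.0]
print(per_set_locality(r1, 1 / 2))

# Composition: going 1 -> 2 sets and then 2 -> 4 sets matches going 1 -> 4 directly.
r = [0.1, 0.2, 0.3, 0.4]
two_step = per_set_locality(per_set_locality(r, 1 / 2), 2 / 4)
one_step = per_set_locality(r, 1 / 4)
print(two_step, one_step)
```

Because B is lower triangular, a truncated r of length n never needs entries beyond n, which is why the composition holds exactly on the truncated vectors.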

  15. Computation reuse & speedup
     [Diagram: r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); histogram r of P(URD=i) vs. i; annotation: Size?]
     • Poisson Approximation
     • Now: compute
     • Later: hardware support → r(2^10)
     • “Shorter” tail → smaller matrices

  16. Size of r(2^10)?
     [Figure: histogram r of P(URD=i) vs. i]
     • Prediction with r(2^10) limited to URD < n

  17. Agenda ✓ ✓ ✓
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  18. Hit Function, φ
     [Diagram: two references to x with intervening accesses (not x)]
     • φ_k: P(x will hit | URD(x)=k)
     • φ_0 = 1, φ_k ≤ φ_{k−1}, φ_∞ = 0
     • Monotonically decreasing model
     • Intuition: larger URD → same or larger eviction probability

  19. Hit Function, φ
     • Example: A=8

  20. Formulating φ
     • φ(LRU): step-function
     • (r · B) · φ(LRU) → [Smith, 1978], [Hill & Smith, 1989]
     • φ(PLRU): assumes that, on average, traffic is evenly divided between subtrees
     • φ(RANDOM): estimates #intervening misses using ARD
     • φ(NMRU): similar to φ(RANDOM) except φ_1 = 1
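
As a sketch of the LRU case only: under LRU, a reuse with per-set URD k hits exactly when fewer than A unique blocks intervened, so the step function can be taken as φ_k = 1 for k < A and 0 otherwise. The helper names `phi_lru` and `hit_ratio` and the example distribution are hypothetical; the PLRU, RANDOM, and NMRU hit functions are not reproduced here.

```python
def phi_lru(assoc, n):
    """Step hit function for LRU: hit iff per-set URD < associativity."""
    return [1.0 if k < assoc else 0.0 for k in range(n)]


def hit_ratio(r_set, phi):
    """h = r . phi, the dot product of per-set URD probabilities and hit function."""
    return sum(p * f for p, f in zip(r_set, phi))


# Made-up per-set URD distribution; the missing 0.1 of probability mass stands
# for accesses that are never reused within the tracked range.
r_set = [0.5, 0.2, 0.1, 0.1]
h = hit_ratio(r_set, phi_lru(assoc=2, n=len(r_set)))
print("hit ratio:", h, "miss ratio:", 1 - h)
```

Swapping in a different φ vector changes the replacement policy being modeled without touching r, which is the separation the framework h = (r · B) · φ relies on.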

  21. Agenda ✓ ✓ ✓ ✓
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  22. Prediction Accuracy
     • LRU, PLRU(A=2), NMRU(A=2): exact per-set model
     • Others: approximate per-set model

  23. Overheads
     • r(S′) = r(S) · B : 6 → 80 μsec
     • Binomial → Poisson approximation for each row of B
     • h = (r · B) · φ : 20 → 30 μsec
     • Average over 24 configurations
     • B applied 8 times
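
The Binomial → Poisson step can be sketched as follows, assuming row i of B holds Binomial(i, p) probabilities with p = 1/S and is replaced by Poisson probabilities with mean i · p. The function names and the example sizes (i = 2048, S = 1024, 20 columns) are hypothetical choices for illustration, not values from the talk.

```python
from math import comb, exp


def binomial_row(i, p, n):
    """First n entries of row i of B: P(k successes in i trials)."""
    return [comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
            for k in range(n)]


def poisson_row(i, p, n):
    """Poisson approximation of the same row, mean lam = i * p."""
    lam = i * p
    row = [exp(-lam)]
    for k in range(1, n):
        row.append(row[-1] * lam / k)   # P(k) = P(k-1) * lam / k
    return row


exact = binomial_row(2048, 1 / 1024, 20)
approx = poisson_row(2048, 1 / 1024, 20)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print("max abs difference:", max_err)
```

The Poisson row needs one exponential and one multiply-divide per entry, with no large binomial coefficients, which is the kind of saving that matters when each row of B is rebuilt online.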

  24. Agenda ✓ ✓ ✓ ✓ ✓
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  25. Computation reuse & speedup
     [Diagram: r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); histogram r of P(URD=i) vs. i; annotation: Size=512]
     • Poisson Approximation
     • Now: compute
     • Later: hardware support → Now r(2^10)
     • “Shorter” tail → smaller matrices

  26. Insights
     [Figure: histogram r of P(URD=i) vs. i]
     • x y z z y x : URD(x)=2
     • Unique: need to “remember” addresses
     • Only cardinality is needed, not full addresses: Bloom filter for compact (approximate) representation
     • r(2^10) is seen by any set of a cache with S=2^10
     • Filter address stream

  27. Hardware Support for estimating r(2^10)
     [Block diagram: Set Filter, Control Logic, 9-bit Counter, 1024-bit Bloom Filter (2 hash fns), Reference address register, 512-entry Histogram array; signals: read access, filtered access, reset, load, hit, read, inc, insert]
     [Flowchart: Start Sample; Addr match? Y: inc, End Sample; N: Unique? Y (not hit): Remember]
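
A software sketch of how such a sampler could behave, written from the component names on the slide. Everything beyond those names is an assumption: the class `URDSampler`, the two toy hash functions, plain low-bit set selection, starting a sample on the first access while idle, and saturating the counter at the last histogram entry are all illustrative choices, not the authors' design.

```python
class URDSampler:
    """Toy model: set filter + Bloom filter + counter + histogram (assumed behavior)."""

    def __init__(self, set_bits=10, bloom_bits=1024, hist_entries=512):
        self.set_bits = set_bits
        self.set_mask = (1 << set_bits) - 1
        self.bloom_bits = bloom_bits
        self.histogram = [0] * hist_entries
        self.reference = None    # reference address register (None = idle)
        self.bloom = 0           # Bloom filter bits packed into an int
        self.counter = 0         # unique intervening addresses seen so far

    def _bloom_mask(self, block):
        # Two toy hash functions over the tag bits (assumption).
        tag = block >> self.set_bits
        h1 = tag % self.bloom_bits
        h2 = (31 * tag + 7) % self.bloom_bits
        return (1 << h1) | (1 << h2)

    def access(self, block):
        if self.reference is None:
            # Start sample: load the reference register, reset filter and counter.
            self.reference, self.bloom, self.counter = block, 0, 0
            return
        if (block ^ self.reference) & self.set_mask:
            return               # set filter: different set, ignore
        if block == self.reference:
            # Address match: record the URD and end the sample.
            self.histogram[self.counter] += 1
            self.reference = None
            return
        mask = self._bloom_mask(block)
        if (self.bloom & mask) != mask:
            # Not a Bloom-filter hit: treat as unique, remember it, count it.
            self.bloom |= mask
            self.counter = min(self.counter + 1, len(self.histogram) - 1)


sampler = URDSampler()
x, y, z, other_set = 0, 1 << 10, 2 << 10, 5
for block in [x, other_set, y, z, z, y, x]:
    sampler.access(block)
print(sampler.histogram[:4])
```

Because the Bloom filter can report false hits, a model like this can undercount unique addresses; it only stores membership bits, not full addresses, which is the trade-off the preceding slide describes.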

  28. Agenda ✓ ✓ ✓ ✓ ✓ + way counters
     • The Problem
     • Framework
     • Locality (r)
     • Matrix transformations (B)
     • Hit functions (φ)
     • h = (r · B) · φ
     • Hardware support
     • Case Study

  29. LRU Way Counters [Suh, et al. 2002]
     • One counter per logical way (stack position)
     • Determining logical position is hard
     • not totally (re-)ordered with every access
     • heuristics, e.g., for PLRU [Kedzierski, et al. 2010]
     • Other Limitations
     • Inclusion property
     • Fixed #sets
     • S′ = S: special case of reuse framework
     • S′ ≠ S? Use B
     • provided enough tail of r(S) is available

  30. Min. EDP configuration
     • EDP within 7% of minimum
     • Reuse models outperform PLRU way counters in most cases

  31. Summary
     • The Problem: Online miss-rate estimation for reconfigurable caches
     • We propose a framework: h = (r · B) · φ
     • h: hit ratio
     • r: reuse-distance distribution (novel hardware support)
     • B: stochastic Binomial matrix
     • φ: hit function (LRU, PLRU, RANDOM, NMRU)
     • Case study: EDP within 7% of minimum
     • Future work: More policies, applications/case studies

  32. Also in the paper
     • r: lossy summarization of the address trace
     • Estimation for ARD
     • Optimizations for LRU
     • Conditions for PLRU eviction
     • More details on models & evaluation

  33. Reuse-based Online Models for Caches
     Questions?

  34. Example LLC performance
     • OLTP (TPC-C + IBM DB2)

  35. Estimating cache performance
     [Figure: histogram r of P(URD=i) vs. i, and hit function φ of P(hit|URD=i) vs. i]
     • Hit ratio = hits/access = ∑_i P(URD=i) · P(hit|URD=i) = r · φ
     • Miss ratio = misses/access = 1 – hit ratio
     • Miss rate = misses/instruction = miss ratio × accesses/instruction

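  36. URD vs ARD
     [Diagram: between two references to x, new unique addresses z0, z1, z2, z3, …, z(k−1) appear, separated by runs {z0}*, {z0,z1}*, {z0,z1,z2}*, …, {z0,z1,z2,...,z(k−1)}*; d_k marks the distance]
     • Approximation: d_k = d_{k−1} + 1/∑ r_i (sum over i from k to ∞)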
