
Data Cache Prefetching using a Global History Buffer

Data Cache Prefetching using a Global History Buffer. Written by Kyle Nesbit and James Smith, Department of Electrical and Computer Engineering, University of Wisconsin, Madison. Presented by Chuck (Chengyan) Zhao, Mar 30, 2004.

Presentation Transcript


  1. Data Cache Prefetching using a Global History Buffer
    • Written by:
      - Kyle Nesbit
      - James Smith
    • Department of Electrical and Computer Engineering, University of Wisconsin, Madison
    • Presented by: Chuck (Chengyan) Zhao, Mar 30, 2004

  2. Introduction
    • Cache hierarchy:
      - CPU registers: very few, fastest
      - L1 cache: usually 8 KB; larger than the register file, slower than registers
      - L2 cache: usually 256/512 KB; larger than L1, slower than L1
      - L3 cache (optional): usually 1/2 MB; larger than L2, slower than L2
      - Main memory: usually 256/512 MB or more; larger than L3, slowest
    • [Figure: CPU-Memory Cache Hierarchy]

  3. Each level of the cache hierarchy has roughly 10 times the latency of the level above it
    • Problems with the cache hierarchy:
      - Limited capacity (size)
      - Limited associativity
    • Solution: effective prefetching
    2. Prefetching techniques
    • Sequential prefetching
      - What: prefetch the cache lines immediately following the current (missed) cache line
      - Algorithm, early form: issue a prefetch after every cache miss
      - Algorithm, mature form: issue prefetches only after a sequential access pattern has been detected
      - Degree of prefetching: the maximum number of cache lines prefetched in response to a single prefetch request; ideally large enough to completely hide the latency of a miss to main memory
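The sequential scheme above can be sketched in Python. This is a minimal illustration, assuming addresses are expressed as cache-line numbers; the names `sequential_prefetch` and `SequentialDetector`, and the detection threshold of two consecutive +1 steps, are hypothetical choices, not details from the slides.

```python
from typing import List, Optional


def sequential_prefetch(miss_line: int, degree: int) -> List[int]:
    """Early form: after any miss, request the next `degree` consecutive lines."""
    return [miss_line + i for i in range(1, degree + 1)]


class SequentialDetector:
    """Mature form: prefetch only after a sequential miss pattern is seen.

    `threshold` (hypothetical) is how many consecutive +1 steps must be
    observed before prefetches are issued.
    """

    def __init__(self, degree: int, threshold: int = 2) -> None:
        self.degree = degree
        self.threshold = threshold
        self.last_miss: Optional[int] = None
        self.run = 0  # length of the current run of +1 steps

    def on_miss(self, line: int) -> List[int]:
        if self.last_miss is not None and line == self.last_miss + 1:
            self.run += 1
        else:
            self.run = 0  # pattern broken: start counting again
        self.last_miss = line
        if self.run >= self.threshold:
            return sequential_prefetch(line, self.degree)
        return []
```

For example, a miss on line 100 with a degree of 4 requests lines 101 to 104, while the detector stays silent until the third miss of a sequential run arrives.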

  4. 2. Table-based prefetching
    • What: record history information related to data accesses
    • Operation:
      - The table is accessed with a key (the program counter (PC) of the load instruction, or the missed address)
      - The history information is used to predict which addresses to prefetch
    • Evaluation:
      - Pro: simple
      - Con: inefficient
        - Fixed amount of history for each prefetching key
        - Stale data: an entry may sit in the table for a very long time, and by the time it is used the memory access behavior has changed
    3. Global History Buffer (GHB) prefetching
    • Organization: Fig. 1(b)
    • Features:
      - FIFO table: cache misses enter at the bottom and move up toward the top
      - Separate Index Table (IT) and GHB
      - Fixed table size
      - Circular table: when it overflows, new entries overwrite existing ones
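The FIFO and circular-overwrite behavior of the GHB can be illustrated with a small fixed-size buffer. This sketch, built around a hypothetical `CircularMissBuffer` class, covers only the ordering and overwrite rule; the Index Table and link pointers are left out.

```python
from typing import List


class CircularMissBuffer:
    """Fixed-size FIFO of miss addresses that overwrites its oldest entry when full.

    Hypothetical sketch of the GHB ordering rule only: no Index Table, no links.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[int] = [0] * size
        self.count = 0  # total insertions so far; the next slot is count % size

    def insert(self, miss_addr: int) -> None:
        # Wrap around: once the buffer is full, this overwrites the oldest miss.
        self.slots[self.count % self.size] = miss_addr
        self.count += 1

    def oldest_to_newest(self) -> List[int]:
        live = min(self.count, self.size)  # number of entries still present
        return [self.slots[i % self.size] for i in range(self.count - live, self.count)]
```

With room for 3 entries, inserting misses 10, 20, 30 and 40 leaves [20, 30, 40]: the oldest miss is overwritten, so table space always goes to the most recent history.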

  5. Benefits of the GHB:
    • Reduces stale data
    • More accurate construction of access history patterns
    • Enables more effective prefetching algorithms
    4. Table-based prefetching techniques
    • Stride prefetching (Fig. 2): the following addresses are prefetched:
      - a + s, a + 2s, ..., a + d·s
      - where a is the target address, s is the detected stride (a constant in this case), and d is the degree of prefetching
    • Correlation (Markov) prefetching (Fig. 3):
      - A history table records cache misses; the missing address indexes the correlation table
      - Each entry holds a list of the addresses that have immediately followed the current miss address, most recent first
      - Markov graph: each node is a cache-miss address; each edge carries the probability that the source miss is immediately followed by the target miss
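Both table-based techniques on this slide can be sketched in a few lines. The sketch assumes integer addresses, a hypothetical `MarkovTable` class, and a per-entry capacity of two successors (mirroring the two-entry history attributed to Fig. 3); it is an illustration, not the paper's exact hardware table.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


def stride_prefetch(a: int, s: int, d: int) -> List[int]:
    """Constant-stride prefetch: a + s, a + 2s, ..., a + d*s."""
    return [a + k * s for k in range(1, d + 1)]


class MarkovTable:
    """Correlation (Markov) table: miss address -> recent successor misses.

    Each entry keeps at most `width` successors, most recent first
    (`width` is a hypothetical parameter).
    """

    def __init__(self, width: int = 2) -> None:
        self.width = width
        self.table: Dict[int, Deque[int]] = defaultdict(deque)
        self.prev_miss: Optional[int] = None

    def on_miss(self, addr: int) -> List[int]:
        # Record that `addr` immediately followed the previous miss.
        if self.prev_miss is not None:
            succ = self.table[self.prev_miss]
            if addr in succ:
                succ.remove(addr)
            succ.appendleft(addr)          # most recent first
            while len(succ) > self.width:  # fixed history: drop the oldest
                succ.pop()
        self.prev_miss = addr
        # Prefetch every successor recorded for the current miss address.
        return list(self.table.get(addr, ()))
```

The fixed `width` is exactly the limitation criticized on the next slide: once a third distinct successor appears, the oldest one is dropped even if it is still useful.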

  6. 3. Distance prefetching (Fig. 4)
    • A generalization of correlation prefetching
    • Uses the distance between two consecutive global miss addresses to index the correlation table
    Problems with table-based prefetching:
    • Stale table data: entries are neither used nor refreshed
    • Table entry conflicts: multiple access keys map to the same table entry
    • Fixed, small amount of history per entry: in Fig. 3, only two history entries per item
    5. Global History Buffer (GHB) based prefetching
    • Table structure: Fig. 1(b)
    • IT (Index Table):
      - Accessed by a key, as in traditional table-based prefetching
      - Key: program counter, cache-miss address, or a combination of them
      - Holds pointers into the GHB
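Distance prefetching differs from the Markov table only in its key: the table is indexed by the delta between consecutive miss addresses and stores the deltas that followed it. A minimal sketch, assuming a hypothetical `DistanceTable` class with two delta slots per entry, which prefetches the current address plus each recorded delta:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class DistanceTable:
    """Correlation table indexed by miss-address delta (distance)."""

    def __init__(self, width: int = 2) -> None:
        self.width = width  # hypothetical number of delta slots per entry
        self.table: Dict[int, Deque[int]] = defaultdict(deque)
        self.prev_addr: Optional[int] = None
        self.prev_delta: Optional[int] = None

    def on_miss(self, addr: int) -> List[int]:
        prefetches: List[int] = []
        if self.prev_addr is not None:
            delta = addr - self.prev_addr
            if self.prev_delta is not None:
                # Record that `delta` followed `prev_delta` (most recent first).
                succ = self.table[self.prev_delta]
                if delta in succ:
                    succ.remove(delta)
                succ.appendleft(delta)
                while len(succ) > self.width:
                    succ.pop()
            # Predict: current address plus every delta that has followed `delta`.
            prefetches = [addr + nd for nd in self.table.get(delta, ())]
            self.prev_delta = delta
        self.prev_addr = addr
        return prefetches
```

Fed the miss stream 0, 1, 2, 64, 65, 66, 128, the final miss (delta 62) finds delta 1 in the entry for 62 and prefetches 129. Because every delta-1 miss shares one entry, the entry for 1 ends up holding both 62 and 1, so a single delta cannot tell where in the row the access is; for this pattern, the delta pairs used later in the slides remove that ambiguity.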

  7. GHB (cont.)
    • GHB: n-entry circular FIFO buffer
      - Holds the n most recent misses
      - Each entry holds a global miss address
      - Each entry also holds a pointer that chains GHB entries into address lists (entries that share the same index key)
    Notation used later:
    • Prefetching method: X / Y
      - X: PC = program-counter-based indexing; G = global
      - Y: CS = constant stride; DC = delta correlation; AC = address correlation
      - Different combinations of X and Y yield different prefetching methods
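The IT/GHB organization can be sketched as follows. This is an illustration under stated assumptions, not the paper's hardware: pointers are modeled as unbounded insertion counters (slot = pointer mod n), the key is a hypothetical load PC (as in the PC/Y methods), and a pointer is treated as valid only while its entry has not been overwritten. The class `GHB` and its methods are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class GHB:
    """Index Table (IT) plus an n-entry circular Global History Buffer.

    IT: key (e.g. load PC) -> pointer to the newest GHB entry for that key.
    GHB entry: (miss address, link to the previous entry with the same key).
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.entries: List[Tuple[int, Optional[int]]] = [(0, None)] * n
        self.head = 0                  # pointer the next insertion will receive
        self.it: Dict[int, int] = {}   # Index Table

    def _valid(self, ptr: Optional[int]) -> bool:
        # The entry survives until n more insertions have happened after it.
        return ptr is not None and self.head - ptr <= self.n

    def insert(self, key: int, miss_addr: int) -> None:
        prev = self.it.get(key)
        link = prev if self._valid(prev) else None
        self.entries[self.head % self.n] = (miss_addr, link)  # may overwrite oldest
        self.it[key] = self.head
        self.head += 1

    def history(self, key: int) -> List[int]:
        """Miss addresses for `key`, newest first, by walking the linked list."""
        out: List[int] = []
        ptr = self.it.get(key)
        while self._valid(ptr):
            addr, ptr = self.entries[ptr % self.n]  # one GHB access per step
            out.append(addr)
        return out
```

With n = 4, three misses from the load at PC 0xA are chained newest first as [228, 164, 100]; a fifth miss overwrites the oldest slot, after which the walk for 0xA stops at [228, 164]. Each step of `history` is one GHB access, which is the traversal cost noted in the evaluation slides.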

  8. 2. GHB for correlation prefetching (Fig. 5)
    • Explanation: breadth-first walk (shaded area)
    3. GHB for stride prefetching
    • PC / CS
    • Explained with Fig. 5 again (depth-first)
    6. Global History Buffer (GHB) error handling
    • When errors occur: when the GHB array is overwritten, pointers become obsolete because the entries they referenced have been rewritten
    • Solution:
      - Pointers carry extra bits; the low-order bits reference the GHB entry
      - Check: if (head pointer - referenced pointer) > table size, the reference is stale (an error)
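Real hardware cannot use unbounded pointers, so the overwrite check on this slide works on pointers with a few extra bits. The sketch below assumes a 256-entry GHB and 2 extra pointer bits (both hypothetical sizes) and treats `head` as the pointer of the next entry to be written; all pointer arithmetic wraps modulo 2^(8+2). The constant and function names are hypothetical.

```python
GHB_INDEX_BITS = 8                              # hypothetical: 2**8 = 256-entry GHB
EXTRA_BITS = 2                                  # hypothetical number of extra pointer bits
GHB_SIZE = 1 << GHB_INDEX_BITS
PTR_MOD = 1 << (GHB_INDEX_BITS + EXTRA_BITS)    # pointers wrap modulo 2**10


def slot(ptr: int) -> int:
    """The low-order bits of a pointer select the GHB entry."""
    return ptr & (GHB_SIZE - 1)


def advance(head: int) -> int:
    """Pointer of the next entry to be written, wrapping at PTR_MOD."""
    return (head + 1) % PTR_MOD


def is_stale(head: int, ref: int) -> bool:
    """True if the entry `ref` points to has already been overwritten.

    `head` is the pointer of the next entry to be written. Python's % always
    returns a non-negative result, mirroring unsigned wrap-around arithmetic.
    """
    return (head - ref) % PTR_MOD > GHB_SIZE
```

The extra bits let a wrapped `head` (for example 10, after passing 1023) still compare correctly against an older reference such as 1000. A reference that is 2^(8+2) or more insertions old can alias and look valid again, so the number of extra bits bounds how stale a pointer can be and still be detected.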

  9. 7. GHB evaluation
    • GHB benefits:
      - FIFO (first in, first out) buffer: naturally gives table space to the most recent history
      - Separation of the IT and the GHB:
        - IT (Index Table): holds the working set of prefetching lists; relatively small
        - GHB: larger than the IT; sized to hold the miss-address stream
        - Benefit of this design: enables more sophisticated prefetching methods (shown later)
    • GHB drawback:
      - Multiple accesses are needed to collect prefetching information (traversal of the internal linked lists)

  10. 7. GHB evaluation (cont.)
    3. Types of GHB prefetching
    • Width prefetching: prefetch only the immediately adjacent nodes (e.g. in Fig. 5)
    • Depth prefetching: begin with the current miss, follow the sequence of most likely nodes along its path, and prefetch at each node (e.g. in Fig. 5)
    • Hybrid: a mix of width and depth prefetching
    4. New prefetching technique: Global / Delta Correlation
    • What: prefetching with non-constant steps
    • Example (Table 1): pattern {0, 1, 1, 62, 1, 1, ...}, from accessing the first 3 elements of each row of a 2-dimensional array
    • A constant-stride prefetcher assumes the steps are {1, 1, 1, 1, ...} and therefore prefetches incorrect addresses
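Width and depth prefetching differ only in how the Markov graph (Fig. 5) is walked. A sketch, assuming the graph is stored as a dictionary mapping each miss address to observed successor counts, and that the degree bounds the number of prefetches; `Graph`, `width_prefetch` and `depth_prefetch` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical Markov graph: miss address -> {successor address: observed count}.
Graph = Dict[int, Dict[int, int]]


def width_prefetch(g: Graph, miss: int, degree: int) -> List[int]:
    """Prefetch the immediate successors of `miss`, most frequent first."""
    succ = g.get(miss, {})
    ranked = sorted(succ, key=succ.__getitem__, reverse=True)
    return ranked[:degree]


def depth_prefetch(g: Graph, miss: int, degree: int) -> List[int]:
    """Follow the single most likely successor at each step, prefetching each node."""
    out: List[int] = []
    node = miss
    for _ in range(degree):
        succ = g.get(node)
        if not succ:
            break  # no recorded successor: the path ends here
        node = max(succ, key=succ.__getitem__)
        out.append(node)
    return out
```

For a graph where miss 1 is followed by 2 five times and by 3 twice, and 2 to 4 to 5 form a chain, width prefetching from 1 returns [2, 3] while depth prefetching returns [2, 4, 5]. A hybrid scheme would combine the two, for example by taking a couple of immediate successors and then following the most likely path.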

  11. Non-constant address stream

  12. 4. New prefetching technique (cont.)
    • Using the GHB:
      - Take the sequence of the load's miss addresses
      - Detect variable stride steps
      - Use delta pairs (Table 1) to predict
    8. Simulation and testing
    • Simulator and configuration:
      - Configuration: Table 4
      - SimpleScalar 3.0
      - Each IT access: 1 cycle
      - Each GHB access: 1 cycle
      - Degree of prefetching: 4
    • Benchmarks under an ideal L2 cache: Tables 2 and 3
    • GHB training set: some benchmarks are used to choose the optimal table sizes for the IT and the GHB
    • Table size results: Table 6
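The delta-pair idea can be sketched over one load's miss history (the address list a PC-indexed GHB chain would return, reversed to oldest first). This is a minimal sketch, assuming cache-line addresses and a hypothetical function name; it searches backwards for the most recent earlier occurrence of the last two deltas and replays the deltas that followed it.

```python
from typing import List


def delta_correlation_prefetch(misses: List[int], degree: int) -> List[int]:
    """Predict prefetch addresses from one load's miss addresses (oldest first)."""
    if len(misses) < 3:
        return []  # need at least two deltas to form a key
    deltas = [b - a for a, b in zip(misses, misses[1:])]
    key = (deltas[-2], deltas[-1])
    # Search backwards for the most recent earlier occurrence of the key pair.
    for i in range(len(deltas) - 2, 0, -1):
        if (deltas[i - 1], deltas[i]) == key:
            following = deltas[i + 1:]  # deltas that came after the match
            break
    else:
        return []  # no correlation found: issue no prefetches
    prefetches: List[int] = []
    addr = misses[-1]
    for d in following[:degree]:
        addr += d  # replay each delta starting from the current miss address
        prefetches.append(addr)
    return prefetches
```

Assuming rows of 64 lines, the misses 0, 1, 2, 64, 65, 66, 128, 129, 130 give deltas 1, 1, 62, 1, 1, 62, 1, 1; matching the last pair (1, 1) earlier in the history predicts 192, 193 and 194 (only three deltas follow the match, so three prefetches are issued even with degree 4). A constant-stride prefetcher with s = 1 and d = 4 would instead request 131 to 134.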

  13. 4. GHB Testing: Global / Delta Correlation

  14. 5. GHB Testing: PC / Local Prefetching
    • Compares GHB PC/CS and GHB PC/DC with table-based PC/CS

  15. Conclusion
    Global History Buffer based prefetching:
    • 2-level table hierarchy:
      - IT: Index Table
      - GHB: Global History Buffer
    • Performance: generally as good as or better than the compared prefetching methods on 14 of the 15 tested benchmarks
      - Increases IPC
      - Reduces memory traffic
    • Advantages:
      - Reduces stale data
      - Increases prediction accuracy
      - Enables new prediction opportunities: variable-stride (delta correlation) prefetching
    • Disadvantage:
      - Multiple table accesses are needed to build history information, but the extra delay is relatively small and tolerable
