1 / 19

Cache Miss-Aware Dynamic Stack Allocation

This study presents a novel Dynamic Stack Allocator (DSA) with a Cache Miss Predictor (CMP) and a Stack Pointer Manager (SPM) to reduce cache misses in low-power embedded systems. The CMP computes cache miss probabilities, while the SPM selects the stack pointer location with the lowest cache miss probability. Results show a significant reduction in data traffic with the implemented DSA.


Presentation Transcript


  1. Cache Miss-Aware Dynamic Stack Allocation Authors: S. Jang et al. Conference: International Symposium on Circuits and Systems (ISCAS), 2007 Presenter: Tareq Hasan Khan ID: 11083577 ECE, U of S Literature review-4 (EE 800)

  2. Outline • Introduction to Cache and Stack • Proposed Dynamic Stack Allocator • Cache Miss Predictor • Stack Pointer Manager • Results • Conclusion

  3. Introduction • Cache • A small, high-speed on-chip memory • Bridges the speed gap between the microprocessor and main memory • In low-power embedded systems, it is necessary to reduce cache misses without increasing cache associativity • Stack • A group of memory locations used for an application's local variables, temporary data, and function-call return locations • Last In, First Out (LIFO) structure • Nearly half (49%) of memory accesses are related to the stack

  4. Dynamic Stack Allocator • Conventional stack allocation inserts and extracts data sequentially without considering cache misses • Proposed hardware - Dynamic Stack Allocator (DSA) • Cache Miss Predictor (CMP) • computes a cache miss probability for each cache line using the history of cache misses • Stack Pointer Manager (SPM) • selects the stack pointer location with the lowest cache miss probability

  5. Dynamic Stack Allocator

  6. Outline • Introduction to Cache and Stack • Proposed Dynamic Stack Allocator • Cache Miss Predictor • Stack Pointer Manager • Results • Conclusion

  7. Cache Miss Predictor (CMP) • Cache Miss Controller (CMC) • Cache Miss (CM) buffer • Consists of “index” and “count” register pairs

  8. Cache Miss Controller (CMC) • The cache controller detects cache misses by comparing the tags in the cache with the tag bits of the address requested by the processor. • When a cache miss is detected, the cache controller asserts a cache miss signal to notify the CMP that a miss has occurred, and also supplies the index of the missing line. • On a cache miss, the index is saved in the CM buffer and its corresponding counter is incremented by the CMC. • When the CM buffer is full, an entry is replaced according to the interval-based LRU policy
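The miss-detection and notification steps above can be sketched in software. This is a minimal simulation, not the paper's hardware: the cache parameters match the paper's 8 KB direct-mapped, 16-byte-line data cache, but the names `access` and `on_miss` are illustrative assumptions.

```python
# Minimal simulation of miss detection in a direct-mapped cache.
# Parameters match the paper's data cache; names are illustrative.

LINE_SIZE = 16       # bytes per cache line (paper: 16-byte lines)
NUM_LINES = 512      # 8 KB / 16 B = 512 lines (paper: 8 KB direct-mapped)

# tags[i] holds the tag currently cached at line index i (None = invalid)
tags = [None] * NUM_LINES

def access(addr, on_miss):
    """Return True on a hit; on a miss, fill the line and signal the CMP
    with the index of the missing line (the paper's cache miss signal)."""
    index = (addr // LINE_SIZE) % NUM_LINES   # line index bits of the address
    tag = addr // (LINE_SIZE * NUM_LINES)     # tag bits of the address
    if tags[index] == tag:                    # tag comparison: hit
        return True
    tags[index] = tag                         # fill the line on a miss
    on_miss(index)                            # notify CMP: miss at this index
    return False
```

For example, two accesses to addresses that map to the same line but carry different tags each produce a miss notification with the same line index.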

  9. Cache Miss (CM) buffer • Recent CM buffer (RCM buffer) • History CM buffer (HCM buffer)

  10. Cache Miss (CM) buffer • Recent CM buffer (RCM buffer) • On a cache miss to cache line k, an associative lookup into the RCM buffer is performed using k. If there is an entry with index k, the counter for line k is incremented. • If no match occurs and the RCM buffer is not full, the index is recorded in one of the empty lines and the corresponding counter is incremented. • History CM buffer (HCM buffer) • When the RCM buffer is full, entries in the HCM buffer are replaced with the contents of the RCM buffer according to the LRU policy: indices in the HCM buffer are replaced by RCM indices with larger counts. • In the interval-based LRU policy, this replacement comparison does not occur until the RCM buffer is full.
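The RCM/HCM update rules above can be sketched as follows. The buffer sizes RCM(5) and HCM(8) are taken from a configuration mentioned in the results; the exact victim-selection tie-breaking at the interval boundary is an assumption for illustration.

```python
# Sketch of the interval-based RCM/HCM buffer update (sizes assumed).

RCM_SIZE = 5   # Recent CM buffer capacity (assumed, per RCM(5) config)
HCM_SIZE = 8   # History CM buffer capacity (assumed, per HCM(8) config)

rcm = {}  # line index -> miss count for the current interval
hcm = {}  # line index -> accumulated miss count (history)

def record_miss(index):
    """On a miss to cache line `index`, update the RCM buffer; when it is
    full, fold its contents into the HCM buffer and start a new interval."""
    if index in rcm:
        rcm[index] += 1              # matching entry: bump its counter
    elif len(rcm) < RCM_SIZE:
        rcm[index] = 1               # empty slot: record the new index
    else:
        # Interval boundary: move RCM counts into HCM, replacing HCM
        # entries whose counts are smaller than the incoming RCM counts.
        for idx, cnt in rcm.items():
            if idx in hcm:
                hcm[idx] += cnt
            elif len(hcm) < HCM_SIZE:
                hcm[idx] = cnt
            else:
                victim = min(hcm, key=hcm.get)   # smallest-count entry
                if hcm[victim] < cnt:
                    del hcm[victim]
                    hcm[idx] = cnt
        rcm.clear()
        rcm[index] = 1               # the triggering miss opens the interval
```

Deferring the comparison until the RCM buffer fills is what makes the policy "interval-based": no HCM replacement work is done on the common per-miss path.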

  11. Outline • Introduction to Cache and Stack • Proposed Dynamic Stack Allocator • Cache Miss Predictor • Stack Pointer Manager • Results • Conclusion

  12. Stack Pointer Manager (SPM) When an application requires a stack, the SPM looks for the location with the lowest cache miss probability using the contents of the RCM and HCM buffers.

  13. Stack Pointer Manager (SPM) • When a function is called, the SPM calculates the total cache miss probability within the searching window (R1, R2) of each sub-stack. • To calculate the total cache miss probability, the SPM looks up the RCM and HCM buffers to check whether the indices in the searching window are present; for each index found, the SPM adds the corresponding count to the total. • After computation, the SPM compares the computed probability of each sub-stack with those of the other sub-stacks. • The SPM then dynamically selects the sub-stack with the lowest cache miss probability as the stack for the application.
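The selection steps above can be sketched as follows, using the RCM/HCM counts as miss-probability weights. The searching windows (ranges of cache-line indices per sub-stack) and all names here are illustrative assumptions, not values from the paper.

```python
# Sketch of the SPM's sub-stack selection (window layout assumed).

def window_miss_count(window, rcm, hcm):
    """Sum the RCM and HCM miss counts for every cache-line index that
    falls inside the searching window of one sub-stack."""
    return sum(rcm.get(i, 0) + hcm.get(i, 0) for i in window)

def select_sub_stack(sub_stack_windows, rcm, hcm):
    """Return the position of the sub-stack whose searching window has
    the lowest total miss count (lowest cache miss probability)."""
    return min(range(len(sub_stack_windows)),
               key=lambda s: window_miss_count(sub_stack_windows[s], rcm, hcm))
```

For example, with two sub-stacks covering line indices 0-7 and 8-15, a history of three misses on line 2 and one on line 10 would steer the stack pointer to the second sub-stack.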

  14. Outline • Introduction to Cache and Stack • Proposed Dynamic Stack Allocator • Cache Miss Predictor • Stack Pointer Manager • Results • Conclusion

  15. Result Implemented within the OpenRISC 1200 microprocessor with an 8 KB direct-mapped data cache and an 8 KB direct-mapped instruction cache, each with a 16-byte line size • Data traffic between the cache and main memory is measured for different RCM and HCM buffer sizes, normalized to one for the conventional scheme. • The traffic of FFT is 42% smaller than that of the conventional scheme. • In some cases, traffic increases, e.g., DFT with the DSA configuration of RCM(5) and HCM(8).

  16. Result…cont. • Variation of the amount of data traffic with the number of sub-stacks. • In all cases, the more sub-stacks there are, the smaller the amount of traffic, though the improvement is not very significant.

  17. Result…cont. • An ASIC implementation of the DSA was done • The maximum speed was 87 MHz • The size of the DSA is 0.3 mm × 0.4 mm, which is about 1% of the total core area

  18. Conclusion • Proposed hardware for cache miss-aware dynamic stack allocation to reduce cache misses • Based on the history of cache misses, the proposed scheme moves the stack pointer to a location expected to cause fewer cache misses. • On various benchmarks, the DSA reduced traffic between the cache and main memory by 4% to 42%.

  19. Thanks
