Low Power Cache Design • M. Bilal Paracha • Hisham Chowdhury • Ali Raza
Acknowledgements • Ching-Long Su and Alvin M. Despain, University of Southern California, “Cache Design Trade-offs for Power and Performance Optimization: A Case Study” • C.-L. Su and Alvin M. Despain, “Cache Designs for Energy Efficiency” • Zhichun Zhu and Xiaodong Zhang, College of William and Mary, “Access-Mode Predictions for Low-Power Cache Design” • M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, and K. Roy, “Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping,” MICRO 2001.
Today’s talk • Abstract • Introduction • Use of cache in microprocessors • Different designs to optimize cache energy and power consumption • Design Trade-offs for Power & Performance Optimization • Vertical Cache Partitioning • Horizontal Cache Partitioning • Gray Code Addressing • Set-Associative Cache Energy Reduction • Way Prediction • Selective direct-mapping • Access Mode Prediction (AMP) • Advantages over Way Prediction and Phased cache • Different prediction techniques • Evaluation Results • Cache Access Times • Miss Rates • Cache Energy consumption
Today’s talk…. • Conclusion • Acknowledgements
Abstract • Usage of caches in modern microprocessors • Caches are designed for high performance and tend to ignore power consumption • Research activities towards low-power cache design
Introduction • Caches use 30-60% of processor energy in embedded systems • Use of caches in high-performance machines • Various designs to optimize energy consumption
Use of cache in microprocessors • High-performance products go mobile (notebooks, PDAs, etc.) • Caches as temporary storage devices • Design of components with low power consumption
Vertical Cache Partitioning • Block Buffer • Block Hit/Miss • Block Size
Horizontal Cache Partitioning • Cache segments • Cache sub-banks • Reduction in cache accesses • Hit time is an advantage
Gray Code Addressing • Gray code vs 2’s complement • Minimizes bit switches • 2’s complement: 31 bits change • Gray code: 16 bits change
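The bit-switch counts on this slide can be reproduced with a short sketch. It assumes the counts refer to stepping sequentially through addresses 0 to 16, which the slide does not state; the helper names `gray` and `bit_switches` are illustrative only.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_switches(codes):
    """Total number of bit flips between consecutive codes in a sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


# Assumed address range: 0..16 inclusive (16 sequential steps).
addresses = list(range(17))

# For non-negative values, 2's complement is ordinary binary.
binary_switches = bit_switches(addresses)
gray_switches = bit_switches([gray(a) for a in addresses])

print(binary_switches, gray_switches)  # 31 16
```

Consecutive Gray codes differ in exactly one bit, so 16 steps cost 16 switches, while binary counting pays extra flips on every carry.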
Evaluation Results • <dm,2> A direct mapped cache with block size 2 words • <dm,4> A direct mapped cache with block size 4 words • <dm,8> A direct mapped cache with block size 8 words • <2lru,2> A 2-way set associative cache with block size 2 words • <2lru,4> A 2-way set associative cache with block size 4 words • <2lru,8> A 2-way set associative cache with block size 8 words • <4lru,2> A 4-way set associative cache with block size 2 words • <4lru,4> A 4-way set associative cache with block size 4 words • <4lru,8> A 4-way set associative cache with block size 8 words
Cache Access Time • A direct-mapped cache takes less time to access than a set-associative cache • Access time for a 1 KB cache: 4.79 ns direct-mapped, 7.15 ns set-associative • A 2-way set-associative cache is approximately 50% slower than a direct-mapped cache
Reducing Set-Associative Cache Energy via Way Prediction and Selective Direct Mapping
Cache Access Energy Reduction Techniques • Energy dissipation in the data array is much larger than in the tag array, so energy optimizations target the data array only • Selective direct mapping for D-caches • Way prediction for I-caches
Different Design Techniques a) Conventional Parallel Access
Different cache access modes • Phased Cache: • Compares the tag with all tags in the selected set, and accesses the data only if a tag matches • Consumes energy on all n tag comparisons; not efficient • Flow: Access the set → Access all n tags → Access the data corresponding to the matching tag
Way Prediction: • Accesses only the predicted tag and data • Efficient when the hit rate is high • Not very efficient on a misprediction or miss (the rest of the tag and data elements must be accessed) • Flow: Access the set → Way prediction → Access the predicted data and tag sub-arrays in the set → Prediction correct? Yes: proceed. No: access the rest of the data and tag arrays.
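The way-prediction flow can be sketched as a lookup that counts sub-array reads. This is a minimal illustration, not the design from the cited papers: the `AccessCost` record, the function name, and the assumption that a misprediction reads all remaining ways are hypothetical, and refill after a miss is not counted.

```python
from dataclasses import dataclass


@dataclass
class AccessCost:
    hit: bool
    tag_reads: int   # number of tag sub-arrays read
    data_reads: int  # number of data sub-arrays read


def way_predicted_lookup(set_tags, tag, predicted_way):
    """Probe the predicted way first; fall back to the remaining ways."""
    # First probe: only the predicted tag and data sub-arrays are read.
    if set_tags[predicted_way] == tag:
        return AccessCost(hit=True, tag_reads=1, data_reads=1)

    # Misprediction: assume every remaining way is read as well,
    # so all n tag and data sub-arrays have been touched in total.
    n = len(set_tags)
    others = [w for w in range(n) if w != predicted_way]
    hit = any(set_tags[w] == tag for w in others)
    return AccessCost(hit=hit, tag_reads=n, data_reads=n)


# Hypothetical 4-way set holding four tags.
example_set = [0x1A, 0x2B, 0x3C, 0x4D]
correct = way_predicted_lookup(example_set, 0x2B, predicted_way=1)
wrong_way = way_predicted_lookup(example_set, 0x2B, predicted_way=0)
miss = way_predicted_lookup(example_set, 0x99, predicted_way=0)
```

A correct prediction touches one tag and one data sub-array; a wrong one costs as much as a conventional parallel access.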
Access Mode Prediction (AMP) • Prediction-based approach • Way prediction is better when the hit rate is very high • When the hit rate is low, the phased cache approach is preferable • Predicts whether a cache access will result in a hit or a miss: on a predicted hit, way prediction is used; otherwise, the phased cache approach is used • Accuracy of the access mode prediction determines the efficiency of the approach
Power Consumption: • With perfect AMP and perfect way prediction, power consumption reaches the lower bound for a conventional set-associative cache • For a correctly predicted hit, the way-prediction cache consumes Etag + Edata, compared with n × Etag + Edata in the phased cache • A miss in the way-prediction cache consumes (n + 1) × Etag + (n + 1) × Edata, compared with (n + 1) × Etag + Edata in the phased cache
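The energy expressions on this slide can be put into a small model to compare the access modes. This is a sketch under stated assumptions: the function names and the example values (n = 4, Etag = 1, Edata = 10, in arbitrary units) are hypothetical, and a hit whose way is mispredicted is not modelled because the slide gives no formula for it.

```python
def way_prediction_energy(is_hit, n, e_tag, e_data):
    """Energy per access in the way-prediction cache (slide formulas)."""
    if is_hit:  # correctly predicted hit
        return e_tag + e_data
    return (n + 1) * e_tag + (n + 1) * e_data


def phased_energy(is_hit, n, e_tag, e_data):
    """Energy per access in the phased cache (slide formulas)."""
    if is_hit:
        return n * e_tag + e_data
    return (n + 1) * e_tag + e_data


def amp_energy(predicted_hit, is_hit, n, e_tag, e_data):
    """AMP: use way prediction on a predicted hit, phased mode otherwise."""
    if predicted_hit:
        return way_prediction_energy(is_hit, n, e_tag, e_data)
    return phased_energy(is_hit, n, e_tag, e_data)


# Hypothetical parameters: 4 ways, arbitrary energy units.
N, E_TAG, E_DATA = 4, 1, 10

best_hit = amp_energy(True, True, N, E_TAG, E_DATA)     # 11
best_miss = amp_energy(False, False, N, E_TAG, E_DATA)  # 15
wrong_hit = amp_energy(True, False, N, E_TAG, E_DATA)   # 55
wrong_miss = amp_energy(False, True, N, E_TAG, E_DATA)  # 14
```

With these example numbers, predicting a hit for an access that misses is the costly case (55 versus 15), which is why the accuracy of the access mode predictor matters.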
Different Predictors • Saturating Counter: • Similar to the saturating counter for branch prediction used in project2 • Maintains a two-bit counter that increments on a cache hit and decrements on a cache miss • Two-level adaptive predictor: • Adaptive two-level branch-prediction scheme using a global pattern history table (GAg) • A K-bit history register records the results of the most recent K accesses • For a hit the register records a 1, otherwise a 0 • The K-bit history indexes the global pattern history table, which has 2^K entries; each entry is a 2-bit saturating counter • Per-address two-level scheme with a global pattern history table (PAg) • Each set has its own access history register • All history registers index a single pattern history table • Correlation predictor • Gshare predictor: • XOR of the global access history with the set index of the current reference provides the index into the global pattern history table
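The saturating counter, GAg, and gshare predictors described above can be sketched as follows. This is an illustrative sketch, not the implementation evaluated in the cited work: the class names, the initial counter value, the prediction threshold, and the `use_gshare` flag are all assumptions.

```python
class SaturatingCounter:
    """Counter that increments on a hit and decrements on a miss."""

    def __init__(self, bits=2, value=None):
        self.max = (1 << bits) - 1
        # Assumption: start in the strongest "hit" state.
        self.value = self.max if value is None else value

    def predict_hit(self):
        # Assumption: the upper half of the range predicts a hit.
        return self.value > self.max // 2

    def update(self, hit):
        if hit:
            self.value = min(self.max, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


class GlobalTwoLevelPredictor:
    """GAg-style predictor: a K-bit global history indexes 2^K counters.

    With use_gshare=True the history is XORed with the set index first.
    """

    def __init__(self, k, use_gshare=False):
        self.mask = (1 << k) - 1
        self.history = 0  # most recent K outcomes, 1 = hit, 0 = miss
        self.table = [SaturatingCounter() for _ in range(1 << k)]
        self.use_gshare = use_gshare

    def _index(self, set_index):
        if self.use_gshare:
            return (self.history ^ set_index) & self.mask
        return self.history

    def predict_hit(self, set_index=0):
        return self.table[self._index(set_index)].predict_hit()

    def update(self, hit, set_index=0):
        self.table[self._index(set_index)].update(hit)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(hit)) & self.mask


# Train a GAg predictor (K = 2) on an alternating hit/miss pattern.
gag = GlobalTwoLevelPredictor(k=2)
for outcome in [True, False] * 4:
    gag.update(outcome)
next_after_miss = gag.predict_hit()  # history ends in a miss
gag.update(True)
next_after_hit = gag.predict_hit()   # history ends in a hit
```

A PAg variant would keep one history register per set while sharing the single counter table; it is omitted here to keep the sketch short.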
Conclusion • Cache designs can be modified to obtain maximum performance and optimal energy consumption • Experiments suggest that • Direct-mapped caches (instruction and data) consume less energy for dynamic logic • Set-associative caches consume less energy for static logic • Circuit-level techniques can no longer keep power dissipation at a reasonable level • Power reduction is done at the architectural level, through different schemes for reducing on-chip cache power consumption