Improved Policies for Drowsy Caches in Embedded Processors
Junpei Zushi, Gang Zeng, Hiroyuki Tomiyama, Hiroaki Takada (Nagoya University)
Koji Inoue (Kyushu University)
Background (1/2)
• Caches are now used not only in general-purpose processors but also in embedded processors
• Cache memory consumes a large share of processor energy
  • e.g., in the ARM920T embedded processor, 44% of total energy was consumed in the cache
• Reducing cache energy consumption is effective and important!
S. Segars, "Low Power Design Techniques for Microprocessors," ISSCC Tutorial, Feb. 2001.
Background (2/2)
• Energy consumption in cache
  • Dynamic energy
    • Consumed by switching activity during cache operation
  • Leakage energy
    • Consumed whenever the cache is powered on, whether or not it is accessed
    • Leakage increases significantly as process feature size shrinks
    • e.g., in 70nm technology, leakage accounts for up to 70% of total cache energy [Kim02]
• Reducing leakage is critical to decreasing overall cache energy!
Related Work
• Non-state-preserving cache
  • DRI cache [Yang01]
    • Resizes the cache dynamically according to the cache miss ratio
  • Cache Decay [Kaxiras01]
    • Powers off cache lines that have not been accessed during a given decay interval
• State-preserving cache
  • Drowsy cache
    • Lowers the supply voltage of cache lines that are unlikely to be accessed in the near future
    • Original policies (Simple / Noaccess) [Flautner02]
    • Policies exploiting temporal locality (MRO / TMRO / RMRO) [Petit05]
Drowsy Policies [Flautner02]
• Simple policy
  • Moves all cache lines into low leakage mode at each regular time window
• Noaccess policy
  • Moves into low leakage mode only the cache lines that were not accessed in the previous time window
• The supply voltage is lowered in low leakage (drowsy) mode
• To access a line in low leakage mode, it must first be changed into awake mode
• Changing the mode of a cache line takes one or more cycles
[Figure: valid and invalid cache lines, each shown as awake or drowsy; lines go drowsy at each time window, and a drowsy line that needs to be accessed first transitions into awake mode]
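The two policies above can be sketched as a small model. This is only an illustrative sketch, not the authors' simulator: the `Line` class, the `access` and `end_of_window` functions, and the 3-cycle `MCP` constant are assumptions chosen for the example.

```python
from dataclasses import dataclass

MCP = 3  # assumed mode change penalty in cycles (hypothetical constant)


@dataclass
class Line:
    drowsy: bool = False    # True while the line is in low leakage mode
    accessed: bool = False  # True if accessed during the current time window


def access(line: Line) -> int:
    """Access a line and return the extra cycles spent waking it up."""
    penalty = MCP if line.drowsy else 0
    line.drowsy = False
    line.accessed = True
    return penalty


def end_of_window(lines, policy: str) -> None:
    """Apply the Simple or Noaccess policy at a time window boundary."""
    for line in lines:
        if policy == "simple":
            line.drowsy = True            # every line goes drowsy
        elif policy == "noaccess":
            if not line.accessed:
                line.drowsy = True        # only untouched lines go drowsy
        else:
            raise ValueError(f"unknown policy: {policy}")
        line.accessed = False             # start a fresh access history
```

Under Noaccess, a line touched in the window stays awake and is accessed without penalty in the next window; under Simple, the first access to any line after a boundary pays the wake-up penalty.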
Drowsy Policies [Petit05]
• MRO (Most Recently used On) policy
  • All lines are put into drowsy mode except the MRU line in each cache set
  • Exactly one line in each cache set is awake at any time
• TMRO (Two Most Recently used On) policy
  • All lines are put into drowsy mode except the two MRU lines in each cache set
  • Two lines in each cache set are awake at any time
• RMRO (Reused Most Recently used On) policy
  • A cache line that was not accessed during the previous time window goes to (or stays in) low leakage mode
  • If only a single line in a set was accessed during the previous time window, that line is kept awake
  • If more than one line in a set was accessed during the previous time window, the two MRU lines are kept awake and the other lines are put into low leakage mode
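As a rough illustration of the three rules above, the set of ways kept awake in one cache set could be computed as follows. The function name `awake_ways` and the data layout (a recency list with the MRU way first, plus a set of ways accessed in the previous window) are assumptions made for this sketch, not taken from [Petit05].

```python
def awake_ways(policy, recency, accessed=frozenset()):
    """Return the ways of one cache set that are kept awake.

    recency  -- way indices ordered from most to least recently used
    accessed -- ways accessed during the previous time window (RMRO only)
    """
    if policy == "MRO":
        return set(recency[:1])       # only the MRU line stays awake
    if policy == "TMRO":
        return set(recency[:2])       # the two MRU lines stay awake
    if policy == "RMRO":
        if len(accessed) <= 1:
            return set(accessed)      # zero or one accessed line: keep just that
        return set(recency[:2])       # several accessed lines: keep the two MRU
    raise ValueError(f"unknown policy: {policy}")
```

For a 4-way set with recency order `[2, 0, 3, 1]`, MRO keeps way 2 awake and TMRO keeps ways 2 and 0; RMRO keeps nothing awake if no line was accessed in the previous window.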
Contributions of This Work
• Propose four additional policies that try to balance leakage energy savings against the performance and energy overheads of mode transitions
• Evaluate mode transition policies in the context of embedded processors
  • Previous work assumes wide-issue out-of-order processors with non-blocking caches, where mode transition cycles can easily be hidden
  • This paper assumes single-issue processors with blocking caches
Proposed Policies (1/2)
• PMRO (Periodic MRO)
  • Moves all cache lines into low leakage mode at a fixed time window period, except for the MRU line in each cache set
• PTMRO (Periodic TMRO)
  • Moves all cache lines into low leakage mode at a fixed time window period, except for the two MRU lines in each cache set
[Figure: at each time window, the MRU way of each cache set is left awake]
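A minimal sketch of the periodic sweep described above, for a single cache set. The names `KEEP` and `periodic_sweep` and the recency-list representation are hypothetical; lines woken by accesses between two sweeps are assumed to stay awake until the next time window boundary.

```python
# Number of MRU lines left awake by each periodic policy.
KEEP = {"PMRO": 1, "PTMRO": 2}


def periodic_sweep(num_ways, recency, keep):
    """Return the drowsy flag of every way after a time window boundary.

    num_ways -- associativity of the set
    recency  -- way indices ordered from most to least recently used
    keep     -- how many MRU ways are left awake (see KEEP)
    """
    awake = set(recency[:keep])
    return [way not in awake for way in range(num_ways)]
```

With recency order `[2, 0, 3, 1]`, a PMRO sweep leaves only way 2 awake, while a PTMRO sweep leaves ways 2 and 0 awake.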
Proposed Policies (2/2)
• AAM (Access And MRU)
  • All cache lines are put into low leakage mode except for an MRU line that was accessed in the previous time window
• AOM (Access Or MRU)
  • All cache lines are put into low leakage mode except for the MRU line and the lines accessed in the previous time window
[Figure: conditions for staying in awake mode, expressed in terms of "accessed in previous time window" and "MRU"]
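The conditions for staying awake reduce to a logical AND or OR of two per-line flags. The sketch below assumes both flags are available at the time window boundary; the function name `stays_awake` is made up for illustration.

```python
def stays_awake(policy: str, is_mru: bool, accessed: bool) -> bool:
    """Decide whether one line stays awake at a time window boundary.

    is_mru   -- the line is the MRU line of its cache set
    accessed -- the line was accessed in the previous time window
    """
    if policy == "AAM":
        return is_mru and accessed   # Access And MRU
    if policy == "AOM":
        return is_mru or accessed    # Access Or MRU
    raise ValueError(f"unknown policy: {policy}")
```

AAM is therefore the more aggressive of the two: a line that is MRU but untouched in the previous window goes drowsy under AAM and stays awake under AOM.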
Experimental Setup (1/2)
• Cycle-accurate instruction simulator
  • SimpleScalar/ARM to generate the access trace
• Cache simulator developed in house
  • Input: access trace
  • Output: leakage energy and execution time, including mode transition overhead
• Policies implemented in the simulator for evaluation
  • Policies not using access history: Simple, MRO, TMRO, PMRO, PTMRO
  • Policies using access history: Noaccess, RMRO, AAM, AOM
• Benchmark programs
  • MediaBench
    • Encoding / decoding of adpcm, g721 and gsm
    • Decoding of jpeg and mpeg2
Experimental Setup (2/2)
• SimpleScalar/ARM configuration
  • In-order, single issue
  • L1 cache only
  • Instruction cache size: 8KB
• Cache simulator configuration
  • Cache line size: 32B
  • Cache size: 16KB / 32KB
  • Associativity: 2 / 4 / 8 ways
  • Mode Change Penalty (MCP): 3 cycles
  • Time window: 4096 cycles
• Policies evaluated
  • Conventional cache
  • 5 previous drowsy policies
  • 4 proposed drowsy policies
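For orientation, the cache geometries implied by the configuration above can be derived with a few lines of arithmetic. The helper name `geometry` is hypothetical, and 1KB is taken as 1024 bytes.

```python
LINE_SIZE = 32  # bytes per cache line, from the configuration above


def geometry(cache_kb: int, ways: int):
    """Return (total lines, number of sets) for one cache configuration."""
    lines = cache_kb * 1024 // LINE_SIZE
    return lines, lines // ways


# Enumerate the evaluated cache sizes and associativities.
for cache_kb in (16, 32):
    for ways in (2, 4, 8):
        lines, sets = geometry(cache_kb, ways)
        print(f"{cache_kb}KB, {ways}-way: {lines} lines, {sets} sets")
```

A 16KB cache holds 512 lines (256, 128, or 64 sets for 2, 4, or 8 ways), and a 32KB cache holds 1024 lines. Under MRO-style policies the number of awake lines equals the number of sets, so higher associativity leaves a smaller fraction of the cache awake.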
Comparisons of Policies Not Using Access History (16KB, MCP = 3 cycles)
• In 4-way and 8-way caches, the ED product is lowest with the PMRO policy
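The ED (energy-delay) product is conventionally the product of energy and execution time, so a policy that saves leakage energy but slows execution is penalized for the slowdown. The sketch below normalizes both factors to a conventional cache; the function name and the input values in the usage line are placeholders for illustration, not measurements from this work.

```python
def normalized_ed_product(energy, delay, base_energy, base_delay):
    """Energy-delay product relative to a baseline (conventional) cache.

    A value below 1.0 means the policy improves on the baseline.
    """
    return (energy / base_energy) * (delay / base_delay)


# Placeholder inputs: 60% leakage energy saving at a 2% slowdown.
ed = normalized_ed_product(energy=0.40, delay=1.02,
                           base_energy=1.0, base_delay=1.0)
print(f"normalized ED product: {ed:.3f}")
```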
Comparisons of Policies Not Using Access History (32KB, MCP = 3 cycles)
Results of Individual Programs (16KB, 4-way, policies not using access history)
Comparisons of Policies Using Access History (16KB, MCP = 3 cycles)
Conclusions
• Summary
  • We have proposed four policies for drowsy caches
  • Among the policies not using access history, Simple and PMRO appear promising
  • Among the policies using access history, Noaccess is promising
  • The drowsy cache is effective not only in high-performance processors but also in embedded processors
• Future Work
  • Apply to instruction caches
  • Explore application-specific policy optimization