Performance Optimization for Low-Leakage Caches based on Sleep-Line Access Density Reiko Komiya †, Koji Inoue ‡ and Kazuaki Murakami ‡ †Fukuoka University, Japan ‡ Kyushu University, Japan ODES-4

Outline
• Introduction
• Leakage energy of cache memory
• Conventional low-leakage cache: Cache decay
• Problem of cache decay approach
• Solution: Always-Active approach
• Evaluation
• Conclusions

Introduction
Energy consumption = Dynamic energy + Static energy
• Dynamic energy: consumed by charging & discharging
• Static energy: consumed by leakage current
Leakage energy increases with the progress of process technology.
[Figures: the breakdown of energy consumption in a processor family *1; power analysis of ARM920T *2 (cache energy is 44%). Legend: static power, dynamic power]
Cache leakage reduction is very important!
*1 Fred Pollack (Intel Fellow): New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies [Micro32]
*2 Simon Segars, “Low Power Design Techniques for Microprocessors,” ISSCC2001

Conventional Low-Leakage Cache
A conventional cache doesn’t support any leakage reduction technique.
Conventional low-leakage cache: Cache decay
• Active mode: high leakage, to preserve the data
• Sleep mode: destroys the data to reduce leakage
• Sleep-miss: a miss caused by accessing a line in sleep mode; degrades processor performance
The mode of each line changes according to this state transition diagram:
• active mode (high-leakage) → sleep mode (low-leakage): no-access time ≧ decay interval
• sleep mode → active mode: access (miss)
• [the diagram also marks an initial state]
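
The transitions above can be sketched as a small per-line state machine. This is a minimal illustration under stated assumptions, not the authors' simulator: the names `Mode`, `DecayLine`, `tick` and `access` are hypothetical, the line is assumed to start in active mode, and every access to a sleeping line is assumed to hit the tag, so it counts as a sleep-miss.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"  # high leakage, data preserved
    SLEEP = "sleep"    # low leakage, data destroyed


class DecayLine:
    """One cache line under a simplified cache-decay policy (illustrative)."""

    def __init__(self, decay_interval):
        self.decay_interval = decay_interval
        self.mode = Mode.ACTIVE  # assumption: the line starts in active mode
        self.idle_cycles = 0     # cycles since the last access
        self.sleep_misses = 0

    def tick(self):
        """Advance one cycle without an access to this line."""
        if self.mode is Mode.ACTIVE:
            self.idle_cycles += 1
            # no-access time >= decay interval: the line goes to sleep
            if self.idle_cycles >= self.decay_interval:
                self.mode = Mode.SLEEP

    def access(self):
        """Access the line; return True if the access was a sleep-miss."""
        sleep_miss = self.mode is Mode.SLEEP
        if sleep_miss:
            self.sleep_misses += 1
        # any access (re)activates the line and restarts the idle count
        self.mode = Mode.ACTIVE
        self.idle_cycles = 0
        return sleep_miss
```

With a decay interval of 4 cycles, a line accessed after 3 idle cycles is still active, while a line left idle for 4 cycles has gone to sleep and its next access is counted as a sleep-miss.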

Performance Impact of Sleep-misses
Many sleep-misses cause large performance degradation!

Our Goal
High-performance, low-leakage cache!
• Problem of conventional low-leakage cache
  - Performance degradation caused by sleep-misses
• Our approach
  - To improve performance, reduce sleep-misses
  - Prohibit some cache lines from going to sleep mode

Analysis of Sleep-misses
• Sleep-Miss Density (SMD): shows the amount of sleep-misses in each line
$$\mathrm{SMD}_i = \frac{\text{the number of sleep-misses at cache line } i}{\text{the average number of sleep-misses for all cache lines}}$$
• Example
  [Figure: the number of sleep-misses at each cache line]
  - The total number of sleep-misses: 90
  - The number of lines: 9
  - ⇒ The average number of sleep-misses: 10
  - $\mathrm{SMD}_6 = 6$, $\mathrm{SMD}_7 = 0.1$, $\mathrm{SMD}_8 = 1$
Cache lines which often cause sleep-misses have high SMD!
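
The SMD definition is easy to check numerically. The sketch below uses a hypothetical helper, `sleep_miss_density`. Only the total (90), the number of lines (9) and the SMD values of lines 6, 7 and 8 come from the slide; the counts for lines 0 to 5 are made up so that the total matches.

```python
def sleep_miss_density(sleep_misses):
    """Return SMD_i = count_i / (average count over all lines) for each line."""
    average = sum(sleep_misses) / len(sleep_misses)
    if average == 0:
        # no sleep-misses anywhere: define every SMD as 0
        return [0.0 for _ in sleep_misses]
    return [count / average for count in sleep_misses]


# Hypothetical per-line counts for lines 0..8 (total = 90, so average = 10).
counts = [5, 3, 2, 4, 3, 2, 60, 1, 10]
smd = sleep_miss_density(counts)
```

Here line 6, which accounts for 60 of the 90 sleep-misses, gets an SMD of 6, while line 7 with a single sleep-miss gets 0.1.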

Characteristics of Sleep-misses
[Figures: the breakdown of cache lines in terms of SMD; the breakdown of sleep-misses in terms of SMD. Legend: SMD < 1, 1 ≦ SMD < 2, 2 ≦ SMD < 4, 4 ≦ SMD]
3.1% of lines cause 94.4% of sleep-misses.
A small number of high-SMD lines often produce sleep-misses.

Always-Active Approach
• Support “Always-Active mode (AA mode)”
  - AA mode prohibits the corresponding line from going to sleep mode
• Cache lines which frequently cause sleep-misses should operate in AA mode
  - Such lines are called “Always-Active lines (AA lines)”

How to Decide AA Lines
A line which frequently causes sleep-misses ⇒ AA line
[Figures: the number of sleep-misses at each cache line; SMD at each cache line]
State transition diagram:
• active mode → sleep mode: no-access time ≧ decay interval
• sleep mode → active mode: access, SMD ≦ Threshold
• sleep mode → always-active mode: access, SMD > Threshold
• [the diagram also marks an initial state]

How to Measure SMD Dynamically
SMDi = ① / ② > ③ is checked as ① > ② × ③, where
① the number of sleep-misses at the cache line i
② the average number of sleep-misses for all cache lines
③ Threshold
Example) The number of cache lines = 1024 (= 2^10), Threshold = 2 (= 2^1)
• ② = the total number of sleep-misses, 10-bit right shift
• ② × ③ = ②, 1-bit left shift
• ① > ② × ③? no ⇒ active mode, yes ⇒ AA mode
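
The shift-based comparison can be sketched in a few lines. This is an illustration assuming that both the number of cache lines and the threshold are powers of two, as in the slide's example. The function names and the string labels "AA" and "active" are hypothetical.

```python
def exceeds_threshold(line_sleep_misses, total_sleep_misses,
                      log2_num_lines, log2_threshold):
    """Check SMD_i > threshold without a divider or multiplier."""
    # (2) average = total / number of lines, as a right shift
    average = total_sleep_misses >> log2_num_lines
    # (2) x (3) = average * threshold, as a left shift
    scaled = average << log2_threshold
    # (1) > (2) x (3)?
    return line_sleep_misses > scaled


def mode_after_sleep_miss(line_sleep_misses, total_sleep_misses,
                          log2_num_lines=10, log2_threshold=1):
    """Pick the mode a sleeping line enters when it is accessed."""
    if exceeds_threshold(line_sleep_misses, total_sleep_misses,
                         log2_num_lines, log2_threshold):
        return "AA"
    return "active"
```

For 1024 lines and a threshold of 2, a total of 4096 sleep-misses gives an average of 4 and a comparison value of 8, so a line needs at least 9 sleep-misses to enter AA mode. One property of this sketch is that the right shift truncates: while the total is below 1024, the average is 0 and any line with a sleep-miss passes the comparison.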

Hardware Implementation
If a line is in sleep mode:
• Cache decay ⇒ the tag is in sleep mode
• AA approach ⇒ the tag is in active mode
The line is in sleep mode && tag match ⇒ a sleep-miss occurs!
[Figure: cache array with lines 0 to 1023. Per-line fields: tag, data, decay flag, always-active flag, sleep-miss counter, 2-bit local counter. Other labels: gated Vdd or 0V, voltage control, global counter (¼ decay interval), total sleep-miss counter, shifter, comparators (>?, =)]

Experimental Setup
• Evaluation model
  - Cache decay: conventional low-leakage cache
  - AA1: Cache decay with AA approach (threshold value = 1)
• Cache configuration
  - L1 data cache
  - Cache size: 32KB
  - Associativity: 2-way
  - Hit latency: 1 clock cycle
  - Miss penalty: 32 clock cycles
• Evaluation items
  - Performance improvement
  - Energy reduction

Results
[Figure: normalized energy and normalized execution time for Cache decay and AA1]
Figure annotations:
• Improves performance at the cost of increased energy consumption
• Higher performance and lower energy consumption

Conclusions
• We have proposed a high-performance, low-leakage cache: AA approach
  - Detects lines which frequently cause sleep-misses at run time
  - Performance is improved by operating such lines in AA mode
• Evaluation results
  - Higher performance and lower energy consumption
  - The best case (183.equake):
    - Performance degradation: 19% → 4.2%
    - Energy consumption: 20% reduction
• Future work
  - Compare AA approach with an adaptive decay technique (Kaxiras ISCA’00)

Thank you! ありがとう! (in Japanese)

Impact of Threshold
[Figure: normalized energy and normalized execution time for Cache decay, AA1, AA2 and AA4]
A smaller threshold ⇒ higher performance, because the number of AA lines increases!

Breakdown of Energy Consumption
With AA1, compared with cache decay:
・Leakage energy increases
・The accompanying dynamic energy decreases
 ‐Because the number of sleep-misses decreases
[Figure: breakdown of energy (J) for Cache decay and AA1]
The energy reduction is a trade-off between DEmemory and LEL1.

Performance Impact of Decay Interval
Cache decay: performance improves as the decay interval is extended.
AA approach: performance improves fully even with a short decay interval.

Energy Impact of Decay Interval
Cache decay: leakage energy increases as the decay interval is extended.
AA approach: the leakage reduction is larger than that of cache decay using a long decay interval.
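
Energy Model (1/3)
$$E_{total} = LE_{L1} + DE_{L1} + DE_{memory}$$
• $LE_{L1}$: leakage energy of the L1 cache
• $DE_{L1}$: dynamic energy of the L1 cache
• $DE_{memory}$: main memory access energy
$$LE_{L1} = \sum_{i=1}^{CC} \{ LE_{bit} \times N_{active}(i) \}$$
• $CC$: program execution time
• $LE_{bit}$: average leakage energy of a 1-bit SRAM cell in one clock cycle
• $N_{active}(i)$: number of SRAM bits in the active state at clock cycle $i$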
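
Energy Model (2/3)
$$DE_{L1} = DE_{AA} + DE_{decay} + DE_{conv}$$
• $DE_{AA}$: dynamic energy overhead of applying the always-active line approach
• $DE_{decay}$: dynamic energy overhead of applying the conventional low-leakage cache
• $DE_{conv}$: access energy of the conventional cache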
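
The energy model can be written down directly as code. The sketch below only adds up the terms named on these slides: the total is the L1 leakage energy plus the L1 dynamic energy plus the main memory access energy, and the L1 dynamic energy is the always-active overhead plus the conventional low-leakage overhead plus the conventional access energy. All function and parameter names are hypothetical, the program execution time is assumed to be counted in clock cycles, and the numbers in the example are invented placeholders, not measured values.

```python
def l1_leakage_energy(le_bit, n_active_per_cycle):
    """Sum LE_bit * N_active(i) over every clock cycle i of the run."""
    return sum(le_bit * n_active for n_active in n_active_per_cycle)


def l1_dynamic_energy(de_aa_overhead, de_low_leak_overhead, de_conventional):
    """L1 dynamic energy: the two overhead terms plus the baseline access energy."""
    return de_aa_overhead + de_low_leak_overhead + de_conventional


def total_energy(le_l1, de_l1, de_memory):
    """Total energy: L1 leakage + L1 dynamic + main memory access energy."""
    return le_l1 + de_l1 + de_memory


# Placeholder numbers in arbitrary units, for a 3-cycle run.
le_l1 = l1_leakage_energy(le_bit=2, n_active_per_cycle=[100, 80, 60])
de_l1 = l1_dynamic_energy(de_aa_overhead=5, de_low_leak_overhead=3,
                          de_conventional=50)
e_total = total_energy(le_l1, de_l1, de_memory=100)
```

Keeping more lines active raises the per-cycle active bit count and therefore the leakage term, while fewer sleep-misses lower the main memory term, which is the trade-off the energy breakdown slide describes.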

Energy Model (3/3)
[1] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge, “Drowsy Caches: Simple Techniques for Reducing Leakage Power,” Proc. of the 29th Int. Symp. on Computer Architecture, pp.148-157, May 2002.
[2] S. Kaxiras, Z. Hu, and M. Martonosi, “Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power,” Proc. of the 28th Int. Symp. on Computer Architecture, pp.240-251, June 2001.