Improving Multi-Core Performance Using Mixed-Cell Cache Architecture
Samira Khan*†, Alaa R. Alameldeen*, Chris Wilkerson*, Jaydeep Kulkarni* and Daniel A. Jiménez§
*Intel Labs, †Carnegie Mellon University, §Texas A&M University
Summary
• Problem:
  • Cache cells become unreliable at low voltage
  • Mixed-cell cache: Use some larger robust cells [Ghasemi 2011]
  • Smaller non-robust cells are turned off at low voltage
  • Capacity loss leads to performance loss
• Goal:
  • No capacity loss at low voltage to gain high performance
• Observation:
  • A clean line has a duplicate copy in the memory hierarchy
  • A modified line is the only existing copy
• Our Approach:
  • Protect a modified line in larger robust cells
  • Store a clean line in smaller non-robust cells
  • Fetch data from the lower level on an error in a clean line
• Significantly improves performance and reduces power compared to prior work
Outline • Summary • Background and Motivation • Mixed-Cell Cache Architecture • Methodology and Results • Conclusion
Background and Motivation
• Multi-core designs are power-limited
• Can activate more cores by lowering the voltage
[Figure: voltage scale; more active cores at low voltage]
Ensuring Resiliency at Lower Voltage
• Cache cells begin to fail at lower voltage
• Mixed-Cell Cache [Ghasemi 2011]
  • Some ways built with robust cells
  + Resilient to errors at low voltage
  - Area and power overhead
  • Only robust cells are operational at low voltage
• Cache capacity loss at lower voltage can degrade performance significantly
[Figure labels: Robust Cache, Mixed-Cell Cache, Non-robust, Error]
Effect of Cache Capacity Reduction in a 4-Core System • In our experiments, 75% reduction in cache capacity leads to 20% performance loss on average
Goal: Improve performance using the whole cache at low voltage
Outline • Summary • Background and Motivation • Mixed-Cell Cache Architecture • Methodology and Results • Conclusion
Our Mixed-Cell Architecture
• Observation:
  • A clean line has a duplicate copy in the memory hierarchy
    • On an error, can get the data from the duplicate copy
  • A modified line is the only copy in the system
    • Critical to keep the data error-free
• Idea:
  • Protect a modified line using larger robust cells
  • Store a clean line in smaller non-robust cells
  • Use parity/ECC to detect errors in clean lines
  • Fetch data from the lower level on an error in clean lines
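The detect-and-refetch idea for clean lines can be sketched as follows. This is a minimal illustration and not the authors' hardware design: it assumes a single even-parity bit per line, and `CleanLine`, `read_clean`, and the dictionary standing in for the lower level are hypothetical names introduced here.

```python
def parity(data) -> int:
    """Even-parity bit over all bits of a line (assumed detection scheme)."""
    bit = 0
    for byte in data:
        bit ^= bin(byte).count("1") & 1
    return bit


class CleanLine:
    """A clean line held in non-robust cells, with its stored parity bit."""

    def __init__(self, addr, data):
        self.addr = addr
        self.data = bytearray(data)
        self.parity = parity(data)


def read_clean(line, lower_level):
    """Return the line's data; on a parity mismatch, refetch the duplicate
    copy from the lower level (modeled here as a dict: addr -> bytes)."""
    if parity(line.data) != line.parity:
        fresh = lower_level[line.addr]
        line.data = bytearray(fresh)
        line.parity = parity(fresh)
    return bytes(line.data)
```

A single parity bit only detects an odd number of flipped bits, which is one reason the slide also lists ECC as an option.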
Our Mixed-Cell Architecture
• Use both robust and non-robust ways at low voltage
• A modified line is stored only in a robust way
• A clean line is stored only in a non-robust way
• Modify cache management techniques to ensure clean and modified lines are stored appropriately
[Figure: Mixed-cell (Disable) [Ghasemi 2011] compared with our design; robust ways hold modified lines, non-robust ways hold clean lines]
Mixed-Cell Architecture: Cache Miss
• Write miss: Allocate line in a robust way
• Read miss: Allocate line in a non-robust way
[Figure: timeline of write miss X, write miss Y, read miss A, and read miss B, each replacing an LRU line]
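The miss policy above can be sketched as a small model of one cache set. This is an illustrative sketch under stated assumptions: each group of ways keeps its own LRU order, and the `MixedSet` class and its method names are hypothetical, not taken from the slides.

```python
from collections import OrderedDict


class MixedSet:
    """One cache set split into robust and non-robust ways (toy model)."""

    def __init__(self, robust_ways=2, non_robust_ways=6):
        self.capacity = {"robust": robust_ways, "non_robust": non_robust_ways}
        # Each OrderedDict is kept in LRU-first order.
        self.ways = {"robust": OrderedDict(), "non_robust": OrderedDict()}

    def allocate_on_miss(self, tag, is_write):
        """Write misses go to a robust way, read misses to a non-robust way.
        Returns the chosen group and the evicted tag (or None)."""
        group = "robust" if is_write else "non_robust"
        lines = self.ways[group]
        victim = None
        if len(lines) == self.capacity[group]:
            victim, _ = lines.popitem(last=False)  # evict the LRU line
        lines[tag] = {"dirty": is_write}
        return group, victim
```

With the default 2 robust and 6 non-robust ways, a third write miss to the same set evicts the least recently used robust line, while read misses never displace modified data.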
Mixed-Cell Architecture: Cache Hit
• Read hit: No change
• Write hit:
  • Write hit in robust: No change
  • Write hit in non-robust: We propose three mechanisms
    • Writeback
    • Swap
    • Duplicate
[Figure: cache set holding lines J, K, E, L, F, G, H, I; example accesses: write hit G, read hit J, write hit E]
Write to a Non-Robust Line: Writeback
• Write it back to the next level of the memory hierarchy
• Make data clean in the non-robust cell
+ Simple
- An extra writeback at each write in a non-robust way
[Figure: write hit G in set J, K, E, L, F, G, H, I. The dirty block in a non-robust way is vulnerable, so G is written back; the block now contains clean data]
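A minimal sketch of the Writeback mechanism, assuming a line is a plain dictionary and the next level of the hierarchy is a dictionary keyed by tag. The function name and field names are hypothetical and serve only to illustrate the policy.

```python
def write_hit_writeback(line, new_data, next_level):
    """Handle a write hit under the Writeback mechanism.

    line: dict with keys "tag", "data", "dirty", "robust" (assumed layout).
    next_level: dict mapping tag -> data, standing in for the next level.
    """
    line["data"] = new_data
    if line["robust"]:
        # Robust cells can safely hold the only copy of modified data.
        line["dirty"] = True
    else:
        # Non-robust cells are vulnerable: push the new data down right away
        # so the cached copy is clean again.
        next_level[line["tag"]] = new_data
        line["dirty"] = False
    return line
```

Every write that hits a non-robust way costs one extra writeback, which matches the drawback listed on the slide.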
Write to a Non-Robust Line: Swap
• Swap modified line with the LRU robust line
• Write back the robust data to the next cache level
+ Increases write hits in robust cells
- Extra latency for swap
[Figure: write hit G in set J, K, E, L, F, G, H, I. E and G are swapped; E is now vulnerable, so E is written back; the block now contains clean data]
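The Swap mechanism can be sketched in the same toy style. The data layout is an assumption made for illustration: the robust ways are a list ordered LRU first and assumed to be non-empty, the non-robust ways are a dictionary keyed by tag, and `write_hit_swap` is a hypothetical name.

```python
def write_hit_swap(robust, non_robust, tag, new_data, next_level):
    """Handle a write hit on a non-robust line under the Swap mechanism.

    robust: list of line dicts, LRU first (assumed non-empty).
    non_robust: dict tag -> line dict.
    next_level: dict tag -> data, standing in for the next cache level.
    """
    written = non_robust.pop(tag)
    written.update(data=new_data, dirty=True)

    displaced = robust.pop(0)  # LRU robust line moves to the non-robust way
    if displaced["dirty"]:
        # The displaced line would be vulnerable in non-robust cells,
        # so write it back and keep only a clean copy.
        next_level[displaced["tag"]] = displaced["data"]
        displaced["dirty"] = False
    non_robust[displaced["tag"]] = displaced

    robust.append(written)  # the newly written line becomes MRU
```

After the swap, later writes to the same line hit in a robust way and need no further action, which is the benefit the slide attributes to this mechanism.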
Write to a Non-Robust Line: Duplicate
• Pair two non-robust ways
  • Static pairing: way <0,1>, <2,3>…
• Duplicate the data in the partner way
• On an error in one way, use data from the partner way
+ Simple, no extra writeback
- Capacity loss, extra latency for duplication
[Figure: write hit G in set J, K, E, L, F, G, H, I; G is duplicated in the partner way]
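A sketch of the Duplicate mechanism with static pairing. Flipping the lowest bit of the way index yields the pairs <0,1>, <2,3>, and so on. The `error` flag stands in for whatever detection the hardware provides, and all names here are hypothetical.

```python
def partner_way(way: int) -> int:
    """Static pairing <0,1>, <2,3>, ...: flip the lowest index bit."""
    return way ^ 1


def write_hit_duplicate(ways, way, data):
    """Store the modified data in the hit way and in its partner way.
    Whatever the partner way held before is overwritten (the capacity loss)."""
    for w in (way, partner_way(way)):
        ways[w] = {"data": data, "dirty": True, "error": False}


def read_duplicated(ways, way):
    """Read a duplicated line, falling back to the partner copy on an error."""
    line = ways[way]
    if line["error"]:
        line = ways[partner_way(way)]
    return line["data"]
```

The sketch assumes at most one of the two copies is faulty at a time; the slides do not say how a simultaneous error in both copies would be handled.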
Outline • Summary • Background and Motivation • Mixed-Cell Cache Design • Methodology and Results • Conclusion
Evaluation Methodology
• Simulator: CMP$im, a Pin-based x86 simulator [Jaleel 2008]
• Benchmarks: 20 4-core multi-programmed mixes from SPEC 2006
• Each cache has 2 robust ways
  • L1D: 32KB, 2 robust, 6 non-robust ways, 3 cycles
  • L2: 256KB, 2 robust, 6 non-robust ways, 10 cycles
  • L3: shared 4MB, 2 robust, 14 non-robust ways, 25 cycles
• Memory latency: 80 cycles
• Vmin: 590 mV, 825 MHz
Comparison Points
• Robust: Cache uses only robust cells
  • Smaller capacity: L1D 20KB, L2 160KB, L3 2.25MB
• Disable: Mixed-Cell Cache [Ghasemi 2011]
  • Only ¼ of the cache works at low voltage: L1D 8KB, L2 64KB, L3 1MB
• Ideal: Cache uses only non-robust cells
  • Larger capacity: L1D 40KB, L2 320KB, L3 4.5MB
  • Cannot work at low voltage
  • Can provide higher voltage to cache using a separate Vcc
    • Increases complexity
    • Adds latency to signals crossing voltage domains
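The Robust and Ideal capacities listed above are consistent with an equal-area comparison. The sketch below assumes a robust cell occupies twice the area of a non-robust cell; the slides do not state that ratio, and it is chosen here only because it reproduces the listed figures. The function name and constant are hypothetical.

```python
ROBUST_AREA = 2  # assumption: robust cell area relative to a non-robust cell


def iso_area_capacities(size_kb, robust_ways, non_robust_ways):
    """Capacity (KB) of an all-robust and an all-non-robust cache occupying
    the same area as the given mixed-cell cache."""
    way_kb = size_kb / (robust_ways + non_robust_ways)
    area_units = robust_ways * ROBUST_AREA + non_robust_ways
    robust_only_kb = way_kb * area_units / ROBUST_AREA
    ideal_kb = way_kb * area_units
    return robust_only_kb, ideal_kb


for name, size_kb, r, n in [("L1D", 32, 2, 6), ("L2", 256, 2, 6), ("L3", 4096, 2, 14)]:
    robust_kb, ideal_kb = iso_area_capacities(size_kb, r, n)
    print(f"{name}: robust-only {robust_kb:g} KB, ideal {ideal_kb:g} KB")
```

Under this assumption the L1D works out to 20 KB robust-only and 40 KB ideal, and the L3 to 2304 KB (2.25 MB) and 4608 KB (4.5 MB), matching the Robust and Ideal comparison points.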
4-Core Performance at Low Voltage
• Swap provides 17% speedup over Disable
• Swap performs within 2.6% of Ideal
Normalized Memory Bandwidth
• Duplicate increases memory bandwidth by only 3% compared to Ideal
[Chart annotations: 6.15, 21%, 28.5%, 3%]
Normalized LLC Static Power at Vmin (590mV)
• Swap and Duplicate reduce LLC static power by 10% compared to Ideal
[Chart annotations: 10%, 2.3X]
Normalized L1D Dynamic Power at Vmin (590mV)
• Duplicate reduces dynamic power by 50% compared to Disable
• Duplicate is within 30% of Ideal
[Chart annotations: 22%, 50%, 30%]
Conclusion
• Problem:
  • Cache cells become unreliable at low voltage
  • Mixed-cell cache: Use some larger robust cells
  • Smaller non-robust cells are turned off at low voltage
  • Capacity loss leads to performance loss
• Goal:
  • No capacity loss at low voltage to gain high performance
• Observation:
  • A clean line has a duplicate copy in the memory hierarchy
  • A modified line is the only existing copy
• Our Approach:
  • Protect a modified line in larger robust cells
  • Store a clean line in smaller non-robust cells
  • Fetch data from the lower level on an error in a clean line
• Improves performance by 17% and reduces L1D dynamic power by 50% compared to prior work