A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems
Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu
Kilmo Choi, rlfah926@naver.com
Contents
• Background and Motivation
• Bank-Level Partition Mechanism (BPM)
• Results
• Conclusion
• Reference
Background and Motivation
• Memory bank: a set of memory locations with the same access speed
• Multicore platform: multiple banks can serve memory requests independently and concurrently (bank-level parallelism)
Background and Motivation
• Row buffer conflict
• Causes performance degradation (throughput slowdown and unfairness)
• Example: the row buffer hit rate decreases from over 60% with 1 core to 35% with 16 cores
[Figure: row-buffer hit vs. row-buffer conflict. Hit: Core 0 accesses data in the same page, so the R/W is served from the open row. Conflict: Core 0 accesses data in Row 1 and Core 1 accesses data in Row 3 of the same bank, so each R/W needs a precharge operation followed by an activate operation.]
Bank-Level Partition Mechanism (BPM)
• Numerous new memory scheduling algorithms have been proposed to address the interference problem
• However, these algorithms usually employ complex scheduling logic and require hardware modifications to memory controllers
• Overview of BPM
• The OS memory management system uses a page-coloring mechanism to partition banks into several groups and maps each thread (process) to a specific bank group
• Address mapping policy
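The page-coloring idea in the overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the kernel code used by BPM: the 4 KiB page size, the bank-bit positions in `BANK_BITS`, and the helper names `bank_color`, `partition_colors`, and `allocate_page` are all hypothetical.

```python
PAGE_SHIFT = 12                    # assumption: 4 KiB pages
BANK_BITS = (13, 14, 15, 16, 17)   # hypothetical bank-bit positions in the physical address


def bank_color(phys_addr):
    """Pack the bank bits of a physical address into a small integer color."""
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((phys_addr >> bit) & 1) << i
    return color


def partition_colors(num_threads):
    """Split all bank colors into equal, disjoint groups, one group per thread."""
    num_colors = 1 << len(BANK_BITS)
    per_thread = num_colors // num_threads
    return {t: set(range(t * per_thread, (t + 1) * per_thread))
            for t in range(num_threads)}


def allocate_page(free_frames, allowed_colors):
    """Remove and return the first free frame whose bank color is allowed."""
    for idx, frame in enumerate(free_frames):
        if bank_color(frame << PAGE_SHIFT) in allowed_colors:
            return free_frames.pop(idx)
    return None  # no free frame left in this thread's bank group


groups = partition_colors(4)        # 4 threads, 8 colors each
free_frames = list(range(256))      # toy free list of physical frame numbers
frame = allocate_page(free_frames, groups[1])
```

Because every bank bit in this sketch lies above the page offset, a frame's color is fixed by its frame number, so the allocator can keep each thread's pages inside its own bank group simply by filtering the free list.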
Bank-Level Partition Mechanism (BPM)
[Figure: four cores and eight banks; each bank has its own row buffer and holds Rows 0 to 3.]
Bank-Level Partition Mechanism (BPM)
• Advantages
• Row buffer conflicts ↓, row buffer hits ↑
• BPM is an entirely software approach, which makes it flexible
• It is easier for the OS to monitor a thread's behavior than for hardware
• Bank-level conflicts can be fully eliminated by exclusively mapping a thread's data to specific banks
• How much does the number of available banks affect a thread's performance?
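A toy simulation can make the first and fourth advantages concrete. The sketch below is an assumption-laden model, not a measurement: each bank keeps one open row, threads are picked at random, a thread stays in its current row with 90% probability, and rows are tagged with the thread id so that two threads never share a row. The function name `row_hit_rate` and all parameters are hypothetical.

```python
import random


def row_hit_rate(threads, banks_of, accesses=20000, seed=0):
    """Fraction of accesses served from an already open row buffer."""
    rng = random.Random(seed)
    open_row = {}                              # bank -> row currently in its row buffer
    current = {t: None for t in range(threads)}  # each thread's current (bank, row)
    hits = 0
    for _ in range(accesses):
        t = rng.randrange(threads)
        # Move to a new row occasionally; otherwise keep streaming in the same row.
        if current[t] is None or rng.random() < 0.1:
            current[t] = (rng.choice(banks_of[t]), (t, rng.randrange(1024)))
        bank, row = current[t]
        if open_row.get(bank) == row:
            hits += 1                          # row-buffer hit
        else:
            open_row[bank] = row               # miss or conflict: open the new row
    return hits / accesses


# All four threads share eight banks vs. each thread owning two banks.
shared = row_hit_rate(4, {t: list(range(8)) for t in range(4)})
partitioned = row_hit_rate(4, {t: [2 * t, 2 * t + 1] for t in range(4)})
```

In the partitioned case the only misses come from a thread changing its own row, because no other thread can close a row in its banks. The model ignores the bank-level parallelism a thread gives up by owning fewer banks, which is exactly the trade-off raised in the last bullet.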
Bank-Level Partition Mechanism (BPM)
• Discover bank bits by a software method (algorithm), using uncached accesses
• Single loop (FOR x): bits whose accesses show higher latency (row miss) go to Row{ }; the bits that are left go to Remain{ }
• Nested loop (FOR y {FOR x}): bits whose accesses still show higher latency (row miss) go to Column{ }; the bits that are left are mapped to different banks and go to BANK{ }
[Figure: access pairs over Rows 0 to 3 illustrating a row hit, a row miss, and accesses mapped to different banks.]
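The two-pass classification can be tried out against a simulated memory. The sketch below is an interpretation of the slide, not the authors' code: it assumes a 16-bit address with a hidden, hypothetical layout (`TRUE_BANK_BITS`, `TRUE_ROW_BITS`), no XOR-based bank hashing, and a latency model in arbitrary units where only "same bank, different row" is slow. On real hardware the latency would come from timing alternating uncached accesses.

```python
ADDRESS_BITS = 16
TRUE_BANK_BITS = {8, 9, 10}           # hypothetical layout, hidden from the probe
TRUE_ROW_BITS = set(range(11, 16))    # hypothetical layout, hidden from the probe


def _field(addr, bits):
    """Extract the given bit positions of addr as a tuple."""
    return tuple((addr >> b) & 1 for b in sorted(bits))


def alternating_latency(a, b):
    """Simulated latency of alternating uncached accesses to addresses a and b."""
    same_bank = _field(a, TRUE_BANK_BITS) == _field(b, TRUE_BANK_BITS)
    same_row = _field(a, TRUE_ROW_BITS) == _field(b, TRUE_ROW_BITS)
    # Only a row-buffer conflict (same bank, different row) is slow.
    return 100 if (same_bank and not same_row) else 40


def discover_bits(latency, threshold=70):
    """Classify address bits into row, column, and bank sets from latencies."""
    row, remain = set(), set()
    # Pass 1: flip one bit; higher latency means a row miss, so it is a row bit.
    for x in range(ADDRESS_BITS):
        if latency(0, 1 << x) > threshold:
            row.add(x)
        else:
            remain.add(x)
    # Pass 2: flip a known row bit together with each remaining bit.
    anchor = min(row)
    column, bank = set(), set()
    for y in remain:
        if latency(0, (1 << anchor) | (1 << y)) > threshold:
            column.add(y)   # still a conflict: same bank, so y is a column bit
        else:
            bank.add(y)     # conflict disappeared: y moved the access to another bank
    return row, column, bank


row_bits, column_bits, bank_bits = discover_bits(alternating_latency)
```

The second pass is needed because flipping a column bit and flipping a bank bit both look fast in the first pass; pairing each candidate with a known row bit separates the two cases.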
Results
• Environment
• 4 cores, 2.8 GHz Intel Core i7-860 processor, 8 GB DDR3 main memory with 64 banks, 5 bank bits
• CentOS Linux 5.4 with kernel 2.6.32.15
• SPEC CPU2006 benchmarks
Results
• Overall system performance
Results
• Page-Policy and Power
Results
• BPM vs. Cache-Partition-Only
• The correlation between BPM improvements and per-core bandwidth
Conclusion
• BPM is a new approach that eliminates interference between threads and improves overall system performance
• BPM achieves this goal by assigning different groups of banks to different threads, which eliminates inter-thread bank-level interference
• This reduces row buffer misses as well as the energy consumption of the memory system
Reference
• J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14, 2008.
• Junghoon Kim, Junghan Kim, and Youngik Eom. A Page Coloring Scheme through Page Cache Separation for Improving Cache Performance. In NIPA-2010.
• Dimitris Kaseridis, Jeffrey Stuecheli, and Lizy Kurian John. Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era. In MICRO 44, 2011.
Appendix: Page Coloring
• Physically indexed caches are divided into multiple regions (colors).
• All cache lines in a physical page are cached in one of those regions (colors).
• The OS can control the page color of a virtual page through address mapping (by selecting a physical page with a specific value in its page color bits).
[Figure: a virtual address (virtual page number, page offset) goes through address translation, which is under OS control, to a physical address (physical page number, page offset). Viewed as a cache address for the physically indexed cache, the same address splits into cache tag, set index, and block offset; the page color bits are the set index bits that fall within the physical page number.]
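As a small worked example, the page color of a physical address can be computed as follows. The cache geometry here (4 KiB pages, 64-byte lines, 4096 sets) is a hypothetical choice for illustration, all sizes are assumed to be powers of two, and `page_color` is not a real OS function.

```python
def page_color(phys_addr, page_size=4096, line_size=64, num_sets=4096):
    """Return the page color: the set-index bits that lie above the page offset."""
    page_shift = page_size.bit_length() - 1            # 12 for 4 KiB pages
    index_top = (line_size.bit_length() - 1) + (num_sets.bit_length() - 1)
    color_bits = max(0, index_top - page_shift)        # 6 bits -> 64 colors here
    return (phys_addr >> page_shift) & ((1 << color_bits) - 1)


first = page_color(0x5000)    # frame 0x05
last = page_color(0x5FFF)     # last byte of the same frame
other = page_color(0x45000)   # frame 0x45, which is 64 frames later
```

Every address inside one page yields the same color, and frames that are a multiple of the color count apart share a color, which is why the OS can choose a page's cache region just by choosing its physical frame.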
Appendix: Page Coloring
• Physical pages are grouped
[Figure: physical pages 1, 2, 3, 4, … and i, i+1, i+2, … are grouped into regions of the physically indexed cache, with separate groups for Process 1 and Process 2.]