Regularities Considered Harmful: Forcing Randomness to Memory Accesses to Reduce Row Buffer Conflicts for Multi-Core, Multi-Bank Systems • ACM ASPLOS’13 • Heekwon Park, Computer Science Department, University of Pittsburgh • Embedded Lab., Park Yeongseong
Contents • Introduction • Background • Regularity Considered Harmful • Design and Implementation • Performance Evaluation • Conclusions • Q&A
Introduction • Recent computer architectures are multi-core • They carry a vast amount of main memory • Internal policies and mechanisms need to be re-examined • This work rethinks the memory allocation issue
Background • Problem • Row buffer conflict • Approach • Memory container • Randomize memory access < Conceptual memory organization >
Background • Row-buffer Conflict • Precharging • Activating operation • Delay • Energy Consumption < Row-buffer hit and conflict overhead >
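The hit and conflict cases on this slide can be sketched as a toy model of a single bank with one row buffer. The cycle counts (`T_CAS`, `T_RCD`, `T_RP`) are placeholder values assumed for illustration; they are not taken from the paper or from any DRAM datasheet.

```python
# Toy model of one DRAM bank with a single row buffer.
# All timing values are illustrative assumptions, not measured numbers.
T_CAS = 10  # column access once the wanted row is already open
T_RCD = 10  # activate: load a row into the row buffer
T_RP = 10   # precharge: close the open row before switching rows


class Bank:
    def __init__(self):
        self.open_row = None  # no row is buffered initially

    def access(self, row):
        """Return the assumed latency (in cycles) of accessing `row`."""
        if self.open_row == row:
            return T_CAS                    # row-buffer hit
        if self.open_row is None:
            cost = T_RCD + T_CAS            # idle bank: activate only
        else:
            cost = T_RP + T_RCD + T_CAS     # conflict: precharge + activate
        self.open_row = row
        return cost


def total_latency(rows):
    """Sum the latency of a sequence of row accesses on a fresh bank."""
    bank = Bank()
    return sum(bank.access(r) for r in rows)


same_row = total_latency([0, 0, 0, 0])    # one activation, then hits
ping_pong = total_latency([0, 1, 0, 1])   # every later access conflicts
```

In this model the alternating pattern pays the precharge and activate cost on every access after the first, which is the delay and energy overhead the slide attributes to a row-buffer conflict.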
Background < Conflict does not occur > < Conflict occurs> • Kernel-level memory allocator • Mapping between virtual pages and physical page frames • Memory controller • Banks
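A minimal sketch of how the allocator's choice of page frames decides whether a conflict can occur. The mapping below (frames interleaved across banks, 4 KiB frames, 8 banks) is a hypothetical layout chosen for illustration; real memory controllers use their own, often undocumented, mappings.

```python
PAGE_SIZE = 4096   # assumed page-frame size in bytes
NUM_BANKS = 8      # assumed number of banks (hypothetical)


def phys_to_frame(phys_addr):
    """Physical address -> page frame number."""
    return phys_addr // PAGE_SIZE


def frame_to_bank_row(frame_number):
    """Hypothetical mapping: consecutive frames are interleaved across banks."""
    return frame_number % NUM_BANKS, frame_number // NUM_BANKS


def conflicts(frame_a, frame_b):
    """Two frames can conflict when they share a bank but sit in different rows."""
    bank_a, row_a = frame_to_bank_row(frame_a)
    bank_b, row_b = frame_to_bank_row(frame_b)
    return bank_a == bank_b and row_a != row_b
```

Under this assumed layout, frames 0 and 8 land in the same bank and can evict each other's row, while frames 0 and 1 land in different banks and can be served in parallel.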
Memory Organization Analysis • CPU cache mode • Uncacheable • Access variables numerous times • The two variables are mutually dependent
Memory Organization Analysis • Figure (d) ranges from 0 to 2,000,000 (roughly 128MB size) • Figure (c) zooms in on the 590,000 ~ 640,000 portion of Figure (d) • Figure (b) zooms in on a portion of iterations of Figure (c) • Figure (a) zooms in on a portion of iterations of Figure (b) < Analysis result>
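The analysis above measures access time to learn which addresses share a bank. The sketch below imitates that idea in pure simulation: `hidden_bank` stands in for the unknown hardware mapping, and `probe_latency` returns made-up "slow" and "fast" values instead of real timings. Both functions and all numbers are assumptions for illustration, not the authors' measurement code.

```python
def hidden_bank(frame):
    """Stand-in for the unknown hardware mapping (4 banks, arbitrary hash)."""
    return (frame ^ (frame >> 3)) % 4


def probe_latency(frame_a, frame_b):
    """Simulated time of alternating accesses to two distinct frames.

    Frames in the same bank keep evicting each other's row (slow);
    frames in different banks do not interfere (fast).
    """
    same_bank = hidden_bank(frame_a) == hidden_bank(frame_b)
    return 110 if same_bank and frame_a != frame_b else 50


def group_by_bank(frames, threshold=80):
    """Cluster frames into banks using only the probe timings."""
    groups = []
    for frame in frames:
        for group in groups:
            if probe_latency(group[0], frame) > threshold:
                group.append(frame)   # slow against this group's representative
                break
        else:
            groups.append([frame])    # fast against every group: new bank
    return groups


bank_groups = group_by_bank(range(32))
```

The grouping never reads `hidden_bank` directly; it recovers the bank structure from latency alone, which is the role the uncacheable, mutually dependent accesses play on real hardware.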
Regularity Considered Harmful • Modified Algorithm • Set the two variables so that they are located in the same cache line • Different starting physical addresses • Average elapsed time • 2052 μsec < Sequential access pattern >
Regularity Considered Harmful • Average elapsed time • 1925 μsec • “1/total number of banks”. < Random access pattern >
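The "1/total number of banks" figure can be illustrated with a small simulation. The sketch assumes 16 banks, a frame-to-bank mapping of `frame % NUM_BANKS`, and two cores that issue accesses in lockstep; these are simplifications made for the example, not the setup of the experiment reported above.

```python
import random

NUM_BANKS = 16  # assumed bank count (hypothetical)


def sequential_conflict_rate(start_a, start_b, steps):
    """Two cores walk consecutive frames in lockstep from different starts."""
    same_bank = sum(
        (start_a + i) % NUM_BANKS == (start_b + i) % NUM_BANKS
        for i in range(steps)
    )
    return same_bank / steps


def random_conflict_rate(steps, rng):
    """Each core touches an independently, uniformly chosen bank per step."""
    same_bank = sum(
        rng.randrange(NUM_BANKS) == rng.randrange(NUM_BANKS)
        for _ in range(steps)
    )
    return same_bank / steps


aligned = sequential_conflict_rate(0, 32, 1000)    # starts congruent mod 16
shifted = sequential_conflict_rate(0, 33, 1000)    # starts not congruent
randomized = random_conflict_rate(100_000, random.Random(1))
```

In this simplified model, regular access is all-or-nothing: the two cores either collide on every step or never, depending on how their starting addresses happen to align. Randomized access removes that dependence and settles near 1/`NUM_BANKS`.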
Design and Implementation < Memory container design > • The minimum memory unit is the page frame
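A minimal sketch of the memory container idea, in which each core is given its own set of banks and allocates page frames only from them. The class name, the even split of banks among cores, and the `frame % NUM_BANKS` mapping are assumptions made for illustration; the actual kernel implementation is not shown in these slides.

```python
NUM_BANKS = 8  # assumed bank count (hypothetical)


def bank_of(frame):
    """Hypothetical frame-to-bank mapping."""
    return frame % NUM_BANKS


class MemoryContainer:
    """Page frames that live in the banks dedicated to one core."""

    def __init__(self, banks, frames):
        self.banks = set(banks)
        self.free = [f for f in frames if bank_of(f) in self.banks]

    def alloc_frame(self):
        """Hand out one page frame (the minimum unit); IndexError if empty."""
        return self.free.pop(0)


def build_containers(num_cores, frames):
    """Split the banks evenly so that no two cores share a bank."""
    per_core = NUM_BANKS // num_cores
    return [
        MemoryContainer(range(c * per_core, (c + 1) * per_core), frames)
        for c in range(num_cores)
    ]


containers = build_containers(4, range(64))
```

With four cores and eight banks, each container owns two banks, so a core keeps some bank-level parallelism for itself while never touching a bank used by another core.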
Design and Implementation < Comparison between buddy and randomized algorithm> • Individual page frame management • Downward search
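A sketch of a randomized page-frame allocator in the spirit of this slide. It tracks each page frame individually and, as one possible reading of "downward search", picks a random frame and scans toward lower frame numbers (wrapping around) until it finds a free one. The class and method names are hypothetical, and the search rule is an assumption rather than the authors' exact algorithm.

```python
import random


class RandomizedAllocator:
    """Tracks every page frame individually instead of in buddy blocks."""

    def __init__(self, num_frames, rng=None):
        self.free = [True] * num_frames
        self.rng = rng if rng is not None else random.Random()

    def alloc(self):
        """Pick a random frame, then search downward for the first free one."""
        n = len(self.free)
        start = self.rng.randrange(n)
        for step in range(n):
            frame = (start - step) % n   # move to lower frames, wrapping at 0
            if self.free[frame]:
                self.free[frame] = False
                return frame
        raise MemoryError("no free page frames")

    def release(self, frame):
        """Return a frame to the free pool."""
        self.free[frame] = True
```

A buddy allocator tends to hand out neighbouring frames to consecutive requests, whereas this sketch scatters them, so successive allocations are unlikely to follow a regular bank pattern.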
Performance Evaluation • Experiment Environment • IBM x3650 M2 Server • Intel Xeon X5570 quad-core processors • 32GB DDR3 Memory • 450GB SAS Disk 8 • Linux kernel version 2.6.32
Performance Evaluation • Benchmark categories • Group 1: Memory-intensive benchmarks • Stream, Sysbench-memory, Ramspeed • Group 2: CPU- or I/O-intensive benchmarks • Kernel Compile, Dbench, Unixbench • Group 3: Benchmarks representing diverse application domains • PARSEC
Performance Evaluation < Memory intensive benchmark results > < CPU or I/O intensive benchmark results >
Performance Evaluation < PARSEC benchmark result >
Conclusions • Kernel-level memory allocator • Multi-core, multi-bank systems • Memory container • Dedicate multiple banks to a core • Maximize memory parallelism • Randomizing memory allocation algorithm • Reduce accesses to the same bank
References • http://people.cs.pitt.edu/~parkhk/publications.html • Performance Analysis by Memory Reference Pattern on Multi-Core, Multi-Bank Systems, Master's thesis, Lee Sang-yeop (이상엽)