“NAHALAL: Cache Organization for Chip Multiprocessors”
New LSU Policy
By: Ido Shayevitz and Yoav Shargil
Supervisor: Zvika Guz
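NAHALAL ARCHITECTURE
The NAHALAL architecture defines the organization of the memory banks of the L2 cache. Each processor has a private backyard bank, and all processors share a small public bank. The architecture is based on the hot shared line phenomenon: a small set of lines shared by many processors receives a large fraction of the accesses.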
LSU Improvement
• Placement Policy
• Replacement Policy from Private Bank: LRU
• Replacement Policy from Public Bank: LSU (replacing LRU)
The NAHALAL LSU policy selects the Least Shared Used line to evict from the public bank.
LSU Implementation
• A shift register with N cells for each line.
• Each cell in the shift register holds a CPU number.
• On an eviction by CPUi: for each shift register, XOR each cell with the ID of CPUi. The line whose shift register produces 0 is the chosen one. If none produces 0, fall back to regular LRU.
• To reduce memory overhead, define N = 4. Therefore 2^14 × 4 × 3 = 0.1875 MB, i.e. 18.75% memory overhead.
• A simple, fast algorithm in hardware.
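The selection rule above can be sketched in software as follows. This is a minimal illustration, not the authors' hardware design. It assumes that "the XOR produces 0" means every cell of a line's shift register matches the evicting CPU, that only full registers qualify, and that ties are broken in LRU order. The names `record_access`, `lsu_victim`, `histories` and `lru_order` are hypothetical.

```python
from typing import List, Sequence

N_CELLS = 4  # shift-register length per line (N = 4 in the slides)


def record_access(history: List[int], cpu_id: int) -> None:
    """Shift the accessing CPU's ID into the register, dropping the oldest cell."""
    history.insert(0, cpu_id)
    del history[N_CELLS:]


def lsu_victim(histories: Sequence[Sequence[int]],
               lru_order: Sequence[int],
               cpu_id: int) -> int:
    """Pick the line to evict from the public bank on behalf of cpu_id.

    histories[i] is the shift register of line i.
    lru_order lists line indices from least to most recently used.
    """
    for line in lru_order:
        cells = histories[line]
        # A cell XOR cpu_id is 0 exactly when the cell holds cpu_id.
        # Assumption: a line qualifies only if all N cells match.
        if len(cells) == N_CELLS and all((cell ^ cpu_id) == 0 for cell in cells):
            return line
    # No register matched: fall back to regular LRU.
    return lru_order[0]
```

With registers `[[0, 1, 2, 3], [2, 2, 2, 2], [1, 1, 1, 1]]`, an eviction by CPU 2 picks line 1, since only CPU 2 touched it recently. An eviction by CPU 3 finds no match and falls back to the LRU line.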
Simulation Structure in Simics
Using a Python script we defined:
Writing Benchmarks
Benchmarks are written in the simulated target console:
Writing Benchmarks
• Threads are created with the pthread library.
• Each thread is bound to a CPU using the sched library.
• Parallel code is written in the benchmark.
• OS code and pthread code also contribute parallel code.
• Each benchmark is run twice: first without LSU, then with LSU.
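The structure of such a benchmark can be illustrated as below. The slides describe benchmarks built on pthread and sched inside the simulated target; this is a Python analogue of that structure, not the original benchmark code. The function name `run_pinned_workers` and the shared counter are hypothetical. Pinning uses `os.sched_setaffinity`, which exists only on Linux, so the sketch skips pinning elsewhere. CPython threads interleave rather than run fully in parallel, so this shows the thread-per-CPU layout and the shared data, not real cache behaviour.

```python
import os
import threading


def run_pinned_workers(n_workers: int, iterations: int) -> int:
    """Start n_workers threads, pin each to one CPU, and update shared data."""
    counter = {"value": 0}   # shared line touched by every thread
    lock = threading.Lock()
    can_pin = hasattr(os, "sched_setaffinity")
    cpus = sorted(os.sched_getaffinity(0)) if can_pin else []

    def worker(index: int) -> None:
        if can_pin and cpus:
            try:
                # On Linux, pid 0 refers to the calling thread.
                os.sched_setaffinity(0, {cpus[index % len(cpus)]})
            except OSError:
                pass  # pinning not permitted here; run unpinned
        for _ in range(iterations):
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

Calling `run_pinned_workers(4, 1000)` returns 4000 once every thread has finished its updates to the shared counter.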
Collecting Statistics
Cache statistics: l2c
-----------------
Total number of transactions: 610349
Total memory stall time: 31402835
Total memory hit stall time: 28251635
Device data reads (DMA): 0
Device data writes (DMA): 0
Uncacheable data reads: 17
Uncacheable data writes: 30738
Uncacheable instruction fetches: 0
Data read transactions: 403488
Total read stall time: 17488735
Total read hit stall time: 14383135
Data read remote hits: 0
Data read misses: 10352
Data read hit ratio: 97.43%
Instruction fetch transactions: 0
Instruction fetch misses: 0
Data write transactions: 176106
Total write stall time: 4687600
Total write hit stall time: 4687600
Data write remote hits: 0
Data write misses: 0
Data write hit ratio: 100.00%
Copy back transactions: 0
Number of replacments in the middle (NAHALAL): 557
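The derived metrics can be computed from such a dump with a short script. The sketch below is an illustration rather than the authors' tooling. It assumes that average stall time per transaction is total memory stall time divided by total transactions, and that the replacement rate is replacements in the middle divided by total transactions. `parse_stats` and `derive_metrics` are hypothetical names, and the sample text copies a few counters from the dump above, with the key spelled as in the output.

```python
import re

SAMPLE = """\
Total number of transactions: 610349
Total memory stall time: 31402835
Data read transactions: 403488
Data read misses: 10352
Number of replacments in the middle (NAHALAL): 557
"""


def parse_stats(text: str) -> dict:
    """Collect every 'name: integer' line; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def derive_metrics(stats: dict) -> dict:
    """Compute the per-transaction figures (assumed definitions, see lead-in)."""
    transactions = stats["Total number of transactions"]
    replacements = stats["Number of replacments in the middle (NAHALAL)"]
    return {
        "avg_stall_per_transaction":
            stats["Total memory stall time"] / transactions,
        "middle_replacement_pct": 100.0 * replacements / transactions,
        "read_hit_ratio_pct":
            100.0 * (1 - stats["Data read misses"] / stats["Data read transactions"]),
    }


metrics = derive_metrics(parse_stats(SAMPLE))
```

For this dump the replacement rate comes to about 0.09% of transactions, and the recomputed read hit ratio agrees with the reported 97.43%.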
Results
1. Improvement of 54% in average stall time per transaction.
2. Improvement of 61% in average stall time per transaction.
3. Without LSU, 8.375% of the transactions cause a replacement in the middle; with LSU, only 0.09%. Improvement of ∆ = 8.28 percentage points.
4. Without LSU, 8.75% of the transactions cause a replacement in the middle; with LSU, only 0.02%. Improvement of ∆ = 8.73 percentage points.
Conclusions
• The LSU policy significantly improves the average stall time per transaction. Therefore, the LSU policy implemented in the NAHALAL architecture significantly reduces the number of cycles for a benchmark.
• The LSU policy significantly reduces the number of replacements in the middle. Therefore, the LSU policy implemented in the NAHALAL architecture better keeps the hot shared lines in the public bank.
• In our implementation, LRU is activated if LSU does not find a line. Therefore, the LSU policy as we implemented it is always preferable to LRU.