Zhongkai Chen 3/25/2010

Network Victim Cache: Leveraging Network-on-Chip for Managing Shared Caches in Chip Multiprocessors Zhongkai Chen 3/25/2010

Paper Information • Authors: Jinglei Wang; Yibo Xue; Haixia Wang; Dongsheng Wang, Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China • This paper appears in: Embedded and Multimedia Computing, 2009. EM-Com 2009. 4th International • Publication Date: 10-12 Dec. 2009

Outline • Introduction • Problems • Network on Chip (NoC) • Victim Cache • Network Victim Cache Design • Baseline Architecture • NVC Scheme • Performance Evaluation

Introduction • The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes the aggregate cache capacity and minimizes off-chip memory requests in Chip Multiprocessors (CMP). • Two important hurdles restrict the scalability of these chip multiprocessors: • the on-chip memory cost of the directory • the long L1 miss latencies

Introduction • Network on Chip (NoC) In an NoC system, modules such as processor cores, memories and specialized IP blocks exchange data using a network as a "public transportation" sub-system for the information traffic. An NoC is constructed from multiple point-to-point data links interconnected by routers, so that a message can be relayed from any source module to any destination module over several links, with the routers making the routing decisions.

Introduction • Victim Cache A victim cache is a cache used to hold blocks evicted from a CPU cache upon replacement. The victim cache lies between the main cache and its refill path. It is usually fully associative and is intended to reduce the number of conflict misses. Only a small fraction of a program's memory accesses require high associativity; the victim cache exploits this property by providing high associativity to only these accesses.
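The victim cache idea above can be illustrated with a minimal sketch. It is not taken from the paper: the class name `DirectMappedWithVictim`, the direct-mapped main cache, and the LRU replacement in the victim cache are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Toy model: direct-mapped main cache backed by a small, fully
    associative victim cache (LRU replacement is an assumption)."""

    def __init__(self, num_sets, victim_entries):
        self.num_sets = num_sets
        self.lines = [None] * num_sets      # one block address per set
        self.victim = OrderedDict()         # oldest entry first
        self.victim_entries = victim_entries

    def access(self, block):
        idx = block % self.num_sets
        if self.lines[idx] == block:
            return "hit"
        evicted = self.lines[idx]           # block displaced by this access
        self.lines[idx] = block
        if block in self.victim:
            del self.victim[block]          # block moves back to main cache
            result = "victim_hit"
        else:
            result = "miss"
        if evicted is not None:
            self.victim[evicted] = True     # keep the evicted block nearby
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)
        return result


# Blocks 0 and 4 map to the same set of a 4-set cache and keep
# evicting each other; the victim cache absorbs the conflict.
with_vc = DirectMappedWithVictim(num_sets=4, victim_entries=2)
trace = [with_vc.access(b) for b in (0, 4, 0, 4)]
```

With `victim_entries=0` the same access pattern misses every time, which is the conflict-miss behavior the victim cache is meant to reduce.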

Network Victim Cache Design • Baseline Architecture The tiled CMP is organized as a 2D array of replicated tiles, each with a core, a private L1 cache, an L2 cache slice, and a router that connects the tile to the network on chip. The L2 cache slices form a logically shared L2. L1 caches are kept coherent by a directory-based cache coherence protocol: L1 cache misses are sent to the corresponding home tile, which looks up the directory information and performs the actions needed to ensure coherence.
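To make the notion of a home tile concrete, here is a small sketch of one common way to map a block address to its home tile and to count the minimum number of mesh hops an L1 miss request travels. The slides do not state the mapping used in the paper; the 64-byte block size, the 4x4 mesh, the low-order-bit interleaving, and the helper names `home_tile`, `coords`, and `hops` are all assumptions.

```python
BLOCK_BYTES = 64    # assumed cache block size
MESH_WIDTH = 4      # assumed 4x4 tiled CMP
MESH_HEIGHT = 4


def home_tile(addr):
    """Interleave consecutive blocks across tiles (assumed mapping)."""
    block = addr // BLOCK_BYTES
    return block % (MESH_WIDTH * MESH_HEIGHT)


def coords(tile):
    """Row-major tile number -> (x, y) position in the mesh."""
    return tile % MESH_WIDTH, tile // MESH_WIDTH


def hops(src, dst):
    """Minimum router-to-router hops between two tiles of a 2D mesh."""
    sx, sy = coords(src)
    dx, dy = coords(dst)
    return abs(sx - dx) + abs(sy - dy)


# An L1 miss in tile 0 for an address whose home is tile 15 crosses the
# whole mesh; one whose home is the local tile needs no network hops.
far = hops(0, home_tile(15 * BLOCK_BYTES))
near = hops(0, home_tile(16 * BLOCK_BYTES))
```

Under this mapping most blocks have a remote home tile, which is why an L1 miss in the baseline design usually pays a multi-hop network round trip.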

Network Victim Cache Design • Baseline Router Architecture In the tiled CMP, the L1 cache and the L2 cache are attached to the router through a Network Interface Component (NIC). Routers are connected to one another through interfaces in four directions to form a 2D network on chip.

NVC Scheme • The Network Victim Cache (NVC) The difference from the baseline router architecture lies in the network interface component, to which a Victim Cache (VC) and a Directory Cache (DC) are added. • Directory information is removed from the L2 caches and stored in Directory Caches (DC) in the network interface components to save memory space. • The saved directory space is used as Victim Caches (VC), which capture and store evictions from the local L1 caches to reduce subsequent L1 miss latencies.

NVC Scheme • At the home tile, the DC captures the L1 miss request in the network interface component and looks up the directory information of the requested block. It fetches the data block from the local L2 cache and sends the reply back to the requestor. • If an L1 cache line is evicted because of a conflict or capacity miss, we attempt to keep a copy of the victim line in the VC to reduce subsequent access latency to the same line. [Figure labels: Miss Request, DC, L2 Cache, Fetched Data Block; L1 Cache, VC, Evicted by a conflict or capacity miss]

NVC Scheme • All L1 misses first check the VC as they flow through the network interface component, in case it holds a valid copy of the block. On a VC miss, the request continues to travel to the home tile. On a VC hit, the block is invalidated in the VC and moved into the L1 cache. [Figure labels: Miss Request, L1 Cache, VC, DC; Miss; Hit -> Invalidate, Move Back]
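The request flow described on the NVC Scheme slides can be sketched as a toy model of a single tile. This is only an illustration under simplifying assumptions, not the paper's implementation: the L1 is modeled as fully associative with LRU replacement, the VC uses LRU replacement, coherence invalidations from other tiles are ignored, and the class and method names (`Tile`, `load`) are hypothetical.

```python
from collections import OrderedDict


class Tile:
    """Toy model of one tile: private L1 plus a victim cache (VC) in the
    network interface. Replacement policies here are assumptions."""

    def __init__(self, l1_entries, vc_entries):
        self.l1 = OrderedDict()
        self.vc = OrderedDict()
        self.l1_entries = l1_entries
        self.vc_entries = vc_entries
        self.home_requests = 0          # misses that leave the tile

    def _fill_l1(self, block):
        if len(self.l1) >= self.l1_entries:
            victim, _ = self.l1.popitem(last=False)   # evict LRU line
            if self.vc_entries > 0:
                self.vc[victim] = True                # keep a copy in the VC
                if len(self.vc) > self.vc_entries:
                    self.vc.popitem(last=False)
        self.l1[block] = True

    def load(self, block):
        if block in self.l1:
            self.l1.move_to_end(block)
            return "l1_hit"
        if block in self.vc:            # L1 miss checks the VC first
            del self.vc[block]          # hit: invalidate in the VC ...
            self._fill_l1(block)        # ... and move the block into L1
            return "vc_hit"
        self.home_requests += 1         # VC miss: continue to the home tile
        self._fill_l1(block)
        return "home_request"


tile = Tile(l1_entries=2, vc_entries=2)
results = [tile.load(b) for b in (1, 2, 3, 1)]
```

In this trace the last access to block 1 is served by the local VC, so it never enters the network. Setting `vc_entries=0` approximates the baseline, where the same access would travel to the home tile.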

Performance Evaluation • Simulation Environment The GEMS simulator is used to evaluate the performance of NVC against the baseline CMP. The VC has as many entries as the L1 cache, and the DC has twice as many entries as the L1 cache. Eight workloads from the SPLASH-2 and PARSEC benchmark suites run on the Solaris 10 operating system. [Table: detailed system parameters]

Performance Evaluation • Impact on L1 cache miss latency NVC decreases L1 cache miss latencies by 21-49%, and by 31% on average. For the water benchmark, the small working set allows most L1 misses to be satisfied in the local victim cache, which reduces its L1 miss latency by 49%. [Figure: normalized average L1 cache miss latency]

Performance Evaluation • Impact on execution time NVC reduces the execution time of each benchmark by 10-34%. The execution times of lu and water are reduced by 34%. For the water benchmark, the small working set allows most L1 misses to be satisfied in the local victim cache, which leads to better performance. NVC improves the performance of the CMP by 23% on average. [Figure: execution time]

Performance Evaluation • On-Chip Network Traffic Reduction An additional benefit of NVC is the reduction of on-chip coherence traffic. NVC reduces the number of coherence messages of each benchmark by 16-48%, and by 28% on average. NVC eliminates some inter-tile messages when accesses can be resolved in local victim caches.

Performance Evaluation • Scalability Compared to the conventional shared L2 cache design, NVC increases on-chip storage by only 0.18%. As the number of cores increases, the directory storage saved from the L2 cache increases significantly, while the storage overhead of the proposed scheme increases far more slowly. NVC therefore scales much better than the conventional shared L2 cache design as the number of cores increases.
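A rough calculation shows why directory storage in the L2 grows with the core count. The slides do not give the directory format, so this sketch assumes a full-map bit-vector directory (one presence bit per core for every L2 block) and 64-byte blocks; the function name `directory_overhead` is hypothetical, and the numbers below are not the paper's results.

```python
def directory_overhead(cores, block_bytes=64):
    """Directory bits per L2 block divided by data bits per block,
    assuming one presence bit per core (full-map bit vector)."""
    return cores / (block_bytes * 8)


# Overhead relative to L2 data storage for growing core counts.
overheads = {n: directory_overhead(n) for n in (16, 64, 256)}
```

Under these assumptions the in-L2 directory costs about 3% of the data storage at 16 cores, 12.5% at 64 cores, and 50% at 256 cores, which illustrates why moving directory state into small directory caches frees more space as the chip scales.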

Thank you
