Design Issues of a Cooperative Cache with no Coherence Problems Toni Cortes Sergi Girona Jesús Labarta
Cache Coherency Problem • Cache Coherency • To ensure that each cache has a copy of the newest version when it needs it. • The cache coherency problem arises when replication is allowed and the system maintains several copies of the same data.
Goals • To find a mechanism to allow good cooperation and still keep the data coherent. • Balance the load between the nodes to make the cooperative cache efficient. • Fault tolerance • If a single node fails, the whole system does not fail.
Assumptions & Terms • Each node runs a micro-kernel based operating system instead of a monolithic one. • Functions not offered by the kernel are implemented by out-of-kernel servers. • Each node has its own memory. • Fast memory copy – nodes can copy data between their memories using memory_copy. • Remote copy – data copied from one node to another. • Local copy – data copied within the same node.
Terms • Hits • Local hit • Requested file block found in the same node. • Remote hit • Requested file block found in another node. • Global hit • Requested file block is found in both the requesting node and remote nodes. • Misses • Miss on dirty • The file block that will be replaced has been modified but not written to the disk. • Miss on clean • The file block that will be replaced does not need to be written to the disk.
PAFS • Cache-servers • Serve client requests. • Disk-servers • Read and write data to disks. • Repartition-servers • Reassign buffers among the cache servers. • Parity-servers • Provide fault tolerance similar to RAID-5.
Cache Servers • The global cache is distributed among the cache servers. • Each cache-server is in charge of a set of files and buffers. • When a new block enters the cache, the cache-server in charge of the new block must first discard one of its blocks. • To avoid communication between servers, the Pseudo-Global Least Recently Used (PG-LRU) algorithm is used to select which block to discard.
Pseudo Global LRU • LRU – Least Recently Used • Picks the block that has not been used for the longest time. • PG-LRU – Pseudo Global Least Recently Used • Queue-tip = the 5% least recently used buffers across all nodes. • If the queue-tip contains a buffer in the same node as the client, that buffer is chosen, making the discard a local operation. • Otherwise the least recently used buffer is chosen, regardless of which node it resides in.
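The selection rule above can be sketched as follows. This is a minimal illustration of PG-LRU as described on the slide; the names `Buffer`, `pg_lru_victim`, and `tip_fraction` are assumptions, not identifiers from PAFS:

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    node: int        # node whose memory holds this buffer
    last_used: int   # logical timestamp of the last access

def pg_lru_victim(buffers, client_node, tip_fraction=0.05):
    """Pick a block to discard: prefer a queue-tip buffer local to the client."""
    # Global LRU order: least recently used first.
    ordered = sorted(buffers, key=lambda b: b.last_used)
    # Queue-tip: the 5% least recently used buffers across all nodes.
    tip_len = max(1, int(len(ordered) * tip_fraction))
    queue_tip = ordered[:tip_len]
    # If a tip buffer lives on the client's node, choose it
    # (the discard then needs no remote communication).
    for b in queue_tip:
        if b.node == client_node:
            return b
    # Otherwise fall back to the globally least recently used buffer.
    return ordered[0]
```

The point of the queue-tip is that choosing a slightly less-stale but local buffer is cheaper than evicting the globally oldest one on a remote node.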
Repartition Server • Distributes the buffers among the cache servers in order to adapt to the changing needs of the system. • Periodically asks each server for its working set. • Working-set = Number of different file blocks that have been accessed during a redistribution interval. • Using the working-set and the current size of the cache-server partition, the buffers are redistributed proportionally.
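The proportional redistribution step can be sketched like this. It is an illustrative reconstruction of "redistribute buffers proportionally to working sets"; the function and parameter names are assumptions:

```python
def proportional_partition(working_sets, total_buffers):
    """Split the global cache among cache-servers in proportion to each
    server's working set (distinct blocks touched in the last interval)."""
    total_ws = sum(working_sets.values())
    if total_ws == 0:
        # No activity in the interval: split the buffers evenly.
        share = total_buffers // len(working_sets)
        return {server: share for server in working_sets}
    targets = {server: (ws * total_buffers) // total_ws
               for server, ws in working_sets.items()}
    # Buffers lost to integer truncation go to the busiest servers.
    leftover = total_buffers - sum(targets.values())
    for server in sorted(working_sets, key=working_sets.get, reverse=True):
        if leftover == 0:
            break
        targets[server] += 1
        leftover -= 1
    return targets
```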
Repartition Policies • Not-Limited • Only the number of buffers that may be lost is limited. • Maximum that can be lost = 10% of the server’s current blocks. • Buffers assigned eagerly – before they are needed. • Limited • Both the number of buffers that may be gained and the number that may be lost are limited. • Maximum that can be gained = number of blocks that can be read from disk. • Buffers assigned eagerly – before they are needed. • Lazy-and-Limited • Both the number of buffers that may be gained and the number that may be lost are limited. • Buffers assigned lazily – when they are needed. • Fixed Partition • The buffers are assigned at boot time.
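Applying the gain/loss caps to one server's target size could look like this. A sketch under the limits the slide states (lose at most 10% of current blocks per interval, gain at most the blocks readable from disk); `limited_step` and its parameter names are assumptions:

```python
def limited_step(current, target, max_loss_frac=0.10, max_gain=None):
    """Clamp a server's next partition size for one redistribution interval.

    max_loss_frac: a server may lose at most this fraction of its blocks.
    max_gain: optional cap on growth (e.g. blocks readable from disk in
    one interval, as in the Limited and Lazy-and-Limited policies).
    """
    floor = current - int(current * max_loss_frac)  # cannot shrink below this
    new_size = max(target, floor)
    if max_gain is not None:
        new_size = min(new_size, current + max_gain)  # cannot grow past this
    return new_size
```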
Levels of Fault Tolerance • Level A • Not fault tolerant. • Level B • Only blocks in a remote cache are guaranteed to be in the disk. • Level C • All modified blocks are sent to the disk or parity-server. • Uses parity-servers to store the parity of the dirty blocks.
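The RAID-5-like parity used by Level C can be illustrated with byte-wise XOR: if the parity-server keeps the XOR of a group of dirty blocks, any single lost block is recoverable from the parity plus the survivors. This is a generic sketch of XOR parity, not PAFS's actual block layout:

```python
def xor_parity(blocks):
    """Byte-wise XOR of equal-sized blocks (RAID-5-style parity)."""
    assert blocks and all(len(b) == len(blocks[0]) for b in blocks)
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(parity, surviving_blocks):
    """Rebuild the one missing block: XOR the parity with the survivors."""
    return xor_parity([parity, *surviving_blocks])
```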
PAFS vs xFS • xFS • Encourages local hits by placing new blocks in the client’s local cache and making a local copy of blocks found in a remote cache. • PAFS • Does not try to increase local hits, tries to speed up remote hits. • Avoids cache coherency problem by not replicating the cache entries.
Simulator and Trace Files • Simulator • DIMEAS • Reproduces the behavior of a distributed-memory parallel machine. • Trace Files • CHARISMA • Characterizes the behavior of a parallel machine. • SPRITE • Workload that may be found in a NOW (network of workstations) system.
Conclusions • Achieving a high local hit ratio is not the only way to design high-performance cooperative caches. • Taking most of the overhead out of remote hits is also a good way to achieve high-performance cooperative caches. • Avoiding the cache coherence problem by avoiding replication increases system performance compared to xFS. • The Lazy-and-Limited redistribution policy obtains the best results.
Questions? • Name and describe the four servers of PAFS. • What is cache coherency, and how does PAFS avoid the cache coherency problem?