Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA • Babak Falsafi and David A. Wood, University of Wisconsin
Some Terminology • NUMA: Non-Uniform Memory Access • CC-NUMA: Cache-Coherent NUMA • COMA: Cache-Only Memory Architecture • S-COMA: Simple COMA
SMP Clusters • Approach for large-scale shared-memory parallel machines • Directory-based cache coherence between clusters • RAD (Remote Access Device) handles remote memory accesses
CC-NUMA • First access to a remote page causes a page fault • OS maps the virtual address to a global physical address • RAD snoops the memory bus • Hits in the block cache are serviced locally • Misses trigger a remote request
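The miss path above can be sketched in a few lines of Python. Everything here is illustrative, not from the paper: the class name `CCNumaRAD`, the `fetch_remote` callback standing in for a directory-based remote request, the 64-byte block size, and the FIFO eviction standing in for whatever replacement policy real hardware would use.

```python
class CCNumaRAD:
    """Illustrative sketch of a RAD snooping global addresses in CC-NUMA."""

    BLOCK_SIZE = 64  # assumed coherence-block size, bytes

    def __init__(self, block_cache_size):
        self.block_cache = {}            # small cache of remote blocks
        self.capacity = block_cache_size

    def access(self, global_addr, fetch_remote):
        block = global_addr // self.BLOCK_SIZE
        if block in self.block_cache:    # hit: serviced from the block cache
            return self.block_cache[block]
        data = fetch_remote(block)       # miss: remote request to home node
        if len(self.block_cache) >= self.capacity:
            # evict the oldest entry (dicts preserve insertion order)
            self.block_cache.pop(next(iter(self.block_cache)))
        self.block_cache[block] = data
        return data
```

Because the block cache is small, a working set larger than `block_cache_size` blocks keeps evicting and refetching — exactly the behavior R-NUMA later exploits to detect reuse pages.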
CC-NUMA • References global addresses directly • Remote cluster cache holds only remote data and forms another level in the cache hierarchy • Block cache is small, so performance is sensitive to data allocation and placement • Works well for scientific workloads
S-COMA • First access causes a page fault • OS initializes the page table, the RAD translation table, and per-block access-control tags • Hits are serviced by local memory • Misses are detected by the RAD, which inhibits memory and requests the data remotely
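A minimal sketch of the per-block access-control tags described above. The names (`SComaPage`, `fetch_remote`), the tag encoding, and the 64-blocks-per-page figure are assumptions for illustration, not details from the paper.

```python
# Hypothetical per-block access-control states
INVALID, READ_ONLY, READ_WRITE = 0, 1, 2
BLOCKS_PER_PAGE = 64  # assumed page/block geometry

class SComaPage:
    """Illustrative S-COMA page: local memory plus per-block state tags."""

    def __init__(self):
        # On the page fault, the OS maps the page locally and initializes
        # every block's access tag to INVALID.
        self.tags = [INVALID] * BLOCKS_PER_PAGE
        self.data = [None] * BLOCKS_PER_PAGE

    def read(self, block, fetch_remote):
        if self.tags[block] == INVALID:
            # RAD detects the miss, inhibits memory, requests the block
            self.data[block] = fetch_remote(block)
            self.tags[block] = READ_ONLY
        return self.data[block]  # hit: serviced entirely by local memory
```

After the first miss per block, subsequent reads never leave the node — the page behaves like a (very large) cache line allocated at page granularity.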
S-COMA • Remote data cached in local memory or processor caches • Allocated and mapped at page granularity • OS handles allocation and migration • Acts as a large, fully associative cache • Large page size requires coarse-grained spatial locality • Possible thrashing
R-NUMA • Combines S-COMA and CC-NUMA • CC-NUMA pages map to global physical addresses • S-COMA pages map to local physical addresses • Often requires no additional hardware • Distinguishes two types of pages • Reuse pages: data used frequently by the same node • Communication pages: data exchanged between nodes
Switching Mechanism • Reuse pages: dominated by capacity and conflict misses, best served by S-COMA • Communication pages: dominated by coherence misses, best served by CC-NUMA • Detect refetches of previously evicted blocks • Trivial for read-only blocks in a non-notifying protocol (they remain in shared state) • Additional hardware required for read-write blocks • Count refetches on a per-node, per-page basis
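The per-node, per-page counting scheme can be sketched as follows. The class and method names, and the particular threshold value, are illustrative assumptions — the real mechanism lives in hardware and OS page-remapping code, not in Python.

```python
from collections import defaultdict

RELOCATION_THRESHOLD = 64  # assumed: refetches before a page turns S-COMA

class RNumaPolicy:
    """Illustrative R-NUMA switching policy: start every remote page as
    CC-NUMA, relocate it to S-COMA once refetches reveal it is a reuse page."""

    def __init__(self):
        self.refetch_count = defaultdict(int)  # (node, page) -> refetches
        self.scoma_pages = set()               # pages remapped to local PA

    def on_refetch(self, node, page):
        """Called when a previously evicted block of `page` is fetched again
        by `node` (i.e., a capacity or conflict miss, not a coherence miss)."""
        if page in self.scoma_pages:
            return  # already relocated; served from local memory
        self.refetch_count[(node, page)] += 1
        if self.refetch_count[(node, page)] > RELOCATION_THRESHOLD:
            # OS remaps the page from a global to a local physical address
            self.scoma_pages.add(page)
```

Communication pages rarely trigger `on_refetch` (their misses are coherence misses), so they stay CC-NUMA; reuse pages cross the threshold and migrate to S-COMA.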
Qualitative Performance • Analysis of worst-case behavior • Performance depends on the respective S-COMA and CC-NUMA overheads • Realistically, R-NUMA is no more than 3 times worse than vanilla CC-NUMA or S-COMA • In practice the "bound" is much smaller
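An illustrative back-of-envelope for where a small constant bound comes from. The cost model below is an assumption for intuition only, not the paper's analysis: suppose each refetch costs `r`, the relocation threshold is `t` refetches, and relocating a page costs about as much as `t` refetches. Then a page that is relocated and never touched again pays roughly three times what it would have under plain CC-NUMA.

```python
r = 1.0   # assumed cost of one remote refetch
t = 64    # assumed relocation threshold, in refetches

# Worst case for a misjudged page (all terms are the illustrative model above):
#   t*r  refetches paid as CC-NUMA before crossing the threshold
# + t*r  one-time relocation overhead (assumed comparable to t refetches)
# + t*r  cold misses repopulating the page under S-COMA
ccnuma_cost = t * r
rnuma_worst = t * r + t * r + t * r
ratio = rnuma_worst / ccnuma_cost  # -> 3.0 under these assumptions
```

The constant stays small because the threshold caps how long R-NUMA can keep paying CC-NUMA refetch costs before switching; in practice most pages are classified correctly and the observed gap is far below the bound.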
Conclusions • Dynamically reacts to program behavior • Exploits the best caching strategy on a per-page basis • Worst-case performance is bounded • Quantitative results indicate • R-NUMA is usually no worse than the better of CC-NUMA and S-COMA • When worse, it is still far better than the worst-case bound • Never worse than both • Less sensitive to relocation threshold or overhead than S-COMA • Less sensitive to cache size than CC-NUMA
Questions • Sounds like a free lunch • Does R-NUMA really require no additional hardware? • Dynamic switching always looks good in research papers • How does it fare in practice?