
Reactive NUMA



Presentation Transcript


  1. Reactive NUMA A Design for Unifying S-COMA and CC-NUMA Babak Falsafi and David A. Wood University of Wisconsin

  2. Some Terminology • NUMA • Non Uniform Memory Access • CC-NUMA • Cache Coherent NUMA • COMA • Cache Only Memory Architecture • S-COMA • Simple COMA

  3. SMP Clusters • Approach for building large-scale shared-memory parallel machines • Directory-based cache coherence • RAD (remote access device) responsible for remote memory access

  4. CC-NUMA • First access causes a page fault • OS maps the virtual address to a global physical address • RAD snoops the memory bus • Block cache • Remote requests

  5. CC-NUMA • References global addresses directly • Remote cluster cache • Only holds remote data • Another level in cache hierarchy • Block cache is small • Sensitive to data allocation and placement • Good for scientific workloads

  6. S-COMA • First access causes a page fault • OS initializes page table, RAD translation table, and access-control tags • Hits serviced by local memory • Misses detected by RAD • Memory response inhibited • Data requested remotely
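The page-fault handling on slides 4 and 6 can be contrasted in a minimal sketch. This is an illustrative simulation, not the paper's actual OS or hardware interface; the class and field names are assumptions:

```python
# Hypothetical first-touch page-fault handling in the two modes.
# CC-NUMA maps the virtual page to a *global* physical address at the
# home node; S-COMA allocates a *local* physical page and teaches the
# RAD how to translate it back to the home address.

class Node:
    def __init__(self):
        self.page_table = {}        # virtual page -> physical mapping
        self.rad_translations = {}  # local physical page -> (home node, home page)
        self.next_local_page = 0

    def fault_cc_numa(self, vpage, home_node, home_page):
        # CC-NUMA: reference the global address directly; later misses go
        # over the interconnect and land in the RAD's small block cache.
        self.page_table[vpage] = ("global", home_node, home_page)

    def fault_s_coma(self, vpage, home_node, home_page):
        # S-COMA: back the page with local memory. Access-control tags
        # would initially mark every block invalid, so the first touch of
        # each block still fetches remote data through the RAD.
        local = self.next_local_page
        self.next_local_page += 1
        self.page_table[vpage] = ("local", local)
        self.rad_translations[local] = (home_node, home_page)

node = Node()
node.fault_cc_numa(0x10, home_node=2, home_page=7)
node.fault_s_coma(0x20, home_node=3, home_page=9)
print(node.page_table[0x10])  # ('global', 2, 7)
print(node.page_table[0x20])  # ('local', 0)
```

The key design difference this captures: S-COMA spends a local page frame and a RAD translation entry per remote page, while CC-NUMA spends nothing locally but pays for every miss over the interconnect.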

  7. S-COMA • Remote data in memory or cache • Allocated/mapped at page granularity • OS handles allocation and migration • Large memory and cache • Fully associative • Large page size • Requires large-granularity spatial locality • Possible thrashing

  8. R-NUMA • Combines S-COMA and CC-NUMA • Maps CC-NUMA pages to global physical addresses • Maps S-COMA pages to local physical addresses • Often requires no additional hardware • Distinguishes two types of pages • Reuse pages: data used frequently on the same node • Communication pages: data exchanged between nodes

  9. Switching Mechanism • Reuse pages • Capacity and conflict misses • Map as S-COMA • Communication pages • Coherence misses • Map as CC-NUMA • Detect refetches of evicted blocks • Trivial for read-only blocks in a non-notifying protocol (still shared) • Additional hardware required for read-write blocks • Count refetches on a per-node, per-page basis
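The switching mechanism above can be sketched as a small simulation: each node counts, per page, how often it refetches a block it previously cached and evicted, and relocates the page to S-COMA once the count crosses a threshold. The threshold value and all structure names here are assumptions for illustration, not the paper's exact parameters:

```python
# Illustrative refetch-counting relocation policy. Pages start in
# CC-NUMA mode; repeated refetches of evicted blocks signal capacity or
# conflict misses (a "reuse" page), triggering relocation to S-COMA.

RELOCATION_THRESHOLD = 64  # assumed value for illustration

class PageState:
    def __init__(self):
        self.mode = "CC-NUMA"
        self.refetches = 0
        self.evicted_blocks = set()

def access(pages, page, block):
    st = pages.setdefault(page, PageState())
    if st.mode == "S-COMA":
        return  # hits now serviced by local memory
    if block in st.evicted_blocks:
        # Refetch of a previously evicted block: evidence of a capacity
        # or conflict miss rather than a coherence miss.
        st.refetches += 1
        st.evicted_blocks.discard(block)
        if st.refetches >= RELOCATION_THRESHOLD:
            st.mode = "S-COMA"  # OS relocates the page to a local frame

def evict(pages, page, block):
    st = pages.setdefault(page, PageState())
    if st.mode == "CC-NUMA":
        st.evicted_blocks.add(block)

pages = {}
for _ in range(RELOCATION_THRESHOLD):
    evict(pages, page=5, block=0)
    access(pages, page=5, block=0)   # each access refetches the evicted block
print(pages[5].mode)                 # S-COMA
```

A communication page would instead miss on blocks invalidated by other nodes (never locally evicted), so its refetch count stays below the threshold and it remains in CC-NUMA mode.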

  10. R-NUMA Figure

  11. Qualitative Performance • Analysis of worst-case behavior • Performance depends on the relative S-COMA and CC-NUMA overheads • Realistically, R-NUMA is no more than 3 times worse than vanilla CC-NUMA or S-COMA • In practice the “bound” is much smaller

  12. Quantitative Results

  13. Conclusions • Dynamically reacts to program behavior • Exploits the best caching strategy • On a per-page basis • Worst-case performance is bounded • Quantitative results indicate • R-NUMA usually no worse than the best of CC-NUMA and S-COMA • If worse, still far better than the worst-case bound • Never worse than both • Less sensitive to the relocation threshold and overhead than S-COMA • Less sensitive to cache size than CC-NUMA

  14. Questions • Sounds like a free lunch • Does R-NUMA really require no additional hardware? • Dynamic switching always looks good in research papers • How does it fare in practice?
