
Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA

This presentation explores integrating Cache-Coherent NUMA (CC-NUMA) and Simple Cache Only Memory Architecture (S-COMA) into a hybrid system, and examines the advantages, disadvantages, and qualitative performance of the resulting R-NUMA design.


Presentation Transcript


  1. Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA
  Babak Falsafi and David A. Wood
  Computer Science Department, University of Wisconsin, Madison
  Presented by Anita Lungu, February 17, 2006

  2. Context and Motivation
  • Large-scale Distributed Shared Memory parallel machines
  • Directory coherence between SMPs
  • Local access is fast; remote access is slow
  • Problem: hide remote memory access latency
  • Solutions:
    • Cache-Coherent NUMA (CC-NUMA): best when coherence misses dominate
    • Simple Cache Only Memory Architecture (S-COMA): best when capacity misses dominate
  • Opportunity: a hybrid, R-NUMA = CC-NUMA + S-COMA
    • Supports both protocols, dynamically selecting one for each page
    • Better performance than either alone => best of both worlds

  3. CC-NUMA
  • Remote cluster cache:
    • Holds only remote data
    • Block-level granularity
    • Small and fast (SRAM); can be larger and slower (DRAM)
  • Data elements are allocated at a home node
  • Advantageous when:
    • The remote working set fits in the small block cache
    • Misses are mostly coherence misses
  • Disadvantageous when many data accesses are remote
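The slide's behavior can be sketched as a tiny simulation: a small block-granularity cache that holds only remote data, where a fitting remote working set hits and conflicting remote blocks keep missing. The direct-mapped organization, set count, and names below are assumptions for illustration, not the paper's hardware.

```python
# Minimal sketch of a CC-NUMA remote cluster cache (illustrative only).
# It caches remote blocks at block granularity; local data bypasses it.

NUM_SETS = 8  # tiny, direct-mapped, for illustration; real caches are larger

class RemoteClusterCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS   # one remote block address per set

    def access(self, block_addr: int, is_remote: bool) -> str:
        if not is_remote:
            return "local-memory"       # home-allocated local data skips the cache
        idx = block_addr % NUM_SETS
        if self.tags[idx] == block_addr:
            return "hit"                # remote working set fits: fast path
        self.tags[idx] = block_addr     # capacity/conflict miss: refetch remotely
        return "miss"
```

Repeated accesses to the same remote block hit, but two remote blocks mapping to the same set evict each other — the "many remote accesses" disadvantage the slide names.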

  4. S-COMA
  • Distributed main memory acts as a second-level cache for remote data
  • Data elements: NO home node
  • Allocation and mapping:
    • Page granularity (software)
    • Uses standard virtual-address-translation hardware
  • Coherence:
    • Block granularity (hardware)
  • Extra hardware:
    • Access-control tags: 2 bits per block, a trigger to inhibit memory
    • Auxiliary SRAM translation table: converts local physical pages <-> global physical pages (home)
  • Advantageous when:
    • Misses are mostly capacity/cold misses
    • Remote data is reused often
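The two extra hardware structures on this slide can be modeled in a few lines: per-block 2-bit access-control tags that inhibit local memory and force a remote fetch, plus a table translating local physical pages to global physical pages. The tag encodings, class names, and block count below are invented for the sketch, not the paper's exact design.

```python
# Illustrative model of S-COMA's access-control tags and translation table.
INVALID, READ_ONLY, READ_WRITE = 0, 1, 2      # assumed 2-bit tag states

class SComaPage:
    def __init__(self, global_page: int, blocks_per_page: int = 4):
        self.global_page = global_page          # global physical page number
        self.tags = [INVALID] * blocks_per_page # 2 bits per block

translation_table = {}  # local physical page -> SComaPage (SRAM table stand-in)

def access(local_page: int, block: int, is_write: bool):
    """Return where the access is satisfied: local memory or the remote node."""
    page = translation_table[local_page]
    tag = page.tags[block]
    if tag == INVALID or (is_write and tag == READ_ONLY):
        # Tag inhibits memory: fetch or upgrade the block remotely, using the
        # global page number to address the directory.
        page.tags[block] = READ_WRITE if is_write else READ_ONLY
        return ("remote", page.global_page, block)
    return ("local", local_page, block)
```

The first touch of a block goes remote and sets its tag; subsequent touches of the same block are satisfied from local main memory, which is why reuse-heavy workloads favor S-COMA.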

  5. R-NUMA
  • Classify remote pages as:
    • Reuse pages: accessed many times by a node
    • Communication pages: used to communicate data between nodes
  • All pages default to CC-NUMA
  • A page dynamically changes to S-COMA when it crosses a threshold: the number of remote capacity/conflict misses per page (in the block cache)
  • The decision is made per node
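The relocation decision above amounts to a per-node, per-page counter of remote capacity/conflict misses, with relocation once the counter passes a threshold. The class names and the threshold value below are assumptions for illustration; the paper's actual mechanism and default threshold may differ.

```python
# Hypothetical sketch of R-NUMA's per-page relocation decision.
RELOCATION_THRESHOLD = 64  # illustrative refetch count before switching to S-COMA

class PageState:
    def __init__(self):
        self.protocol = "CC-NUMA"   # every remote page starts under CC-NUMA
        self.refetch_count = 0

def on_block_cache_miss(page: PageState, is_capacity_or_conflict: bool) -> str:
    """Count remote capacity/conflict misses; relocate once past the threshold."""
    if page.protocol == "S-COMA":
        return page.protocol            # already relocated, nothing to decide
    if is_capacity_or_conflict:         # coherence misses do not count
        page.refetch_count += 1
        if page.refetch_count > RELOCATION_THRESHOLD:
            page.protocol = "S-COMA"    # remap the page into local main memory
    return page.protocol
```

Because each node keeps its own counters, the same page can be a reuse page (S-COMA) on one node and stay a communication page (CC-NUMA) on another.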

  6. Qualitative Performance
  • Worst-case scenario:
    • A page is relocated from the block cache (CC-NUMA) to memory (S-COMA) and never referenced again
  • Worst-case performance:
    • Depends on the cost of relocation (changing a page from CC-NUMA to S-COMA) relative to the cost of page allocation
    • R-NUMA can be 3x worse than either CC-NUMA or S-COMA
  • But the threshold for optimal worst-case performance differs from the threshold for optimal average performance

  7. Base System Results
  • Best case: R-NUMA reduces execution time by 37%
  • Worst case: R-NUMA increases execution time by 57%
  • CC-NUMA can be 179% worse than S-COMA
  • S-COMA can be 315% worse than CC-NUMA

  8. Sensitivity Results
  1. S-COMA and R-NUMA sensitivity to page-fault and TLB-invalidation overhead
  2. R-NUMA sensitivity to the relocation-threshold value
  3. CC-NUMA and R-NUMA sensitivity to cache size

  9. Questions?
