This presentation provides an overview of user-level distributed shared memory (DSM), its characteristics, and its implementation. It discusses related work, system design, implementation details, fault handling, memory consistency model, and evaluation results. Future work and potential improvements are also discussed.
CSCS: A Concise Implementation of User-Level Distributed Shared Memory Final Presentation Zhi Zhai Feng Shen Computer Science and Engineering University of Notre Dame Dec. 11, 2009
DSM Overview DSM Characteristics: • Physically: distributed memory • Logically: a single shared address space Figure 1 DSM architecture
Related Work Models and Main Features: • IVY (Yale) - Divided Space: Shared & Private space • Mirage (UCLA) - Time Interval Δ: Avoid page thrashing • TreadMarks (Rice) - Lazy Release Consistency: Improve efficiency • SAM (Stanford)
System Design Figure 2 Server/Client mode
System Design • Server • Holder of metadata only • Thread-based Connection • Event-based Service
System Design Figure 3 Server Process/Threads
System Design • Client • Physical memory owner • UI/Work/Page Fetch Thread • Fixed-home Protocol • Not Aware of Peer Clients
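Under a fixed-home protocol, every shared page has one permanent home node. The sketch below shows one way a shared address could be resolved to its page number and home node. The 4096-byte page size, the contiguous per-node segments, and the names `HomeMap`, `add_node`, and `home_of` are assumptions made for illustration; the slides do not specify them.

```python
import bisect

PAGE_SIZE = 4096  # assumed page size; not stated in the slides


class HomeMap:
    """Maps shared addresses to (page number, home node id).

    Assumes each node contributes one contiguous run of pages and
    that the home of a page never changes (fixed-home).
    """

    def __init__(self):
        self._starts = []    # first page number of each segment, ascending
        self._homes = []     # node id owning the segment at the same index
        self._next_page = 0  # first page number not yet assigned

    def add_node(self, node_id, page_count):
        """Append a node's segment; return the segment's base address."""
        start = self._next_page
        self._starts.append(start)
        self._homes.append(node_id)
        self._next_page += page_count
        return start * PAGE_SIZE

    def home_of(self, address):
        """Return (page number, home node id) for a shared address."""
        page_no = address // PAGE_SIZE
        if not 0 <= page_no < self._next_page:
            raise ValueError("address outside the shared space")
        # Find the last segment whose first page is <= page_no.
        index = bisect.bisect_right(self._starts, page_no) - 1
        return page_no, self._homes[index]
```

For example, after `add_node(10, 2)` and `add_node(20, 3)`, address 8192 falls in page 2, whose home is node 20.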
System Design Figure 4 Client process/thread
System Design Figure 5 Sample Operation
Implementation • Message Passing: TCP socket Figure 6 Message Passing
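Because TCP delivers a byte stream rather than discrete messages, a DSM protocol on top of it needs some form of framing. The sketch below uses a fixed header followed by a length-prefixed payload. The header layout, the message type constants, and the function names are hypothetical; the actual format shown in Figure 6 is not recoverable from the slides.

```python
import socket
import struct

# Hypothetical wire format (not from the slides):
# 1-byte message type, 4-byte page number, 4-byte payload length, big-endian.
HEADER = struct.Struct("!BII")
MSG_PAGE_REQUEST = 1  # assumed type code
MSG_PAGE_REPLY = 2    # assumed type code


def send_message(sock, msg_type, page_no, payload=b""):
    """Send one framed message: header, then payload bytes."""
    sock.sendall(HEADER.pack(msg_type, page_no, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one framed message as (type, page number, payload)."""
    msg_type, page_no, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return msg_type, page_no, recv_exact(sock, length)
```

The same two functions would serve both client-to-server metadata requests and client-to-client page transfers, since only the type code and payload differ.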
Implementation • Server/Client Page Table • Server holds the most up-to-date metadata • Server manages the whole virtual memory space • Server records the ids & addresses of all nodes • Client owns the most up-to-date local memory segment • Client caches referenced pages from peer nodes
Figure 7 Connection Table Figure 8 Server Page Table
Implementation Figure 9 Client Page Table
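The division of state between server and client can be sketched as three small structures: a connection table and a page table on the server, and a page table on each client. The contents of Figures 7 to 9 are not recoverable, so every field name and the string encoding of access bits below are assumptions, not the authors' layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ServerPageEntry:
    owner_id: int        # node that physically holds the page
    access: str = "rw"   # global access bits (assumed encoding)


@dataclass
class ServerTables:
    # Connection table: node id -> (host, port). Metadata only, no page data.
    connections: Dict[int, Tuple[str, int]] = field(default_factory=dict)
    # Server page table: page number -> owner and access bits.
    pages: Dict[int, ServerPageEntry] = field(default_factory=dict)

    def register_node(self, node_id, host, port, page_numbers):
        """Record a node's address and the pages it owns."""
        self.connections[node_id] = (host, port)
        for page_no in page_numbers:
            self.pages[page_no] = ServerPageEntry(owner_id=node_id)

    def locate(self, page_no):
        """Return (owner id, owner address) for a page."""
        entry = self.pages[page_no]
        return entry.owner_id, self.connections[entry.owner_id]


@dataclass
class ClientPageEntry:
    is_local: bool                # True if this node owns the page
    data: Optional[bytes] = None  # local contents or cached remote copy
    access: str = "none"          # local access bits (assumed encoding)
```

Keeping page contents out of `ServerTables` mirrors the slide's point that the server holds metadata only, while the clients own the physical memory.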
Implementation • Page fault handler • Client → Server • Check the access right • Fetch the page owner id/address • Update global access bits • Client → Client • Connect to the page owner • Cache the referenced page • Update local access bits
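The two-step read fault path on this slide (ask the server who owns the page, then fetch it from that owner) can be simulated in a single process. In this sketch direct method calls stand in for the TCP exchanges, a set of reader ids stands in for the global access bits, and all class and method names are hypothetical.

```python
class Server:
    """Holds metadata only: who owns each page and who has read it."""

    def __init__(self):
        self.owner_of = {}  # page number -> owner node id
        self.readers = {}   # page number -> ids of nodes caching the page

    def lookup(self, requester_id, page_no):
        if page_no not in self.owner_of:
            raise KeyError("page %d is not in the shared space" % page_no)
        # Stand-in for "update global access bits".
        self.readers.setdefault(page_no, set()).add(requester_id)
        return self.owner_of[page_no]


class Client:
    def __init__(self, node_id, server, peers):
        self.node_id = node_id
        self.server = server
        self.peers = peers      # node id -> Client; stands in for TCP connections
        self.local_pages = {}   # pages this node owns
        self.cache = {}         # copies of pages owned by peers
        self.access = {}        # local access bits (assumed encoding)

    def own(self, page_no, data):
        self.local_pages[page_no] = data
        self.server.owner_of[page_no] = self.node_id

    def serve_page(self, page_no):
        return self.local_pages[page_no]

    def handle_read_fault(self, page_no):
        owner_id = self.server.lookup(self.node_id, page_no)  # client -> server
        data = self.peers[owner_id].serve_page(page_no)       # client -> client
        self.cache[page_no] = data                            # cache the page
        self.access[page_no] = "r"                            # local access bits

    def read(self, page_no):
        if page_no in self.local_pages:
            return self.local_pages[page_no]
        if page_no not in self.cache:
            self.handle_read_fault(page_no)
        return self.cache[page_no]
```

A client never needs a list of its peers up front: it learns the owner from the server only when a fault occurs, which matches the "not aware of peer clients" design point.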
Implementation • Page fault handler • Page fault type • Read remote page • Write on a page • Assumption • Reading happens more often than writing • Writing needs the most up-to-date copy more than reading does
Implementation Figure 10 Page fault handler workflow (recoverable elements: assume reading remote page; dsm call: dsm_do_no_page(); decision: truly a remote reading fault? YES: continue; NO: double page fault; dsm call: dsm_do_wrt_page())
Implementation • Memory Consistency Model • Assumption Revisit • Reading happens more often than writing • Writing needs the most up-to-date copy more than reading does • Multi-Reader/Single-Writer • Snapshot for reading • Every write triggers a page fault • Locks on pages being referenced • Semaphore-like reference counts: If ref_count > 0 Waiting/Re-random
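The multi-reader/single-writer rule with a reference count can be sketched as a per-page lock: readers increment the count, and a writer may proceed only when the count is zero. This is a minimal sketch of the general technique using a condition variable, not the authors' implementation. It models only the "waiting" branch of the slide's rule, and the class and method names are hypothetical.

```python
import threading


class PageLock:
    """Per-page lock: many concurrent readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ref_count = 0    # number of readers currently referencing the page
        self._writing = False  # True while a writer holds the page

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._ref_count += 1

    def release_read(self):
        with self._cond:
            self._ref_count -= 1
            if self._ref_count == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def try_acquire_write(self):
        """Non-blocking attempt; fails while ref_count > 0 or a writer is active."""
        with self._cond:
            if self._writing or self._ref_count > 0:
                return False
            self._writing = True
            return True

    def acquire_write(self):
        """Blocking variant: wait until no reader references the page."""
        with self._cond:
            while self._writing or self._ref_count > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

A caller that prefers to retry later instead of blocking could loop on `try_acquire_write()` with its own delay policy.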
DSM Evaluation Figure 11 Parallel Computation on ASP Problem
DSM Evaluation Figure 12 Execution time comparison
DSM Evaluation Figure 13 Message Transmission Comparison
DSM Evaluation Figure 14 Network Traffic Comparison
Future Work • Enhance system robustness • Evaluate scalability boundary • Provide better programmability
Thank You! Q&A