TreadMarks Sebastian Niezgoda
What is it? • Distributed Shared Memory system • Rice University project
DSM • Many nodes in a cluster • Each node has limited private memory • A large shared memory is available to all nodes
TreadMarks DSM • Supports parallel computing • Provides a shared address space for all machines in a cluster • Globally shared memory but execution in private memory
Why use DSM? • DSM provides an interface to the globally shared memory • Every processor can access every data item without the programmer having to know or worry about how the data gets there • How does MPI accomplish the “shared memory” access?
TreadMarks memory model • The memory model defines how updates to shared memory become visible to the processes in the system • TreadMarks uses release consistency
TreadMarks release consistency • Synchronization is used to prevent race conditions between parallel processes • Variable var is replicated to all processors • Only the next processor to acquire the lock can access var, and only that processor is told about the change to var • Less message traffic
TreadMarks release consistency • Modification info is piggybacked on the lock grant message • An invalidate protocol is used • When the lock is acquired, modified pages are invalidated • A later access causes a miss, and the page is re-acquired
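The two release consistency slides can be sketched as a small simulation. This is a simplified model, not TreadMarks code: the `Node` and `Lock` classes, their fields, and the idea of fetching the page directly from the previous lock holder are all assumptions made for illustration.

```python
class Node:
    """One processor with its own copy of every shared page (hypothetical model)."""

    def __init__(self, name, pages):
        self.name = name
        self.pages = dict(pages)   # local copies: page id -> value
        self.valid = set(pages)    # pages whose local copy may be used
        self.dirty = set()         # pages written since the last release
        self.misses = 0

    def write(self, page, value):
        self.pages[page] = value
        self.dirty.add(page)

    def read(self, page, source):
        if page not in self.valid:             # access miss
            self.misses += 1
            self.pages[page] = source.pages[page]   # re-acquire the page
            self.valid.add(page)
        return self.pages[page]


class Lock:
    """Lock whose grant message carries the list of modified pages."""

    def __init__(self):
        self.write_notices = set()
        self.last_holder = None

    def release(self, node):
        # Nothing is broadcast here; the modification info just waits.
        self.write_notices |= node.dirty
        node.dirty.clear()
        self.last_holder = node

    def acquire(self, node):
        # The grant "piggybacks" the write notices; the acquirer
        # invalidates those pages instead of receiving new data.
        if self.last_holder is not None and self.last_holder is not node:
            node.valid -= self.write_notices
        return self.last_holder


a, b, c = (Node(n, {"var": 0}) for n in "ABC")
lock = Lock()

lock.acquire(a)
a.write("var", 42)
lock.release(a)

holder = lock.acquire(b)           # B learns that "var" changed
value_b = b.read("var", holder)    # miss, then fetch from A
value_c = c.pages["var"]           # C never acquired the lock: no message sent to it
```

In this model only B pays for the update, and only when it actually touches `var`; C receives no traffic at all, which is the "less message traffic" point from the slide.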
TreadMarks multiple-writer • Many processes can write to the same page at the same time • On a write, a copy of the page (a twin) is made in the processor’s private memory • The modified page is compared with its twin word by word • The resulting diffs are used to update the original • Benefits?
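A minimal sketch of the twin and diff idea, assuming a page is modeled as a list of words and a diff as a mapping from word index to new value. The function names are hypothetical and do not come from the TreadMarks library.

```python
def make_twin(page):
    """Copy the page before the first write so changes can be detected later."""
    return list(page)


def make_diff(twin, page):
    """Word-by-word comparison: record only the words that changed."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, diff):
    """Patch a copy of the page with another writer's changes."""
    for index, word in diff.items():
        page[index] = word


original = [0] * 8

# Two processors write to different words of the same page concurrently.
page_a, page_b = list(original), list(original)
twin_a, twin_b = make_twin(page_a), make_twin(page_b)
page_a[1] = 11
page_b[6] = 66

diff_a = make_diff(twin_a, page_a)
diff_b = make_diff(twin_b, page_b)

# Both diffs are applied to the original; neither write is lost.
merged = list(original)
apply_diff(merged, diff_a)
apply_diff(merged, diff_b)
```

Because each diff holds only the changed words, two writers that touch different parts of a page can be merged without either one overwriting the other, and less data is sent than a full page.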
TreadMarks API • Provides facilities for process creation and destruction, synchronization, and shared memory allocation • A user-level library on Unix; no kernel modifications needed (communication and memory management are already provided) • Berkeley sockets for messaging • Every message is either a request or a response
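The request/response pattern over sockets can be sketched as follows. The message layout (a one-byte kind plus a four-byte length), the page name, and the use of a local `socketpair` stream are assumptions chosen to keep the example runnable on one machine; they are not the TreadMarks wire format.

```python
import socket
import struct
import threading

REQUEST, RESPONSE = 0, 1
HEADER = struct.Struct("!BI")  # hypothetical header: message kind, payload length


def send_msg(sock, kind, payload):
    sock.sendall(HEADER.pack(kind, len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def recv_msg(sock):
    kind, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return kind, recv_exact(sock, length)


def serve_one(sock, pages):
    """Answer a single page request with the page contents."""
    kind, payload = recv_msg(sock)
    if kind == REQUEST:
        send_msg(sock, RESPONSE, pages[payload.decode()])


requester, responder = socket.socketpair()
server = threading.Thread(target=serve_one, args=(responder, {"page0": b"hello"}))
server.start()

send_msg(requester, REQUEST, b"page0")   # ask for a page
kind, data = recv_msg(requester)         # block until the response arrives

server.join()
requester.close()
responder.close()
```

Every exchange here is one request followed by one response, which mirrors the last bullet on the slide.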
Performance • Minimum round-trip time for the smallest message: 500 microseconds • Send: 80 microseconds • Receive: 80 microseconds • Wire time and interrupt handling: 180 microseconds
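One reading under which these figures add up, stated here as an assumption rather than something the slide says: a round trip involves a send and a receive in each direction, with the remaining time covering the whole exchange.

```python
SEND_US = 80      # per message, from the slide
RECEIVE_US = 80   # per message, from the slide
OTHER_US = 180    # wire time and interrupts, assumed to cover both directions

# Request and reply each cost one send plus one receive.
round_trip_us = 2 * (SEND_US + RECEIVE_US) + OTHER_US
```

With these inputs `round_trip_us` comes to 500, matching the quoted minimum round-trip time.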
Performance cont. • Time to remotely acquire a free lock: 827 microseconds • 1149 microseconds if the manager was not the last to hold the lock • 2792 microseconds to obtain a 4096-byte page from another processor (a remote access miss)
Performance cont. • Time to make a twin: 167 microseconds (4 KB pages) • Diff creation: 430 microseconds if the page is unchanged, 472 microseconds if the page changed, 686 microseconds in the worst case (every other word changed)
Applications • Mixed Integer Programming • Genetic Linkage Analysis
References • “TreadMarks: Shared Memory Computing on Networks of Workstations” by Amza, Cox, Dwarkadas, Keleher, Lu, Rajamony, Yu, and Zwaenepoel. Rice University, Dept. of Computer Science • http://www.cs.rice.edu/~willy/TreadMarks/overview.html • http://en.wikipedia.org/wiki/Distributed_shared_memory