Distributed Shared Memory for Large-Scale Dynamic Systems Vincent Gramoli, supervised by Michel Raynal
My Thesis Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY, DIFFICULT, DOABLE!
RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion
Distributed Systems Are Growing • Internet explosion: IPv4 -> IPv6 • Multiplication of personal devices • 17 billion networked devices by 2012 (IDC prediction)
Distributed Systems are Dynamic Independent computational entities act asynchronously and are affected by unpredictable events (joining/leaving). These sporadic activities make the system dynamic.
Massively Accessed Applications Web services use large amounts of information • eBay: auctioning service, users increase (auction) • Wikipedia: collaborative encyclopedia, users modify (article) • LastMinute: booking application, users reserve (tickets) …but they require too much power supply and cost too much.
Massively Distributed Applications Peer-to-peer applications share resources • BitTorrent: file sharing • Skype: voice over IP • Joost: video streaming …but they prevent large-scale collaboration. Typical operations: copy, exchange, create.
Filling the Gap is Necessary Providing distributed applications where entities (nodes) can fully collaborate • P2Pedia: using P2P to build a collaborative encyclopedia • P2P eBay: using P2P as an auctioning service
There are 2 Ways of Collaborating • Using a Shared Memory • A node writes information in the memory • Another node reads information from the memory • Using Message Passing • A node sends a message to another node • The second node receives the message from the first [Figure: shared memory, where nodes Write v and Read v through a memory, vs. message passing, where one node Sends v and another Receives v]
Shared Memory is Easier to Use • Shared Memory is easy to use • If information is written, collaboration progresses! • Message Passing is difficult to use • To which node should the information be sent?
Message Passing Tolerates Failures • Shared Memory is failure-prone • Communication relies on memory availability • Message Passing is fault-tolerant • As long as there is a way to route a message [Figure: the same three nodes communicating through a memory (Read v / Write v) and by messages (Send v / Recv v)]
The Best of the 2 Ways • Distributed Shared Memory (DSM) • emulates a Shared Memory to provide simplicity, • in the Message Passing model to tolerate failures. [Figure: the DSM accepts read / write(v) operations and returns read-ack(v) / write-ack]
Our DSM Consistency: Atomicity Atomicity (linearizability) defines an operation ordering: • If an operation ends before another starts, then it cannot be ordered after it • Write operations are totally ordered and read operations are ordered with respect to write operations • A read returns the last value written (or the default one if none exists)
Quorum-based DSM Sharing memory robustly in message-passing systems, H. Attiya, A. Bar-Noy, D. Dolev, JACM 1995 • Quorums: mutually intersecting sets of nodes • Example: 3 quorums Q1, Q2, Q3 of size q = 2, with memory size m = 3: Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø, Q2 ∩ Q3 ≠ Ø • Each node of the quorums maintains: • A local value v of the object • A unique tag t, the version number of this value
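The intersection property from this slide can be checked mechanically. The following is a minimal sketch, assuming the three quorums are simply all pairs of three nodes; the node names and the helper `is_quorum_system` are illustrative and do not come from the talk.

```python
from itertools import combinations

def is_quorum_system(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))

# Example from the slide: m = 3 nodes, 3 quorums of size q = 2.
nodes = ["n1", "n2", "n3"]  # hypothetical node names
quorums = [frozenset(pair) for pair in combinations(nodes, 2)]

if __name__ == "__main__":
    print(is_quorum_system(quorums))
```

Two disjoint sets, such as `[{"n1"}, {"n2"}]`, fail the check, which is exactly the situation a quorum system must exclude.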
Quorum-based DSM • Read and write operations • A node i reads the object value vk by • Asking each node j of a quorum for vj and tj • Choosing the value vk with the largest tag tk • Replicating vk and tk to all nodes of a quorum • A node i writes a new object value vn by • Asking each node j of a quorum for tj • Choosing a tag tn larger than any tj returned • Replicating vn and tn to all nodes of a quorum [Figure: read = Get <vk,tk>, then Set <vk,tk>; write = Get <vk,tk>, tn = tk + 1, then Set <vn,tn>]
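The two-phase structure of these operations can be sketched in a few lines. This is an in-memory illustration only, under simplifying assumptions that are not in the talk: replicas are plain objects rather than remote nodes, tags are bare integers (a multi-writer setting would also need a tie-breaker such as a node identifier), and the names `Replica`, `query`, and `propagate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    value: object = None  # local value v (None stands for the default value)
    tag: int = 0          # version number t of that value

def query(quorum):
    """Phase 1: ask each replica for <v, t> and keep the pair with the largest tag."""
    newest = max(quorum, key=lambda r: r.tag)
    return newest.value, newest.tag

def propagate(quorum, value, tag):
    """Phase 2: replicate <v, t> to every replica of a quorum, never downgrading."""
    for r in quorum:
        if tag > r.tag:
            r.value, r.tag = value, tag

def read(get_quorum, set_quorum):
    value, tag = query(get_quorum)
    propagate(set_quorum, value, tag)
    return value

def write(get_quorum, set_quorum, new_value):
    _, tag = query(get_quorum)
    propagate(set_quorum, new_value, tag + 1)
```

Because any two quorums intersect, the quorum consulted by a read always contains at least one replica updated by the latest completed write, so the largest tag identifies the value to return.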
Quorum-based DSM • Reading a value: the reader asks a quorum "value? tag?" and receives <v1,t1>, propagates <v1,t1> to a quorum, then outputs v1 [Figure: quorums Q1, Q2, Q3]
Quorum-based DSM • Writing a value v2: on input v2, the writer asks a quorum "max tag?" and receives t1, then propagates <v2,t2> (with t2 > t1) to a quorum [Figure: quorums Q1, Q2, Q3]
Quorum-based DSM • Works well in static systems • The number of failures f must satisfy f ≤ m - q • All operations can then access a quorum [Figure: Q1 ∩ Q2 ≠ Ø and Q2 ∩ Q3 ≠ Ø]
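The bound f ≤ m - q can be illustrated on the earlier example (m = 3, q = 2, hence f ≤ 1). The sketch below assumes the quorums are all subsets of size q, and the helper name `some_quorum_survives` is hypothetical: it checks that, whichever f nodes crash, at least one quorum remains fully available.

```python
from itertools import combinations

def some_quorum_survives(nodes, quorums, f):
    """True if, for every set of f crashed nodes, at least one quorum is intact."""
    return all(
        any(not (q & set(crashed)) for q in quorums)
        for crashed in combinations(nodes, f)
    )

nodes = [1, 2, 3]                  # memory size m = 3
quorums = [{1, 2}, {2, 3}, {1, 3}]  # quorum size q = 2

if __name__ == "__main__":
    for f in range(len(nodes) + 1):
        print(f, some_quorum_survives(nodes, quorums, f))
```

With one crash a quorum always survives; with two crashes every quorum loses a member, so operations can no longer complete.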
Quorum-based DSM • Does not work in dynamic systems • All quorums may fail if failures are unbounded • Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø
Reconfiguring • Dynamism produces an unbounded number of failures • Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø • Solution: Reconfiguration • Replacing the quorum configuration periodically
Agreeing on the Configuration • All nodes must agree on the next configuration • Quorum-based consensus algorithm: Paxos • Previously, a separate consensus block complemented the DSM service: • Paxos, a 3-phase leader-based algorithm • Prepare a ballot (2 message delays) • Propose a configuration to install (2 message delays) • Propagate the decided configuration (1 message delay) RAMBO: Reconfigurable Atomic Memory Service for Dynamic Networks, N. Lynch, A. Shvartsman, DISC 2002
RDS: Reconfigurable Distributed Storage • RDS integrates the consensus service into the reconfigurable DSM • Fast version of Paxos: • Remove the first phase (in some cases) • Quorums also propagate the configuration • Ensuring Read/Write Atomicity: • Piggyback object information onto Paxos messages • Parallelizing Obsolete Configuration Removal: • Add an additional message to the propagate phase of Paxos
Contributions • Operations are fast (sometimes optimal) • 1 to 2 message delays • Reconfiguration is fast (fault-tolerance) • 3 to 5 message delays • While: • Operation atomicity and • Operation independence are preserved
Facing Dynamism • Reconfigurable Distributed Storage, G. Chockler, S. Gilbert, V. Gramoli, P. Musial, A. Shvartsman, Proceedings of OPODIS 2005
Facing Scalability is Difficult • Problems: • Large-scale participation induces load • When load is too high, requests can be lost • Bandwidth resources are limited • Goal: Tolerate load by preventing communication overhead • Solution: A DSM that adapts to load variations and that restricts communication
Using a Logical Overlay Object replicas r1, …, rk share a 2-dimensional coordinate space
Benefiting from Locality Each replica ri can communicate only with its nearest neighbors
Repairing the Overlay Topology Takeover mechanism: if a node ri fails, a takeover node rj replaces it. A Scalable Content-Addressable Network, S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, SIGCOMM 2001
Dynamic Bi-Quorums Bi-Quorums: • Quorums of two types where not all quorums intersect • Quorums of different types intersect • Vertical Quorum: All replicas responsible for an abscissa x • Horizontal Quorum: All replicas responsible for an ordinate y • For any horizontal quorum H and any vertical quorum V: H ∩ V ≠ Ø
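The bi-quorum property can be illustrated on a regular grid. This is a simplified sketch: it assumes one replica per integer cell (x, y), whereas the overlay in the talk divides a coordinate space among replicas; `COLS`, `ROWS`, `horizontal`, and `vertical` are illustrative names.

```python
COLS, ROWS = 4, 3  # hypothetical grid dimensions

def horizontal(y, cols=COLS):
    """Horizontal quorum: all replicas responsible for ordinate y."""
    return {(x, y) for x in range(cols)}

def vertical(x, rows=ROWS):
    """Vertical quorum: all replicas responsible for abscissa x."""
    return {(x, y) for y in range(rows)}

if __name__ == "__main__":
    print(horizontal(1) & vertical(2))    # quorums of different types intersect
    print(horizontal(0) & horizontal(1))  # quorums of the same type need not
```

Each row meets each column in exactly one cell, while two distinct rows (or two distinct columns) are disjoint, which matches "not all quorums intersect".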
Operation Execution • Read Operation: 1) Get the up-to-date value and largest tag on a horizontal quorum, 2) Propagate this value and tag on a vertical quorum. • Write Operation: 1) Get the up-to-date value and largest tag on a horizontal quorum, 2) Propagate the value to write (and a higher tag) twice on the same vertical quorum.
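The consult-then-propagate pattern on bi-quorums can be sketched as follows. This is an assumption-laden illustration, not the SQUARE implementation: replicas sit on a fixed 3x3 grid held in a dictionary, tags are integers, and the write performs a single propagation pass because the slide does not say what the second pass on the same vertical quorum does. All names are hypothetical.

```python
COLS, ROWS = 3, 3  # hypothetical grid dimensions
# store[(x, y)] = (tag, value); every replica starts with tag 0 and no value.
store = {(x, y): (0, None) for x in range(COLS) for y in range(ROWS)}

def consult_row(y):
    """Step 1: largest (tag, value) pair held on horizontal quorum y."""
    return max((store[(x, y)] for x in range(COLS)), key=lambda tv: tv[0])

def propagate_column(x, tag, value):
    """Step 2: install (tag, value) on vertical quorum x unless a newer tag is there."""
    for y in range(ROWS):
        if tag > store[(x, y)][0]:
            store[(x, y)] = (tag, value)

def read(x, y):
    tag, value = consult_row(y)
    propagate_column(x, tag, value)
    return value

def write(x, y, value):
    tag, _ = consult_row(y)
    propagate_column(x, tag + 1, value)
```

A value written on any column is visible to a later read on any row, because that row crosses the written column in one replica.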
Load Adaptation • Thwart: requests follow the diagonal until a non-overloaded node is found. • Expansion: a node is added to the memory if no non-overloaded node is found. • Shrink: if underloaded, a node leaves the memory after having notified its neighbors.
Contributions SQUARE is a DSM that: • Scales well by tolerating load variations • Defines load-optimal quorums (under reasonable assumptions) • Uses communication-efficient reconfiguration
Operation Latency Bad News: The operation latency increases with the load (request rate)
Facing Scalability is Difficult • P2P Architecture for Self-* Atomic Memory, E. Anceaume, M. Gradinariu, V. Gramoli, A. Virgillito, Proceedings of ISPAN 2005 • SQUARE: Scalable Quorum-Based Atomic Memory with Local Reconfiguration, V. Gramoli, E. Anceaume, A. Virgillito, Proceedings of ACM SAC 2007
Probability for Modeling Reality Motivations for probabilistic solutions: • Tradeoffs prevent deterministic solutions from being efficient • Allowing more realistic models • Any node can fail independently • Even if it is unlikely that many nodes fail at the same time
What is Churn? Churn is the intensity of dynamism! Dynamic System: • n interconnected nodes • Nodes join/leave the system • A joining node is new • Here, we model the churn simply as a rate c: • At each time unit, cn nodes leave the network • At each time unit, cn nodes enter the network
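To get a feel for this churn model, the sketch below estimates how quickly the initial population is replaced. It relies on an assumption that the slide does not state: the cn leaving nodes are chosen uniformly at random among the current nodes, in which case the expected fraction of initial nodes still present after t time units is (1 - c)^t. The function names are illustrative.

```python
import random

def surviving_fraction(c, t):
    """Expected fraction of the time-0 nodes still present after t time units."""
    return (1.0 - c) ** t

def simulate(n, c, t, seed=0):
    """Simulate t time units of churn and return the surviving fraction of initial nodes."""
    rng = random.Random(seed)
    nodes = list(range(n))   # identifiers below n denote the initial nodes
    next_id = n
    leaving = round(c * n)
    for _ in range(t):
        rng.shuffle(nodes)
        nodes = nodes[leaving:]                       # cn nodes leave
        nodes += range(next_id, next_id + leaving)    # cn brand-new nodes join
        next_id += leaving
    return sum(1 for node in nodes if node < n) / n

if __name__ == "__main__":
    print(surviving_fraction(0.1, 5), simulate(10_000, 0.1, 5))
```

Since a joining node is always new, the system size stays at n while the set of nodes that remember earlier operations keeps shrinking, which is why quorum intersection can only be expected for a bounded period.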
Relaxing Consistency Every operation verifies all atomicity rules with high probability! Unsuccessful operation: an operation that violates at least one of those rules. Probabilistic Atomicity: • If an operation Op1 ends before another operation Op2 starts, then Op1 is ordered after Op2 with probability ε = e^(−β²) (with β a constant). (If this happens, operation Op2 is considered unsuccessful.) • Write operations are totally ordered and read operations are ordered w.r.t. write operations • A read returns the last successfully written value (or the default one if none exists) with probability 1 − e^(−β²) (with β a constant). (If this does not hold, then the read is unsuccessful.)
TQS: Timed Quorum System • Intersection is provided during a bounded period of time with high probability • Gossip-based algorithm in parallel • Shuffle the set of neighbors using a gossip-based algorithm • Traditional read/write operations using two message round trips between the client and a quorum • Consult value and tag from a quorum • Create a new, larger tag (if write) • Propagate value and tag to a quorum
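One way to see where a failure probability of the form e^(−β²) can come from is the classical probabilistic-quorum argument. The sketch below assumes, and this is not stated on the slides, that each quorum is a uniformly random set of q = β√n nodes out of n and that churn is ignored; two such quorums are then disjoint with probability at most e^(−q²/n) = e^(−β²). The function names are illustrative.

```python
from math import ceil, comb, exp, sqrt

def miss_probability(n, q):
    """Exact probability that two uniformly random q-subsets of n nodes are disjoint."""
    return comb(n - q, q) / comb(n, q)

def epsilon_bound(beta):
    """Upper bound e^(-beta^2) on the miss probability when q = beta * sqrt(n)."""
    return exp(-beta ** 2)

if __name__ == "__main__":
    n, beta = 10_000, 2.0
    q = ceil(beta * sqrt(n))
    print(q, miss_probability(n, q), epsilon_bound(beta))
```

Quorums of this kind contain on the order of √n nodes rather than a majority, which keeps communication low at the price of a small, tunable chance that an operation is unsuccessful.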