Timed Quorum Systems …for large-scale and dynamic environments Vincent Gramoli, Michel Raynal
Context Large-scale dynamic distributed systems • Nodes communicate through message-passing • Nodes join/leave the system at any time Gramoli, Raynal
Goal To emulate a shared memory in this context, providing atomic (i.e., linearizable) read/write operations
Roadmap • Model and preliminary definitions • Related work • Timed Quorum System (TQS) • An efficient implementation of TQS • Conclusion
Simple model • System of n interconnected nodes with unique IDs • Asynchronous communication with neighbors (nodes whose ID is known) • Dynamism intensity (i.e., churn) c • We consider a single object (atomicity is a local property)
Quorum System • Quorums are sets (of nodes) that mutually intersect. • A Quorum System (QS) is a set of quorums. Ex. 3 quorums Q1, Q2, Q3 of size q = 2: Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø, Q2 ∩ Q3 ≠ Ø
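The intersection requirement can be checked mechanically. The following is a minimal sketch, assuming quorums are represented as Python sets of node names; the function name `is_quorum_system` and the node labels are illustrative and do not come from the slides.

```python
from itertools import combinations

def is_quorum_system(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))

# Three quorums of size q = 2 over three nodes, mirroring the slide's example.
Q1, Q2, Q3 = {"a", "b"}, {"b", "c"}, {"a", "c"}

print(is_quorum_system([Q1, Q2, Q3]))    # every pair intersects
print(is_quorum_system([{"a"}, {"b"}]))  # disjoint sets: not a quorum system
```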
Operations • Atomic quorum-based operations for static settings: [Attiya, Bar-Noy, Dolev, JACM 1996] • Each node of the quorums maintains: • A local value v of the object • A unique tag t, the version number of this value
Operations 1) Reading a value • Phase 1: Consult the most up-to-date value v: ask a quorum for its values and tags ("value? tag?") and keep the pair (v1, t1) with the largest tag • Phase 2: Propagate the consulted pair (v1, t1) to a quorum • Theorem of Attiya and Welch 1998: « Read must write » to prevent new/old inversions for an unbounded number of readers • Output: v1
Operations 2) Writing a value v2 • Input: v2 • Phase 1: Consult the value version: ask a quorum for its maximum tag t1 ("max tag?") and choose a new tag t2 strictly larger (t2 > t1) • Phase 2: Propagate the new value associated with the new version, (v2, t2), to a quorum
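The two-phase read and write can be sketched in a few lines. This is only an in-memory illustration under strong assumptions that are not in the slides: operations run one at a time in a single process, quorums are random majorities (`pick_quorum` is a hypothetical access strategy), and tags are plain integers, which is only safe here because no two writes overlap.

```python
import random

class Replica:
    """One node: holds a local value and its tag (version number)."""
    def __init__(self):
        self.tag, self.value = 0, None

    def query(self):
        return self.tag, self.value

    def update(self, tag, value):
        # Adopt the pair only if it is strictly newer than the local one.
        if tag > self.tag:
            self.tag, self.value = tag, value

def pick_quorum(replicas):
    """Hypothetical access strategy: any majority of the replicas."""
    return random.sample(replicas, len(replicas) // 2 + 1)

def read(replicas):
    # Phase 1: consult a quorum and keep the pair with the largest tag.
    answers = [r.query() for r in pick_quorum(replicas)]
    tag, value = max(answers, key=lambda pair: pair[0])
    # Phase 2: propagate that pair to a quorum ("read must write").
    for r in pick_quorum(replicas):
        r.update(tag, value)
    return value

def write(replicas, value):
    # Phase 1: consult a quorum for the largest tag currently in use.
    max_tag = max(r.query()[0] for r in pick_quorum(replicas))
    # Phase 2: propagate the value with a strictly larger tag.
    for r in pick_quorum(replicas):
        r.update(max_tag + 1, value)

replicas = [Replica() for _ in range(5)]
write(replicas, "v1")
write(replicas, "v2")
print(read(replicas))
```

Because any two majorities intersect, the consult phase of every operation in this sketch meets at least one replica updated by the latest completed write.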
Dynamic Solutions • Reconfigurable storage: a failing QS is replaced by a new one. • RAMBO: Shvartsman, Lynch, DISC’02 • RDS: Chockler et al. OPODIS’05 • Structured dynamic quorums: failed servers are replaced by new ones. • AM05: Abraham, Malkhi, Dist. Comp. 2005 • NN05: Nadav, Naor, DISC’05 • SQUARE: Gramoli, Anceaume, Virgillito, SAC’07 All solutions require bounded churn during any finite period
Dynamic Solutions Reconfiguration complexity vs. operation latency tradeoff [Figure: RAMBO, RDS, AM05, NN05, and SQUARE placed on two axes, operation latency and reconfiguration complexity] Prevents scalability!
Timed Quorum System Dynamic quorum systems should be: • Probabilistic: # of failures not necessarily bounded • Timed: no property can hold forever
Timed Quorum System • Timed access strategy ω: A mapping from any time t to a probability distribution on the possible quorums. • Δ-Timed Quorum System (TQS): For any Q1 and Q2 accessed resp. with ω(t1) and ω(t2), if |t2 – t1| ≤ Δ, then Q1 ∩ Q2 ≠ Ø with high probability.
Timed Quorum System • Δ-Timed Quorum System (TQS): For any Q(t1) and Q(t2) accessed resp. with ω(t1) and ω(t2): if |t2 – t1| ≤ Δ, then Q(t1) ∩ Q(t2) ≠ Ø with high probability. Example of a TQS: {Q(t1), Q(t2), Q(t3), Q(t4), Q(t5)} [Figure: quorums Q(t1) to Q(t5) on a time axis with a window of length Δ; intersections shown: Q(t1) ∩ Q(t2), Q(t2) ∩ Q(t3), Q(t3) ∩ Q(t4), Q(t3) ∩ Q(t5)]
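To get a feel for the timed intersection requirement, one can estimate the intersection probability by simulation. The sketch below rests on assumptions that are not stated in the slides: quorums are drawn uniformly at random among the current nodes, time is discrete, and in each time unit every node is independently replaced by a fresh node with probability c. The name `intersection_rate` and all parameters are illustrative.

```python
import random

def intersection_rate(n, q, c, delta, trials=2000, seed=0):
    """Estimate P[Q(t1) and Q(t2) intersect] when t2 - t1 = delta time units."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        nodes = list(range(n))          # current membership
        next_id = n                     # identifier for the next joining node
        q1 = set(rng.sample(nodes, q))  # quorum accessed at time t1
        for _ in range(delta):
            # Assumed churn model: each node is replaced with probability c.
            for i in range(n):
                if rng.random() < c:
                    nodes[i] = next_id
                    next_id += 1
        q2 = set(rng.sample(nodes, q))  # quorum accessed at time t2
        hits += bool(q1 & q2)
    return hits / trials

# Same quorum size, increasing gap between the two accesses.
for delta in (0, 5, 20):
    print(delta, intersection_rate(n=400, q=40, c=0.05, delta=delta))
```

Under this model the estimate drops as the gap grows, which is the reason the intersection is only required for accesses at most Δ apart.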
Consistency • Probabilistic Atomicity: • In the real-time sequence of operations: • Either an operation satisfies atomicity w.r.t. all preceding successful operations, and it is said to be successful • Or this operation is said to be unsuccessful • Any operation is successful with high probability Theorem 1: If at least one quorum is accessed in every period of length Δ, then a Δ-TQS implements probabilistic atomicity.
Some observations • Replication is necessary for data persistence • In large-scale systems, operations are frequent • Theorem « read must write » of Attiya and Welch indicates that some information must be replicated in any operation
Efficient TQS Implementation • Underlying gossip-based shuffle of neighborhood: • Each node continuously obtains a new random set of neighbors • Classical quorum-based operations: • Consulting v and t at some quorum • Choosing v’ and t’ to propagate • Propagating v’ and t’ to some quorum
Efficient TQS Implementation Disseminate until $q = O(\sqrt{n})$ nodes are contacted [Figure: dissemination tree rooted at the client, with branches labeled 1 … k at each node]
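The link between the fanout k and the number of message delays can be illustrated with a small calculation. This is a sketch under an idealized assumption that is not in the slides: every contacted node forwards to k nodes that have not been contacted yet, so the dissemination forms a perfect k-ary tree. The function name `rounds_to_reach` is hypothetical.

```python
def rounds_to_reach(q, k):
    """Forwarding rounds until a k-ary tree rooted at the client has
    contacted at least q nodes (assumes no node is contacted twice)."""
    contacted, frontier, rounds = 0, 1, 0
    while contacted < q:
        frontier *= k          # each node of the last level forwards to k nodes
        contacted += frontier
        rounds += 1
    return rounds

for q, k in [(10, 3), (100, 10), (1000, 2)]:
    print(q, k, rounds_to_reach(q, k))
```

In this idealized model the number of rounds grows like the base-k logarithm of q, which matches the logarithmic shape of the latency bound.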
Efficient TQS Implementation Assumptions: • neighbors are chosen uniformly at random • at least one operation succeeds every Δ time • c = rate of arrival = rate of departure, with $c \in [0,1)$ Results: • This algorithm implements a TQS • Replication is piggybacked into operations • The quorum size is $O(\sqrt{nD})$, where $D = (1-c)^{-\Delta}$ • The operation latency is $O(\log_k \sqrt{nD})$ message delays • When $D = O(1)$, this matches the smallest quorum size $O(\sqrt{n})$ for static systems, cf. [Malkhi, Reiter, Wool, Wright, Inf. and Comp. Journal 2001]
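A short numeric sketch shows how churn inflates the quorum size. It assumes the bound reads as the square root of nD with D = (1-c)^(-Δ), in line with the square-root quorum size known for static probabilistic quorum systems, and it ignores the constant hidden in the O notation. The function names are illustrative.

```python
import math

def dynamism_factor(c, delta):
    """D = (1 - c) ** (-delta), for a churn rate c in [0, 1)."""
    return (1 - c) ** (-delta)

def quorum_size_order(n, c, delta):
    """Order of magnitude sqrt(n * D); the hidden constant is omitted."""
    return math.sqrt(n * dynamism_factor(c, delta))

print(quorum_size_order(10_000, 0.0, 5))  # no churn: D = 1, static case
print(quorum_size_order(10_000, 0.5, 2))  # heavy churn: D = 4, quorums twice as large
```

With c = 0 the factor D equals 1 whatever Δ is, which recovers the static case.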
Conclusion We defined the Timed Quorum System, which: • Is inherently dynamic: • NO underlying structure • Timely intersection requirement • Ensures Probabilistic Atomicity • Scales well: • $O(\sqrt{nD})$ messages per operation • $O(\log_k \sqrt{nD})$ time per operation • Is optimal: • When $D = O(1)$, translates into best known static result: $O(\sqrt{n})$
Open Issue • TQS in Mobile Sensor Networks: • Consultation phase: • Gather motes to consult t and v • Scatter motes to make t and v likely visible • Propagation phase: • Gather motes to propagate t’ and v’ • Scatter motes to make t’ and v’ likely visible