Reconfigurable Distributed Storage for Dynamic Networks
Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman
Goals
Reconfigurable Distributed Storage (RDS)
• Atomic consistency (read/write)
• Fault tolerance
…in dynamic and asynchronous systems.
Distributed Storage
• Data is replicated at several network locations
[Figure: read and write operations; operation policy]
Distributed Storage in Dynamic Networks
[Figure: leaving nodes, joining nodes]
• …requires a reconfiguration process
• …by achieving agreement.
Model
• Distributed
  • Connected set of processors
  • Each processor has a unique id i ∈ I
  • MWMR (multi-writer, multi-reader): any processor is a potential client
• Asynchronous
  • Asynchronous processors
  • Point-to-point, asynchronous, unreliable channels
• Dynamic
  • Processors join and leave the system
  • Processors may crash
What is a configuration?
• Configuration <members, read-quorums, write-quorums>
  • members is a set of processors
  • read-quorums and write-quorums are two sets of quorums
• For every RQ ∈ read-quorums and WQ ∈ write-quorums:
  • RQ ⊆ members
  • WQ ⊆ members
  • RQ ∩ WQ ≠ ∅ (only required within a given configuration)
• Every client maintains a set of configurations, initially containing the default one.
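The configuration definition can be sketched in Python as follows. This is only an illustration: the class name `Configuration`, the method `is_valid`, and the helper `majority_configuration` are hypothetical names, not part of RDS. It assumes the intended constraints are that every quorum is a subset of the members and that every read quorum intersects every write quorum of the same configuration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Configuration:
    """Hypothetical model of <members, read-quorums, write-quorums>."""
    members: frozenset
    read_quorums: frozenset   # a set of frozensets of processor ids
    write_quorums: frozenset  # a set of frozensets of processor ids

    def is_valid(self) -> bool:
        # Every quorum must be a subset of the members.
        quorums_in_members = all(
            q <= self.members for q in self.read_quorums | self.write_quorums
        )
        # Every read quorum must intersect every write quorum.
        intersecting = all(
            rq & wq for rq in self.read_quorums for wq in self.write_quorums
        )
        return quorums_in_members and intersecting


def majority_configuration(members) -> Configuration:
    """Build a configuration whose quorums are all minimal majorities."""
    members = frozenset(members)
    size = len(members) // 2 + 1
    majorities = frozenset(
        frozenset(group) for group in combinations(sorted(members), size)
    )
    return Configuration(members, majorities, majorities)
```

Any two majorities of the same member set overlap, so a majority-based configuration satisfies the intersection property by construction.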
Single Object Operations Overview (after [ABD95])
• tag = <c,i> ∈ ℕ × I; val is a possible value
• val = Read()_i: (<c,j>,val) = query(); [prop(<c,j>,val);]
• Write(val)_i: (<c’,j>,val’) = query(); prop(<c’+1,i>,val);
• (tag,val) ← query(): gathers the (tag,val) pairs of all processors of an RQ and returns the one with the largest tag.
• prop(tag,val): updates the (tag,val) pairs at all processors of a WQ; returns nothing.
[Figure: read tag, write tag]
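A minimal single-process sketch of these two-phase operations, in Python. It is not the RDS or ABD implementation: there is no messaging, no failure, and no concurrency, and the names `Replica`, `query`, `prop`, `read`, and `write` are chosen for illustration. It also assumes that a replica adopts a propagated pair only when the incoming tag is larger than its own, and that tags are compared lexicographically as (counter, writer id).

```python
class Replica:
    """Hypothetical in-memory replica holding one (tag, val) pair."""

    def __init__(self):
        self.tag = (0, 0)  # (counter c, writer id i), compared lexicographically
        self.val = None


def query(replicas, read_quorum):
    """Return the (tag, val) pair with the largest tag within one read quorum."""
    pairs = ((replicas[p].tag, replicas[p].val) for p in read_quorum)
    return max(pairs, key=lambda pair: pair[0])


def prop(replicas, write_quorum, tag, val):
    """Install (tag, val) at every replica of one write quorum (assumed: only if newer)."""
    for p in write_quorum:
        if tag > replicas[p].tag:
            replicas[p].tag, replicas[p].val = tag, val


def read(replicas, read_quorum, write_quorum):
    """Query phase, then propagate the value that is about to be returned."""
    tag, val = query(replicas, read_quorum)
    prop(replicas, write_quorum, tag, val)
    return val


def write(replicas, client_id, val, read_quorum, write_quorum):
    """Query phase to learn the largest counter, then propagate a larger tag."""
    (c, _), _ = query(replicas, read_quorum)
    prop(replicas, write_quorum, (c + 1, client_id), val)
```

Because every read quorum intersects every write quorum, the query phase of a later operation always meets at least one replica updated by the propagation phase of an earlier one.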
Reconfiguration Design Goals
• Sound
  • Totally ordered configurations
• Flexible
  • No dependencies between configurations
• Non-intrusive
  • Allows concurrent read/write operations
• Fast
  • Strengthens fault tolerance
Decoupling Reconfiguration
• Reconfiguration = replacing configurations
  • {I} Installing a new configuration
  • {R} Removing old configuration(s)
• If {R} ≺ {I}: operations are delayed
• If {I} ≺ {R}: a stronger configuration-viability assumption is required
Solution
• Neither ({R} ≺ {I}) nor ({I} ≺ {R}), but {I} // {R}
• Tighter coupling between removal and installation
RDS Reconfiguration
• Reconfiguration is based on Paxos (a three-phase, leader-based consensus algorithm)
  • l is the leader
  • c is the current configuration
  • configs is the set of active configurations
  • A ballot has a unique identifier b and a value v, which is a configuration
• Paxos phases:
  • Prepare: l creates a new ballot and chooses/gets the value to propose.
  • Propose: l proposes <b,v> and gathers votes from a majority.
  • Propagate: l propagates the decision.
RDS Reconfiguration: Recon(c,c’)
[Figure: message exchange between the leader l, a write quorum WQ, and a read quorum RQ]
• Prepare phase
  • l creates a new, larger ballot b
  • l sends <1a, b>
  • Replies: <1b, b, configs, <b’’, c’’>>
  • l updates its ballot’s value v with the one received
  • l updates its configs set
• Propose phase
  • l sends <2a, b, c, v>
  • Replies: <2b, b, c, v, tag, val>
  • Processors update their tag and val
  • Processors add v to their configs set
• Propagation phase
  • l sends <3a, c, v, tag, val>
  • Processors update their tag and val
  • Processors remove configuration c from their configs set
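The three phases can be walked through in a failure-free, strictly sequential Python sketch. It is a heavily simplified illustration, not the RDS algorithm: every node answers every message, nothing runs concurrently, and quorums are replaced by "all nodes". The names `Node`, `recon`, and `votes` are hypothetical, and keeping one vote per replaced configuration is an assumption made here so that separate reconfigurations do not interfere.

```python
class Node:
    """Hypothetical participant state for the reconfiguration sketch."""

    def __init__(self, node_id, configs):
        self.id = node_id
        self.configs = set(configs)  # active configurations
        self.ballot = (0, 0)         # highest ballot seen: (round, leader id)
        self.votes = {}              # replaced config -> (ballot, voted value)
        self.tag, self.val = (0, 0), None


def recon(leader, nodes, c, c_new):
    """Replace configuration c by c_new (or by an earlier voted value)."""
    # Prepare: <1a, b> out, <1b, b, configs, <b'', c''>> back.
    b = (max(n.ballot[0] for n in nodes) + 1, leader.id)
    v, best = c_new, None
    for n in nodes:
        n.ballot = b
        leader.configs |= n.configs          # leader updates its configs set
        vote = n.votes.get(c)
        if vote is not None and (best is None or vote[0] > best):
            best, v = vote                   # adopt the latest voted value

    # Propose: <2a, b, c, v> out, <2b, b, c, v, tag, val> back.
    for n in nodes:
        n.votes[c] = (b, v)
    tag, val = max(((n.tag, n.val) for n in nodes), key=lambda p: p[0])
    for n in nodes:
        if tag > n.tag:
            n.tag, n.val = tag, val          # update tag and val
        n.configs.add(v)                     # add v to the configs set

    # Propagate: <3a, c, v, tag, val> out.
    for n in nodes:
        n.configs.discard(c)                 # remove the old configuration
    return v
```

The sketch shows why reads and writes need not stop during reconfiguration: between the Propose and Propagate steps both c and v are in each configs set, so operations can use both.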
Proving Atomicity
• Ordering configurations
  Theorem 1: The set of installed configurations in the system is totally ordered.
• Ordering operations
  Theorem 2: If operation 1 precedes operation 2, then operation 1’s tag is not larger than operation 2’s tag.
Additional Assumptions
• Eventual stabilization with
  • Unique leader l
  • Message delay bound d (unknown to the algorithm)
  • Gossip every d time units
• Restricted reconfiguration rate
• Some quorums remain alive in active configurations
• ts: system stabilization time
• tl: algorithm stabilization time
• tr: request time
[Figure: timeline showing ts followed by tl, 2d apart]
Reconfiguration Latency
• Worst-case scenario: the last reconfiguration was done by a different leader.
• te: end time, when reconfiguration is complete
[Figure: timeline from max(tl, tr) to te, 5d in total: Prepare (2d), Propose (2d), Propagate (d)]
Reconfiguration Latency
• Other cases: the leader made the previous reconfiguration.
• te: end time, when reconfiguration is complete
[Figure: timeline from max(tl, tr) to te, 3d in total: Propose (2d), Propagate (d)]
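The two bounds follow from adding the phase durations shown on the timelines. The worked sum below assumes, as the figures suggest, that Prepare and Propose each cost 2d and Propagate costs d, and that the Prepare phase is what is skipped when the same leader made the previous reconfiguration.

```latex
\begin{align*}
t_e - \max(t_l, t_r) &\le \underbrace{2d}_{\text{Prepare}} + \underbrace{2d}_{\text{Propose}} + \underbrace{d}_{\text{Propagate}} = 5d
  && \text{(previous reconfiguration by a different leader)} \\
t_e - \max(t_l, t_r) &\le \underbrace{2d}_{\text{Propose}} + \underbrace{d}_{\text{Propagate}} = 3d
  && \text{(previous reconfiguration by the same leader)}
\end{align*}
```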
Operation Latency
• Phase latency:
  • 2d is sufficient for the phase round trip.
  • In some cases (pending reconfiguration), the phase might be delayed twice.
  [Figure: 1st round trip (2d), new configuration discovered, 2nd round trip (2d)]
• Operation latency:
  • Operations are bounded by 8d.
  • In some cases, the propagation phase of the read operation can be skipped, leading to a possible bound of 2d.
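One way to read the 8d bound, under the assumption that an operation consists of two phases (query and propagation) and that each phase needs at most two round trips of 2d when a new configuration is discovered:

```latex
\[
\underbrace{2}_{\text{phases}} \times \underbrace{2}_{\text{round trips per phase}} \times \underbrace{2d}_{\text{one round trip}} = 8d
\]
```

Under the same reading, a read that skips its propagation phase and meets no pending reconfiguration costs a single round trip, which matches the 2d figure.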
Experimental Results
• IOA specification translated to Java code following a set of rules.
• Implementation of the Attiya, Bar-Noy, and Dolev algorithm ("ABD", without reconfiguration) and of RDS, which shares parts of the ABD code.
• Using majority-based configurations.
• Measuring operation latency
  • While varying configuration size
  • While varying the number of algorithm instances
Experimental Results
• Operation latency of RDS is competitive with ABD, confirming the theory.
• Reconfiguration messages contain operation information, which might accelerate operations in RDS.
Conclusion
• RDS: Reconfigurable Distributed Storage.
  • With sound, flexible, non-intrusive, and fast reconfiguration.
  • It solves two problems in one: configuration replacement and consensus.
  • Reconfiguration is inexpensive (in time).
  • Fault tolerance is strengthened.
• RAMBO can become more aggressive: that is exactly what we did here!