
Dynamic Atomic Storage Without Consensus

Dynamic Atomic Storage Without Consensus. Alex Shraer (Technion). Joint work with Marcos K. Aguilera (MSR), Idit Keidar (Technion), and Dahlia Malkhi (MSR). The goal: reliable replicated storage built from unreliable components, tolerating asynchrony (unpredictable network delays).



Presentation Transcript


  1. Dynamic Atomic Storage Without Consensus Alex Shraer (Technion) Joint work with: Marcos K. Aguilera (MSR), Idit Keidar (Technion), Dahlia Malkhi (MSR)

  2. The Goal
     • Reliable replicated storage
     • Using unreliable components
     • Asynchrony: tolerate unpredictable network delays
     [Figure: clients and servers (processes)]

  3. Designing an Asynchronous Replicated System
     • State machine replication (e.g., Paxos)
        • Any object
        • Impossible in asynchronous systems
     • Atomic R/W register [Attiya, Bar-Noy, Dolev 95]
        • Simple object: read(), write(v)
        • Possible in asynchronous systems
        • Atomic (linearizable)
        • Liveness: if #failures < #servers/2, then every operation invoked on a correct server eventually completes
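
The majority-quorum idea behind the static atomic R/W register can be sketched as below. This is a simplified, single-process model in the spirit of [Attiya, Bar-Noy, Dolev 95], extended to multiple writers with (counter, writer_id) timestamps; it is not DynaStore. The class and method names are hypothetical, the caller picks the quorums explicitly, and there is no messaging, concurrency, or failure handling.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Each server keeps the freshest (timestamp, value) pair it has seen.
    # Timestamps are (counter, writer_id), so two writers never tie.
    ts: tuple = (0, "")
    value: object = None


class MajorityRegister:
    """Toy, single-process model of a majority-quorum atomic register.

    Hypothetical API: the caller passes the quorums; a real implementation
    sends messages and waits for replies from any majority of servers.
    """

    def __init__(self, servers):
        self.replicas = {s: Replica() for s in servers}

    def _check(self, quorum):
        q = set(quorum)
        if not q.issubset(self.replicas) or len(q) <= len(self.replicas) / 2:
            raise ValueError(f"{sorted(q)} is not a majority")
        return q

    def _query(self, quorum):
        # Phase 1: collect (ts, value) from a majority and keep the freshest.
        best = max(self._check(quorum), key=lambda s: self.replicas[s].ts)
        return self.replicas[best].ts, self.replicas[best].value

    def _store(self, quorum, ts, value):
        # Phase 2: every replica in a majority adopts (ts, value) if it is newer.
        for s in self._check(quorum):
            r = self.replicas[s]
            if ts > r.ts:
                r.ts, r.value = ts, value

    def write(self, writer, value, q1, q2):
        (counter, _), _ = self._query(q1)
        self._store(q2, (counter + 1, writer), value)

    def read(self, q1, q2):
        ts, value = self._query(q1)
        self._store(q2, ts, value)  # write-back: later reads cannot return an older value
        return value


reg = MajorityRegister("ABCDE")
reg.write("w1", "x", "ABC", "CDE")
print(reg.read("ABD", "ABD"))  # x
```

Any two majorities of the same server set intersect, which is why the read above sees the write even though the quorums differ; passing a minority such as "AB" raises ValueError.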

  4. Breaking the Minority Barrier
     Our first contribution: the first "black box" definition (in terms of the user interface)
     • Over a long period of time, #failures < #servers/2 is not good enough
     • Reconfiguration!
        • Increasing resilience by changing the set of servers
        • Example: 3 failures out of 5 [Figure: five servers A, B, C, D, E]
     • Semantics of a reconfigurable R/W register:
        • Atomic (linearizable)
        • Liveness: ?

  5. Reconfigurable Register: User Interface
     • read() (returns a value)
     • write(value) (returns OK)
     • reconfig(c) (returns OK)
        • c is a set of changes (relative to the current configuration)
        • Each change is either (Add, pid) or (Remove, pid)
        • Example: c = {+C, +E, –D}
     • Only processes that were successfully added can invoke operations
     • Universe of processes (servers):
        • Unknown, unbounded, possibly infinite
        • At any given time, only a finite number has been added
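
The slide does not say how a configuration is represented internally. One simple encoding, assumed here purely for illustration, stores a configuration as the set of all changes applied so far: reconfig(c) is then a set union, and the members are the pids that were added and never removed. The function names and the starting configuration {A, B, D} are hypothetical.

```python
ADD, REMOVE = "Add", "Remove"


def reconfig(config, changes):
    """Apply reconfig(c) under the assumed encoding: union of change sets."""
    for op, _pid in changes:
        if op not in (ADD, REMOVE):
            raise ValueError(f"unknown change type {op!r}")
    return frozenset(config) | frozenset(changes)


def members(config):
    """Servers that were added and never removed."""
    added = {pid for op, pid in config if op == ADD}
    removed = {pid for op, pid in config if op == REMOVE}
    return added - removed


# Hypothetical starting configuration {A, B, D}, written as changes.
initial = frozenset((ADD, p) for p in "ABD")
c = {(ADD, "C"), (ADD, "E"), (REMOVE, "D")}  # c = {+C, +E, -D}
print(sorted(members(reconfig(initial, c))))  # ['A', 'B', 'C', 'E']
```

Because union is commutative, two concurrent change sets produce the same result in either order; a side effect of this encoding is that a removed pid cannot rejoin under the same identifier.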

  6. Definitions
     • Current(t): servers in the system at time t (the "current configuration")
     • AddPending(t): servers whose Add is pending at t
     • RemovePending(t): servers whose Remove is pending at t
     • Faulty(t): servers that have crashed by t
     • p_i is active in an execution if:
        • p_i does not crash during the execution,
        • some process invokes reconfig adding p_i, and
        • no process invokes reconfig removing p_i

  7. Dynamic System Liveness
     • Static system: operations complete if #failures < #servers/2
     • What should this be in a dynamic system?
     • Try #1: for every t, a minority of Current(t) is in Faulty(t)
        • What if processes crash while others are being removed? Then no operation is guaranteed to complete in the new configuration!
     • Try #2: for every t, a minority of Current(t) is in Faulty(t) ∪ RemovePending(t)
     [Figure: servers A, B, C; reconfig({–A}) eventually returns OK]
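
The difference between the two attempts can be checked on a small scenario. The sketch below evaluates both conditions at a single time t; the function names and the concrete scenario (Current = {A, B, C}, reconfig({–A}) pending, B crashed) are a hypothetical reading of the slide's figure.

```python
def try1(current, faulty):
    """Try #1: a minority of Current(t) is in Faulty(t)."""
    return len(current & faulty) < len(current) / 2


def try2(current, faulty, remove_pending):
    """Try #2: a minority of Current(t) is in Faulty(t) ∪ RemovePending(t)."""
    return len(current & (faulty | remove_pending)) < len(current) / 2


# Hypothetical scenario: Current = {A, B, C}, reconfig({-A}) is pending, B crashes.
current, remove_pending, faulty = {"A", "B", "C"}, {"A"}, {"B"}
print(try1(current, faulty))                  # True: only 1 of 3 has crashed
print(try2(current, faulty, remove_pending))  # False: A is leaving and B is down
# Once the removal completes, B is half of the new configuration {B, C}:
print(try1({"B", "C"}, faulty))               # False
```

Try #1 accepts a state from which the system can reach a configuration with no correct majority; Try #2 rules it out by counting servers that are being removed as if they had already failed.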

  8. Adding Servers
     [Figure: servers A, B, C, D, E; reconfig({+F}) and reconfig({+G}) are invoked and return OK; time t0 is marked]
     • Q: At time t0, which servers in {A, B, ..., G} can crash?
     • A: a minority of {A, B, ..., E}, and in addition:
        • in this scenario, G can crash
        • in a different scenario, F can crash
     • Simple condition: any 2 servers can fail (fewer than |Current(t)|/2)

  9. Dynamic Service Liveness
     If the number of reconfigs invoked in the execution is finite, and at every time t in the execution fewer than |Current(t)|/2 processes out of Current(t) ∪ AddPending(t) are in Faulty(t) ∪ RemovePending(t), then:
     • Eventually, every active process that was successfully added can invoke operations
     • Every operation invoked by an active process eventually completes
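
The failure condition above can be written as a predicate and applied to the "Adding Servers" scenario, where Current = {A, ..., E} and the additions of F and G are pending at t0. This is a minimal sketch with a hypothetical function name; it checks the condition at one instant only, whereas the slide requires it at every t together with a finite number of reconfigs.

```python
from itertools import combinations


def dynamic_liveness_holds_at(current, add_pending, remove_pending, faulty):
    """Fewer than |Current(t)|/2 processes out of Current(t) ∪ AddPending(t)
    are in Faulty(t) ∪ RemovePending(t), evaluated at a single time t."""
    involved = current | add_pending
    bad = faulty | remove_pending
    return len(involved & bad) < len(current) / 2


# Scenario from slide 8: Current = {A..E}; adds of F and G are pending at t0.
current, add_pending = set("ABCDE"), set("FG")
any_two = all(
    dynamic_liveness_holds_at(current, add_pending, set(), set(f))
    for f in combinations("ABCDEFG", 2)
)
any_three = any(
    dynamic_liveness_holds_at(current, add_pending, set(), set(f))
    for f in combinations("ABCDEFG", 3)
)
print(any_two, any_three)  # True False
```

This matches the slide's "simple condition": with |Current(t)| = 5, any 2 of the 7 servers may crash, but no set of 3 may.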

  10. Reconfigurable Solutions
     • Many previous solutions, all of which use consensus (or something similar):
        • State machine replication (Paxos): use the state machine to agree on the set of servers
        • Virtual Synchrony based solutions, e.g., [Yeger-Lotem, Keidar, Dolev 97]: a membership service stronger than consensus (equivalent to P)
        • R/W register + reconfiguration service [Lynch, Shvartsman 97], [Englert, Shvartsman 00]: one designated "reconfigurer"
        • Rambo [Lynch, Shvartsman 02], Rambo II [Gilbert, Lynch, Shvartsman 03], Long Lived Rambo [Georgiou, Musial, Shvartsman 04]: consensus to agree on the next configuration
     • Is consensus really necessary?
     • Our second contribution: consensus is NOT needed!
        • DynaStore: an algorithm for a completely asynchronous system

  11. “Old” and “New” Configurations
     • A reconfiguration transfers the state from a majority of the old configuration to a majority of the new configuration
     • What if there are concurrent reconfigurations?
        • Suppose that the initial configuration is {A, B, C, D}
        • A invokes reconfig({+E}); C invokes reconfig({–D})
        • A writes to {A, D, E}, a majority of {A, B, C, D, E}
        • C reads from {B, C}, a majority of {A, B, C}
        • No intersection ⇒ atomicity is violated!
     • Simple solution: consensus on the sequence of configurations
     • But how can we do this without consensus?
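
The counterexample can be checked mechanically: each quorum is a genuine majority of the configuration its process believes is next, yet the two quorums are disjoint. The helper name is hypothetical; the sets are the ones on the slide.

```python
def is_majority(quorum, config):
    return quorum <= config and len(quorum) > len(config) / 2


old = {"A", "B", "C", "D"}
new_a = old | {"E"}   # A's next configuration after reconfig({+E})
new_c = old - {"D"}   # C's next configuration after reconfig({-D})

write_quorum = {"A", "D", "E"}  # where A writes the state
read_quorum = {"B", "C"}        # where C reads the state

assert is_majority(write_quorum, new_a)
assert is_majority(read_quorum, new_c)
print(write_quorum & read_quorum)  # set()
```

Majorities of the same configuration always intersect; majorities of two different configurations need not, which is exactly what the concurrent reconfigurations exploit.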

  12. The Approach in DynaStore
     • For each configuration c, we use a (weak) snapshot object nextConfig(c) to store the next configuration
        • (Weak) snapshot objects are (easily) implemented in an asynchronous environment
     • Processes update nextConfig(c) to suggest the next configuration after c (concurrent updates are possible)
     • Sequence of established configurations (simplified):
        • The initial configuration is established
        • If c is established, then the first update to the snapshot nextConfig(c) is the next established configuration after c; this update is included in every scan of nextConfig(c)
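
The definition of the established sequence can be illustrated with a sequential toy model of the nextConfig(c) objects. This is not a weak snapshot implementation: the class, its methods, and the configuration labels are hypothetical, and the toy scan simply returns every completed update (a real weak snapshot may return fewer, but always includes the first). The first_update helper is omniscient and exists only to define the sequence; processes cannot observe which update came first.

```python
class NextConfigToy:
    """Sequential stand-in for the family of weak snapshots nextConfig(c)."""

    def __init__(self):
        self._log = {}  # config label -> proposals, in the order updates took effect

    def update(self, c, proposal):
        log = self._log.setdefault(c, [])
        if proposal not in log:
            log.append(proposal)

    def scan(self, c):
        # Toy behavior: return all completed updates (so the first is always included).
        return set(self._log.get(c, []))

    def first_update(self, c):
        # Omniscient helper, used only to *define* the established sequence.
        log = self._log.get(c, [])
        return log[0] if log else None


def established_sequence(nc, initial):
    """Follow the first update of nextConfig(c), starting from the initial config."""
    seq = [initial]
    nxt = nc.first_update(initial)
    while nxt is not None:
        seq.append(nxt)
        nxt = nc.first_update(nxt)
    return seq


nc = NextConfigToy()
nc.update("C0", "C1")  # one proposal lands first
nc.update("C0", "C2")  # a concurrent proposal
nc.update("C1", "C3")
nc.update("C2", "C3")
print(sorted(nc.scan("C0")))           # ['C1', 'C2']
print(established_sequence(nc, "C0"))  # ['C0', 'C1', 'C3']
```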

  13. Transferring the State
     • A scan of nextConfig(c) returns a set of configurations that follow c
        • If c is established, one configuration in the returned set is the next established configuration after c
     • Scanning nextConfig for each returned configuration returns a further set, and so on; this creates a DAG of configurations
        • This DAG contains the sequence of established configurations
     • A reconfiguration transfers the state along all paths in the DAG
        • This guarantees that the state is transferred along the sequence of established configurations
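
Since a process cannot tell which scanned configuration is the established one, it follows all of them. The sketch below enumerates every path of the DAG produced by repeated scans, using hypothetical scan results shaped like the running example (C0 followed by C1 and C2, both followed by C3). Listing paths explicitly can grow quickly in general; the set of configurations to visit is what the state transfer needs.

```python
def all_paths(scan, start):
    """Every path in the configuration DAG from `start` to a config with an empty scan."""
    successors = sorted(scan(start))
    if not successors:
        return [[start]]
    return [[start] + tail for c in successors for tail in all_paths(scan, c)]


# Hypothetical scan results, matching the running example.
dag = {"C0": {"C1", "C2"}, "C1": {"C3"}, "C2": {"C3"}}
paths = all_paths(lambda c: dag.get(c, set()), "C0")
print(paths)  # [['C0', 'C1', 'C3'], ['C0', 'C2', 'C3']]
configs_to_visit = sorted({c for p in paths for c in p})
print(configs_to_visit)  # ['C0', 'C1', 'C2', 'C3']
```

Whichever of C1 or C2 is actually established, its path to C3 is among those enumerated, so transferring along all of them covers the established sequence.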

  14. Example
     • Suppose that the initial configuration is C0 = {A, B, C, D}
     • A invokes reconfig({+E}); C invokes reconfig({–D})
     • A updates nextConfig(C0) to C1 = {A, B, C, D, E}
     • A scans nextConfig(C0) to check for concurrent updates. The scan returns {C1}, i.e., no concurrent updates are detected
     • C1 is the next established configuration after C0
     • A's state transfer:
        • Read from a majority of C0 and a majority of C1
        • Write the latest value found to a majority of C1
     [Figure: C0 = {A, B, C, D} → C1 = {A, B, C, D, E}]
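
A's steps can be sketched in the same toy style: update nextConfig(C0), scan it, find only its own proposal, then transfer the state from majorities of C0 and C1 to a majority of C1. The replica contents, the chosen quorums, and the names next_config, store, and state_transfer are all hypothetical; a real run would use messages and any responding majority.

```python
C0 = frozenset("ABCD")
C1 = frozenset("ABCDE")

# Toy nextConfig(C0): A updates it with C1, then scans it.
next_config = {C0: set()}
next_config[C0].add(C1)
concurrent = next_config[C0] - {C1}
print("concurrent updates:", concurrent)  # concurrent updates: set()


def is_majority(q, cfg):
    return q <= cfg and len(q) > len(cfg) / 2


def state_transfer(store, reads, target):
    """Read (ts, value) from each (config, quorum) in `reads`; write the latest to `target`."""
    for cfg, q in reads + [target]:
        if not is_majority(q, cfg):
            raise ValueError(f"{sorted(q)} is not a majority of {sorted(cfg)}")
    latest = max((store[p] for _, q in reads for p in q), key=lambda tv: tv[0])
    for p in target[1]:
        if latest[0] > store[p][0]:
            store[p] = latest
    return latest


# Hypothetical replica contents: (timestamp, value) per server.
store = {"A": (1, "v1"), "B": (1, "v1"), "C": (0, None), "D": (0, None), "E": (0, None)}
reads = [(C0, {"A", "C", "D"}), (C1, {"B", "D", "E"})]
print(state_transfer(store, reads, (C1, {"C", "D", "E"})))  # (1, 'v1')
```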

  15. Example
     • Suppose that the initial configuration is C0 = {A, B, C, D}
     • A invokes reconfig({+E}); C invokes reconfig({–D})
     • Concurrently, C updates nextConfig(C0) to C2 = {A, B, C} and scans it. The scan returns {C1, C2}, implying that A's update was concurrent
     • C updates nextConfig(C1) and nextConfig(C2) to C3 = {A, B, C, E}. No concurrent updates are detected
     • C3 is an established configuration
     • C's state transfer:
        • Read from a majority of each configuration on every path found from C0 to C3
        • Write the latest value found to a majority of C3
     [Figure: C0 = {A, B, C, D} → C1 = {A, B, C, D, E} → C3 = {A, B, C, E}; C0 → C2 = {A, B, C} → C3]
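
C's merge step can be sketched by assuming configurations are sets of changes merged by union; this assumption is not stated on the slide, but it reproduces C3 = {A, B, C, E}. The code then collects every configuration reachable from C0, which is where C's state transfer must read. All names are hypothetical.

```python
def members(config):
    added = {p for op, p in config if op == "+"}
    removed = {p for op, p in config if op == "-"}
    return added - removed


C0 = frozenset(("+", p) for p in "ABCD")
C1 = C0 | {("+", "E")}  # A's proposal: reconfig({+E})
C2 = C0 | {("-", "D")}  # C's proposal: reconfig({-D})

# C's scan of nextConfig(C0) returns both proposals: merge them into C3.
next_config = {C0: {C1, C2}}
C3 = frozenset().union(*next_config[C0])
for c in next_config[C0]:
    next_config.setdefault(c, set()).add(C3)  # C updates nextConfig(C1) and nextConfig(C2)

# Every configuration on some path from C0 to C3 is read during the state transfer.
to_visit, frontier = set(), [C0]
while frontier:
    c = frontier.pop()
    if c not in to_visit:
        to_visit.add(c)
        frontier.extend(next_config.get(c, ()))

print(sorted(members(C3)))  # ['A', 'B', 'C', 'E']
print(sorted("".join(sorted(members(c))) for c in to_visit))  # ['ABC', 'ABCD', 'ABCDE', 'ABCE']
```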

  16. Example
     • Suppose that the initial configuration is C0 = {A, B, C, D}
     • A invokes reconfig({+E}); C invokes reconfig({–D})
     • A invokes a write(newValue) operation in C1
     • In this scenario, DynaStore guarantees that either:
        • C's state transfer finds newValue in C1, or
        • A's write operation discovers C3 and ends only after writing newValue to a majority of C3
     • Read operations also traverse the DAG, and will find newValue on the path of established configurations, intersecting the write
     [Figure: C0 = {A, B, C, D} → C1 = {A, B, C, D, E} → C3 = {A, B, C, E}; C0 → C2 = {A, B, C} → C3]
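
The first alternative rests on a simple fact: any two majorities of the same configuration intersect, so a majority of C1 read during C's state transfer overlaps the majority of C1 that A's completed write reached. The brute-force check below confirms this for C1; it does not model the second alternative (A discovering C3), and the helper name is hypothetical.

```python
from itertools import combinations


def majorities(config):
    """All subsets of `config` containing more than half of its members."""
    k = len(config) // 2 + 1
    return [set(q) for r in range(k, len(config) + 1) for q in combinations(sorted(config), r)]


C1 = {"A", "B", "C", "D", "E"}
quorums = majorities(C1)
print(len(quorums))  # 16
print(all(q1 & q2 for q1 in quorums for q2 in quorums))  # True
```

Contrast this with slide 11, where the write and read quorums belong to two different configurations and can be disjoint.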

  17. Conclusions
     • First “black box” definition of a dynamic R/W register
        • In terms of events visible to the user
        • A natural failure model: resilience changes dynamically
        • Possibly useful for specifying other dynamic problems
     • DynaStore: the first asynchronous dynamic storage protocol
        • Implements a reconfigurable atomic MWMR register
        • In a completely asynchronous system (where consensus is impossible)
        • Proves that R/W storage is really easier than consensus (not only in a static system)
