Dynamic atomic storage without consensus

Aguilera, Keidar, Malkhi, Shraer. J. ACM 58, 2, 2011. Sarai Duek. The problem: implement an atomic read/write register in a dynamic system, with read, write, and reconfig operations.

Presentation Transcript


  1. Dynamic atomic storage without consensus Aguilera, Keidar, Malkhi, Shraer, J. ACM 58, 2, 2011 Sarai Duek

  2. The Problem • Implement an atomic read/write register in a dynamic system. • Read • Write • Reconfig

  3. The Problem What is atomicity?

  4. The Problem Atomicity is when each operation appears to occur at some point between its invocation and response. [Figure: write (W) and read (R) operations]

  5. The Problem What is liveness?

  6. The Problem Liveness is a guarantee that the system will make progress under some conditions (e.g. a majority of the processes remains alive).

  7. The Problem t-resilient R/W storage guarantees progress if fewer than t processes crash. For an n-process system, it is well known that t-resilient R/W storage exists when t < n/2 and does not exist when t ≥ n/2. [Figure: processes P0 to P3 with a write (W) and a read (R)]
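
The t < n/2 bound rests on the fact that any two majorities of the same n processes share at least one member. The sketch below checks this by brute force for small n. The function names are hypothetical and are not taken from the slides or the paper.

```python
from itertools import combinations


def majority_size(n: int) -> int:
    """Smallest number of processes that forms a strict majority of n."""
    return n // 2 + 1


def all_majorities_intersect(n: int) -> bool:
    """Brute-force check: every two minimal majorities of n processes overlap."""
    quorums = [set(q) for q in combinations(range(n), majority_size(n))]
    return all(a & b for a, b in combinations(quorums, 2))


def largest_t_below_half(n: int) -> int:
    """Largest integer t that satisfies t < n/2."""
    return (n - 1) // 2
```

For example, with n = 5 a majority has 3 members, so a write acknowledged by one majority is always seen by at least one member of any majority contacted by a later read.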

  8. The Problem In a dynamic system, the set of processes that forms a majority can change, and liveness is achieved through the reconfig operation. [Figure: processes P0 to P4 with reconfig1(+,4)]

  9. The problem: the model
     • Unknown and unbounded universe of processes ∏.
     • Asynchronous reliable communication channels between each pair of processes.
     • Processes can be added, removed, crash or halt.
     [Figure: processes p1, p2, …, p9, …]

  10. The Problem: liveness conditions
     • The set of crashed processes and those whose removal is pending is a minority of the current view and of any pending future view.
     • No new reconfig operations are invoked for "sufficiently long", so that the started operations can complete.
     A view is a set of changes. Changes lead to a new configuration of processes.
     [Figure: processes p0 to p5]

  11. The problem
     • MWMR: any process can write and read.
     • Written values are unique: (val, pid, ts).
     • Every process in the system knows the initial view.
     • By convention, a reconfig(Init) completes by time 0.
     • Members of view w store information about the current view.
     Definitions:
     • Changes: {Remove, Add}
     • View: a set of changes
     • For a view w: w.remove is the removal set, w.join is the join set, and w.members is the set w.join \ w.remove
     • V(t): union of all sets c such that a reconfig(c) completes by time t; Init = V(0)
     • P(t): set of pending changes at time t
     • F(t): set of processes that crashed by time t
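
A view can be modelled directly as a set of changes. The sketch below assumes that a change is written as a pair such as ("+", 4) or ("-", 1), following the (+,4) and (-,1) notation used on the slides. The type aliases and function names are hypothetical.

```python
from typing import FrozenSet, Tuple

Change = Tuple[str, int]   # ("+", pid) adds a process, ("-", pid) removes it
View = FrozenSet[Change]   # a view is a set of changes


def join_set(w: View) -> FrozenSet[int]:
    """w.join: every process added by some change in w."""
    return frozenset(pid for sign, pid in w if sign == "+")


def remove_set(w: View) -> FrozenSet[int]:
    """w.remove: every process removed by some change in w."""
    return frozenset(pid for sign, pid in w if sign == "-")


def members(w: View) -> FrozenSet[int]:
    """w.members = w.join \\ w.remove."""
    return join_set(w) - remove_set(w)


# Example: an initial view of three processes, then add process 4 and remove process 1.
init: View = frozenset({("+", 0), ("+", 1), ("+", 2)})
w: View = init | {("+", 4), ("-", 1)}
```

Because views only grow by set union, a removed process can never rejoin under the same identifier in this model: its removal change stays in every later view.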

  12. The problem: Dynamic Service Liveness
     If at every time t in the execution, fewer than |V(t).members|/2 processes out of V(t).members ∪ P(t).join are in F(t) ∪ P(t).remove, and the number of different changes proposed in the execution is finite, then the following hold:
     • Eventually, the enable operations event occurs at every active process that was added by a complete reconfig operation.
     • Every operation invoked at an active process eventually completes.

  13. The problem: Dynamic Service Liveness
     At every time t in the execution, fewer than |V(t).members|/2 processes out of V(t).members ∪ P(t).join are in F(t) ∪ P(t).remove.
     [Figure: the sets V(t), P(t).join, P(t).remove and F(t) over processes p0 to p10]
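
The counting condition above can be evaluated for a single point in time as follows. This is a minimal sketch in which each set is given as a plain Python set of process ids; the function name and parameter names are hypothetical.

```python
from typing import Set


def liveness_condition_holds(v_members: Set[int], p_join: Set[int],
                             p_remove: Set[int], crashed: Set[int]) -> bool:
    """Check the condition at one time t.

    v_members stands for V(t).members, p_join for P(t).join,
    p_remove for P(t).remove and crashed for F(t).
    """
    considered = v_members | p_join                 # V(t).members ∪ P(t).join
    unavailable = considered & (crashed | p_remove)  # those in F(t) ∪ P(t).remove
    return len(unavailable) < len(v_members) / 2
```

With five members, one pending join, one pending removal and one crash, two processes are unavailable and 2 < 2.5, so the condition holds. A further crash of the joining process brings the count to three and violates it.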

  14. The algorithm outline
     Read phase
     • Read configuration information.
     • If a new view was discovered, restart the read phase in the new view.
     • Send a request to all processes.
     • Each recipient sends back the current value of its replica.
     • Wait for a majority to reply.
     • Return the value associated with the largest sequence number.
     Write phase
     • Generate the next sequence number.
     • Send a message with the value and the sequence number to all processes.
     • Each recipient updates its replica and sends an ack.
     • The writer waits for a majority of acks.
     • Read configuration information.
     • If a new view was discovered, restart the read phase in the new view (followed by a write phase again).
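
Leaving reconfiguration aside, the two phases can be sketched for a fixed set of replicas as a single-threaded simulation. This is an illustration under stated assumptions, not the algorithm from the paper: message passing is replaced by direct calls, the `up` argument is a hypothetical stand-in for the set of replicas that answer, a write first runs a read phase to learn the highest sequence number, and a read writes back the value it is about to return. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Set, Tuple

Timestamp = Tuple[int, int]  # (sequence number, writer id)


@dataclass
class Replica:
    ts: Timestamp = (0, 0)
    value: Any = None

    def update(self, ts: Timestamp, value: Any) -> None:
        # Keep only the value with the largest timestamp.
        if ts > self.ts:
            self.ts, self.value = ts, value


class StaticRegister:
    def __init__(self, n: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _require_majority(self, up: Set[int]) -> None:
        # A real process would block here; the simulation raises instead.
        if len(up) < self.majority:
            raise RuntimeError("no majority of replicas is reachable")

    def _read_phase(self, up: Set[int]) -> Tuple[Timestamp, Any]:
        self._require_majority(up)
        replies = [(self.replicas[i].ts, self.replicas[i].value) for i in up]
        return max(replies, key=lambda reply: reply[0])

    def _write_phase(self, ts: Timestamp, value: Any, up: Set[int]) -> None:
        self._require_majority(up)
        for i in up:
            self.replicas[i].update(ts, value)

    def write(self, pid: int, value: Any, up: Set[int]) -> None:
        (num, _), _ = self._read_phase(up)
        self._write_phase((num + 1, pid), value, up)

    def read(self, up: Set[int]) -> Any:
        ts, value = self._read_phase(up)
        self._write_phase(ts, value, up)  # write back before returning
        return value
```

A write that reaches replicas {0, 1, 2} and a later read that reaches {2, 3, 4} overlap at replica 2, which is why the read returns the written value.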

  15. The algorithm outline: Reconfiguration • Write information about the new view to a quorum of the old view. • Execute the read and write phases, starting in the old view.

  16. Weak object
     Allows a fixed set of processes P to use two operations: arrive_i(c) and query_i().
     Arrive and query obey the following semantics:
     • Integrity
     • Validity
     • Monotonicity of queries
     • Non-empty common intersection
     • Termination

  17. Weak object
     The weak object algorithm. Each process pi in P has a value field pi.val. The field is SWMR: only pi can use pi.val.write(c), but all processes can use pi.val.read().
     Operation arrive_i(c)
         if collect() = ∅ then pi.val.write(c)
         return OK
     Operation query_i()
         C1 ← collect()
         if C1 = ∅ then return ∅
         C2 ← collect()
         return C2
     Procedure collect()
         C ← ∅
         foreach pi ∈ P
             c ← pi.val.read()
             if c ≠ ⊥ then C ← C ∪ {c}
         return C

  18. Weak object [Figure: processes P0 to P5; arrive(v1) and arrive(v2) are invoked, with v1 shown at P0 and v2 shown at P5; C = { }]

  19. Weak object [Figure: processes P0 to P5 with v1 at P0 and v2 at P5; a query() is shown with collected sets C = { }, C = {v1} and C = {v1, v2}]

  20. Weak object [Figure: two query operations, query_a and query_b, with collect results {a}, {a, b}, {a} and {b}]
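
The arrive, query and collect procedures of the weak object can be sketched as a sequential simulation. The sketch assumes that each SWMR field pi.val is an entry in a dictionary and that None plays the role of ⊥; the class name is hypothetical, and real executions interleave these steps across processes, which this single-threaded version does not model.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Optional


class WeakObject:
    def __init__(self, processes: Iterable[int]) -> None:
        # val[i] stands for pi.val; None stands for the initial value ⊥.
        self.val: Dict[int, Optional[Hashable]] = {p: None for p in processes}

    def collect(self) -> FrozenSet[Hashable]:
        """Read every process's field once and gather the non-⊥ values."""
        found = set()
        for p in self.val:
            c = self.val[p]
            if c is not None:
                found.add(c)
        return frozenset(found)

    def arrive(self, i: int, c: Hashable) -> str:
        """Process i writes c to its own field only if it sees nothing written yet."""
        if not self.collect():
            self.val[i] = c
        return "OK"

    def query(self) -> FrozenSet[Hashable]:
        """Return the empty set, or the result of a second collect."""
        first = self.collect()
        if not first:
            return frozenset()
        return self.collect()
```

In a purely sequential run only the first arrive writes, because every later arrive sees a non-empty collect. Several values can appear only when arrive operations overlap in time.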

  21. The algorithm
     operation read_i():
         pickNewTS_i ← FALSE
         newView ← Traverse(∅, ⊥)
         NotifyQ(newView)
         return v_i^max
     operation write_i(v):
         pickNewTS_i ← TRUE
         newView ← Traverse(∅, v)
         NotifyQ(newView)
         return OK
     operation reconfig_i(cng):
         pickNewTS_i ← FALSE
         newView ← Traverse(cng, ⊥)
         NotifyQ(newView)
         return OK
     procedure NotifyQ(w)
         if did not receive {NOTIFY, w} then send {NOTIFY, w} to w.members
         wait for {NOTIFY, w} from majority of w.members

  22. The algorithm
     procedure Traverse(cng, v)
         desiredView ← curView_i ∪ cng
         Front ← {curView_i}
         do
             s ← min{|w| : w ∈ Front}
             w ← any w ∈ Front s.t. |w| = s
             if (i ∉ w.members) then halt_i
             if w ≠ desiredView then
                 arrive_i(w, desiredView \ w)
             ChangeSets ← ReadInView(w)
             if ChangeSets ≠ ∅ then
                 Front ← Front \ {w}
                 foreach c ∈ ChangeSets
                     desiredView ← desiredView ∪ c
                     Front ← Front ∪ {w ∪ c}
             else ChangeSets ← WriteInView(w, v)
         while ChangeSets ≠ ∅
         curView_i ← desiredView
         return desiredView
     Traverse is used to look for the next view, considering all the changes suggested so far.

  23. The algorithm [Figure: Traverse starting from the Init view]

  24. The algorithm [Figure: example run of Traverse over a graph of views. Nodes: the Init view and V1 to V6, where V1 = InitView ∪ {(+,3), (+,5), (-,1), (+,4), (+,7)}. Edge labels include {(+,3)}, {(+,5)}, {(+,7)}, {(-,1), (+,4)}, {(+,3), (+,5)}, {(+,5), (-,1), (+,4)} and {(+,3), (-,1), (+,4)}. Legend: initial Front; Front after iterations 1, 4 and 6; edge returned from ReadInView; edge updated by Pi.]
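
The control flow of Traverse can be followed in a single-process simulation. The sketch below makes several simplifying assumptions: the per-view weak objects are replaced by a plain dictionary `proposals` that maps a view to the change sets proposed in it, ReadInView and WriteInView are reduced to their query step (no quorum communication and no register value), and halting is modelled by raising an exception. Every name is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Change = Tuple[str, int]          # ("+", pid) or ("-", pid)
View = FrozenSet[Change]
ChangeSet = FrozenSet[Change]


def members(w: View) -> Set[int]:
    joined = {pid for sign, pid in w if sign == "+"}
    removed = {pid for sign, pid in w if sign == "-"}
    return joined - removed


def traverse(cur_view: View, cng: ChangeSet,
             proposals: Dict[View, Set[ChangeSet]], me: int) -> View:
    desired = cur_view | cng
    front = {cur_view}
    while True:
        w = min(front, key=len)                    # a smallest view in Front
        if me not in members(w):
            raise RuntimeError("process was removed: halt")
        if w != desired and not proposals.get(w):
            proposals[w] = {desired - w}           # arrive: propose the missing changes
        change_sets = set(proposals.get(w, set()))  # query step of ReadInView
        if change_sets:
            front.remove(w)
            for c in change_sets:
                desired = desired | c
                front.add(w | c)
        else:
            # Query step of WriteInView; nothing can change in between here
            # because the simulation has a single process.
            change_sets = set(proposals.get(w, set()))
        if not change_sets:
            return desired                         # caller sets curView to this
```

If another process has already proposed adding process 4 in the initial view, a reconfig that removes process 1 ends up in a view that contains both changes, which mirrors how Traverse merges all changes suggested so far.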

  25. The algorithm
     procedure ReadInView(w)
         ChangeSets ← query_i(w)
         ContactQ(R, w.members)
         return ChangeSets
     procedure WriteInView(w, v)
         if pickNewTS_i then
             (pickNewTS_i, v_i^max, ts_i^max) ← (FALSE, v, (ts_i^max.num + 1, i))
         ContactQ(W, w.members)
         ChangeSets ← query_i(w)
         return ChangeSets
     Procedure ContactQ sends a write-request including v_i^max and ts_i^max when writing to a quorum, and a read-request when reading from a quorum.

  26. Established views
     The unique sequence of established views E is constructed as follows:
     • The first view in E is the initial view Init.
     • If w is in E, then the next view after w in E is w' = w ∪ c, where c is an element chosen arbitrarily from the intersection of all sets C ≠ ∅ returned by some query(w) operation in the execution.
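
The construction of the next established view can be sketched as follows, assuming the results of all query(w) operations are available as a list of sets of change sets. The function name is hypothetical, and the "arbitrary" choice is made deterministic here only so that the example is reproducible.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Change = Tuple[str, int]
View = FrozenSet[Change]
ChangeSet = FrozenSet[Change]


def next_established_view(w: View,
                          query_results: Iterable[Set[ChangeSet]]) -> Optional[View]:
    """Return w ∪ c for some c common to every non-empty query(w) result."""
    non_empty = [set(result) for result in query_results if result]
    if not non_empty:
        return None  # no query(w) returned a non-empty set, so w has no successor yet
    common = set.intersection(*non_empty)
    if not common:
        raise ValueError("results violate the non-empty common intersection property")
    c = min(common, key=sorted)  # any element of the intersection would do
    return w | c
```

The weak object's non-empty common intersection property is what guarantees that `common` contains at least one change set whenever some query returned a non-empty result.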

  27. Thank you
