
RAMBO: Reconfigurable Atomic Memory for Dynamic Networks

Seth Gilbert, Nancy Lynch, Alexander Shvartsman. Presenter: Anastasia Braginsky (December 2013). Slides partially borrowed from Seth Gilbert (DSN ’03) and Edward Bortnikov (talk).


Presentation Transcript


  1. RAMBO: Reconfigurable Atomic Memory for Dynamic Networks • Seth Gilbert, Nancy Lynch, Alexander Shvartsman • Presenter: Anastasia Braginsky (December 2013) • Slides partially borrowed from Seth Gilbert (DSN ’03) and Edward Bortnikov (talk)

  2. RAMBO name • Reconfigurable Atomic Memory for Basic Objects

  3. Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service • Conclusions

  4. Distributed Shared Memory • (Figure: clients issuing Read, Write(7), and Write(0))

  5. Atomic Consistency (linearizability) • Definition: Each operation appears to occur at some point between its invocation and response • Sufficient condition: For each object x, all the read and write operations for x can be partially ordered by ≺, so that: • No operation has infinitely many other operations ordered before it • ≺ is consistent with the order of invocations and responses: there are no operations π1, π2 such that π1 completes before π2 starts, yet π2 ≺ π1 • All write operations are ordered with respect to each other and with respect to all the reads • Every read returns the value of the last write preceding it in ≺
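
The sufficient condition on slide 5 can be checked mechanically for a small history. The sketch below is only an illustration: it assumes a candidate order is already given as a list (a total order, which is a special case of the partial order on the slide), and the `Op` record and `respects_conditions` name are hypothetical, not part of RAMBO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str      # "read" or "write"
    value: int     # value written, or value returned by the read
    start: float   # invocation time
    end: float     # response time


def respects_conditions(order, initial=0):
    """Check a candidate order (list of Op, earliest first) against slide 5."""
    # Consistency with invocations and responses: an operation ordered later
    # must not have completed before an operation ordered earlier started.
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later.end < earlier.start:
                return False
    # Every read returns the value of the last write ordered before it
    # (or the initial value if no write precedes it).
    last = initial
    for op in order:
        if op.kind == "write":
            last = op.value
        elif op.value != last:
            return False
    return True


w = Op("write", 7, 0.0, 2.0)
r_ok = Op("read", 7, 3.0, 4.0)    # starts after the write completed, sees 7
r_stale = Op("read", 0, 3.0, 4.0)  # starts after the write completed, sees 0
```

Here `[w, r_ok]` passes, while no ordering of `w` and `r_stale` can pass: placing the read first contradicts real time, and placing it second contradicts the value it returned.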

  6. If op A completes before op B begins, then B returns the result of A • (Figure: Write(7) followed by a Read that returns 7; clients issuing Read, Write(7), and Write(0))

  7. Suggestions? • Central server? • Performance bottleneck • Single point of failure • So multiple servers need to replicate the content • And must not stop the world when some reconfiguration is needed • But then how do we find the latest value of a replicated object?

  8. Distributed Networked System • All-to-all connectivity, but messages can be lost, delayed, or re-ordered • No global clock or synchronization mechanism - asynchrony • Nodes can fail • A distributed networked system can be static (fixed set of participating nodes) or dynamic

  9. And if everything fails? • Memory access operations are guaranteed to terminate only under certain assumptions • Static: • A majority of replicas needs to be active • Network delays are bounded • Dynamic: • A dynamically changing subset of replicas needs to be active during certain periods • Otherwise… Sorry… • Operations may not terminate

  10. Quorums Read Write(7) Dependable Systems and Networks 2003

  11. Dynamic Atomic Memory

  12. Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service • Conclusions

  13. Static Quorum Systems • Upfal and Wigderson (85) • First general scheme for emulating shared memory in a message-passing system • Majority sets of readers and writers … • Attiya, Bar-Noy and Dolev (90/95) • Dijkstra Award in 2011 • Including extensions to the original algorithm [N. Lynch and A. Shvartsman. Robust emulation of shared memory using dynamic quorum-acknowledged broadcast. 1997]

  14. A(ttiya) B(ar-Noy) D(olev) • Algorithm uses replication to achieve fault-tolerance and availability • n nodes • The system tolerates the crash of any minority of nodes: at most ⌈n/2⌉ − 1 crashes

  15. ABD for a single register • Each node i maintains the local value of the register • value_i and tag_i = <seq, pid> • Tags are compared lexicographically • Each new write assigns a unique tag (pid to break ties) • Read and write operations have two phases • Query replicas for information • Propagate information to replicas • Send to everyone, wait for a majority to respond
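
The two phases on slide 15 can be sketched as a small in-memory simulation. This is a rough illustration under stated assumptions, not the authors' code: there is no real network, the hypothetical `majority` helper stands in for "send to everyone, wait for a majority to respond" by picking an arbitrary majority, and `Replica`, `read`, and `write` are made-up names.

```python
import random


class Replica:
    def __init__(self):
        self.tag = (0, 0)    # (seq, pid); Python compares tuples lexicographically
        self.value = None

    def query(self):
        return self.tag, self.value

    def update(self, tag, value):
        if tag > self.tag:   # adopt only strictly newer information
            self.tag, self.value = tag, value


def majority(replicas):
    """Assumption: models 'whichever majority answered first' as a random one."""
    return random.sample(replicas, len(replicas) // 2 + 1)


def write(replicas, pid, value):
    # Query phase: learn the highest tag held by some majority.
    highest_tag = max(r.query()[0] for r in majority(replicas))
    tag = (highest_tag[0] + 1, pid)   # unique tag, pid breaks ties
    # Propagate phase: store the new value at some majority.
    for r in majority(replicas):
        r.update(tag, value)


def read(replicas):
    # Query phase: pick the value with the highest tag among a majority.
    tag, value = max((r.query() for r in majority(replicas)), key=lambda tv: tv[0])
    # Propagate phase: write it back so later reads cannot see older data.
    for r in majority(replicas):
        r.update(tag, value)
    return value


replicas = [Replica() for _ in range(5)]
write(replicas, pid=1, value=7)
latest = read(replicas)
```

Because any two majorities of the five replicas share at least one node, the read's query phase always meets a replica that saw the write's propagate phase, whichever majorities happen to be chosen.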

  16. Read: Phase I

  17. Read: Phase I

  18. Read: Phase II

  19. Read: Phase II

  20. Consistency • Two majorities have non-empty intersection • There is at least one node participating in Propagation phase of previous operation and in Query phase of this one • All writes ordered by their tags

  21. Too long waiting for the majority? • Use quorum systems • Quorum is a subset of nodes • Any two quorums intersect • The size of the set can be much less than the majority • The majority-based implementations tolerate crashes of any minority • The quorum-based implementations require that the nodes in at least one quorum do not crash
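
To make the size claim on slide 21 concrete, here is a small sketch using a grid quorum system. The grid construction is an assumption chosen for illustration (the slides do not name a specific quorum system), and the function names are hypothetical.

```python
from itertools import combinations


def grid_quorums(side):
    """Quorums over side*side nodes: one full row plus one full column."""
    quorums = []
    for r in range(side):
        for c in range(side):
            row = {r * side + j for j in range(side)}
            col = {i * side + c for i in range(side)}
            quorums.append(frozenset(row | col))
    return quorums


def pairwise_intersecting(quorums):
    """Any two quorums must share at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))


quorums = grid_quorums(5)          # 25 nodes
quorum_size = len(quorums[0])      # one row + one column = 2 * 5 - 1 nodes
majority_size = 25 // 2 + 1        # smallest majority of 25 nodes
```

With 25 nodes each grid quorum has 9 members while a majority needs 13, and the row of one quorum always crosses the column of another, so the intersection property still holds. The trade-off from the slide also shows up: the system makes progress only while every node of at least one quorum is alive.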

  22. Consensus • A set of processes needs to agree on a value • Nodes propose several values for consideration • Any solution must satisfy: • Agreement: no two processes decide on different values • Validity: the value decided was proposed by some node • Termination: all correct processes reach a decision • In an asynchronous system, consensus termination cannot be guaranteed in the presence of even a single process crash • Paxos is an implementation of consensus

  23. Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service

  24. RAMBO multi-reader, multi-writer • Short term: Quorum-based replication, to provide fault tolerance • Read- and write-quorums collected into configurations • Any quorum configuration can be installed at any time • Long term: Reconfiguration, to cope with changing participants • Participants can join and fail

  25. Rambo • Decouple read/write ops and reconfiguration • Fast read/write ops, even if Recon is slow • A stable state (no reconfigurations) is similar to the static two-phase ABD, but • Extended for multi-writer registers • Generalized to use quorum systems • New participants can join the service by contacting at least one existing participant

  26. Quorum Reconfigurations • Performed concurrently with any ongoing reads and writes • Multiple reconfigurations can be in progress concurrently • Reconfiguration involves • Introduction of a new configuration • Garbage collection of obsolete configuration(s)

  27. Rambo stabilization • Frequent reconfiguration? • Clocks out of sync? • Messages lost? • Messages delayed? • (Figure: timeline with the labels "Network stabilizes" and "Rambo stabilizes")

  28. Three Sub-Protocols • Joiner • Joiner is notified by a device that the device wants to join • The device provides the initial world view (the set of devices that this device thinks have already joined) • Joiner contacts this world and retrieves the information necessary for the new device to participate • Reader-Writer: Executes read/write operations and garbage-collects old configurations • Recon: Produces new configurations

  29. Configuration map • Each participant maintains a configuration map (cmap) to store the sequence of configurations • For node i, cmap_i(k) is • configuration number k, if that configuration is active • or a notification that this configuration doesn’t yet exist • or a notification that this configuration was already garbage collected • This sequence evolves as new configurations are introduced by Recon and as old configurations are garbage collected

  30. CMAP Evolution (⊥ = configuration not yet defined, ± = configuration garbage collected)
      c0  ⊥   ⊥   ⊥   ⊥   ⊥   ⊥   . . .
      c0  c1  ⊥   ⊥   ⊥   ⊥   ⊥   . . .
      c0  c1  c2  ⊥   ⊥   ⊥   ⊥   . . .
      ±   c1  c2  ⊥   ⊥   ⊥   ⊥   . . .
      ±   ±   c2  ⊥   ⊥   ⊥   ⊥   . . .
      ±   ±   ±   c3  ⊥   ⊥   ⊥   . . .
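
The three kinds of cmap entry and their evolution can be sketched as a small merge function. This is an illustrative sketch, assuming that nodes combine their maps when they exchange messages and that an entry only ever moves forward (not yet defined, then a configuration, then garbage collected); the constants and function names are hypothetical.

```python
UNDEFINED = "undefined"   # configuration k does not exist yet
GC = "gc"                 # configuration k was already garbage collected


def merge_entry(mine, theirs):
    """Keep whichever entry is further along: undefined < configuration < gc."""
    if mine == GC or theirs == GC:
        return GC
    return theirs if mine == UNDEFINED else mine


def merge_cmap(mine, theirs):
    """Combine two configuration maps index by index."""
    keys = sorted(set(mine) | set(theirs))
    return {k: merge_entry(mine.get(k, UNDEFINED), theirs.get(k, UNDEFINED))
            for k in keys}


def active(cmap):
    """Configurations that reads and writes currently have to use."""
    return [c for c in cmap.values() if c not in (UNDEFINED, GC)]


node_a = {0: "c0", 1: "c1"}
node_b = {0: GC, 1: "c1", 2: "c2"}
merged = merge_cmap(node_a, node_b)
```

After the merge, node A learns both that configuration 0 was garbage collected and that configuration 2 exists, so its active configurations become c1 and c2.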

  31. Reader-Writer • Each read or write executes in the context of one or more active configurations (must use all active configurations) • Reads and writes proceed concurrently with ongoing reconfigurations • Two phases • Query phase – information is retrieved from one (or more) read-quorums of all active configurations • Propagate phase – information is updated in one (or more) write-quorums of all active configurations • Garbage-Collection (GC) – removing old configurations • Notifying about old configuration(s) • Propagating information from old configuration to the next
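
The rule that a phase must reach a quorum in all active configurations can be sketched as a single predicate. This is a minimal illustration, assuming each configuration is represented as a dictionary of quorum lists; the representation and the `phase_done` name are hypothetical.

```python
def phase_done(responders, active_configs, kind):
    """True once `responders` covers some quorum of the given kind
    ("read" for the query phase, "write" for the propagate phase)
    in every active configuration."""
    return all(
        any(quorum <= responders for quorum in config[kind])
        for config in active_configs
    )


c1 = {"read": [{1, 2}, {2, 3}], "write": [{1, 2, 3}]}
c2 = {"read": [{4, 5}], "write": [{4, 5}]}
```

With both `c1` and `c2` active, responses from nodes 1 and 2 alone do not finish a query phase, because no read-quorum of `c2` has answered yet; responses from 1, 2, 4, and 5 do.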

  32. RAMBO Assumptions • Assumptions regarding RAMBO behavior: • Each node regularly sends gossip messages to the participants • The initial world views overlap sufficiently, so that every node that has joined the system becomes aware of every other node soon enough • Every configuration remains viable until sufficiently long after the next new configuration is installed • Reconfigurations are not initiated too frequently

  33. Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service

  34. The system • Set of devices communicating via all-to-all asynchronous message-passing network • I : totally ordered set of device identifiers • (Figure: seven devices numbered 1 to 7, each labeled as a node or a participant)

  35. The system • Set of devices communicating via all-to-all asynchronous message-passing network • I : totally ordered set of device identifiers • Nodes may fail by stopping (all components) without warning • (Figure: each of the seven nodes runs three components: Joiner, Read-Write, and Recon)

  36. Shared Memory Read/Write Objects • X : set of object identifiers • For each object x ∈ X, Vx is the set of values that x may take on • (v0)x – the initial value of object x • (i0)x – the initial creator of object x, the node that is initially responsible for object x (this responsibility can be delegated) • T = N × I : set of tags, used to order the values written to the system

  37. Configurations • C : set of configuration identifiers • Each identifier c ∈ C is associated with a unique configuration consisting of: • members(c) – a finite subset of I • read-quorums(c) – a set of finite subsets of members(c) • write-quorums(c) – a set of finite subsets of members(c) • For every c ∈ C, for every R ∈ read-quorums(c), and for every W ∈ write-quorums(c): R ∩ W ≠ ∅
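
The definition of a configuration maps directly onto a small record with a validity check. The sketch below is only an illustration; the `Configuration` class and `is_valid` method are hypothetical names, and node identifiers are plain integers by assumption.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Configuration:
    members: frozenset
    read_quorums: tuple    # tuple of frozensets of members
    write_quorums: tuple   # tuple of frozensets of members

    def is_valid(self):
        # Every quorum must be a subset of members(c).
        inside = all(q <= self.members
                     for q in self.read_quorums + self.write_quorums)
        # Every read-quorum must intersect every write-quorum.
        intersect = all(r & w for r, w in
                        product(self.read_quorums, self.write_quorums))
        return inside and intersect


read_one_write_all = Configuration(
    members=frozenset({1, 2, 3}),
    read_quorums=(frozenset({1}), frozenset({2}), frozenset({3})),
    write_quorums=(frozenset({1, 2, 3}),),
)
broken = Configuration(
    members=frozenset({1, 2, 3}),
    read_quorums=(frozenset({1}),),
    write_quorums=(frozenset({2, 3}),),
)
```

`read_one_write_all` satisfies the intersection requirement even though two read-quorums never intersect each other, which the definition allows; `broken` fails because a reader contacting node 1 could miss a write stored only at nodes 2 and 3.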

  38. RAMBO API • Domains: • I = set of nodes • V = set of values • C = set of configurations • Inputs and outputs are all asynchronous, per node i ∈ I and object x ∈ X • Input (request): • Join(J) (J is the initial world view) • Read • Write(v) • Recon(c, c’) (reconfiguration request) • Fail • Output (response): • Join-ack • Read-ack(v) • Write-ack • Recon-ack (the request has been processed) • Report(c) (new configuration)

  39. Requests’ Well-Formedness • No requests after fail • Each client issues at most one join request and waits for acknowledgement before any further requests • Before issuing a new read/write/recon wait for previous acknowledgment • Each client issues at most one recon(*,c) request (configuration identifiers are unique) • Client can request reconfiguration from c to c’ only if c was installed and all members of c’ have already joined

  40. Responses’ Well-Formedness • No responses after fail • Responses come only in reply to requests

  41. Reconfiguration service API • Domains: • I = set of nodes • V = set of values • C = set of configurations • Inputs and outputs are all asynchronous, per node i ∈ I and object x ∈ X • Input (request): • Join • Recon(c, c’) • Request-config(k) (the client has learned of every configuration preceding k) • Fail • Output (response): • Join-ack • Recon-ack • New-config(c, k) (the k-th configuration has been agreed upon) • Report(c)

  42. Recon Service Specification • Recon • Chooses configurations • Tells members of the previous and new configuration. • Informs Reader-Writer components (new-config). • Behavior (assuming well-formedness): • Agreement: Two configs never assigned to same k. • Validity: Any announced new-config was previously requested by someone. • No duplication: No configuration is assigned to more than one k.
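
The three behavior properties on slide 42 can be phrased as a check over a recorded trace. This is a hedged sketch: the trace format (a set of requested configuration identifiers plus a list of announced `(c, k)` pairs) and the function name are assumptions made for illustration, and the initial configuration would have to be included in `requested` for the validity check to accept it.

```python
def check_recon_trace(requested, announced):
    """requested: configuration ids that appeared in some recon(*, c) request.
    announced: (c, k) pairs taken from new-config(c, k) outputs at any node."""
    by_index, by_config = {}, {}
    for c, k in announced:
        # Agreement: two different configurations are never assigned to one k.
        if by_index.setdefault(k, c) != c:
            return "agreement violated"
        # No duplication: a configuration is never assigned to two indices.
        if by_config.setdefault(c, k) != k:
            return "no-duplication violated"
        # Validity: every announced configuration was requested by someone.
        if c not in requested:
            return "validity violated"
    return "ok"


good = check_recon_trace({"c1", "c2"}, [("c1", 1), ("c2", 2), ("c1", 1)])
```

The same `(c, k)` pair may legitimately show up more than once in the trace, since the announcement is delivered at several nodes.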

  43. Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service

  44. Suppress explicit mention of x • The shared memory is described as the composition of a separate implementation for each object x ∈ X • V, v0, c0, and i0 are used as shorthand for Vx, (v0)x, (c0)x, and (i0)x

  45. Joiner automata state • status ∈ {idle, joining, active, failed}, initially idle • others-status, a mapping from Recon and Reader-Writer to {idle, joining, active}, initially everywhere idle • initial-world (iw) ⊆ I, initially ∅
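
The Joiner state on slide 45 can be written down as a small record. The state fields follow the slide; the two transition methods are an assumption about how the state is driven (a join request moves everything to joining, and the node becomes active once both local components report that they have joined), and all names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IDLE = "idle"
    JOINING = "joining"
    ACTIVE = "active"
    FAILED = "failed"


def _components_idle():
    return {"recon": Status.IDLE, "reader-writer": Status.IDLE}


@dataclass
class JoinerState:
    status: Status = Status.IDLE
    others_status: dict = field(default_factory=_components_idle)
    initial_world: frozenset = frozenset()

    def join(self, world):
        """Assumed handling of Join(J): remember J and start joining."""
        if self.status is Status.IDLE:
            self.status = Status.JOINING
            self.initial_world = frozenset(world)
            for name in list(self.others_status):
                self.others_status[name] = Status.JOINING

    def component_joined(self, name):
        """Assumed handling of a local component reporting that it joined."""
        if self.status is Status.FAILED:
            return
        self.others_status[name] = Status.ACTIVE
        if self.status is Status.JOINING and all(
            s is Status.ACTIVE for s in self.others_status.values()
        ):
            self.status = Status.ACTIVE   # the point where Join-ack would be sent


state = JoinerState()
state.join({1, 2})
state.component_joined("recon")
state.component_joined("reader-writer")
```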

  46. Join(J) Joiner automata

  47. Hope at least one will answer… Join(J) join Joiner automata

  48. Join(J) join join join Joiner automata

  49. Joiner automata
