RAMBO: Reconfigurable Atomic Memory for Dynamic Networks. Seth Gilbert, Nancy Lynch, Alexander Shvartsman. Presenter: Anastasia Braginsky (December 2013). Slides partially borrowed from Seth Gilbert (DSN ’03) and Edward Bortnikov (talk).
RAMBO name • Reconfigurable Atomic Memory for Basic Objects
Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service • Conclusions
Distributed Shared Memory [figure: clients issue Read, Write(7) and Write(0) operations on a shared memory]
Atomic Consistency (linearizability) • Definition: each operation appears to take effect at some point between its invocation and its response • Sufficient condition: for each object x, all the read and write operations for x can be partially ordered by $\prec$, so that: • No operation has infinitely many other operations ordered before it • $\prec$ is consistent with the order of invocations and responses: there are no operations $\pi_1$ and $\pi_2$ such that $\pi_1$ completes before $\pi_2$ starts, yet $\pi_2 \prec \pi_1$ • All write operations are ordered with respect to each other and with respect to all the reads • Every read returns the value of the last write preceding it in $\prec$ (or the initial value, if there is none)
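To make the sufficient condition concrete, here is a minimal Python sketch that checks a proposed *total* order (a special case of $\prec$) against a finite history of one object. The `Op` record, the timings, and the name `satisfies_conditions` are hypothetical. Because the history is finite, the "no infinitely many predecessors" condition holds trivially, so only the real-time and read-value conditions are checked.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Op:
    name: str
    kind: str       # "read" or "write"
    value: int      # value written, or value returned by the read
    invoke: float   # invocation time
    respond: float  # response time

def satisfies_conditions(order: List[Op], initial: int = 0) -> bool:
    """Check a proposed total order against the sufficient conditions for atomicity."""
    pos = {op.name: i for i, op in enumerate(order)}
    # The order must respect real time: if a completes before b starts, a precedes b.
    for a in order:
        for b in order:
            if a.respond < b.invoke and pos[b.name] < pos[a.name]:
                return False
    # Every read returns the value of the last preceding write (or the initial value).
    last_written = initial
    for op in order:
        if op.kind == "write":
            last_written = op.value
        elif op.value != last_written:
            return False
    return True

# Hypothetical timings for the figure's operations: Write(7) finishes before the
# Read starts, while Write(0) overlaps both of them.
w7 = Op("w7", "write", 7, invoke=0, respond=2)
w0 = Op("w0", "write", 0, invoke=1, respond=5)
rd = Op("rd", "read", 7, invoke=3, respond=4)
```

With these timings, `[w7, rd, w0]` and `[w0, w7, rd]` are acceptable orders, while `[w7, w0, rd]` fails (the read would have to return 0) and `[rd, w7, w0]` fails the real-time condition.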
[figure: Write(7) completes, then a Read starts and returns 7] • If operation A completes before operation B begins, then B reflects the results of A
Suggestions? • A central server? • Performance bottleneck • Single point of failure • So multiple servers need to replicate the content • And the system must not stop the world when a reconfiguration is needed • But then how do we find the latest value of a replicated object?
Distributed Networked System • All-to-all connectivity, but messages can be lost, delayed, or re-ordered • No global clock or synchronization mechanism - asynchrony • Nodes can fail • A distributed networked system can be static (fixed set of participating nodes) or dynamic
And if everything fails? • Memory access operations are guaranteed to terminate only under certain assumptions • Static: • A majority of replicas must remain active • Network delays are bounded • Dynamic: • A dynamically changing subset of replicas must remain active during certain periods • Otherwise… sorry… • Operations may not terminate
Quorums [figure: a Read and a Write(7) operation] (Dependable Systems and Networks 2003)
Dynamic Atomic Memory
Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service • Conclusions
Static Quorum Systems • Upfal and Wigderson (85) • First general scheme for emulating shared memory in a message-passing system • majority sets of readers and writers … • Attiya, Bar-Noy and Dolev (90/95) • Dijkstra Award in 2011 • Including extensions to the original algorithm [N. Lynch and A. Shvartsman. Robust emulation of shared memory using dynamic quorum-acknowledged broadcast. 1997]
A(ttiya) B(ar-Noy) D(olev) • The algorithm uses replication to achieve fault tolerance and availability • n nodes • The system tolerates the crash of any minority of the nodes, i.e., at most $\lceil n/2 \rceil - 1$ crashes
ABD for a single register • Each node i maintains a local copy of the register: $value_i$ and $tag_i = \langle seq, pid \rangle$ • Tags are compared lexicographically • Each new write is assigned a unique tag (the pid breaks ties) • Read and write operations have two phases • Query: collect information from the replicas • Propagate: send information to the replicas • In each phase, a message is sent to everyone, and the phase completes once a majority has responded
Consistency • Any two majorities have a non-empty intersection • So at least one node participates both in the Propagate phase of a previous operation and in the Query phase of this one • All writes are ordered by their tags
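The two-phase ABD register above can be sketched as an in-process Python simulation. This is an illustration under loud assumptions: there is no real network, the class and method names (`Replica`, `ABDRegister`, `_responders`) are hypothetical, and "the first majority to respond" is modeled as the first live majority in index order. It shows why the majority intersection preserves the latest value after crashes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Tag = Tuple[int, int]  # (seq, pid); Python compares tuples lexicographically

@dataclass
class Replica:
    value: int = 0
    tag: Tag = (0, 0)
    up: bool = True

    def update(self, tag: Tag, value: int) -> None:
        if tag > self.tag:  # adopt only strictly newer values
            self.tag, self.value = tag, value

class ABDRegister:
    """Simulation only: the first live majority stands in for the first majority to respond."""

    def __init__(self, n: int):
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _responders(self) -> List[Replica]:
        live = [r for r in self.replicas if r.up]
        if len(live) < self.majority:
            raise RuntimeError("no live majority: the operation would not terminate")
        return live[: self.majority]

    def _query(self) -> Tuple[Tag, int]:
        return max((r.tag, r.value) for r in self._responders())

    def _propagate(self, tag: Tag, value: int) -> None:
        for r in self._responders():
            r.update(tag, value)

    def write(self, pid: int, value: int) -> None:
        (seq, _), _ = self._query()             # phase 1: learn the highest tag
        self._propagate((seq + 1, pid), value)  # phase 2: install a larger tag

    def read(self) -> int:
        tag, value = self._query()              # phase 1: find the newest value
        self._propagate(tag, value)             # phase 2: write it back
        return value

reg = ABDRegister(5)
reg.write(pid=1, value=7)  # reaches replicas 0, 1, 2
reg.replicas[0].up = False
reg.replicas[1].up = False
print(reg.read())          # 7: the majority {2, 3, 4} still contains replica 2
```

The read's write-back in phase 2 is what prevents a later read from returning an older value than an earlier one.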
Waiting too long for a majority? • Use quorum systems • A quorum is a subset of nodes • Any two quorums intersect • A quorum can be much smaller than a majority • Majority-based implementations tolerate crashes of any minority • Quorum-based implementations require that all nodes of at least one quorum do not crash
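The size/fault-tolerance trade-off can be checked with a small Python sketch. The grid construction (nodes in a square grid, a quorum is one full row plus one full column) is a standard textbook example swapped in here for illustration; it is not taken from the slides, and the helper names are hypothetical.

```python
from itertools import combinations

def majority_quorums(nodes):
    """Every subset of size floor(n/2) + 1."""
    k = len(nodes) // 2 + 1
    return [frozenset(q) for q in combinations(nodes, k)]

def grid_quorums(side):
    """Nodes 0 .. side*side - 1 laid out row by row; a quorum is one full row
    plus one full column (size 2*side - 1)."""
    rows = [{r * side + c for c in range(side)} for r in range(side)]
    cols = [{r * side + c for r in range(side)} for c in range(side)]
    return [frozenset(row | col) for row in rows for col in cols]

def pairwise_intersect(quorums):
    return all(a & b for a, b in combinations(quorums, 2))

def available(quorums, crashed):
    """An operation can still complete if some quorum contains no crashed node."""
    return any(q.isdisjoint(crashed) for q in quorums)

grid = grid_quorums(4)                         # 16 nodes
print(pairwise_intersect(grid), len(grid[0]))  # True 7 (a majority of 16 is 9)
print(available(grid, {0, 4, 8, 12}))          # False: one crashed column hits every quorum
```

Grid quorums of 7 nodes beat majorities of 9, but crashing just the 4 nodes of one column makes every grid quorum unavailable, whereas majorities of 16 survive any 7 crashes.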
Consensus • A set of processes needs to agree on a value • Nodes propose several values for consideration • Any solution must satisfy: • Agreement: no two processes decide on different values • Validity: the decided value was proposed by some node • Termination: all correct processes eventually reach a decision • In an asynchronous system, no deterministic algorithm can guarantee consensus termination if even a single process may crash (the FLP impossibility result) • Paxos is an algorithm that implements consensus
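The three consensus properties can be phrased as checks over the outcome of one finished run. This Python sketch uses a hypothetical function name and input shape (`decisions[p]` is `None` when p never decided). Termination is a liveness property, so checking a finite run only tells us whether the correct processes have decided so far.

```python
from typing import Dict, Optional, Set

def check_consensus(proposals: Dict[int, str],
                    decisions: Dict[int, Optional[str]],
                    correct: Set[int]) -> Dict[str, bool]:
    """Check agreement, validity and termination on the outcome of one run."""
    decided = [v for v in decisions.values() if v is not None]
    return {
        "agreement": len(set(decided)) <= 1,                         # one decided value at most
        "validity": all(v in proposals.values() for v in decided),   # decided value was proposed
        "termination": all(decisions.get(p) is not None for p in correct),
    }

run = check_consensus(
    proposals={1: "a", 2: "b", 3: "c"},
    decisions={1: "b", 2: "b", 3: None},  # process 3 crashed before deciding
    correct={1, 2},
)
print(run)  # {'agreement': True, 'validity': True, 'termination': True}
```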
Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service
RAMBO: multi-reader, multi-writer • Short term: quorum-based replication – to provide fault tolerance • Read- and write-quorums are collected into configurations • Any quorum configuration can be installed at any time • Long term: reconfiguration – to cope with changing participants • Participants can join and fail
Rambo • Decouples read/write operations from reconfiguration • Read/write operations stay fast even if reconfiguration is slow • In a stable state (no reconfigurations) it is similar to the static two-phase ABD, but • extended to multi-writer registers • generalized to use quorum systems • New participants can join the service by contacting at least one existing participant
Quorums Reconfigurations • Performed concurrently with any ongoing reads and writes • Multiple reconfigurations can be in progress concurrently • Reconfiguration involves • Introduction of a new configuration • Garbage collection of obsolete configuration(s)
Rambo stabilization [figure: while the network is unstable, RAMBO may face frequent reconfiguration, clocks out of sync, and lost or delayed messages; after the network stabilizes, RAMBO stabilizes]
Three Sub-Protocols • Joiner • The Joiner is notified by a device that the device wants to join • The device provides its initial world view (the set of devices that this device believes have already joined) • The Joiner contacts this world and retrieves the information the new device needs in order to participate • Reader-Writer: executes read and write operations and garbage-collects old configurations • Recon: produces new configurations
Configuration map • Each participant maintains a configuration map, cmap, that stores the sequence of configurations • For node i, $cmap_i(k)$ is • configuration number k, if that configuration is active • or a notification that this configuration does not exist yet (⊥) • or a notification that this configuration has already been garbage-collected (±) • This sequence evolves as new configurations are introduced by Recon and as old configurations are garbage-collected
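CMAP Evolution [figure: successive states of a cmap; each trailing ". . ." stands for entries not yet defined] • . . . • c0 . . . • c0 c1 . . . • c0 c1 c2 . . . • ± c1 c2 . . . • ± ± c2 . . . • ± ± ± c3 . . .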
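A minimal Python sketch of a cmap that replays the evolution above. The class and method names are hypothetical, and the garbage-collection rule used here (retire configurations strictly in order, and only once the successor is known) is an assumption chosen to match the figure, not RAMBO's exact precondition.

```python
BOTTOM = "⊥"  # configuration k is not yet known at this node
GC = "±"      # configuration k has been garbage-collected

class CMap:
    def __init__(self):
        self.entries = []  # entries[k]: a configuration id, BOTTOM, or GC

    def get(self, k):
        return self.entries[k] if k < len(self.entries) else BOTTOM

    def install(self, k, config):
        """Record that configuration number k is `config`."""
        while len(self.entries) <= k:
            self.entries.append(BOTTOM)
        if self.entries[k] == BOTTOM:  # entries only move forward: ⊥ -> c -> ±
            self.entries[k] = config

    def garbage_collect(self, k):
        """Retire configuration k: only in order, and only once k + 1 is known."""
        if any(e != GC for e in self.entries[:k]):
            raise ValueError("earlier configurations must be retired first")
        if self.get(k + 1) == BOTTOM:
            raise ValueError("no successor configuration known yet")
        self.entries[k] = GC

    def active(self):
        return [e for e in self.entries if e not in (BOTTOM, GC)]

    def __str__(self):
        return " ".join(self.entries + ["..."])

steps = [("install", 0, "c0"), ("install", 1, "c1"), ("install", 2, "c2"),
         ("gc", 0), ("gc", 1), ("install", 3, "c3"), ("gc", 2)]
cm = CMap()
history = [str(cm)]
for op, *args in steps:
    if op == "install":
        cm.install(*args)
    else:
        cm.garbage_collect(*args)
    history.append(str(cm))
print("\n".join(history))
```

The printed states match the figure, with one extra intermediate state (`± ± c2 c3 ...`) between installing c3 and retiring c2.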
Reader-Writer • Each read or write executes in the context of one or more active configurations (must use all active configurations) • Reads and writes proceed concurrently with ongoing reconfigurations • Two phases • Query phase – information is retrieved from one (or more) read-quorums of all active configurations • Propagate phase – information is updated in one (or more) write-quorums of all active configurations • Garbage-Collection (GC) – removing old configurations • Notifying about old configuration(s) • Propagating information from old configuration to the next
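The rule that each phase must use *all* active configurations can be expressed as a completion test. In this Python sketch the function names and the choice of majority quorums are hypothetical; the same check applies to a query phase (with read-quorums) and a propagate phase (with write-quorums).

```python
from itertools import combinations

def majorities(members):
    """Hypothetical quorum choice: every majority of the members."""
    k = len(members) // 2 + 1
    return [frozenset(q) for q in combinations(sorted(members), k)]

def phase_complete(active_configs, quorums_of, responders):
    """A phase may end only when EVERY active configuration has some quorum
    whose members have all responded."""
    return all(
        any(q <= responders for q in quorums_of[c])
        for c in active_configs
    )

quorums_of = {"c1": majorities({1, 2, 3}), "c2": majorities({3, 4, 5})}
active = ["c1", "c2"]  # e.g. c2 installed, c1 not yet garbage-collected
print(phase_complete(active, quorums_of, {1, 2}))     # False: no quorum of c2 yet
print(phase_complete(active, quorums_of, {1, 3, 4}))  # True: node 3 counts for both
print(phase_complete(["c2"], quorums_of, {4, 5}))     # True once c1 has been retired
```

This is why garbage-collecting old configurations matters for performance: every still-active configuration adds quorums that each phase must wait for.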
RAMBO Assumptions • Assumptions regarding RAMBO behavior: • Each node regularly sends gossip messages to the other participants • The initial world views overlap sufficiently, so that every node that has joined the system soon becomes aware of every other node • Every configuration remains viable until sufficiently long after the next configuration is installed • Reconfigurations are not initiated too frequently
Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service
The system • A set of devices communicating via an all-to-all asynchronous message-passing network • I: a totally ordered set of device identifiers [figure: nodes 1 to 7; each one is a node, or participant]
The system • A set of devices communicating via an all-to-all asynchronous message-passing network • I: a totally ordered set of device identifiers • Nodes may fail by stopping (all components) without warning [figure: nodes 1 to 7, each running Joiner, Read-Write, and Recon components]
Shared Memory Read/Write Objects • X: set of object identifiers • For each object $x \in X$, $V_x$ is the set of values that x may take on • $(v_0)_x$ – the initial value of object x • $(i_0)_x$ – the initial creator of object x, i.e., the node that is initially responsible for object x (this responsibility can be delegated) • $T = \mathbb{N} \times I$: the set of tags, used to order the values written to the system
Configurations • C: set of configuration identifiers • Each identifier $c \in C$ is associated with a unique configuration consisting of: • members(c) – a finite subset of I • read-quorums(c) – a set of finite subsets of members(c) • write-quorums(c) – a set of finite subsets of members(c) • For every $c \in C$, every $R \in$ read-quorums(c), and every $W \in$ write-quorums(c): $R \cap W \neq \emptyset$
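The configuration conditions translate directly into a validity check. This Python sketch is hypothetical (the names `Configuration`, `is_valid`, `threshold_config` are not RAMBO's); `threshold_config` uses the familiar "all r-subsets for reads, all w-subsets for writes" construction, which meets the $R \cap W \neq \emptyset$ requirement exactly when $r + w > n$.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Configuration:
    members: FrozenSet[int]
    read_quorums: Tuple[FrozenSet[int], ...]
    write_quorums: Tuple[FrozenSet[int], ...]

    def is_valid(self) -> bool:
        # Every quorum must be drawn from members(c) ...
        if not all(q <= self.members for q in self.read_quorums + self.write_quorums):
            return False
        # ... and every read-quorum must intersect every write-quorum.
        return all(r & w for r in self.read_quorums for w in self.write_quorums)

def threshold_config(members, r, w):
    """Hypothetical helper: read-quorums are all r-subsets, write-quorums all w-subsets."""
    ms = sorted(members)
    return Configuration(
        members=frozenset(ms),
        read_quorums=tuple(frozenset(q) for q in combinations(ms, r)),
        write_quorums=tuple(frozenset(q) for q in combinations(ms, w)),
    )

print(threshold_config({1, 2, 3, 4}, r=2, w=3).is_valid())  # True:  2 + 3 > 4
print(threshold_config({1, 2, 3, 4}, r=2, w=2).is_valid())  # False: {1, 2} and {3, 4} are disjoint
```

Note that, as on the slide, only read/write intersection is required; write-quorums need not intersect each other, because each write first queries a read-quorum to choose its tag.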
RAMBO API • Domains: I = set of nodes, V = set of values, C = set of configurations • Inputs and outputs are all asynchronous, per node $i \in I$ and object $x \in X$ • Input (request): Join(J) // J – initial world view; Read; Write(v); Recon(c, c’) // reconfiguration request; Fail • Output (response): Join-ack; Read-ack(v); Write-ack; Recon-ack // request has been processed; Report(c) // new configuration
Requests’ Well-Formedness • No requests after fail • Each client issues at most one join request and waits for its acknowledgment before issuing any further requests • Before issuing a new read/write/recon, a client waits for the acknowledgment of the previous one • Each client issues at most one recon(*, c) request for a given c (configuration identifiers are unique) • A client can request reconfiguration from c to c’ only if c was installed and all members of c’ have already joined
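The request rules can be checked mechanically over one client's event trace. This Python sketch is hypothetical: the event encoding (tuples such as `("recon", "c0", "c1")`) and the name `first_violation` are illustrative, "installed" is approximated by "the client has seen report(c)", and the rule that all members of c’ have already joined is omitted because it needs global knowledge.

```python
def first_violation(trace):
    """Return the index of the first request in `trace` (one client's events,
    in order) that breaks the well-formedness rules, or None."""
    failed = join_requested = joined = False
    pending = None          # outstanding read/write/recon still awaiting its ack
    installed = set()       # configurations this client has seen reported
    recon_targets = set()   # c' values already used in recon(*, c')
    for i, (event, *args) in enumerate(trace):
        if failed and event in ("join", "read", "write", "recon"):
            return i                            # no requests after fail
        if event == "fail":
            failed = True
        elif event == "join":
            if join_requested:
                return i                        # at most one join request
            join_requested = True
        elif event == "join-ack":
            joined = True
        elif event == "report":
            installed.add(args[0])
        elif event in ("read", "write", "recon"):
            if not joined or pending is not None:
                return i                        # must wait for the previous ack
            if event == "recon":
                old, new = args
                if old not in installed or new in recon_targets:
                    return i                    # c not installed, or c' reused
                recon_targets.add(new)
            pending = event
        elif event.endswith("-ack") and event[:-4] == pending:
            pending = None                      # the matching ack frees the client
    return None

ok = [("join",), ("join-ack",), ("report", "c0"), ("write", 7), ("write-ack",),
      ("read",), ("read-ack", 7), ("recon", "c0", "c1"), ("recon-ack",), ("fail",)]
print(first_violation(ok))                                               # None
print(first_violation([("join",), ("join-ack",), ("read",), ("read",)]))  # 3
```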
Responses’ Well-Formedness • No responses after fail • Responses come only in reply to requests
Reconfiguration service API • Domains: I = set of nodes, V = set of values, C = set of configurations • Inputs and outputs are all asynchronous, per node $i \in I$ and object $x \in X$ • Input (request): Join; Recon(c, c’); Request-config(k) // the client has learned of every configuration preceding k; Fail • Output (response): Join-ack; Recon-ack; New-config(c, k) // the kth configuration has been agreed upon; Report(c)
Recon Service Specification • Recon • Chooses configurations • Tells members of the previous and new configuration. • Informs Reader-Writer components (new-config). • Behavior (assuming well-formedness): • Agreement: Two configs never assigned to same k. • Validity: Any announced new-config was previously requested by someone. • No duplication: No configuration is assigned to more than one k.
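The three safety properties of the Recon service can be checked over a global trace of requests and announcements. This Python sketch is hypothetical: the tuple event format, the name `check_recon_trace`, and treating the initial configuration `"c0"` as configuration 0 without a request are assumptions made for illustration.

```python
def check_recon_trace(trace, initial_config="c0"):
    """trace: global, time-ordered events
         ("recon", node, c, c_new)    a client requested reconfiguration c -> c_new
         ("new-config", node, c, k)   Recon announced c as configuration k at node
    Returns which of the three safety properties hold on this trace."""
    requested = {initial_config}        # the initial configuration needs no request
    config_at = {0: initial_config}     # k -> configuration announced for k
    index_of = {initial_config: 0}      # configuration -> the k it was assigned
    ok = {"agreement": True, "validity": True, "no_duplication": True}
    for event in trace:
        if event[0] == "recon":
            requested.add(event[3])
        elif event[0] == "new-config":
            _, _node, c, k = event
            if c not in requested:                # announced but never requested
                ok["validity"] = False
            if config_at.setdefault(k, c) != c:   # two configs for the same k
                ok["agreement"] = False
            if index_of.setdefault(c, k) != k:    # one config for two k's
                ok["no_duplication"] = False
    return ok

good = [("recon", 1, "c0", "c1"), ("new-config", 1, "c1", 1), ("new-config", 2, "c1", 1)]
print(check_recon_trace(good))  # {'agreement': True, 'validity': True, 'no_duplication': True}
```

In RAMBO, agreement is the hard property: it is what the consensus (Paxos-based) implementation of Recon provides.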
Outline • Introduction • Background • Static Quorum Systems • Consensus • RAMBO high level overview • Preliminaries • The RAMBO algorithm • The reconfiguration service
Suppress explicit mention of x • The shared memory is described as the composition of a separate implementation for each object $x \in X$ • $V$, $v_0$, $c_0$, and $i_0$ are used as shorthand for $V_x$, $(v_0)_x$, $(c_0)_x$, and $(i_0)_x$
Joiner automata state • status $\in$ {idle, joining, active, failed}, initially idle • others-status: a mapping from {Recon, Reader-Writer} to {idle, joining, active}, initially idle everywhere • initial-world (iw) $\subseteq I$, initially $\emptyset$
Joiner automata [figure, in three steps: the client issues Join(J); the Joiner then sends join messages to the nodes in J, hoping that at least one will answer]
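A minimal Python sketch of the Joiner state and the join steps shown above. The state variables follow the slide; the method names, the `outbox` list standing in for the network, and collapsing the per-component join outputs into one method are assumptions made for illustration, not RAMBO's exact automaton.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

COMPONENTS = ("recon", "rw")  # the local Recon and Reader-Writer components

@dataclass
class Joiner:
    me: int
    status: str = "idle"  # idle | joining | active | failed
    others_status: Dict[str, str] = field(
        default_factory=lambda: {c: "idle" for c in COMPONENTS})
    initial_world: Set[int] = field(default_factory=set)
    outbox: List[Tuple[int, str]] = field(default_factory=list)  # (destination, message)

    def join(self, world):
        if self.status != "idle":
            return                             # only the first join request has an effect
        self.status = "joining"
        self.initial_world = set(world)
        for comp in COMPONENTS:                # ask the local components to join
            self.others_status[comp] = "joining"
        for j in sorted(self.initial_world - {self.me}):
            self.outbox.append((j, "join"))    # hope at least one of them answers

    def component_joined(self, comp):
        """Local component `comp` reports join-ack; returns True when the
        node's own join-ack can be output."""
        if self.status != "joining":
            return False
        self.others_status[comp] = "active"
        if all(s == "active" for s in self.others_status.values()):
            self.status = "active"
            return True
        return False

    def fail(self):
        self.status = "failed"

node = Joiner(me=4)
node.join({1, 2, 4, 7})
print(node.outbox)  # [(1, 'join'), (2, 'join'), (7, 'join')]
```

The node becomes active only after both local components have finished joining, which mirrors the requirement that a client waits for join-ack before issuing any other request.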