Failure detectors, shared memory / message passing. Hugues Fauconnier, LIAFA, Université Denis Diderot.
Plan • Introduction • Goals and context • Objects and shared memory • Shared memory • Linearizability • Wait-free implementation • Universality of consensus • Message passing • Failure detectors • Shared memory implementation • Shared object implementation • Consensus hierarchy and failure detectors • Conclusion(s)
Introduction and context • Possible vs. impossible (FLP) • Shared memory vs. communication by message exchange • Shared objects: • Comparison and hierarchy: • is a test-and-set more powerful than a compare-and-swap? • Towards transactions
Introduction… • Failure detectors: • Minimal detector and comparison (necessary and sufficient knowledge about failures) • hierarchy of problems • Consensus • Agreement on a value • Registers • Mutual exclusion • The weakest of the weakest • k-set consensus (agreement on at most k values)
Shared memory • Set of processes p1, …, pn • (process = sequential thread) • Processes are asynchronous • a step can take an arbitrary (finite) time • Processes communicate through shared data structures (objects) • examples: shared memory, test-and-set, queue, …
Objects: • an object is defined by its type • e.g.: the type of R is atomic register • the type of the object defines a set of possible states and a set of primitive operations • e.g.: the state of the register is the value stored, the primitives are read() and write(v) • processes access objects through primitive operations
Objects: • we consider here only atomic objects • a sequential specification defines the behavior of the object (a transition system) • linearizability (=atomicity) • operations of concurrent processes may overlap, but each operation appears to take effect instantaneously between its invocation and its response: • the operation appears to be atomic • crashes: • if a process crashes between an invocation and the corresponding response the operation completes or aborts • every invocation by correct processes terminates
Example: atomic register • States: the value stored (the register has an initial value) • Operations: read() and write(v) • Sequential specification: • read() returns the value stored • write(v) changes the state of the register (the new state is v) • Linearizability: the time interval between the invocation and the response of each operation can be reduced to a point such that the resulting sequential history of reads and writes satisfies the specification
Atomic register • With only one writer linearizability is here equivalent to: • a read returns the last value written • if a read is concurrent with a write the read returns either the previous written value or the value of a concurrent write() • if a read operation r precedes another read operation r' then r' cannot return a value written before the one returned by r • can be generalized to multi-writer atomic registers
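[Figure: two example histories on a register initialized to 0. First history: Write 1, Write 0, and a Read returning 0. Second history: Write 1, Write 0, and a Read returning 1. Slide label: Linearizable.]
[Figure: Linearizable? History on a register initialized to 0: Write 1, then a Read returning 1 followed by a Read returning 0. Answer: impossible.]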
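The examples above can be checked mechanically on small histories. The following is a minimal brute-force sketch, not an efficient checker: it tries every total order of the operations and keeps those that respect real-time precedence and the sequential specification of a register. The `Op` record and the concrete time intervals are assumptions made for illustration, since the timing of the original figures is not recoverable.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    start: float   # invocation time (assumed for illustration)
    end: float     # response time (assumed for illustration)
    kind: str      # "write" or "read"
    value: int     # value written, or value returned by the read


def respects_real_time(order):
    # If a completed before b was invoked, a must not be placed after b.
    for i, a in enumerate(order):
        for b in order[:i]:
            if a.end < b.start:
                return False
    return True


def legal_register_run(order, initial):
    # Sequential specification: a read returns the last value written.
    state = initial
    for op in order:
        if op.kind == "write":
            state = op.value
        elif op.value != state:
            return False
    return True


def is_linearizable(history, initial=0):
    # Exponential search: only meant for histories with a handful of operations.
    return any(
        respects_real_time(order) and legal_register_run(order, initial)
        for order in permutations(history)
    )


# A read overlapping both writes may return the value of a concurrent write.
overlapping = [Op(0, 2, "write", 1), Op(3, 5, "write", 0), Op(1, 4, "read", 1)]
# A read returning 1 followed by a read returning the older value 0.
inverted = [Op(0, 10, "write", 1), Op(1, 2, "read", 1), Op(3, 4, "read", 0)]
```

With these assumed intervals, `is_linearizable(overlapping)` holds while `is_linearizable(inverted)` does not, which matches the third single-writer condition: a later read cannot return a value older than the one returned by an earlier read.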
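Another example • consensus: sequential specification (transition system) • from the initial state, the first propose(0) leads to state 0 and the first propose(1) leads to state 1 • in state 0 every propose(*) returns decide(0); in state 1 every propose(*) returns decide(1)
Another example • RMW (read-modify-write), executed atomically: RMW(r: register, f: function) returns value: previous := r; r := f(r); return previous • from RMW we get test-and-set, swap, compare-and-swap.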
Implementation • Given some objects O1, …, Om and processes p1, …, pn, is it possible to implement another object O? • Wait-free implementation: • the implementation is correct (in an intuitive sense) • every invocation from a correct process terminates • moreover, a correct process can always complete its invocation using only its own steps (on objects O1, …, Om)
Wait-free • Wait-free implementation • As each process can always finish its work alone, a wait-free implementation tolerates any number of crash failures • a very strong requirement!
Wait-free implementations • Consider k-consensus (i.e. consensus among k processes) • Let the consensus number of an object X be the largest k such that k-consensus can be implemented with X and atomic registers • (clearly, if the consensus number of O is strictly greater than the consensus number of O', there is no wait-free implementation of O using only O' and registers)
Wait-free implementations • Results • registers have consensus number 1 (FLP) • test-and-set has consensus number 2 • … • for each n there are objects with consensus number n
Example • FIFO queue (q initially holds two items; P is the calling process, Q the other one): decide(v) returns val: prefer[P] := v; if deq(q) = first item then return prefer[P] else return prefer[Q] • With a FIFO queue and registers it is possible to get 2-consensus but not 3-consensus
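The queue-based protocol can be sketched as follows for two processes with identifiers 0 and 1. This is a minimal sketch under assumptions: the item names `"WIN"` and `"LOSE"` are hypothetical (the slide's marker for the first item is not recoverable), and Python's `queue.Queue` stands in for an atomic FIFO queue.

```python
import queue
import threading


class QueueConsensus:
    """2-consensus from one FIFO queue and two registers (sketch)."""

    def __init__(self):
        self._q = queue.Queue()
        self._q.put("WIN")    # hypothetical name for the first item
        self._q.put("LOSE")   # hypothetical name for the second item
        self._prefer = [None, None]   # one register per process

    def decide(self, pid: int, v):
        # Announce the proposal before touching the queue.
        self._prefer[pid] = v
        # Whoever dequeues the first item wins and imposes its own value.
        if self._q.get() == "WIN":
            return self._prefer[pid]
        # The loser adopts the winner's value, which was written before
        # the winner dequeued, hence before this dequeue as well.
        return self._prefer[1 - pid]


def run_two_processes(values):
    obj, results = QueueConsensus(), [None, None]

    def worker(pid):
        results[pid] = obj.decide(pid, values[pid])

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each process calls `decide` at most once, so the queue is never empty when `get()` is called and no call blocks.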
Results • Universality of consensus (Herlihy): n-consensus is universal in a system of n processes: every object shared by n processes has a wait-free implementation from n-consensus objects and registers • (principle of the proof: with the help of n-consensus objects, the processes agree on the history of the object)
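The proof principle (agreeing on the history of the object) can be illustrated with a deliberately simplified sketch. It is not Herlihy's construction: the helping mechanism that makes the real construction wait-free is omitted, so a process can in principle lose every slot; the `Consensus` class is a lock-based stand-in for an n-consensus object; and all class and parameter names are hypothetical.

```python
import threading


class Consensus:
    """One-shot consensus object, modelled with a lock (stand-in only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def propose(self, value):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, value
            return self._value


class Universal:
    """Builds any sequentially specified object from a log of consensus objects."""

    def __init__(self, initial_state, apply):
        self._initial = initial_state
        self._apply = apply            # (state, op) -> (new_state, result)
        self._log = []                 # slot i decides the i-th operation
        self._grow = threading.Lock()  # convenience: protects list growth only

    def _slot(self, i):
        with self._grow:
            while len(self._log) <= i:
                self._log.append(Consensus())
            return self._log[i]

    def invoke(self, op):
        mine = (object(), op)          # unique tag: identical ops stay distinct
        state, i = self._initial, 0
        while True:
            # Propose our operation for slot i; replay whichever one was decided.
            winner = self._slot(i).propose(mine)
            state, result = self._apply(state, winner[1])
            if winner is mine:
                return result
            i += 1
```

For example, `Universal(0, lambda s, op: (s + 1, s))` behaves as a fetch-and-increment counter: every process replays the same agreed sequence of operations, so each invocation returns a distinct value.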
Plan • Objects • shared memory model • linearizability • wait-free implementation • Main results: universality of consensus • Message passing • failure detectors • shared memory implementation • object implementation • Consensus Hierarchy with failure detectors • Conclusion
Message passing • The previous results show that, in general, objects with consensus number > 1 cannot be implemented with registers only • Instead of sharing data structures it is interesting to consider message passing models • message passing: processes don't share data but can send and receive messages • (Note that message passing could be defined in the previous general framework: communication channels are then the shared data structures)
Message passing model • Processes communicate by messages • Communication is asynchronous (no bound on communication delays) • Communication is point-to-point and reliable • Processes can fail by crashing • Message passing models are suitable and natural for networks • (shared objects models are more suitable for hardware)
Message passing • In message passing it is interesting to implement objects: • objects are easier to work with • some objects are natural in message passing models (e.g. registers, consensus)
Atomic register: practical point of view • Data server • Ensure safety properties • If a value is written it stays available (even if the writer disappears) • Once a process completes its write(), every subsequent read() returns this value (or a value written later); note that the writer knows when the write ends
Shared register implementation • With only one reader and one writer and a majority of correct processes (sketch): • to write(v), for the k-th write: the writer sends (v,k) to all processes and waits until it has received an "ack" from a majority of processes • to read(): the reader queries all processes and waits until it has received an answer (v,k) from a majority of processes; the value read is the one with the greatest k • when a process receives (v,k) from the writer, it stores (v,k) and then sends an "ack" to the writer • when a process receives a query from the reader, it answers with the stored (v,k)
It works… • because: • by the majority assumption there is always at least one process that participates in both the last write and the read • so the read returns the last written value • but this implementation is not really atomic: if the writer crashes during a write, later reads could return either the previous value or the new one • it is not very difficult to fix: the reader always returns the value with the maximal timestamp seen so far • some classical algorithms implement general atomic registers from atomic registers with one reader and one writer
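The single-writer, single-reader protocol can be simulated without a network. In this minimal sketch the message exchange is replaced by direct method calls, and the `reachable` argument (the set of processes that answer) is a hypothetical device for choosing which majority responds. Two details are assumptions rather than part of the slides: a replica ignores a pair older than the one it stores, and the reader remembers the pair with the greatest k it has ever seen.

```python
class Replica:
    def __init__(self):
        self.value, self.k = None, 0

    def on_write(self, v, k):
        # Assumption: keep only the most recent pair (channels need not be FIFO).
        if k > self.k:
            self.value, self.k = v, k
        return "ack"

    def on_query(self):
        return (self.value, self.k)


def require_majority(count, n):
    if count <= n // 2:
        raise ValueError("a majority of processes must answer")


class Writer:
    def __init__(self, replicas):
        self.replicas, self.k = replicas, 0

    def write(self, v, reachable):
        self.k += 1   # this is the k-th write
        acks = [self.replicas[i].on_write(v, self.k) for i in reachable]
        require_majority(len(acks), len(self.replicas))


class Reader:
    def __init__(self, replicas):
        self.replicas = replicas
        self.best = (None, 0)   # pair with the greatest k seen so far

    def read(self, reachable):
        answers = [self.replicas[i].on_query() for i in reachable]
        require_majority(len(answers), len(self.replicas))
        self.best = max([self.best] + answers, key=lambda pair: pair[1])
        return self.best[0]
```

With five replicas, a write acknowledged by {0, 1, 2} and a read answered by {2, 3, 4} meet at replica 2, which is the majority argument above. If a write reaches a single replica before the writer crashes, a read that sees it keeps returning that value afterwards, so a later read never goes back to the older one.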
Implementation issues • in asynchronous message passing there is no implementation of consensus (even if at most one process can crash) • the implementation of registers requires a majority of correct processes
Then … failure detectors • The impossibility results come from crashes (without failures all these problems are easy to solve). • Then: add oracles giving (possibly unreliable) information about crashes. • what information about process crashes is sufficient to solve the problem? • what information about crashes is needed?
Failure detectors • distributed "oracle" F: • at each time t a process can query the failure detector and get an answer • (generally the answer is a list of processes suspected to be dead) • the output may differ from one process to another • the output of failure detector F depends only on the history of crashes (not on the states of processes). • Example: perfect failure detector • output: lists of suspected processes • if p is in the list of q then p has crashed • if p has crashed then p will eventually belong to the list of suspected processes of q
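The two properties of the perfect failure detector can be stated as checks on a recorded run. This is a minimal sketch under assumptions: time is discrete, `crash_time` maps each crashed process to the time of its crash, and `outputs[t][q]` is the set of processes suspected by q at time t (all hypothetical names). Since "eventually" cannot be verified on a finite trace, the second check only looks at the last recorded time.

```python
def accuracy(crash_time, outputs):
    # If p is in the list of q at time t, then p has crashed by time t.
    return all(
        p in crash_time and crash_time[p] <= t
        for t, per_process in outputs.items()
        for suspects in per_process.values()
        for p in suspects
    )


def completeness_on_trace(crash_time, outputs):
    # Finite-trace approximation: at the last recorded time, every crashed
    # process is suspected by every process that is still alive.
    last = max(outputs)
    crashed = {p for p, t in crash_time.items() if t <= last}
    return all(
        crashed <= suspects
        for q, suspects in outputs[last].items()
        if q not in crashed
    )


crash_time = {"p3": 2}
good_run = {
    1: {"p1": set(), "p2": set()},
    2: {"p1": {"p3"}, "p2": set()},
    3: {"p1": {"p3"}, "p2": {"p3"}},
}
# p1 suspects p3 at time 1, before p3 actually crashes.
premature_run = {1: {"p1": {"p3"}, "p2": set()}}
```

Here `good_run` passes both checks, while `premature_run` violates accuracy because p3 is suspected before it crashes.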
Failure detector comparison • Reduction: • Failure detector F is weaker than failure detector F' (F≤F') if F can be implemented from F' • ≤ defines a partial order
Minimal Failure Detector • Given a problem P, F is a minimal failure detector for P if and only if • with the help of F, P can be solved • if F' makes it possible to solve P then F ≤ F' • Then if F is a minimal failure detector for P: • F encapsulates the information about crashes needed to solve P
Minimal Failure Detector • Why look for the minimal failure detector? • find the needed information about crashes • compare problems: if the minimal failure detector for P is weaker than the minimal failure detector for P' then P is easier than P' • (from a practical point of view the knowledge of the minimal failure detector helps to find the assumptions on the underlying system to solve the problem)
Then to implement Objects: • In message passing • for each object O find the minimal failure detector to implement O • from the comparison between these failure detectors we get a hierarchy on these objects • Then we get 2 hierarchies on objects • consensus number as defined before • minimal failure detector needed for the object
S-register • Begin with registers (consensus number = 1) • An S-register is an atomic register that only processes in S can read or write (but all processes may participate in its implementation)
Weakest failure detector • with a majority of correct processes atomic registers can be implemented without any failure detector • but without a majority of correct processes? • Failure detector Σ
Failure detector ΣS • ΣS(p,t) (output of failure detector ΣS for process p at time t) is a list of trusted processes (q ∈ ΣS(p,t) means that p considers that q is not dead at time t) • Intersection: for all processes p, q in S and all times t, t': ΣS(p,t) ∩ ΣS(q,t') is not empty (at least one process is trusted by both p and q) • Completeness: there is a time t such that for each correct process p in S and each time t' > t, ΣS(p,t') contains only correct processes
Remarks • with a majority of correct processes ΣS can be implemented in asynchronous systems. • ΣS gives a kind of quorum system (a quorum system is a family of sets such that any two elements of the family have a non-empty intersection).
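The quorum remark can be checked by enumeration on a small system. This minimal sketch (function names are hypothetical) lists every majority of a set of processes and verifies the intersection property required of the outputs of Σ; it says nothing about the completeness property.

```python
from itertools import combinations


def majorities(processes):
    # Every subset containing strictly more than half of the processes.
    processes = list(processes)
    n = len(processes)
    return [
        set(subset)
        for size in range(n // 2 + 1, n + 1)
        for subset in combinations(processes, size)
    ]


def pairwise_intersecting(family):
    # Intersection property: any two sets of the family share a process.
    return all(a & b for a in family for b in family)
```

For five processes there are 16 majorities and any two of them intersect, whereas a family such as `[{0, 1}, {2, 3}]` fails the check. This is why, with a majority of correct processes, a process can output as its trusted list any majority that has recently answered it.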
Theorem • ΣS is the weakest failure detector to implement an S-register • sufficient part: adapt the previous algorithm • necessary part: more difficult…
S-Consensus • S is a set of processes • S-consensus • processes in S propose values and have to (irrevocably) decide. The decision has to ensure: • Validity: the decision value has been proposed • Agreement: if p and q decide, they decide the same value • Termination: every correct process eventually decides
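The three properties can be written as predicates over the outcome of one finished run. This is a minimal sketch with hypothetical names: `proposals` and `decisions` map processes to values, and `correct` is the set of processes that never crash. Termination is only meaningful here under the assumption that the run has been observed to completion.

```python
def check_consensus(proposals, decisions, correct):
    # Validity: every decided value was proposed by some process.
    validity = all(v in proposals.values() for v in decisions.values())
    # Agreement: no two processes decide different values.
    agreement = len(set(decisions.values())) <= 1
    # Termination: every correct process has decided.
    termination = all(p in decisions for p in correct)
    return validity, agreement, termination
```

For instance, with proposals `{"p": 0, "q": 1}` and decisions `{"p": 1, "q": 1}`, all three properties hold when both processes are correct.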
ΩS • ΩS(p,t) (output of failure detector ΩS for p at time t) is a process (the leader) • Eventual leader election: there is a time t and a correct process l such that for every correct process p in S and every time t' > t, ΩS(p,t') = l • intuitively: after some time all processes agree on the same leader forever
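The eventual leader property can be illustrated on a finite trace of outputs. In this minimal sketch, `trace[t][p]` is the leader output at process p at step t and `correct` is the set of correct processes (both hypothetical names). A finite trace can only suggest stabilization, so the function reports the first step from which all correct processes output the same correct leader until the end of the trace.

```python
def eventual_leader(trace, correct):
    """Return (t, leader) for the earliest stable suffix, or None if there is none."""
    for t in range(len(trace)):
        # All leaders output by correct processes from step t to the end.
        leaders = {trace[u][p] for u in range(t, len(trace)) for p in correct}
        if len(leaders) == 1:
            (leader,) = leaders
            if leader in correct:
                return t, leader
    return None


trace = [
    {"a": "c", "b": "a"},   # early outputs may disagree
    {"a": "a", "b": "b"},
    {"a": "b", "b": "b"},   # from here on, both output b
    {"a": "b", "b": "b"},
]
```

On this trace, with `correct = {"a", "b"}`, the function returns `(2, "b")`. A trace in which every process keeps trusting a crashed process yields `None`, since the common leader must itself be correct.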
Theorem • ΣS*ΩS is the weakest failure detector for S-consensus. (ΣS*ΩS outputs both ΣS and ΩS)
For the proof • (necessary condition) • Adaptation of the proof of Chandra, Hadzilacos and Toueg: from an S-consensus algorithm using a failure detector, implement ΩS • With reliable broadcast and S-consensus, implement an S-register (then use the previous theorem)
For the proof • Sufficient condition: code for a process in S
  forever
    C := 1 + (r mod n)
    send (Coord, v, r) to C
    wait until (One, *, r) is received from C, or C is suspected according to ΩS
    if (One, w, r) was received then FromCoord := w else FromCoord := undef
    send (Keep, FromCoord, r) to all
    wait until (Two, *, r) is received from all processes in ΣS
    if only one value v was received then
      decide this value v; send (decide, v) to all; stop
    else if only 2 values (w and undef) were received then
      v := w
Code for all processes • On receiving a (Coord,*,k) message for the first time (let (Coord,x,k) be this message): send (One,x,k) to all processes in S • On receiving a (Keep,*,k) message for the first time (let (Keep,x,k) be this message): send (Two,x,k) to all processes in S
k-consensus • k-consensus = consensus among any subset of k processes • Result: for 2 ≤ k ≤ n, the weakest failure detector for k-consensus is Σ*Ω
Proof (idea): • consider the case k=2 • From the previous results: • the weakest failure detector for 2-consensus is the set of ΣS*ΩS for all subsets S with 2 elements
Proof • From these ΣS (S ranges over the subsets with two elements) atomic registers can be implemented, and then we get Σ • From these ΩS (S ranges over the subsets with two elements) it is possible to implement Ω: • let G=(X,E) be the graph where X is the set of processes, and (p,q) ∈ E if there is x such that q is an eventual leader for Ω{p,x}. Consider the strongly connected components of G: there is a unique sink component, and this sink contains (eventually) only correct processes.
[Figure: graph G with an edge from p to q when p has q as leader; the sink component is marked.]
Proof (sketch) • From this we deduce an algorithm for Ω: all processes approximate this graph and compute the sink; the output of the emulated failure detector is this sink. Eventually, this sink contains only correct processes (then extract the same leader from this sink). Then we get Ω
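The sink computation used in this emulation can be sketched on a fixed graph. The code below is a minimal sketch under assumptions: the graph is given as a dictionary from each process to the leaders it has for its pairs, the approximation of the graph over time is not modelled, and taking the smallest identifier in the sink is one hypothetical way to "extract the same leader".

```python
def reachable(graph, start):
    # Set of nodes reachable from `start`, including `start` itself.
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sink_components(graph):
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reach = {u: reachable(graph, u) for u in nodes}
    sinks = set()
    for u in nodes:
        # u lies in a sink component iff everything u reaches can reach u back;
        # the component is then exactly the set of nodes reachable from u.
        if all(u in reach[v] for v in reach[u]):
            sinks.add(frozenset(reach[u]))
    return sinks


def emulated_leader(graph):
    sinks = sink_components(graph)
    if len(sinks) != 1:
        raise ValueError("expected a unique sink component")
    (sink,) = sinks
    return min(sink)   # assumption: deterministic choice by smallest identifier


# Edge p -> q: p has q as leader for some pair {p, x}.
example = {"s": ["p"], "p": ["q"], "q": ["r"], "r": ["q"]}
```

In `example` the only sink component is {q, r}, so every process that has computed the same graph extracts the same leader q.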
Corollary • If the consensus number of an atomic object T is 2, then: • The weakest failure detector for T is Σ*Ω • Every failure detector implementing T implements any object • (in other words, T is universal for all n)