Snapshot Algorithm

A paper by K. Mani Chandy and Leslie Lamport. Presented by Einat Zuker.

What is a Snapshot - intuition
• Given a system of processors and the communication channels between them, we want each processor to have a “picture” of the global system state.
• Each processor, however, can only take a “small picture” of the global system (only of itself).
• But if we put together all the “small pictures”, we get a complete description of the global state of the system.
• The “big picture” we put together must be meaningful and informative to be called a snapshot of the system.

Snapshot - why do we want it
• Stability detection
• A stable system: if the system holds a certain property in a given state, and all the possible next states of the system hold that property too, then we can call the system stable with respect to that property.
• Examples of stability:
  • Deadlock
  • No tokens in a token ring
  • Computation has terminated
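The definition of stability above can be checked mechanically on a small system. The following is only a sketch: the function name `is_stable`, the dictionary `toy` and its two states are hypothetical and not part of the presentation.

```python
def is_stable(holds, next_states):
    """True if, whenever `holds` is true in a state, it is true in every next state."""
    return all(
        holds(t)
        for s, successors in next_states.items()
        if holds(s)
        for t in successors
    )

# Hypothetical toy system: a computation that may keep running or finish.
toy = {"running": ["running", "done"], "done": ["done"]}

terminated_is_stable = is_stable(lambda s: s == "done", toy)
running_is_stable = is_stable(lambda s: s == "running", toy)
```

Here "computation has terminated" comes out stable, while "computation is running" does not, because a running system can still move to the terminated state.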

The distributed system model
• Representation – a directed graph.
  • Vertices - represent the processors
  • Edges - represent the communication channels
• Assumptions:
  • no synchronization (no clocks)
  • Channels have infinite buffers
  • Channels are error-free
  • Channels deliver messages in the order sent (FIFO)
  • A message in a channel can be delayed for an arbitrary but finite time (all messages will eventually arrive at their destination)

The distributed system model - Definitions
• State of a channel - the sequence of messages sent along the channel, excluding the messages received along the channel.
• State of a processor – a single element of some finite set.
Example, for a channel c from p to q:
• no messages sent: the state of c is empty
• processor p sent M1: the state of c is M1
• processor p sent M2: the state of c is M2 M1 (M1 at the head)

The distributed system model – Definitions cont’d
• Event – an event e is the tuple <p, s, s’, M, c> where:
  • p – the processor in which the event occurs
  • s – the state of p before the event
  • s’ – the state of p after the event
  • c – the channel whose state is changed by the event (can be null)
  • M – the message sent (or received) by p through the channel c (can be null)
• Less formally: an event is an atomic action of a processor p that may change the state of p and the state of at most one channel connected to p.

Example – the single token conservation system
• The system properties: two processors (p, q), two communication channels (c from p to q, c’ from q to p), one token.
• Processor states: s0 – no token, s1 – has token.
• Initial state for p: s1; initial state for q: s0; initial state for the channels: empty.
• Events in the system can be: e1 = <p, s1, s0, pass token, c>, e2 = <q, s0, s1, receive token, c>, etc.
[Figure: p and q connected by c and c’; e1 moves the token from p into c, e2 moves it from c into q.]

The distributed system model – Definitions cont’d
• Global state – the set of the processor states and the channel states.
• Initial global state – a global state where each processor is in its initial state and each channel is empty.
• next(S,e) – a function whose value is the global state immediately after the occurrence of the event e in the global state S.
• next(S,e) is defined only if event e can occur in the global state S.
• For a global state S and an event e = <p,s,s’,M,c>, if next(S,e) = S’ then:
  • the state of p in S’ is s’
  • the state of the channel c in S’ is its state in S with the message M added to its tail (if p sent M) or removed from its head (if p received M)

Example – the single token conservation system
The possible global states of the single token conservation system:
• S0 – token in p (p: s1, q: s0)
• S1 – token in c (p: s0, q: s0)
• S2 – token in q (p: s0, q: s1)
• S3 – token in c’ (p: s0, q: s0)
The events that move the system between them:
• e0 = <p, s1, s0, pass token, c>, next(S0,e0) = S1
• e1 = <q, s0, s1, receive token, c>, next(S1,e1) = S2
• e2 = <q, s1, s0, pass token, c’>, next(S2,e2) = S3
• e3 = <p, s0, s1, receive token, c’>, next(S3,e3) = S0

The distributed system model – Definitions cont’d
• Computation of the system – a sequence of events in the system.
• More formally: given a sequence of events seq = (e0, e1, …, ei, …, en), seq is a computation of the system iff for every i, event ei can occur in state Si and next(Si, ei) = Si+1 (S0 is the initial global state).
• In the previous example, the computation of the system was (e0, e1, e2, e3),
• but the sequence (e0, e2) cannot be a computation.
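The definitions of next(S,e) and of a computation can be sketched in a few lines and tried on the single token system. This is an illustration under stated assumptions: the `kind` field, the names `can_occur`, `next_state` and `is_computation`, and the tuple-based channel representation (head at index 0) are choices made here, not part of the presentation.

```python
from collections import namedtuple

# The slide's tuple <p, s, s', M, c>, plus `kind` ("send", "recv" or None):
# an added field (an assumption) saying whether M joins the tail of c or leaves its head.
Event = namedtuple("Event", "p s s_next M c kind")

def can_occur(S, e):
    """True if event e can occur in global state S = (processor states, channel states)."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.kind == "recv":
        return chans[e.c][:1] == (e.M,)   # M must be at the head of c
    return True

def next_state(S, e):
    """next(S, e); defined only if e can occur in S."""
    if not can_occur(S, e):
        raise ValueError(f"{e} cannot occur in {S}")
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    if e.kind == "send":
        chans = {**chans, e.c: chans[e.c] + (e.M,)}   # add M to the tail
    elif e.kind == "recv":
        chans = {**chans, e.c: chans[e.c][1:]}        # remove M from the head
    return procs, chans

def is_computation(S0, seq):
    """True if every event in seq can occur in the state reached so far."""
    S = S0
    for e in seq:
        if not can_occur(S, e):
            return False
        S = next_state(S, e)
    return True

S0 = ({"p": "s1", "q": "s0"}, {"c": (), "c'": ()})
e0 = Event("p", "s1", "s0", "token", "c", "send")
e1 = Event("q", "s0", "s1", "token", "c", "recv")
e2 = Event("q", "s1", "s0", "token", "c'", "send")
e3 = Event("p", "s0", "s1", "token", "c'", "recv")

valid = is_computation(S0, [e0, e1, e2, e3])
invalid = is_computation(S0, [e0, e2])
```

With these definitions, (e0, e1, e2, e3) is accepted and brings the system back to S0, while (e0, e2) is rejected because q does not hold the token when e2 is attempted.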

The Algorithm requirements
• The snapshot algorithm must run concurrently with the system computation.
• The snapshot algorithm cannot alter the computation in any way.
• Any messages sent for recording purposes must not interfere with the computation of the system.

Snapshot Algorithm - first idea
• Each processor will add its state to the recorded snapshot at some point of the computation (let’s assume we can also see the channel states and record them in the same fashion).
• What can happen?

First idea - the problem
Let’s take a look at the single token conservation system:
• The system is in global state S0 - “token in p”. p decides to record itself (as s1).
• e0 = <p, s1, s0, pass token, c> occurs and the system moves to global state S1 - “token in c”.
• c, c’ and q decide to record themselves (c holds the token, c’ is empty, q is in s0).
• The snapshot received, S*, shows two tokens: one in p and one in c. There is no such global state reachable from S0!

First idea - the problem cont’d
• What happened?
  • p was recorded before it sent a message.
  • c was recorded after p sent a message.
  • The snapshot had too many messages in it.
• Let us denote:
  • n - number of messages sent along the channel before its source was recorded
  • n’ - number of messages sent along the channel before the channel was recorded
• In our case: n=0, n’=1
• Can we conclude that if n < n’ the snapshot is inconsistent?

First idea - the problem cont’d
Let’s take a look again at the single token conservation system:
• The system is in global state S0 - “token in p”. c decides to record itself (as empty).
• e0 = <p, s1, s0, pass token, c> occurs and the system moves to global state S1 - “token in c”.
• p, c’ and q decide to record themselves (p is in s0, c’ is empty, q is in s0).
• The snapshot received, S*, shows no token at all. There is no such global state reachable from S0!

First idea - the problem cont’d
• What happened?
  • c was recorded before p sent a message.
  • p was recorded after it sent a message.
  • We lost messages in the snapshot.
• Remember the notation:
  • n - number of messages sent along the channel before its source was recorded
  • n’ - number of messages sent along the channel before the channel was recorded
• In our case: n=1, n’=0
• Can we conclude that if n > n’ the snapshot is inconsistent?

First idea - conclusions
• The problem in both cases was that we did not have a means to monitor the messages that went through the channel while the recording was done.
• We need the algorithm to ensure that the snapshot we take reflects the messages passing through the channel.
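The two failed recordings can be replayed by counting tokens in the assembled snapshot. This is a small sketch for the single token system only; `count_tokens` and the way the pieces are combined are illustrative choices, not part of the presentation.

```python
def count_tokens(snapshot):
    """Tokens held by processors (state s1) plus tokens in transit in channels."""
    procs, chans = snapshot
    held = sum(1 for state in procs.values() if state == "s1")
    in_transit = sum(len(msgs) for msgs in chans.values())
    return held + in_transit

S0 = ({"p": "s1", "q": "s0"}, {"c": [], "c'": []})          # token in p
S1 = ({"p": "s0", "q": "s0"}, {"c": ["token"], "c'": []})   # token in c

# Case 1 (n < n'): p recorded in S0; c, c' and q recorded in S1.
too_many = ({"p": S0[0]["p"], "q": S1[0]["q"]},
            {"c": S1[1]["c"], "c'": S1[1]["c'"]})

# Case 2 (n > n'): c recorded in S0; p, c' and q recorded in S1.
too_few = ({"p": S1[0]["p"], "q": S1[0]["q"]},
           {"c": S0[1]["c"], "c'": S1[1]["c'"]})
```

Every real global state of this system holds exactly one token, but the first assembled snapshot holds two and the second holds none.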

The snapshot algorithm conditions
• Notation: for two processors p, q and a channel c from p to q:
  • n - number of messages sent through c before p was recorded
  • n’ - number of messages sent through c before c was recorded
  • m – number of messages received from c before q was recorded
  • m’ – number of messages received from c before c was recorded
• The following conditions are required from the snapshot:
  • n = n’ and m = m’
  • n’ ≥ m’ and n ≥ m
  • if n’ = m’, the recorded state of c must be the empty sequence
  • if n’ > m’, the recorded state of c must contain the messages [tail] (n’), …, (m’+1) [head], that is, the (m’+1)-th through the n’-th messages sent by p along c
[Figure: the messages M1 … M6 sent by p along c, with the (m’+1)-th and the n’-th messages marked.]

The snapshot algorithm conditions cont’d
In a less formal way: the recorded state of c must be the sequence of messages sent along c before the state of p is recorded, excluding the sequence of messages received along c before the state of q is recorded.
[Figure: messages M1 … M6 on c, with the points where p and q are recorded and the resulting recording of c.]
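Under the conditions n = n’ and m = m’, the required recording of c is just a slice of the messages p sent. A minimal sketch, assuming the sent messages are available as a list in sending order (the function name and the list representation are hypothetical):

```python
def recorded_channel_state(sent, n, m):
    """Messages (m+1)..n sent by p along c, head first.

    sent: all messages p sent along c, in sending order
    n:    number of messages sent along c before p was recorded
    m:    number of messages received from c before q was recorded
    """
    if not 0 <= m <= n <= len(sent):
        raise ValueError("need 0 <= m <= n <= number of messages sent")
    return sent[m:n]

sent = ["M1", "M2", "M3", "M4", "M5", "M6"]
in_transit = recorded_channel_state(sent, n=5, m=2)   # M3, M4, M5
nothing = recorded_channel_state(sent, n=3, m=3)      # empty sequence
```

The case n = m gives the empty sequence, and n > m gives the messages that were sent before p was recorded but not yet received when q was recorded.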

The algorithm outline
• p will send a special message called a marker after the n-th message it sends (and before sending any other message).
• q will record channel c’s state. The recorded state will be the messages received by q after q recorded its state and before q received the marker.
• q will record its state spontaneously, or immediately after the marker is received, that is, before receiving (or sending) any other messages.

The algorithm creators: K. Mani Chandy and Leslie Lamport. E. W. Dijkstra wrote a note on the algorithm (see the references).

The algorithm
• Marker-Sending Rule for a processor p:
  • For each channel c directed away from p, p sends one marker along c right after p records its state and before p sends further messages along c.
• Marker-Receiving Rule for a processor q: on receiving a marker along a channel c,
  • if q has not recorded its state, then q records its state and records the state of c as the empty sequence;
  • else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.

The algorithm - Running example
• p sends the token, then records itself (s0 – no token).
• p sends a marker along c.
• q receives the token, and then receives the marker. q records itself (s1 – has token) and the incoming channel c (empty).
• q sends a marker along c’.
• p receives the marker. It has already recorded itself, so it only needs to record the state of its incoming channel c’ (empty).
• The snapshot: p: s0 – no token, q: s1 – has token, c: empty, c’: empty.
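The marker rules can be sketched as a small class and replayed on the running example. This is an illustration under assumptions, not the authors' implementation: the `Process` class, its method names, and the use of `deque` objects as FIFO channels are all choices made here.

```python
from collections import deque

MARKER = object()   # the special marker message

class Process:
    def __init__(self, state, incoming, outgoing):
        self.state = state
        self.incoming = incoming    # channel name -> deque we receive from (head on the left)
        self.outgoing = outgoing    # channel name -> deque we send on (tail on the right)
        self.recorded = False
        self.recorded_state = None
        self.pending = {}           # channels still being logged after recording
        self.channel_record = {}    # channel name -> recorded channel state

    def record(self):
        """Record own state, then apply the marker-sending rule."""
        self.recorded = True
        self.recorded_state = self.state
        self.pending = {name: [] for name in self.incoming}
        for chan in self.outgoing.values():
            chan.append(MARKER)     # before any further message on this channel

    def send(self, name, msg, new_state):
        self.outgoing[name].append(msg)
        self.state = new_state

    def receive(self, name, new_state=None):
        msg = self.incoming[name].popleft()
        if msg is MARKER:           # marker-receiving rule
            if not self.recorded:
                self.record()       # the log for `name` starts empty
            self.channel_record[name] = self.pending.pop(name)
        else:
            if name in self.pending:
                self.pending[name].append(msg)   # received after recording, before marker
            self.state = new_state
        return msg

c, c2 = deque(), deque()            # c: p -> q, c2 plays the role of c'
p = Process("s1", incoming={"c'": c2}, outgoing={"c": c})
q = Process("s0", incoming={"c": c}, outgoing={"c'": c2})

p.send("c", "token", "s0")          # p sends the token
p.record()                          # p records itself and sends a marker along c
q.receive("c", "s1")                # q receives the token
q.receive("c")                      # q receives the marker, records itself and c
p.receive("c'")                     # p receives the marker, records c'
```

After this run p has recorded s0, q has recorded s1, and both channels are recorded as empty, which matches the snapshot on the slide.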

Some notes about the algorithm
• The algorithm can be initiated by one or more processors.
  • Each initiating processor records its state spontaneously (without receiving markers from other processors).
• The collection of the snapshot “pieces” from each processor is a topic for a separate discussion,
• but if we recall the synchronization algorithm for asynchronous systems (with some variations), we can come up with ways to form the “big picture” at each processor.

Termination of the algorithm
Do we have a snapshot of the system in finite time? That is, do we have a recording of each processor and channel in finite time?
Lemma 1: if there is a path in the system from p to q, and p recorded itself, then q will record itself in finite time.
Proof:
• If p is directly connected to q, then p sends a marker to q, and q records itself once the marker arrives, unless it has already done so (remember that all messages sent through a channel reach their destination in finite time).
• So, if p records its state and there is a path from p to q, then q records its state in finite time because, by induction on the length of the path, every processor along the path records its state in finite time and sends a marker on all of its outgoing channels.

Termination of the algorithm cont’d
Lemma 2: the algorithm terminates in finite time, with a recording of each processor and channel.
Proof:
• All the processors eventually record their state (spontaneously, or because some other processor recorded itself, as we know from Lemma 1).
• This means every processor sends a marker through all of its outgoing channels, so a marker is sent through every channel.
• Once the marker reaches its destination, the channel is recorded. This is true for all channels, since all of them had a marker sent through them.
• Thus, all the channels are recorded in finite time too.
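The argument of the two lemmas can be mimicked by flooding markers over the channel graph. The sketch below abstracts message delays away entirely and only tracks who records and which channels carry a marker; the function name `flood_markers` and the adjacency-list input are assumptions made for illustration.

```python
from collections import deque

def flood_markers(edges, initiators):
    """edges: processor -> list of processors it has a channel to.

    Returns (processors that recorded their state, channels that carried a marker).
    """
    recorded = set(initiators)      # initiators record spontaneously
    marked = set()
    queue = deque(initiators)
    while queue:
        p = queue.popleft()
        for q in edges.get(p, []):
            marked.add((p, q))      # marker-sending rule: one marker per outgoing channel
            if q not in recorded:   # marker-receiving rule: record on the first marker
                recorded.add(q)
                queue.append(q)
    return recorded, marked

ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
recorded, marked = flood_markers(ring, ["a"])
```

Each processor enters the queue at most once, so the loop ends after finitely many steps. In the ring every processor is reachable from the initiator, so all three processors record and all three channels carry a marker.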

Example – non-deterministic system
• The system properties:
  • two processors: p, q; two communication channels: c, c’
  • p has 2 states {A,B}
  • q has 2 states {C,D}
  • p can send the message M while in state A. Sending the message causes it to move to state B.
  • p can receive the message N while in state B. Receiving the message causes it to move back to state A.
  • q works symmetrically to p.
• A possible computation of the system:
  • S0 – the initial global state (p: A, q: C, channels empty)
  • e0 = <p,A,B,M,c>, leading to S1
  • e1 = <q,C,D,N,c’>, leading to S2
  • e2 = <p,B,A,N,c’>, leading to S3
• Note that the computation in this case is not deterministic. For example, from S0 the first event could also have been e0 = <q,C,D,N,c’>.

The algorithm - Running example 2
• The system is in global state S0. p records itself (A) and sends the marker along c.
• The system goes to global state S1.
• The system goes to global state S2.
• The system goes to global state S3.
• q receives the marker. q records itself (D) and the incoming channel c (empty). q sends the marker along c’.
• p receives the marker. It has already recorded itself, so it needs to record the state of c’ (N).
• What is strange in this snapshot? p: A, q: D, c: empty, c’: N

The non-deterministic example - analysis
• The snapshot the algorithm takes is not necessarily a global state the system was actually in.
• So what does the snapshot represent?
• The answer is that the snapshot is a reachable global state of the system.
• In addition, had the events occurred in a different order, the snapshot would have been one of the global states the system passed through.
• This makes the snapshot consistent with its system.

Theorem
• Given:
  • seq = (ei, i ≥ 0), a computation of some system
  • Si, the global state of the system right before event ei
  • Sj, the global state of the system when the algorithm was initiated
  • Sk, the global state of the system when the algorithm terminated (0 ≤ j ≤ k)
  • S*, the global state the algorithm recorded (the snapshot)
• Then there is a computation of the system seq’ = (ei’, i ≥ 0) such that:
  • for all i, i < j or i ≥ k: ei’ = ei
  • for all i, i ≤ j or i ≥ k: Si’ = Si
  • the subsequence (ei’, j ≤ i < k) is a permutation of the subsequence (ei, j ≤ i < k)
  • there exists some t, j ≤ t ≤ k, such that S* = St’
[Figure: seq laid out as e0, e1, …, ej-1, ej, …, ek-1, ek, with the states Sj and Sk marked.]

Proof - definitions
• Pre-recording event – an event that occurs in processor p before p records its state.
• Post-recording event - an event that occurs in processor p after p records its state.
• Note: for an event ei in seq:
  • if i < j then ei is a pre-recording event
  • if i ≥ k then ei is a post-recording event
• Note: for an event ei in seq such that j < i < k:
  • the event ei-1 can be a post-recording event and the event ei can be a pre-recording event if they occur in different processors.
  • if they occur in the same processor and ei-1 is a post-recording event, then both must be post-recording events.

Proof - details
• Let’s denote ei-1 = <p,a,b,M,c>, ei = <q,a’,b’,M’,c’>
• Let’s assume:
  • ei-1 is a post-recording event
  • ei is a pre-recording event
• Can M = M’ and c’ = c? That is, can q be receiving the message p sent? The answer is no:
  • ei-1 is a post-recording event, which means that a marker was sent along c before M was sent.
  • Since channels are FIFO, the same marker was received by q before M reached it.
  • When q received the marker it recorded itself, so if ei = <q,a’,b’,M,c> it can only be a post-recording event, in contradiction to the assumption that ei is a pre-recording event.

Proof – details cont’d
• We saw that ei-1 and ei are independent of each other.
• This means we can swap their order in the computation seq.
• The new computation ei-2, ei, ei-1 will end with the same global state as the original computation ei-2, ei-1, ei.
[Figure: the sequence ej, …, ei-1, ei, …, ek before and after the swap; the states Si-1 and Si+1 are unchanged and only the intermediate state Si becomes S’i.]

Proof – details cont’d
• Let seq’ be a computation where every post-recording event that occurs right before a pre-recording event is swapped with it.
• We repeat the swapping until seq’ has all pre-recording events before all post-recording events.
• Note:
  • seq’ is a computation of the system
  • for all i, i < j or i ≥ k: ei’ = ei
  • for all i, i ≤ j or i ≥ k: Si’ = Si
[Figure: the sequence e0, …, ej, …, ei-1, ei, …, ek-1, ek before the swaps and e0, …, e’j, …, e’i-1, e’i, …, e’k-1, ek after them.]

Proof – details cont’d
• Let’s look at the global system state in seq’ after the last pre-recording event and before the first post-recording event. We will denote this state St’ (j ≤ t ≤ k).
• For some processor p, let us assume the last state p was in before recording is a (that means p recorded a as its state).
• In the global state St’ we will see that p is in state a.
• In the snapshot S* we also see that the state of p is a (because p recorded a).
• We conclude that the state of each processor in St’ is the same as in S*.

Proof – details cont’d
• For some channel c from p to q:
  • in St’ the messages in c are the ones p sent before sending a marker along c (that is, before p recorded itself), without the messages q received before recording itself
  • in the snapshot S*, c contains all the messages q received along c after it recorded itself and before it received the marker along c
  • since the channel is FIFO, these are exactly the messages sent before the marker that q had not yet received when it recorded itself
• We conclude that the messages in c in the global state St’ and in the snapshot S* are the same.

Proof – conclusions
• It is now clear that we have proven our Theorem: there is a computation of the system seq’ such that:
  • for all i, i < j or i ≥ k: ei’ = ei
  • for all i, i ≤ j or i ≥ k: Si’ = Si
  • the subsequence (ei’, j ≤ i < k) is a permutation of the subsequence (ei, j ≤ i < k)
  • there exists some t, j ≤ t ≤ k, such that S* = St’

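Example – permute a computation
• Recall the non-deterministic example: the computation we saw was (e0, e1, e2), and the recorded global state was p: A, q: D, c: empty, c’: N.
• Now, let’s swap the events so that all pre-recording events precede all post-recording events. e1 is the only pre-recording event (q sent N before recording itself), so the permuted computation is (e1, e0, e2).
• The global state S’1 of this computation is exactly the snapshot of the original computation.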
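The swapping step of the proof can be tried on the non-deterministic example. This is a sketch under assumptions: the `Ev` tuple, its `post` flag (whether the event happened after its processor recorded itself), and the helper names `reorder` and `replay` are introduced here for illustration, and `replay` does not check that each event is allowed in the current state.

```python
from collections import namedtuple

Ev = namedtuple("Ev", "name proc new_state chan kind msg post")

def reorder(events):
    """Swap adjacent (post, pre) pairs until all pre-recording events come first."""
    seq = list(events)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(seq)):
            if seq[i - 1].post and not seq[i].post:
                seq[i - 1], seq[i] = seq[i], seq[i - 1]
                swapped = True
    return seq

def replay(S0, events):
    """Return the global states S0, S1, ... produced by applying events in order."""
    states = [S0]
    for e in events:
        procs, chans = states[-1]
        procs = {**procs, e.proc: e.new_state}
        if e.kind == "send":
            chans = {**chans, e.chan: chans[e.chan] + (e.msg,)}
        else:
            chans = {**chans, e.chan: chans[e.chan][1:]}
        states.append((procs, chans))
    return states

S0 = ({"p": "A", "q": "C"}, {"c": (), "c'": ()})
e0 = Ev("e0", "p", "B", "c", "send", "M", post=True)    # p had already recorded A
e1 = Ev("e1", "q", "D", "c'", "send", "N", post=False)  # q recorded D only later
e2 = Ev("e2", "p", "A", "c'", "recv", "N", post=True)

original = [e0, e1, e2]
permuted = reorder(original)
snapshot = ({"p": "A", "q": "D"}, {"c": (), "c'": ("N",)})
```

The permuted order is (e1, e0, e2). Replaying it from S0 passes through the snapshot right after e1 and still ends in the same final state as the original computation.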

The algorithm - final conclusions
• We saw that St’ = S*. From this we can see:
  • that the snapshot S* is reachable from Sj
  • that Sk is reachable from the snapshot S*
• We saw that S* could have been a global state of the computation if the events had occurred in a different order.
• This means the snapshot is indeed valuable and informative when judging the stability of a system.

References
• Chandy, K. M. and Lamport, L. Distributed Snapshots: Determining Global States of Distributed Systems.
• Dijkstra, E. W. The distributed snapshot of K. M. Chandy and L. Lamport.
