1 / 23

Distributed Snapshots: Determining Global States of Distributed Systems

Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport. Outline. What this paper is about. Stable property of a system. Distributed System model Definitions. Token Conservation system example. Non-Deterministic Computation example.

kimo
Download Presentation

Distributed Snapshots: Determining Global States of Distributed Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport

  2. Outline • What this paper is about. • Stable property of a system. • Distributed System model • Definitions. • Token Conservation system example. • Non-Deterministic Computation example. • Global State Determination Algorithm • Requirements of Consistent Global State. • Termination of the algorithm • Properties of the recorded global state. (theorem + example) • Stability Detection Algorithm

  3. Goal/Theme • Algorithms by which a process in a distributed system can determine global state of the system during a computation. • Processes need to cooperate to record state. • Individual processes do not share clocks/memory. • Global state =  (Process state, channel state) • Algorithm must run concurrently with, but not alter underlying computation.

  4. Stable State • Why does this come in the paper ? • Global State Detection Algorithm can help solve stability detection. • Several distributed systems problems need to determine stable property. • y(S) : predicate function defined on the global states of the distributed system D. • y is a stable property if y(S) => y(S’) for all global states S’ in D reachable from S in D. • E.g. “computation terminated”, “system deadlocked”.

  5. Distributed System Model • Finite set of processes & channels[directed graph]. • Channels: infinite buffers, error free, ordered delivery of messages, messages have finite delay. • Channel state: sequence of messages sent but exclude those received. • Process defined by: initial state, set of events, set of states. • Events e = <p, s, s’, M, c>. They are atomic and can change the state of p and at most one channel. M and c = null if e does not change state of any channel.

  6. Contd.

  7. Contd. • Global state (initial state): processes (initial state), channels (empty sequence). • Next (S, e) • let seq = (ei : 0 <= i <= n) be a sequence of events in component processes of a distributed system. • seq is legal iff: • system starts in S0 and Si+1 = next (Si, ei) 0 <= i <= n.

  8. Example 2.1

  9. Contd.

  10. Example 2.2

  11. Contd.

  12. Contd. • M : marker. M’ : message. • In example 2.1 only 1 event was possible… not so the case here. Different permutations of sequence of events will lead to different global states… non-determinism. • Non-Determinism : for example the events “p sends M” and q sends M’” may occur in the initial state and the next states after these events are different.

  13. Algorithm Steps: • Each process records its own state. • 2 processes that a channel is incident on cooperate in recording channel state. • All process and channel states will not be recorded at the exact same instant… no global clock. • Recorded global state must be “consistent”. • Should run concurrently with underlying computation… sending messages…require processes to carry out computation; however algorithm cannot affect underlying computation.

  14. Example 3.1 • In-p. record P's state. In-c. record q, c and c’ state. • 2 tokens ! • n < n’. (n: #msg's sent c before recording P's state, n’: #msg’s sent c before recording C’s state). • In-p. record c’s state. In-c. record q, p and c’ state. • 0 tokens ! • n > n’. • Hence need : n = n’ for consistent global state.

  15. Contd. • Likewise: need m = m’ where m: #MSG’s received along c before recording Q’s state. m’ : #MSG’s received along c before recording C's state. • The state of a channel c that is recorded must be the sequence of messages sent along the channel before the sender’s state is recorded, excluding the sequence of messages received along the channel before the receiver’s state is recorded. • Marker has no effect on underlying computation.

  16. Global State Algorithm Outline • Marker sending rule : • For each channel c, incident on/directed away from p: • p sends one marker along c after p records its state and before p sends any further messages along c. • Marker Receiving rule: • On receiving a marker along channel c: if q has not recorded its state then begin q records it state; q records the state c as the empty sequence. end else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.

  17. Algorithm Termination • To ensure termination in finite time: • L1: no marker remains forever in an incident input channel. • L2: state recording takes finite time. • Can prove that the algorithm must terminate in finite time given these conditions.

  18. Properties of Global State • Recorded global state may not be the same as any actual state. • But is equivalent (reachable from) and is consistent. • See next theorem.

  19. Contd. • let seq = (ei : 0 <= i <= n) be a distributed computation, and let Si be the global state immediately before the event ei in seq. • Let the algorithm be initiated in global state Sl and let it terminate in global state S. • The recorded global state S* may be different from all global states Sk, l <= k <=  • We show that: • S* is reachable from Sl, and • S is reachable from S*.

  20. Contd. • Specifically, we show that there exists a computation seq’ where • seq’ is a permutation of seq, such that Sl, S* and S occur as global states in seq’. • Sl = S* or Sl occurs earlier than S*. • S = S* or S* occurs earlier than S in seq’.

  21. Theorem 1. • There exists a computation seq’=(ei: 0<=i) where • For all i, where i < l or i >  : ei’ = ei • the subsequence (ei’: l<=i<) is a permutation of the subsequence (ei: l<=i<) • for all i where i <= l or i >  : Si’ = Si • there exists some k, l <= k<=  such that S* = Sk’

  22. Example 4.1 • To show how seq’ can be derived from seq. • Example 2.2 fig 7. • e0: p sends M, changes state to B (post-recording event). • e1: q sends M’, changes state to D (pre-recording event). • e2: p gets M’, changes state to A (post-recording event). • We can interchange interchange e0 and e1 to form the subsequence seq’. (which corresponds to the global state we recorded in fig 8… see after e0’).

  23. Stability Detection • Stability Detection Algorithm: Input: A stable property y. Output: A Boolean value definite with the property (y(Si) -> definite) and (definite -> y(So)) • Solution to stability detection problem: begin record a global state S*; definite := y(S*) end. • Algorithm correctness comes from properties discussed earlier.

More Related