Distributed Snapshots: Non-blocking checkpoint coordination protocol Next: Uncoordinated Chkpnt
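Uncoordinated • Processes take chkpnts independently • Domino Effect! (rolling back one process can force others to roll back, possibly all the way to their initial states) Next: Coordinated Blocking Chkpnt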
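The following minimal Python sketch shows how the domino effect arises. It computes a recovery line from independently taken checkpoints. The event-index model, the `Message` tuple, and `recovery_line` are hypothetical illustrations and do not come from the slides. In this model:

* A checkpoint at position c covers a process's first c local events.
* Position 0 is the initial state.
* A process that recorded receiving a message whose send is not recorded is rolled back to an earlier checkpoint. This repeats until no such message remains.

```python
from typing import NamedTuple

class Message(NamedTuple):
    sender: str
    send_pos: int    # index of the send among the sender's local events (1-based)
    receiver: str
    recv_pos: int    # index of the receive among the receiver's local events

def recovery_line(checkpoints, messages):
    """checkpoints: process -> sorted checkpoint positions, always including 0."""
    cut = {p: cps[-1] for p, cps in checkpoints.items()}  # start from the latest
    changed = True
    while changed:
        changed = False
        for m in messages:
            received = m.recv_pos <= cut[m.receiver]
            sent = m.send_pos <= cut[m.sender]
            if received and not sent:  # orphan message: roll the receiver back
                cut[m.receiver] = max(c for c in checkpoints[m.receiver] if c < m.recv_pos)
                changed = True
    return cut

# Hypothetical message pattern chosen so that every rollback creates a new orphan.
checkpoints = {"p": [0, 2, 4], "q": [0, 1, 3, 5]}
messages = [
    Message("p", 1, "q", 1),
    Message("q", 2, "p", 2),
    Message("p", 3, "q", 3),
    Message("q", 4, "p", 4),
    Message("p", 5, "q", 5),
]
print(recovery_line(checkpoints, messages))  # {'p': 0, 'q': 0}
```

Without the last message, the latest checkpoints `{'p': 4, 'q': 5}` are already consistent. With it, the rollback cascades back to both initial states.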
Coordinated Blocking • Processes are coordinated to form a consistent global state, and … [Figure: an initiator coordinates p1, p2, p3 with the messages "Ready!", "okay, channels flushed", and "Go!"] Next: Coordinated Blocking Chkpnt (cont’)
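Here is a rough Python sketch of one blocking round. It rests on these assumptions:

* Threads stand in for p1, p2, p3.
* Each `queue.Queue` is a process's incoming channel.
* Two `threading.Barrier`s replace the initiator's coordination messages.

The function name and the ring traffic pattern are hypothetical. Every process blocks before flushing its channel and again after saving. As a result, no message can cross the checkpoint line. The price is the blocking latency noted on the next slide.

```python
import queue
import threading

def blocking_checkpoint(n=3, msgs_per_proc=5):
    """Simulate one round of coordinated blocking checkpointing (sketch)."""
    inboxes = [queue.Queue() for _ in range(n)]  # one incoming channel per process
    all_stopped = threading.Barrier(n)  # every process has stopped sending
    all_saved = threading.Barrier(n)    # every process has saved its checkpoint
    checkpoints = [None] * n

    def process(i):
        sent = received = 0
        # Normal execution: send application messages around a ring.
        for _ in range(msgs_per_proc):
            inboxes[(i + 1) % n].put(i)
            sent += 1
        all_stopped.wait()  # block: no process sends past this point
        # Flush the channel: every message ever sent is already queued.
        while True:
            try:
                inboxes[i].get_nowait()
            except queue.Empty:
                break
            received += 1
        checkpoints[i] = {"sent": sent, "received": received}
        all_saved.wait()  # "Go!": resume only once everyone has checkpointed

    threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints

print(blocking_checkpoint())
```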
Coordinated Blocking (cont’) • Advantage • Always consistent • No Domino Effect • Less storage overhead (only the latest checkpoint is needed) • Disadvantage • Large latency to chkpnt: processes stay blocked while the checkpoint is taken! Next: Coordinated Non-blocking Chkpnt
Coordinated Non-blocking • Processes are coordinated, but … • Do we really need to block …? [Photos: Leslie Lamport, K. Mani Chandy] Next: Global-state Recording Algorithm
Global-state Recording Alg. “Distributed snapshots: determining global states of distributed systems”, K. Mani Chandy and Leslie Lamport • Step 1: process states • Step 2: channel states • Step 3: end of the algorithm Next: Model of Distributed System
Model of Distributed System [Figure: processes p, q, r connected by channels c1, c2, c3, c4] • Processes • Channels: directed, FIFO, error-free Next: Step 1, process states
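One way to encode this model in Python is to give each directed edge its own `deque`, which delivers messages in FIFO order without loss. This is only a sketch. The `Network` class is hypothetical. The edge list in the usage is an assumed reading of c1 to c4, because the figure's exact channel directions are not recoverable.

```python
from collections import deque

class Network:
    """Processes joined by directed, FIFO, error-free channels."""

    def __init__(self, edges):
        # One FIFO queue per directed edge (src, dst).
        self.channels = {edge: deque() for edge in edges}

    def send(self, src, dst, msg):
        self.channels[(src, dst)].append(msg)        # no loss, no duplication

    def receive(self, src, dst):
        return self.channels[(src, dst)].popleft()   # oldest message first

    def incoming(self, p):
        return [e for e in self.channels if e[1] == p]

    def outgoing(self, p):
        return [e for e in self.channels if e[0] == p]

net = Network([("p", "q"), ("q", "r"), ("r", "p"), ("q", "p")])  # assumed c1..c4
net.send("p", "q", "a")
net.send("p", "q", "b")
print(net.receive("p", "q"), net.receive("p", "q"))  # a b
print(net.incoming("p"))  # [('r', 'p'), ('q', 'p')]
```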
Step 1: process states • Initiator: • Save its local state • Send marker tokens on all outgoing edges • All other processes: • On receiving the first marker on any incoming edge: • Save state, and propagate markers on all outgoing edges • Resume execution • Markers that arrive later are discarded. Next: Example
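Step 1 can be simulated in a single Python process. This minimal sketch relies on assumptions that are not in the slides:

* Process states are integer balances.
* Application messages are amounts that the receiver adds to its balance.
* Each sweep delivers at most one message per channel.

`record_process_states` is a hypothetical name.

```python
from collections import deque

MARKER = "MARKER"

def record_process_states(edges, states, initiator, pending=()):
    """Step 1 only: save each process's state when its first marker arrives."""
    channels = {e: deque() for e in edges}          # FIFO channel per directed edge
    for src, dst, msg in pending:                   # messages already in transit
        channels[(src, dst)].append(msg)
    saved = {}

    def checkpoint(p):
        saved[p] = states[p]                        # save local state
        for src, dst in edges:                      # markers on all outgoing edges
            if src == p:
                channels[(src, dst)].append(MARKER)

    checkpoint(initiator)
    while any(channels.values()):                   # run until every channel drains
        for (src, dst), ch in channels.items():
            if not ch:
                continue
            msg = ch.popleft()
            if msg == MARKER:
                if dst not in saved:                # first marker: checkpoint
                    checkpoint(dst)
                # later markers are simply discarded
            else:
                states[dst] += msg                  # normal execution continues
    return saved

# Hypothetical transfer example: 30 units in total, 5 of them in transit
# from q to p (already debited at q) when p starts the snapshot.
edges = [("p", "q"), ("q", "p"), ("q", "r"), ("r", "p")]
saved = record_process_states(edges, {"p": 10, "q": 5, "r": 10}, "p",
                              pending=[("q", "p", 5)])
print(saved, sum(saved.values()))  # {'p': 10, 'q': 5, 'r': 10} 25
```

The saved states add up to 25 rather than 30. The 5 units in flight from q to p are not captured by Step 1 alone.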
Example [Figure: processes p, q, r connected by channels c1 to c4; markers spread from the initiator, and each x marks a checkpoint] Next: Proof
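Proof • Assume that a message m from p to q makes our cut inconsistent: m is sent after p’s checkpoint but received before q’s checkpoint. [Figure: m crosses the cut from after p’s checkpoint to before q’s checkpoint] Next: Proof (cont’)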
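Proof (cont’) • Right after its checkpoint, p sends a marker x1 to q, so x1 precedes m on the channel from p to q. • (1) x1 is the 1st marker for process q: channels are FIFO, so x1 reaches q before m, and q checkpoints on receiving x1, i.e., before receiving m. • (2) x1 is not the 1st marker for process q: q already checkpointed on an earlier marker x2, before x1 and therefore before m. • Either way, q checkpoints before receiving m, which contradicts the assumption. Next: Step 2, channel states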
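The property that the proof establishes can be checked mechanically. In this hypothetical Python sketch, a cut records, for each process, how many of its local events come before its checkpoint. A message is an orphan when its receipt lies inside the cut but its send does not. The proof shows that the marker algorithm never produces such a message.

```python
from typing import NamedTuple

class Message(NamedTuple):
    sender: str
    send_pos: int   # sender's local event index of the send (1-based)
    receiver: str
    recv_pos: int   # receiver's local event index of the receive

def orphans(cut, messages):
    """Messages whose receipt is inside the cut but whose send is not."""
    return [m for m in messages
            if m.recv_pos <= cut[m.receiver] and m.send_pos > cut[m.sender]]

def is_consistent(cut, messages):
    return not orphans(cut, messages)

m = Message("p", 3, "q", 2)                  # sent at p's 3rd event, received at q's 2nd
print(is_consistent({"p": 2, "q": 2}, [m]))  # False: q saw m, p's checkpoint never sent it
print(is_consistent({"p": 3, "q": 2}, [m]))  # True
print(is_consistent({"p": 3, "q": 1}, [m]))  # True: m is in flight
```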
Step 2: channel states • In-flight messages on a channel from p to q are those: • Sent along the channel before the sender’s chkpnt • Received along the channel after the receiver’s chkpnt • Step 1 alone does not record them. Next: Example
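This definition translates directly into a filter over one channel's delivery log. The sketch below is minimal. The `Delivery` record is an assumption made for illustration, as is the event-index convention that a checkpoint at position c covers the first c local events.

```python
from typing import NamedTuple

class Delivery(NamedTuple):
    msg: str
    sent_at: int      # sender's local event index of the send
    received_at: int  # receiver's local event index of the receive

def channel_state(log, sender_ckpt, receiver_ckpt):
    """In-flight messages: sent before the sender's checkpoint,
    received after the receiver's checkpoint."""
    return [d.msg for d in log
            if d.sent_at <= sender_ckpt and d.received_at > receiver_ckpt]

log = [Delivery("a", 1, 2), Delivery("b", 3, 4), Delivery("c", 5, 6)]
print(channel_state(log, sender_ckpt=4, receiver_ckpt=3))  # ['b']
```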
Example [Figure, two panels showing messages 1 to 8 on p’s incoming channels from q, r, s, t, u: (1) p is receiving messages; (2) p has just saved its state] Next: Example (cont’)
Example (cont’) • p’s chkpnt is triggered by a marker from q [Figure: messages 1 to 8 on p’s incoming channels from q, r, s, t, u, relative to the markers on those channels] Next: Algorithm (revised)
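Algorithm (revised) • Initiator: • Save its local state • Send marker tokens on all outgoing edges • Record the messages arriving on each incoming channel until a marker arrives through that channel • All other processes: • On receiving the first marker on any incoming edge: • Save state, and propagate markers on all outgoing edges • Resume execution, but also record the messages arriving on each other incoming channel until a marker arrives through that channel; the channel that delivered the first marker is recorded as empty • Guarantees a consistent global state! Next: Step 3, end of the algorithm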
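The revised algorithm can be simulated in a single Python process. This sketch relies on assumptions that are not in the slides:

* Process states are integer balances.
* Application messages are amounts.
* Each sweep delivers at most one message per channel.

`chandy_lamport` and its arguments are hypothetical names. The usage runs a transfer scenario in which 5 units are in flight when the snapshot starts. Recording channel states recovers those units, so the snapshot totals 30.

```python
from collections import deque

MARKER = "MARKER"

def chandy_lamport(edges, states, initiator, pending=()):
    channels = {e: deque() for e in edges}   # FIFO channel per directed edge
    for src, dst, msg in pending:            # messages already in transit
        channels[(src, dst)].append(msg)
    saved = {}          # process -> recorded local state
    recording = set()   # incoming channels still being recorded
    chan_state = {}     # channel -> recorded in-flight messages

    def checkpoint(p, via=None):
        saved[p] = states[p]
        for e in edges:
            if e[1] == p:                    # incoming channel of p
                chan_state[e] = []
                if e != via:                 # first-marker channel stays empty
                    recording.add(e)
        for e in edges:
            if e[0] == p:                    # markers on all outgoing edges
                channels[e].append(MARKER)

    checkpoint(initiator)
    while any(channels.values()):
        for edge, ch in channels.items():
            if not ch:
                continue
            msg = ch.popleft()
            dst = edge[1]
            if msg == MARKER:
                if dst not in saved:
                    checkpoint(dst, via=edge)
                else:
                    recording.discard(edge)  # marker ends this channel's recording
            else:
                states[dst] += msg           # normal execution
                if edge in recording:
                    chan_state[edge].append(msg)
    return saved, chan_state

edges = [("p", "q"), ("q", "p"), ("q", "r"), ("r", "p")]
saved, chans = chandy_lamport(edges, {"p": 10, "q": 5, "r": 10}, "p",
                              pending=[("q", "p", 5)])
print(saved)              # {'p': 10, 'q': 5, 'r': 10}
print(chans[("q", "p")])  # [5]
print(sum(saved.values()) + sum(sum(c) for c in chans.values()))  # 30
```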
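Step 3: end of the algorithm [Figure: the initiator and processes p, q, r] • A process has finished its part once it has received a marker on every incoming channel • Did every process save its state and in-flight messages? • A direct channel to the initiator? • A spanning tree? • A general solution? Next: References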
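Here is one possible answer to the spanning-tree question, sketched in Python. The idea is that each process remembers which neighbor sent its first marker. Those parent links form a spanning tree rooted at the initiator, and results could be reported back along it. A process's own part is complete once a marker has arrived on every incoming channel. `marker_tree` is a hypothetical name, and the sketch tracks markers only, not application messages.

```python
from collections import deque

def marker_tree(edges, initiator):
    """Flood markers; return each process's parent (sender of its first marker)
    and whether it has seen a marker on every incoming channel."""
    channels = {e: deque() for e in edges}
    parent = {initiator: None}
    markers_seen = {}   # process -> incoming channels that delivered a marker

    def propagate(p):
        for e in edges:
            if e[0] == p:
                channels[e].append("MARKER")

    propagate(initiator)
    while any(channels.values()):
        for edge, ch in channels.items():
            if ch:
                ch.popleft()
                src, dst = edge
                markers_seen.setdefault(dst, set()).add(edge)
                if dst not in parent:        # first marker: remember the parent
                    parent[dst] = src
                    propagate(dst)

    incoming = {}
    for e in edges:
        incoming.setdefault(e[1], set()).add(e)
    done = {p: markers_seen.get(p, set()) == incoming.get(p, set()) for p in parent}
    return parent, done

parent, done = marker_tree([("p", "q"), ("q", "p"), ("q", "r"), ("r", "p")], "p")
print(parent)  # {'p': None, 'q': 'p', 'r': 'q'}
print(done)    # {'p': True, 'q': True, 'r': True}
```

A process could then forward its recorded state to its parent once it is done and has heard from all its children. The initiator would learn that the snapshot is complete when every child of the root has reported.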
References • K. Mani Chandy and Leslie Lamport, “Distributed snapshots: determining global states of distributed systems”