Ch13 Checkpointing and Recovery

Outline
  Introduction: What? Why? Where?
  Problems in rollback
  Incarnation numbers
  Taxonomy of solution techniques
  Uncoordinated checkpointing
  Coordinated checkpointing
  Synchronous logging
  Asynchronous logging
  Adaptive logging

Introduction
During a computation, a node might fail and then be repaired. After a failed processor has been repaired, how can the system be taken to a consistent global state?
If every processor periodically
  records its local state on stable storage, and
  records the messages it receives on stable storage,
then the system can be taken to a consistent global state by rolling it back to a previously recorded global state.
Terminology
  checkpointing: recording the local state on stable storage
  message logging: recording received messages on stable storage

Recovery line
A set C of local checkpoints forms a consistent state (also called a recovery line) if the following conditions are satisfied:
  1) there are no lost messages in C
  2) there are no orphan messages in C
  3) C contains exactly one checkpoint for each processor
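The three conditions can be checked mechanically. The sketch below is only an illustration under stated assumptions: it takes a message to be lost when its send is recorded in the checkpoints but its receipt is not, and orphan when its receipt is recorded but its send is not. The `Checkpoint` record, the message ids and the function names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical summary of what one local checkpoint has recorded."""
    process: str
    sent: frozenset = frozenset()      # ids of messages sent before the checkpoint
    received: frozenset = frozenset()  # ids of messages received before the checkpoint


def classify(checkpoints):
    """Return (lost, orphan) message ids for a set of local checkpoints."""
    all_sent = set().union(*(c.sent for c in checkpoints))
    all_received = set().union(*(c.received for c in checkpoints))
    lost = all_sent - all_received      # send is recorded, receipt is not
    orphan = all_received - all_sent    # receipt is recorded, send is not
    return lost, orphan


def is_recovery_line(checkpoints, processes):
    """Check conditions 1) to 3) of the recovery line definition."""
    lost, orphan = classify(checkpoints)
    one_each = sorted(c.process for c in checkpoints) == sorted(processes)
    return one_each and not lost and not orphan
```

For instance, a checkpoint of p that records sending "a" together with a checkpoint of q that records receiving "b" yields one lost message ("a") and one orphan message ("b"), so the pair is not a recovery line.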

Problems in rollback
The goal of rollback is to take the system back to a consistent state. Some precautions have to be taken for this to work properly.
For simplicity, we do not consider channel state for the rollback.
To see the problem, assume:
  1) processors checkpoint from time to time
  2) checkpoints are established independently, without any coordination between processors

Problems in rollback (cont.)
[Figure: processors p1, p2, p3 with independently established checkpoints c1, c2, c3 and messages m1, m2, m3]
The global state formed by c1, c2, c3 is inconsistent. It contains:
  lost messages: m2, m3
  orphan messages: m1

Problems in rollback: cascading rollbacks
[Figure: processors p, q, r with checkpoints p1..p3, q1..q4, r1..r4 and messages m1..m5]
"p rolls back to p3" requires, because of message m1, that "r rolls back to r4", and so on.
{p2, q3, r3} is a recovery line.
A rollback by one processor can cause an avalanche of rollbacks. How can this be avoided?

Problems in rollback: I/O stuttering
[Figure: processors p, q, r; p performs an I/O event after checkpoint pi]
Rolling back processor p to pi requires that the I/O event be re-executed: I/O stuttering.
How can this be avoided?
  Log inputs: avoids input stuttering
  Output commit: avoids output stuttering

Problems in rollback: message duplication
[Figure: p sends m to q after checkpoint pi and q receives it, r(m); p then rolls back to pi and, after recovering, sends m again]
Processor p rolls back to pi. There is no need for q to roll back.
After recovery, processor p sends m again. Processor q should recognize that message m is a duplicate.

Incarnation numbers: handling duplicate messages
Every processor:
  maintains an incarnation number on stable storage
  stores a guess of the incarnation number of every other processor
On every recovery from failure or rollback, the incarnation number is incremented.
Each message carries the incarnation number of its sender.

Incarnation numbers: handling duplicate messages (cont.)
The evolution of a processor is organized into periods. Incarnation numbers serve to identify these periods.
[Figure: timeline split into period 0, period 1 and period 2; a new period starts at each recovery from failure or rollback]
When processor p receives a message m from processor q, processor p behaves as follows:
  if m.incarnation < incarnation[q]: message m is a duplicate, discard it
  if m.incarnation = incarnation[q]: deliver m
  if m.incarnation > incarnation[q]: m belongs to an incarnation that p does not know yet, so block the delivery of m until m.incarnation = incarnation[q]
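The three-way comparison can be sketched as a small receiver-side filter. This is an illustration rather than the slides' own code: the class name `Receiver`, the method names, and the way p learns about a new incarnation of q (an explicit `on_incarnation_update` call) are assumptions, since the slides do not say how the guess is refreshed.

```python
from collections import defaultdict


class Receiver:
    """Hypothetical receiver-side filter based on incarnation numbers."""

    def __init__(self):
        self.incarnation = defaultdict(int)  # guess of each sender's incarnation
        self.blocked = defaultdict(list)     # messages from a not-yet-known incarnation
        self.delivered = []                  # payloads handed to the application

    def on_message(self, sender, msg_incarnation, payload):
        known = self.incarnation[sender]
        if msg_incarnation < known:
            return "discard"                 # stale incarnation: treated as a duplicate
        if msg_incarnation == known:
            self.delivered.append(payload)
            return "deliver"
        # Newer incarnation than the current guess: hold the message back.
        self.blocked[sender].append((msg_incarnation, payload))
        return "block"

    def on_incarnation_update(self, sender, new_incarnation):
        """Assumed hook: called when this processor learns the sender's new incarnation."""
        self.incarnation[sender] = max(self.incarnation[sender], new_incarnation)
        current = self.incarnation[sender]
        still_blocked = []
        for inc, payload in self.blocked[sender]:
            if inc == current:
                self.delivered.append(payload)        # now deliverable
            elif inc > current:
                still_blocked.append((inc, payload))  # still unknown incarnation
            # inc < current: stale by now, dropped
        self.blocked[sender] = still_blocked
```

A message tagged with incarnation 1 stays blocked while the guess for its sender is 0, and is delivered once the guess is raised to 1; after that, any message still tagged 0 is discarded.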

Choices to be made to implement a recovery scheme
To log or not to log messages?
Log messages:
  + : increases flexibility at recovery time
  - : expensive (space)
  - : processes must be deterministic (which is often not the case)

Choices to be made to implement a recovery scheme
To coordinate or not to coordinate the recording of state?
Uncoordinated checkpoints
  Sufficient information (we'll see later) must be kept for rollback
  + : keeps the cost of establishing checkpoints low
  - : the amount of rollback may be unbounded
Coordinated checkpoints
  The set of checkpoints together forms a recovery line
  + : limits the amount of rollback
  - : increases the cost of establishing checkpoints

Uncoordinated checkpointing
Assumptions
  1. Processors checkpoint asynchronously from time to time
  2. No coordination between processors for the establishment of checkpoints
  3. No log of messages
Goal
  Find a maximal recovery line (the latest recovery line), i.e. the one that happens after every other possible recovery line

Uncoordinated checkpointing
Checkpoint interval algorithm (progressive rollback)
Notations
  Ci,j : the jth checkpoint at processor pi
  Ii,j : the open interval (Ci,j , Ci,j+1), i.e. the processing interval of pi between Ci,j and Ci,j+1
Definition
  Ik,l depends on Ii,j iff there is a message m sent in Ii,j and received in Ik,l
[Figure: pi with checkpoints Ci,j and Ci,j+1, pk with checkpoints Ck,l and Ck,l+1, and a message m from pi to pk]

Uncoordinated checkpointing
Checkpoint interval algorithm (progressive rollback)
Idea of the algorithm
When a processor pi fails and then is repaired:
  1. Processor pi initiates recovery by restoring its last checkpoint, say Ci,j
  2. Every processor pk in Ik,l such that Ik,l depends on Ii,j rolls back (but to which checkpoint? We'll see later)
  3. This process continues recursively (transitively) until a recovery line is determined
To support recovery, the information about interval dependence must be recorded (this is the "sufficient information").

Uncoordinated checkpointing
Interval dependence graph: to capture rollback requirements
GI is a graph in which
  VI: the vertices are the checkpoint intervals that exist when recovery starts
  EI: the directed edges are such that
    1) for every processor pi, (Ii,j , Ii,j+1) is in EI
    2) if Ik,l depends on Ii,j then (Ii,j , Ik,l) is added to EI
[Figure: the two rules drawn as edges Ii,j -> Ii,j+1 and Ii,j -> Ik,l in GI]

Uncoordinated checkpointing
Intuition behind the interval dependence graph:
If processor pi rolls back to Ci,j and Ik,l depends on Ii,j, then processor pk must roll back to Ck,l. This is needed to avoid orphan messages.
[Figure: message m sent by pi in Ii,j and received by pk in Ik,l; pk must roll back because of m]

Uncoordinated checkpointing
Interval dependence graph illustrated:
[Figure, left: message passing and checkpointing among p1, p2, p3, with messages m1..m5 and intervals I1,1..I1,4, I2,1..I2,3, I3,1..I3,4. Right: the corresponding interval dependence graph]

Uncoordinated checkpointing
The checkpoint interval algorithm (progressive rollback)
When a processor pi fails and then is repaired, pi performs:
  Step 1. Compute GI
  Step 2. Mark the node of GI corresponding to its last checkpoint interval; let Ii,j be that node. Mark all the nodes of GI that are reachable from Ii,j
  Step 3. Define, for each processor k, the "best checkpoint" of k w.r.t. the recovery of pi to be Ck,l such that l = min {j | Ik,j is marked}. Every processor rolls back to its "best checkpoint"
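The three steps amount to a reachability computation on GI. The following is a minimal centralized sketch, assuming intervals are identified by pairs (process, index) with indices starting at 1 and that the dependence information is already available as a list of pairs; the function name and the input encoding are hypothetical. Processors with no marked interval are simply left out of the result, on the assumption that they do not need to roll back.

```python
from collections import defaultdict, deque


def best_checkpoints(num_intervals, dependencies, failed):
    """Compute the "best checkpoint" index of every affected processor.

    num_intervals: {process: number of checkpoint intervals, indexed 1..n}
    dependencies:  iterable of ((i, j), (k, l)), meaning I[k,l] depends on I[i,j]
    failed:        the processor that failed and is recovering
    Returns {process: l}, meaning that process restores checkpoint C[process, l].
    """
    # Step 1: build GI (rule 1: successive intervals; rule 2: dependences).
    edges = defaultdict(set)
    for p, n in num_intervals.items():
        for j in range(1, n):
            edges[(p, j)].add((p, j + 1))
    for src, dst in dependencies:
        edges[src].add(dst)

    # Step 2: mark everything reachable from the last interval of `failed`.
    start = (failed, num_intervals[failed])
    marked = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in marked:
                marked.add(nxt)
                queue.append(nxt)

    # Step 3: the smallest marked interval index per processor.
    best = {}
    for p, j in marked:
        best[p] = min(j, best.get(p, j))
    return best
```

As a made-up example (not the one in the figures): with two intervals per processor, a message sent in I2,2 and received in I1,1, and a message sent in I1,2 and received in I3,2, a failure of p2 gives `best_checkpoints({1: 2, 2: 2, 3: 2}, [((2, 2), (1, 1)), ((1, 2), (3, 2))], 2)`, which maps p2 to C2,2, p1 to C1,1 and p3 to C3,2.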

Uncoordinated checkpointing
The algorithm illustrated: assume that p2 fails and then is repaired
  Step 1. p2 computes GI
  Step 2. p2 marks all the nodes of GI reachable from its last checkpoint interval
  Step 3. Each processor rolls back to its "best checkpoint" w.r.t. the recovery of p2
Recall: for each processor k, the "best checkpoint" of k w.r.t. the recovery of p2 is Ck,l such that l = min {j | Ik,j is marked}
[Figure: the interval dependence graph with the marked nodes, and the message diagram of p1, p2, p3 showing the recovery line determined]

Uncoordinated checkpointing
Some comments about the checkpoint interval algorithm
  Rollback can take the system all the way back to the initial state
  The algorithm presented is centralized: it can be implemented on a recovery manager that directs all the participants to restart, each from its "best checkpoint"
  For a distributed version, recovery control messages must be used to communicate parts of GI

Coordinated checkpointing
Idea: processors coordinate the checkpointing of their local states to ensure that the checkpoints taken by the different processors form a recovery line. This avoids cascading rollbacks.
Method used: similar to the one used for computing a "global snapshot". However, there are some differences.

Coordinated checkpointing
Subtleties:
  1. Only processor states are recorded (saves space)
  2. Failures during checkpointing are handled
  3. The minimum number of checkpoints is stored (saves space)
  4. Lost messages are handled by the communication protocol (a consistent set of checkpoints may now contain lost messages)
  5. There are no orphan messages in the computed set of checkpoints

Coordinated checkpointing
Subtleties (cont.):
  6. Only a minimum number of processors must checkpoint
     Idea: old checkpoints, together with new checkpoints of some processors, may form a "consistent set" of checkpoints

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Uses a two-phase protocol to ensure that either all processors checkpoint or none do.
Two types of checkpoints are used for that:
  "tentative checkpoint": established while global state recording is ongoing
  "permanent checkpoint": if the recorded state is consistent, tentative checkpoints become permanent checkpoints

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Basic idea
Phase 1
Initiator q:
  1. q takes a tentative checkpoint
  2. q requests all other processors to take tentative checkpoints
Non-initiator p, on receiving this request:
  1. p either establishes the tentative checkpoint or declines to
  2. p sends its decision to the initiator
  3. p waits for the final decision from q (i.e. it refrains from communicating with any other processor until the second phase is over)

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Basic idea (cont.)
Phase 2
Initiator q:
  1. q collects the decisions from all other processors
  2. If all other processors have taken tentative checkpoints, then q makes its tentative checkpoint permanent; else q undoes its tentative checkpoint
  3. q requests all other processors to apply the same final decision
Non-initiator p, on receiving this final decision:
  p executes the order
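The all-or-nothing outcome of the two phases can be illustrated with a very small simulation. This is only a sketch of the basic idea, not of the full algorithm: there is no real messaging, the function name and the `can_checkpoint` map are hypothetical, and a processor missing from the map is treated as having refused, which is an assumption made here for simplicity.

```python
def two_phase_checkpoint(initiator, others, can_checkpoint):
    """Simulate one round of the basic two-phase idea.

    can_checkpoint: {processor: bool}, a stand-in for "p managed to take its
    tentative checkpoint". Returns {processor: "permanent" | "undone"} for
    every processor that held a tentative checkpoint.
    """
    # Phase 1: the initiator takes a tentative checkpoint and asks the others.
    tentative = {initiator}
    votes = {}
    for p in others:
        votes[p] = can_checkpoint.get(p, False)  # assumption: no answer counts as a refusal
        if votes[p]:
            tentative.add(p)

    # Phase 2: one common decision, applied by every tentative checkpoint holder.
    decision = "permanent" if all(votes.values()) else "undone"
    return {p: decision for p in tentative}
```

With every processor able to checkpoint, all tentative checkpoints become permanent; a single refusal makes the initiator and all other holders undo theirs.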

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The basic idea ensures that there are no orphan messages. Why?
Answer: no communication is allowed until the second phase is over.

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
It is not necessary that all processors record their state during checkpointing. Why?
[Figure: processors p1, p2, p3 with old checkpoints C1,1, C2,1, C3,1 and new checkpoints C1,2, C2,2, C3,2; the checkpoint requests are drawn as red messages]
  p1 initiates checkpointing by establishing C1,2, then contacts p2 and p3 (the red messages)
  Assume that everything goes fine and that p2 and p3 establish C2,2 and C3,2 respectively as new checkpoints
  {C1,2 , C2,2 , C3,2} forms a consistent set of checkpoints
  However, {C1,2 , C2,1 , C3,2} also forms a consistent set of checkpoints (i.e. no orphan messages)
  Hence, processor p2 need not take a new checkpoint

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints:
Every processor assigns monotonically increasing sequence numbers to the messages it sends.
Each processor p uses:
  p.last_rec[1..M], an array of sequence numbers
    p.last_rec[i] = sequence number of the last message that p received from processor pi since p's last checkpoint
  p.first_sent[1..M], an array of sequence numbers
    p.first_sent[i] = sequence number of the first message that p sent to processor pi since p's last checkpoint

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints:
When an initiator processor q requests a processor p to take a tentative checkpoint, q appends q.last_rec[p] to its request.
On receiving this request from q, processor p takes the tentative checkpoint only if p.first_sent[q] ≤ q.last_rec[p].
[Figure: p and q with their last checkpoints and the current checkpoint of q; p takes a new checkpoint only when q's current checkpoint records a message that p sent after its own last checkpoint, which avoids orphan messages]

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints (cont.)
Only processors that have sent messages to the initiator processor q since q's last checkpoint need to consider establishing a new checkpoint requested by q.
Hence an initiator processor q should send requests only to those processors p from which q has received at least one message since q's last checkpoint.
[Figure: p and q, with a message from p received by q between q's last checkpoint and q's current checkpoint]

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints (cont.)
Every processor q maintains:
  q.checkpoint_cohort: the set of processors from which q has received some message since q's last checkpoint
[Figure: p and q, with a message from p received by q between q's last checkpoint and q's current checkpoint]
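The bookkeeping behind last_rec, first_sent and checkpoint_cohort can be sketched as follows. This is an illustrative model under assumptions, not the original algorithm's code: the arrays are dictionaries in which a missing entry stands for "no message since the last checkpoint", sending and receiving are separate calls so that a message can be in transit, and the class and method names are hypothetical.

```python
class Processor:
    """Hypothetical per-processor bookkeeping for minimizing checkpoints."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 1      # monotonically increasing sequence numbers
        self.last_rec = {}     # sender -> seq of last message received since last checkpoint
        self.first_sent = {}   # receiver -> seq of first message sent since last checkpoint

    def send(self, dest_name):
        seq = self.next_seq
        self.next_seq += 1
        self.first_sent.setdefault(dest_name, seq)  # only the first send is remembered
        return seq

    def receive(self, sender_name, seq):
        self.last_rec[sender_name] = seq

    @property
    def checkpoint_cohort(self):
        # Processors from which a message was received since the last checkpoint.
        return set(self.last_rec)

    def checkpoint(self):
        # Both arrays are defined relative to the last checkpoint, so reset them.
        self.last_rec.clear()
        self.first_sent.clear()

    def must_checkpoint(self, requester_name, requester_last_rec):
        """Condition p.first_sent[q] <= q.last_rec[p], with both values defined."""
        first = self.first_sent.get(requester_name)
        return first is not None and requester_last_rec is not None and first <= requester_last_rec
```

If p sends message 1 to q, checkpoints, and then sends message 2 that q has not yet received, a request from q carrying q.last_rec[p] = 1 does not force p to checkpoint, because the only message q has recorded was sent before p's last checkpoint.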

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 1
Initiator processor q:
  1. take tentative checkpoint;
  2. for every processor p in q.checkpoint_cohort do
       send (Request_tentative_chkp; q.last_rec[p]) to p;

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 1
Non-initiator processor p:
  On receiving (Request_tentative_chkp; q.last_rec[p]) from q
  if (ready to perform tentative checkpoint) and (p.first_sent[q] ≤ q.last_rec[p]) then
    take tentative checkpoint;
    for every processor r in p.checkpoint_cohort do
      send (Request_tentative_chkp; p.last_rec[r]) to r;
    p.replies := empty;   /* set of replies */
    for every processor r in p.checkpoint_cohort do
      waituntil r sends "OK" or "KO", Timeout = T;
      on "OK": add r to p.replies;
    if p.replies ≠ p.checkpoint_cohort then send "KO" to q
    else send "OK" to q

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 2
Initiator processor q:
  1. q.replies := empty;   /* set of replies */
  2. for every processor p in q.checkpoint_cohort do
       waituntil p sends "OK" or "KO", Timeout = T;
       on "OK": add p to q.replies;
     if q.replies ≠ q.checkpoint_cohort then
       undo tentative checkpoint;
       send "undo tentative checkpoint" to every processor in q.checkpoint_cohort
     else
       permanent := tentative;
       send "make tentative checkpoint permanent" to every processor in q.checkpoint_cohort

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 2
Non-initiator processor p:
  waituntil q sends "undo …" or "make … permanent"; timeout = T
    on "undo …" do undo tentative checkpoint end
    on "make … permanent" do checkpoint := tentative_checkpoint end
  if no timeout then
    m := message received;
    for every processor r in p.checkpoint_cohort do
      send m to r;

Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Handling failures
Idea: failures are detected by timeouts.
On recovery:
  if the recovering processor was the initiator, it undoes its tentative checkpoint and sends this decision to the other processors
  otherwise, the recovered processor consults the initiator or some other processor to find out the final decision

Logging
Idea: processors record incoming messages.
Purpose:
  avoid the need for "resending"
  reduce the amount of rollback (idea of a virtual checkpoint)
Logging messages: + flexibility, - expensive

Synchronous Logging
Idea
  Each message must be logged before it can be delivered
  During recovery, logged messages are replayed until the recovering processor is up to date (this guarantees replay after all sends that can cause a subsequent rollback)
Problem: expensive
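The "log first, then deliver" rule and the replay on recovery can be sketched with an in-memory stand-in for stable storage. The class and its names are hypothetical, the state is assumed to be an immutable value, and the message handler is assumed to be deterministic, which is the precondition mentioned earlier for logging to work at all.

```python
class SyncLoggedProcess:
    """Hypothetical process with synchronous message logging."""

    def __init__(self, handler, initial_state):
        self.handler = handler                 # deterministic: (state, msg) -> new state
        self.state = initial_state
        self.checkpoint_state = initial_state  # last checkpoint (stand-in for stable storage)
        self.stable_log = []                   # logged messages (stand-in for stable storage)

    def receive(self, msg):
        self.stable_log.append(msg)                  # 1. log the message ...
        self.state = self.handler(self.state, msg)   # 2. ... only then deliver it

    def checkpoint(self):
        self.checkpoint_state = self.state
        self.stable_log.clear()   # messages already reflected in the checkpoint

    def recover(self):
        # Restore the checkpoint, then replay every logged message in order.
        self.state = self.checkpoint_state
        for msg in self.stable_log:
            self.state = self.handler(self.state, msg)
```

With a handler that adds each message to an integer state, receiving 1, checkpointing, then receiving 2 and 3 leaves a log of [2, 3]; after a crash, `recover()` restores the state 1 from the checkpoint and replays the log to reach 6 again.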

Asynchronous Logging
Idea
  Each message must be logged, but not necessarily before it is delivered
  Messages can first be saved in main memory
  Idle periods are exploited to log messages: several messages can be packed together and then logged at once (efficient use of I/O devices)
Problem
  Some messages may be lost, so replay is not always possible
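The trade-off can be made concrete with a sketch in which delivery never waits for I/O and the log is written in batches. Everything here is an assumption for illustration: the class and method names are hypothetical, stable storage is modelled as a plain list, and a crash is modelled as losing the in-memory buffer.

```python
class AsyncLoggedProcess:
    """Hypothetical process with asynchronous message logging."""

    def __init__(self):
        self.volatile_buffer = []   # main memory: lost if the processor crashes
        self.stable_log = []        # stand-in for stable storage

    def receive(self, msg):
        # The message is delivered immediately and only buffered for logging.
        self.volatile_buffer.append(msg)

    def flush(self):
        """Called during an idle period: write the whole batch in one operation."""
        self.stable_log.extend(self.volatile_buffer)
        self.volatile_buffer.clear()

    def crash(self):
        """Return the messages that were delivered but never reached the log."""
        lost = list(self.volatile_buffer)
        self.volatile_buffer.clear()
        return lost
```

Receiving "a" and "b", flushing, and then receiving "c" before a crash leaves only "a" and "b" in the stable log; "c" cannot be replayed, which is the problem stated above.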
