1 / 13

Uncoordinated Checkpointing

Uncoordinated Checkpointing. The Global State Recording Algorithm. channel. node. The Model. Node properties No shared memory No global clock. Channel properties: FIFO loss free non­duplicating. C1:transfer $50. C1:empty. C1:empty. $500. $450. $450. $200. $200. $250.

Download Presentation

Uncoordinated Checkpointing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Uncoordinated Checkpointing The Global State Recording Algorithm

  2. channel node The Model Node properties • No shared memory • No global clock Channel properties: • FIFO • loss free • non­duplicating CS 5204 – Fall, 2008

  3. C1:transfer $50 C1:empty C1:empty $500 $450 $450 $200 $200 $250 C2:empty C2:empty C2:empty The Problem CS 5204 – Fall, 2008

  4. Distributed Snapshot (Global State Recording) • Motivation for recording a “consistent” state of the global computation: • checkpointing for fault tolerance (rollback, recovery) • testing and debugging • monitoring and auditing • Method: detecting stable properties in a distributed system via snapshots. A property is “stable” if, once it holds in a state, it holds in all subsequent states. • termination • deadlock • garbage collection CS 5204 – Fall, 2008

  5. Definitions Local State and Actions: local state: LSi message send: send(mij ) message receive: rec(mij ) time: time(x) send(mij )  LSi iff time(send(mij )) < time(LSi ) rec(mij )  LSj iff time(rec(mij )) < time(LSj ) Predicates: transit(LSi , LSj ) = {mij | send(mij )  LSi !( rec(mij )  LSj ) ) } inconsistent(LSi , LSj ) = {mij | !(send(mij )  LSi )  rec(mij )  LSj ) } Consistent Global State:  i,  j : 1 <= i, j <= n :: inconsistent( LSi , LSj ) =  CS 5204 – Fall, 2008

  6. Global­State­Recording Algorithm Marker­Sending Rule for a Process p: for (each channel c, incident on, and directed away from p) { p sends one marker along c after p records its state and before p sends further messages along c; } Marker­Receiving Rule for a Process q: if (q has not recorded its state) then { q records its state; q records the state of c as the empty sequence; } else { q records the state of c as the sequence of message received along c after q's state was recorded and before q received the marker along c. } CS 5204 – Fall, 2008

  7. before receiving the marker, q changes its state and sends message D. empty empty M M S1 S2 S3 S0 q q q q p p p p empty empty M’ D q receives the marker and records its state (D) and the incoming channel as empty; q send marker M' on its outgoing channel. state A state A state A state B state D state D state C state C on receiving the marker, p records the channel as having message D empty recorded state q p D state A state D p records its state (A) and sends marker M on channel CS 5204 – Fall, 2008

  8. c1 500 500 p q c2 c4 c3 r = Marker M 500 Snapshot/State Recording Example CS 5204 – Fall, 2008

  9. M 10 c1 470 490 p q c2 20 c4 c3 10 r 500 Snapshot/State Recording Example (Step 1) CS 5204 – Fall, 2008

  10. c1 480 490 p q c2 M c4 20 M c3 10 25 r 475 Snapshot/State Recording Example (Step 2) CS 5204 – Fall, 2008

  11. 20 c1 480 470 p q c2 M c4 20 c3 25 r M 485 Snapshot/State Recording Example (Step 3) CS 5204 – Fall, 2008

  12. c1 500 490 p q c2 c4 c3 25 r M 485 Snapshot/State Recording Example (Step 4) CS 5204 – Fall, 2008

  13. c1 500 515 p q c2 c4 c3 r 485 Snapshot/State Recording Example (Step 5) CS 5204 – Fall, 2008

More Related