140 likes | 397 Views
Global State Recording. definitions global state recording FIFO Chandy-Lamport’s algorithm collecting global state incremental snapshot non-FIFO Lai-Yang two color algorithm Mittern’s vector clocks algorithm consistent global snapshots causality and zigzag paths
E N D
Global State Recording • definitions • global state recording • FIFO • Chandy-Lamport’s algorithm • collecting global state • incremental snapshot • non-FIFO • Lai-Yang two color algorithm • Mittern’s vector clocks algorithm • consistent global snapshots • causality and zigzag paths • rollback dependency graph
Local State • local stateLSi of a site (process) Si is an assignment of values to variables of Si • sending send(mij) and receiving rec(mij) of message mij from Si to Sj may influence LSi • we denote • time(send(mij) or rec(mij)) the sequence number of the state in the compuation after which send/receive occurs (note the difference with Singhal) • time(LSi) the state at which Si was recorded • to aid the reasoning we consider the messages sent/received by a process as belonging to local state we define • that is • the message is in transit if it was sent but not received • the message is inconsistent if it was received but never sent
Global State • global state is a collection of local states of all processesand set of messages in the channels • notice Singhal does notuse messages in his def. – ours is more precise global state is consistent if it does not have any inconsistent messages,that is: global state is transitless if there are no messages in transition,that is: note that a consistent state is not necessarily transitless and v.v. what are the global states on the picture above?
Chandy-Lamport’s Global State Recording Algorithm • works on arbitrary topology system with FIFO channels and arbitrary algorithm whose snapshot is taken (basic algorithm) • does not interfere with the operation of basic algorithm (does not delay, reorder or drop basic messages) • one process initiates recording by sending controlmessages (markers)multiple pro-cesses can also initiate • can C-L record an inconsistent state?
Global State Recorded by Chandy-Lamport’s Algorithm • does C-L record a (global) state that occurs in the computations? • not necessarily. • however, C-L records a state in a computation that is equivalent to the original computation. • moreover this equivalent computation shares with the original computation • a prefix up to start of snapshot • a suffix after the snapshot • the recorded state is between these two states • can C-L record a state where some P have messages in every channel? if yes which one? • can several independent snapshots run in parallel?
Collecting Global State • based on spanning tree constructed on the fly • sender of the first marker to arrive at a process is its parent • each marker carries the sender’s parent • by receiving marker process learns if it has children • if process is a leaf, after finishing state recording, it sends its state to its parent • each process waits for its children’s states, appends its own and forwards all info to its parent
Non-FIFO: Lai-Yang Algorithm • non-FIFO channel: is a set (rather than a queue) of messages, any message in the set can be received • messages can overtake one another • fair message receipt is assumed – eventually a sent message is received • two colors for processes and basic messages – white and red, no explicit markers • all processes start as white, when process sends a basic message it attaches its color • when process receives a differently colored message, it itself changes color • while white (red), process records all messages sent/received, after changing colors, the process sends message history to the initiator; based on sent/received histories, initiator calculates messages in transit • if only the number of messages in transit needed – may maintain counters in stead of histories
State Recording Using Causal Message Delivery • initiator broadcast token to all processes • each process records state and sends it to the initiator • processes do not send markers or coordinate local state recording • due to causal ordering of messages • if for Pi: rec(tokeni) send(mij) then send(tokenj) send(mij) thereforerec(tokenj) rec(mij) hence state recording at Pjhappens before mij receipt • channel state recording (Archaya-Badrinath) • append sequence numbers to all messages, • at each process record highest sent/received SN • together with local state send sent/received records for initiator to determine messages in transition
Consistent Global Snapshot • processes periodically asynchronously record local states (local checkpoints) • a global snapshot is a collection of local checkpoints • needed in distributed failure recovery, distributed event monitoring, debugging, etc. • global snapshot is consistent if no two checkpoints are causally related • even though checkpoints themselves are not causally related, they may not be a part of a global snapshot • ex: C11 and C32 are concurrent, yet there is no global snapshot that contains both of them • the objective in global snapshot recording is to select (out of available) the set of concurrent checkpoints. • note that unlike global state recording, the alg. does not have control over thesnapshottakingtime
Zigzag Path • checkpoint interval – part of computation between two successive checkpoints at the same process • zigzag path exists between checkpoints Cxi and Cyj if there exists a sequence of messages m1, …mn such that • m1 sent by Px after Cxi • mk received by Pz, mk+1 is sent by Pz in the same checkpoint interval • mn received by Py before Cyj • causal path - same as zigzag path but the messages are causally related • zigzag cycle is a zigzag path to the process itself zigzag path causal path
Zigzag Path • checkpoint interval – part of computation between two successive checkpoints at the same process • zigzag path exists between checkpoints Cxi and Cyj if there exists a sequence of messages m1, …mn such that • m1 sent by Px after Cxi • mk received by Pz then mk+1 is sent by Pz in the same checkpoint interval • mn received by Py before Cyj • causal path - same as zigzag path but the messages are causally related • zigzag cycle is a zigzag path to the process itself zigzag cycle zigzag path
Sufficient Condition for Consistent Snapshot • a consistent snapshot can be formed to include a set S of checkpoints if and only if no zigzag path exists between any two checkpoints in S [Netzer and Xu] • snapshot line – a line drawn through a set of checkpoints • due to the existence of a zigzag path, a snapshot line always crosses a message making two checkpoints causally related and resultant snapshot inconsistent • constructing consistent snapshot requires choosing checkpoints without zigzag path zigzag path
Runtime Consistent Snapshot Construction Using R-Graph • definition of rollback-dependency graph (R-graph) [Wang] • basic message carries its checkpoint interval number • there is a zigzag path between two checkpoints if there is a path in R-graph examplecomputation correspondingR-graph