Message Logging Pessimistic & Optimistic. CS717 Lecture 10/16/01-10/18/01 Kamen Yotov kyotov@cs.cornell.edu. Introduction. Context & Applications Check-pointing Message Logging Pessimistic (failure-free mode suffers) Optimistic (good for failure-free mode)
Message Logging Pessimistic & Optimistic CS717 Lecture 10/16/01-10/18/01 Kamen Yotov kyotov@cs.cornell.edu
Introduction • Context & Applications • Check-pointing • Message Logging • Pessimistic (failure-free mode suffers) • Optimistic (good for failure-free mode) • Causal (to be discussed in next lectures...) • Main problems • Consistency • Orphans
Fault Tolerance “Why”s • Flow of events • Check-point • Log messages • Crash • Restore • Replay
Common Assumptions • Fail-stop model • Failure eventually detectable by all • Channels • Asynchronous • Reliable • FIFO • Unbounded message delivery • Failures • Transiently dropping • No duplication and/or corruption • Stable storage • Spare processing capacity
Common goals • Application independence • Application transparency • Simple • Independent evolution • Handles preexisting programs • High throughput • Failure-free model with little overhead • Maximum fault-tolerance • Any number of failures
Formal Terminology • Delivery (as opposed to receipt) • Non-faulty processes eventually deliver all messages that they have received • Receive sequence number • If p delivers m and m.rsn=l then m is the lth message p delivers • Run • Sequence of system states • Asynchronous • Only one process changes state at once
Formal Terminology (cont.) • Properties: Logical expressions over runs • □ - Always • ◊ - Eventually • Message determinant • #m = <m.src, m.ssn, m.dest, m.rsn, m.data> • m.data and m.dest not essential • Logging determinants vs. actual messages • Other notation • N – set of all processes • C – set of failed processes • Log(m) – set of processes possessing a copy of #m • Depend(m) – set of processes that depend on m
Orphan Properties • Before failure, by definition, every process in Log(m) holds a copy of #m • #m lost if $Log(m) \subseteq C$ • stable(m) if #m cannot be lost • p orphan of C if • p did not fail ($p \notin C$) • $p \in Depend(m)$ • #m is lost
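The orphan condition above can be checked mechanically. The following is a minimal sketch, assuming processes are identified by strings and that Log(m), Depend(m) and C are available as Python sets; the function names `lost` and `orphans` are hypothetical and not part of the lecture.

```python
def lost(log_m, failed):
    """#m is lost when every process holding a copy has failed: Log(m) is a subset of C."""
    return log_m <= failed


def orphans(depend_m, log_m, failed):
    """Return the surviving processes that depend on m although #m is lost."""
    if not lost(log_m, failed):
        return set()
    return depend_m - failed


# Hypothetical scenario: only p1 logged #m, and p1 crashed.
example = orphans(depend_m={"p1", "p2", "p3"}, log_m={"p1"}, failed={"p1"})
print(sorted(example))
```

If at least one process in Log(m) survives, the determinant can still be retrieved and no process is an orphan with respect to m.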
Performance Metrics • Number of forced roll-backs • Time spent blocking • Number of messages • Size of messages
Got to the real-world stuff! • No additional messages • Any number of failures (including total) • No assumptions about the logging protocol • Pessimistic doesn’t require that generality
The Model: Process states • Process states • State interval • A new one starts on each message received • State interval index (auto increment) • [Figure: timelines of processes p1, p2, p3 divided into indexed state intervals]
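The Model: Process states (cont.) • Dependencies between process states ($p_i$ depends on $p_j$) • Maximum index of any interval of $p_j$ on which $p_i$ depends • Inside a process each interval depends on the previous one • Dependence vector • $d_i = \langle \delta_* \rangle = \langle \delta_1, \delta_2, \delta_3, \delta_4, \ldots, \delta_n \rangle$, $\delta_k \in \{\bot, 0, 1, \ldots\}$ ($\bot$: no dependency) • [Figure: timelines of processes p1, p2, p3 divided into indexed state intervals]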
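The Model: System states • Process state – dependence vector • $d_i = \langle \delta_* \rangle = \langle \delta_1, \delta_2, \delta_3, \delta_4, \ldots, \delta_n \rangle$, $\delta_k \in \{\bot, 0, 1, \ldots\}$ • System state – dependence matrix • $n \times n$ • Row $i$ – process state for $p_i$ • Diagonal – current state intervals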
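The Model: System states (cont.) • $S$ – set of all system states • $A = [\alpha_{**}] \in S$ and $B = [\beta_{**}] \in S$ • $A \preceq B \iff \forall i = 1..n: \alpha_{ii} \le \beta_{ii}$ • Partial order different from Lamport's • Orders system states rather than events • The only events are state intervals • Lattice • $A \sqcup B = [\varphi_{**}]$, $\varphi_{ik} = \alpha_{ii} \ge \beta_{ii}\ ?\ \alpha_{ik} : \beta_{ik}$ • $A \sqcap B = [\varphi_{**}]$, $\varphi_{ik} = \alpha_{ii} \le \beta_{ii}\ ?\ \alpha_{ik} : \beta_{ik}$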
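The ordering and the two lattice operations on system states can be sketched in a few lines. This is an illustrative sketch only: it assumes a system state is a list of rows (one dependence vector per process) and encodes the "no dependency" value as -1; the names `leq`, `join` and `meet` are hypothetical.

```python
BOTTOM = -1  # assumed encoding of "no dependency"; smaller than any interval index


def leq(a, b):
    """A precedes B when every diagonal entry of A is at most that of B."""
    return all(a[i][i] <= b[i][i] for i in range(len(a)))


def join(a, b):
    """Row i comes from whichever state has the larger diagonal entry."""
    return [list(a[i]) if a[i][i] >= b[i][i] else list(b[i]) for i in range(len(a))]


def meet(a, b):
    """Row i comes from whichever state has the smaller diagonal entry."""
    return [list(a[i]) if a[i][i] <= b[i][i] else list(b[i]) for i in range(len(a))]


A = [[2, BOTTOM], [1, 1]]
B = [[1, BOTTOM], [2, 3]]
print(join(A, B))
print(meet(A, B))
```

Both operations pick whole rows, so each row of the result is still a dependence vector that some process actually had.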
The Model: Consistent System states • Consistent state • All received messages • Sent in the current state of the sender • Can be deterministically sent in the future • Messages not yet received are not a problem • Definition: $D = [\delta_{**}] \in S$ is consistent iff $\forall i, k = 1..n: \delta_{ik} \le \delta_{kk}$ • A process cannot depend on a state interval of another process that has not been reached yet • $C = \{ D \in S \mid D \text{ is consistent} \}$ • $C$ is a sub-lattice of $S$ – proof straightforward!
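The consistency definition translates directly into a check over the dependence matrix. A minimal sketch, assuming the matrix is a list of rows and that "no dependency" is encoded as -1 so that it compares below every interval index; `is_consistent` is a hypothetical name.

```python
BOTTOM = -1  # assumed encoding of "no dependency"


def is_consistent(d):
    """No process i may depend on an interval of process k beyond k's current one."""
    n = len(d)
    return all(d[i][k] <= d[k][k] for i in range(n) for k in range(n))


good = [[2, BOTTOM], [1, 1]]  # p2 depends on interval 1 of p1, and p1 is already at 2
bad = [[1, BOTTOM], [2, 3]]   # p2 depends on interval 2 of p1, but p1 is only at 1
print(is_consistent(good), is_consistent(bad))
```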
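The Model: Logging and Stability • $logged(i, \sigma)$ • Message that started state interval $\sigma$ of process $i$ has been logged on stable storage • $checkpoint(i, \sigma)$ • A check-point containing the state of process $i$ in interval $\sigma$ exists on stable storage • $checkpoint(i, 0)$ is implicit • Effective check-point for $\sigma$ on $i$ is $checkpoint(i, \varepsilon)$, $\varepsilon \le \sigma$, $\varepsilon$ is maximal • $stable(i, \sigma) \iff \forall \alpha: \varepsilon < \alpha \le \sigma\ [logged(i, \alpha)]$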
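The stability predicate for a single process can be sketched as follows. This assumes the logged intervals and the check-pointed intervals of that process are given as sets of integer indices; the function names are hypothetical, and the implicit checkpoint(i,0) is modelled by always adding 0 to the checkpoint set.

```python
def effective_checkpoint(checkpoints, sigma):
    """Largest check-pointed interval index not exceeding sigma (0 is implicit)."""
    return max(c for c in checkpoints | {0} if c <= sigma)


def stable(logged, checkpoints, sigma):
    """Every interval after the effective checkpoint, up to sigma, must be logged."""
    eps = effective_checkpoint(checkpoints, sigma)
    return all(alpha in logged for alpha in range(eps + 1, sigma + 1))


print(stable(logged={1, 2}, checkpoints=set(), sigma=2))
print(stable(logged={1, 3}, checkpoints=set(), sigma=3))
print(stable(logged={3}, checkpoints={2}, sigma=3))
```

The second call fails because the message that started interval 2 was never logged, so interval 3 cannot be recreated by replay from the implicit initial checkpoint; the third succeeds because a checkpoint at interval 2 makes that log entry unnecessary.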
The Model: Recoverable System states • Recoverable system state • System state is consistent • All current process states are stable • $D = [\delta_{**}] \in S$ • $recoverable(D) \iff D \in C \land \forall i: stable(i, \delta_{ii})$ • $R = \{ D \in S \mid recoverable(D) \}$ • $R$ is a sub-lattice of $S$ – proof straightforward! • Theorem: A single maximum recoverable state exists! • Proof • $R \subseteq S$ • $A \sqcup B \in R$ if $A, B \in R$; $A, B \preceq A \sqcup B$ • Therefore the maximum is $\bigsqcup_{D \in R} D$, obviously unique!
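Recoverability combines the two earlier conditions. A minimal sketch, assuming the system state is a list of rows with -1 for "no dependency", and that stability is supplied as a callable `is_stable(i, sigma)`; both names and the sample data are hypothetical.

```python
def recoverable(d, is_stable):
    """A state is recoverable when it is consistent and every diagonal interval is stable."""
    n = len(d)
    consistent = all(d[i][k] <= d[k][k] for i in range(n) for k in range(n))
    return consistent and all(is_stable(i, d[i][i]) for i in range(n))


# Hypothetical stable intervals per process.
stable_intervals = {0: {0, 1, 2}, 1: {0, 1}}


def is_stable(i, sigma):
    return sigma in stable_intervals[i]


print(recoverable([[2, -1], [1, 1]], is_stable))
print(recoverable([[2, -1], [1, 2]], is_stable))
```

The second state is consistent but not recoverable, because interval 2 of the second process is not yet stable.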
The Model: Recoverable System states (cont.) • Current recovery state • The maximum recoverable state at any time • Never decreases • For $D = [\delta_{**}]$, no state interval $\sigma \le \delta_{ii}$ of any process $i$ is ever rolled back • Proof: • $D$ will always remain consistent • Each $\delta_{ii}$ will always remain stable • Since $R$ is a lattice, any new current recovery state formed after $D$ will be greater than or equal to $D$ • In any new current recovery state $D' = [\delta'_{**}]$: • $\delta'_{ii} \ge \delta_{ii}$ is the state interval index for each process $i$ • Therefore, no state interval $\sigma \le \delta_{ii}$ of any process $i$ will ever need to be rolled back!
The Model: Wrap-up… • Corollary 1: If all messages received are eventually logged, no domino effect occurs • If $D = [\delta_{**}]$ is the current recovery state • Corollary 2: Any messages sent by process $i$ from state intervals $\sigma \le \delta_{ii}$ may be committed • With $\varepsilon_i$ being the effective checkpoint of $\delta_{ii}$ • Corollary 3: All checkpoints of process $i$ previous to $\varepsilon_i$ may be discarded • Corollary 4: All messages that begin state intervals prior to $\varepsilon_i$ may be discarded
The Algorithm: Overview • Keep a current recovery state • On each new state interval $\sigma$ of some process $k$ becoming stable • Try to improve the current recovery state, such that: • State of process $k$ advances to $\sigma$ • More state intervals from other processes are added to maintain consistency • Succeed if all such included intervals are stable
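The Algorithm: Basic implementation • Notation • $D = [\delta_{**}]$ – the current recovery state • $\sigma$ – state interval of process $k$ becoming stable • $d_k = \langle \delta_* \rangle = \langle \delta_1, \delta_2, \delta_3, \delta_4, \ldots, \delta_n \rangle$, $\delta_j \in \{\bot, 0, 1, \ldots\}$ – state of process $k$ in interval $\sigma$ (dependence vector; single subscripts denote its entries, double subscripts the entries of $D$) • Algorithm
    if ( σ > δ_kk ) {
        ∀i : δ_ki ← δ_i                      // update row k of D with d_k
        while ( ∃ i,j : δ_ij > δ_jj )        // D is not yet consistent
            if ( ∃ α ≥ δ_ij : stable(j, α) ) // α – an interval for j
                ∀m : δ_jm ← m-th entry of d_j for α   // update row j of D
            else
                fail
    }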
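The basic algorithm above can be sketched as runnable code. This is an illustration under stated assumptions rather than the implementation from the paper: the recovery state is a list of rows, `dep_vector[j][alpha]` is assumed to hold the dependence vector of interval alpha of process j (with its own entry equal to alpha), `stable_intervals[j]` is the set of stable interval indices of process j, "no dependency" is encoded as -1, and the function returns `None` on failure. All names and the two-process scenario are hypothetical.

```python
BOTTOM = -1  # assumed encoding of "no dependency"


def advance(recovery, k, sigma, dep_vector, stable_intervals):
    """Try to build a recoverable state in which process k is in interval sigma."""
    if sigma <= recovery[k][k]:
        return None  # nothing to gain: k is already at or past sigma
    d = [list(row) for row in recovery]
    d[k] = list(dep_vector[k][sigma])  # update row k of D
    n = len(d)
    while True:
        # Look for a pair (i, j) where i depends on an interval j has not reached.
        bad = next(((i, j) for i in range(n) for j in range(n)
                    if d[i][j] > d[j][j]), None)
        if bad is None:
            return d  # consistent and built only from stable intervals
        i, j = bad
        candidates = [a for a in stable_intervals[j] if a >= d[i][j]]
        if not candidates:
            return None  # no stable interval of j satisfies the dependency
        d[j] = list(dep_vector[j][min(candidates)])  # minimum suitable interval


# Interval 1 of process 0 was started by a message sent from interval 1 of process 1.
dep_vector = {
    0: {0: [0, BOTTOM], 1: [1, 1]},
    1: {0: [BOTTOM, 0], 1: [BOTTOM, 1]},
}
initial = [[0, BOTTOM], [BOTTOM, 0]]

print(advance(initial, 0, 1, dep_vector, {0: {0, 1}, 1: {0}}))
print(advance(initial, 0, 1, dep_vector, {0: {0, 1}, 1: {0, 1}}))
```

In the first call interval 1 of process 1 is not yet stable, so the attempt fails and the recovery state stays where it was; once that interval is stable, the same attempt succeeds and pulls both processes forward.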
The Algorithm: Some details • The chosen $\alpha$ should be the minimum stable state interval with $\alpha \ge \delta_{ij}$ • The comparisons $\delta_{ij} > \delta_{jj}$ can be made in any order without affecting the final result • When state interval $\sigma$ of process $k$ becomes stable, the algorithm finds some recoverable $D$ with $\delta_{kk} = \sigma$, if one exists • A stable state interval that was rejected need not be checked again until the current recovery state advances • Corollary: When the recovery state advances from some $D$ to $D'$, the rejected $\sigma$'s above that need to be rechecked are those with a direct dependency on some $\alpha$ on any process $i$ such that $\delta_{ii} < \alpha \le \delta'_{ii}$
The Algorithm: Proof of Correctness • The algorithm presented always finds the current recovery state of the system • Only finds recoverable system states • Any such state found is greater than the previous recovery state • Following the observations stated before, all possible new states are considered • Therefore, the correct one is always found!
The Algorithm: Optimizations & Implementation • Optimization considerations • Keeping work list of rows to update D • Keep only the one with max index • Keeping only the diagonal of D • Implementation • Provided in the paper • Follows everything said so far • Takes advantage of some specifics
Conclusions • General Model and Algorithm • Work for both pessimistic and optimistic protocols • Does not need the generality for the pessimistic case • Optimistic logging is desirable from performance standpoint in low failure environments • Unifies existing approaches to fault tolerance • Check-pointing • Message Logging • Results • Existence of unique maximum recoverable state • Never decreases (progress is being made) • Domino effect cannot occur
Future work list… • Address non-determinism • Switch between • check-pointing for the non-deterministic part • Check-pointing + message logging elsewhere • Output-driven optimistic message logging and check-pointing • Pay attention to communication of the results • Application specific knowledge