Ordering and Consistent Cuts Presented By Biswanath Panda
Introduction • Ordering and global state detection in a “distributed system” • Fundamental Questions • What is a distributed system? • What is a distributed computation? • How can we represent a distributed system? • Why are today’s papers so important?
A distributed system is … • A collection of sequential processes p1, p2, p3, …, pn • A network capable of implementing communication channels between pairs of processes for message exchange • Channels are reliable but may deliver messages out of order • Every process can communicate with every other process (possibly indirectly) • There is no reasoning based on global clocks • All synchronization must be done by message passing
Distributed Computation • A distributed computation is a single execution of a distributed program by a collection of processes. Each sequential process generates a sequence of events that are either internal events or communication events • The local history of process $p_i$ during a computation is a (possibly infinite) sequence of events $h_i = e_i^1, e_i^2, \ldots$ • A partial local history of a process is a prefix of the local history, $h_i^n = e_i^1, e_i^2, \ldots, e_i^n$ • The global history of a computation is the set $H = \bigcup_{i=1}^{n} h_i$
So what does this global history as defined tell us? • It is just the collection of events that have occurred in the system • It does not give us any idea about the relative times between the events • As there is no notion of global time, events can only be ordered based on a notion of cause and effect • So let's formalize this idea
Happened Before Relation (→) • If a and b are events in the same process and a comes before b, then a → b • If a is the sending of a message m by a process and b is the corresponding receive event, then a → b • Finally, if a → b and b → c, then a → c • If neither a → b nor b → a, then a and b are concurrent • → defines a partial order on the set H
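The three rules above can be turned into a small computation. The following is a minimal sketch, assuming each event has a unique name, that local histories are given as ordered lists, and that messages are given as (send, receive) pairs; the function names `happened_before` and `concurrent` are illustrative and do not come from the talk.

```python
def happened_before(local_histories, messages):
    """Return the set of pairs (a, b) such that a -> b.

    local_histories: list of per-process event lists, in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    # Rule 2: a send happens before its corresponding receive.
    closure = set(messages)
    # Rule 1: within one process, each event precedes its successor.
    for history in local_histories:
        closure.update(zip(history, history[1:]))
    events = [e for history in local_histories for e in history]
    # Rule 3: transitive closure (intermediate event k in the outer loop).
    for k in events:
        for a in events:
            for b in events:
                if (a, k) in closure and (k, b) in closure:
                    closure.add((a, b))
    return closure


def concurrent(a, b, hb):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Two processes; the first event of the first process is a send
# that is received as the second event of the second process.
histories = [["a1", "a2"], ["b1", "b2"]]
msgs = [("a1", "b2")]
hb = happened_before(histories, msgs)
```

In this example `a1 → b2` holds through the message, while `a2` and `b1` are concurrent because no chain of local steps and messages connects them.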
Space Time Diagram • Graphical representation of a distributed system • If there is a directed path between two events then they are related by → • Otherwise they are concurrent
Is this notion of ordering really important? • Some idea of ordering of events is fundamental to reasoning about how a system works • Global State Detection is a fundamental problem in distributed computing • Enables detecting stable properties of a system • How do we get a snapshot of the system when there is no notion of global time or shared memory? • How do we ensure that the state collected is consistent? • Use this problem to illustrate the importance of ordering • This will also give us the notion of what a consistent global state is
Global States and Cuts • A global state is an n-tuple of local states, one for each process • A cut is a subset of the global history that contains an initial prefix of each local history • Therefore every cut naturally corresponds to a global state • Intuitively a cut partitions the space-time diagram along the time axis • A cut is identified by the last event of each process that is part of the cut
Introduction to consistency • Consider this solution for the common problem of deadlock detection • System has 3 processes p1, p2, p3 • An external process p0 sends a message to each process (Active Monitoring) • Each process, on getting this message, reports its local state • Note that the global state thus collected at p0 is a cut • p0 uses this information to create a wait-for graph
Consider the space-time diagram below and the cut C2 [Figure: space-time diagram with cut C2; labels 1, 3, 2; "Cycle formed"]
So what went wrong? • p0 detected a cycle when there was no deadlock • The recorded state contained a message received by p3 whose send by p1 was not part of the recorded state • The system could never be in such a state and hence the state p0 saw was inconsistent • So we need to make sure that applications see consistent states
So what is a consistent global state? • A cut C is consistent if for all events e and e’: $(e \in C) \wedge (e' \rightarrow e) \Rightarrow e' \in C$ • Intuitively, if an event is part of a cut then all events that happened before it must also be part of the cut • A consistent cut defines a consistent global state • Notion of ordering is needed after all!!
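The consistency condition can be checked mechanically. The following is a minimal sketch, assuming a cut is represented by its frontier (for each process, the number of events included) and an event by a `(process, index)` pair with indices starting at 1; these representations and the names `in_cut` and `is_consistent` are assumptions made for illustration. Because a cut already contains a prefix of every local history, only message edges need to be checked.

```python
def in_cut(event, frontier):
    """An event (process, index) is in the cut if its index is within the prefix."""
    proc, index = event
    return index <= frontier[proc]


def is_consistent(frontier, messages):
    """A cut is consistent if every received message in it was also sent in it."""
    return all(
        in_cut(send, frontier)
        for send, recv in messages
        if in_cut(recv, frontier)
    )


# Hypothetical computation: the 2nd event of p1 sends a message
# that is received as the 1st event of p3.
messages = [((1, 2), (3, 1))]

bad_cut = {1: 1, 2: 0, 3: 1}      # includes the receive but not the send
good_cut = {1: 2, 2: 0, 3: 1}     # includes both
in_transit = {1: 2, 2: 0, 3: 0}   # send only: message is still in the channel
```

Here `bad_cut` reproduces the kind of state p0 saw in the deadlock example: a receive without its send. A cut that contains the send but not the receive is still consistent; the message is simply in transit.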
Passive Deadlock Detection • Let’s change our approach to deadlock detection • p0 now monitors the system passively • Each process sends p0 a message when an event occurs • What global state does p0 now see? • Basically all hell breaks loose
FIFO Channels • Communication channels need not preserve message order • Therefore p0 can construct any permutation of events as a global state • Some of these may not even be valid (events of the same process may not be in order) • Implement FIFO channels using sequence numbers • Now we know that what p0 constructs are valid runs • But the issue of consistency still remains
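One way to get FIFO delivery over a channel that reorders messages is sketched below. It assumes the sender numbers its messages 0, 1, 2, … and that the receiver keeps one `FifoReceiver` per sender; the class name and its interface are illustrative, not something defined in the talk.

```python
class FifoReceiver:
    """Re-orders messages from a single sender using sequence numbers."""

    def __init__(self):
        self.next_seq = 0   # sequence number expected next
        self.pending = {}   # out-of-order messages, keyed by sequence number

    def receive(self, seq, payload):
        """Buffer a message and return the payloads that are now deliverable, in order."""
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


receiver = FifoReceiver()
first = receiver.receive(1, "b")    # arrives early, so it is held back
second = receiver.receive(0, "a")   # fills the gap, so both are released
third = receiver.receive(2, "c")
```

A message that arrives ahead of its predecessors is buffered until the gap is filled, so events of the same process always reach p0 in their local order.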
Ok let’s now fix consistency • Assume a global real-time clock and a bound of δ on the message delay • Don’t panic, we shall get rid of this assumption soon • RC(e): time when event e occurs • Each process reports to p0 the global timestamp along with the event • Delivery rule at p0: at time t, deliver all received messages with timestamps up to t − δ in increasing timestamp order • So do we have a consistent state now?
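The delivery rule can be sketched as a buffer that releases notifications only once they are at least δ old. This is an illustration under the slide's assumptions (a global clock and a delay bound δ); the `Monitor` class, its method names, and the numeric times are hypothetical.

```python
import heapq


class Monitor:
    """Buffers event notifications at p0 and releases them in timestamp order."""

    def __init__(self, delta):
        self.delta = delta   # assumed upper bound on message delay
        self.buffer = []     # min-heap of (timestamp, event)

    def receive(self, timestamp, event):
        heapq.heappush(self.buffer, (timestamp, event))

    def deliver(self, now):
        """Deliver buffered events with timestamp <= now - delta, oldest first."""
        out = []
        while self.buffer and self.buffer[0][0] <= now - self.delta:
            out.append(heapq.heappop(self.buffer)[1])
        return out


monitor = Monitor(delta=5)
monitor.receive(3, "e2")   # notifications may arrive out of order
monitor.receive(1, "e1")
monitor.receive(8, "e3")
at_8 = monitor.deliver(now=8)     # releases timestamps <= 3
at_12 = monitor.deliver(now=12)   # releases timestamps <= 7
at_13 = monitor.deliver(now=13)   # releases timestamps <= 8
```

Waiting δ before delivering guarantees that no notification with a smaller timestamp can still be in flight, which is why the released order matches timestamp order.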
Clock Condition • Yes we do!! • e is observed before e’ iff RC(e) < RC(e’) • Recall our definition of consistency • Therefore the state is consistent iff $e \rightarrow e' \Rightarrow RC(e) < RC(e')$ • This is the clock condition • For timestamps from a global clock this is obviously true • Can we satisfy it for asynchronous systems?
Logical Clocks • It turns out that the clock condition can be satisfied in asynchronous systems as well • → is defined such that the Clock Condition holds if: • whenever a and b are events of the same process and a comes before b, RC(a) < RC(b) • whenever a is the send of a message and b is the corresponding receive, RC(a) < RC(b)
Lamport’s Clocks • Local variable LC in every process • LC: a kind of logical clock • Simple counter that assigns timestamps to events • Every send event is timestamped • LC modification rules: $LC(e_i) = LC + 1$ if $e_i$ is an internal event or a send; $LC(e_i) = \max\{LC, TS(m)\} + 1$ if $e_i$ is receive(m)
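The two modification rules translate almost directly into code. The sketch below is one possible rendering; the class name `LamportClock` and the method names `tick` and `receive` are chosen for illustration.

```python
class LamportClock:
    """Per-process logical clock following the LC modification rules."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Internal event or send: LC = LC + 1. The result timestamps the event."""
        self.time += 1
        return self.time

    def receive(self, message_timestamp):
        """Receive of m: LC = max(LC, TS(m)) + 1."""
        self.time = max(self.time, message_timestamp) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
internal = p1.tick()             # internal event on p1
sent = p1.tick()                 # p1 sends m, timestamped with this value
received = p2.receive(sent)      # p2 receives m
reply = p2.tick()                # p2 sends a reply
got_reply = p1.receive(reply)    # p1 receives the reply
```

Each receive is stamped strictly later than the matching send, and local events are stamped in increasing order, which is exactly what the clock condition requires.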
Example of Logical Clocks [Figure: space-time diagram of processes p1, p2, p3 with events labeled by their logical clock values]
Observations on Lamport’s Clocks • Lamport says • if a → b then C(a) < C(b) • However • if C(a) < C(b), does a → b follow? Not necessarily • Solution: Vector Clocks • Clock (C) is a vector of length n • C[i]: own logical time • C[j]: best guess about j’s logical time
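Vector Clocks Example [Figure: space-time diagram of three processes with events labeled by vector timestamps, including (1,0,0), (2,0,0), (3,4,1), (2,3,1), (2,4,1), (2,2,0), (0,1,0), (0,0,1)]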
Let’s formalise the idea • C[i] is incremented between successive local events • On receiving a timestamped message m, set C[j] = max(C[j], TS(m)[j]) for every j • It can be shown that both directions of the relation hold: a → b if and only if C(a) < C(b)
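A minimal sketch of vector clocks follows. It uses the standard component-wise maximum on receive followed by an increment of the process's own entry, and the usual vector order (every component less than or equal, and the vectors not equal); the class and function names are illustrative assumptions.

```python
class VectorClock:
    """Vector clock for process i in a system of n processes."""

    def __init__(self, n, i):
        self.i = i
        self.v = [0] * n

    def tick(self):
        """Local or send event: increment own entry and return the timestamp."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, timestamp):
        """Receive event: component-wise maximum, then count the receive itself."""
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        self.v[self.i] += 1
        return tuple(self.v)


def less_than(u, v):
    """u < v: every component is <= and the vectors differ."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def concurrent(u, v):
    return u != v and not less_than(u, v) and not less_than(v, u)


p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
a = p0.tick()       # p0 sends a message stamped a
b = p1.tick()       # independent local event on p1
c = p1.receive(a)   # p1 receives the message from p0
```

Unlike scalar Lamport timestamps, comparing two vector timestamps tells us whether the events are causally related or concurrent: here `a` and `b` are incomparable, while both precede `c`.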
So are Lamport clocks useful only for finding global state? • Definitely not!!! • Mutual Exclusion using Lamport clocks • Only one process can use resource at a time • Requests are granted in the order in which they are made • If every process releases the resource then every request is eventually granted • Assumptions • FIFO reliable channels • Direct connection between processes
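Algorithm [Figure: space-time diagram of p1, p2, p3 exchanging timestamped requests (1,1) and (1,2), replies, and releases, with the request queue shown at each process] p1 has received messages with higher timestamps from p2 and p3, and its own request is at the head of the queue, so p1 enters. p1 then sends a release, and now p2 enters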
Algorithm Summary • Requesting CS • Send timestamped REQUEST to every other process • Place request on own request queue • On receiving REQUEST • Put request on queue • Send back timestamped REPLY • Enter CS if • A message with a larger timestamp has been received from every other process • Own request is at the head of the queue • Releasing CS • Remove own request from the queue and send RELEASE message • On receiving RELEASE remove the sender’s request from the queue
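The summary above can be exercised in a small single-threaded simulation. This is a sketch under simplifying assumptions: one shared `deque` stands in for the network (which makes every channel FIFO and reliable), requests are ordered by `(timestamp, pid)`, and each message carries its own send timestamp. The class `Process`, the `run` helper, and the message tuple layout are all hypothetical.

```python
import heapq
from collections import deque


class Process:
    def __init__(self, pid, n, network):
        self.pid, self.network = pid, network
        self.clock = 0
        self.queue = []          # min-heap of (timestamp, pid) requests
        self.my_request = None
        # Timestamp of the latest message seen from each other process.
        self.last_seen = {j: 0 for j in range(n) if j != pid}

    def _send(self, kind, dest, payload=None):
        self.clock += 1
        self.network.append((dest, self.pid, kind, self.clock, payload))

    def request(self):
        self.clock += 1
        self.my_request = (self.clock, self.pid)
        heapq.heappush(self.queue, self.my_request)
        for j in self.last_seen:
            self._send("REQUEST", j, self.my_request)

    def release(self):
        self.queue.remove(self.my_request)
        heapq.heapify(self.queue)
        self.my_request = None
        for j in self.last_seen:
            self._send("RELEASE", j)

    def on_message(self, sender, kind, ts, payload):
        self.clock = max(self.clock, ts) + 1   # Lamport receive rule
        self.last_seen[sender] = ts
        if kind == "REQUEST":
            heapq.heappush(self.queue, payload)
            self._send("REPLY", sender)
        elif kind == "RELEASE":
            self.queue = [r for r in self.queue if r[1] != sender]
            heapq.heapify(self.queue)

    def can_enter(self):
        return (self.my_request is not None
                and self.queue[0] == self.my_request
                and all(ts > self.my_request[0]
                        for ts in self.last_seen.values()))


def run(network, procs):
    """Deliver every queued message in FIFO order."""
    while network:
        dest, sender, kind, ts, payload = network.popleft()
        procs[dest].on_message(sender, kind, ts, payload)


network = deque()
procs = [Process(i, 3, network) for i in range(3)]
procs[0].request()
procs[1].request()   # concurrent request with the same timestamp
run(network, procs)
before = [p.can_enter() for p in procs]
procs[0].release()
run(network, procs)
after = [p.can_enter() for p in procs]
```

Both requests carry timestamp 1, so the tie is broken by process id and process 0 goes first; process 1 becomes eligible only after the RELEASE from process 0 has removed the earlier request from its queue.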
Global State Revisited • Earlier in the talk we had discussed the problem where a process actively tries to get the global state • Solution to the problem that calculates only consistent global states • Model • Process only knows about its internal events • Messages it sends and receives
Requirements • Each process records its own local state • The state of the communication channels is recorded • All these small parts form a consistent whole • State detection must run along with the underlying computation • FIFO reliable channels
What exactly is channel state? • Let c be a channel from p to q • p records its local state (Lp) and so does q (Lq) • p has some sends in Lp whose receives may not be in Lq • It is these sent messages that are the state of c • Intuitively, the messages in transit when the local states were collected
Basic Algorithm Description [Figure: space-time diagram of processes p0, p1, p2 exchanging application messages A, B, C and marker messages M; p0 records its state and sends M, and each other process records its state and its incoming channel states on receiving M]
Algorithm Summary • Marker sending rule • p sends a marker on every outgoing channel after it records its state and before it sends further messages • Marker receiving rule (q receives a marker along channel c) • If q has not recorded its state, then q records its state and records the state of c as the empty sequence • Else q records the state of c as the sequence of messages it received along c after it recorded its state and before it received the marker
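The two marker rules can be tried out on a toy system. The sketch below assumes processes that transfer integer amounts to each other over FIFO channels modeled as `deque`s, so that a correct snapshot must account for the full total; the `Node` class, the `MARKER` constant, and the balances are illustrative and not part of the talk.

```python
from collections import deque

MARKER = "MARKER"


class Node:
    def __init__(self, pid, peers, channels, balance):
        self.pid, self.peers, self.channels = pid, peers, channels
        self.balance = balance
        self.recorded_state = None
        self.channel_state = {}   # incoming peer -> messages recorded in transit
        self.recording = set()    # incoming channels still being recorded

    def send(self, dest, amount):
        self.balance -= amount
        self.channels[(self.pid, dest)].append(amount)

    def record_state(self):
        """Record local state, then apply the marker sending rule."""
        self.recorded_state = self.balance
        self.recording = set(self.peers)
        self.channel_state = {p: [] for p in self.peers}
        for p in self.peers:
            self.channels[(self.pid, p)].append(MARKER)

    def receive(self, src):
        msg = self.channels[(src, self.pid)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record_state()      # channel from src stays empty
            self.recording.discard(src)  # channel from src is now final
        else:
            self.balance += msg
            if src in self.recording:    # arrived after recording, before marker
                self.channel_state[src].append(msg)


channels = {(0, 1): deque(), (1, 0): deque()}
n0 = Node(0, [1], channels, balance=100)
n1 = Node(1, [0], channels, balance=50)

n0.send(1, 10)      # in transit from 0 to 1
n1.send(0, 5)       # in transit from 1 to 0
n0.record_state()   # node 0 initiates the snapshot
n0.receive(1)       # the 5 arrives after recording: part of channel state
n1.receive(0)       # the 10 arrives before the marker: part of node 1's state
n1.receive(0)       # marker: node 1 records its state, channel from 0 is empty
n0.receive(1)       # marker: recording of channel from 1 stops

total = (n0.recorded_state + n1.recorded_state
         + sum(n0.channel_state[1]) + sum(n1.channel_state[0]))
```

The recorded local states plus the recorded channel contents add up to the initial total of 150, even though the two local states were recorded at different moments.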
Comments on Algorithm • Marker ensures liveness of algorithm • Flooding Algorithm: O(n^2) messages • Properties of the recorded global state • So is such a state useful? • Stable properties [Figure: diagram relating states s1, s2, and se]
Conclusion • We looked at • Fundamental concepts in distributed systems • Ordering in distributed systems • Global State Detection • These papers are some of the classic works in distributed systems • Where theory meets practice!!!!