770 likes | 1.06k Views
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms. CS 249 Project Fall 2005 Wing Wong. Outline. Introduction Asynchronous distributed systems, distributed computations, consistency Two different strategies to construct global states
E N D
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms CS 249 Project Fall 2005 Wing Wong
Outline • Introduction • Asynchronous distributed systems, distributed computations, consistency • Two different strategies to construct global states • Monitor passively observes the system (reactive-architecture) • Monitor actively interrogates the system (snapshot protocol) • Properties of global predicates • Sample applications: deadlock detection and debugging Consist Global States
Introduction • global state = union of local states of individual processes • many problems in distributed computing require: • construction of a global state and • evaluation of whether the state satisfies some predicate Φ • difficulties: • uncertainties in message delays • relative speeds of computations • global state obtained can be obsolete, incomplete, or inconsistent Consist Global States
Distributed Systems • collection of sequential processes p1, p2, …, pn • unidirectional communication channels between pairs of processes • reliable channels • messages may be delivered out of order • network strongly connected (not necessarily completely) Consist Global States
Asynchronous Distributed Systems • no bounds on relative process speeds • no bounds on message delays • no synchronized local clocks • communication is the only possible mechanism for synchronization Consist Global States
Distributed Computations • distributed program executed by a collection of processes • each process executes a sequence of events • communication through events send(m) and receive(m), m as message identifier Consist Global States
Distributed Computations • hi = ei1ei2… • local history of process pi • canonical enumeration • total order imposed by sequential execution • hik= ei1ei2… eik • initial prefix of hi containing first k events • H = h1U … U hn • global history containing all events • does not specify relative timing between events Consist Global States
Distributed Computations • to order events, define binary relation “→” to capture “cause-and-effect”: • e→e’ if and only if e “causally precedes” e’ • concurrent events: neither e→e’ nor e’→e, write e || e’ • distributed computation = partially ordered set defined by (H, →) Consist Global States
Distributed Computations • e21→e36 ; e22 || e36 Consist Global States
Global States, Cuts and Runs • σik • local state of process pi after event eik • Σ = (σ1,… ,σn) • global stateof distributed computation • n-tuple of local states • cut C = h1c1 U … U hncn or (c1, …, cn) • subset of global history H Consist Global States
Global States, Cuts and Runs • (σ1c1,… ,σncn) • global state correspond to cut C • (e1c1,… ,encn) • frontier of cut C • set of last events • run • a total ordering R including all events in global history • consistent with each local history Consist Global States
Global States, Cuts and Runs • cut C = (5,2,4); cut C’ = (3,2,6) • a consistent run R = e31e11e32e21e33e34e22e12e35e13e14e15e36e23e16 Consist Global States
Consistency • cut C is consistent if for all events e and e’ • closed under the causal precedence relation • consistent global state corresponds to a consistent cut • run R is consistent if for all events, e→e’ implies e appears before e’ in R Consist Global States
Consistency • run R = e1e2… results in a sequence of global states Σ0Σ1Σ2 • Σi is obtained from Σi-1 by some process executing event ei , or Σi-1 leads toΣi • denote the transitive closure of the leads-to relation by ~>R • Σ’ is reachable from Σ in run R iff Σ ~>R Σ’ Consist Global States
Lattice of Global States • lattice = set of all consistent global states, along with leads-to relation • Σk1…kn = shorthand for global state (σ1k1,…,σnkn) • k1 + … + kn = level of lattice Consist Global States
Lattice of Global States • path = sequence of global states of increasing level (downwards) • each path corresponds to a consistent run • a possible path:Σ00 Σ01 Σ11 Σ21 Σ31 Σ32 Σ42 Σ43 Σ44 Σ54 Σ64 Σ65 Consist Global States
Observing Distributed Computations (reactive-architecture) • processes notify monitor process p0 whenever they execute an event • monitor constructs observation as the sequence of events corresponding to the notification messages • problem: • observation may be inconsistent due to variability in notification message delays Consist Global States
Observing Distributed Computations Consist Global States
Observing Distributed Computations • any permutation of run R is a possible observation • we need: • delivery rule at monitor process to restore message order • we have First-In-First-Out (FIFO) delivery using sequence number for all source-destination pair pi, pj: • sendi(m) → sendi(m’) => deliverj(m) → deliverj(m’) Consist Global States
Delivery Rule 1 • assume: • global real-time clock • message delays bound by δ • process includes timestamp (real-time clock value) when notifying p0 of local event e DR1: At time t, deliver all received messages with timestamps up to t – δ in increasing timestamp order Consist Global States
Delivery Rule 1 • let RC(e) denotes value of global clock when e is executed • real-time clock satisfies Clock Condition: • e→e’ => RC(e) < RC(e’) • but logical clocks also satisfies clock condition… Consist Global States
Logical Clocks • event orderings based on increasing clock values • LC(ei) denotes value of logical clock when ei is executed by pi • each sent message m contains timestamp TS(m) • update rules by pi at occurrence of ei: Consist Global States
Logical Clocks Consist Global States
Delivery Rule 2 • replace real-time clock by logical clock • need gap-detection property: • given events e, e’ where LC(e) < LC(e’), determine if some event e’’ exists such that LC(e) < LC(e’’) < LC(e’) • message is “stable” at p if no future messages with timestamps smaller than TS(m) can be received by p Consist Global States
Delivery Rule 2 • with FIFO, when p0 receives m from pi with timestamp TS(m), can be certain no other message m’ from pi with TS(m’) ≤ TS(m) • message m at p0 guaranteed stable when p0 has received at least one message from all other processes with timestamps > TS(m) DR2: Deliver all received messages that are stable at p0 in increasing timestamp order Consist Global States
Strong Clock Condition • DR1, DR2 assume RC(e) < RC(e’) (or LC(e) < LC(e’)) => e→ e’ • recall RC and LC guarantee clock condition: e→e’ => RC(e) < RC(e’) • DR1, DR2 can unnecessarily delay delivery • want timing mechanism TC that gives Strong Clock Condition: • e→e’≡TC(e) < TC(e’) Consist Global States
Timing Mechanism 1 - Causal Histories • causal history as “clock” value • set of all events that causally precede event e: • smallest consistent cut that includes e • projection of θ(e) on process pi:θi(e) =θ(e)∩ hi Consist Global States
Timing Mechanism 1 - Causal Histories Consist Global States
Timing Mechanism 1 - Causal Histories • To maintain causal histories: • θ initially empty • if ei is an internal or send event • θ(ei) = {ei} U θ(previous local event of pi) • if ei = receive of message m by pi from pj • θ(ei) = {ei} U θ(previous local event of pi) U θ(corresponding send event at pj) Consist Global States
Timing Mechanism 1 - Causal Histories new send event: new event e15 new event e23 new receive event: Consist Global States
Timing Mechanism 1 - Causal Histories • can interpret clock comparison as set inclusion: • e→e’≡θ(e) θ(e’) (why not set membership, [e→e’≡eθ(e’)]?) • unfortunately, causal histories grow too rapidly Consist Global States
Timing Mechanism 2 - Vector Clocks • note: • projection θi(e) =hik for some unique k • eir θi(e) for all r < k • can use single number k to represent θi(e) • θ(e) = θ1(e) U … U θn(e) • represent entire causal history by n-dimensional vector clockVC(e), where for all 1 ≤ i ≤ n • VC(e)[i] = k, if and only if θi(e) =hik Consist Global States
Timing Mechanism 2 - Vector Clocks Consist Global States
Timing Mechanism 2 - Vector Clocks • To maintain vector clock: • each process pi initializes VC to contain all zeros • update rules by pi at occurrence of ei: • VC(ei)[i] ≡ number of events pi has executed up to and including ei • VC(ei)[j] ≡ number of events of pj that causally precede event ei of pi Consist Global States
Timing Mechanism 2 - Vector Clocks causal histories vector clocks new send event: new receive event: Consist Global States
Vector Clock Comparison • Define “less than” relation: • V < V’ ≡ (V ≠ V’) ( 1 ≤ k ≤ n: V[k] ≤ V’[k]) Consist Global States
Properties of Vector Clocks • Strong Clock Condition: • e→e’≡VC(e) < VC(e’) • Simple Strong Clock Condition:given event ei of pi and event ej of pj, i≠ j • ei→ej≡VC(ei)[i] ≤VC(ej)[i] Consist Global States
Properties of Vector Clocks • Test for Concurrency: given event ei of pi and event ej of pj • ei || ej≡ (VC(ei)[i] > VC(ej)[i]) (VC(ej)[j] > VC(ei)[j]) • Pairwise Inconsistent: given event ei of pi and ej of pj, i≠ j • if ei , ej cannot belong to the frontier of the same consistent cut (VC(ei)[i] < VC(ej)[i]) (VC(ej)[j] < VC(ei)[j]) (concurrent) Consist Global States
Properties of Vector Clocks • Consistent Cut: • frontier contains no pairwise inconsistent events VC(eici)[i] VC(ejcj)[i] , 1 ≤ i, j ≤ n • Counting # of events causally precede ei: • #(ei) = (Σj=1 .. nVC(ei)[j]) – 1 # events = 4+1+3-1 = 7 Consist Global States
Properties of Vector Clocks • Weak Gap-Detection: given event ei of pi and ej of pj, • if VC(ei)[k] < VC(ej)[k] for some k ≠ j, there exists event ek such that (ek → ei) (ek → ej) Consist Global States
Causal Delivery and Vector Clocks • assume processes increment local component of VConly for events notified to monitor p0 • p0 maintains set M for messages received but not yet delivered • suppose we have: • message m from pj • m’ = last message delivered from process pk, k≠ j Consist Global States
Causal Delivery and Vector Clocks • To deliver m, p0 must verify: • no earlier message from pj is undelivered(i.e. TS(m)[j] – 1 messages have been delivered from pj) • no undelivered message m’’ from pk s.t.sendk(m’)→sendk(m’’)→sendj(m), k≠ j(i.e. whether TS(m’)[k] TS(m)[k] for all k) Consist Global States
Causal Delivery and Vector Clocks • p0maintains array D[1…n] where D[i] = TS(mi)[i], mi being last message delivered from pi • e.g. on right, delivery of m is delayed until m’’ is received and delivered Consist Global States
Delivery Rule 3 • Causal Delivery: • for all messages m, m’, sending processes pi, pjand destination process pk sendi(m) → sendj(m’) => deliverk(m) → deliverk(m’) • DR3 (Causal Delivery): Deliver message m from process pj as soon as D[j] = TS(m)[j] – 1, and D[k] TS(m)[k], k≠ j • p0 set D[j] to TS(m)[j] after delivery of m Consist Global States
Causal Delivery and Hidden Channels • should apply to closed systems • incorrect conclusion with hidden channels (communication channel external to the system) Consist Global States
Active Monitoring - Distributed Snapshots • monitor p0 requests states of other processes and combine into global state • assume channels implement FIFO delivery • channel state χi,j for channel pi to pj: messages sent by pi not yet received by pj Consist Global States
Distributed Snapshots • notations:INi = set of processes having direct channels to piOUTi = set of processes to which pi has a channel • for each execution of the snapshot protocol, process pi record its local state σiand the states of its incoming channels (χj,ifor all pjINi) Consist Global States
Distributed Snapshots • Snapshot Protocol (Chandy-Lamport) • p0 starts the protocol by sending itself a “take snapshot” message • when receiving the “take snapshot” message for the first time from process pf • pirecords local state σi and relays the “take snapshot” message along all outgoing channels • channel state χf,i is set to empty • pi starts recording messages on other incoming channels Consist Global States
Distributed Snapshots Snapshot Protocol (Chandy-Lamport) • when receiving the “take snapshot” message beyond the first time from process ps: • pi stops recording messages along channel from ps • channel state χs,i are messages that have been recorded Consist Global States
Distributed Snapshots p1 done p2 done • dash arrows indicate “take snapshot” messages • constructed global state: Σ23; χ1,2 empty; χ2,1 = {m} Consist Global States