Modeling and Analyzing Fault-Tolerant, Real-Time Communication Protocols. Nancy Lynch Theory of Distributed Systems MIT Second MURI Workshop Berkeley, California June 4, 2001. MIT Participants. Leaders: Nancy Lynch, Idit Keidar
Modeling and Analyzing Fault-Tolerant, Real-Time Communication Protocols Nancy Lynch Theory of Distributed Systems MIT Second MURI Workshop Berkeley, California June 4, 2001
MIT Participants • Leaders: Nancy Lynch, Idit Keidar • Students: Carl Livadas, Roger Khazan, Ziv Bar-Joseph • Collaborators: Paul Attie, Alex Shvartsman, Roberto Segala, Frits Vaandrager
General Models and Proof Methods • I/O automaton models [Lynch, Tuttle 87] • Nondeterministic, infinite-state machines • Input/output/internal actions, traces • Modularity: Composition, levels of abstraction • Mathematical, language-independent • Used to model distributed algorithms, communication protocols • Validation, code generation, upper and lower bounds
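As a rough illustration of the input/output action and trace vocabulary above, here is a minimal Python sketch of a toy automaton: a reliable FIFO channel. The class name `FifoChannel` and its methods are hypothetical, chosen only for this example; this is not IOA language syntax or any tool's API.

```python
from collections import deque


class FifoChannel:
    """Toy I/O automaton: a reliable FIFO channel (illustrative only)."""

    def __init__(self):
        self.queue = deque()  # state: messages in transit
        self.trace = []       # external actions performed so far

    def send(self, m):
        # Input action: always enabled (I/O automata are input-enabled).
        self.queue.append(m)
        self.trace.append(("send", m))

    def enabled_outputs(self):
        # Output action receive(m) is enabled only for the head of the queue.
        return [("receive", self.queue[0])] if self.queue else []

    def receive(self):
        # Output action: its precondition is a non-empty queue.
        if not self.queue:
            raise RuntimeError("receive is not enabled")
        m = self.queue.popleft()
        self.trace.append(("receive", m))
        return m


ch = FifoChannel()
ch.send("a")
ch.send("b")
first = ch.receive()
```

The `trace` list records only external actions, which is the information a trace-based specification would constrain.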
Timing, Hybrid Considerations • Timing: TIOAs [Lynch, Vaandrager] • Timeout-based algorithms. • Local clocks, clock synchronization • Performance analysis • Hybrid: HIOAs [L, Segala, V, Weinberg 96] • Real world + computer components • Continuous flows of data
Other Embellishments • Probabilities: PIOA, PTIOA [Segala 95] • Probabilistic and nondeterministic behavior. • Randomized distributed algorithms • Systems with probabilistic assumptions • Dynamic systems: DIOA [Attie, Lynch 99] • Run-time process creation and destruction, mobility. • Agent systems
Communication Protocol Modeling and Analysis. • At-most-once (AMO) Message Delivery • TCP, T/TCP • Reliable channels from unreliable channels • Self-stabilizing communication protocols • Network clock synchronization • Group communication systems
Group Communication Service • Communication middleware • Manages group membership, current view • Handles joins, leaves, failures, partitions, merges • Multicast communication among members • Multicasts respect views • Ordering, reliability constraints for message delivery, e.g., FIFO, causal within each view. • Isis, Transis, Totem,…
[Figure: the TO service built from VStoTO processes layered over VS. TO interface actions: bcast, brcv. VS interface actions: gpsnd, gprcv, newview.]
Conditional Performance Analysis • Assume VS satisfies: • If a network component C stabilizes, then soon thereafter, views become consistent within C, and messages sent in the final view are delivered everywhere in C, within bounded time. • And VStoTO satisfies: • Simple timing, fault-tolerance assumptions. • Then TO satisfies: • If C stabilizes, then soon thereafter, any message sent or delivered anywhere in C is delivered everywhere in C, within bounded time.
Conditional Performance Analysis • Give conditional claims about system performance under particular assumptions about behavior of environment and of network substrate, e.g.: • Stabilization of underlying network. • Limited rate of change. • Bounds on message delay. • Limited amount of failure (number, density). • Limited input arrivals (number, density). • Assumptions => Guarantees. • Get probabilistic statements as corollaries. • Composable
What we proposed: 1. Model, analyze communication protocols. 2. Develop conditional performance analysis techniques. 3. Extend I/O automata theory to accommodate performance, reliability, hybrid, probability, dynamic considerations. 4. Relate, integrate I/O automata with other frameworks.
Progress this year 1. Communication protocol design/analysis • Scalable Group Communication • Totally Ordered Multicast with QoS • Scalable Reliable Multicast 2. Conditional performance analysis methods • Evolving… 3. I/O automaton models • Hybrid I/O Automata • Dynamic I/O Automata • IOA language support 4. Comparing, integrating with other models • A start…
Group Communication Service • Manages group membership, current view. • Multicast communication among group members, with ordering, reliability guarantees. • Virtual Synchrony [Birman, Joseph 87] • Integrates group membership and group communication. • Processes that move together from one view to another deliver the same messages in the first view. • Useful for replicated data management. • Before announcing new view, processes must synchronize, exchange messages.
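The virtual synchrony condition (processes that move together from one view to another deliver the same messages in the first view) can be phrased as a check over delivery histories. The sketch below is only an illustration under assumed data shapes: the function name and the `histories` layout are hypothetical, and ordering constraints are ignored.

```python
def violates_virtual_synchrony(histories):
    """histories: {process: [(view_id, delivered_msgs), ...]} in view order.

    Returns True if two processes that both move from view v to view v'
    delivered different message sets while in v.
    """
    seen = {}  # (view, next_view) -> message set delivered in `view`
    for views in histories.values():
        for (v, msgs), (nxt, _) in zip(views, views[1:]):
            key = (v, nxt)
            if key in seen and seen[key] != set(msgs):
                return True
            seen.setdefault(key, set(msgs))
    return False


# Hypothetical run: i and j move together from view 3 to view 4; k does not.
ok_run = {
    "i": [(3, {"m"}), (4, set())],
    "j": [(3, {"m"}), (4, set())],
    "k": [(3, set())],
}
# Same run, except j enters view 4 without having delivered m in view 3.
bad_run = {
    "i": [(3, {"m"}), (4, set())],
    "j": [(3, set()), (4, set())],
    "k": [(3, set())],
}
```

Process k is unconstrained in this example because it never moves to view 4 with the others.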
Example: Virtual Synchrony [Figure: timelines for processes i, j, k. All three are in view 3: {i, j, k}; events mcast(m), rcv(m), rcv(m) occur; two processes then install view 4: {i, j}. Caption: VS algorithm supplies missing m.]
Group Communication in WANs • Difficulties: • High message latency, message exchanges are expensive • Frequent connectivity changes • New, scalable GC algorithm: • Uses scalable GM service of [Keidar, Sussman, et al. 00], implemented on special membership servers. • GC (with virtual synchrony) implemented on clients. [Figure: VS service composed of VSGC, GM, and Net.]
Group Communication in WANs • Try to minimize time from when network stabilizes until GC delivers new views to clients. • After stabilization: GM forms view, VSGC algorithm synchronizes. • Existing systems (LANs): • GM, VSGC use several message exchange rounds • Continue in spite of new network events • Inappropriate for WANs [Figure: timeline from Net event through GM Algorithm and VSGC Algorithm to view(v).]
New Algorithm • VSGC uses one message exchange round, in parallel with GM’s agreement on views. • GM usually delivers views in one message exchange. • Responds to new network events during reconfiguration: • GM produces new membership sets • VSGC responds to membership changes • Distributed implementation [Tarashchanskiy 00] [Figure: timeline from Net event through GM Algorithm and VSGC Algorithm to view(v).]
Correctness Proofs • Models, proofs (safety and liveness) • Developed new incremental modeling, proof methods [Keidar, Khazan, Lynch, Shvartsman 00] • Proof Extension Theorem: [Diagram relating S, S’, A, A’] • Used new methods for the safety proofs.
Performance Analysis • Analyze time from when network stabilizes until GC delivers new views to clients. • Compare with other strategies. • System is a composition: • Network and GM services, plus • VSGC processes • Use composition in the analysis.
Performance Analysis • Analyze the VSGC algorithm alone, in terms of its inputs and timing assumptions. • State reasonable performance guarantees for GM and Network. • Combine to get conditional performance properties for the system as a whole.
1. Analysis of VSGC algorithm • Assume component C stabilizes: • GM delivers same views • Net provides reliable communication with latency δ. • Let • T[start], T[view] be times of last GM events for C • ε be upper bound on local step time. • Then VSGC outputs new views by time max(T[start] + δ + x, T[view]) + ε
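The shape of this bound can be written out as a small function. This is only a sketch: `latency` and `step` are hypothetical parameter names for the network latency and the local step-time bound, `x` is taken from the slide as given, and the sample numbers are made up.

```python
def vsgc_view_bound(t_start, t_view, latency, x, step):
    """Latest time by which VSGC outputs the new view, per the slide's formula.

    t_start, t_view: times of the last GM start and view events for C.
    latency: network latency bound (hypothetical name).
    x: the extra term appearing in the slide's formula, used as given.
    step: upper bound on local step time (hypothetical name).
    """
    return max(t_start + latency + x, t_view) + step


# Case 1: the synchronization round dominates (10 + 3 + 1 = 14 > 12).
sync_dominates = vsgc_view_bound(10.0, 12.0, 3.0, 1.0, 0.5)
# Case 2: the GM view event dominates (20 > 14).
view_dominates = vsgc_view_bound(10.0, 20.0, 3.0, 1.0, 0.5)
```

The two cases show the intent of the max: the VSGC round runs in parallel with GM's agreement, so only the later of the two finishing times matters.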
Analysis of VSGC Algorithm [Figure: timeline of a Net event, the GM algorithm's outputs start (at T[start]) and view(v) (at T[view]), and the VS algorithm's view(v) output, with an interval labeled “+ x”.]
2. Assumed Bounds for GM • Bounds for “Fast Path” of [Keidar, et al. 00], observed empirically in almost all cases. [Figure: GM timeline marking start at T[start] and view(v) at T[view].]
3. Combining VSGC and GM Bounds • Bounds for system, conditional on GM bounds. [Figure: combined timeline of GM (start at T[start], view(v) at T[view]) and VSGC (view(v), with an interval labeled “+ x”).]
Totally Ordered Multicast with QoS [Bar-Joseph, Keidar, Anker, Lynch 00]
Totally Ordered Multicast with QoS • Multicast to dynamic group, subject to joins, leaves, and failures. • Global total ordering of messages • QoS: Message delivery latency • Built on reliable network with latency guarantees • Add ordering guarantees, preserve latency bounds. • Target applications • State machine replication • Military command and control • Distributed games • Shared editing
Two Algorithms • Algorithm 1: Basic Totally Ordered Multicast • Sends, receives consistent with total ordering of messages. • Non-failing processes agree on messages from non-failing processes. • Latency: Constant, even with joins, leaves, failures. • Algorithm 2: Atomic Multicast • Non-failing processes agree on all messages. • Latency: • Joins, leaves only: Constant • With failures: Linear in f [Figure: TOM layered over Net, with inputs fail_i, fail_j.]
Local Node Process [Figure: components FrontEnd_i, Ord_i, Memb_i, Sniffer_i over Net. Labeled actions: join, leave, mcast(m), rcv(m), mcast(join), mcast(leave), rcv(join), rcv(leave), joiners(s,J), leavers(s,J), members(s,J), end-slot(s), progress(s,j).]
Local Algorithm Operation • FrontEnd divides time into slots, tags messages with slots. • Ord delivers messages by slot, in order of process indices. • Memb determines slot membership. • Join, leave messages • Failures: • Algorithm 1 uses local failure detector. • Algorithm 2 uses consensus on failures. • Requires new dynamic version of consensus. • Timing-dependent
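The slot-based ordering rule can be sketched in a few lines. This is an illustration under assumptions, not the algorithm from the paper: `slot_of` presumes fixed-length slots measured from time 0, the tuple layout is hypothetical, and membership and failure handling are left out.

```python
def slot_of(send_time, slot_length):
    # FrontEnd-style tagging: map a send time to a slot number
    # (assumes fixed-length slots starting at time 0).
    return int(send_time // slot_length)


def delivery_order(messages):
    """messages: iterable of (slot, process_index, payload).

    Ord-style rule: deliver by slot, then by process index. sorted() is
    stable, so several messages from one process in one slot keep their
    original relative order.
    """
    ordered = sorted(messages, key=lambda m: (m[0], m[1]))
    return [payload for _, _, payload in ordered]


tagged = [
    (slot_of(2.5, 1.0), 1, "c"),  # slot 2, process 1
    (slot_of(1.2, 1.0), 2, "b"),  # slot 1, process 2
    (slot_of(1.7, 1.0), 0, "a"),  # slot 1, process 0
]
order = delivery_order(tagged)
```

Because every process applies the same deterministic rule to the same slot contents, agreement on slot membership is what yields a common total order.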
Architecture for Algorithm 2 [Figure: TO-QoS layered over GM and Net.]
Performance Analysis (Planned) 1. Latency of TO-QoS in terms of GM 2. GM latency bounds 3. Combine
Using Caching to Improve Reliable Multicast Algorithms [Livadas]
SRM [Floyd, et al.] • Reliable multicast to dynamic group. • Built over IP multicast • Based on requests (NACKs) and retransmissions • Limits duplicate requests/replies using: • Deterministic suppression: Ancestors suppress descendants, by scheduling requests/replies based on distance to source. • Probabilistic suppression: Siblings suppress each other, by spreading out requests/replies.
SRM Architecture [Figure: SRM layered over IPMcast.]
New Protocol • Tries to improve SRM by using loss history information. • Useful if future losses occur on same link. • Uses deterministic suppression for siblings as well. • Determines, caches best requestor, best replier • Chooses requestor closest to source. • Chooses replier closest to requestor. • Breaks ties with processor ids. • Defaults to SRM. [Figure: network highlighting the chosen Requestor and Replier.]
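The selection rule (requestor closest to source, replier closest to requestor, ties broken by processor id) can be sketched as follows. The candidate sets are an assumption of this example: requestors are drawn from hosts that lost the packet and repliers from hosts that hold it. The function name and the distance table are hypothetical.

```python
def choose_pair(source, losers, holders, dist):
    """Pick (requestor, replier) for one lost packet.

    losers:  ids of hosts that missed the packet (assumed candidate requestors).
    holders: ids of hosts that have the packet (assumed candidate repliers).
    dist:    {frozenset({a, b}): distance}, symmetric, for a != b.
    Ties are broken by the smaller processor id.
    """
    def d(a, b):
        return dist[frozenset((a, b))]

    requestor = min(losers, key=lambda p: (d(source, p), p))
    replier = min(holders, key=lambda p: (d(requestor, p), p))
    return requestor, replier


# Hypothetical topology: host 0 is the source; hosts 3 and 4 lost the packet.
dist = {
    frozenset((0, 3)): 2, frozenset((0, 4)): 2,  # tie: requestor is host 3
    frozenset((3, 1)): 1, frozenset((3, 2)): 1,  # tie: replier is host 1
}
pair = choose_pair(0, losers={3, 4}, holders={0, 1, 2}, dist=dist)
```

Caching the resulting pair is what lets later losses on the same link skip the randomized request and reply timers.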
Performance • Metrics: • Loss recovery latency: Time from detection of packet loss to receipt of first retransmission • Loss recovery overhead: Number of messages multicast to recover from a message loss • Protocol performance benefits: • Removes delays caused by probabilistic suppression • Following election of requestor and replier: • Reduces latency by using best requestor and replier. • Reduces overhead by using single requestor and replier. • Latency analysis (Planned)
Hybrid I/O Automata [Lynch, Segala, Vaandrager, HSCC 01] • New, simpler version of HIOA model of [LSVW96] • Supports decomposing hybrid system descriptions: • External behavior: Discrete actions and continuous flows • Composition: Synchronizes external actions and flows, respects external behavior • Abstraction: Implementation and simulation relation notions, respect external behavior. • Separate mechanisms: • External actions for discrete communication. • External variables for continuous flow.
Example: Delay Buffer Del(d) • Accepts discrete and continuous input, produces isomorphic output, with delay d. • Compose in sequence, in cycle: [Figure: Del(d1) and Del(d2) composed in sequence and in a cycle.] • Composition implements Del(d1 + d2): [Figure: Del(d1) followed by Del(d2), alongside Del(d1 + d2).]
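For the discrete part of the delay buffer, the claim that sequential composition implements Del(d1 + d2) can be checked on a sample input. This sketch models only timestamped discrete events; continuous flows are omitted, and the function name `delay` and the event encoding are hypothetical.

```python
def delay(events, d):
    """Del(d) restricted to discrete events: shift every timestamp by d.

    events: list of (time, value) pairs, in time order.
    """
    return [(t + d, v) for t, v in events]


inputs = [(0, "a"), (4, "b"), (9, "c")]
d1, d2 = 2, 3

in_sequence = delay(delay(inputs, d1), d2)  # Del(d1) feeding Del(d2)
single = delay(inputs, d1 + d2)             # Del(d1 + d2)
```

Integer times are used so that the comparison is exact; with floating-point delays the two sides could differ by rounding.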
Example: Vehicle and Controller • Keep vehicle speed in [v1, v2]. • Sensor senses velocity, reports to Controller every time d. • Controller suggests acceleration. • Vehicle follows suggested acceleration, with uncertainty ε. • Compose: Discrete, continuous interactions • Prove invariant: velocity in [v1,v2]. • Use auxiliary invariants, including timing. [Figure: Vehicle, Sensor, Controller, and Actuator connected by vel-out, report(v), suggest(a), and acc-in.]
HIOA definition • U, X, Y: Input, output, internal (state) variables • Θ: Initial states • I, O, H: Input, output, internal actions • D: discrete transitions • T: trajectories • Mappings from time intervals to valuations of variables • Closure properties • Input-enabling for actions, flows • Execution: τ0, a1, τ1, a2, τ2, … • Trace: Restrict to external variables and actions
Composition and Abstraction • Abstraction: • A implements B if comparable and traces(A) subset of traces(B). • Simulation relation: Start, step, trajectory conditions • Theorem: Simulation relation implies implementation • Composition: • Synchronize external actions and variables • Theorems: Projection, pasting, substitutivity • Receptiveness: • Doesn’t cooperate in producing Zeno behavior • Theorem: Closed under composition (with technical assumption).
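The implementation relation (traces(A) subset of traces(B)) can be illustrated on small discrete automata by enumerating traces up to a bounded number of steps. This is a bounded check for finite, untimed examples only, not a proof method and not the hybrid definition; all names and the two sample automata are hypothetical.

```python
def traces(init, trans, external, depth):
    """External traces of all executions with at most `depth` steps.

    trans: list of (state, action, next_state) triples.
    """
    result = set()
    frontier = {(init, ())}
    for _ in range(depth + 1):
        result |= {tr for _, tr in frontier}
        frontier = {
            (s2, tr + ((a,) if a in external else ()))
            for s, tr in frontier
            for (s1, a, s2) in trans
            if s1 == s
        }
    return result


def implements_bounded(impl, spec, external, depth):
    # Bounded version of "traces(impl) is a subset of traces(spec)".
    return traces(0, impl, external, depth) <= traces(0, spec, external, depth)


EXTERNAL = {"req", "ack"}
spec = [(0, "req", 1), (1, "ack", 0)]
impl = [(0, "req", 1), (1, "work", 2), (2, "ack", 0)]  # "work" is internal
rogue = [(0, "ack", 0)]                                # acks without a request

good = implements_bounded(impl, spec, EXTERNAL, depth=6)
bad = implements_bounded(rogue, spec, EXTERNAL, depth=6)
```

The internal action `work` is hidden from the trace, which is why `impl` still matches the alternating req/ack behavior of `spec`.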
Dynamic I/O Automata [Attie, Lynch, Concur 01] • Dynamic version of I/O automata, including: • Automaton creation and destruction • Signature change • Two-level model: Automata, configurations. • Mobility modeled using signature change
IOA Language and Tools [Figure: automaton A with inputs I and outputs O.] • Language for describing I/O automata: [Garland, Lynch] • Front end: [Garland] • Translates to Java objects • Completely rewritten this year. • Needs support for composition. • Theorem-prover connection: [Garland, Bogdanov] • Connection with LP • Seeking connections: SAL, Isabelle, STeP, NuPRL
IOA Language and Tools • Simulator: [Chefter, Ramirez, Dean] • Has support for paired simulation. • Needs additions. • Being instrumented for invariant discovery using Daikon [Ernst] • Code generator: [Tauber, Tsai] • Local code-gen (translation to Java) running. • Needs composition, communication service calls, correctness proof. • Challenge examples
Plans 1. Protocol modeling/verification • Finish analysis of Scalable GC, Totally Ordered Multicast with QoS, SRM • Other protocols from this project. 2. Conditional analysis methods • Develop general methods • Compare with other methods (Trivedi)