Synthesis of Fault-Tolerant Distributed Programs

Synthesis of Fault-Tolerant Distributed Programs Ali Ebnenasir Department of Computer Science and Engineering Michigan State University East Lansing MI 48824 USA ebnenasi@cse.msu.edu Advisor: Dr. Sandeep S. Kulkarni

Motivation • Programs are subject to unanticipated faults • When new classes of faults are identified, the corresponding fault-tolerance must be added • How to add fault-tolerance? • Design a fault-tolerant program from scratch • Incremental addition of fault-tolerance • How to ensure correctness? • Verification after the fact • Automatic synthesis of fault-tolerant programs (correct by construction)

Motivation (Continued) • Synthesis of fault-tolerant programs • Start from (Temporal Logic) specification • Start from the fault-intolerant program • Synthesis of fault-tolerant programs from their fault-intolerant versions has the potential to • Reuse the behaviors of the fault-intolerant program • Preserve behaviors that are hard to specify (e.g., efficiency) • Problem: Complexity of synthesis • A polynomial-time non-deterministic algorithm for the synthesis of fault-tolerant distributed programs [FTRTFT00]

Outline • Program and Fault Model • Distribution Model • Problem Statement • Strategy • Current Results • Future Plan

Program and Fault Model • A program is identified by its state space and its set of transitions • Finite state space Sp • Invariant S, fault-span T ⊆ Sp • Program p, fault f, safety ⊆ { (s0, s1) | (s0, s1) ∈ Sp × Sp } • Fault-tolerance • Satisfy a particular fault-tolerance specification in the presence of faults • Failsafe, nonmasking, masking • [Figure: regions Sp, T, and S with transitions labeled p, f, and p/f]
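A minimal sketch of this model, under stated assumptions: the helper names `is_closed` and `reachable` and the four-state example are hypothetical and do not come from the talk. The sketch represents a program and a fault class as sets of transitions over a finite state space, checks whether a set of states is closed under a set of transitions, and computes one candidate fault-span as the states reachable from the invariant under program and fault transitions together.

```python
from collections import deque

def is_closed(states, transitions):
    """True if no transition starting inside `states` leaves it."""
    return all(t in states for (s, t) in transitions if s in states)

def reachable(start, transitions):
    """States reachable from `start` by following `transitions`."""
    succ = {}
    for s, t in transitions:
        succ.setdefault(s, set()).add(t)
    seen, queue = set(start), deque(start)
    while queue:
        s = queue.popleft()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

# Hypothetical example: states are the integers 0..3.
Sp = {0, 1, 2, 3}
p = {(0, 1), (1, 0), (2, 0)}   # program transitions
f = {(1, 2)}                   # fault transitions
S = {0, 1}                     # invariant, closed under p

# One candidate fault-span: everything reachable from S under p and f.
T = reachable(S, p | f)
```

In this toy example the invariant is closed under p but not under p together with f, while the computed T is closed under both, which is the property a fault-span is expected to have.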

Distribution Model • Read/Write restrictions • Example • A program p with two processes j and k • Two Boolean variables a and b • Process j cannot read b • Can we include the transition between the states (a=1, b=0) and (a=0, b=0)? • Only if we include the transition between the states (a=1, b=1) and (a=0, b=1) • Groups of transitions (instead of individual transitions) must be chosen
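The grouping effect of a read restriction can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: the function `group_of` is hypothetical, states are modeled as dictionaries, the direction of the example transition (a changes from 0 to 1) is assumed, and a process is assumed to leave unchanged any variable it cannot read.

```python
from itertools import product

def group_of(transition, unreadable, domains):
    """Transitions that a process cannot tell apart from `transition`.

    States are dicts from variable name to value. Every assignment of
    values to the unreadable variables yields one member of the group;
    those variables keep the same value in source and target state.
    """
    src, dst = transition
    names = sorted(unreadable)
    group = []
    for values in product(*(domains[v] for v in names)):
        hidden = dict(zip(names, values))
        group.append(({**src, **hidden}, {**dst, **hidden}))
    return group

domains = {"a": [0, 1], "b": [0, 1]}

# Process j cannot read b; it changes a from 0 to 1 (direction assumed).
t = ({"a": 0, "b": 0}, {"a": 1, "b": 0})
group = group_of(t, {"b"}, domains)
```

Here `group` contains the transition with b=0 and its counterpart with b=1, so a synthesis algorithm that respects the restriction must include or exclude both together.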

Problem Statement • Inputs to the synthesis algorithm: fault-intolerant program p, invariant S, faults f, specification Spec, distribution restrictions, finite state space Sp • Outputs: fault-tolerant program p', invariant S' • [Figure: regions S and S' of the state space, annotated "No new transition here" and "New transitions added here"]

Strategy • Theoretical issues • Develop heuristics • Explore polynomial-time boundaries • Analyze fault-intolerant programs • Develop a synthesis framework for • Developers of fault-tolerance • Developers of heuristics

Theoretical Issues - Heuristics • Apply heuristics to reduce the exponential complexity [SRDS01] • Assign weights to transitions and states based on their usefulness • Different approaches for resolving deadlocks and livelocks • Identify the applicability of heuristics to the problem at hand • Choose different subsets of heuristics • Apply in different order

Theoretical Issues - Polynomial-Time Boundary • Find properties of programs/specifications where polynomial-time synthesis is possible • Example: • Algorithmic synthesis of failsafe fault-tolerant programs is NP-hard [ICDCS02] • Polynomial-time synthesis of failsafe fault-tolerance for monotonic programs and specifications

Example for Polynomial-Time Boundary: Monotonicity of Specifications • Definition: A specification spec is positive monotonic with respect to variable x iff: • For every s0, s1, s’0, s’1 such that: • x = false in s0 and s1, and x = true in s’0 and s’1 • The values of all other variables in s0 and s’0 are the same • The values of all other variables in s1 and s’1 are the same • If (s0, s1) does not violate safety, then (s’0, s’1) does not violate safety
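For a small finite state space, the definition can be checked by brute force. The sketch below is only an illustration: the function names, the dictionary encoding of states, and the two example specifications are assumptions, and a specification is modeled as a predicate that says whether a single transition violates safety.

```python
from itertools import product

def all_states(domains):
    """Enumerate every state as a dict from variable name to value."""
    names = sorted(domains)
    return [dict(zip(names, vals))
            for vals in product(*(domains[n] for n in names))]

def is_positive_monotonic(violates, x, domains):
    """Brute-force check of positive monotonicity with respect to x.

    `violates(s0, s1)` returns True when the transition (s0, s1)
    violates the safety specification.
    """
    low = [s for s in all_states(domains) if s[x] is False]
    for s0, s1 in product(low, repeat=2):
        if not violates(s0, s1):
            # Same transition, with x flipped to true in both states.
            s0p, s1p = {**s0, x: True}, {**s1, x: True}
            if violates(s0p, s1p):
                return False
    return True

domains = {"x": [False, True], "y": [0, 1]}

def spec_a(s0, s1):
    """Hypothetical spec: y must not change while x is false."""
    return s0["x"] is False and s0["y"] != s1["y"]

def spec_b(s0, s1):
    """Hypothetical spec: y must not change while x is true."""
    return s0["x"] is True and s0["y"] != s1["y"]
```

Under these assumptions `spec_a` is positive monotonic with respect to x, because setting x to true only relaxes it, while `spec_b` is not, because a transition that is safe with x false can become unsafe with x true.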

Example for Polynomial-Time Boundary: Monotonicity of Programs • Definition: Program p with invariant S is negative monotonic with respect to variable x iff: • For every s0, s1, s’0, s’1 such that: • x = true in s0 and s1, and x = false in s’0 and s’1 • The values of all other variables in s0 and s’0 are the same • The values of all other variables in s1 and s’1 are the same • If (s0, s1) is a transition of p in S, then (s’0, s’1) is also a transition of p in S

Example for Polynomial-Time Boundary: Theorem • Synthesis of failsafe fault-tolerance can be done in polynomial time if either: • Program is negative monotonic, and • Spec is positive monotonic; • Or • Program is positive monotonic, and • Spec is negative monotonic. • If only one of these conditions is satisfied then synthesizing failsafe fault-tolerance is still NP-hard. • For many problems, these requirements are easily met. • E.g., Agreement, Consensus, and Commit.

Example for Polynomial-Time Boundary: Byzantine Agreement • Processes: general g and three non-generals j, k, and l • Variables • d.g : {0, 1} • d.j, d.k, d.l : {0, 1, ⊥} • b.g, b.j, b.k, b.l : {true, false} • f.j, f.k, f.l : {0, 1} • Fault-intolerant program transitions • d.j = ⊥ /\ f.j = 0 → d.j := d.g • d.j ≠ ⊥ /\ f.j = 0 → f.j := 1 • Fault transitions • ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l → b.j := true • b.j → d.j := 0|1 • [Figure: processes g, j, k, and l]

Example for Polynomial-Time Boundary: Byzantine Agreement (Continued) • Safety Specification • Agreement: No two non-Byzantine non-generals can finalize with different decisions • Validity: If g is not Byzantine, each non-Byzantine non-general process should finalize with the same decision as g • Read/Write restrictions • Readable variables for process j: • b.j, d.j, f.j • d.g, d.k, d.l • Process j can write • d.j, f.j
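The two safety conditions can be written as predicates over a single state. This is a simplified sketch: the function names and the dictionary encoding (d for decisions, b for Byzantine flags, f for finalized flags, `None` for the undecided value) are assumptions, and only the state-based part of the specification is covered.

```python
BOTTOM = None  # stands for the undecided value
NON_GENERALS = ("j", "k", "l")

def finalized_honest(b, f):
    """Non-generals that are non-Byzantine and have finalized."""
    return [p for p in NON_GENERALS if not b[p] and f[p] == 1]

def violates_agreement(d, b, f):
    """Two non-Byzantine non-generals finalized with different decisions."""
    decisions = {d[p] for p in finalized_honest(b, f)}
    return len(decisions) > 1

def violates_validity(d, b, f):
    """g is not Byzantine, yet some honest finalized process disagrees with g."""
    if b["g"]:
        return False
    return any(d[p] != d["g"] for p in finalized_honest(b, f))

# Hypothetical state: a Byzantine general, j and k finalized differently.
d = {"g": 1, "j": 1, "k": 0, "l": BOTTOM}
b = {"g": True, "j": False, "k": False, "l": False}
f = {"j": 1, "k": 1, "l": 0}
```

In this example state, agreement is violated because j and k finalized with 1 and 0, while validity is not, because it only constrains states in which g is non-Byzantine.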

Example for Polynomial-Time Boundary: Byzantine Agreement (Continued) • Observation 1: • Positive monotonicity of specification with respect to b.j • Observation 2: • Negative monotonicity of program, consisting of the transitions of j, with respect to b.k • Observation 3: • Negative monotonicity of specification with respect to f.j • Observation 4: • Positive monotonicity of program, consisting of the transitions of j, with respect to f.k

Example for Polynomial-Time Boundary: Byzantine Agreement (Continued) • Failsafe fault-tolerant program • d.j = ⊥ /\ f.j = 0 → d.j := d.g • d.j ≠ ⊥ /\ ((d.j = d.k) \/ (d.j = d.l)) /\ f.j = 0 → f.j := 1
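The difference between the fault-intolerant and the failsafe finalize actions can be seen by evaluating their guards on one state. The sketch below is illustrative only: the function names, the dictionary encoding, and the example state (a Byzantine general that gave 0 to j and 1 to k and l) are assumptions.

```python
BOTTOM = None  # stands for the undecided value

def can_finalize_intolerant(d, f, p):
    """Guard of the fault-intolerant finalize action of process p."""
    return d[p] is not BOTTOM and f[p] == 0

def can_finalize_failsafe(d, f, p, others):
    """Guard of the failsafe finalize action: p must also agree
    with at least one of the other non-generals."""
    agrees = any(d[p] == d[q] for q in others)
    return d[p] is not BOTTOM and agrees and f[p] == 0

# Hypothetical state after a Byzantine general sent different values.
d = {"g": 1, "j": 0, "k": 1, "l": 1}
f = {"j": 0, "k": 0, "l": 0}

j_intolerant = can_finalize_intolerant(d, f, "j")
j_failsafe = can_finalize_failsafe(d, f, "j", ("k", "l"))
k_failsafe = can_finalize_failsafe(d, f, "k", ("j", "l"))
```

In this state the fault-intolerant guard lets j finalize with a decision that no other non-general shares, whereas the strengthened guard blocks j and still allows k, which agrees with l, to finalize.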

Theoretical Issues - Analysis of Fault-Intolerant Programs • Analyze the behavior and the structure of the fault-intolerant program. • Example: • Reasoning about the program in high atomicity; i.e., no distribution restrictions. • Enhancement of fault-tolerance [ICDCS03]. • Take advantage of model checkers.

Theoretical Issues - Analysis of Fault-Intolerant Programs • [Figure: the synthesis framework takes a fault-intolerant program and produces a fault-tolerant program; an intermediate program in Promela and a counterexample are exchanged between the framework and the SPIN model checker]

Theoretical Issues: Current Results • [Figure: intolerant program, failsafe fault-tolerant, nonmasking fault-tolerant, and masking fault-tolerant programs, with edges annotated [FTRTFT00], [ICDCS02], and [ICDCS03]]

Synthesis Framework • Goals: • Algorithmic synthesis of fault-tolerant programs from their fault-intolerant versions. • Easy to integrate new heuristics. • Easy to change its implementation. • Users: • Developers of fault-tolerance. • Developers of heuristics. • Examples: • A canonical version of Byzantine agreement. • An agreement program that is subject to Byzantine and failstop faults (1.3 million states). • A token ring program perturbed by state-corruption faults.

Related Work • E.A. Emerson and E.M. Clarke, Using branching time temporal logic to synthesize synchronization skeletons, 1982. • Z. Manna and P. Wolper, Synthesis of communicating processes from temporal logic specifications, 1984. • A. Arora, P.C. Attie, and E.A. Emerson, Synthesis of fault-tolerant concurrent programs, 1998. • P.C. Attie and E.A. Emerson, Synthesis of concurrent programs for an atomic read/write model of computation, 1996. • O. Kupferman and M. Vardi, Synthesis with incomplete information, 1997.

Future Plan • Theoretical issues • Develop more intelligent heuristics to reduce the chance of failure in the synthesis • Find polynomial-time boundary for other levels of fault-tolerance • Synthesis framework issues • Scalability of the synthesis framework for larger programs • Implement the synthesis algorithm on a distributed platform

Future Plan - Continued • Synthesis framework issues • Use model checkers for behavioral analysis • Query • Intermediate program • Reachability analysis from a given state • Result set • Deadlock states • Non-progress cycles • Finite sequence of states
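One of the planned queries, reachability analysis from a given state with deadlock states as the result set, can be sketched on an explicit transition relation. This is a toy illustration under assumptions: the function `deadlocks_from` and the example transitions are hypothetical, and the sketch does not use or imitate the SPIN model checker.

```python
def deadlocks_from(start, transitions):
    """Deadlock states (no outgoing transition) reachable from `start`."""
    succ = {}
    for s, t in transitions:
        succ.setdefault(s, []).append(t)
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t in succ.get(s, []):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return {s for s in seen if s not in succ}

# Hypothetical intermediate program: state 2 has no outgoing transition,
# and state 5 is a deadlock that is not reachable from state 0.
transitions = {(0, 1), (1, 2), (1, 3), (3, 1), (4, 5)}
result = deadlocks_from(0, transitions)
```

Restricting the result to states reachable from the query state keeps unreachable deadlocks, such as state 5 here, out of the result set.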

Publications • [ICDCS02] Sandeep S. Kulkarni and Ali Ebnenasir. The Complexity of Adding Failsafe Fault-Tolerance. The 22nd International Conference on Distributed Computing Systems, July 2-5, 2002, Vienna, Austria. • [ICDCS03] Sandeep S. Kulkarni and Ali Ebnenasir. Enhancing The Fault-Tolerance of Nonmasking Programs. Accepted in the 23rd International Conference on Distributed Computing Systems, May 19-22, 2003, Providence, Rhode Island, USA. • [SRDS03] Sandeep S. Kulkarni and Ali Ebnenasir. A Framework for Automatic Synthesis of Fault-Tolerance. Submitted to the 22nd Symposium on Reliable Distributed Systems, October 6-8, 2003, Florence, Italy. • The implementation of the synthesis framework: • http://www.cse.msu.edu/~sandeep/software/Code/synthesis-framework/

Thank You! Questions and Comments?

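Reduction from 3-SAT • [Figure: reduction gadget with states x0, x1, ..., xn, x’0, x’1, ..., x’n, a0, and an = a0; for variable x0, one transition is included iff x0 is true and another iff x0 is false; for a clause cj = xj \/ ¬xk \/ xl, transitions are included iff xj is false, iff xk is true, and iff xl is false]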
