Automatic Generation of Programs Using Model Checking and Genetic Programming. Gal Katz, Doron Peled, Bar Ilan University. Agenda • Introduction & motivation • Genetic Programming • Program synthesis • Model Checking • Combined method • Application to mutual exclusion • Conclusions & future work
Introduction • Genetic programming • A methodology for automatic programming inspired by Darwinian evolution [Koza 92]. • Used for automatic generation of programs in various fields. • Mostly used for optimization related problems. • Fitness is usually calculated by checking program performance against test cases. • Less used for problems with a strict specification.
Introduction (2) • Model Checking • An automatic formal verification technique used mainly with finite-state software and hardware systems. • Can be used to verify communication and concurrent protocols. • Models are checked against a strict specification. The result is either: • A confirmation that the model satisfies the specification, or • A counterexample showing that it does not.
Introduction (3) • How to construct a model from the spec.? • Synthesis • Transforms the spec. directly into a model that satisfies it. • Complicated. • Currently not practical for automatic program generation. • Brute-force enumeration • All possible programs of a specific domain and size are generated and model-checked. • All existing solutions will eventually be found. • Very time-intensive. Not practical for programs with more than a few lines of code.
Our Method: Combining GP & Model Checking • [Diagram: User, GP Engine, Enhanced Model Checker.] • Numbered flow in the diagram: 1. Specification 2. Configuration 3. Initial population 4. Verification results 5. New programs 6. Final model / results
Main Steady-state GP Algorithm 1. Create the initial program population. 2. Randomly choose μ programs. 3. Create λ new programs by applying genetic operations to the above μ programs. 4. Calculate the fitness function for the μ + λ programs, and use it to select μ new programs. 5. Replace the old μ programs by the selected ones. 6. Repeat steps 2-5 until either: • a perfect solution is found, or • the maximum allowed number of iterations is reached.
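The loop above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy integer "programs", and the toy fitness are all hypothetical, and plain truncation stands in for the fitness-proportional selection described later in the talk.

```python
import random

def steady_state_gp(population, mutate, fitness, mu, lam,
                    max_iterations, perfect, rng):
    """Steady-state loop: returns (best program found, iterations used)."""
    population = list(population)
    for iteration in range(1, max_iterations + 1):
        # Randomly choose mu programs (by slot, so they can be replaced).
        slots = rng.sample(range(len(population)), mu)
        parents = [population[i] for i in slots]
        # Create lam new programs from the chosen ones.
        offspring = [mutate(rng.choice(parents), rng) for _ in range(lam)]
        # Score all mu + lam candidates and keep the mu best
        # (assumption: truncation instead of roulette selection).
        ranked = sorted(parents + offspring, key=fitness, reverse=True)
        survivors = ranked[:mu]
        # Replace the old mu programs by the selected ones.
        for i, program in zip(slots, survivors):
            population[i] = program
        if fitness(survivors[0]) >= perfect:
            return survivors[0], iteration
    return max(population, key=fitness), max_iterations

# Toy stand-in for real programs: integers that should reach TARGET.
TARGET = 42

def toy_fitness(x):
    return 100.0 - abs(x - TARGET)

def toy_mutate(x, rng):
    return x + rng.choice([-1, 1])

rng = random.Random(0)
initial = [rng.randrange(100) for _ in range(20)]
best, iterations = steady_state_gp(initial, toy_mutate, toy_fitness,
                                   mu=5, lam=15, max_iterations=500,
                                   perfect=100.0, rng=rng)
print(best, toy_fitness(best), iterations)
```

In the real setting, `fitness` would be the model checking based score and `mutate` would be one of the tree operations described on the following slides.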
Program Representation • Programs are represented as trees. • Internal nodes represent expressions or instructions with parameters (assignment, while, if, block). • Terminal nodes represent constants or expressions without any parameter (0, 1, 2, me, other). • Strongly-typed GP is used [Montana 95]. • [Figure: tree with root "while", whose children are the condition "!=" (A[2], 0) and the instruction "assign" (A[me], 1).] • Example: While (A[2] != 0) A[me] = 1
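As a small illustration of this representation, the example program can be built as a tree and printed back as code. The `Node` class and `to_source` function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    op: str                                   # "while", "assign", "!=", "A[]", "0", "me", ...
    children: List["Node"] = field(default_factory=list)

def to_source(n: Node) -> str:
    """Render a program tree in the slide's code notation."""
    c = n.children
    if n.op == "while":
        return f"While ({to_source(c[0])}) {to_source(c[1])}"
    if n.op == "assign":
        return f"{to_source(c[0])} = {to_source(c[1])}"
    if n.op == "!=":
        return f"{to_source(c[0])} != {to_source(c[1])}"
    if n.op == "A[]":
        return f"A[{to_source(c[0])}]"
    return n.op                               # terminal: 0, 1, 2, me, other

tree = Node("while", [
    Node("!=", [Node("A[]", [Node("2")]), Node("0")]),
    Node("assign", [Node("A[]", [Node("me")]), Node("1")]),
])
print(to_source(tree))  # While (A[2] != 0) A[me] = 1
```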
Initial Population Creation • Population usually contains 100 – 1000 programs. • Programs are created recursively using the “grow” method [Koza 92]. • The root is randomly selected from instruction nodes. • Offspring are randomly selected from the allowed nodes or terminals, as long as the typing rules are preserved. • If the max allowed tree depth is reached, a terminal must be chosen.
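A minimal sketch of the "grow" method with strong typing is given below. The grammar is an assumption made for illustration: the node types, the `true` terminal, and the simplification that `assign` takes an index and a value (meaning A[index] = value) are not taken from the talk.

```python
import random

# Hypothetical typed grammar: name -> (result type, argument types).
FUNCTIONS = {
    "while":  ("stmt", ["bool", "stmt"]),
    "assign": ("stmt", ["int", "int"]),     # A[index] = value
    "!=":     ("bool", ["int", "int"]),
    "A[]":    ("int",  ["int"]),            # read A[index]
}
TERMINALS = {"0": "int", "1": "int", "2": "int", "me": "int",
             "other": "int", "empty": "stmt", "true": "bool"}

def grow(typ, depth, max_depth, rng, root=False):
    """Create a random tree of the requested type as (name, children)."""
    funcs = [f for f, (t, _) in FUNCTIONS.items() if t == typ]
    terms = [t for t, ty in TERMINALS.items() if ty == typ]
    if depth >= max_depth:
        choices = terms                     # depth limit reached: terminal forced
    elif root:
        choices = funcs                     # the root is an instruction node
    else:
        choices = funcs + terms
    name = rng.choice(choices)
    if name in FUNCTIONS:
        arg_types = FUNCTIONS[name][1]
        return (name, [grow(a, depth + 1, max_depth, rng) for a in arg_types])
    return (name, [])

def depth(tree):
    return 0 if not tree[1] else 1 + max(depth(k) for k in tree[1])

def result_type(tree):
    name = tree[0]
    return FUNCTIONS[name][0] if name in FUNCTIONS else TERMINALS[name]

def well_typed(tree):
    name, kids = tree
    expected = FUNCTIONS[name][1] if name in FUNCTIONS else []
    return len(kids) == len(expected) and all(
        result_type(k) == e and well_typed(k) for k, e in zip(kids, expected))

population = [grow("stmt", 0, 4, random.Random(seed), root=True)
              for seed in range(100)]
```

Every type in this toy grammar has at least one terminal, which is what makes the forced terminal choice at the depth limit always possible.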
Genetic Operations • At each iteration of the GP algorithm, the following genetic operations are applied to the selected programs: • Reproduction – programs are copied without any change • Mutation • Crossover
Mutation Operation • The main operation we use. • Allows performing small modifications to an existing program by the following method: • Randomly choose a program node (internal, or leaf). • According to the node type, apply one of the following operations with respect to the chosen node (strong typing must be kept):
Replacement Mutation type (a) • Replace the sub-tree rooted at the node with a new randomly generated sub-tree. • Can change a single node or an entire sub-tree. • Example: While (A[2] != 0) A[me] = 1 becomes While (A[2] != 0) A[me] = A[0]
Insertion Mutation type (b) • Add an immediate parent to the selected node. • Randomly create other offspring for the new parent, if needed. • Depending on the selected parent type, this can cause: • Insertion of code, • Wrapping code with a while loop, • Extending Boolean expressions. • Example (a "block" parent with a new assignment is inserted above A[me] = 1): While (A[2] != 0) A[me] = 1 becomes While (A[2] != 0) A[2] = other A[me] = 1
Reduction Mutation Type (c) • Replace the selected node by one of its offspring. • Delete the remaining offspring of the node. • Has the opposite effect of the previous insertion mutation, and reduces the program size.
Deletion Mutation Type (d) • Delete the sub-tree rooted at the node. • Update ancestors recursively. • Example: in While (A[2] != 0) A[me] = 1, deleting the assignment leaves the while loop with an empty body.
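The four mutation types can be sketched on a simple (name, children) tree as shown below. This is only an illustration under stated assumptions: nodes are addressed by an explicit path instead of being chosen randomly, type checking is omitted, and deletion simply substitutes an `empty` node without the recursive ancestor update. All function names are hypothetical.

```python
def leaf(name):
    return (name, [])

def get_at(tree, path):
    """Return the sub-tree reached by following child indices in path."""
    for i in path:
        tree = tree[1][i]
    return tree

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    name, kids = tree
    i = path[0]
    return (name, kids[:i] + [replace_at(kids[i], path[1:], new)] + kids[i + 1:])

def insert_parent(tree, path, parent, before=(), after=()):
    """Type (b): give the node a new parent, with optional new siblings."""
    node = get_at(tree, path)
    return replace_at(tree, path, (parent, list(before) + [node] + list(after)))

def reduce_at(tree, path, keep):
    """Type (c): replace the node by its child number keep."""
    return replace_at(tree, path, get_at(tree, path)[1][keep])

def delete_at(tree, path):
    """Type (d): delete the sub-tree, leaving an empty node."""
    return replace_at(tree, path, leaf("empty"))

# While (A[2] != 0) A[me] = 1
cond = ("!=", [("A[]", [leaf("2")]), leaf("0")])
body = ("assign", [("A[]", [leaf("me")]), leaf("1")])
tree = ("while", [cond, body])

# Type (a): the constant 1 becomes A[0].
replaced = replace_at(tree, [1, 1], ("A[]", [leaf("0")]))
# Type (b): wrap the assignment in a block with a new first statement.
new_stmt = ("assign", [("A[]", [leaf("2")]), leaf("other")])
inserted = insert_parent(tree, [1], "block", before=[new_stmt])
# Type (c): undo the insertion by keeping the block's second child.
reduced = reduce_at(inserted, [1], 1)
# Type (d): remove the loop body.
deleted = delete_at(tree, [1])
```

Each operation returns a new tree and leaves its input unchanged, which keeps the parent program available for selection.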
Crossover Operation • Creates new programs by merging building blocks of two existing programs. • Crossover steps are: • Randomly choose a node from the 1st program. • Randomly choose a node from the 2nd program that has the same type as the 1st node. • Exchange the sub-trees rooted at the two nodes, and use the two newly created programs.
Crossover Example • [Figure: two parent program trees and the two offspring obtained by exchanging sub-trees.] • Code fragments shown in the figure: A[2] = me; A[0] = other; If (A[me] != 1); while (A[me] == other)
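A rough sketch of typed sub-tree crossover on (name, children) trees follows. The `TYPES` table and the two parent programs are assumptions made for this illustration and are not a reconstruction of the figure.

```python
import random

# Hypothetical node typing; any name not listed is treated as an int expression.
TYPES = {"while": "stmt", "if": "stmt", "assign": "stmt", "block": "stmt",
         "empty": "stmt", "!=": "bool", "==": "bool"}

def type_of(tree):
    return TYPES.get(tree[0], "int")

def nodes(tree, path=()):
    """Yield (path, sub-tree) for every node of the tree."""
    yield path, tree
    for i, kid in enumerate(tree[1]):
        yield from nodes(kid, path + (i,))

def replace_at(tree, path, new):
    if not path:
        return new
    name, kids = tree
    i = path[0]
    return (name, kids[:i] + [replace_at(kids[i], path[1:], new)] + kids[i + 1:])

def crossover(p1, p2, rng):
    """Exchange two randomly chosen sub-trees of the same type."""
    path1, sub1 = rng.choice(list(nodes(p1)))
    matches = [(p, s) for p, s in nodes(p2) if type_of(s) == type_of(sub1)]
    if not matches:
        return p1, p2                       # no compatible node: no change
    path2, sub2 = rng.choice(matches)
    return replace_at(p1, path1, sub2), replace_at(p2, path2, sub1)

def leaf(name):
    return (name, [])

def names(tree):
    return sorted(sub[0] for _, sub in nodes(tree))

# If (A[me] != 1) A[2] = me
parent1 = ("if", [("!=", [("A[]", [leaf("me")]), leaf("1")]),
                  ("assign", [("A[]", [leaf("2")]), leaf("me")])])
# while (A[me] == other) A[0] = other
parent2 = ("while", [("==", [("A[]", [leaf("me")]), leaf("other")]),
                     ("assign", [("A[]", [leaf("0")]), leaf("other")])])

child1, child2 = crossover(parent1, parent2, random.Random(3))
```

Because only sub-trees of equal type are exchanged, both offspring remain well typed whenever both parents were.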
Crossover (cont.) • Heavily used by traditional GP [Koza]. • Tries to mimic biological sexual recombination, but • Unlike biology (and unlike GA), GP lacks the notion of “genes” [Banzhaf et al. 01]. • Often acts only as a macro-mutation. • Various methods were developed in order to turn it into a more fruitful operation (Brood, Intelligent crossover). • Still, not a significant operation for small programs like those of Mutual Exclusion.
Selection • At each iteration, selection is applied to all μ + λ programs (over-production selection). • Programs are selected using a fitness-proportional (roulette) method [Holland 92]. • “Elitism” is used to ensure that the best program is always selected. • Similar to Evolution Strategies [Rechenberg 94] and the Brood Recombination method [Tackett 94]: better protection from harmful operations.
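Roulette selection with elitism might look like the sketch below. It assumes non-negative fitness scores and samples with replacement; the talk does not say whether replacement is used, so that choice, like the function name and the sample scores, is an assumption.

```python
import random

def select(programs, fitness, mu, rng):
    """Pick mu programs: the best one, plus mu - 1 drawn by fitness-proportional odds."""
    scores = [fitness(p) for p in programs]
    best_index = max(range(len(programs)), key=scores.__getitem__)
    selected = [programs[best_index]]               # elitism
    # A tiny offset keeps the total weight positive even if all scores are 0.
    weights = [s + 1e-9 for s in scores]
    selected += rng.choices(programs, weights=weights, k=mu - 1)
    return selected

# Hypothetical scores for four candidate programs.
toy_scores = {"p0": 0.0, "p1": 10.0, "p2": 66.77, "p3": 30.0}
candidates = list(toy_scores)
chosen = select(candidates, toy_scores.get, mu=3, rng=random.Random(1))
print(chosen)
```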
Program Synthesis • Synthesis of finite-state systems was suggested by Rabin [Rabin, Buchi]. • Machinery includes finite tree automata. • Can be solved by finding game strategies [McNaughton, Emerson-Lei]. • For concurrent and distributed systems, the problem is undecidable [Pnueli-Rosner]. • Decidable for special cases, e.g., pipeline architectures [Pnueli-Rosner], in double-exponential time in the size of the LTL property!
ω-automata • Run on infinite words, and consist of: • A finite alphabet Σ, • A finite set of states S, • A set of initial states S0 ⊆ S, • A transition relation Δ ⊆ S × S, • A labeling function L : S → Σ, • An acceptance condition Ω. • In this version, the labels are on the states instead of on the arcs.
Acceptance conditions • For a run p, inf(p) denotes the set of states appearing infinitely often on p. • Buchi condition: • A set of states F ⊆ S, • A run p over A is accepted if inf(p) ∩ F ≠ Ø. • Streett condition: • A set of k pairs (Ei, Fi), 1 ≤ i ≤ k, Ei, Fi ⊆ S, • A run p over A is accepted if for all pairs: • inf(p) ∩ Ei ≠ Ø → inf(p) ∩ Fi ≠ Ø.
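For an ultimately periodic run (a finite prefix followed by a cycle repeated forever), inf(p) is just the set of states on the cycle, so both conditions can be checked directly. The sketch below assumes runs are given in that "lasso" form; the state names are made up.

```python
def inf_states(cycle):
    """States visited infinitely often by a run of the form prefix . cycle^omega."""
    return set(cycle)

def buchi_accepts(cycle, F):
    """Buchi: inf(p) intersects F."""
    return bool(inf_states(cycle) & set(F))

def streett_accepts(cycle, pairs):
    """Streett: for every pair (E, F), inf(p) meeting E implies inf(p) meeting F."""
    inf = inf_states(cycle)
    return all(not (inf & set(E)) or bool(inf & set(F)) for E, F in pairs)

# Run: s0, then (s1 s2) repeated forever.
cycle = ["s1", "s2"]
print(buchi_accepts(cycle, {"s2"}))                 # True
print(buchi_accepts(cycle, {"s0"}))                 # False
print(streett_accepts(cycle, [({"s1"}, {"s2"})]))   # True
print(streett_accepts(cycle, [({"s1"}, {"s3"})]))   # False
print(streett_accepts(cycle, [({"s3"}, {"s4"})]))   # True (premise never holds)
```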
ω-automata Closure • Buchi automata can be converted into Streett automata, and vice versa. • Both Buchi and Streett automata are closed under intersection and complement. • Streett automata are less simple to use, but are closed under determinization, while Buchi automata are not.
Building Program’s State-graph • Each state consists of values of variables, program counters, buffers, etc. • Edges represent atomic transitions caused by program instructions. • Can be built by a DFS algorithm. • Can be decomposed into SCCs [Tarjan 72].
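As an illustration, the state graph of a toy program can be explored by DFS and then decomposed into SCCs with Tarjan's algorithm. The `successors` function below is a hypothetical stand-in for the program's transition relation, with states of the form (program counter, x).

```python
def build_state_graph(initial, successors):
    """Explore all reachable states by DFS; returns state -> list of next states."""
    graph, stack = {}, [initial]
    while stack:
        state = stack.pop()
        if state in graph:
            continue
        graph[state] = list(successors(state))
        stack.extend(t for t in graph[state] if t not in graph)
    return graph

def tarjan_sccs(graph):
    """Tarjan's algorithm (recursive form): returns a list of SCCs as sets."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def successors(state):
    """Toy transition relation: pc 0 is an init step, pc 1 and 2 loop and flip x."""
    pc, x = state
    if pc == 0:
        return [(1, x)]
    if pc == 1:
        return [(2, 1 - x)]
    return [(1, x)]

graph = build_state_graph((0, 0), successors)
sccs = tarjan_sccs(graph)
print(len(graph), sccs)
```

The recursive form is fine for small state graphs; a large state space would need an iterative variant to avoid Python's recursion limit.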
Converting Model to ω-automaton • We use the states, initial state and transitions of the program’s state-space. • Acceptance condition can allow all runs, or impose fairness conditions. • Streett automata can be used in order to define various fairness conditions (weak & strong).
Safety Properties • Basic properties can be checked by simply analyzing the state graph: • Invariants – can be checked on every visited state. • Deadlocks – states without outgoing edges. • Unreachable code – instructions that are not represented on any transition. • Liveness properties require a more complicated process.
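The three basic checks can be sketched on a state graph whose edges are labeled with the instruction that caused them. The graph, the instruction names, and the invariant below are made up for illustration and deliberately contain one violation of each kind.

```python
def check_basic_properties(graph, invariant, instructions):
    """graph: state -> list of (instruction, next state)."""
    violations = [s for s in graph if not invariant(s)]
    deadlocks = [s for s in graph if not graph[s]]
    used = {instr for edges in graph.values() for instr, _ in edges}
    unreachable = set(instructions) - used
    return violations, deadlocks, unreachable

# Hypothetical two-process graph; a state is (location of P0, location of P1).
graph = {
    ("ncs", "ncs"): [("p0:enter", ("cs", "ncs")), ("p1:enter", ("ncs", "cs"))],
    ("cs", "ncs"):  [("p0:leave", ("ncs", "ncs")), ("p1:enter", ("cs", "cs"))],
    ("ncs", "cs"):  [("p1:leave", ("ncs", "ncs"))],
    ("cs", "cs"):   [],
}

def mutual_exclusion(state):
    return state != ("cs", "cs")

all_instructions = {"p0:enter", "p0:leave", "p1:enter", "p1:leave", "p0:reset"}
violations, deadlocks, unreachable = check_basic_properties(
    graph, mutual_exclusion, all_instructions)
print(violations, deadlocks, unreachable)
```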
Specification • We use Linear Temporal Logic (LTL) [Pnueli 77] to define specification properties. • LTL formulas are interpreted over infinite sequences of states, and consist of: • Propositional variables, • Logical connectives, such as ¬, ∧, ∨, →, and • Temporal operators, such as: • ◇p – p will eventually occur. • □p – p always occurs. • A model M satisfies a formula φ (M ⊨ φ) if every (fair) run of M satisfies φ.
Converting specification to ω-automaton • Every LTL property can be converted into a Buchi automaton whose size is exponential in the size of the LTL formula [Vardi & Wolper 94]. • For deterministic Streett automata, a determinization process is also required [Safra 88]. • This may result in a doubly exponential blowup relative to the LTL property.
The Model Checking Process [Vardi & Wolper 86] • Both model and specification are converted to ω-automata over the same alphabet. • The alphabet is 2^AP, where AP denotes a set of atomic propositions that may hold on the system states. • Every word accepted by M (a fair run) should be accepted by the spec, therefore we have to check whether L(M) ⊆ L(φ).
Model Checking Results • It’s easier to check whether: • L(M) ∩ (complement of L(φ)) = Ø, or • L(M) ∩ L(¬φ) = Ø. • Case 1: • Intersection is empty. • M satisfies φ. • Case 2: • Intersection is not empty. • Runs contained in the intersection can be used for generating counterexamples. • [Figure: two diagrams showing L(M) and L(φ), one for each case.]
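With a Buchi acceptance condition, the intersection is non-empty exactly when some accepting state is reachable from the initial state and lies on a cycle. The sketch below assumes the product of M with the automaton for ¬φ has already been built as a plain graph; the graph and function names are hypothetical.

```python
from collections import deque

def find_path(graph, start, goal):
    """Shortest path with at least one edge from start to goal, or None."""
    queue = deque((t, [start, t]) for t in graph.get(start, []))
    seen = set()
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        if state in seen:
            continue
        seen.add(state)
        queue.extend((t, path + [t]) for t in graph.get(state, []))
    return None

def counterexample(graph, initial, accepting):
    """Return (prefix, cycle) through an accepting state, or None if the language is empty."""
    for f in accepting:
        cycle = find_path(graph, f, f)
        if cycle is None:
            continue                         # f is not on any cycle
        prefix = [initial] if f == initial else find_path(graph, initial, f)
        if prefix is not None:
            return prefix, cycle
    return None

# Hypothetical product graph.
product = {"a": ["b"], "b": ["c"], "c": ["b"]}
print(counterexample(product, "a", ["c"]))   # (['a', 'b', 'c'], ['c', 'b', 'c'])
print(counterexample(product, "a", ["a"]))   # None: case 1, M satisfies the property
```

A returned pair describes a bad run: follow the prefix once, then repeat the cycle forever.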
Model Checking and GP • Can standard model checking results be used as a GP fitness function? • Yes, but so far this has been done with limited success [Johnson 07]. • A fitness function with just two values is a poor one. • We wish to analyze the model checking graph in order to quantify the level of satisfaction. • We have a specialized model checking algorithm.
Detour: Discriminative Model Checking [Niebert, Peled, Pnueli]
Linear Temporal Logic • [Figure: state sequences illustrating temporal operators, including O and U.]
Computation Tree Logic • [Figure: computation trees illustrating EG p and AF p.]
Our point of view • Linear time is sufficient for specifying most properties. • A counterexample is often not enough: • Gives very little clue about the location of the error. • Does not give information about how good and bad executions are related to each other. • Thus, for analysis beyond finding the existence of an error, we promote a “deeper” search.
Our suggestion • Primary or base specification in LTL, for the base property. • Analysis specification: quantifies over executions that satisfy or do not satisfy the base specification. • Syntax: p | \/ | | | | (and others) • Semantics: • there exists a continuation satisfying the base property, on which the quantified formula holds from the beginning. • there exists a continuation not satisfying the base property, on which the quantified formula holds from the beginning.
Semantics illustration • There exists a continuation satisfying the base property, on which the quantified formula holds from the beginning. • There exists a continuation not satisfying the base property, on which the quantified formula holds from the beginning. • [Figure: two branching executions, each marked with the formula that holds on it.]
Examples for specifications • Bad executions depend on infinitely many “bad choices”: ¬<>true • Before executing a, there are good and bad executions. Once a is executed, things are persistently bad: ((¬Execa/\true)W(Execa/\false)) • Properties such as “from some point all continuations are good/bad”.
How to do model checking? • We need to remember some information about the path so far in order to verify that, together with the rest of the computation, it does (or does not) satisfy the base property. • Suppose we ran a Buchi automaton for the base property: because of nondeterminism, it might be following a branch that cannot be completed to an accepting run. • Thus, we run a subset construction (determinization) of the Buchi automaton. • At the point of branching, we continue with a state consistent with one of the Buchi states in the current subset. • Apply CTL* model checking to this structure.
Complexity • EXPSPACE-complete even for AG true • Reduction shown for the related logic mCTL* [KV LICS 2006] (this logic has different semantics, where quantification always starts from the initial state). • But: EXPSPACE-complete in the size of the LTL formula, PSPACE-complete in the size of the branching formula and the verified code.
Fitness Levels • Level 0: All executions are bad! • Level 1: There are good and bad executions. • Level 2: Each bad execution can turn into a good one (bad executions depend on infinitely many bad choices). • Level 3: All executions are good!
Overall Fitness Function • Fitness levels & scores are calculated for each specification property. • How to merge them into a single fitness function? • Naïve summing can bias the results, since some properties may be trivially satisfied when more basic properties are violated. • Thus, spec. properties are divided into levels, starting from level 1 for the most basic properties. • As long as not all properties at level i are satisfied, properties at higher levels get a fitness of 0. • This algorithm also saves running time by skipping unneeded checks.
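The leveled scheme can be sketched as follows. The 0 to 100 score range per property, the averaging, and the property names are assumptions made for this illustration; the talk does not give the exact formula.

```python
def overall_fitness(levels, evaluate, satisfied=100.0):
    """levels: lists of properties, most basic level first. Returns the mean score."""
    total, count, blocked = 0.0, 0, False
    for properties in levels:
        count += len(properties)
        if blocked:
            continue                        # higher levels score 0 and are not checked
        scores = [evaluate(p) for p in properties]
        total += sum(scores)
        blocked = any(s < satisfied for s in scores)
    return total / count

# Hypothetical properties and per-property scores.
levels = [["mutual_exclusion"], ["no_deadlock"], ["no_starvation"]]
checked = []

def make_evaluator(scores):
    def evaluate(prop):
        checked.append(prop)                # record which checks actually ran
        return scores[prop]
    return evaluate

bad = overall_fitness(levels, make_evaluator(
    {"mutual_exclusion": 0.0, "no_deadlock": 100.0, "no_starvation": 100.0}))
checked_bad = list(checked)
checked.clear()
partial = overall_fitness(levels, make_evaluator(
    {"mutual_exclusion": 100.0, "no_deadlock": 50.0, "no_starvation": 100.0}))
checked_partial = list(checked)
print(bad, checked_bad)          # 0.0 ['mutual_exclusion']
print(partial, checked_partial)  # 50.0 ['mutual_exclusion', 'no_deadlock']
```

The recorded calls show the running-time saving: once a level is not fully satisfied, no model checking is run for the levels above it.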
Parsimony • GP programs tend to grow over time up to the maximal allowed tree size (“bloating”). • Large portions of the code become “introns” (junk DNA). • To avoid that, we use parsimony as a secondary fitness measure. • The number of program nodes multiplied by a small factor is subtracted from the fitness score. • The factor should be carefully chosen: • It should encourage programs to reduce their size, but • It should not harm the evolutionary process. • Therefore, programs cannot get a score of 100, but only get close to it. The run can be stopped when all properties are satisfied. • Programs can be reduced either by mutations, or directly by detecting dead code during model checking and then removing it.
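A minimal sketch of the parsimony penalty on (name, children) trees is shown below. The factor 0.1 is an arbitrary value picked for the example; the talk only says the factor is small.

```python
def count_nodes(tree):
    name, kids = tree
    return 1 + sum(count_nodes(k) for k in kids)

def fitness_with_parsimony(raw_score, tree, factor=0.1):
    """Subtract (number of nodes * small factor) from the raw fitness score."""
    return raw_score - factor * count_nodes(tree)

def leaf(name):
    return (name, [])

# While (A[2] != 0) A[me] = 1   (9 nodes)
big = ("while", [("!=", [("A[]", [leaf("2")]), leaf("0")]),
                 ("assign", [("A[]", [leaf("me")]), leaf("1")])])
# A[me] = 1                     (4 nodes)
small = ("assign", [("A[]", [leaf("me")]), leaf("1")])

print(count_nodes(big), count_nodes(small))
print(fitness_with_parsimony(100.0, big), fitness_with_parsimony(100.0, small))
```

With equal raw scores the smaller program ranks higher, and even a program satisfying every property scores slightly below 100.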
“Vacuity” • Special care is needed for implication properties of the form (p → q). • Some (or all) executions may be vacuously satisfied if p never happens. • We are usually interested only in runs in which p eventually occurs. • Other runs are neither good nor bad; they are irrelevant. • Thus, in these cases, the program automaton is first intersected with the property ◇p. • Some SCCs might be marked irrelevant. • If all SCCs are irrelevant, fitness level 0 is assigned. • A similar mechanism is used for excluding unfair runs.
The Mutual Exclusion Problem • Originally described by [Dijkstra 65]. • Many variants and solutions exist. • Modeled using the following program parts: • Non Critical Section • Pre Protocol • Critical Section • Post Protocol • We wish to automatically generate correct code for the pre and post protocol parts.
Spec. Properties • The specification includes the following LTL properties: • The properties are converted into Streett automata.
Runs Configuration • 3 different sets of runs: • The following parameters were used: • Population size: 150 • Max number of iterations: 2000 • μ: 5 • λ: 150
An Example of a Run (1st variant) • Randomly created. • Does not satisfy mutual exclusion property. • Higher level properties are set to 0. Score: 0.0
An Example of a Run (1st variant) • Randomly created. • While loop guarantees mutual exclusion. • Only process 0 can enter the critical section. Score: 66.77