Inferring Finite Automata from queries and counter-examples Eggert Jón Magnússon
Learning a language • Inferring finite automata is analogous to learning a language. • In fact, two automata that recognize the same language cannot be told apart by their input/output behaviour alone; one would have to examine their state structure. • We therefore focus on finding the minimal equivalent automaton.
Requirements for learning • Gold showed that a class of languages containing every finite language and at least one infinite language cannot be learned from positive data alone. • The idea is proof by contradiction. Assume a guessing algorithm that, given the strings w1...wn, all members of a finite language L, builds an automaton that recognizes L. • Now build an infinite language L’ that contains w1...wn plus at least one string that is not a member of L. The sample w1...wn is consistent with both L and L’, so positive examples alone can never rule out the larger language, and an infinite language can always fool the guessing algorithm.
Teacher • Angluin introduced the concept of a minimally adequate teacher, which can answer two kinds of questions about the unknown language L: • Membership: “Is the string s a member of L?” The answer is yes or no. • Equivalence: “Does the given DFA D recognize L?” The answer is yes, or a counterexample from the symmetric difference of L(D) and L (either a string that is in L and not in L(D), or a string that is in L(D) and not in L). • Given such a teacher, there is an algorithm that learns any regular set in polynomial time.
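The two kinds of query can be sketched in Python as follows. This is only an illustration under stated assumptions: the `DFA` and `Teacher` classes, their method names, and the "even number of a's" target are hypothetical and not taken from Angluin's paper. The equivalence query is answered here by a breadth-first search over pairs of states, which returns a shortest string from the symmetric difference; a teacher is free to answer with any other counterexample.

```python
from collections import deque


class DFA:
    """A complete DFA; delta maps (state, symbol) -> state."""

    def __init__(self, alphabet, start, accepting, delta):
        self.alphabet = alphabet
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


class Teacher:
    """Hypothetical minimally adequate teacher that knows the target DFA."""

    def __init__(self, target):
        self.target = target

    def is_member(self, word):
        """Membership query: is `word` in L?"""
        return self.target.accepts(word)

    def find_counterexample(self, guess):
        """Equivalence query: None means "yes"; otherwise a shortest string
        from the symmetric difference of L(guess) and L."""
        start = (self.target.start, guess.start)
        queue = deque([(start, "")])
        seen = {start}
        while queue:
            (p, q), word = queue.popleft()
            # The two machines disagree on `word`: it is a counterexample.
            if (p in self.target.accepting) != (q in guess.accepting):
                return word
            for a in self.target.alphabet:
                pair = (self.target.delta[(p, a)], guess.delta[(q, a)])
                if pair not in seen:
                    seen.add(pair)
                    queue.append((pair, word + a))
        return None


# Assumed target: strings over {a, b} with an even number of a's.
even_a = DFA("ab", 0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
# A deliberately wrong guess: a one-state DFA that accepts every string.
everything = DFA("ab", 0, {0}, {(0, "a"): 0, (0, "b"): 0})

teacher = Teacher(even_a)
print(teacher.is_member("abab"))                # True
print(teacher.find_counterexample(everything))  # a
print(teacher.find_counterexample(even_a))      # None
```

The search visits each pair of states at most once, so answering an equivalence query this way takes time proportional to the product of the two state counts.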
Angluin’s Algorithm • Iteratively, the algorithm builds a DFA using membership queries, then presents the teacher with the DFA as a solution. • If the DFA is accepted, the algorithm is finished. Otherwise, the teacher responds with a counter-example, a string that the DFA presented would either accept or reject incorrectly. • The algorithm uses the counter-example to refine the DFA, going back to the first step.
Angluin’s Algorithm, details • The algorithm uses two sets of strings, S for states and E for experiments, and one observation table, T, whose rows are the elements of S ∪ S·A and whose columns are the elements of E. The value of each cell is the outcome of a membership query for the concatenation of the row and column strings. • The set S is prefix-closed, the set E is suffix-closed. • Before making a guess, the observation table is required to be closed and consistent. • Closed means that every row in the bottom part of the observation table (the rows for elements of S·A) also appears as the row of some element of S. • - If the observation table isn’t closed, we find a row in the bottom part that matches no row of S, and move its corresponding element from S·A into S. • Consistent means that if the rows for two elements s1, s2 of S are the same, then for all a in A the rows for s1a and s2a are also the same. • - If the table isn’t consistent, we find a symbol a and an experiment e for which the entries of s1a and s2a differ, and add the suffix ae to E.
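The closedness and consistency checks can be sketched as below. This is a minimal illustration, not Angluin's exact formulation: the function names are hypothetical, the teacher's membership query is stood in for by a plain Python predicate `member`, and rows are recomputed on every call instead of being cached in a table T. The two example languages are assumptions chosen to keep the run short.

```python
from itertools import combinations


def row(prefix, E, member):
    """Row of the table for `prefix`: one membership bit per experiment."""
    return tuple(member(prefix + e) for e in E)


def find_unclosed(S, E, alphabet, member):
    """Return some s+a whose row matches no row of S, or None if closed."""
    s_rows = {row(s, E, member) for s in S}
    for s in S:
        for a in alphabet:
            if row(s + a, E, member) not in s_rows:
                return s + a
    return None


def find_inconsistency(S, E, alphabet, member):
    """Return a suffix a+e that separates two equal rows of S, or None."""
    for s1, s2 in combinations(S, 2):
        if row(s1, E, member) != row(s2, E, member):
            continue
        for a in alphabet:
            for e in E:
                if member(s1 + a + e) != member(s2 + a + e):
                    return a + e
    return None


def make_closed_and_consistent(S, E, alphabet, member):
    """Repair the table until both properties hold (assumes a regular target)."""
    S, E = list(S), list(E)
    while True:
        new_state = find_unclosed(S, E, alphabet, member)
        if new_state is not None:
            S.append(new_state)       # promote the string from S·A into S
            continue
        new_experiment = find_inconsistency(S, E, alphabet, member)
        if new_experiment is not None:
            E.append(new_experiment)  # add the distinguishing suffix to E
            continue
        return S, E


def even_a(word):
    """Assumed target: even number of a's."""
    return word.count("a") % 2 == 0


def two_a(word):
    """Assumed target: at least two a's."""
    return word.count("a") >= 2


print(make_closed_and_consistent([""], [""], "ab", even_a))  # (['', 'a'], [''])
print(find_inconsistency(["", "a"], [""], "ab", two_a))      # a
```

Appending s+a to S keeps S prefix-closed, and appending a+e to E keeps E suffix-closed, so both invariants of the table survive each repair step.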
Example Run • Let’s use an example DFA from Sipser (Example 1.68, p. 76 in the international edition). • The alphabet is A = {a, b}.
Example, continued • S = E = {ε}, where ε is the empty string • T is initialized with the rows ε, a and b and the single column ε • T is not closed: T(a) ≠ T(ε) • Add “a” to S, extend T • T is now both closed and consistent.
First guess • The teacher rejects the first guess and gives the counterexample “ba”, which the first guess does not accept. • We add “ba” and all its prefixes (“b”) to S. • S is now {ε, “a”, “b”, “ba”}. • The table is no longer consistent: row(b) = row(ba), but row(bab) ≠ row(bb). • We add “b” to E.
Second guess • The table is now closed and consistent, so we make a second guess. • Note that the distinct rows, read as bit patterns, translate directly to the states of the guessed DFA.
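How rows become states can be sketched as follows. The helper names and the small table are hypothetical: the table is for an assumed "even number of a's" language, not for the Sipser example, whose table is not reproduced here. Each distinct row of S is one state, the row of the empty string is the start state, a state is accepting when its entry in the empty-string column is true, and reading a symbol a in the state for s leads to the state whose row equals the row of sa.

```python
def build_hypothesis(S, E, alphabet, T):
    """Turn a closed, consistent table into a DFA.

    Assumes the empty string "" is in both S and E, and that T maps every
    needed string (row string + column string) to a bool.
    """
    def row(prefix):
        return tuple(T[prefix + e] for e in E)

    states = {row(s) for s in S}
    start = row("")
    accepting = {row(s) for s in S if T[s]}
    # Closedness guarantees row(s + a) is one of the states;
    # consistency guarantees equal rows agree on their successors.
    delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
    return states, start, accepting, delta


def accepts(dfa, word):
    """Run the guessed DFA on `word`."""
    _, state, accepting, delta = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical closed and consistent table for "even number of a's".
S, E = ["", "a"], [""]
T = {"": True, "a": False, "b": True, "aa": True, "ab": False}

guess = build_hypothesis(S, E, "ab", T)
print(len(guess[0]))           # 2
print(accepts(guess, "abab"))  # True
print(accepts(guess, "ab"))    # False
```

Using the row tuples themselves as state names keeps the link between bit patterns and states explicit; an implementation could equally well renumber them.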
Running time • The equivalence test uses the decision procedure for EQ_DFA. • Each rejected guess adds at least one state to the next guess, so in the worst case we make one guess for each state in the target machine. • In general, before each guess, we add only one string to either S or E. • The running time is O(m²n² + mn³), where m is the length of the longest counterexample produced and n is the number of states in the target machine.
Further work • Many consider the requirement of a teacher unfair, because it demands too much knowledge of the target automaton. • The estimation/exploration algorithm (EEA) is a genetic algorithm. • It creates many random state machines and many random test strings. • It compares the output of the random state machines with the output of the target machine. • It iteratively refines the random state machines and the test strings in alternation, either until convergence or until some desirable behaviour is displayed. • Verification is done with a new set of test strings.
References • Angluin, D., 1987. Learning Regular Sets from Queries and Counter-examples. • Gold, E. Mark, 1967. Language Identification in the Limit. • Bongard, J., Lipson, H., 2005. Active Coevolutionary Learning of Deterministic Finite Automata.