31st of October 2005, Seminar IV. Attempts to extend correction queries. Cristina Bibire, Research Group on Mathematical Linguistics, Rovira i Virgili University, Pl. Imperial Tarraco 1, 43005, Tarragona, Spain. E-mail: cristina.bibire@estudiants.urv.es
Correction queries • PAC learning of DFA • Learning CFL • Learning WFA • Redefining the correcting string • References
Learning from corrections The correcting string of s in the language L is the smallest string s' (in lex-length order) such that s.s' belongs to L. The answer to a correction query for a string consists of its correcting string. Myhill-Nerode theorem: The number of states in the smallest DFA accepting L is equal to the number of equivalence classes of Σ* under the relation ∼L (where x ∼L y iff, for every z, xz belongs to L exactly when yz does).
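As a small illustration (not part of the original slides), a correcting string can be computed by brute force whenever membership in L can be tested: scan candidate suffixes in lex-length order and return the first one whose concatenation with s lands in L. The function name, the alphabet and the length bound below are assumptions made for the example.

```python
from itertools import product

def correction_query(s, in_language, alphabet=("a", "b"), max_len=6):
    """Smallest suffix s' in lex-length order with s.s' in L (brute force)."""
    for n in range(max_len + 1):
        for tail in product(alphabet, repeat=n):   # lexicographic within each length
            suffix = "".join(tail)
            if in_language(s + suffix):
                return suffix
    return None  # no completion found within the length bound

# Example with L = (ab)*: the correcting string of "a" is "b", of "ab" is "" (lambda).
is_in_L = lambda w: len(w) % 2 == 0 and all(w[i] == "ab"[i % 2] for i in range(len(w)))
print(correction_query("a", is_in_L))   # -> 'b'
print(correction_query("ab", is_in_L))  # -> ''
```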
How can we extend CQ? • PAC learning of DFA with CQ (?) • Learning CFL with CQ (?) • Learning WFA with CQ (?) • Redefining the correcting string (?)
PAC learning of DFA with CQ
• We assume that there is some probability distribution Pr on the set of all strings over the alphabet Σ, and let L be an unknown regular set.
• The Learner has access to information about L by means of two oracles:
• C(x) returns the correcting string for x
• Ex( ) is a random sampling oracle that selects a string x from Σ* according to the distribution Pr and returns the pair (x, C(x)).
• In addition, the Learner is given the accuracy ε and the confidence δ.
• Definition: We say that the language L1 is an ε-approximation of the language L2 provided that Pr(L1 Δ L2) ≤ ε, where L1 Δ L2 denotes the symmetric difference of the two languages.
• If A is a DFA, it is said to be an ε-approximation of the set L if L(A) is an ε-approximation of L.
PAC learning of DFA with CQ
• If A is an ε-approximation of L, then the probability of finding a discrepancy between L(A) and L with one call of the random sampling oracle Ex( ) is at most ε.
• The approximate learner LCAapprox is obtained by modifying LCA. A correction query for a string x is answered by a call to C(x). Each conjecture is tested by a number of calls to Ex( ).
• If any of the calls to Ex( ) returns a pair (t, C(t)) such that
• C(t) = λ but A(S,E,C) rejects t, or
• C(t) ≠ λ but A(S,E,C) accepts t,
• then t is said to be a counterexample and LCAapprox proceeds as LCA does.
• If none of the calls to Ex( ) returns a counterexample, then LCAapprox halts and outputs A(S,E,C).
PAC learning of DFA with CQ
• How many calls to Ex( ) does LCAapprox make to test a given conjecture? This depends on:
• the accuracy and confidence parameters, ε and δ
• how many previous conjectures have been tested.
• Let r_i = (1/ε)(ln(1/δ) + (i+1) ln 2), the choice used in Angluin's analysis. If i previous conjectures have been tested, then LCAapprox makes ⌈r_i⌉ calls to Ex( ).
• Theorem. If n is the number of states in the minimum DFA for the target language L, then LCAapprox terminates after O(n + (1/ε)(n ln(1/δ) + n²)) calls to the Ex( ) oracle. Moreover, the probability that the automaton output by LCAapprox is an ε-approximation of L is at least 1-δ.
PAC learning of DFA with CQ
• Sketch of the proof:
• the total number of counterexamples is at most n-1, so the total number of calls to Ex( ) is at most the sum of ⌈r_i⌉ over i = 0, ..., n-1, which is bounded by n + (1/ε)(n ln(1/δ) + (ln 2) n(n+1)/2) = O(n + (1/ε)(n ln(1/δ) + n²));
• the probability that LCAapprox will terminate with an automaton that is not an ε-approximation of L is at most the sum over i of (1-ε)^r_i ≤ the sum over i of δ·2^-(i+1) < δ.
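A minimal sketch of how one conjecture might be tested, assuming the r_i above and hypothetical helpers ex() (the sampling oracle) and accepts(dfa, t) (running the current hypothesis on t); none of these names come from the slides.

```python
import math

def test_conjecture(dfa, ex, accepts, eps, delta, i):
    """Test the i-th conjecture with ceil(r_i) calls to Ex(); return a counterexample or None."""
    r_i = math.ceil((1.0 / eps) * (math.log(1.0 / delta) + (i + 1) * math.log(2)))
    for _ in range(r_i):
        t, correction = ex()                      # pair (t, C(t)) drawn according to Pr
        accepted = accepts(dfa, t)
        if (correction == "" and not accepted) or (correction != "" and accepted):
            return t                              # discrepancy found: hand t back to LCA
    return None                                   # conjecture survived all r_i random tests
```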
How can we extend CQ? • PAC learning of DFA with CQ (√) • Learning CFL with CQ (?) • Learning WFA with CQ (?) • Redefining the correcting string (?)
Learning CFL • The setting • There is an unknown CFG G in Chomsky normal form. The Learner knows the set T of terminal symbols, the set N of nonterminal symbols and the start symbol S of G. The Teacher is assumed to answer two types of questions: • MEMBER(x,A) – if the string x can be derived from the non-terminal A in the grammar G, the answer is yes; otherwise, it is no • EQUIV(H) – if H is equivalent to G, the answer is yes; otherwise, it replies with a counterexample t.
Learning CFL • The Learner LCF • LCF can explicitly enumerate all the possible productions of G in polynomial time (in |T| and |N|). Initially LCF places all possible productions of G in the hypothesized set of productions P. • The main loop of LCF asks an EQUIV(H) question for the grammar H=(T,N,S,P). • if H is equivalent to G, then LCF halts and outputs H • otherwise, it “diagnoses” the counterexample t returned, which results in removing at least one production from P; the main loop is then repeated.
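A rough Python sketch of this loop, under stated assumptions: member(x, A) and equiv(P) stand in for the Teacher's MEMBER and EQUIV oracles, and parse(t, P) is an assumed CYK-style parser returning a derivation tree (A, substring, children) of the counterexample under the hypothesis grammar; these helper names are not from the original slides.

```python
def diagnose(node, member):
    """Given a tree deriving a string NOT derivable from its root in G,
    return one production that cannot belong to G."""
    A, x, children = node
    if not children:                           # terminal production A -> a (x is a single terminal)
        return (A, (x,))
    (B, x1, _), (C, x2, _) = children
    if not member(x1, B):
        return diagnose(children[0], member)   # the fault lies deeper, on the left
    if not member(x2, C):
        return diagnose(children[1], member)   # the fault lies deeper, on the right
    return (A, (B, C))                         # both halves are fine, so A -> B C is bad

def lcf(terminals, nonterminals, start, member, equiv, parse):
    # Start from every possible CNF production over N and T.
    P = {(A, (B, C)) for A in nonterminals for B in nonterminals for C in nonterminals}
    P |= {(A, (a,)) for A in nonterminals for a in terminals}
    while True:
        t = equiv(P)                           # None means "equivalent": we are done
        if t is None:
            return (terminals, nonterminals, start, P)
        P.discard(diagnose(parse(t, P), member))
```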
How can we extend CQ? • PAC learning of DFA with CQ (√) • Learning CFL with CQ (√) • Learning WFA with CQ (?) • Redefining the correcting string (?)
Learning WFA Let K be a field and let f: Σ* → K be a function. Associate with f an infinite matrix F with rows and columns indexed by strings in Σ*; the (x, y) entry of F contains the value f(x.y). The function f is called a power series and F its Hankel matrix. With every WFA A we can associate a function f_A and, vice versa, for every function f whose Hankel matrix has finite rank there exists a smallest WFA A such that f_A = f. Theorem [Carlyle, Paz 1971] Let f: Σ* → K be a function whose Hankel matrix F has finite rank. Then the size r of the smallest WFA A such that f_A = f satisfies r = rank(F).
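A tiny numeric illustration of the Carlyle-Paz theorem (the target function and the length bound are assumptions made only for this example): for f(w) = number of 'a' symbols in w, which a 2-state WFA computes, even a small finite block of the Hankel matrix already has rank 2.

```python
import numpy as np
from itertools import product

# Hypothetical target power series: f(w) = number of 'a's in w.
def f(w):
    return w.count("a")

# Row/column indices: all strings over {a, b} up to length 2.
sigma = ["a", "b"]
strings = [""] + ["".join(t) for n in (1, 2) for t in product(sigma, repeat=n)]

# Finite block of the Hankel matrix: entry (x, y) holds f(x.y).
F = np.array([[f(x + y) for y in strings] for x in strings], dtype=float)

# Its rank lower-bounds (and here equals) the size of the smallest WFA for f.
print(np.linalg.matrix_rank(F))  # -> 2
```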
Learning WFA
• Let f be a target function. The learning algorithm may ask the oracle two types of queries:
• EQ(h): if h is equivalent to f on all input assignments then the answer to the query is yes; otherwise, the answer is no and the algorithm receives a counterexample z (a string with h(z) ≠ f(z)).
• MQ(z): the oracle has to return f(z).
• The algorithm learns a function f using its Hankel matrix, F. Because of the theorem above, it is enough to keep a sub-matrix of F of full rank. Therefore the learning algorithm can be viewed as a search for appropriate r rows and r columns.
Learning WFA
• The algorithm
1. Initialize: X = Y = {λ} (one row and one column of the Hankel matrix).
2. Define a hypothesis h:
• Let F(X,Y) be the |X| × |Y| submatrix with entries f(x.y), obtained with MQ.
• For every σ ∈ Σ, define a matrix F_σ(X,Y) with entries f(x.σ.y), and let μ(σ) be the matrix such that μ(σ)·F(X,Y) = F_σ(X,Y).
• For every w = σ1…σk, define h(w) = e_λ·μ(σ1)⋯μ(σk)·γ, where e_λ is the coordinate (row) vector of the row λ and γ is the column of F(X,Y) indexed by λ.
3. Ask an equivalence query EQ(h).
• If the answer is yes, halt and output h.
• Otherwise, the answer is no and we receive a counterexample z.
4. Using MQ, find a string w.σ, prefix of z, such that (writing α_w = e_λ·μ(w))
(a) f(w.y) = Σ_x α_w(x)·f(x.y) for every y in Y, and
(b) f(w.σ.y) ≠ Σ_x α_w(x)·f(x.σ.y) for some y in Y.
Add w to X and σ.y to Y (this raises the rank of the kept submatrix by one) and go to (2).
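A minimal sketch of step (2), assuming membership queries are available as a function mq(w) returning f(w), that the empty string is the first element of both X and Y, and that the current submatrix is invertible over the reals; the variable names are illustrative and not taken from the slides.

```python
import numpy as np

def build_hypothesis(mq, X, Y, sigma):
    """Hypothesis WFA built from the rows X and columns Y of the Hankel matrix."""
    F_hat = np.array([[mq(x + y) for y in Y] for x in X], dtype=float)
    F_inv = np.linalg.inv(F_hat)                  # assumes the kept submatrix has full rank
    # One transition matrix per letter, chosen so that mu[s] . F_hat = [f(x.s.y)]_{x,y}.
    mu = {s: np.array([[mq(x + s + y) for y in Y] for x in X], dtype=float) @ F_inv
          for s in sigma}
    gamma = np.array([mq(x) for x in X], dtype=float)   # column of F_hat indexed by lambda

    def h(w):
        alpha = np.zeros(len(X)); alpha[0] = 1.0  # start in the row of lambda (X[0] = "")
        for s in w:
            alpha = alpha @ mu[s]
        return float(alpha @ gamma)

    return h
```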
How can we extend CQ? • PAC learning of DFA with CQ (√) • Learning CFL with CQ (√) • Learning WFA with CQ (√) • Redefining the correcting string (?)
Redefining the correcting string • Hamming distance (only for strings of the same length). For two strings s and t, H(s, t) is the number of places in which the two strings differ, i.e., have different characters.
(The following slides show example DFAs over the alphabet {0, 1}, with states q0 through q3, illustrating how correcting strings change when corrections are measured by the Hamming distance; the diagrams are not reproduced here.)
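A small helper (illustrative, not from the slides) that computes the Hamming distance directly from the definition:

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is defined only for strings of equal length")
    return sum(a != b for a, b in zip(s, t))

print(hamming("0110", "0011"))  # -> 2
```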
Redefining the correcting string
• Levenshtein (or edit) distance. It also counts positions where one string has a character and the other does not.
• For two characters a and b, define cost(a, b) = 0 if a = b and cost(a, b) = 1 otherwise.
• Assume we are given two strings s and t of length n and m, respectively. We fill an (n+1)×(m+1) array d, with rows indexed 0..n and columns 0..m, such that the lower right element d(n, m) furnishes the required value of the Levenshtein distance Lev(s, t).
• The definition of the entries of d is recursive.
• First set d(i, 0) = i for 0 ≤ i ≤ n and d(0, j) = j for 0 ≤ j ≤ m.
• For other pairs i, j use d(i, j) = min( d(i-1, j) + 1, d(i, j-1) + 1, d(i-1, j-1) + cost(s_i, t_j) ).
(Again, example DFAs over {0, 1} with states q0 through q2 illustrate the corrections obtained under the Levenshtein distance; the diagrams are not reproduced here.)
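The recurrence defined above translates directly into a small dynamic program (a sketch, not from the slides; indices are 0-based, so the answer sits in d[n][m]):

```python
def levenshtein(s, t):
    """Levenshtein (edit) distance between strings s and t."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                    # delete the first i characters of s
    for j in range(m + 1):
        d[0][j] = j                    # insert the first j characters of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[n][m]

print(levenshtein("kitten", "sitting"))  # -> 3
```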
How can we extend CQ? • PAC learning of DFA with CQ (√) • Learning CFL with CQ (√) • Learning WFA with CQ (√) • Redefining the correcting string (?)
References
• D. Angluin. Learning Regular Sets from Queries and Counterexamples. Information and Computation 75, 87-106 (1987)
• L. Lee. Learning of Context-Free Languages: A Survey of the Literature. Harvard University Technical Report TR-12-1996 (written in 1994)
• C. de la Higuera. Learning Stochastic Finite Automata from Experts. In Proceedings of the 4th International Colloquium on Grammatical Inference, Lecture Notes in Computer Science 1433, 79-89 (1998)
• F. Bergadano, N. Bshouty, A. Beimel, E. Kushilevitz and S. Varricchio. Learning Functions Represented as Multiplicity Automata. Journal of the ACM 47, 506-530 (2000)
• http://www.cut-the-knot.org/do_you_know/Strings.shtml