
Learning from Text


Presentation Transcript


  1. Learning from Text Colin de la Higuera University of Nantes

  2. Outline • Motivations, definition and difficulties • Some negative results • Learning k-testable languages from text • Learning k-reversible languages from text • Conclusions http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/ Chapters 8 and 11

  3. 1 Identification in the limit • A class of languages L and a class of grammars G, linked by the naming function L: G → L (a grammar G yields the language L(G)) • A presentation is a function f: ℕ → X; yields(f) is the language it denotes • A learner a maps finite presentations to grammars • a identifies L in the limit if for every presentation f with yields(f) ∈ L there is a rank n such that ∀m ≥ n, a(fm) = a(fn) and L(a(fn)) = yields(f)

  4. Learning from text • Only positive examples are available • Danger of over-generalization: why not just return Σ*? • The problem is “basic”: • Negative examples might not be available • Or they might be heavily biased: near-misses, absurd examples… • Baseline: everything else is learning with help

  5. GI as a search problem [Figure: the space of candidate solutions to be searched, starting from the PTA]

  6. From the sample to the PTA • S+ = {λ, aaa, aaba, ababa, bb, bbaaa} • [Figure: the prefix tree acceptor PTA(S+), with 15 numbered states] • See the sketch below
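
A minimal sketch of the PTA construction in Python, assuming states are named by the prefixes they represent (the figure numbers them 1–15 instead); build_pta is a hypothetical helper, not code from the book:

```python
def build_pta(positive_sample):
    """Build the prefix tree acceptor PTA(S+): one state per prefix of a
    sample string, transitions follow the letters, sample strings are final."""
    states, delta = {""}, {}               # "" stands for lambda, the initial state
    for w in positive_sample:
        for i, a in enumerate(w):
            delta[(w[:i], a)] = w[:i + 1]  # state u --a--> state ua
            states.add(w[:i + 1])
    return states, "", delta, set(positive_sample)

states, init, delta, finals = build_pta({"", "aaa", "aaba", "ababa", "bb", "bbaaa"})
assert len(states) == 15 and "" in finals  # 15 states, as in the figure
```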

  7. Unanswered Questions • Data is unlabelled… • Is this a clustering problem? • Is this a problem posed in other settings?

  8. 2 The theory • Gold 67: no super-finite class (one containing all finite languages and at least one infinite language) can be identified from positive examples (or text) only • Necessary and sufficient conditions for learning exist • Literature: • inductive inference, • ALT series, …

  9. Limit point • A class L of languages has a limit point if there exists an infinite sequence (Ln)n∈ℕ of languages in L such that L0 ⊊ L1 ⊊ … ⊊ Lk ⊊ …, and there exists another language L ∈ L such that L = ∪n∈ℕ Ln • L is called a limit point of L • E.g., Ln = {a^m : m ≤ n} in the class {Ln : n ∈ ℕ} ∪ {a*}: the limit point is a*

  10. [Figure: nested languages L0 ⊊ L1 ⊊ L2 ⊊ L3 ⊊ … ⊊ Li ⊊ …, all contained in their limit point L]

  11. Theorem If L admits a limit point, then L is not learnable from text Proof: Let si be a presentation in length-lex order for Li, and let s be a presentation in length-lex order for L. Then ∀n ∈ ℕ, ∃i: ∀k ≤ n, si(k) = s(k), so after any finite prefix of s the learner cannot tell L apart from some Li, and therefore cannot converge correctly Note: having a limit point is a sufficient condition for non-learnability; not a necessary condition

  12. Mincons classes • A class is mincons if there is an algorithm which, given a sample S, builds a grammar G ∈ G such that S ⊆ L(G) and ∀L ∈ L: S ⊆ L ⟹ L(G) ⊆ L • I.e., there is a unique minimum (for inclusion) consistent grammar

  13. Accumulation point (Kapur 91) A class L of languages has an accumulation point if there exists an infinite sequence (Sn)n∈ℕ of sets such that S0 ⊊ S1 ⊊ … ⊊ Sn ⊊ …, L = ∪n∈ℕ Sn ∈ L, and for any n ∈ ℕ there exists a language Ln′ in L such that Sn ⊆ Ln′ ⊊ L. The language L is called an accumulation point of L

  14. [Figure: an increasing chain of sets S0 ⊊ S1 ⊊ S2 ⊊ S3 ⊊ … ⊊ Sn, each Sn contained in some Ln′ ⊊ L; L is an accumulation point]

  15. Theorem (for Mincons classes) L admits an accumulation point iff L is not learnable from text

  16. Infinite elasticity • If a class of languages has a limit point, there exists an infinite ascending chain of languages L0 ⊊ L1 ⊊ … ⊊ Ln ⊊ … • This property is called infinite elasticity

  17. Infinite elasticity [Figure: strings x0, x1, x2, x3, … and languages Xi+1, Xi+2, Xi+3, Xi+4 illustrating an infinitely elastic class]

  18. Finite elasticity L has finite elasticity if it does not have infinite elasticity

  19. Theorem (Wright) If L(G) has finite elasticity and is mincons, then G is learnable.

  20. Tell-tale sets [Figure: a tell-tale set TG with strings x1, x2, x3, x4 inside L(G); a language L(G′) that contains TG is forbidden from being a proper subset of L(G)]

  21. Theorem (Angluin) G is learnable iff there is a computable partial function φ: G × ℕ → Σ* such that: • ∀n ∈ ℕ, φ(G, n) is defined iff G ∈ G and L(G) ≠ ∅ • ∀G ∈ G, TG = {φ(G, n): n ∈ ℕ} is a finite subset of L(G), called a tell-tale subset • ∀G, G′ ∈ G, if TG ⊆ L(G′) then L(G′) ⊄ L(G)

  22. Proposition (Kapur 91) A language L in L has a tell-tale subset iff L is not an accumulation point (for mincons classes)

  23. Summarizing • Many alternative ways of proving that identification in the limit is not feasible • Methodological-philosophical discussion • We still need practical solutions

  24. 3 Learning k-testable languages P. García and E. Vidal. Inference of k-testable languages in the strict sense and applications to syntactic pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):920–925, 1990. P. García, E. Vidal, and J. Oncina. Learning locally testable languages in the strict sense. In Workshop on Algorithmic Learning Theory (ALT 90), pages 325–338, 1990.

  25. Definition Let k > 0. A k-testable language in the strict sense (k-TSS) is given by a 5-tuple Zk = (Σ, I, F, T, C) with: • Σ a finite alphabet • I, F ⊆ Σ^(k-1) (allowed prefixes and suffixes of length k-1) • T ⊆ Σ^k (allowed segments) • C ⊆ Σ^(<k) (the allowed strings of length less than k) • Note that I ∩ F = C ∩ Σ^(k-1)

  26. The k-testable language is L(Zk) = ((IΣ* ∩ Σ*F) − Σ*(Σ^k − T)Σ*) ∪ C • Strings (of length at least k) have to use a good prefix and a good suffix of length k-1, and all their length-k segments have to belong to T. Strings of length less than k should be in C • Or: Σ^k − T defines the prohibited segments • Key idea: use a window of size k

  27. An example (2-testable) • I = {a}, F = {a}, T = {aa, ab, ba}, C = {λ, a} • [Figure: the corresponding 2-testable automaton]

  28. Window language • By sliding a window of size 2 over a string we can parse it: • ababaaababababaaaa OK • aaabbaaaababab not OK (it contains the prohibited segment bb and ends with b ∉ F) • See the sketch below
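
The window test is easy to state as code. A minimal sketch, assuming the parameters of the 2-testable example above; in_k_tss is a hypothetical name:

```python
def in_k_tss(w, k, I, F, T, C):
    """Membership in L(Zk): short strings must be in C; longer strings need an
    allowed prefix and suffix of length k-1 and only allowed k-windows."""
    if len(w) < k:
        return w in C
    windows = {w[i:i + k] for i in range(len(w) - k + 1)}
    return w[:k - 1] in I and w[-(k - 1):] in F and windows <= T

# The 2-testable example: I={a}, F={a}, T={aa, ab, ba}, C={lambda, a}
I, F, T, C = {"a"}, {"a"}, {"aa", "ab", "ba"}, {"", "a"}
assert in_k_tss("ababaaababababaaaa", 2, I, F, T, C)   # OK
assert not in_k_tss("aaabbaaaababab", 2, I, F, T, C)   # contains bb, ends in b
```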

  29. The hierarchy of k-TSS languages • k-TSS(Σ) = {L ⊆ Σ*: L is k-TSS} • All finite languages are in k-TSS(Σ) if k is large enough! • k-TSS(Σ) ⊊ [k+1]-TSS(Σ) • (ba^k)* ∈ [k+1]-TSS(Σ) • (ba^k)* ∉ k-TSS(Σ)

  30. A language that is not k-testable [Figure: an automaton accepting a language that is not k-testable for any k]

  31. k-TSS inference Given a sample S, ak-TSS builds Zk = (Σ(S), I(S), F(S), T(S), C(S)) with: • Σ(S) the alphabet used in S • C(S) = Σ(S)^(<k) ∩ S • I(S) = Σ(S)^(k-1) ∩ Pref(S) • F(S) = Σ(S)^(k-1) ∩ Suff(S) • T(S) = Σ(S)^k ∩ {v: uvw ∈ S}

  32. Example • S = {a, aa, abba, abbbba} • Let k = 3 • Σ(S) = {a, b} • I(S) = {aa, ab} • F(S) = {aa, ba} • C(S) = {a, aa} • T(S) = {abb, bbb, bba} • L(a3-TSS(S)) = a + aa + abbb*a (note that aba is excluded, since aba ∉ T(S))
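
A sketch of the parameter extraction of slide 31, checked against this example (assumes k ≥ 2; a_k_tss is a hypothetical name):

```python
def a_k_tss(sample, k):
    """Extract (Sigma(S), I(S), F(S), T(S), C(S)) from a positive sample S."""
    sigma = {a for w in sample for a in w}
    C = {w for w in sample if len(w) < k}                    # short strings of S
    I = {w[:k - 1] for w in sample if len(w) >= k - 1}       # (k-1)-prefixes
    F = {w[-(k - 1):] for w in sample if len(w) >= k - 1}    # (k-1)-suffixes
    T = {w[i:i + k] for w in sample for i in range(len(w) - k + 1)}  # k-segments
    return sigma, I, F, T, C

sigma, I, F, T, C = a_k_tss({"a", "aa", "abba", "abbbba"}, 3)
assert I == {"aa", "ab"} and F == {"aa", "ba"}
assert T == {"abb", "bbb", "bba"} and C == {"a", "aa"}
```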

  33. Building the corresponding automaton • Each string in I ∪ C and each prefix of a string in I ∪ C is a state • Each substring of length k-1 of a string in T is a state • λ is the initial state • Add a transition labeled b from u to ub for each state ub • Add a transition labeled b from au to ub for each aub in T • Each state/substring that is in F is a final state • Each state/substring that is in C is a final state

  34. Running the algorithm • S = {a, aa, abba, abbbba} • I = {aa, ab}, F = {aa, ba}, T = {abb, bbb, bba}, C = {a, aa} • [Figure: the resulting automaton, with states λ, a, aa, ab, bb, ba] • See the sketch below
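
A sketch of the construction of slide 33 in the same string-based encoding, checked against this run; k_tss_automaton is a hypothetical helper:

```python
def k_tss_automaton(I, F, T, C, k):
    """Slide 33 construction: states are the prefixes of I and C plus the
    length-(k-1) factors of T; "" (lambda) is the initial state."""
    states = {u[:i] for u in I | C for i in range(len(u) + 1)}
    states |= {t[:k - 1] for t in T} | {t[1:] for t in T}
    delta = {}
    for u in states:
        if u and u[:-1] in states:
            delta[(u[:-1], u[-1])] = u     # prefix-tree transition u --b--> ub
    for t in T:
        delta[(t[:-1], t[-1])] = t[1:]     # for aub in T: au --b--> ub
    return states, "", delta, (F | C) & states

states, init, delta, finals = k_tss_automaton(
    {"aa", "ab"}, {"aa", "ba"}, {"abb", "bbb", "bba"}, {"a", "aa"}, 3)
assert finals == {"a", "aa", "ba"} and delta[("bb", "b")] == "bb"  # loop on bb
```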

  35. Properties (1) • S ⊆ L(ak-TSS(S)) • L(ak-TSS(S)) is the smallest k-TSS language that contains S • If there were a smaller one, some prefix, suffix or substring of S would have to be absent

  36. Properties (2) • ak-TSS identifies any k-TSS language in the limit from polynomial data • Once all the prefixes, suffixes and substrings have been seen, the correct automaton is returned • If Y ⊆ S, L(ak-TSS(Y)) ⊆ L(ak-TSS(S))

  37. Properties (3) • L(ak+1-TSS(S)) ⊆ L(ak-TSS(S)) In Ik+1 (resp. Fk+1 and Tk+1) there are fewer allowed prefixes (resp. suffixes or segments) than in Ik (resp. Fk and Tk) • ∀k > max{|x| : x ∈ S}, L(ak-TSS(S)) = S • Because for such a large k, Tk(S) = ∅

  38. 4 Learning k-reversible languages from text D. Angluin. Inference of reversible languages. Journal of the Association for Computing Machinery, 29(3):741–765, 1982

  39. The k-reversible languages • The class was proposed by Angluin (1982) • The class is identifiable in the limit from text • The class consists of the regular languages that can be accepted by a DFA whose reversal is deterministic with a look-ahead of k

  40. Let A = (Σ, Q, δ, I, F) be an NFA. We denote by A^T = (Σ, Q, δ^T, F, I) the reversal automaton, with: δ^T(q, a) = {q′ ∈ Q: q ∈ δ(q′, a)}
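
A sketch of the reversal, assuming δ is encoded as a dict from (state, symbol) to a set of states; reverse_nfa and the tiny example automaton are hypothetical:

```python
def reverse_nfa(Q, delta, I, F):
    """Reversal automaton A^T: flip every transition and swap I and F."""
    delta_T = {}
    for (q_src, a), targets in delta.items():
        for q_dst in targets:
            delta_T.setdefault((q_dst, a), set()).add(q_src)
    return Q, delta_T, F, I    # the initial states of A^T are the finals of A

# Tiny usage example:
delta = {(0, "a"): {1}, (1, "b"): {1}, (1, "a"): {2}}
_, delta_T, I_T, F_T = reverse_nfa({0, 1, 2}, delta, {0}, {2})
assert delta_T[(1, "a")] == {0} and I_T == {2}
```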

  41. [Figure: an NFA A over states 0–4 and its reversal A^T]

  42. Some definitions • u is a k-successor of q if |u| = k and δ(q, u) ≠ ∅ • u is a k-predecessor of q if |u| = k and δ^T(q, u^T) ≠ ∅ • λ is a 0-successor and a 0-predecessor of any state

  43. [Figure: the NFA A from slide 41] • aa is a 2-successor of 0 and 1 but not of 3 • a is a 1-successor of 3 • aa is a 2-predecessor of 3 but not of 1

  44. An NFA is deterministic with look-ahead k if ∀q, q′ ∈ Q with q ≠ q′: ((q, q′ ∈ I) ∨ (∃q″, a: q, q′ ∈ δ(q″, a))) ⟹ ((u is a k-successor of q) ∧ (v is a k-successor of q′) ⟹ u ≠ v)
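
A brute-force sketch of this condition (it enumerates all of Σ^k, so it is for illustration only; both function names are hypothetical):

```python
from itertools import product

def k_successors(delta, q, k, sigma):
    """All strings u with |u| = k such that delta(q, u) is non-empty."""
    succ = set()
    for u in map("".join, product(sorted(sigma), repeat=k)):
        reached = {q}
        for a in u:
            reached = set().union(*(delta.get((s, a), set()) for s in reached))
        if reached:
            succ.add(u)
    return succ

def det_with_lookahead(sigma, delta, I, k):
    """No two distinct states that are both initial, or both reached from one
    state on one symbol, may share a k-successor (the slide's condition)."""
    pairs = {(q, r) for q in I for r in I if q != r}
    for targets in delta.values():
        pairs |= {(q, r) for q in targets for r in targets if q != r}
    return all(not (k_successors(delta, q, k, sigma)
                    & k_successors(delta, r, k, sigma)) for q, r in pairs)
```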

  45. [Figure: prohibited configuration: from one state, the symbol a leads to two distinct states 1 and 2 that share the same k-successor u, |u| = k]

  46. Example • [Figure: the NFA from slide 41] • This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2

  47. k-reversible automata • A is k-reversible if A is deterministic and A^T is deterministic with look-ahead k • [Figure: a deterministic automaton whose reversal is deterministic with look-ahead 1, hence a 1-reversible automaton] • See the sketch below
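
Combining the sketches after slides 40 and 44, a k-reversibility test could look like this (hypothetical helper, assuming the same dict encoding):

```python
def is_k_reversible(Q, sigma, delta, I, F, k):
    """A is k-reversible iff A is deterministic and A^T is deterministic with
    look-ahead k; reuses reverse_nfa and det_with_lookahead from above."""
    deterministic = len(I) <= 1 and all(len(t) <= 1 for t in delta.values())
    _, delta_T, I_T, _ = reverse_nfa(Q, delta, I, F)
    return deterministic and det_with_lookahead(sigma, delta_T, I_T, k)
```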

  48. Notations • RL(Σ, k) is the set of all k-reversible languages over alphabet Σ • RL(Σ) is the set of all reversible languages over alphabet Σ (i.e., for all values of k) • ak-RL is the learning algorithm we describe

  49. Properties • There are some regular languages that are not in RL(Σ) • RL(Σ, k) ⊇ RL(Σ, k-1)

  50. Violation of k-reversibility Two states q, q′ violate the k-reversibility condition if • they violate the determinism condition: q, q′ ∈ δ(q″, a), or • they violate the look-ahead condition: • q, q′ ∈ F and ∃u ∈ Σ^k: u is a k-predecessor of both q and q′, or • ∃u ∈ Σ^k, ∃a: δ(q, a) = δ(q′, a) ≠ ∅ and u is a k-predecessor of both q and q′
