
Learning transducers


Presentation Transcript


  1. Learning transducers With the help of Jose Oncina (Universidad de Alicante), Hasan Ibne Akram (TUM), Achilles Beros (U. Colorado)

  2. Outline What do we want transducers for? What are transducers? Basic operations (onwarding, pushing back) OSTIA OSTIA-N and OSTIA-D APTI 1 APTI 2 About semi-deterministic automata

  3. What do we want transducers for? Here we have some examples coming from Richard Sproat's slides.

  4. Transductions Sequential transducers Subsequential transducers Total and partial functions Some theory

  5. About parsing, in a deterministic setting

  6. OSTIA

  7. OSTIA-N

  8. OSTIA-D

  9. These ideas in other contexts

  10. APTI 1

  11. The Model: Probabilistic Subsequential Transducers. Why learn PSTs? They offer an interesting trade-off between expressive power and computational (parsing) complexity.

  12. Problem Statements Akram et al. presented an algorithm to learn PSTs from positive data using probabilistic queries (Akram, Higuera, and Eckert 2012). Can we overcome the limitation of a theoretical oracle and learn PSTs from positive data only? Can we overcome the total-function limitation of OSTIA (Oncina, Garcia and Vidal 1993)?

  13. The Learning Settings An empirical distribution of positive data, i.e. examples of translation pairs, is given as training data.

Table: Training data.

  input   output   frequency
  b       y        500
  ab      xy       160
  ba      x        120
  aab     xxy      50
  aba     xyx      40
  Total            870
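For concreteness, the table can be read as an empirical distribution over translation pairs. A minimal sketch in Python (the variable names are mine, not part of the slides):

```python
# Empirical distribution induced by the training data above.
data = {("b", "y"): 500, ("ab", "xy"): 160, ("ba", "x"): 120,
        ("aab", "xxy"): 50, ("aba", "xyx"): 40}
total = sum(data.values())                      # 870
emp = {pair: n / total for pair, n in data.items()}
print(emp[("b", "y")])                          # 500/870 ~ 0.575
```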

  14. The Canonical Normal Form
• A PST $T = \langle Q, \Sigma \cup \{\#\}, \Omega, \{q_0\}, E \rangle$ is said to be in onward form if the following property holds:
$$\forall q \in Q \setminus \{q_0\}, \quad \operatorname{lcp}\{\, o[e] \mid e \in E[q] \,\} = \lambda.$$
• The quotient $(u, v)^{-1}R$, where $u \in \Sigma^*$ and $v \in \Omega^*$, is the stochastic set that obeys the following properties:
$$1.\ \Pr_{(u,v)^{-1}R}(w, w') = \frac{\Pr_R(uw, vw')}{\Pr_R(u\Sigma^*, v\Omega^*)},$$
$$2.\ (u, v)^{-1}R = \{(w, w') \mid (uw, vw') \in R,\ v = \operatorname{lcp}(\{q \mid (u\Sigma^*, q) \in R\})\}.$$

  15. # : ⅛ #:  ¼ b :  ¼ #: ¼ b : ½  :  ¾ b : ⅛ b :  ¾ Probabilistic Finite (state) transducer. What is the best translation of bb#?  with probability ¾ ¾ ½ ¼ + ¾ ¾ ¼ ⅛ =45/512

  16. # : ⅛ b :  ¼ #:  ¼ b : ½ #: ¼ b :  ⅛  :  ¾ b :  ¾  ¾  :  ¾  ¾ b :  ¾ ½ b : ½  ¼ b :  ¼  ⅛ b :  ⅛ ¼ #: ¼ #: ⅛ ⅛ 1

  17. We have to learn from a set of input-output pairs. Our goal is to find a model that explains the data. We are going to review an algorithm that identifies in the limit any (total) subsequential function.

  18. Rational Transducers
Definition (Transduction)
• A transduction from X* to Y* is a relation t ⊆ X* × Y*.
Definition (Rational Transducer)
• A rational transducer is defined as a 5-tuple (Q, X, Y, q0, E):
• Q: finite set of states
• X, Y: input and output alphabets
• q0 ∈ Q: initial state
• E ⊂ (Q × X* × Y* × Q): finite set of transitions
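The 5-tuple can be encoded directly; a minimal sketch (the class and field names are mine, not part of the slides):

```python
from typing import NamedTuple, Set, Tuple

class RationalTransducer(NamedTuple):
    Q: Set[str]                          # finite set of states
    X: Set[str]                          # input alphabet
    Y: Set[str]                          # output alphabet
    q0: str                              # initial state
    E: Set[Tuple[str, str, str, str]]    # transitions (q, u, v, q'), u in X*, v in Y*
```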

  19. Definition (Transduction) We will say that the string y1…yn is a transduction of the string x1…xn if the transitions (edges) (q0, x1, y1, q1), …, (qn−1, xn, yn, qn) exist in E.

  20. Sequential transducers
• A sequential transducer is a rational transducer such that the set of transitions E ⊂ Q × X × Y* × Q and
• ∀(q, a, u, r), (q, a, v, s) ∈ E ⇒ u = v ∧ r = s.
• The transduction produced by a sequential transducer is a partial function t : X* → Y*.
• Sequential transductions preserve prefixes, that is:
• t(λ) = λ, and if t(uv) exists then t(u) ∈ Pref(t(uv)).
• A consequence of this property is that not all finite transductions can be produced by a sequential transducer.

  21. Subsequential Transducers A subsequential transducer is a 6-tuple (Q, X, Y, q0, E, σ) such that (Q, X, Y, q0, E) is a sequential transducer and σ : Q → Y* is a partial function. The transduction t : X* → Y* is such that t(x) = t′(x)σ(q), where t′(x) is the transduction produced by the sequential transducer and q is the state reached with the input string x. We will represent by T(q) the set of all the transductions realized by the transducer when q is taken as the initial state.
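The definition t(x) = t′(x)σ(q) translates directly into code. A minimal sketch, assuming a dictionary-based representation (all names here are mine):

```python
class SubseqTransducer:
    """Subsequential transducer (Q, X, Y, q0, E, sigma), dictionary-encoded."""

    def __init__(self, q0, delta, sigma):
        self.q0 = q0        # initial state
        self.delta = delta  # (state, input symbol) -> (next state, output string)
        self.sigma = sigma  # state -> final output string (partial function)

    def translate(self, x):
        """Return t(x) = t'(x) sigma(q), or None where t is undefined."""
        q, out = self.q0, ""
        for a in x:
            if (q, a) not in self.delta:
                return None
            q, w = self.delta[(q, a)]
            out += w
        if q not in self.sigma:
            return None
        return out + self.sigma[q]
```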

  22. What is a subsequential transduction?
• Intuitively: any transduction that can be produced using a finite amount of memory.
• Example 1 (Subsequential): number written in English → number written in Roman numerals; “two hundred and twenty two” → “CCXXII”.
• Example 2 (Not Subsequential): string → the reversed string; “abcde” → “edcba”.

  23. Subsequential Transductions
• The subsequential transductions are the transductions that can be produced by a subsequential transducer.
• Example:
$$t(a^n) = \begin{cases} b^n b & \text{if } n \text{ is odd} \\ b^n c & \text{if } n \text{ is even} \end{cases}$$
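Using the sketch from slide 21, this example transduction can be encoded with three states (my encoding of the transducer pictured in the following slides):

```python
# States: "" (initial), "a", "aa"; realizes t(a^n) = b^n b (n odd), b^n c (n even).
target = SubseqTransducer(
    q0="",
    delta={("", "a"): ("a", "bb"),     # first a outputs bb
           ("a", "a"): ("aa", ""),     # second a outputs nothing yet
           ("aa", "a"): ("a", "bb")},  # then alternate, outputting bb
    sigma={"": "", "a": "", "aa": "c"},
)
assert target.translate("a") == "bb"      # n = 1 (odd):  b^1 b
assert target.translate("aa") == "bbc"    # n = 2 (even): b^2 c
assert target.translate("aaa") == "bbbb"  # n = 3 (odd):  b^3 b
```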

  24. The # trick. [Figure: the target transducer redrawn with an end-of-string marker #: each state output σ(q) becomes a transition labelled #, e.g. #:λ or #:c.]
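The trick itself is mechanical: every state output σ(q) becomes a transition on the end marker. A sketch under the same representation as above (the final-state name is my own):

```python
def hash_trick(delta, sigma, final="FINAL"):
    """Replace state outputs sigma by '#'-transitions into a fresh final state."""
    delta2 = dict(delta)
    for q, w in sigma.items():
        delta2[(q, "#")] = (final, w)   # sigma(q) is emitted when reading '#'
    return delta2
```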

  25. OSTIA The OSTIA (Onward Subsequential Transducer Inference Algorithm) allows the identification in the limit of any subsequential transduction. It is a state-merging method. It is easy to modify in order to use additional information.

  26. Onward Transducer
• A transducer is onward if the output is assigned to the transitions in such a way that each output symbol is produced as soon as we have enough information.
• Theorem: For any subsequential transduction, the onward subsequential transducer with a minimum number of states is unique up to isomorphism.

  27. Prefix Tree Acceptor Transducer
Theorem: Any (univalued) finite set of input-output pairs T ⊂ X* × Y* can be produced by a prefix tree acceptor transducer (Q, X, Y, q0, E, σ) where:
• Q = ⋃(u,v)∈T Pref(u)
• q0 = λ
• E = {(w, a, λ, wa) | w, wa ∈ Q}
• σ(u) = v for each (u, v) ∈ T
• It is very easy to build an onward prefix tree acceptor transducer from a non-onward prefix tree acceptor transducer, as the sketch below shows.
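A minimal sketch of both constructions, assuming a nested-dictionary tree representation (all names are mine): build_ptt puts every output on the state outputs σ, and onwarding then pushes common output prefixes toward the root.

```python
from os.path import commonprefix

def build_ptt(T):
    """Prefix tree transducer: node = {'sigma': str|None, 'edges': {a: [out, child]}}."""
    root = {"sigma": None, "edges": {}}
    for u, v in T:
        node = root
        for a in u:
            node = node["edges"].setdefault(a, ["", {"sigma": None, "edges": {}}])[1]
        node["sigma"] = v                     # whole output sits on sigma initially
    return root

def push_up(node):
    """Make the subtree onward; return the prefix to prepend on the incoming edge."""
    for edge in node["edges"].values():
        edge[0] += push_up(edge[1])           # pull common prefixes up from children
    words = [edge[0] for edge in node["edges"].values()]
    if node["sigma"] is not None:
        words.append(node["sigma"])
    f = commonprefix(words) if words else ""
    for edge in node["edges"].values():       # strip the lcp f below this node
        edge[0] = edge[0][len(f):]
    if node["sigma"] is not None:
        node["sigma"] = node["sigma"][len(f):]
    return f

def make_onward(root):
    for edge in root["edges"].values():       # q0 is exempt from the onward property
        edge[0] += push_up(edge[1])

T = [("", ""), ("a", "bb"), ("aa", "bbc"), ("aaa", "bbbb"), ("aaaa", "bbbbc")]
ptt = build_ptt(T)
make_onward(ptt)
print(ptt["edges"]["a"][0])                   # 'bb': pushed up from the leaves
```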

  28. Definition (procedure rmerge)
function rmerge(τ, p, q, K)
  τ = merge(τ, p, q)
  while ∃(r, a, v, q), (r, a, u, p) ∈ E, p ∉ K do
    if (v ≠ u ∧ a = #) ∨ (q ∈ K ∧ v ∉ Pref(u)) then
      return error
    else
      w = lcp{v, u}
      replace each (q, b, v′, s) by (q, b, w⁻¹v · v′, s)
      replace each (p, b, u′, s) by (p, b, w⁻¹u · u′, s)
      replace (r, a, v, q) and (r, a, u, p) by (r, a, w, q)
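The pushing-back step can be sketched in isolation (a list-of-transitions representation; the names are mine): keep only the lcp w on the two conflicting transitions, and prepend the remainders to everything leaving q and p.

```python
from os.path import commonprefix

def push_back(E, q, v, p, u):
    """E: list of [src, symbol, output, dst]. Push w^{-1}v back past q and
    w^{-1}u back past p, where w = lcp(v, u); return w."""
    w = commonprefix([v, u])
    for e in E:
        if e[0] == q:
            e[2] = v[len(w):] + e[2]   # prepend the suffix of v pushed past q
        elif e[0] == p:
            e[2] = u[len(w):] + e[2]   # likewise for p
    return w
```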

  29. OSTIA (Oncina, 91)
τ = PPT(T)
K ← {λ}; F ← {q : (λ, a, v, q) ∈ E}
while F ≠ ∅ do
  extract q from F
  if ∃p ∈ K : rmerge(τ, p, q, K) ≠ error then τ = rmerge(τ, p, q, K)
  else K ← K ∪ {q}
  F ← {p : (q, a, v, p) ∈ E, q ∈ K} − K
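The control structure of this loop, separated from the merging logic, can be sketched as follows (a structural sketch only: rmerge is passed in, states are strings, and every name here is mine):

```python
def ostia_loop(E, rmerge):
    """Red-blue exploration: E is a set of transitions (src, a, v, dst);
    rmerge(E, p, q, K) returns the merged transition set, or None on error."""
    K = {""}                                         # red states, starting from lambda
    F = {q for (p, a, v, q) in E if p == ""}         # blue frontier
    while F:
        q = min(F, key=lambda s: (len(s), s))        # extract q in canonical order
        F.discard(q)
        for p in sorted(K, key=lambda s: (len(s), s)):
            merged = rmerge(E, p, q, K)
            if merged is not None:                   # some red state accepts the merge
                E = merged
                break
        else:
            K.add(q)                                 # no merge possible: promote q
        F = {r for (p, a, v, r) in E if p in K} - K  # recompute the frontier
    return E
```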

  30. Properties
• It identifies in the limit any (total) subsequential transduction.
• The complexity is O(n³(m + k) + nmk), where n is the sum of the input string lengths, m is the length of the longest output string, and k is the alphabet size.
• In practice the behaviour is linear.
• It is easy to modify in order to use additional information (e.g. knowledge of the domain, range, or negative samples).

  31. A run. Let T be a finite set of input-output pairs: T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)}. The target: [Figure: the target transducer, with states λ, a and aa, a-transitions labelled a:bb, a:λ, a:bb, and end-marker transitions labelled #:λ, #:λ, #:c.]

  32. Build the prefix tree acceptor.

  33. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} [Figure: a transducer with states λ, a and aa, a-transitions labelled a:bb, a:λ, a:bb, and end-marker transitions labelled #:λ, #:λ, #:c.]

  34. Inferring Partial Subsequential Functions To identify subsequential partial functions we need some additional information. For example, in the following transducer the states qa and qb cannot be distinguished by any transduction.

  35. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} [Figure: the onward prefix tree transducer for T, a chain λ –a:bb→ a –a:λ→ aa –a:bb→ aaa –a:c→ aaaa, with #-transitions into states #, a#, aa#, aaa#, aaaa# outputting λ, λ, c, λ, λ respectively.]

  36. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Merge qλ with q#: OK. [Figure: the transducer after this merge.]

  37. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Try to merge qa with qλ: this requires pushing back bb and fails, hence qa is promoted. [Figure: the transducer after the promotion.]

  38. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Try to merge qaa with qλ: impossible because of (qλ, #, λ, q#) and (qaa, #, c, qaa#). [Figure: the transducer at this step.]

  39. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Try to merge qaa with qa: impossible because of (qa, #, λ, qa#) and (qaa, #, c, qaa#). [Figure: the transducer at this step.]

  40. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} qaa is promoted. [Figure: the transducer after the promotion.]

  41. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Merge qaa# with qλ: OK. [Figure: the transducer after this merge.]

  42. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Try to merge qaaa with qλ: this fails, because no pushing back of (qλ, a, bb, qa) and (qaaa, a, c, qaaaa) will work. [Figure: the transducer at this step.]

  43. T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)} Try to merge qaaa with qa: this works if the c is pushed back. [Figures: the transducer before and after the merge; pushing back moves the c off the transition leaving qaaa, so the merge succeeds.]
