Learning transducers
With the help of Jose Oncina (Universidad de Alicante), Hasan Ibne Akram (TUM), Achilles Beros (U. Colorado)
Outline
• What do we want transducers for?
• What are transducers?
• Basic operations (onwarding, pushing back)
• OSTIA
• OSTIA N and OSTIA D
• APTI 1
• APTI 2
• About semi-deterministic automata
What do we want transducers for?
Here we have some examples coming from Richard Sproat's slides.
Some theory
• Transductions
• Sequential transducers
• Subsequential transducers
• Total and partial functions
About parsing: in a deterministic setting
The Model: Probabilistic Subsequential Transducers (Psts)
Why learn Psts? They offer an interesting trade-off between expressive power and computational (parsing) complexity.
Problem Statements
• Akram et al. presented an algorithm to learn Psts from positive data using probabilistic queries (Akram, Higuera, and Eckert 2012). Can we overcome the limitation of a theoretical oracle and learn Psts from positive data only?
• Can we overcome the limitation of OSTIA (Oncina, Garcia and Vidal 1993) to total functions?
The Learning Settings
An empirical distribution of positive data, that is, examples of translation pairs, is given as training data.
Table: Training data.
input   output   frequency
b       y        500
ab      xy       160
ba      x        120
aab     xxy      50
aba     xyx      40
Total            870
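As a small illustration of what "empirical distribution" means here, the frequencies in the table can be turned into relative frequencies. This is only a sketch: the variable names are made up for the example, and the split of each table row into an input and an output string is read off the table as reconstructed from the slide.

```python
from fractions import Fraction

# Training sample from the table: (input, output) -> observed frequency.
# The input/output split of each row is an assumption read off the slide.
sample = {
    ("b", "y"): 500,
    ("ab", "xy"): 160,
    ("ba", "x"): 120,
    ("aab", "xxy"): 50,
    ("aba", "xyx"): 40,
}

total = sum(sample.values())

# Empirical distribution: each pair gets its relative frequency.
empirical = {pair: Fraction(count, total) for pair, count in sample.items()}

for (x, y), prob in empirical.items():
    print(f"{x} -> {y}: {prob}")
```

Exact fractions are used so that the probabilities sum to exactly 1.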
The Canonical Normal Form
• A Pst $T = \langle Q, \Sigma \cup \{\#\}, \Omega, \{q_0\}, E \rangle$ is said to be in onward form if the following property holds:
  $$\forall q \in Q \setminus \{q_0\}, \quad \mathrm{lcp}\Big(\bigcup_{e \in E[q]} \{o[e]\}\Big) = \lambda .$$
• The quotient $(u, v)^{-1} R$, where $u \in \Sigma^*$ and $v \in \Omega^*$, is the stochastic set that obeys the following properties:
  1. $$\Pr\nolimits_{(u,v)^{-1}R}(w, w') = \frac{\Pr_R(uw, vw')}{\Pr_R(u\Sigma^*, v\Omega^*)},$$
  2. $$(u, v)^{-1} R = \{(w, w') \mid (uw, vw') \in R,\ v = \mathrm{lcp}(\{q \mid (u\Sigma^*, q) \in R\})\}.$$
Probabilistic finite (state) transducer
[Figure: probabilistic finite-state transducer whose transitions read b or # and carry the probabilities ⅛, ¼, ½ and ¾]
What is the best translation of bb#? The answer has probability ¾ × ¾ × ½ × ¼ + ¾ × ¾ × ¼ × ⅛ = 45/512.
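The arithmetic on this slide adds up the probabilities of the two paths that produce the same translation. The sketch below redoes it with exact fractions; the grouping of the factors into two paths is taken from the sum on the slide, and the variable names are invented for the example.

```python
from fractions import Fraction
from math import prod

# Transition probabilities along the two paths that yield the same output
# for the input bb# (factors copied from the sum on the slide).
paths = [
    [Fraction(3, 4), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(3, 4), Fraction(3, 4), Fraction(1, 4), Fraction(1, 8)],
]

# A path's probability is the product of its transition probabilities;
# a translation's probability is the sum over all paths producing it.
path_probs = [prod(p) for p in paths]
translation_prob = sum(path_probs)

print(path_probs)        # probabilities of the individual paths
print(translation_prob)  # probability of the translation
```

Because several paths can contribute to one translation, the most probable path does not necessarily give the most probable translation.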
• We have to learn from a set of input-output pairs.
• Our goal is to find a model that could explain the data.
• We are going to review an algorithm that is able to identify in the limit any (total) subsequential function.
Rational Transducers
Definition (Transduction)
• A transduction from X* to Y* is a relation t ⊆ (X* × Y*).
Definition (Rational Transducer)
• A rational transducer is defined as a 5-tuple (Q, X, Y, q0, E):
• Q: finite set of states
• X, Y: input and output alphabets
• q0 ∈ Q: initial state
• E ⊂ (Q × X* × Y* × Q): finite set of transitions
Definition (Transduction)
We will say that the string y1…yn is a transduction of the string x1…xn if the transitions (edges) (q0, x1, y1, q1), ..., (qn−1, xn, yn, qn) exist in E.
Sequential transducers
• A sequential transducer is a rational transducer such that the set of transitions E ⊂ Q × X × Y* × Q and
  ∀(q, a, u, r), (q, a, v, s) ∈ E ⇒ u = v ∧ r = s.
• The transduction produced by a sequential transducer is a partial function t : X* → Y*.
• Sequential transductions preserve prefixes, that is: t(λ) = λ and, if t(uv) exists, then t(u) ∈ Pref(t(uv)).
• A consequence of the last property is that not all finite transductions can be produced by a sequential transducer.
Subsequential Transducers
• A subsequential transducer is a 6-tuple (Q, X, Y, q0, E, σ) such that (Q, X, Y, q0, E) is a sequential transducer and σ : Q → Y* is a partial function.
• The transduction t : X* → Y* is such that t(x) = t'(x)σ(q), where t'(x) is the transduction produced by the sequential transducer and q is the state reached with the input string x.
• We are going to represent by T(q) the set of all the transductions realized by the transducer supposing that q is the initial state.
What is a subsequential transduction?
• Intuitively: any transduction that can be produced using a finite amount of memory
• Example 1 (Subsequential)
  • number written in English → number written in Roman
  • "two hundred and twenty two" → "CCXXII"
• Example 2 (Not Subsequential)
  • string → the reverse string
  • "abcde" → "edcba"
Subsequential Transductions
• The subsequential transductions are the transductions that can be produced by a subsequential transducer.
• Example:
  t(a^n) = b^n b if n is odd
  t(a^n) = b^n c if n is even
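The example above can be realized with two states that track the parity of n. The following is a minimal sketch, not taken from the slides: the state names "even" and "odd", the dictionary encoding of E and σ, and the function name transduce are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: (state, input symbol) -> (output string, next state).
Edges = Dict[Tuple[str, str], Tuple[str, str]]


def transduce(x: str, initial: str, edges: Edges,
              sigma: Dict[str, str]) -> Optional[str]:
    """Run a subsequential transducer: t(x) = t'(x) sigma(q)."""
    state, out = initial, []
    for a in x:
        if (state, a) not in edges:
            return None              # no transition: t(x) is undefined
        w, state = edges[(state, a)]
        out.append(w)
    if state not in sigma:
        return None                  # sigma is a partial function
    return "".join(out) + sigma[state]


# Two states remember the parity of the number of a's read so far.
EDGES: Edges = {("even", "a"): ("b", "odd"), ("odd", "a"): ("b", "even")}
SIGMA = {"even": "c", "odd": "b"}

for n in range(5):
    print(n, transduce("a" * n, "even", EDGES, SIGMA))
```

The final symbol depends on the whole input, so it cannot be emitted on a transition; it has to come from σ, which is what separates subsequential from sequential transducers.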
The # trick
[Figure: a subsequential transducer redrawn so that each final output σ(q) becomes the output of a transition reading the end-of-string symbol #]
OSTIA
• OSTIA (Onward Subsequential Tree Inference Algorithm) allows the identification in the limit of any (total) subsequential transduction.
• It is a state-merging method.
• It is easy to modify it in order to use additional information.
Onward Transducer
• A transducer is onward if the output is assigned to the transitions in such a way that it is produced as soon as we have enough information to determine it.
• Theorem: for any subsequential transduction, the onward subsequential transducer with a minimum number of states is unique up to isomorphism.
Prefix Tree Acceptor Transducer
Theorem: any (univalued) finite set of input-output pairs T ⊂ X* × Y* can be produced by a Prefix Tree Acceptor Transducer (Q, X, Y, q0, E, σ) where:
• Q = ∪(u,v)∈T Pref(u), the set of all prefixes of the input strings
• q0 = λ
• E = {(w, a, λ, wa) | w, wa ∈ Q}
• σ(u) = v for each (u, v) ∈ T
It is very easy to build an onward prefix tree acceptor transducer from a non-onward prefix tree acceptor transducer.
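The construction in the theorem is short enough to write out directly. The sketch below is an illustration under assumed conventions: states are named by the input prefixes themselves, the sample is a Python dict from input to output, the function names are invented, and the sample T is the one used in the run later in these slides.

```python
def prefix_tree_transducer(sample):
    """Build (Q, E, sigma) for a univalued sample given as {input: output}."""
    # Q: every prefix of every input string (the empty string is q0).
    states = {u[:i] for u in sample for i in range(len(u) + 1)}
    # E: (w, a) -> (empty output, wa); all output is left to sigma.
    edges = {(w[:-1], w[-1]): ("", w) for w in states if w}
    # sigma: the whole output string sits on the state reached by its input.
    sigma = dict(sample)
    return states, edges, sigma


def run(edges, sigma, x):
    """Follow the tree from q0 and append the final output."""
    state, out = "", ""
    for a in x:
        w, state = edges[(state, a)]
        out += w
    return out + sigma[state]


T = {"": "", "a": "bb", "aa": "bbc", "aaa": "bbbb", "aaaa": "bbbbc"}
states, edges, sigma = prefix_tree_transducer(T)
print(sorted(states, key=len))
print(all(run(edges, sigma, u) == v for u, v in T.items()))
```

In this non-onward form every transition outputs λ; onwarding then moves the longest common prefix of the outputs below each state up onto the transition entering it.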
Definition (procedure rmerge)
function rmerge(τ, p, q, K)
  τ = merge(τ, p, q)
  while ∃(r, a, v, q), (r, a, u, p) ∈ E, p ∉ K do
    if (v ≠ u ∧ a = #) ∨ (q ∈ K ∧ v ∉ pref(u)) then
      return error
    else
      w = lcp{v, u}
      replace (q, b, v', s) by (q, b, w⁻¹v v', s)
      replace (p, b, u', s) by (q, b, w⁻¹u u', s)
      replace (r, a, v, q) and (r, a, u, p) by (r, a, w, q)
OSTIA (Oncina, 91)
  τ ← PPT(T)
  K ← {λ}; F = {q : (λ, a, v, q) ∈ E}
  while F ≠ ∅ do
    extract q from F
    if ∃p ∈ K : rmerge(τ, p, q, K) ≠ error then
      τ ← rmerge(τ, p, q, K)
    else
      K ← K ∪ {q}
    F ← {p : (q, a, v, p) ∈ E, q ∈ K} − K
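The loop above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a transcription of the slide: final outputs are kept in a dict sigma instead of # transitions, states are named by input prefixes, the names onward_ptt, fold, ostia and transduce are invented, and the merge uses the simpler folding variant that only pushes output back on the side being merged in (it fails as soon as the kept state's output is not a prefix of the other one). The sample T is the one from the run later in these slides.

```python
import copy
import os


def onward_ptt(sample):
    """Onward prefix tree transducer for a sample given as {input: output}."""
    states = {u[:i] for u in sample for i in range(len(u) + 1)}
    trans = {q: {} for q in states}        # trans[q][a] = [output, target]
    sigma = dict(sample)                   # final outputs (may be partial)
    for q in states:
        if q:
            trans[q[:-1]][q[-1]] = ["", q]
    for q in sorted(states, key=len, reverse=True):   # leaves first
        if not q:
            continue                       # nothing is moved above the root
        outs = [o for o, _ in trans[q].values()]
        if q in sigma:
            outs.append(sigma[q])
        w = os.path.commonprefix(outs)     # lcp of everything leaving q
        for e in trans[q].values():
            e[0] = e[0][len(w):]
        if q in sigma:
            sigma[q] = sigma[q][len(w):]
        trans[q[:-1]][q[-1]][0] += w       # move the lcp onto the edge into q
    return trans, sigma


def fold(trans, sigma, p, q):
    """Fold the subtree rooted at q into p in place; False on a conflict."""
    if q in sigma:
        if p in sigma and sigma[p] != sigma[q]:
            return False
        sigma[p] = sigma[q]
    for a, (v, s) in list(trans[q].items()):
        if a not in trans[p]:
            trans[p][a] = [v, s]
            continue
        u, r = trans[p][a]
        if not v.startswith(u):
            return False                   # would need pushing back on p's side
        for e in trans[s].values():        # push the rest of v back below s
            e[0] = v[len(u):] + e[0]
        if s in sigma:
            sigma[s] = v[len(u):] + sigma[s]
        if not fold(trans, sigma, r, s):
            return False
    return True


def ostia(sample):
    trans, sigma = onward_ptt(sample)
    red = [""]                             # the set K of consolidated states
    while True:
        blue = {s for q in red for _, s in trans[q].values()} - set(red)
        if not blue:
            return ({q: trans[q] for q in red},
                    {q: sigma[q] for q in red if q in sigma})
        q = min(blue, key=lambda s: (len(s), s))
        for p in red:
            t2, s2 = copy.deepcopy(trans), copy.deepcopy(sigma)
            for edges in t2.values():      # redirect the edge entering q to p
                for e in edges.values():
                    if e[1] == q:
                        e[1] = p
            if fold(t2, s2, p, q):
                trans, sigma = t2, s2
                break
        else:
            red.append(q)                  # no merge worked: promote q


def transduce(trans, sigma, x):
    q, out = "", ""
    for a in x:
        if a not in trans[q]:
            return None
        w, q = trans[q][a]
        out += w
    return out + sigma[q] if q in sigma else None


T = {"": "", "a": "bb", "aa": "bbc", "aaa": "bbbb", "aaaa": "bbbbc"}
trans, sigma = ostia(T)
print(sorted(trans), sigma)
```

Each merge attempt works on a deep copy, so a failed fold leaves the current transducer untouched; that keeps the sketch short at the cost of efficiency.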
Properties
• It identifies in the limit any (total) subsequential transduction.
• The complexity is O(n³(m+k) + nmk), where
  • n is the sum of the input string lengths,
  • m is the length of the longest output string and
  • k is the alphabet size.
• In practice the behaviour is linear.
• It is easy to modify in order to use additional information (i.e. knowledge of the domain, range or negative samples).
A run
Let T be a finite set of input-output pairs.
T = {(λ, λ), (a, bb), (aa, bbc), (aaa, bbbb), (aaaa, bbbbc)}
The target:
[Figure: the target transducer, with states q_λ, q_a and q_aa; input transitions a: bb, a: λ, a: bb; end-marker transitions #: λ, #: λ, #: c]
Inferring Partial Subsequential Functions
To identify subsequential partial functions we need some additional information. For example, in the following transducer the states q_a and q_b cannot be distinguished by any transduction.
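[Figure: the onward prefix tree transducer for T, with states q_λ, q_a, q_aa, q_aaa, q_aaaa and end-marker states q_#, q_a#, q_aa#, q_aaa#, q_aaaa#; input transitions a: bb, a: λ, a: bb, a: c; end-marker transitions #: λ except for #: c leaving q_aa]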
Merge q_λ with q_#. OK.
[Figure: current state of the transducer]
Try to merge q_a with q_λ. This requires pushing back bb, which is not possible here. Hence q_a is promoted.
[Figure: current state of the transducer]
Try to merge q_aa with q_λ. But this is impossible because of (q_λ, #, λ, q_λ) and (q_aa, #, c, q_aa#).
[Figure: current state of the transducer]
Try to merge q_aa with q_a. But this is impossible because of (q_a, #, λ, q_λ) and (q_aa, #, c, q_aa#).
[Figure: current state of the transducer]
q_aa is promoted.
[Figure: current state of the transducer]
Merge q_aa# with q_λ.
[Figure: current state of the transducer]
Try to merge q_aaa with q_λ. This fails because no pushing back will work for (q_λ, a, bb, q_a) and (q_aaa, a, c, q_aaaa): neither bb nor c is a prefix of the other.
[Figure: current state of the transducer]
Try to merge q_aaa with q_a. This works if the c is pushed back.
[Figure: the transducer before and after pushing back; the output c moves from the transition (q_aaa, a, c, q_aaaa) to the end-marker transition leaving q_aaaa]
c is pushed back.