Training Tree Transducers. Authors: Jonathan Graehl and Kevin Knight. Presented by Zhengbo Zhou.
Outline • Finite State Transducers (FSTs) and R • Trees and Regular Tree Grammars • xR and Derivation Trees • Inside-Outside algorithm and EM training • Turning trees into strings (xRS) • Example and Related Work • My thoughts and questions
Finite State Transducers (FSTs) • Finite-state transducer: as covered earlier. • [Figure: an FST with states q0 and q1 and arcs labeled a:x and b:y]
R transducer • An R transducer compactly represents a potentially infinite set of input/output tree pairs. • An FST compactly represents such a set of input/output string pairs. • R is a generalization of FST.
Example of R • Sentence: He drinks water • Parse tree: S(PRO(he), VP(V(drinks), NP(water)))
Example of R (cont.) • Rule 1, then Rules 2, 3, 4: English order S(PRO, VP(V, NP)) becomes Arabic order S(V, PRO, NP). • [Figure: rule applications starting in state q at S and continuing in states qleft.vp.v, qpro and qright.vp.np]
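The reordering in this example can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of trees, the helper names, and the exact form of the four rules are assumptions made here (the slide only names the states q, qleft.vp.v, qpro and qright.vp.np), and weights are left out.

```python
# Trees are (label, [children]); a word is a node with no children.
def tree(label, *children):
    return (label, list(children))

def q(t):
    """State q: apply the assumed Rule 1 at S, copy any other node unchanged."""
    label, kids = t
    if label == "S" and len(kids) == 2:
        x0, x1 = kids
        # Assumed Rule 1: q S(x0, x1) -> S(qleft.vp.v x1, qpro x0, qright.vp.np x1)
        return tree("S", q_left_vp_v(x1), q_pro(x0), q_right_vp_np(x1))
    return (label, [q(k) for k in kids])

def q_left_vp_v(t):
    """Assumed rule: keep only the left child (V) of a VP."""
    label, kids = t
    assert label == "VP" and len(kids) == 2
    return q(kids[0])

def q_right_vp_np(t):
    """Assumed rule: keep only the right child (NP) of a VP."""
    label, kids = t
    assert label == "VP" and len(kids) == 2
    return q(kids[1])

def q_pro(t):
    """Assumed rule: accept a PRO subtree and copy it."""
    assert t[0] == "PRO"
    return q(t)

def to_str(t):
    label, kids = t
    return label if not kids else f"{label}({', '.join(to_str(k) for k in kids)})"

english = tree("S",
               tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
print(to_str(q(english)))  # S(V(drinks), PRO(he), NP(water))
```

Note that the VP subtree x1 is visited twice, once per state, which is how a plain R transducer has to express this movement.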
Trees • Definitions:
Regular Tree Grammars (RTG) • A regular tree grammar is a common way of compactly representing a potentially infinite set of trees. • A weighted RTG (wRTG) plays the same role for trees that a WFSA plays for strings. • wRTG G = (Σ, N, S, P), where Σ is the alphabet, N the set of nonterminals, S the start nonterminal, and P the set of weighted productions.
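To make the tuple (Σ, N, S, P) concrete, here is a minimal sketch of a wRTG and a bounded enumeration of the weighted trees it generates. The toy grammar (a list-like nonterminal `L`), the encoding of productions, and the function name `expand` are all assumptions chosen for illustration, not taken from the paper.

```python
from itertools import product

# A right-hand side is either a nonterminal name (str) or a tree (label, children).
# P maps each nonterminal to its weighted productions.
productions = {
    "L": [
        (("nil", ()), 0.4),                  # L -> nil           (weight 0.4)
        (("cons", (("a", ()), "L")), 0.6),   # L -> cons(a, L)    (weight 0.6)
    ],
}

def expand(rhs, productions, depth):
    """Yield (tree, weight) for every way of rewriting the nonterminals in rhs,
    using at most `depth` nested production applications."""
    if isinstance(rhs, str):                 # a nonterminal occurrence
        if depth == 0:
            return
        for sub_rhs, w in productions[rhs]:
            for t, wt in expand(sub_rhs, productions, depth - 1):
                yield t, w * wt
        return
    label, kids = rhs
    # Expand each child independently, then combine every choice of expansions.
    options = [list(expand(k, productions, depth)) for k in kids]
    for combo in product(*options):
        weight = 1.0
        for _, w in combo:
            weight *= w
        yield (label, tuple(t for t, _ in combo)), weight

trees = list(expand("L", productions, 3))
for t, w in trees:
    print(t, round(w, 3))
```

The grammar is finite, yet raising `depth` keeps producing new trees, which is the sense in which an RTG compactly represents an infinite set.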
Extended-LHS Tree Transducer (xR) • Unlike R, xR explicitly represents lookahead and movement through a more specific LHS. • Form of LHS is: The pattern will be used to match an input subtree. • Patterns are drawn from the set of finite tree patterns.
Derivation Tree • There are many kinds of trees by now. A derivation tree records which transducer rules were applied; it is neither the input tree nor the output tree. • A derivation tree deterministically produces a single weighted output tree.
Inside-Outside algorithm • Basic idea: use the current rule probabilities to estimate the expected frequencies of certain types of derivation steps, then compute new probabilities for those rules from these estimates. [1] • Inside probability of A: computed bottom-up from the rules that expand A, such as A->a and A->BC. • Outside probability of A: computed top-down from the rules in which A appears on the right-hand side, such as C->AB or C->BA.
Inside-Outside for wRTG • Inside weights using G are given by βG: • Outside weights αG:
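As a rough illustration of what inside and outside weights compute, the sketch below uses the standard recurrences on an acyclic wRTG. It makes simplifying assumptions that are not from the slides: each production is flattened to `(lhs, [child nonterminals], weight)`, ignoring the terminal tree structure, and the grammar and the function name `inside_outside` are hypothetical.

```python
from collections import defaultdict
from math import prod

def inside_outside(productions, start):
    """Return (beta, alpha) for an ACYCLIC grammar given as (lhs, kids, weight) triples."""
    by_lhs = defaultdict(list)
    for lhs, kids, w in productions:
        by_lhs[lhs].append((kids, w))

    # Inside: beta[n] = sum over productions of n of weight * product of child betas.
    beta = {}
    def inside(n):
        if n not in beta:
            beta[n] = sum(w * prod(inside(k) for k in kids) for kids, w in by_lhs[n])
        return beta[n]
    inside(start)

    # Reverse DFS postorder visits every nonterminal after all of its parents.
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for kids, _ in by_lhs[n]:
            for k in kids:
                visit(k)
        order.append(n)
    visit(start)

    # Outside: each use of k inside a production of n contributes
    # alpha[n] * weight * product of the betas of k's siblings.
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for n in reversed(order):
        for kids, w in by_lhs[n]:
            for i, k in enumerate(kids):
                siblings = prod(beta[j] for idx, j in enumerate(kids) if idx != i)
                alpha[k] += alpha[n] * w * siblings
    return beta, dict(alpha)

toy = [("S", ["A", "B"], 1.0), ("A", [], 0.3), ("A", ["B"], 0.7), ("B", [], 0.5)]
beta, alpha = inside_outside(toy, "S")
print(beta["S"], alpha["A"], alpha["B"])
```

In this toy grammar, `alpha["B"] * beta["B"]` equals the total weight of all derivations counted once per use of B, which is the quantity an expected-count computation needs.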
EM training • EM training maximizes the corpus likelihood by repeatedly estimating the expected counts of each decision (E-step) and then maximizing, that is, assigning those counts to the parameters and renormalizing (M-step). • Algorithm 2 implements EM training for xR by repeatedly computing inside-outside weights.
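The renormalization half of one EM iteration can be sketched as follows. The expected counts are taken as given (in practice they would come from inside-outside weights), and the rule names, the counts, and the choice to normalize over rules that share a state are assumptions for illustration only.

```python
from collections import defaultdict

def renormalize(counts, group_of):
    """M-step sketch: turn expected rule counts into probabilities that sum to 1
    within each normalization group."""
    totals = defaultdict(float)
    for rule, c in counts.items():
        totals[group_of(rule)] += c
    return {rule: c / totals[group_of(rule)]
            for rule, c in counts.items()
            if totals[group_of(rule)] > 0}

# Hypothetical expected counts from an E-step, keyed by (state, rule name).
expected_counts = {
    ("q", "A->x"): 3.0,
    ("q", "A->y"): 1.0,
    ("r", "B->z"): 2.0,
}

# Assumed grouping: rules compete only with rules in the same state.
new_probs = renormalize(expected_counts, group_of=lambda rule: rule[0])
print(new_probs)
# {('q', 'A->x'): 0.75, ('q', 'A->y'): 0.25, ('r', 'B->z'): 1.0}
```

A full training loop would alternate this step with recomputing the expected counts under the new probabilities until the corpus likelihood stops improving.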
From tree to string • An extended-LHS tree transducer (xR) maps an input tree (say, a parse tree) to an output tree, but the output is still a (parse) tree, not the sentence in another language that machine translation needs. • For that we have xRS, a tree-to-string transducer.
Tree-to-string transducer • Weighted extended-LHS root-to-frontier tree-to-string transducer: X = (Σ, Δ, Q, Qi, R) • It is similar to xR, but the RHS of each rule is a string instead of a tree.
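A small sketch may help show the difference from a tree-to-tree transducer: the rule below matches a two-level pattern in one step (the extended LHS) and emits a sequence of words rather than a tree. The rule itself, the tuple encoding of trees, and the function name `q` are assumptions for illustration; weights are omitted.

```python
# Trees are (label, [children]); a word is a node with no children.
def q(t):
    """State q of a toy tree-to-string transducer: returns a list of words."""
    label, kids = t
    # Assumed extended-LHS rule: q S(x0, VP(x1, x2)) -> q x1  q x0  q x2
    if (label == "S" and len(kids) == 2
            and kids[1][0] == "VP" and len(kids[1][1]) == 2):
        x0 = kids[0]
        x1, x2 = kids[1][1]
        return q(x1) + q(x0) + q(x2)
    if not kids:                       # a word: emit it
        return [label]
    # Default rule: concatenate the strings produced for the children, in order.
    return [w for k in kids for w in q(k)]

english = ("S", [("PRO", [("he", [])]),
                 ("VP", [("V", [("drinks", [])]),
                         ("NP", [("water", [])])])])
print(" ".join(q(english)))  # drinks he water
```

Because the LHS pattern reaches down into the VP, a single rule performs the reordering and no subtree has to be visited twice.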
Example • The paper implements the translation model of Yamada and Knight (2001). • There is a trainable xRS tree-to-string transducer that embodies:
Related Work • TSG vs RTG (equivalent) • xR vs weighted synchronous TSG (similar) • EM training vs forward backward algorithm for finite state (string) transducer and also for HMM
Questions • Is there any future work on this tree transducer, especially for machine translation? • Precision? Recall? • I am also a little confused by the descriptions of the two relations =>x and =>G. • I am not very sure about the inside-outside algorithm. • Questions?
Reference • [1] Fernando Pereira and Yves Schabes. Inside-Outside Reestimation from Partially Bracketed Corpora. 1992.
What might be useful • Kevin Knight and Jonathan Graehl. An Overview of Probabilistic Tree Transducers for Natural Language Processing.
– R: Top-down transducer, introduced before.
– F: Bottom-up transducer (“Frontier-to-root”), with similar rules, but transforming the leaves of the input tree first and working its way up.
– L: Linear transducer, which prohibits copying subtrees. Rule 4 in Figure 4 is an example of a copying production, so this whole transducer is R but not RL.
– N: Non-deleting transducer, which requires that every left-hand-side variable also appear on the right-hand side. A deleting R-transducer can simply delete a subtree (without inspecting it). The transducer in Figure 4 is the deleting kind, because of rules 34-39. It would also be deleting if it included a rule for dropping English determiners, e.g., q NP(x0, x1) → q x1.
– D: Deterministic transducer, with a maximum of one production per <state, symbol> pair.
– T: Total transducer, with a minimum of one production per <state, symbol> pair.
– PDTT: Push-down tree transducer, the transducer analog of CFTG [36].
– subscript: Regular-lookahead transducer, which can check whether an input subtree is tree-regular, i.e., whether it belongs to a specified RTL. Productions only fire when their lookahead conditions are met.