Training Tree Transducers. Authors: Jonathan Graehl and Kevin Knight. Presented by Zhengbo Zhou.
Outline • Finite State Transducers (FSTs) and R • Trees and Regular Tree Grammars • xR and Derivation Trees • Inside-Outside algorithm and EM training • Turning trees into strings (xRS) • Example and Related Work • My thoughts/questions
Finite State Transducers (FSTs) • A finite-state transducer, as we have already seen. [Figure: two states q0 and q1, with arcs labeled a:x and b:y (read a, emit x; read b, emit y)]
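To ground the picture, here is a minimal Python sketch of the FST in the diagram; the exact arc placement (a:x leaving q0, b:y leaving q1) is an assumption reconstructed from the figure, not taken from the paper.

```python
# Minimal FST sketch. Arc layout is an assumption from the figure:
# (q0, a) emits x and moves to q1; (q1, b) emits y and moves back to q0.
TRANSITIONS = {
    ("q0", "a"): ("x", "q1"),
    ("q1", "b"): ("y", "q0"),
}

def transduce(input_string, start="q0"):
    """Map an input string to an output string, or None if the FST gets stuck."""
    state, output = start, []
    for symbol in input_string:
        if (state, symbol) not in TRANSITIONS:
            return None
        emitted, state = TRANSITIONS[(state, symbol)]
        output.append(emitted)
    return "".join(output)

print(transduce("abab"))  # -> xyxy
```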
R transducer • An R transducer compactly represents a potentially infinite set of input/output tree pairs, just as an FST compactly represents a potentially infinite set of input/output string pairs. • R is a generalization of the FST to trees.
Example of R • The sentence "He drinks water" and its parse tree S(PRO(he), VP(V(drinks), NP(water))).
Example for R (cont.) • Rule 1 fires at the root: it maps the English order S(PRO, VP(V, NP)) to the Arabic order S(V, PRO, NP), sending the PRO subtree to state qpro and the (copied) VP subtree to states qleft.vp.v and qright.vp.np. • Rules 2, 3, and 4 then rewrite those subtrees, extracting the V, the PRO, and the NP.
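A hypothetical Python rendering of these rules, with trees as nested tuples and one function per state (the rule shapes follow the slide; the function names are my own):

```python
# Trees are tuples: (label, child, ...); leaves are plain strings.

def q(tree):
    """Rule 1: q S(x0:PRO, x1:VP) -> S(qleft.vp.v x1, qpro x0, qright.vp.np x1)."""
    label, pro, vp = tree
    assert label == "S" and pro[0] == "PRO" and vp[0] == "VP"
    # Note the copying: the VP subtree x1 is sent to two different states.
    return ("S", q_left_vp_v(vp), q_pro(pro), q_right_vp_np(vp))

def q_left_vp_v(vp):
    return vp[1]        # keep only the V under the VP

def q_right_vp_np(vp):
    return vp[2]        # keep only the NP under the VP

def q_pro(pro):
    return pro          # pass the pronoun through unchanged

english = ("S", ("PRO", "he"), ("VP", ("V", "drinks"), ("NP", "water")))
print(q(english))       # ('S', ('V', 'drinks'), ('PRO', 'he'), ('NP', 'water'))
```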
Trees • Definitions: the set TΣ of trees over an alphabet Σ, and the set TΣ(X) of trees whose leaves may also be variables drawn from a set X.
Regular Tree Grammars (RTG) • A regular tree grammar is a common way of compactly representing a potentially infinite set of trees. • A wRTG (weighted RTG) is the tree analog of a WFSA (weighted finite-state acceptor). • wRTG G = (Σ, N, S, P), where Σ is the alphabet, N the nonterminals, S the start nonterminal, and P the weighted productions.
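A small sketch of what a wRTG can look like in code, with a toy grammar invented for illustration (by this sketch's convention, a string leaf that is also a key of PRODUCTIONS counts as a nonterminal):

```python
import random

# Toy wRTG: each nonterminal maps to weighted RHS trees.
# Trees are tuples (label, child, ...).
PRODUCTIONS = {
    "Q":    [(("S", "QPRO", "QVP"), 1.0)],
    "QPRO": [(("PRO", "he"), 0.6), (("PRO", "she"), 0.4)],
    "QVP":  [(("VP", ("V", "drinks"), ("NP", "water")), 1.0)],
}

def expand(rhs):
    """Randomly derive one tree by expanding nonterminal leaves."""
    if isinstance(rhs, str) and rhs in PRODUCTIONS:
        options, weights = zip(*PRODUCTIONS[rhs])
        return expand(random.choices(options, weights=weights)[0])
    if isinstance(rhs, tuple):
        return (rhs[0],) + tuple(expand(child) for child in rhs[1:])
    return rhs

print(expand("Q"))
# e.g. ('S', ('PRO', 'she'), ('VP', ('V', 'drinks'), ('NP', 'water')))
```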
Extended-LHS Tree Transducer (xR) • Differs from R in that lookahead and movement are represented explicitly by a more specific LHS. • An LHS pairs a state with a tree pattern; the pattern is matched against an input subtree. • The set of possible tree patterns is finite.
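A sketch of how an extended LHS can be matched against an input subtree; the pattern notation with variable leaves "x0", "x1", ... is adapted for this sketch:

```python
def match(pattern, tree, binding=None):
    """Match a tree pattern against a subtree; return variable bindings or None.
    Variables ("x0", "x1", ...) occur at most once per pattern, as in xR."""
    binding = {} if binding is None else binding
    if isinstance(pattern, str) and pattern.startswith("x"):
        binding[pattern] = tree              # a variable matches any subtree
        return binding
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if pattern[0] != tree[0] or len(pattern) != len(tree):
            return None
        for p, t in zip(pattern[1:], tree[1:]):
            if match(p, t, binding) is None:
                return None
        return binding
    return binding if pattern == tree else None

lhs = ("S", "x0", ("VP", "x1", "x2"))        # an LHS that looks two levels deep
tree = ("S", ("PRO", "he"), ("VP", ("V", "drinks"), ("NP", "water")))
print(match(lhs, tree))
# {'x0': ('PRO', 'he'), 'x1': ('V', 'drinks'), 'x2': ('NP', 'water')}
```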
Derivation Tree • We now have several kinds of trees; the derivation tree represents a run of the transducer itself, and is neither the input tree nor the output tree. • A derivation tree deterministically produces a single weighted output tree.
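A minimal sketch of that determinism, with invented rule names, weights, and RHS templates: walking a derivation tree bottom-up yields exactly one output tree and one weight.

```python
from math import prod

# Invented rules for illustration: rule -> (weight, RHS template whose
# "x0"/"x1" leaves are filled by the subderivations, in order).
RULES = {
    "r1": (0.8, ("S", "x1", "x0")),   # swap the two children
    "r2": (1.0, ("PRO", "he")),
    "r3": (0.5, ("VP", "ran")),
}

def derive(node):
    """A derivation tree node is (rule_name, subderivations).
    Returns the single (output_tree, weight) it produces."""
    rule, children = node
    weight, template = RULES[rule]
    results = [derive(child) for child in children]
    subtrees = [t for t, _ in results]

    def fill(t):
        if isinstance(t, str) and t.startswith("x"):
            return subtrees[int(t[1:])]
        if isinstance(t, tuple):
            return (t[0],) + tuple(fill(c) for c in t[1:])
        return t

    return fill(template), weight * prod(w for _, w in results)

derivation = ("r1", [("r2", []), ("r3", [])])
print(derive(derivation))  # (('S', ('VP', 'ran'), ('PRO', 'he')), 0.4)
```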
Inside-Outside algorithm • Basic idea: use the current rule probabilities to estimate the expected frequencies of certain types of derivation steps, then compute new probabilities for those rules. [1] • Informally, the inside probability of A sums over the ways A derives its material (e.g., through A → BC), while the outside probability of A sums over the ways the surrounding derivation produces A (e.g., through C → AB or C → BA).
Inside-Outside for wRTG • Inside weights for G: βG(n) = Σ over productions (n → t, w) ∈ P of w · Π over the nonterminal leaves n′ of t of βG(n′). • Outside weights: αG(S) = 1 for the start nonterminal; αG(n′) = Σ over productions (n → t, w) in which n′ occurs as a leaf of t of αG(n) · w · Π over the other nonterminal leaves n″ of t of βG(n″).
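These recursions are easy to state in code. Below is a sketch for an acyclic toy grammar (the grammar and its topological order are invented; only the nonterminal leaves of each RHS matter for the weights, so productions are flattened to (nonterminal children, weight)):

```python
from collections import defaultdict

P = {
    "S": [(("A", "B"), 0.5), (("A",), 0.5)],
    "B": [((), 0.7), (("A",), 0.3)],
    "A": [((), 1.0)],
}
ORDER = ["S", "B", "A"]          # parents before children (grammar is acyclic)

def product_of(children, beta):
    result = 1.0
    for c in children:
        result *= beta[c]
    return result

def inside():
    """beta(n): sum over productions of w times the betas of the RHS leaves."""
    beta = {}
    for n in reversed(ORDER):    # children before parents
        beta[n] = sum(w * product_of(ch, beta) for ch, w in P[n])
    return beta

def outside(beta, start="S"):
    """alpha(n'): sum over occurrences of n' in an RHS of
    alpha(parent) * w * betas of the sibling leaves."""
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for n in ORDER:
        for children, w in P[n]:
            for i, c in enumerate(children):
                alpha[c] += alpha[n] * w * product_of(children[:i] + children[i+1:], beta)
    return alpha

beta = inside()
print(beta, dict(outside(beta)))  # beta["S"] = 1.0 here, since the weights are normalized
```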
EM training • EM training maximizes the corpus likelihood by repeatedly estimating the expected counts of decisions (E-step), then maximizing by assigning those counts to the parameters and renormalizing (M-step). • Algorithm 2 implements EM xR training by repeatedly computing inside-outside weights.
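A sketch of one such step on a single derivation forest (the forest, rule tags, and initial weights are invented; inside-outside as above). The expected count of a rule is alpha(parent) · w · product of betas of the RHS leaves, divided by beta(start); the M-step renormalizes counts per state.

```python
from collections import defaultdict

# Derivation forest: lhs -> list of (nonterminal children, rule tag).
FOREST = {
    "S": [(("A", "B"), "r1"), (("A",), "r2")],
    "B": [((), "r3"), (("A",), "r4")],
    "A": [((), "r5")],
}
ORDER = ["S", "B", "A"]                       # acyclic: parents first
weights = {"r1": 0.5, "r2": 0.5, "r3": 0.7, "r4": 0.3, "r5": 1.0}
STATE_OF = {"r1": "S", "r2": "S", "r3": "B", "r4": "B", "r5": "A"}

def product_of(children, beta):
    result = 1.0
    for c in children:
        result *= beta[c]
    return result

def em_step():
    beta = {}                                 # inside: children before parents
    for n in reversed(ORDER):
        beta[n] = sum(weights[r] * product_of(ch, beta) for ch, r in FOREST[n])
    alpha = defaultdict(float)                # outside: parents before children
    alpha["S"] = 1.0
    for n in ORDER:
        for ch, r in FOREST[n]:
            for i, c in enumerate(ch):
                alpha[c] += alpha[n] * weights[r] * product_of(ch[:i] + ch[i+1:], beta)
    counts = defaultdict(float)               # E-step: expected rule counts
    for n in ORDER:
        for ch, r in FOREST[n]:
            counts[r] += alpha[n] * weights[r] * product_of(ch, beta) / beta["S"]
    totals = defaultdict(float)               # M-step: renormalize per state
    for r, c in counts.items():
        totals[STATE_OF[r]] += c
    for r in counts:
        weights[r] = counts[r] / totals[STATE_OF[r]]

em_step()
print(weights)
```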
From tree to string • xR turns an input tree (say, a parse tree) into an output tree, but that output is still a (parse) tree, not a sentence in another language, which is what machine translation needs. • xRS is the corresponding tree-to-string transducer.
Tree-to-string transducer • Weighted extended-LHS root-to-frontier tree-to-string transducer: X = (Σ, Δ, Q, Qi, R). • Similar to xR, except that each RHS is a string rather than a tree.
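Continuing the earlier reordering example as an xRS sketch (function names invented): each RHS now concatenates output tokens instead of building a tree.

```python
def q_s(tree):
    """q S(x0:PRO, x1:VP) -> (q_v x1) (q_pro x0) (q_np x1), a token sequence."""
    _, pro, vp = tree
    return q_v(vp) + q_pro(pro) + q_np(vp)   # list concatenation, not a tree

def q_v(vp):
    return [vp[1][1]]                        # the verb's word

def q_pro(pro):
    return [pro[1]]                          # the pronoun's word

def q_np(vp):
    return [vp[2][1]]                        # the object noun's word

english = ("S", ("PRO", "he"), ("VP", ("V", "drinks"), ("NP", "water")))
print(" ".join(q_s(english)))                # -> "drinks he water" (verb-first order)
```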
Example • The paper implements the translation model of Yamada and Knight (2001) as a trainable xRS tree-to-string transducer that embodies its channel operations: reordering the children of a node, inserting function words, and translating leaf words.
Related Work • TSG vs. RTG (equivalent) • xR vs. weighted synchronous TSG (similar) • EM training vs. the forward-backward algorithm for finite-state (string) transducers and for HMMs
Questions • Is there any follow-up work on this tree transducer, especially for machine translation? • What are the precision and recall? • I am also a little confused by the descriptions of the two derivation relations ⇒x and ⇒G. • I am not entirely sure about the inside-outside algorithm. Questions?
Reference • [1] Fernando Pereira and Yves Schabes. Inside-Outside Reestimation from Partially Bracketed Corpora. Proceedings of ACL, 1992.
What might be useful • An Overview of Probabilistic Tree Transducers for Natural Language Processing, by Kevin Knight and Jonathan Graehl.
• R: Top-down transducer, introduced before.
• F: Bottom-up transducer ("Frontier-to-root"), with similar rules, but transforming the leaves of the input tree first and working its way up.
• L: Linear transducer, which prohibits copying subtrees. Rule 4 in Figure 4 is an example of a copying production, so that whole transducer is R but not RL.
• N: Non-deleting transducer, which requires that every left-hand-side variable also appear on the right-hand side. A deleting R-transducer can simply delete a subtree (without inspecting it). The transducer in Figure 4 is the deleting kind, because of rules 34-39. It would also be deleting if it included a rule for dropping English determiners, e.g., q NP(x0, x1) → q x1.
• D: Deterministic transducer, with a maximum of one production per <state, symbol> pair.
• T: Total transducer, with a minimum of one production per <state, symbol> pair.
• PDTT: Push-down tree transducer, the transducer analog of CFTG [36].
• subscript: Regular-lookahead transducer, which can check whether an input subtree is tree-regular, i.e., whether it belongs to a specified RTL. Productions only fire when their lookahead conditions are met.