Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation
Linguistics 580 (Machine Translation)
Scott Drellishak, 2/21/2006
Overview • Gildea presents an alignment model he describes as “loosely tree-based” • Builds on Yamada & Knight (2001), a tree-to-string model • Gildea extends it with a clone operation, and also into a tree-to-tree model • Wants to keep performance reasonable (polynomial in sentence length)
Background • Tree-to-String Model • Tree-to-Tree Model • Experiment
Background • Historically, two approaches to MT: transfer-based and statistical • More recently, though, hybrids • Probabilistic models of structured representations: • Wu (1997) Stochastic Inversion Transduction Grammars • Alshawi et al. (2000) Head Transducers • Yamada & Knight (2001) (see below)
Gildea’s Proposal • Need to handle drastic changes to trees (real bitexts aren’t isomorphic) • To do this, Gildea adds a new operation to Y&K’s model: subtree clone • This operation clones a subtree from the source tree to anywhere in the target tree. • Gildea also proposes a tree-to-tree model that uses parallel tree corpora.
Background • Tree-to-String Model • Tree-to-Tree Model • Experiment
Yamada and Knight (2001) • Y&K’s model is tree-to-string: the input is a tree and the output is a string of words. • (Gildea compares it to an “Alexander Calder mobile”. Calder invented that kind of sculpture; it’s like Y&K’s model in that each node of the tree can turn either backwards or forwards. Visualize!)
Y&K Tree-to-String Model • Three steps to turn input into output: • Reorder the children of each node (for m nodes, m! orderings; conditioned only on the category of the node and its children) • Optionally insert words at each node either before or after all the children (conditioned only on foreign word) • Translate words at leaves (conditioned on P(f|e); words can translate to NULL)
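To make the three steps concrete, here is a minimal Python sketch of a channel model in this style. Everything in it (node labels, probability tables, helper names) is invented for illustration; it is not Y&K’s actual parameterization, and it samples a single output rather than summing over all derivations.

```python
import random

# Toy channel-model parameters -- invented for illustration only.
REORDER = {("VP", ("V", "NP")): {("V", "NP"): 0.3, ("NP", "V"): 0.7}}  # P(order | node, children)
INSERT = {"VP": {"none": 0.6, "right": 0.4}}                           # P(insert position | node)
INSERT_WORD = {"ga": 0.5, "wo": 0.5}                                   # P(inserted foreign word)
TRANSLATE = {"eat": {"taberu": 1.0}, "bread": {"pan": 1.0}}            # P(f | e)

def sample(dist):
    """Draw a key from a {value: probability} dict."""
    r, acc = random.random(), 0.0
    for key, p in dist.items():
        acc += p
        if r <= acc:
            return key
    return key                                      # guard against rounding

def transduce(node):
    """Apply the three operations top-down: reorder, insert, translate."""
    label, children = node
    if isinstance(children, str):                   # 3. leaf: translate the English word
        return [sample(TRANSLATE[children])]
    labels = tuple(c[0] for c in children)
    order = sample(REORDER.get((label, labels), {labels: 1.0}))   # 1. reorder the children
    by_label = {c[0]: c for c in children}
    out = []
    for lbl in order:
        out.extend(transduce(by_label[lbl]))
    position = sample(INSERT.get(label, {"none": 1.0}))           # 2. optionally insert a word
    if position == "right":
        out.append(sample(INSERT_WORD))
    elif position == "left":
        out.insert(0, sample(INSERT_WORD))
    return out

print(" ".join(transduce(("VP", [("V", "eat"), ("NP", "bread")]))))   # e.g. "pan taberu wo"
```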
Aside: Y&K Suitability • Recall that this model was used for translating English to Japanese. • Their model is well-suited to this language pair: • Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these. • Japanese marks subjects/topics and objects with postpositions. Insertion handles this.
Y&K EM Algorithm • EM algorithm estimates inside probabilities β bottom-up:
for all nodes ε_i in input tree T do
    for all k, l such that 1 ≤ k ≤ l ≤ N do
        for all orderings ρ of the children ε_1 … ε_m of ε_i do
            for all partitions of span k, l into k_1, l_1 … k_m, l_m do
            end for
        end for
    end for
end for
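The loop structure above can be fleshed out into a small runnable sketch. This is a simplification, not the Y&K/Gildea implementation: it ignores word insertion and NULL translation, and the tree encoding and probability tables are assumptions of mine. It only shows how the inside (β) values combine child spans under each reordering.

```python
import itertools
from collections import defaultdict

def inside(tree, f_words, p_reorder, p_trans):
    """Bottom-up inside (beta) pass for a simplified tree-to-string model.
    Nodes are ("label", [children]) or ("label", "english_word") for leaves.
    Word insertion and NULL translation are omitted to keep the sketch short."""
    N = len(f_words)
    beta = defaultdict(float)   # beta[(id(node), k, l)] over half-open spans [k, l)

    def visit(node):
        label, kids = node
        if isinstance(kids, str):                       # leaf: translate one English word
            for k in range(N):
                beta[(id(node), k, k + 1)] = p_trans.get((kids, f_words[k]), 0.0)
            return
        for child in kids:                              # bottom-up: children first
            visit(child)
        m = len(kids)
        for k in range(N):                              # every span [k, l) of the output
            for l in range(k + 1, N + 1):
                total = 0.0
                for rho in itertools.permutations(kids):             # orderings of the children
                    for cuts in itertools.combinations(range(k + 1, l), m - 1):
                        bounds = (k,) + cuts + (l,)                   # partition of [k, l)
                        prob = p_reorder.get((label, tuple(c[0] for c in rho)), 0.0)
                        for child, a, b in zip(rho, bounds, bounds[1:]):
                            prob *= beta[(id(child), a, b)]
                        total += prob
                beta[(id(node), k, l)] = total

    visit(tree)
    return beta[(id(tree), 0, N)]

# Toy usage -- tree, vocabulary, and probabilities are all invented:
tree = ("S", [("NP", "john"), ("VP", [("V", "eats"), ("N", "bread")])])
p_trans = {("john", "jon"): 1.0, ("eats", "taberu"): 1.0, ("bread", "pan"): 1.0}
p_reorder = {("S", ("NP", "VP")): 0.5, ("S", ("VP", "NP")): 0.5,
             ("VP", ("V", "N")): 0.1, ("VP", ("N", "V")): 0.9}
print(inside(tree, ["jon", "pan", "taberu"], p_reorder, p_trans))   # 0.45
```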
Y&K Performance • Computational complexity O(|T| N^(m+2)), where T = tree, N = input length, m = fan-out of the grammar • “By storing partially complete arcs in the chart and interleaving the inner two loops”, improve to O(|T| n^3 m! 2^m) • Gildea says “exponential in m” (looks factorial to me) but polynomial in N/n • If |T| is O(n) then the whole thing is O(n^4)
Y&K Drawbacks • No alignments with crossing brackets. Example tree: A has children B and Z, and B has children X and Y:
        A
       / \
      B   Z
     / \
    X   Y
• XZY and YZX are impossible • Recall that Y&K flatten trees to avoid some of this, but don’t catch all cases
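A quick brute-force check of the claim, using the tree sketched above (labels as on the slide, code mine):

```python
from itertools import permutations

# Tree from the slide: A -> (B, Z), B -> (X, Y).
# Each node may reorder its own children, but only independently of the others.
reachable = set()
for b_order in permutations("XY"):                 # orderings of B's children
    for a_order in permutations(["B", "Z"]):       # orderings of A's children
        yield_ = "".join("".join(b_order) if c == "B" else c for c in a_order)
        reachable.add(yield_)

print(sorted(reachable))            # ['XYZ', 'YXZ', 'ZXY', 'ZYX']
print({"XZY", "YZX"} & reachable)   # set() -- the crossing orders are unreachable
```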
Adding Clone • Gildea adds a clone operation to Y&K’s model • For each node, allow the insertion of a clone of another node as its child. • Probability of cloning ε_i under ε_j decomposes into two steps: the choice to insert a clone under ε_j, and the choice of which node ε_i to clone • P_clone is one estimated number; P_makeclone is constant (all nodes equally probable, reusable)
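Putting the two bullets together, the clone probability appears to reduce to a single parameter times a uniform choice over nodes. A sketch of that decomposition (my reconstruction of the formulas that were shown as images on the original slide, so the notation is mine):

```latex
% Reconstruction (mine) of the two-step clone probability described above:
% the insertion decision is one estimated parameter, and the node to clone is
% drawn uniformly because P_makeclone is constant across nodes.
P(\text{clone } \varepsilon_i \text{ under } \varepsilon_j)
  = \underbrace{P_{\text{clone}}}_{\text{choice to insert}}
    \times
    \underbrace{\frac{P_{\text{makeclone}}(\varepsilon_i)}{\sum_k P_{\text{makeclone}}(\varepsilon_k)}}_{\text{node to clone}}
  = P_{\text{clone}} \cdot \frac{1}{\text{number of nodes}}
```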
Background • Tree-to-String Model • Tree-to-Tree Model • Experiment
Tree-to-Tree Model • Output is a tree, not a string, and it must match the tree in the target corpus • Add two new transformation operations: • one source node → two target nodes • two source nodes → one target node • “a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.”
Calculating Probability • From the root down. At each level: • At most one of node’s children grouped with it, forming an elementary tree (conditioned on current node and CFG rule children) • Alignment of e-tree chosen (conditioned as above). Like Y&K reordering except: (1) alignment can include insertions and deletions (2) two nodes grouped together are reordered together. • Lexical leaves translated as before.
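Read as a single formula, the three bullets suggest a factorization roughly like the following. This is my schematic gloss of the slide, not the paper’s exact notation:

```latex
% Schematic gloss (mine) of the three steps above; the first product runs over
% the roots of the chosen elementary trees, not over every node.
P(T_b \mid T_a) \;=\;
  \prod_{\text{e-tree roots } \varepsilon_a}
    \underbrace{P(\text{e-tree grouping} \mid \varepsilon_a,\ \text{CFG rule})}_{\text{step 1}}
    \cdot
    \underbrace{P(\text{alignment of the e-tree's children} \mid \varepsilon_a,\ \text{CFG rule})}_{\text{step 2}}
  \;\times\;
  \prod_{\text{leaf pairs } (e, f)} \underbrace{P(f \mid e)}_{\text{step 3}}
```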
Elementary Trees? • Elementary trees allow the alignment of trees with different depths. Treat A, B as an e-tree and reorder their children together:
        A                  A
       / \               / | \
      B   Z     →       X  Z  Y
     / \
    X   Y
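Continuing the brute-force check from the earlier slide: grouping A and B into one elementary tree makes X, Y, Z siblings, so the previously unreachable crossing orders become available (code mine, labels from the slide):

```python
from itertools import permutations

# Same tree: A -> (B, Z), B -> (X, Y).
# Per-node reordering (as on the Y&K Drawbacks slide) reaches only four yields:
per_node = {"".join("".join(b) if c == "B" else c for c in a)
            for b in permutations("XY") for a in permutations(["B", "Z"])}

# Grouping A and B into one elementary tree makes X, Y, Z siblings,
# so all 3! joint orderings of the children become available:
e_tree = {"".join(p) for p in permutations("XYZ")}

print(sorted(e_tree - per_node))   # ['XZY', 'YZX'] -- exactly the crossing orders
```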
EM algorithm • Estimates inside probabilities β bottom-up:
for all nodes ε_a in source tree T_a in bottom-up order do
    for all elementary trees t_a rooted in ε_a do
        for all nodes ε_b in target tree T_b in bottom-up order do
            for all elementary trees t_b rooted in ε_b do
                for all alignments α of the children of t_a and t_b do
                end for
            end for
        end for
    end for
end for
Performance • Outer two loops are O(|T|^2) • Elementary trees include at most one child, so choosing e-trees is O(m^2) • Alignment is O(2^(2m)) • Which nodes to insert or clone is O(2^(2m)) • How to reorder is O((2m)!) • Overall: O(|T|^2 m^2 4^(2m) (2m)!), quadratic (!) in size of the input sentence.
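To see how the constant factor behaves, here is a tiny script that multiplies out the per-node-pair terms listed above for a few fan-outs m (the formula is read straight off the bullets; the specific values of m are just illustrative):

```python
import math

# Per node-pair work, read off the bullets above: m^2 e-tree choices,
# 2^(2m) insert/clone decisions, 2^(2m) alignments, (2m)! reorderings.
def per_node_pair(m):
    return m**2 * 2**(2 * m) * 2**(2 * m) * math.factorial(2 * m)

for m in (1, 2, 3):
    print(m, per_node_pair(m))   # 1 -> 32, 2 -> 24576, 3 -> 26542080
```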
Tree-to-Tree Clone • Allowing m-to-n matching of up to two nodes (e-trees) allows only “limited non-isomorphism” • So, as before, add a clone operation • Algorithm unchanged, except alignments may now include cloned subtrees, same probability as in tree-to-string (uniform)
Background • Tree-to-String Model • Tree-to-Tree Model • Experiment
The Data • Parallel Korean-English corpus • Trees annotated by hand on both sides • “in this paper we will be using only the Korean trees, modeling their transformation into the English text.” • (That can’t be right: only true for TTS?) • 5083 sentences: 4982 training, 101 eval
Aside: Suitability • Recall that Y&K’s model was suited to the English-to-Japanese task. • Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair? • In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they’re related).
Results • Alignment Error Rate (AER; Och & Ney, 2000): [AER results table from the slide not reproduced]
Results Detailed • The lexical probabilities come from IBM Model 1, and the node reordering probabilities are initialized to uniform • Best results when P_ins is fixed at 0.5 rather than estimated (!) • “While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall”
How’d TTS and TTT Do? • The best results were with tree-to-string, surprisingly • Y&K + clone was ≈ IBM; fixing P_ins was best overall • Tree-to-tree + clone was also ≈ IBM, but much more efficient to train (since it’s quadratic instead of quartic) • Still, disappointing results for TTT
Conclusions • Model allows syntactic info to be used for training without ordering constraints • Clone operations improve alignment results • Tree-to-tree + clone wins only on efficiency, not accuracy (but he’s hopeful) • Future directions: bigger corpora, conditioning on lexicalized trees