What’s in a translation rule? Paper by Galley, Hopkins, Knight & Marcu Presentation By: Behrang Mohit
Problem • The problem of syntax in SMT • Yamada & Knight (2001) used transformations such as child reordering • This addressed SOV vs. VSO word orders • But it does not address all syntactic movements, e.g.: • English adverbs: The government simply says … • French negation: ne … pas
Three Alternatives • Abandon syntax • Evidence: Koehn et al. 2003 • Abandon English syntax • Learn a grammar from the parallel corpus • Wu (1997): ITG, binary-branching rules • Use English syntax to learn, from the parallel corpus, transformation rules that cover larger fragments of the English tree structure
A Theory of Word Alignment • Generative process: source string to target tree (symbol tree) • Derivation step: replaces a substring of the source string with a subtree of the target tree • Derivation: a sequence of derivation steps
Replacing and Creating • Each source element s is replaced at exactly one step of the derivation D: Replaced(s, D) • e.g., Replaced(va, D) = 2 • Each node t of the target tree is created at exactly one step of the derivation: Created(t, D) • e.g., Created(AUX, D) = 3
Word Alignment • The alignment induced by a derivation D is a relation between leaves t of the target tree and elements s of the source string • s and t are aligned iff Replaced(s, D) = Created(t, D)
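To make Replaced, Created and the induced alignment concrete, here is a minimal Python sketch. It assumes the sentence pair il ne va pas / he does not go (our reconstruction, consistent with the va and AUX examples above), a hypothetical encoding of each derivation step as (source positions replaced, target node ids created), and node ids that double as labels. The four-step derivation is hypothetical, chosen so that Replaced(va, D) = 2 and Created(AUX, D) = 3.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: one derivation step = (source positions it replaces,
# target-tree node ids it creates). Steps are numbered from 1.
Step = Tuple[Set[int], Set[str]]


def replaced(derivation: List[Step]) -> Dict[int, int]:
    """Replaced(s, D): the step at which each source position is replaced."""
    out: Dict[int, int] = {}
    for k, (positions, _) in enumerate(derivation, start=1):
        for s in positions:
            assert s not in out, "a source element is replaced exactly once"
            out[s] = k
    return out


def created(derivation: List[Step]) -> Dict[str, int]:
    """Created(t, D): the step at which each target node is created."""
    out: Dict[str, int] = {}
    for k, (_, nodes) in enumerate(derivation, start=1):
        for t in nodes:
            assert t not in out, "a target node is created exactly once"
            out[t] = k
    return out


def induced_alignment(derivation: List[Step], leaves: List[str]) -> Set[Tuple[int, str]]:
    """(s, t) is aligned iff Replaced(s, D) == Created(t, D), for target leaves t."""
    r, c = replaced(derivation), created(derivation)
    return {(s, t) for s in r for t in leaves if r[s] == c[t]}


source = ["il", "ne", "va", "pas"]
leaves = ["he", "does", "not", "go"]
derivation: List[Step] = [
    ({0}, {"NP", "PRP", "he"}),                    # il -> NP(PRP(he))
    ({2}, {"VB", "go"}),                           # va -> VB(go)
    ({1, 3}, {"VP", "AUX", "does", "RB", "not"}),  # ne VB pas -> VP(AUX(does) RB(not) VB)
    (set(), {"S"}),                                # NP VP -> S(NP VP)
]

print(replaced(derivation)[2], created(derivation)["AUX"])  # 2 3
for s, t in sorted(induced_alignment(derivation, leaves)):
    print(source[s], "-", t)
```

In this derivation, ne and pas are aligned to both does and not, because all four are touched in the same step (step 3).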
“Good” Derivations • Input: source string, target tree, word alignment • A derivation is good if the alignment it induces is a superset of the given word alignment • Examples: derivations 1 & 3
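The "good derivation" test can be checked mechanically. This Python sketch uses a hypothetical step encoding, (source positions replaced, target node ids created), and the reconstructed pair il ne va pas / he does not go with the assumed word alignment il-he, ne-not, va-go, pas-not. The figure that derivations 1 & 3 refer to is not reproduced here, so both derivations below are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: step = (source positions replaced, target node ids created).
Step = Tuple[Set[int], Set[str]]


def induced_alignment(derivation: List[Step], leaves: List[str]) -> Set[Tuple[int, str]]:
    """Align source s with leaf t iff s is replaced at the step that creates t."""
    replaced = {s: k for k, (src, _) in enumerate(derivation, 1) for s in src}
    created = {t: k for k, (_, nodes) in enumerate(derivation, 1) for t in nodes}
    return {(s, t) for s, k in replaced.items() for t in leaves if created[t] == k}


def is_good(derivation: List[Step], leaves: List[str], alignment: Set[Tuple[int, str]]) -> bool:
    """Good derivation: its induced alignment is a superset of the given one."""
    return alignment <= induced_alignment(derivation, leaves)


leaves = ["he", "does", "not", "go"]
given = {(0, "he"), (1, "not"), (2, "go"), (3, "not")}  # il-he, ne-not, va-go, pas-not

d_good: List[Step] = [
    ({0}, {"NP", "PRP", "he"}),
    ({2}, {"VB", "go"}),
    ({1, 3}, {"VP", "AUX", "does", "RB", "not"}),
    (set(), {"S"}),
]
d_bad: List[Step] = [
    ({0}, {"NP", "PRP", "he"}),
    ({1, 2}, {"VB", "go"}),                     # "ne va" -> VB(go): ne is consumed too early
    ({3}, {"VP", "AUX", "does", "RB", "not"}),  # VB pas -> VP(AUX(does) RB(not) VB)
    (set(), {"S"}),
]

print(is_good(d_good, leaves, given), is_good(d_bad, leaves, given))  # True False
```

The second derivation is still a valid derivation of the tree, but it never links ne to not, so it is not good with respect to this alignment.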
Derivation Rules • Example rules: ne VB pas; NP VP • Task: given T, S and A, learn the rules that appear in any good derivation • What about inferring complex rules?
Alignment Graph • The target tree, augmented with the source string (each leaf is linked to its aligned source words) • Span of a node: the source positions reachable from it • Frontier set • Frontier graph fragment: its root and all its sinks are in the frontier set • The spans of the sinks form a partition of the span of the root
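A minimal Python sketch of spans and the frontier set on the reconstructed example il ne va pas / he does not go. The slide does not spell out the frontier-set test, so this follows our reading of the paper's definition: a node n is in the frontier set if the closure of span(n) (the smallest contiguous range covering it) does not overlap the complement span of n, i.e. the union of spans of nodes that are neither ancestors nor descendants of n. Treating nodes with an empty span (here the unaligned does and its parent AUX) as non-frontier is our simplifying assumption. The tree and alignment dictionaries are a hypothetical encoding in which node ids double as labels.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]   # node id -> ordered child ids (leaves have no entry)
Align = Dict[str, Set[int]]   # target leaf id -> aligned source positions


def compute_spans(tree: Tree, root: str, align: Align) -> Dict[str, Set[int]]:
    """Bottom-up: span(n) is the set of source positions reachable from n."""
    span: Dict[str, Set[int]] = {}

    def visit(n: str) -> Set[int]:
        s = set(align.get(n, set()))
        for child in tree.get(n, []):
            s |= visit(child)
        span[n] = s
        return s

    visit(root)
    return span


def compute_complement_spans(tree: Tree, root: str, span: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Top-down: complement(child) = complement(parent) | spans of the child's siblings."""
    comp: Dict[str, Set[int]] = {root: set()}
    stack = [root]
    while stack:
        n = stack.pop()
        kids = tree.get(n, [])
        for c in kids:
            siblings = set().union(*(span[k] for k in kids if k != c))
            comp[c] = comp[n] | siblings
            stack.append(c)
    return comp


def closure(s: Set[int]) -> Set[int]:
    """Smallest contiguous range of source positions covering s."""
    return set(range(min(s), max(s) + 1)) if s else set()


def frontier_set(tree: Tree, root: str, align: Align) -> Set[str]:
    span = compute_spans(tree, root, align)
    comp = compute_complement_spans(tree, root, span)
    # Assumption: nodes with an empty span (unaligned material) are not frontier.
    return {n for n in span if span[n] and not (closure(span[n]) & comp[n])}


tree: Tree = {
    "S": ["NP", "VP"], "NP": ["PRP"], "PRP": ["he"],
    "VP": ["AUX", "RB", "VB"], "AUX": ["does"], "RB": ["not"], "VB": ["go"],
}
align: Align = {"he": {0}, "not": {1, 3}, "go": {2}}  # il-he, ne/pas-not, va-go

print(sorted(frontier_set(tree, "S", align)))  # ['NP', 'PRP', 'S', 'VB', 'VP', 'go', 'he']
```

RB and not fall outside the frontier set: their span {ne, pas} has the closure {ne, va, pas}, which overlaps the span of their sibling VB. This is why ne … pas can only be extracted together with the VP that dominates it.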
Transformation process • Input: place the sinks in the order defined by the partition • Output: replace each sink node with a variable corresponding to its position in the input, then take the tree part of the fragment • These rules are in …
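The two steps above can be sketched in Python as follows. A fragment is described (hypothetically) by its root plus the frontier nodes where it stops; source words aligned to leaves inside the fragment become word sinks on the input side. The example data is the reconstructed il ne va pas / he does not go pair, node ids double as labels (so the sketch only handles trees with distinct labels), unaligned source words are not handled, and the `x0:VB` notation, with variables numbered in input order, is our own choice.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]   # node id -> ordered child ids (leaves have no entry)
Align = Dict[str, Set[int]]   # target leaf id -> aligned source positions


def span_of(tree: Tree, n: str, align: Align) -> Set[int]:
    """Source positions reachable from node n."""
    s = set(align.get(n, set()))
    for child in tree.get(n, []):
        s |= span_of(tree, child, align)
    return s


def fragment_to_rule(tree: Tree, align: Align, source: List[str],
                     root: str, tree_sinks: Set[str]) -> Tuple[str, str]:
    """Turn the fragment rooted at `root` and stopping at `tree_sinks` into (input, output)."""
    var_sinks: List[str] = []
    word_positions: Set[int] = set()

    def collect(n: str) -> None:
        if n in tree_sinks:
            var_sinks.append(n)                     # tree sink: becomes a variable
            return
        word_positions.update(align.get(n, set()))  # source words hanging off leaves
        for child in tree.get(n, []):
            collect(child)

    collect(root)

    # Input: all sinks in source order (their spans partition the root's span).
    # Assumes every tree sink is a frontier node with a non-empty span.
    sinks = [(min(span_of(tree, v, align)), "var", v) for v in var_sinks]
    sinks += [(p, "word", source[p]) for p in word_positions]
    sinks.sort()

    var_name: Dict[str, str] = {}
    lhs: List[str] = []
    for _, kind, item in sinks:
        if kind == "var":
            var_name[item] = f"x{len(var_name)}:{item}"  # variables numbered in input order
            lhs.append(var_name[item])
        else:
            lhs.append(item)

    # Output: the tree part of the fragment, each sink replaced by its variable.
    def render(n: str) -> str:
        if n in var_name:
            return var_name[n]
        kids = tree.get(n, [])
        return f"{n}({' '.join(render(c) for c in kids)})" if kids else n

    return " ".join(lhs), render(root)


source = ["il", "ne", "va", "pas"]
tree: Tree = {
    "S": ["NP", "VP"], "NP": ["PRP"], "PRP": ["he"],
    "VP": ["AUX", "RB", "VB"], "AUX": ["does"], "RB": ["not"], "VB": ["go"],
}
align: Align = {"he": {0}, "not": {1, 3}, "go": {2}}

lhs, rhs = fragment_to_rule(tree, align, source, "VP", {"VB"})
print(lhs, "->", rhs)  # ne x0:VB pas -> VP(AUX(does) RB(not) x0:VB)
```

The VP fragment reproduces the "ne VB pas" rule from the Derivation Rules slide, and the fragment rooted at S with sinks NP and VP gives the "NP VP" rule.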
Rule Extraction Algorithm • Search the space of graph fragments for frontier graph fragments (FGFs) • Searching all fragments is exponential • The frontier set (FS) can be found in linear time • For each node n in the FS, there is a unique minimal FGF rooted at n
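One way to build the minimal FGF for a frontier node, as a Python sketch: grow downward from n, absorbing every non-frontier child and stopping at frontier children, which become sinks. The frontier nodes are passed in rather than recomputed. For the reconstructed example they are S, NP, PRP, VP and VB, and we leave out the word leaves he and go on the assumption that target words are lexical material and never sinks. The tree encoding is hypothetical, with node ids doubling as labels.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]   # node id -> ordered child ids (leaves have no entry)
Align = Dict[str, Set[int]]   # target leaf id -> aligned source positions


def minimal_fragment(tree: Tree, align: Align, frontier: Set[str],
                     n: str) -> Tuple[Set[str], Set[str], Set[int]]:
    """Grow the fragment rooted at frontier node n, stopping at frontier nodes.

    Returns (nodes inside the fragment, tree sinks, source positions reached).
    """
    assert n in frontier, "minimal fragments are rooted at frontier nodes"
    inside: Set[str] = set()
    sinks: Set[str] = set()
    words: Set[int] = set()

    def expand(m: str) -> None:
        inside.add(m)
        words.update(align.get(m, set()))  # alignment edges to source words
        for child in tree.get(m, []):
            if child in frontier:
                sinks.add(child)           # stop: the child becomes a sink
            else:
                expand(child)              # non-frontier: must be absorbed

    expand(n)
    return inside, sinks, words


tree: Tree = {
    "S": ["NP", "VP"], "NP": ["PRP"], "PRP": ["he"],
    "VP": ["AUX", "RB", "VB"], "AUX": ["does"], "RB": ["not"], "VB": ["go"],
}
align: Align = {"he": {0}, "not": {1, 3}, "go": {2}}
# Assumed non-leaf frontier nodes of this example.
frontier = {"S", "NP", "PRP", "VP", "VB"}

for n in ["S", "NP", "PRP", "VP", "VB"]:
    print(n, minimal_fragment(tree, align, frontier, n))
```

Every child is either a sink or must be absorbed, so there is no choice during the walk. This is consistent with the uniqueness claim above, and the walk touches each node of the fragment once.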
Expanding from minimal fragments • Compose new frontier graph fragments by merging two of the minimal fragments
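A sketch of the merge in Python, under our own assumptions: a hypothetical `Fragment` record holds the root, the tree nodes inside, the tree sinks, and the source positions reached. Composing plugs a fragment whose root is one of the outer fragment's sinks into that sink. The two inputs are the minimal fragments for S and VP in the reconstructed il ne va pas / he does not go example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Fragment:
    root: str
    inside: FrozenSet[str]      # tree nodes in the fragment, root included
    tree_sinks: FrozenSet[str]  # frontier nodes where the fragment stops
    words: FrozenSet[int]       # source positions reached through alignment edges


def compose(outer: Fragment, inner: Fragment) -> Fragment:
    """Plug `inner` into the sink of `outer` at which it is rooted."""
    if inner.root not in outer.tree_sinks:
        raise ValueError(f"{inner.root} is not a sink of the fragment rooted at {outer.root}")
    return Fragment(
        root=outer.root,
        inside=outer.inside | inner.inside,
        tree_sinks=(outer.tree_sinks - {inner.root}) | inner.tree_sinks,
        words=outer.words | inner.words,
    )


s_min = Fragment("S", frozenset({"S"}), frozenset({"NP", "VP"}), frozenset())
vp_min = Fragment("VP", frozenset({"VP", "AUX", "does", "RB", "not"}),
                  frozenset({"VB"}), frozenset({1, 3}))

merged = compose(s_min, vp_min)
print(merged.root, sorted(merged.tree_sinks), sorted(merged.words))  # S ['NP', 'VB'] [1, 3]
```

The merged fragment is still rooted at a frontier node, and its sinks are still frontier nodes. If you read it off with the Transformation process steps, it would give a larger rule along the lines of x0:NP ne x1:VB pas -> S(x0:NP VP(AUX(does) RB(not) x1:VB)).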
Experiments • French-English (Hansard) • Human alignments • GIZA++ alignments • Chinese-English (FBIS) • GIZA++ alignments (trained on a large corpus) • Issue: coverage of the extracted rules • Coverage: the percentage of parse trees in the corpus that can be transformed by the translation rules
Coverage of the model • Depends on the number of expansions • Single expansion: Yamada & Knight 2001 • 17 to 43 expansions needed for full coverage • Other factors: alignment; language differences
Conclusion • Previous work: child-node reordering • This model looks at larger tree fragments • Translation rules are both syntactically and lexically motivated • The rule extraction algorithm can deal with alignment errors and systematic parsing errors • Next steps: defining a probability distribution over the rules; decoding