Stochastic Inversion Transduction Grammars, Dekai Wu • 11-734 Advanced Machine Translation Seminar • Presented by: Sanjika Hewavitharana • 04/13/2006
Overview • Simple Transduction Grammars • Inversion Transduction Grammars (ITGs) • Stochastic ITGs • Parsing with SITGs • Applications of SITGs • Main Reading: Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora (1997)
Introduction • Mathematical models of translation • IBM Models (Brown et al.): string generates string • Syntax-based (Yamada & Knight): tree generates string • ITG (Wu): two trees are generated simultaneously • ITGs • A formalism for modeling bilingual sentence pairs • Not intended for use as full translation models, but for parallel corpus analysis • Extract useful structures from input data • Generative view rather than translation view • Two output trees are generated simultaneously, one for each language
Transduction Grammars • A simple transduction grammar is a CFG whose terminals are pairs of symbols (or singletons) • Can be used to model the generation of bilingual sentence pairs • Example sentence pair (English side only; the Chinese side did not survive extraction): The Financial Secretary and I will be accountable.
Transduction Grammar Rules E.g. • Simple Rules: • Inversion Rule:
Transduction Grammars • In general, simple transduction grammars are not very useful • The two languages must share exactly the same grammatical structure • So some sentence pairs cannot be generated • ITGs remove the rigid parallel-ordering constraint • Constituent order in one language may be the inverse of the order in the other language • Square brackets: constituent order is the same in both languages • Angle brackets: constituent order is inverted in one of the languages
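The effect of the two orientations can be illustrated with a minimal sketch. The tree encoding (tuples tagged "lex", "[]" and "<>"), the function name `yield_pair`, and the placeholder tokens are assumptions made for this example, not notation from the paper.

```python
def yield_pair(node):
    """Return (language-1 tokens, language-2 tokens) generated by a derivation tree.

    Hypothetical encoding:
      ("lex", x, y)        terminal couple x/y; x or y may be None (singleton)
      ("[]", left, right)  straight orientation
      ("<>", left, right)  inverted orientation
    """
    if node[0] == "lex":
        _, x, y = node
        return ([x] if x is not None else []), ([y] if y is not None else [])
    orient, left, right = node
    l1, l2 = yield_pair(left)
    r1, r2 = yield_pair(right)
    if orient == "[]":
        # Same constituent order in both languages.
        return l1 + r1, l2 + r2
    if orient == "<>":
        # Language-1 order is kept; language-2 order is reversed.
        return l1 + r1, r2 + l2
    raise ValueError(f"unknown orientation: {orient!r}")


straight = ("[]", ("lex", "a", "x"), ("lex", "b", "y"))
inverted = ("<>", ("lex", "a", "x"), ("lex", "b", "y"))
print(yield_pair(straight))  # language 2 reads x, y
print(yield_pair(inverted))  # language 2 reads y, x
```

Both trees produce the same language-1 string; only the language-2 order differs, which is exactly the freedom a simple transduction grammar lacks.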
ITGs • With an ITG we can parse the previous sentence pair • Inversion rule: VP → ⟨VV PP⟩
Expressiveness of ITGs • Not all matchings are possible with an ITG • e.g. ‘inside-out’ matchings are not allowed • This reduces the combinatorial growth of matchings with the number of tokens • The number of matchings eliminated increases rapidly as the number of tokens increases • The author claims this is a benefit
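The restriction can be checked by brute force for small token counts. The sketch below treats a matching between two n-token strings as a permutation and asks whether it can be built from nested straight and inverted binary splits; the function names are assumptions made for this example.

```python
from functools import lru_cache
from itertools import permutations
from math import factorial


def itg_ok(perm):
    """True if the permutation can be built from nested straight/inverted binary splits."""
    perm = tuple(perm)

    @lru_cache(maxsize=None)
    def ok(i, j):
        # perm[i:j] is known to cover a contiguous block of values.
        if j - i == 1:
            return True
        for k in range(i + 1, j):
            left, right = perm[i:k], perm[k:j]
            # Each half must itself cover a contiguous block of values
            # (the values are distinct, so max - min == length - 1 suffices).
            if (max(left) - min(left) == k - i - 1
                    and max(right) - min(right) == j - k - 1
                    and ok(i, k) and ok(k, j)):
                return True
        return False

    return ok(0, len(perm))


def count_itg_matchings(n):
    return sum(itg_ok(p) for p in permutations(range(n)))


for n in range(1, 6):
    print(n, count_itg_matchings(n), factorial(n))

# The two 'inside-out' matchings of four tokens are rejected.
print(itg_ok((1, 3, 0, 2)), itg_ok((2, 0, 3, 1)))
```

For four tokens, 22 of the 24 possible matchings survive, and the gap to n! widens quickly as n grows.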
Normal Form of ITG • For any ITG there exists an equivalent grammar in normal form • The right-hand side of every rule is one of: • A terminal couple • A terminal singleton • A pair of non-terminals with straight orientation • A pair of non-terminals with inverted orientation
Stochastic ITGs • A probability is assigned to each rewrite rule • The probabilities of all rules with a given left-hand side must sum to 1 • An SITG yields the most probable (maximum-likelihood) matching parse for a sentence pair • The parsing algorithm is similar to Viterbi or CYK (chart) parsing
Parsing with SITGs • Every node (q) in the parse tree has 5 elements: • Begin & end indices for language-1 string (s,t) • Begin & end indices for language-2 string (u,v) • Non-terminal category (i) • Each cell (in the chart) stores the probability of the most likely parse covering the appropriate substrings, rooted in the appropriate category
Parsing with SITGs - Algorithm • Initialize the cells corresponding to terminals using a translation lexicon • For the other cells, recursively find the most probable way of obtaining that nonterminal category • Compute the probability by multiplying the probability of the rule by the probabilities of both constituents • Store that probability plus the orientation of the rule • Complexity: O(n³m³), where n and m are the lengths of the two sentences
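The steps above can be sketched for the simplest case, a grammar with a single nonterminal. This is an illustrative sketch, not the paper's implementation: the function name `btg_parse`, the dictionary-based lexicon, and the default rule probabilities are all assumptions (the probabilities are arbitrary placeholders and are not normalized).

```python
from functools import lru_cache


def btg_parse(e, c, lex, p_straight=1e-4, p_inverted=1e-4, p_single=1e-6):
    """Most probable parse of sentence pair (e, c) under a one-nonterminal SITG.

    lex maps (word1, word2) -> probability. Returns (probability, aligned pairs).
    """
    e, c = tuple(e), tuple(c)

    def empty(span):
        s, t, u, v = span
        return s == t and u == v

    @lru_cache(maxsize=None)
    def best(s, t, u, v):
        """Cell (s, t, u, v): best (probability, backpointer) for e[s:t] / c[u:v]."""
        cand = (0.0, None)
        # Initialization: terminal couples from the lexicon, plus singletons.
        if t - s == 1 and v - u == 1:
            cand = (lex.get((e[s], c[u]), 0.0), ("lex", e[s], c[u]))
        elif t - s == 1 and u == v:
            cand = (p_single, ("lex", e[s], None))
        elif s == t and v - u == 1:
            cand = (p_single, ("lex", None, c[u]))
        # Recursion: every split point in both strings, in both orientations.
        for S in range(s, t + 1):
            for U in range(u, v + 1):
                options = (
                    (p_straight, "[]", (s, S, u, U), (S, t, U, v)),
                    (p_inverted, "<>", (s, S, U, v), (S, t, u, U)),
                )
                for p_rule, orient, left, right in options:
                    if empty(left) or empty(right):
                        continue
                    p = p_rule * best(*left)[0] * best(*right)[0]
                    if p > cand[0]:
                        # Keep the probability plus the orientation of the rule.
                        cand = (p, (orient, left, right))
        return cand

    def collect(span):
        """Follow backpointers and list the terminal couples and singletons."""
        back = best(*span)[1]
        if back is None:
            return []
        if back[0] == "lex":
            return [(back[1], back[2])]
        return collect(back[1]) + collect(back[2])

    root = (0, len(e), 0, len(c))
    return best(*root)[0], collect(root)


toy_lex = {("a", "x"): 0.5, ("b", "y"): 0.5}  # hypothetical lexicon
prob, pairs = btg_parse(["a", "b"], ["y", "x"], toy_lex)
print(prob, pairs)
```

There are O(n²m²) cells and O(nm) split points per cell, which gives the O(n³m³) bound. A practical version would work with log probabilities to avoid underflow on long sentences and would fill the chart bottom-up rather than recursively.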
Applications of SITGs • Segmentation • Bracketing • Alignment • Bilingual Constraint Transfer • Mining parallel sentences from comparable corpora [Wu & Fung 2005]
Applications of SITGs - Segmentation • Word boundaries are not marked in Chinese text • No word chunks are available for matching • One option: do word segmentation as preprocessing • This might produce chunks that do not agree bilingually • Solution: extend the algorithm to accommodate segmentation • Allow the initialization step to find strings of any length in the translation lexicon • The recursive step stores the most probable way of creating a constituent, whether it came from the lexicon or from rules
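The extended initialization step might look like the following sketch, which seeds chart cells with every character span of the unsegmented side that the lexicon can pair with a word on the other side. The function name `lexical_cells`, the `max_len` cap, and the placeholder strings are assumptions made for this example.

```python
def lexical_cells(e, c_chars, lex, max_len=4):
    """Seed cells (s, t, u, v) -> probability for lexicon matches of any length.

    e        list of language-1 words
    c_chars  unsegmented language-2 string
    lex      maps (word, character string) -> probability
    max_len  assumed cap on the length of a lexicon entry, in characters
    """
    cells = {}
    for s, word in enumerate(e):
        for u in range(len(c_chars)):
            for v in range(u + 1, min(len(c_chars), u + max_len) + 1):
                p = lex.get((word, c_chars[u:v]))
                if p:
                    cells[(s, s + 1, u, v)] = p
    return cells


# Placeholder characters stand in for unsegmented text.
toy_lex = {("bank", "XY"): 0.7, ("bank", "X"): 0.1}
print(lexical_cells(["bank"], "XYZ", toy_lex))
```

Both the one-character and the two-character candidates enter the chart, so the choice between segmentations is left to the recursive step instead of being fixed in preprocessing.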
Applications of SITGs – Bracketing • How to assign structure to a sentence with no grammar available? • Especially problematic for minority languages • A solution using ITGs: • Get a parallel corpus pairing the language with some other language • Get a reasonable translation dictionary • Parse the corpus with a bracketing transduction grammar
Bracketing Transduction Grammar • A minimal ITG • Only one nonterminal: A • Production rules: • Lexical translation probabilities have prominence • Small prob. values for the two singleton production rules • Also, a very small value for
Bracketing with Singletons • Singletons cause bracketing errors • Some refinements: • Depending on the language, bias singleton attachment either to the left or to the right of a constituent • Apply a series of transformations that push the singletons as close as possible to the couples e.g. [ xA B ] ⇌xA B ⇌ x A B⇌ [x A ] B • Before: • After:
Bracketing Experiments • Used 2000 Chinese-English sentence pairs from the HKUST corpus • Some filtering: • Remove sentence pairs that were not adequately covered by the lexicon (>1 unknown word) • Remove sentence pairs with too many unmatched words (>2) • Bracketing precision: • 80% for English • 78% for Chinese • Errors mainly due to lexical imperfections • A statistical lexicon (~6.5k English, ~5.5k Chinese words) • Can be improved with extra information • e.g. POS, grammar-based bracketer
Applications of SITGs - Alignment • Alignments (phrasal or word) are a natural byproduct of bilingual parsing • Unlike ‘parse-parse-match’ methods, this • Doesn’t require a robust grammar for both languages • Guarantees compatibility between parses • Has a principled way of choosing between possible alignments • Provides a more reasonable ‘distortion penalty’ • Recent empirical studies show ITGs produce better alignments in various applications [Wu & Fung 2005]
Bilingual Constraint Transfer • A high-quality parse for one language can be leveraged to get structure for the other • Alter the parsing algorithm: • only allow constituents that match the parse that already exists for the well-studied language • This works for any sort of constraint supplied for the well-studied language
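One way to read "match" is that a candidate constituent must not cross any bracket of the existing parse. The sketch below implements that reading as a filter the parser could apply before filling a cell; the half-open span convention and the function names are assumptions made for this example.

```python
def crosses(span, bracket):
    """True if two half-open spans overlap without one containing the other."""
    (s, t), (a, b) = span, bracket
    return s < a < t < b or a < s < b < t


def allowed(span, brackets):
    """A span is allowed if it crosses no bracket of the existing parse."""
    return not any(crosses(span, br) for br in brackets)


# Hypothetical brackets from an existing parse of a 5-token sentence.
known = [(0, 2), (2, 5), (0, 5)]
print(allowed((1, 3), known))  # straddles the boundary at position 2
print(allowed((2, 4), known))  # nested inside (2, 5)
```

Because the filter only inspects spans, the same check works for any bracket-like constraint supplied for the well-studied language, not only for full parses.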
References: • Dekai Wu (1997), Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, Computational Linguistics, Vol. 23, no. 3, pp. 377-403. • Dekai Wu (1995), Grammarless Extraction of Phrasal Translation Examples from Parallel Texts, 6th Intl. Conf. on Theoretical and Methodological Issues in Machine Translation, Vol. 2, pp. 354-372. Leuven, Belgium. • Dekai Wu and Pascale Fung (2005), Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora, 2nd Intl. Joint Conf. on Natural Language Processing (IJCNLP-2005), Jeju, Korea, October.