This workshop review surveys the machine translation systems presented by ISI, ITC-irst, JHU, ATR, IBM, and others. It covers the use of syntax (cross-lingual parsing/decoding of a Chinese input sentence against an English lattice), log-linear and alignment-template models, preprocessing, word reordering, LM adaptation, comparable corpora, and confidence intervals for evaluation. It also notes that JHU's attempts to incorporate linguistic knowledge (morphology, NE, syntax) and to model phrase movement did not help.
Using Syntax? • ISI-small: • Cross-lingual parsing/decoding • Input: Chinese sentence + English lattice built with all possible phrase substitutions • Output: English parse tree • Algorithm: lattice parsing (?) • JHU: • Failed to incorporate “linguistic knowledge” • Morphology, NE, syntax • Modeling phrase movement did not help
ITC-irst • ITC-irst: very similar to Franz’s system • Log-linear model / minimum error training • Phrase-based model • Preprocessing • Chinese numerical translation, segmentation, splitting of long sentences (testing) • LM adaptation: mixture of LMs from different corpora
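The "mixture of LMs" bullet can be illustrated with a small sketch. This is not ITC-irst's implementation: it assumes linearly interpolated unigram models whose mixture weights are fitted by EM on a small in-domain sample. The corpora, the vocabulary, and the function names (`unigram_lm`, `mixture_prob`, `em_weights`) are all made up for illustration.

```python
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def mixture_prob(word, models, weights):
    """Linear interpolation: p(w) = sum_i lambda_i * p_i(w)."""
    return sum(lam * m[word] for lam, m in zip(weights, models))

def em_weights(dev_tokens, models, iterations=20):
    """Fit mixture weights by EM so the mixture best explains dev_tokens."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        posteriors = [0.0] * k
        for w in dev_tokens:
            denom = mixture_prob(w, models, weights)
            for i, m in enumerate(models):
                # E-step: responsibility of component i for this token
                posteriors[i] += weights[i] * m[w] / denom
        # M-step: new weight is the average responsibility
        weights = [p / len(dev_tokens) for p in posteriors]
    return weights

# Toy data (hypothetical): two source corpora and an in-domain dev sample.
vocab = {"stocks", "fell", "rain", "today"}
news_lm = unigram_lm("stocks fell today stocks fell".split(), vocab)
weather_lm = unigram_lm("rain today rain rain".split(), vocab)
dev = "stocks fell stocks today".split()

weights = em_weights(dev, [news_lm, weather_lm])
print(weights)  # the news model should receive the larger weight
```

The same idea carries over to n-gram models: only the component probabilities change, while the weight estimation stays the same.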
JHU • Alignment template + WFST • Multiple phrase segmentations of the source sentence are essential in translation • BiText chunking (DP and divisive clustering): similar idea to our sentence splitting • Phrase-level movement • Document-specific LM (LM adaptation) • Gains from doc-specific LMs and MBR-Bleu are not additive
ATR • ATR: • Unsupervised Chinese word segmenter • Truecasing by Conditional Random Fields
IBM • Word reordering • Pre-ordering: reorder the source sentence; 10% improvement • Word-level and block-level reordering
ISI • Franz’s system: • Log-linear model • Alignment template • Discriminative training • DP search • New feature functions • Lexicalized reordering: +1% Bleu • Penalize word deletions: +2% Bleu • Tight integration of rule-based translations: +2% • Translation Components (TC): numbers, NE, dates • Train a classifier to identify which TC works where
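A minimal sketch of how a log-linear model ranks hypotheses, and how an extra feature function such as a word-deletion penalty can change the winner. The feature names, values, and weights below are invented for illustration and are not taken from the ISI system. The score of a hypothesis is the weighted sum of its feature values.

```python
def loglinear_score(features, weights):
    """score(e | f) = sum_m lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def best_hypothesis(candidates, weights):
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))

# Hypothetical n-best list: log-probabilities plus a count of deleted source words.
candidates = [
    {"text": "A", "features": {"tm": -2.0, "lm": -4.0, "deleted_words": 1}},
    {"text": "B", "features": {"tm": -2.5, "lm": -4.2, "deleted_words": 0}},
]

# Without the deletion penalty, the shorter hypothesis A wins.
no_penalty = {"tm": 1.0, "lm": 0.5, "deleted_words": 0.0}
# With a negative weight on deletions, hypothesis B wins instead.
with_penalty = {"tm": 1.0, "lm": 0.5, "deleted_words": -2.0}

pick_without = best_hypothesis(candidates, no_penalty)["text"]
pick_with = best_hypothesis(candidates, with_penalty)["text"]
print(pick_without, pick_with)
```

In a real system the weights would be tuned by discriminative training on held-out data rather than set by hand as they are here.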
ISI • Franz: important things in system development • Good engineering is important • Scalability • Efficiency • No bugs in software • Good overall system architecture • Error analysis should drive research • Step 1: what is the major error in the current system? • Step 2: fix it! • Step 3: go to Step 1
Comparable Corpora • ISI: • Arabic: 99M->106M; Bleu: 43.8->42.99 • Chinese: 168M->176M; Bleu: 32.05->32.85
Confidence Intervals • Bootstrapping • IBM’s method • Chop the test data into 50 pieces • NIST’s method • Sign test
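Two of the techniques named above, sketched under simplifying assumptions. The bootstrap here resamples per-sentence scores and averages them. Bleu itself is a corpus-level metric, so a real implementation would recompute it on each resampled test set. The sign test is the exact two-sided binomial version, applied to counts of segments (or test-set pieces) on which each system wins. The function names and the score values are hypothetical.

```python
import random
from math import comb

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-segment scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample n segments with replacement and record the mean each time.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def sign_test_p(wins_a, wins_b):
    """Exact two-sided sign test p-value; ties are assumed to be dropped."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-segment scores for one system.
scores = [0.20, 0.40, 0.30, 0.50, 0.10, 0.60, 0.35, 0.45]
lo, hi = bootstrap_ci(scores)
print(f"95% interval for the mean: [{lo:.3f}, {hi:.3f}]")

# System A better on 9 pieces, system B better on 1.
p_value = sign_test_p(9, 1)
print(f"sign test p-value: {p_value:.4f}")
```

With 9 wins against 1, the p-value is 22/1024, about 0.021, so the difference would count as significant at the 5% level.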
New Players • BYU: simple transfer system • Linear B: human post-editing of MT hypotheses (HAMT) • MTM linguaSoft: based on the CIMOS rule-based system • NTT: WFST-based decoder