Course Summary

Course Summary LING 575 Fei Xia 03/06/07

Outline • Introduction to MT: 1 • Major approaches • SMT: 3 • Transfer-based MT: 2 • Hybrid systems: 2 • Other topics

Introduction to MT

Major challenges • Translation is hard. • Getting the right words: • Choosing the correct root form • Getting the correct inflected form • Inserting “spontaneous” words • Putting the words in the correct order: • Word order: SVO vs. SOV, … • Unique constructions: • Divergence

Lexical choice • Homonymy/Polysemy: bank, run • Concept gap: no corresponding concepts in another language: go Greek, go Dutch, feng shui, lame duck, … • Coding (Concept → lexeme mapping) differences: • More distinction in one language: e.g., kinship vocabulary. • Different division of conceptual space:

Major approaches • Transfer-based • Interlingua • Example-based (EBMT) • Statistical MT (SMT) • Hybrid approach

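The MT triangle • Figure: Analysis runs up the source side and Synthesis runs down the target side, with Meaning (interlingua) at the apex. • Levels from top to bottom: Interlingua, Transfer-based, Phrase-based SMT and EBMT, Word-based SMT and EBMT (word to word)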

Comparison of resource requirements

Evaluation • Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT. • Human evaluation: accuracy, fluency, … • Problem: expensive, slow, subjective, non-reusable. • Automatic measures: • Edit distance • Word error rate (WER), Position-independent WER (PER) • Simple string accuracy (SSA), Generation string accuracy (GSA) • BLEU
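As an illustration of the edit-distance family of measures, here is a minimal Python sketch of word error rate. It assumes whitespace tokenization, unit costs for insertion, deletion and substitution, and a single reference translation; the function names `edit_distance` and `wer` are chosen here for illustration and do not come from the slides.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Because the score is computed against one fixed reference, a perfectly acceptable translation that happens to use different words is penalized, which is one reason the slide stresses that MT has no single gold standard.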

Major approaches

Word-based SMT • IBM Models 1-5 • Main concepts: • Source channel model • Hidden word alignment • EM training

Source channel model for MT • An English sentence E is drawn with probability P(E); the noisy channel turns it into the French sentence F with probability P(F | E). • Two types of parameters: • Language model: P(E) • Translation model: P(F | E)
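The two parameter types combine through Bayes' rule. The decoding objective below is the standard noisy-channel formulation, written here as a sketch rather than copied from the slide; the hat notation for the chosen translation is an assumption.

```latex
\hat{E} \;=\; \arg\max_{E} P(E \mid F)
        \;=\; \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        \;=\; \arg\max_{E} P(E)\, P(F \mid E)
```

P(F) can be dropped because it is constant for a given French sentence.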

Modeling p(F | E) with alignment

Modeling Model 1: Model 2: • Parameters: • Length prob: P(m | l) • Translation prob: t(f_j | e_i) • Distortion prob (for Model 2): d(i | j, m, l)
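The equations on these slides were not preserved in the transcript. As a hedged reconstruction, the standard forms of IBM Models 1 and 2 can be written with the parameters listed above. The notation is an assumption: F = f_1 … f_m, E = e_1 … e_l, e_0 is the NULL word, and a_j in {0, …, l} is the English position aligned to f_j.

```latex
% Marginalizing over the hidden alignment a = a_1 ... a_m
P(F \mid E) \;=\; \sum_{a} P(F, a \mid E)

% Model 1: every alignment position is equally likely
P(F \mid E) \;=\; \frac{P(m \mid l)}{(l+1)^{m}}
                  \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

% Model 2: the uniform term is replaced by the distortion probability
P(F \mid E) \;=\; P(m \mid l)
                  \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```

Model 1 is the special case of Model 2 in which d(i | j, m, l) = 1 / (l + 1) for every i.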

Training • Model 1:
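The training formulas for this slide were not preserved. The following is a minimal Python sketch of the usual EM procedure for the Model 1 translation table, under stated assumptions: sentences are pre-tokenized lists, a `"NULL"` token is prepended to each English sentence, t is initialized uniformly, and the length probability is ignored because it does not affect the t estimates. The name `train_model1` is hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = t(f | e) with EM.

    pairs: list of (french_tokens, english_tokens) sentence pairs.
    """
    # Position 0 of every English sentence is the NULL word.
    pairs = [(list(f), ["NULL"] + list(e)) for f, e in pairs]
    f_vocab = {w for f_sent, _ in pairs for w in f_sent}
    if not f_vocab:
        return {}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        # E-step: distribute each French word over the English words.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalize expected counts per English word.
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    corpus = [
        (["das", "Haus"], ["the", "house"]),
        (["das", "Buch"], ["the", "book"]),
        (["ein", "Buch"], ["a", "book"]),
    ]
    table = train_model1(corpus)
    print(table[("Haus", "house")], table[("das", "house")])
```

The toy corpus uses German in the role of the foreign language only to keep the example small; no claim is made that it matches the data used in the course.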

Finding the best alignment Given E and F, we are looking for Model 1:
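The formula on this slide was lost in the transcript. For Model 1, the standard result is that the best alignment factorizes over French positions, so each f_j can be linked independently to the English word with the highest t(f_j | e_i). The Python sketch below illustrates that; the function name, the `"NULL"` token at position 0, and the toy probabilities are all assumptions made for the example.

```python
def best_alignment_model1(f_sent, e_sent, t):
    """Return a list a where a[j] is the English position for f_sent[j].

    Position 0 stands for the NULL word; t maps (f, e) to t(f | e).
    Pairs missing from t are treated as probability 0.
    """
    e_with_null = ["NULL"] + list(e_sent)
    alignment = []
    for f in f_sent:
        # Each French word is decided on its own: argmax_i t(f | e_i).
        best_i = max(range(len(e_with_null)),
                     key=lambda i: t.get((f, e_with_null[i]), 0.0))
        alignment.append(best_i)
    return alignment


if __name__ == "__main__":
    toy_t = {
        ("das", "the"): 0.9, ("das", "house"): 0.1,
        ("Haus", "house"): 0.8, ("Haus", "the"): 0.05,
    }
    print(best_alignment_model1(["das", "Haus"], ["the", "house"], toy_t))
```

Ties are broken in favor of the lowest position, so a French word unknown to the table falls back to NULL.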

Clump-based SMT • The unit of translation is a clump. • Training stage: • Word alignment • Extracting clump pairs • Decoding stage: • Try all segmentations of the src sent and all the allowed permutations • For each src clump, try TopN tgt clumps • Prune the hypotheses
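To make the "try all segmentations" step concrete, here is a small Python sketch that enumerates every way of cutting a source sentence into contiguous clumps. It covers segmentation only (no permutation, target lookup, or pruning), and the name `segmentations` and the optional `max_len` limit are assumptions for illustration.

```python
from itertools import combinations


def segmentations(tokens, max_len=None):
    """List every split of tokens into contiguous clumps.

    max_len, if given, discards segmentations containing a longer clump.
    """
    n = len(tokens)
    results = []
    for k in range(n):  # k = number of internal cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            clumps = [tuple(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]
            if max_len is None or all(len(c) <= max_len for c in clumps):
                results.append(clumps)
    return results


if __name__ == "__main__":
    for seg in segmentations("he is reading".split()):
        print(seg)
```

A sentence of n words has 2^(n-1) segmentations, which is why the decoding stage has to prune hypotheses.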

Transfer-based MT • Analysis, transfer, generation: • Example: (Quirk et al., 2005) • Parse the source sentence • Transform the parse tree with transfer rules • Translate source words • Get the target sentence from the tree • Translation as parsing: • Example: (Wu, 1995)

Hybrid approaches • Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005) • Postprocessing with taggers, parsers, etc.: JHU 2003 workshop • Hierarchical phrase-based model: (Chiang, 2005) • …

Other topics

Other issues • Resources • MT for low-density languages • Using comparable corpora and Wikipedia • Special translation modules • Identifying and translating named entities and abbreviations • …

To build an MT system (1) • Gather resources • Parallel corpora, comparable corpora • Grammars, dictionaries, … • Process data • Document alignment, sentence alignment • Tokenization, parsing, …

To build an MT system (2) • Modeling • Training • Word alignment and extracting clump pairs • Learning transfer rules • Decoding • Identifying entities and translating them with special modules (optional) • Translation as parsing, or parse + transfer + translation • Segmenting the src sentence, replacing each src clump with a target clump, …

To build an MT system (3) • Post-processing • System combination • Reranking • Using the system for other applications: • Cross-lingual IR • Computer-assisted translation • …

Misc • Grades • Assignments (hw1-hw3): 30% • Class participation: 20% • Project: • Presentation: 25% • Final paper: 25%
