Course Summary

Presentation Transcript


  1. Course Summary LING 575 Fei Xia 03/06/07

  2. Outline • Introduction to MT: 1 • Major approaches • SMT: 3 • Transfer-based MT: 2 • Hybrid systems: 2 • Other topics

  3. Introduction to MT

  4. Major challenges • Translation is hard. • Getting the right words: • Choosing the correct root form • Getting the correct inflected form • Inserting “spontaneous” words • Putting the words in the correct order: • Word order: SVO vs. SOV, … • Unique constructions: • Divergence

  5. Lexical choice • Homonymy/Polysemy: bank, run • Concept gap: no corresponding concepts in another language: go Greek, go Dutch, fen sui, lame duck, … • Coding (concept → lexeme mapping) differences: • More distinctions in one language: e.g., kinship vocabulary. • Different division of conceptual space:

  6. Major approaches • Transfer-based • Interlingua • Example-based (EBMT) • Statistical MT (SMT) • Hybrid approach

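  7. The MT triangle [Figure: the MT triangle, with meaning (interlingua) at the apex, analysis and synthesis along the two sides, and, from top to base: transfer-based; phrase-based SMT, EBMT; word-based SMT, EBMT; word to word.]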

  8. Comparison of resource requirements

  9. Evaluation • Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT. • Human evaluation: accuracy, fluency, … • Problem: expensive, slow, subjective, non-reusable. • Automatic measures: • Edit distance • Word error rate (WER), Position-independent WER (PER) • Simple string accuracy (SSA), Generation string accuracy (GSA) • BLEU
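The string-based measures listed on this slide can be sketched in a few lines. The following is a minimal illustration, assuming whitespace-tokenized input and a single reference; the function names are hypothetical, and the PER formula shown is one common formulation (bag-of-words matching), not necessarily the one used in the course.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate: compares bags of words, ignoring order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

A hypothesis that contains the right words in the wrong order is penalized by WER but not by PER, which is the point of reporting both.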

  10. Major approaches

  11. Word-based SMT • IBM Models 1-5 • Main concepts: • Source channel model • Hidden word alignment • EM training

  12. Source channel model for MT [Figure: an English sentence, generated with probability P(E), passes through a noisy channel P(F | E) and comes out as a French sentence.] • Two types of parameters: • Language model: P(E) • Translation model: P(F | E)
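The decoding objective implied by the source channel model is not spelled out in the transcript. In its standard form, which follows from Bayes' rule, it reads as below; P(F) is dropped because it does not depend on E.

```latex
\hat{E} \;=\; \arg\max_{E} P(E \mid F)
        \;=\; \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        \;=\; \arg\max_{E} P(E)\, P(F \mid E)
```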

  13. Modeling p(F | E) with alignment
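The equation from this slide did not survive in the transcript. The standard way to introduce a hidden word alignment is to sum over all alignments, as sketched below. The notation is an assumption: l and m are the lengths of E and F, and a_j = i means that the j-th foreign word is aligned to the i-th English word, with i = 0 standing for an empty (NULL) word.

```latex
P(F \mid E) \;=\; \sum_{a} P(F, a \mid E),
\qquad a = a_1 \dots a_m,\quad a_j \in \{0, 1, \dots, l\}
```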

  14. Modeling Model 1: Model 2: • Parameters: • Length prob: P(m | l) • Translation prob: t(fj | ei) • Distortion prob (for Model 2): d(i | j, m, l)
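The two model equations are missing from the transcript. Using the parameters listed on the slide, the usual textbook forms of IBM Models 1 and 2 are as follows; this is a reconstruction under the assumption that l = |E|, m = |F|, and e_0 is the NULL word, not a copy of the original slide.

```latex
\text{Model 1:}\quad
P(F \mid E) \;=\; \frac{P(m \mid l)}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

\text{Model 2:}\quad
P(F \mid E) \;=\; P(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```

Model 1 is the special case of Model 2 in which the distortion probability is uniform, d(i | j, m, l) = 1 / (l + 1).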

  15. Training • Model 1:
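The training formulas for this slide are missing from the transcript. As an illustration, here is a minimal sketch of EM training for the Model 1 translation table t(f | e), assuming a tokenized sentence-aligned bitext, uniform initialization, and a `"NULL"` token prepended to every English sentence. The function name `train_model1` and the data layout are hypothetical.

```python
from collections import defaultdict


def train_model1(bitext, iterations=10):
    """Estimate t(f | e) with EM.

    bitext: list of (english_tokens, foreign_tokens) pairs.
    Returns a dict-like table keyed by (f, e).
    """
    f_vocab = {f for _, fs in bitext for f in fs}
    # Uniform initialization of t(f | e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e
        # E-step: distribute each foreign word over the English words.
        for es, fs in bitext:
            es = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a tiny bitext such as "the house / das Haus", "the book / das Buch", "a book / ein Buch", the table moves toward pairing "the" with "das" and "book" with "Buch" after a few iterations.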

  16. Finding the best alignment Given E and F, we are looking for Model 1:
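The equations for this slide are missing from the transcript. In the usual formulation, the best alignment is the a that maximizes P(a | E, F), which is equivalent to maximizing P(F, a | E). Under Model 1 the alignment decisions are independent, so each foreign word f_j simply picks the English position i that maximizes t(f_j | e_i). A minimal sketch, assuming t is a plain dictionary keyed by (f, e) and position 0 is the NULL word; the function name is hypothetical:

```python
def best_alignment(es, fs, t):
    """Most probable Model 1 alignment.

    Returns a list a where a[j] is the index into ['NULL'] + es
    that fs[j] is aligned to (0 means NULL).
    """
    es = ["NULL"] + es
    return [max(range(len(es)), key=lambda i: t.get((f, es[i]), 0.0))
            for f in fs]
```

Because each word is handled independently, no search over whole alignments is needed for Model 1; this no longer holds for the higher IBM models.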

  17. Clump-based SMT • The unit of translation is a clump. • Training stage: • Word alignment • Extracting clump pairs • Decoding stage: • Try all segmentations of the src sent and all the allowed permutations • For each src clump, try TopN tgt clumps • Prune the hypotheses
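The "extracting clump pairs" step of training can be sketched as follows. This is a simplified illustration, assuming a word alignment given as a set of (source index, target index) links and the usual consistency criterion (no word inside the pair is linked to a word outside it). It only returns the tightest target span for each source span and does not extend spans over unaligned words. The function name and the `max_len` parameter are hypothetical.

```python
def extract_clump_pairs(src, tgt, alignment, max_len=3):
    """Extract clump pairs consistent with a word alignment.

    src, tgt: token lists; alignment: set of (i, j) links src[i] ~ tgt[j].
    Returns a set of (source_clump, target_clump) strings.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: nothing in the target span links outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

At decoding time, a table built from such pairs supplies the candidate target clumps for each source clump.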

  18. Transfer-based MT • Analysis, transfer, generation: • Example: (Quirk et al., 2005) • Parse the source sentence • Transform the parse tree with transfer rules • Translate source words • Get the target sentence from the tree • Translation as parsing: • Example: (Wu, 1995)
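The parse, transform, translate, read-off pipeline can be illustrated with a toy example. This is only a sketch of the general idea, not the method of (Quirk et al., 2005): the tree encoding, the single reordering rule (verb before object becomes object before verb), and the upper-case "lexicon" are all made up for illustration.

```python
# Trees are nested tuples: (label, child, ...); leaves are (tag, word).

def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)


def transfer(tree, rules):
    """Reorder children top-down; rules map (label, child labels) to a new order."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    key = (label, tuple(c[0] for c in children))
    order = rules.get(key, range(len(children)))
    return (label, *[transfer(children[i], rules) for i in order])


def translate_leaves(tree, lexicon):
    """Replace each source word by its target word (unknown words are kept)."""
    if is_leaf(tree):
        return (tree[0], lexicon.get(tree[1], tree[1]))
    label, *children = tree
    return (label, *[translate_leaves(c, lexicon) for c in children])


def yield_of(tree):
    """Read the target sentence off the tree, left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [w for c in tree[1:] for w in yield_of(c)]


# Hypothetical example: an SVO parse, one SVO -> SOV rule, a toy lexicon.
tree = ("S", ("NP", ("N", "John")),
             ("VP", ("V", "eats"), ("NP", ("N", "apples"))))
rules = {("VP", ("V", "NP")): [1, 0]}
lexicon = {"John": "JOHN", "eats": "EATS", "apples": "APPLES"}
target = yield_of(translate_leaves(transfer(tree, rules), lexicon))
```

With these inputs, `target` is `["JOHN", "APPLES", "EATS"]`: the object has moved in front of the verb and each word has been replaced by its lexicon entry.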

  19. Hybrid approaches • Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005) • Postprocessing with taggers, parsers, etc.: JHU 2003 workshop • Hierarchical phrase-based model: (Chiang, 2005) • …

  20. Other topics

  21. Other issues • Resources • MT for low-density languages • Using comparable corpora and Wikipedia • Special translation modules • Identifying and translating named entities and abbreviations • …

  22. To build an MT system (1) • Gather resources • Parallel corpora, comparable corpora • Grammars, dictionaries, … • Process data • Document alignment, sentence alignment • Tokenization, parsing, …

  23. To build an MT system (2) • Modeling • Training • Word alignment and extracting clump pairs • Learning transfer rules • Decoding • Identifying entities and translating them with special modules (optional) • Translation as parsing, or parse + transfer + translation • Segmenting src sentence, replace src clump with target clump, …

  24. To build an MT system (3) • Post-processing • System combination • Reranking • Using the system for other applications: • Cross-lingual IR • Computer-assisted translation • …

  25. Misc • Grades • Assignments (hw1-hw3): 30% • Class participation: 20% • Project: • Presentation: 25% • Final paper: 25%
