11-734 Advanced MT Seminar, Spring 2008
Instructors: Alon Lavie and Stephan Vogel
Course Objectives
• Objective: Study and review in depth a selection of important research topics in current state-of-the-art MT
• Main Focus: Data-driven search-based MT approaches
  • MT resources are primarily acquired automatically from large volumes of monolingual and bilingual corpora
  • Translation process is framed as a computational search optimization problem, driven by various statistical “models” and ML-based features
• Other important or related topics may also be explored
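The search framing in the objectives above is often written as a log-linear model. The following is a sketch only; the symbols f, e, h_i, λ_i and M are notation assumed here for illustration and do not appear in the slides:

```latex
% Assumed notation (not from the slides):
%   f         source sentence
%   e         candidate translation
%   h_i       feature functions (e.g. log translation-model or log language-model scores)
%   \lambda_i feature weights, M the number of features
\hat{e} = \arg\max_{e} \sum_{i=1}^{M} \lambda_i \, h_i(e, f)
```

Under this reading, decoding is the search for the highest-scoring e, and parameter tuning (for example MERT, listed among the topics) is one way of setting the weights λ_i.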
Course Format
• Course Format: Graduate-level Seminar
  • Stephan and Alon will present a few introductory lectures
  • Students will present and lead the remaining lectures and discussions
• Individual Student Tasks:
  • Select and define a specific research topic
  • Identify 1-2 basic research papers (for everyone to read)
  • Conduct a broad literature review of the topic
  • Prepare a class presentation on the topic and lead class discussion
  • Write a 10-15 page literature review “white paper” on the current state of the research topic and on its future directions
Course Format
• Requirements and Expectations:
  • 1-2 basic readings for each topic should be announced at least one week in advance
  • Everyone is expected to attend all class meetings, read the 1-2 basic papers before class, and prepare questions
  • Student Presentations: present an overview of the topic in class (not just the basic papers) and lead the discussion about important issues, open research questions, future directions, etc.
  • White Papers will be due towards the end of the semester
• Grading:
  • 40% Presentation
  • 40% White Paper
  • 20% Class Participation
Preliminary List of Topics
• Models and Approaches for Word, Phrase and Structure Alignment:
  • Hierarchical Alignment Models: ITG-style, Hiero-style, and syntax-based models (tree-to-string, string-to-tree, tree-to-tree)
  • Discriminative Alignment Models
  • Constrained Alignment Models
  • Methods for Phrase Extraction from word-aligned parallel data
  • Methods for Rule Extraction from parsed and word-aligned parallel data
• Word Reordering Models:
  • Word and phrase-based, POS-based, syntax-based
Preliminary List of Topics
• Search-based Decoding:
  • Basic decoding algorithms, computational complexity and efficiency issues
  • Decoders for various “flavors” of data-driven search-based MT
  • Optimization issues, monotonicity, pruning, hypothesis recombination
• Language Modeling for MT:
  • Very large scale statistical LMs: technical challenges and solutions
  • Domain and Genre adaptation
  • Syntactic LMs
  • Discriminative LMs, Factored LMs, “unconventional” approaches
• Architecture and Design of Large-scale MT systems:
  • Training methods and tools
  • MERT and parameter tuning
  • Runtime architectures, online vs. offline systems
Preliminary List of Topics
• Morphology and Word Segmentation and their integration within MT:
  • Morphological analysis and generation tools
  • Integrating morphological processing within MT
  • Input segmentation issues, ambiguity and confusion networks
• Multi-Engine MT and System Combination Approaches
• MT Evaluation:
  • Automatic metrics for MT evaluation; methods for assessing MT eval metrics, strengths and weaknesses
  • Human evaluation, Subjective and Objective metrics, Confidence scores
  • Evaluation campaigns and how they are conducted
• Online Translation Services and how they work:
  • Google, Babelfish, MS Word tools, instant messaging
Tentative Schedule
• Jan 16: Organization + Stephan: Basic Word Alignment Models
• Jan 23: Stephan: Word Alignment Models, Phrase Extraction methods
• Jan 30: Stephan and/or Alon: TBD (Decoding basics? MT Evaluation?)
• Feb 6: Student #1
• Feb 13: Student #2
• Feb 20: NO CLASS (Stephan and Alon away)
• Feb 27: Student #3
• Mar 5: Student #4
• Mar 12: NO CLASS (Spring Break)
• Mar 19: Student #5
• Mar 26: Student #6
• Apr 2: Student #7
• Apr 9: NO CLASS (GALE PI Meeting)
• Apr 16: Student #8
• Apr 23: Student #9
• Apr 30: Student #10
Task #1
• By next week’s class meeting (Wed 1/23):
  • Select a research topic
  • Write a one-page description that outlines and scopes your selected research topic, and lists 1-2 basic readings on the topic
  • Email Alon and Stephan your one-page description, plus three preferred presentation dates
• Act Fast! We will coordinate topic selections and presentation date preferences primarily by logical order and by receipt time
Students and Topics
• Abhaya Agarwal: Discriminative Methods for Training Translation Models
• Aaron Phillips: Methods for Context Incorporation in MT (preferred dates: 3/05, 2/27, 3/19)
• Jason Adams: WSD and its Integration within MT (preferred dates: 3/26, 2/27, 2/20)
• Alok Parlikar: Phrase-based SMT and Solutions to the ‘Out of Order’ Problem (preferred dates: 3/19, 3/05, 2/27)
• Amr Ahmed: Syntax-based Machine Translation Models (preferred dates: 3/26, 4/02, 4/16)
• Eric Davis: Morphology and Segmentation Issues in MT (preferred dates: 2/06, 2/13, 2/27)
• Greg Hanneman: Towards Syntactically-Constrained Statistical Word Alignment (preferred dates: 4/16, 4/23, 4/30)
• Linh Nguyen: Morphology and Word Segmentation and their integration within MT (preferred dates: 3/05 or later)
• Qin Gao: Large Scale Architecture for MT Systems (preferred dates: 3/05, 2/27, 4/02)
• Vamshi Ambati: Dependency Structures in Syntax-oriented Machine Translation (preferred dates: 3/19, 3/26, 4/02)
• Rashmi Gangadharaiah: Factored and Syntactic Language Models (preferred dates: 4/02, 3/19, 3/05)
Proposed Schedule
• Jan 16: Organization + Stephan: Basic Word Alignment Models
• Jan 23: Stephan: Word Alignment Models, Phrase Extraction methods
• Jan 30: Stephan: Decoding basics
• Feb 6: Student #1: Eric Davis – Morphology and/or Segmentation
• Feb 13: Student #2: Linh Nguyen – Morphology and/or Segmentation
• Feb 20: NO CLASS (Stephan and Alon away)
• Feb 27: Student #3: Jason Adams – WSD in MT
• Mar 5: Student #4: Aaron Phillips – Incorporating Context in MT
• Mar 12: NO CLASS (Spring Break)
• Mar 19: Student #5: Alok Parlikar – Reordering in Phrase-based SMT
• Mar 26: Student #6: Amr Ahmed – Syntax-based Models and their training
• Apr 2: Student #7: Vamshi Ambati – Dependency Structures in MT
• Apr 9: NO CLASS (GALE PI Meeting)
• Apr 16: Student #8: Rashmi Gangadharaiah – Factored and Syntax-based LMs
• Apr 23: Student #9: Greg Hanneman – Syntactically-constrained WA
• Apr 30: Student #10: Qin Gao – Large-scale MT Architectures
• May 7: Student #11: Abhaya Agarwal – Discriminative Training Methods
MT Lunch Slots
• Currently held reservations (all on Tuesdays):
  • Feb 19 (Alon and Stephan away)
  • Mar 18 (12:30-2:00)
  • Apr 22 (12:30-2:00)
  • May 20
  • Jun 17