
AMTEXT: Extraction-based MT for Arabic

Alon Lavie, Jaime Carbonell. Language Technologies Institute, Carnegie Mellon University. Email: {alavie,jgc}@cs.cmu.edu. Project Members: Laura Kieras, Peter Jansen. Informant: Loubna El Abadi.


Presentation Transcript


  1. AMTEXT: Extraction-based MT for Arabic
     Alon Lavie, Jaime Carbonell
     Language Technologies Institute, Carnegie Mellon University
     Email: {alavie,jgc}@cs.cmu.edu
     Project Members: Laura Kieras, Peter Jansen
     Informant: Loubna El Abadi

  2. Objective
     • Develop a framework for high-accuracy MT of extracted entities, objects and their relationships, which is:
       • Rapidly portable and adaptable to new source languages
       • Easily expandable to new types of entities and relationships
     ITIC MT Integration Meeting

  3. AMTEXT Approach
     • Develop an elicitation corpus specifically designed for targeted extraction patterns
     • Learn generalized transfer rules for targeted extraction patterns from elicitation corpus
     • Acquire high-accuracy Named-Entity translation lexicon + limited translation lexicon for targeted vocabulary
     • Runtime: use partial parser + transfer rules to translate only the matched portions of SL text

  4. Elicitation Example

  5. Learning Transfer Rules
     • Different notion of rule generalization than in our full XFER approach
     • Generalize from examples to NEs that play specific roles in target extraction pattern
     • Verbs and function words may not be generalized
     • Example:
       Peres will meet with Bush today
       peres yipagesh &im bush hayom
       Goal Rule:
       S::S [NE-P yipagesh &im NE-P TE] -> [NE-P will meet with NE-P TE]
       ((X1::Y1) (X4::Y5) (X5::Y6))
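The goal rule on this slide can be read as a source pattern, a target pattern, and a list of alignments between positions. The following is a minimal sketch of applying such a rule to an already-tagged token sequence. The `TransferRule` class, the `apply_rule` function, the `V` and `P` tags, and the tiny lexicon are all hypothetical names introduced for illustration; they are not the AMTEXT or XFER implementation.

```python
from dataclasses import dataclass

# Tags that act as generalized slots in a rule (taken from the slide's rule).
SLOT_TAGS = {"NE-P", "TE"}


@dataclass(frozen=True)
class TransferRule:
    source: tuple      # source-side pattern: slot tags or literal words
    target: tuple      # target-side pattern: slot tags or literal words
    alignments: tuple  # 1-based (source_index, target_index) pairs


# The goal rule from the slide, written out as data.
GOAL_RULE = TransferRule(
    source=("NE-P", "yipagesh", "&im", "NE-P", "TE"),
    target=("NE-P", "will", "meet", "with", "NE-P", "TE"),
    alignments=((1, 1), (4, 5), (5, 6)),
)


def apply_rule(rule, tokens, lexicon):
    """Translate `tokens` (a list of (word, tag) pairs) with `rule`.

    Returns the target string, or None if the tokens do not match the
    source side of the rule exactly.
    """
    if len(tokens) != len(rule.source):
        return None
    for (word, tag), item in zip(tokens, rule.source):
        if item in SLOT_TAGS:
            if tag != item:       # slots match on the tag
                return None
        elif word != item:        # verbs and function words match literally
            return None
    output = list(rule.target)
    for src_idx, tgt_idx in rule.alignments:
        word = tokens[src_idx - 1][0]
        # Fill each aligned slot from the (hypothetical) translation lexicon.
        output[tgt_idx - 1] = lexicon.get(word, word)
    return " ".join(output)


# Hypothetical lexicon entries and tags, for illustration only.
LEXICON = {"peres": "Peres", "bush": "Bush", "hayom": "today"}
TOKENS = [("peres", "NE-P"), ("yipagesh", "V"), ("&im", "P"),
          ("bush", "NE-P"), ("hayom", "TE")]

print(apply_rule(GOAL_RULE, TOKENS, LEXICON))  # Peres will meet with Bush today
```

Only the aligned positions are filled from the lexicon; the verb and function words come from the rule itself, which mirrors the slide's point that they are kept literal.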

  6. Partial Parsing
     • Input: Full text in the foreign language
     • Output: Translation of extracted/matched text
     • Goal: Extract by effectively matching transfer rules with the full text
     • Identify/parse NEs and words in restricted vocabulary
     • Identify transfer-rule (source-side) patterns
     • Handle expected high levels of ambiguity
     • Example:
       Peres, meluve b-sar ha-xuc shalom, yipagesh im bush hayom
       NE-P NE-P NE-P TE
       Peres will meet with Bush today
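One way to picture the matching step is as an in-order search for the rule's source pattern inside the full tagged sentence, skipping tokens that the rule does not cover. The sketch below assumes that skipping is allowed anywhere between pattern items, which the slide does not state. The `partial_match` function, the `OTHER`, `V` and `P` tags, and the tokenization of the example sentence are hypothetical.

```python
# Tags that act as generalized slots in a rule (taken from the slide).
SLOT_TAGS = {"NE-P", "TE"}


def item_matches(item, token):
    """A slot matches on the tag; any other pattern item matches the word."""
    word, tag = token
    return tag == item if item in SLOT_TAGS else word == item


def partial_match(pattern, tokens):
    """Return the indices of the tokens that match `pattern` in order.

    Tokens that do not match the next pattern item are skipped.
    Returns None if the pattern cannot be completed.
    """
    indices = []
    pos = 0
    for item in pattern:
        while pos < len(tokens) and not item_matches(item, tokens[pos]):
            pos += 1              # skip material outside the pattern
        if pos == len(tokens):
            return None
        indices.append(pos)
        pos += 1
    return indices


# The slide's example sentence, with assumed tokenization and tags.
SENTENCE = [
    ("peres", "NE-P"), ("meluve", "OTHER"), ("b-sar", "OTHER"),
    ("ha-xuc", "OTHER"), ("shalom", "NE-P"),
    ("yipagesh", "V"), ("im", "P"), ("bush", "NE-P"), ("hayom", "TE"),
]
PATTERN = ("NE-P", "yipagesh", "im", "NE-P", "TE")

print(partial_match(PATTERN, SENTENCE))  # [0, 5, 6, 7, 8]
```

The matched indices pick out "peres", "yipagesh", "im", "bush" and "hayom", which is the portion translated on the slide. A real system would need constraints on what may be skipped, since unrestricted skipping is one source of the ambiguity the slide mentions.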

  7. Input/Output
     • Input:
       • Full text in source language (Arabic)
     • Output:
       • English translation of extracted entities and relationships
       • (Possibly also a structured representation)
     Example (input → AMTEXT System → output):
     أعلنت صحيفة القدس العربي ومقرها لندن أنها تلقت الأحد بيانا يتبنى فيه تنظيم القاعدة بزعامة أسامة بن لادن الهجومين اللذين استهدفا كنيسين يهوديين في إسطنبول واللذين أسفرا عن مقتل 23 شخصا وإصابة 300 آخرين. وهدد البيان بتوجيه مزيد من الضربات للولايات المتحدة وحلفائها في جميع أنحاء العالم.
     The Abu Hafz al-Masri Brigades - al-Qaida warned car bombs killed 23 people injured 300 others

  8. Scope of Pilot System
     • Arabic-to-English
     • Newswire text (available from TIDES)
     • Limited set of actions: (X meet Y) (X attend Y) (X hold Y) (X kill Y) (X announce Y)…
     • Limited translation patterns:
       • <subj-NE> <verb> <obj> <LOC>* <TE>*
     • Limited vocabulary
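The translation pattern above reads like a regular expression over tags: a subject NE, a verb, an object, then zero or more location expressions followed by zero or more temporal expressions. As a rough sketch, it can be checked with Python's `re` module over a space-joined tag sequence. The tag names `SUBJ-NE`, `VERB`, `OBJ`, `LOC` and `TE` and the function `fits_pattern` are assumptions made for this illustration.

```python
import re

# <subj-NE> <verb> <obj> <LOC>* <TE>* written over space-joined tags.
PATTERN = re.compile(r"SUBJ-NE VERB OBJ( LOC)*( TE)*")


def fits_pattern(tags):
    """True if the whole tag sequence fits the pilot translation pattern."""
    return PATTERN.fullmatch(" ".join(tags)) is not None


print(fits_pattern(["SUBJ-NE", "VERB", "OBJ"]))                # True
print(fits_pattern(["SUBJ-NE", "VERB", "OBJ", "LOC", "TE"]))   # True
print(fits_pattern(["SUBJ-NE", "VERB", "OBJ", "TE", "LOC"]))   # False
```

The third call fails because the pattern, as written on the slide, places all location expressions before any temporal expression.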

  9. Evaluation Plan
     • Compare AMTEXT approach to full-text Arabic-to-English SMT, on a limited task of translation of relations within the scope of coverage
     • Establish a test set for evaluation
     • Define an appropriate metric: Precision/Recall/F1 of relations and entities
     • Compare performance
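The slide leaves the exact metric to be defined. One simple reading, sketched below, scores a system's extracted relations against a reference set by exact match of (subject, action, object) triples. The exact-match criterion, the function name and the example triples are assumptions made for illustration; the triples are invented and are not results from the project.

```python
def precision_recall_f1(predicted, reference):
    """Score predicted relation triples against reference triples.

    Both arguments are iterables of hashable items, here
    (subject, action, object) tuples compared by exact match.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented triples, for illustration only.
REFERENCE = {("Peres", "meet", "Bush"), ("Peres", "attend", "summit")}
PREDICTED = {("Peres", "meet", "Bush"), ("Bush", "hold", "meeting")}

print(precision_recall_f1(PREDICTED, REFERENCE))  # (0.5, 0.5, 0.5)
```

The same function could score either system in the planned comparison, provided relations are first extracted from the full-text SMT output in the same triple form.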
