Named Entities in Domain Unlimited Speech Translation. Alex Waibel, Stephan Vogel, Tanja Schultz, Carnegie Mellon University, Interactive Systems Labs. Objective: Extraction and Translation of Arabic Named Entities from Speech. Problem: How do we do Domain-Unlimited Speech Translation?
Named Entities in Domain Unlimited Speech Translation
Alex Waibel, Stephan Vogel, Tanja Schultz
Carnegie Mellon University, Interactive Systems Labs
Objective
• Extraction and Translation of Arabic Named Entities from Speech
• Problem:
  • How do we do Domain-Unlimited Speech Translation?
  • What to do with Named Entities in Speech?
    • Named Entities are Typically OoVs (out-of-vocabulary words)
    • The Recognizer will Replace an OoV with a WRONG Word
    • The Named Entity is then Unlikely to be Handled Right
  • Translation of Named Entities
    • Named Entities are Frequently not in the Lexicon
ITIC MT Integration Meeting
Approach – Speech Translation
• Piggy-Back on STR-DUST (NSF-ITR Project):
  • Speech Translation on Domain Unlimited Speech Tasks
• Approach:
  • Recognition: Statistical Speech Recognition
  • Consolidation: Statistical Reduction and Extraction
  • Translation: Statistical MT
• Opportunity:
  • Cascade of Statistical Source-Channel Models
  • Integration and Optimization
  • Combine and Compute Joint Models
  • Working with Errors: Lattices to Communicate between Modules
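The "cascade of statistical source-channel models" and the "joint model" can be written out in the standard noisy-channel form. This is a textbook formulation given as a sketch, not necessarily the exact model used in STR-DUST. The notation is an assumption: x is the acoustic signal, f the Arabic word sequence, e the English word sequence, and L(x) the recognizer's word lattice. The consolidation step is left out for brevity, and the joint form assumes x is conditionally independent of e given f.

```latex
% Notation (assumed, not from the slides): x = acoustic signal,
% f = source-language (Arabic) word sequence, e = English word sequence.
\begin{align}
\hat{f} &= \arg\max_{f} \; P(f)\, P(x \mid f)
  && \text{(recognition)} \\
\hat{e} &= \arg\max_{e} \; P(e)\, P(\hat{f} \mid e)
  && \text{(translation of the 1-best transcript)} \\
\hat{e} &= \arg\max_{e} \; P(e) \sum_{f} P(f \mid e)\, P(x \mid f)
  && \text{(joint model)} \\
&\approx \arg\max_{e} \; P(e) \max_{f \in \mathcal{L}(x)} P(f \mid e)\, P(x \mid f)
  && \text{(search restricted to the lattice } \mathcal{L}(x)\text{)}
\end{align}
```

The first two lines are the plain cascade, where a recognition error in the 1-best transcript is passed on unchanged. The last two lines show why lattices help: the translation model can prefer a transcript hypothesis that the recognizer ranked lower.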
Approach – Named Entities
• Two-Pass Decoding Strategy
• OoVs in Speech:
  • Recover Named Entity in Dictionary
    • Identify Relevant Names from Very Large Name Lists
    • Search for Relevant New Names on Internet
    • Insert Named Entities in Dictionary, Iterate
  • New Word Model
    • Model Unseen Words by New-Word-Model
    • Assign Named Entity Tag to New-Word
• Bi-Lingual Named Entity Tagging
  • Recover Named Entity
    • Identify Relevant Names from Translation Output
    • IR of Relevant Texts in Target Language
    • Use Transliteration Model to Update Lexicon
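The name-recovery loop between the two decoding passes could look roughly like the sketch below. Everything in it is an illustrative assumption rather than the system described on the slide: the function names `recover_names` and `extend_vocabulary` are hypothetical, the first pass is assumed to emit a rough romanized spelling for each new-word segment, and plain string similarity from Python's `difflib` stands in for a real phonetic or transliteration match.

```python
from difflib import get_close_matches


def recover_names(first_pass_tokens, vocabulary, name_list, cutoff=0.75):
    """Map each out-of-vocabulary token to its closest name-list entry, if any."""
    recovered = {}
    for token in first_pass_tokens:
        if token in vocabulary:
            continue  # known word, nothing to recover
        # Fuzzy string match stands in for a real phonetic/transliteration match.
        matches = get_close_matches(token, name_list, n=1, cutoff=cutoff)
        if matches:
            recovered[token] = matches[0]
    return recovered


def extend_vocabulary(vocabulary, recovered):
    """Return a new vocabulary that also contains the recovered names."""
    return set(vocabulary) | set(recovered.values())


# Hypothetical first-pass output: "usama" is the new-word model's rough guess.
first_pass = ["statement", "by", "usama"]
vocabulary = {"statement", "by"}
name_list = ["osama", "baghdad", "istanbul"]

recovered = recover_names(first_pass, vocabulary, name_list)
second_pass_vocabulary = extend_vocabulary(vocabulary, recovered)
```

A second recognition pass would then decode with `second_pass_vocabulary`, and the loop could be repeated as the slide's "Iterate" suggests. The recognizer itself is outside the scope of this sketch.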
Input/Output
• Input:
  • Speech in source language (Arabic)
  • Text in source language (Arabic)
• Output:
  • English translation of transcript
  • English translation of extracted entities
Reco:
القاعدة بزعامة أسامة بن لادن الهجومين اللذين استهدفا كنيسين يهوديين في إسطنبول واللذين أسفرا عن مقتل 23 شخصا وإصابة 300 آخرين. وهدد البيان بتوجيه مزيد من الضربات للولايات المتحدة وحلفائها في جميع أنحاء العالم.
NE Search and Translation:
Name: Abu Hafz
Organization: al-Qaida
Location: Baghdad
Evaluation
• Correct Named Entity Detection
  • Word Correct from Arabic Speech
  • NE-Tag Correct from Arabic Transcript
• Correct Translation
  • Of Output Text (NIST, BLEU)
  • Of Output Named Entity
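One common way to score "correct named entity detection" is precision, recall, and F1 over (entity, tag) pairs. The slide does not say which measure was used, so the sketch below is an assumption. The function name `ne_scores` and the sample data are hypothetical, with the entity strings borrowed from the Input/Output slide.

```python
def ne_scores(reference, hypothesis):
    """Precision, recall and F1 over sets of (entity text, tag) pairs."""
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)  # pairs where both the text and the tag match
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical reference annotation and system output.
reference = [
    ("Abu Hafz", "Name"),
    ("al-Qaida", "Organization"),
    ("Baghdad", "Location"),
]
hypothesis = [
    ("Abu Hafz", "Name"),
    ("Baghdad", "Organization"),  # right word, wrong tag: counts as an error
]

precision, recall, f1 = ne_scores(reference, hypothesis)
```

Scoring the pair rather than the word alone keeps the two criteria on the slide separate: a word can be recognized correctly and still receive the wrong NE tag.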
First Results: NE Translation (Chinese)
• Online NE translation gives improvements for both tracks
• Online NE translation helps most on uncommon NEs, where it gives the larger improvement