
Speech and Language Technologies in the Next Generation Localisation CSET

Prof. Andy Way, School of Computing, DCU.


Presentation Transcript


  1. Speech and Language Technologies in the Next Generation Localisation CSET Prof. Andy Way, School of Computing, DCU

  2. Overview of Presentation • Speech & Language Technologies in the NGL CSET

  3. Overview of Presentation • Speech & Language Technologies in the NGL CSET • Facilitating Optimal Multilingual NGL Applications

  4. Overview of Presentation • Speech & Language Technologies in the NGL CSET • Facilitating Optimal Multilingual NGL Applications • Key Research Challenges

  5. Overview of Presentation • Speech & Language Technologies in the NGL CSET • Facilitating Optimal Multilingual NGL Applications • Key Research Challenges • Novel Research Tracks

  6. Overview of Presentation • Speech & Language Technologies in the NGL CSET • Facilitating Optimal Multilingual NGL Applications • Key Research Challenges • Novel Research Tracks • Typical LSP’s Translation Process

  7. Overview of Presentation • Speech & Language Technologies in the NGL CSET • Facilitating Optimal Multilingual NGL Applications • Key Research Challenges • Novel Research Tracks • Typical LSP’s Translation Process • Key Integration Challenges

  8. Overview of Presentation • Speech & Language Technologies in the NGL CSET • Facilitating Optimal Multilingual NGL Applications • Key Research Challenges • Novel Research Tracks • Typical LSP’s Translation Process • Key Integration Challenges • Concluding Remarks

  9. Unified Model • Digital Content Management • Next Generation Localisation • Systems Framework • Personalised Localisation • Enterprise Localisation • Integrated Language Technologies • ILT - Integrated Language Technologies, Prof. Andy Way, ILT Area Coordinator

  10. ILT: Facilitating Optimal Multilingual NGL Applications • Text Input • Machine Translation • Text Processing • Text Output • e.g. bulk localisation

  11. ILT: Facilitating Optimal Multilingual NGL Applications • Speech Input • Text Input • Speech Technologies • Machine Translation • Text Processing • Speech Output • Text Output • e.g. personalisation • e.g. bulk localisation

  12. Machine Translation: Significance • For our industrial partners, volume of material needing translation increasing, while budgets remain the same • In the EU, now 23 official languages (506 language pairs), and expanding … • In the US, huge investment in translation between Arabic, Chinese and Urdu → English …

  13. Machine Translation: Significance • For our industrial partners, volume of material needing translation increasing, while budgets remain the same • In the EU, now 23 official languages (506 language pairs), and expanding … • In the US, huge investment in translation between Arabic, Chinese and Urdu → English … → Automation the only option (especially for PL) …
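The figure of 506 language pairs is consistent with counting each translation direction separately: n languages give n × (n − 1) ordered source-target pairs. A quick check, where the function name is only illustrative:

```python
def directed_language_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n distinct languages."""
    return n_languages * (n_languages - 1)


# 23 official EU languages, as stated on the slide
print(directed_language_pairs(23))  # 506
```

Each additional official language adds two new directions for every language already on the list, which is why the number of pairs grows so quickly.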

  14. MT: Key Research Challenges • Enhanced Translation Quality • Faster Translation Times • Scalability • Other Modalities (Speech, SMS etc.)

  15. The State-of-the-Art Source: Reference: The two sides highlighted the role of the World Trade Organization (WTO) Baseline: The two sides on the role of the WTO

  16. Improving the State-of-the-Art Source: Reference: The two sides highlighted the role of the World Trade Organization (WTO) Baseline: The two sides on the role of the WTO Our System: The two sides reaffirmed the role of the WTO • Our MT systems have knowledge of syntax • Parts of speech (nouns, verbs etc.) • Roles in sentences (subject, object etc.) → better translation quality

  17. The State-of-the-Art Source: Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel

  18. Improving the State-of-the-Art Source: Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel → better translation quality (especially where end-users are concerned) • DCU Arabic → English system ranked first at international MT evaluation in Oct. 2007

  19. MT Novel Research: Handling Different Types of Text Translating patent applications, or doctors’ prescriptions, or visa applications: different tasks, as the content is different … So is the form …

  20. MT Novel Research: Handling Different Types of Text Translating patent applications, or doctors’ prescriptions, or visa applications: different tasks, as the content is different … So is the form … → Build different MT systems for each different task, using our industrial partners’ documentation

  21. Text Processing: Significance and Challenges • If texts are automatically annotated with: • syntactic information (e.g. subject, object), today’s MT systems can learn syntax required for improved output quality and improved processing of multilingual queries (DCM)

  22. Text Processing: Significance and Challenges • If texts are automatically annotated with: • syntactic information (e.g. subject, object), today’s MT systems can learn syntax required for improved output quality and improved processing of multilingual queries (DCM) • text-type and genre information, this helps our MT systems disambiguate text and improve translation quality

  23. Text Processing: Significance and Challenges • If texts are automatically annotated with: • syntactic information (e.g. subject, object), today’s MT systems can learn syntax required for improved output quality and improved processing of multilingual queries (DCM) • text-type and genre information, this helps our MT systems disambiguate text and improve translation quality • localisation information (e.g. <DNT>Andy Way</DNT>), then the workflows of our industrial partners (currently done manually) can be significantly improved (cf. LOC)
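One way such localisation annotations might be consumed downstream is to mask the `<DNT>` ("do not translate") spans before text reaches an MT engine and restore them afterwards. The sketch below is an assumption about how that could work, not the project's own tooling: the placeholder format `__DNTn__` and both function names are hypothetical.

```python
import re

# Matches the <DNT>...</DNT> annotation shown on the slide
DNT_PATTERN = re.compile(r"<DNT>(.*?)</DNT>", re.DOTALL)


def mask_dnt(text):
    """Replace each <DNT> span with a numbered placeholder; return text and spans."""
    protected = []

    def _swap(match):
        protected.append(match.group(1))
        return f"__DNT{len(protected) - 1}__"

    return DNT_PATTERN.sub(_swap, text), protected


def unmask_dnt(text, protected):
    """Put the protected spans back in place of their placeholders."""
    for index, original in enumerate(protected):
        text = text.replace(f"__DNT{index}__", original)
    return text


masked, spans = mask_dnt("Contact <DNT>Andy Way</DNT> at DCU.")
print(masked)                     # Contact __DNT0__ at DCU.
print(unmask_dnt(masked, spans))  # Contact Andy Way at DCU.
```

This only works if the MT system passes the placeholders through unchanged, which would have to be verified for any real engine.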

  24. Speech Technology: Significance • Speech interfaces for eyes-busy, hands-busy scenarios • Speech recognition and synthesis systems which can deal with • potentially an unlimited vocabulary • multiple (and non-native) speakers • multiple languages and can be tightly integrated with MT → access → volume & scalability → localisation & personalisation

  25. Speech Technology: Challenges • themoreitsnowsthemoreitgoes • them ore its nows them ore it goes? • themo rei tsn ow sthe mo reitg o es? • the more it snows the more it goes…

  26. Speech Technology: Challenges • themoreitsnowsthemoreitgoes • demoreisnowsdemoregoes • them ore its nows them ore it goes? • themo rei tsn ow sthe mo reitg o es? • the more it snows the more it goes…

  27. Speech Technology: Challenges • themoreitsnowsthemoreitgoes • demoreisnowsdemoregoes • them ore its nows them ore it goes? • themo rei tsn ow sthe mo reitg o es? • the more it snows the more it goes… • “rules” and vocabulary of system • performance of (native) speaker • linguistic competence of native speaker
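The ambiguity illustrated on these slides can be reproduced with a toy word segmenter: given an unsegmented string and a vocabulary, enumerate every way of splitting the string into known words. This is a minimal sketch; the lexicon is invented for illustration, and a real recogniser would work on acoustic input and score candidates rather than list them.

```python
from functools import lru_cache

# Toy vocabulary, assumed purely for this illustration
LEXICON = {"the", "them", "more", "ore", "it", "its", "snows", "nows", "goes"}


def segmentations(text):
    """Return every split of `text` into words that all appear in LEXICON."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # An exhausted string has exactly one segmentation: the empty one.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in LEXICON:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [" ".join(words) for words in split_from(0)]


candidates = segmentations("themoreitsnowsthemoreitgoes")
for candidate in candidates:
    print(candidate)
```

With this small lexicon the list includes both "the more it snows the more it goes" and "them ore its nows them ore it goes"; choosing between such candidates is where the system's "rules" and vocabulary come in.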

  28. Speech Technology: Innovations • Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge • themoreitsnowsthemoreitgoes • demoreisnowsdemoregoes • them ore its nows them ore it goes? • themo rei tsn ow sthe mo reitg o es? • the more it snows the more it goes… • “rules” and vocabulary of system • linguistic competence of native speaker • performance of (native) speaker

  29. Innovations: Speech Recognition & MT • Tight coupling with MT Engines • Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge • themoreitsnowsthemoreitgoes • Jemehreschneitdestomehres geht • detverkarhavaritenstorstormhurmån • them ore its nows them ore it goes? • themo rei tsn ow sthe mo reitg o es? • the more it snows the more it goes… • “rules” and vocabulary of system • linguistic competence of native speaker

  30. Innovations: MT & Speech Synthesis • Tight coupling with MT Engines • Robust & Novel Speech Synthesis Engine which integrates explicit linguistic knowledge • themoreitsnowsthemoreitgoes • Jemehreschneit destomehres geht • detverkarhavaritenstorstormhurmån

  31. Typical LSP’s Translation Process • Incoming documents (segmented) • Step 1: Translation Memory • Translation Memory DB • Partially Translated Documents, with confidence rating for segments • Step 2: Post-editing & translation • In-house Translators • Freelance Translators • Step 3: Documents Validation & Finalization • TM match score < 50 %: expensive • 50 % < TM match score < 70 %: medium • TM match score > 70 %: cheap • Requirement: minimal disruption of this process & Machine Translation
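The three cost bands on this slide can be read as a simple threshold rule over the TM match score. The sketch below only illustrates that rule: the function names, the 0-100 score range and the treatment of scores of exactly 50 % or 70 % (which the slide leaves open) are assumptions.

```python
from collections import Counter


def cost_class(tm_match_score):
    """Map a TM match score (assumed to be a percentage, 0-100) to a cost band."""
    if not 0 <= tm_match_score <= 100:
        raise ValueError("TM match score must lie between 0 and 100")
    if tm_match_score < 50:
        return "expensive"
    # Assumption: a score of exactly 50 is treated as medium.
    if tm_match_score < 70:
        return "medium"
    # Assumption: a score of exactly 70 is treated as cheap.
    return "cheap"


def summarise(scores):
    """Count how many segments fall into each cost band."""
    return Counter(cost_class(score) for score in scores)


# Hypothetical match scores for five segments of one document
print(summarise([12.0, 55.5, 68.0, 71.0, 99.0]))
```

Seen this way, any technique that raises a segment's effective match score moves it into a cheaper band without changing the rest of the workflow.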

  32. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008]

  33. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008] • Linking MT automatic evaluation metrics with post-editing cost

  34. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008] • Linking MT automatic evaluation metrics with post-editing cost • Ensuring that MT omissions are highlighted

  35. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008] • Linking MT automatic evaluation metrics with post-editing cost • Ensuring that MT omissions are highlighted • Enforcing customer terminology

  36. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008] • Linking MT automatic evaluation metrics with post-editing cost • Ensuring that MT omissions are highlighted • Enforcing customer terminology • Deal with markup, tags …

  37. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008] • Linking MT automatic evaluation metrics with post-editing cost • Ensuring that MT omissions are highlighted • Enforcing customer terminology • Deal with markup, tags … • Produce true-cased translations

  38. Key Integration Challenges • Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008] • Linking MT automatic evaluation metrics with post-editing cost • Ensuring that MT omissions are highlighted • Enforcing customer terminology • Deal with markup, tags … • Produce true-cased translations • Integrate into pre-existing workflows!
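As an illustration of the true-casing item, one common baseline is to learn each word's most frequent surface form from cased text and apply it to lowercased MT output. The sketch below is such a frequency baseline, not the approach used in the project; the function names and the two training sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent casing of each word from cased sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Skip the first token: its capital letter is positional, not lexical.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(lowercased, model):
    """Restore learned casing and capitalise the sentence-initial token."""
    tokens = [model.get(token, token) for token in lowercased.split()]
    if tokens:
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


model = train_truecaser([
    "The two sides reaffirmed the role of the WTO",
    "Delegates praised the WTO today",
])
print(truecase("the two sides on the role of the wto", model))
```

Words unseen in training are left as they are, so the quality of this baseline depends heavily on how well the cased training text covers the customer's terminology.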

  39. Concluding Remarks • For ILT, ramp up almost complete, c. over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students

  40. Concluding Remarks • For ILT, ramp up almost complete, c. over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students • Large interest from industrial partners, both large and small

  41. Concluding Remarks • For ILT, ramp up almost complete, c. over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students • Large interest from industrial partners, both large and small • Input from LOC, DCM and SF

  42. Concluding Remarks • For ILT, ramp up almost complete, c. over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students • Large interest from industrial partners, both large and small • Input from LOC, DCM and SF • Significant role in CNGL demonstrators

  43. Concluding Remarks • For ILT, ramp up almost complete, c. over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students • Large interest from industrial partners, both large and small • Input from LOC, DCM and SF • Significant role in CNGL demonstrators • Research tools → Industrial prototypes

  44. Concluding Remarks • For ILT, ramp up almost complete, c. over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students • Large interest from industrial partners, both large and small • Input from LOC, DCM and SF • Significant role in CNGL demonstrators • Research tools → Industrial prototypes • Well placed to succeed in going ‘beyond TMs’ …

  45. Speech & Language Technologies in the NGL CSET Thanks for listening! Questions? http://www.cngl.ie away@computing.dcu.ie
