
Survey of Speech-to-speech Translation Systems: Who are the players


Presentation Transcript


  1. Survey of Speech-to-speech Translation Systems: Who are the players Joy (Ying Zhang) Language Technologies Institute Carnegie Mellon University

  2. Major Players And many others ….

  3. Major Speech Translation Systems

  4. Who is doing what in the co-op projects?

  5. AT&T “How May I Help You” • Spanish-to-English • MT: transnizer • A transnizer is a stochastic finite-state transducer that integrates the language model of a speech recognizer and the translation model into one single finite-state transducer • Directly maps source language phones into target language word sequences • One step instead of two • Demo
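The one-step idea behind the transnizer can be illustrated with a toy weighted mapping: instead of recognizing phones into source words and then translating, a single stochastic transducer maps source-language phone sequences directly to target phrases. The sketch below is hypothetical (the phone strings, phrases, and probabilities are invented, and a real transnizer is a full finite-state transducer, not a lookup table), but it shows the collapsed one-step decoding:

```python
# Minimal sketch (not AT&T's implementation) of the "transnizer" idea:
# one weighted mapping from source phone sequences straight to target
# phrases. Weights are negative log probabilities; decoding picks the
# lowest-cost entry. All data here is hypothetical.
import math

# One-step mapping: source phone sequence -> [(target phrase, probability)]
TRANSNIZER = {
    ("o", "l", "a"): [("hello", 0.9), ("hi", 0.1)],
    ("g", "r", "a", "s", "i", "a", "s"): [("thank you", 0.95), ("thanks", 0.05)],
}

def decode(phones):
    """Return the most probable target phrase for a phone sequence, or None."""
    candidates = TRANSNIZER.get(tuple(phones), [])
    if not candidates:
        return None
    # cost = -log(p); the smallest cost corresponds to the highest probability
    return min(candidates, key=lambda wp: -math.log(wp[1]))[0]
```

In a real system the language model and translation model are composed into one transducer and decoded with a shortest-path search, so the "lookup" happens implicitly over all paths.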

  6. MIT Lincoln Lab • Two way Korean/English speech translation • Translation system: interlingua (Common Coalition Language)

  7. MIT Lincoln Lab

  8. NEC • Stand-alone version [Isotani03] • Client/server (C/S) version as in [Yamabana, ACL03]

  9. NEC • Special issues in ASR: • To reduce memory requirements • Gaussian reduction based on MDL [Shinoda, ICASSP2002] • Global tying of the diagonal covariance matrices of Gaussian mixtures • To reduce calculation time • Construct a hierarchical tree of Gaussians • Leaf nodes correspond to the Gaussians in the HMM states • Parent-node Gaussians cover the Gaussians of their child nodes • Probability calculation for an input feature vector does not always need to reach the leaves • 10 times faster with minimal loss of accuracy
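The hierarchical-tree speedup can be sketched as follows: score the coarse parent Gaussians first, then expand only the best parent's children, so most leaf Gaussians are never evaluated. This is an illustrative sketch under simplifying assumptions (1-D diagonal Gaussians, a two-level tree, a single best subtree expanded), not NEC's actual implementation:

```python
# Sketch of hierarchical Gaussian evaluation: parents approximate their
# children, and only the winning parent's subtree is descended into.
# Assumptions: diagonal covariance, a 2-level tree, hypothetical data.
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian, summed over dimensions."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

# node = (mean, var, children); leaves have children == []
ROOTS = [
    ((0.0,), (1.0,), [((-0.5,), (1.0,), []), ((0.5,), (1.0,), [])]),
    ((10.0,), (1.0,), []),
]

def best_leaf_score(x, roots):
    """Evaluate parents, then only the best parent's children."""
    scored = [(log_gauss(x, m, v), kids) for m, v, kids in roots]
    best_score, kids = max(scored, key=lambda s: s[0])
    if not kids:                    # stop early: the parent score stands in
        return best_score
    return max(log_gauss(x, m, v) for m, v, _ in kids)
```

With many mixtures, pruning whole subtrees this way is what yields the roughly 10x speedup the slide cites, at the cost of occasionally missing the true best leaf.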

  10. NEC • Translation module

  11. NEC • Lexicalized Tree-Automata-based Grammars

  12. NEC • Translation procedure • Morphological analysis to build the initial word lattice • Load the feature structures and the tree automata • The parser performs left-to-right, bottom-up chart parsing (breadth-first) • Choose the best path • Top-down generation • Pack trees for a compact translation engine • 8MB for loading the translation model • 1~4MB working memory
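To make the "bottom-up chart parsing" step concrete, here is a toy CKY-style chart parser. It is only an illustration of the parsing style the slide names: the binary grammar and lexicon are invented, and NEC's actual system parses with lexicalized tree-automata grammars over word lattices, not plain strings:

```python
# Toy bottom-up chart (CKY) parser, breadth-first by span length.
# Grammar and lexicon are hypothetical, for illustration only.
GRAMMAR = {                      # binary rules: (B, C) -> A
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {"the": "Det", "dog": "N", "saw": "V", "cat": "N"}

def cky(words):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):            # fill in lexical categories
        chart[i][i + 1].add(LEXICON[w])
    for span in range(2, n + 1):             # breadth-first: short spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):        # try every split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        if (b, c) in GRAMMAR:
                            chart[i][j].add(GRAMMAR[(b, c)])
    return chart[0][n]                       # categories spanning the input
```

A full parse succeeds when the start symbol (here "S") covers the whole input; "choose the best path" then amounts to scoring the competing chart entries.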

  13. NEC Translation Example [Watanabe, ICSLP00]

  14. NEC • Implementation issues • 27MB to load the system • 1~4MB working memory • OS (PocketPC) limits memory to 32MB • Runs on PDAs with a StrongARM 206 MHz CPU • Delay of several seconds in ASR • Accuracy • ASR: 95% for Japanese, 87% for English • Translation • J->E: 66% Good, 88% Good+OK • E->J: 74% Good, 90% Good+OK

  15. Phraselator • Demo

  16. Phraselator • Major challenges are not from ASR • Tough environment • Power needs to last for hours • Batteries can be charged from 12/24 V DC and 110/220 V AC • Critical human-engineering criteria • Audio system allows full-range frequency response from the mic through the CODEC and back out to the speaker

  17. PF-STAR • Preparing Future Multisensorial Interaction Research • Crucial areas: • Speech-to-speech translation • Detection and expression of emotional states • Core speech technologies for children • Participants: ITC-irst, RWTH, UERLN, KTH, UB, CNR ISTC-SPFD

  18. TC-STAR_P • To prepare a future integrated project named "Technology and Corpora for Speech to Speech Translation" (TC-STAR) • Objectives: • Elaborating roadmaps on SST • Strengthening the R&D community • Industrial, academic, and infrastructure entities • Building up the future TC-STAR management structure • Participants: • ELDA, IBM, ITC-irst, KUN, LIMSI-CNRS, Nokia, NSC, RWTH, Siemens, Sony, TNO, UKA, UPC

  19. LC-STAR • Launched: Feb 2002 • Focus: creating language resources for speech translation components • Flexible vocabulary speech recognition • High quality text-to-speech synthesis • Speech centered translation • Objective: • To make large lexica available for many languages that cover a wide range of domains along with the development of standards relating to content and quality

  20. LC-STAR • Drawbacks of existing LR • Lack of coverage for application domains • Lack of suitability for synthesis and recognition • Lack of quality control • Lack of standards • Lack of coverage in languages • Mostly limited to research purposes (lc-star, eurospeech 93)

  21. LC-STAR • For speech-to-speech translation • Focus: statistical approaches using suitable LR • “Suitable” LR • Aligned bilingual text corpora • Monolingual lexica with morpho-syntactic information

  22. LC-STAR • List of languages and responsible sites • Other partners: SPEX (Speech Processing Expertise) and CST (Center for Sprogteknologi)

  23. LC-STAR • Progress and Schedule • Design of Specifications • Corpora collections • Phase I: build large lexica for ASR and TTS • Phase II: • Can MT benefit from linguistic features in bilingual lexica (RWTH) • Define specification for bilingual lexica • Create special speech-to-speech translation lexica

  24. EuTrans • Sponsor: European Commission ESPRIT programme • Participants: • RWTH Aachen University, Germany • Research center of the Fondazione Ugo Bordoni, Italy • ZERES GmbH, a German company • Universitat Politècnica de València, Spain • Project stages: • First stage (1996, six months): to demonstrate viability • Second stage (1997-2000, three years): developed methodologies to address everyday tasks

  25. EuTrans • Features • The acoustic model is part of the translation model (tight integration) • Acoustic, lexical, and translation knowledge are generated from examples (example-based) • Limited domain • Later work used categories (word classes) to reduce the required corpus size

  26. EuTrans • ATROS (Automatically Trainable Recognition of Speech) is a continuous-speech recognition/translation system • based on stochastic finite-state acoustic/lexical/syntactic/translation models

  27. EuTrans • FST • A set of algorithms to learn the transducers • Make_TST (tree subsequential transducer); Make_OTST (onward TST); Push_back; Merge_states; OSTIA (OST Inference Alg.); OSTIA-DR
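The first two algorithms in that list can be sketched compactly: Make_TST builds a prefix-tree transducer that emits each training output at its final state, and making it "onward" (Make_OTST) pushes longest common output prefixes back toward the root so output is produced as early as possible. This is a simplified illustration with invented data; the state-merging core of OSTIA is omitted:

```python
# Sketch of Make_TST (prefix-tree transducer) and onward-ing (Make_OTST).
# Node = {'out': final output or None, 'edges': {symbol: [emit, child]}}.
# Training pairs are hypothetical; OSTIA's state merging is not shown.
import os

def make_tst(pairs):
    """Build a tree subsequential transducer from (input, output) pairs."""
    root = {"out": None, "edges": {}}
    for inp, out in pairs:
        node = root
        for sym in inp:
            if sym not in node["edges"]:
                node["edges"][sym] = ["", {"out": None, "edges": {}}]
            node = node["edges"][sym][1]
        node["out"] = out            # whole output emitted at the final state
    return root

def make_onward(node):
    """Bottom-up: push common output prefixes from each subtree onto its edge."""
    for sym, edge in node["edges"].items():
        emit, child = edge
        make_onward(child)
        outs = ([child["out"]] if child["out"] is not None else [])
        outs += [e[0] for e in child["edges"].values()]
        lcp = os.path.commonprefix(outs) if outs else ""
        if lcp:
            edge[0] = emit + lcp
            if child["out"] is not None:
                child["out"] = child["out"][len(lcp):]
            for e in child["edges"].values():
                e[0] = e[0][len(lcp):]
    return node

def translate(root, inp):
    """Run the transducer on an input string."""
    node, out = root, ""
    for sym in inp:
        emit, node = node["edges"][sym]
        out += emit
    return out + (node["out"] or "")
```

After onward-ing, shared output prefixes sit on early edges, which is what lets OSTIA compare and merge states to generalize beyond the training pairs.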

  28. DARPA Babylon • Objective: two-way, multilingual speech-translation interfaces for combat and other field environments • Performance goals: • 1-1.5x real time • ASR accuracy 90% • MT accuracy 90% • Task completion 80-85% • Qualitative goals: • User satisfaction/acceptance • Ergonomic compliance with the uniform ensemble • Error-recovery procedures • User tools for field modification and repair • Scalability • Hardware: to PDAs and workstations • Software: a non-language-expert can configure a new language or add to an existing language

  29. Speechlator (Babylon) • Part of the Babylon project • Specific aspects: • Working with Arabic • Using an interlingua approach to translation • Pure knowledge-based approach, or • Statistical approach to translate IF to text in the target language • Hosts the entire two-way system on a portable PDA-class device Waibel [NAACL03]

  30. ATR • Spoken Language Translation Research Lab • Department 1: robust multilingual ASR • Department 2: integrating ASR and NLP to make SST usable in real situations • Department 3: corpus-based spoken-language translation technology, constructing a large-scale bilingual database • Department 4: J-E translation for monologues, e.g. simultaneous interpretation in international conferences • Department 5: TTS

  31. ATR MATRIX • MATRIX: Multilingual Automatic Translation System [Takezawa98] • Cooperative integrated language translation method

  32. ATR MATRIX • ASR • Real-time speech recognition using speaker-independent, phoneme-context-dependent acoustic models and a variable-order N-gram language model • Robust translation • Using sentence structure • Using examples* • Partial translation • Personalized TTS: CHATR * [Hitoshi96]

  33. IBM MASTOR • Statistical parser • Interlingua-like semantic and syntactic feature representation • Sentence-level NLG based on Maximum Entropy, conditioned on: • Previous symbols • Local sentence type in the semantic tree • The concept list that remains to be generated [Liu, IBM Tech Report RC22874]

  34. Janus I • Acoustic modeling - LVQ • MT: a new module that can run several alternate processing strategies in parallel • LR-parser based syntactic approach • Semantic pattern based approach (as backup) • Neural network, a connectionist approach (as backup): PARSEC • Speech Synthesizer: DECtalk Woszczyna [HLT93]

  35. Janus II/III • Acoustic model • 3-state triphones modeled via continuous-density HMMs • MT: Robust GLR + Phoenix translation (as backup); GenKit for generation • MT uses the N-best list from ASR (resulted in a 3% improvement) • Cleaning the lattice by mapping all non-human noises and pauses into a generic pause • Breaking the lattice into a set of sub-lattices at points where the speech signal contains long pauses • Pruning the lattice to a size that the parser can process Lavie [ICSLP96]
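The two lattice pre-processing steps (noise mapping and splitting at long pauses) can be sketched on a flat token sequence. This is a simplification: the real system operates on word lattices, and the noise labels and pause threshold below are hypothetical:

```python
# Sketch of lattice cleaning: map non-human noises to a generic pause,
# then split into sub-segments wherever a long pause occurs.
# Noise labels and the 0.5 s threshold are hypothetical choices.
NOISE = {"+breath+", "+cough+", "+laugh+"}
LONG_PAUSE = 0.5  # seconds

def clean_and_split(tokens):
    """tokens: list of (word, duration) pairs -> list of word segments."""
    segments, current = [], []
    for word, dur in tokens:
        if word in NOISE:
            word = "<pause>"             # collapse all noises to one symbol
        if word == "<pause>" and dur >= LONG_PAUSE:
            if current:                  # long pause: close the segment
                segments.append(current)
                current = []
        else:
            current.append(word)
    if current:
        segments.append(current)
    return segments
```

Splitting at long pauses keeps each sub-lattice small enough for the parser, which is the point of the pruning step the slide mentions.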

  36. DIPLOMAT / Tongues • Toshiba Libretto: 200MHz, 192MB RAM • Andrea handset, custom touchscreen, new GUI • Speech recognizer: Sphinx II (open source) • Semi-continuous HMMs, real-time • Speech synthesizer: Festival (open source) • Unit selection, FestVox tools • MT: CMU's EBMT/MEMT system • Collected data via chaplains role-playing in English; translated and read by Croatians • Not enough data; the Croatian recordings were too heavily female [Robert Frederking]

  37. Nespole! • Negotiating through SPOken language in E-commerce • Funded by the EU and NSF • Participants: ISL, ITC-irst • Demo

  38. Nespole! • Translation via interlingua • Translation servers for each language exchange interlingua (IF) to perform translation • Speech recognition: (Speech -> Text) • Analysis: (Text -> IF) • Generation: (IF -> Text) • Synthesis: (Text -> Speech) [Lavie02]
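The interlingua architecture above can be sketched in a few lines: each language side owns an analyzer (text to IF) and a generator (IF to text), and only IF frames cross between servers. The tiny rule tables below are hypothetical stand-ins, not Nespole!'s actual IF specification:

```python
# Sketch of interlingua-based translation: analysis and generation are
# per-language; servers exchange only the IF frame in between.
# The example utterance and IF tuples are invented for illustration.
EN_ANALYZE = {"how much is the room": ("request-info", "price", "room")}
IT_GENERATE = {("request-info", "price", "room"): "quanto costa la camera"}

def translate_en_to_it(text):
    interlingua = EN_ANALYZE[text]      # analysis on the English server
    return IT_GENERATE[interlingua]     # generation on the Italian server
```

The design payoff is that adding a new language needs only one analyzer and one generator against the shared IF, rather than a translator for every language pair.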

  39. Verbmobil • Funded by German Federal Ministry of Education and Research (1993-2000) with 116 million DM • Demo ; See Bing’s talk for more details

  40. Digital Olympics • Multi-Linguistic Intellectual Information Service • Plan: • Plan I: voice-driven phrasebook translation (low risk); similar to Phraselator • Plan II: robust speech translation within very narrow domains (medium risk); similar to Nespole! • Plan III: highly interactive speech translation with broad linguistic and topic coverage (Olympics 2080?) [Zong03]

  41. Conclusions • Major sponsors: governments (DARPA, EU) • ASR: mainly HMM-based • MT: • Interlingua (Janus, Babylon) • FST (AT&T, UPV) • EBMT (ATR, CMU) / SMT (RWTH, CMU) • Coupling between ASR and MT • See "Coupling of Speech Recognition and Machine Translation in S2SMT" by Szu-Chen (Stan) Jou for more discussion

  42. Reference and Fact-sheet • http://projectile.is.cs.cmu.edu/research/public/talks/speechTranslation/facts.htm
