
Speech Translation on a PDA


Presentation Transcript


  1. Speech Translation on a PDA By: Santan Challa. Instructor: Dr. Christel Kemke

  2. Introduction • Based on the article “PDA Translates Speech” by Kimberley Patch [1]. • A combined effort of researchers from CMU, Cepstral LLC, Multimodal Technologies Inc., and Mobile Technologies Inc. • What is the aim? • Two-way translation of medical information from English to Arabic and Arabic to English. • System used: iPaq handheld computer

  3. System • iPaq handheld computer • 64 MB memory • Requirements • Two recognizers • Translators • Synthesizers

  4. Different Phases • Automatic Speech Recognition (ASR) • Speech Translation • Speech Synthesis

  5. Automatic Speech Recognition • ASR: technology that recognizes spoken words and executes voice commands. • Steps in ASR: • Feature Extraction • Acoustic Modeling • Language Modeling • Pattern Classification • Utterance Verification • Decision

  6. Speech Recognition Process [2] • [Figure: block diagram of the functions of a speech recognizer: Feature Extraction → Acoustic Modeling → Language Modeling → Pattern Classification → Utterance Verification → Decision]

  7. Feature Extraction • Features: attributes of a person’s speech that enable a speech recognizer to distinguish the phonemes in each word [3]. • [Figure: plot of the energy of a speech signal]

  8. Visual Display of Frequencies • Spectrogram: the energy levels are decoded to extract the features, which are stored in a feature vector for further processing [3]. A minimal spectrogram sketch follows below.
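To make the idea concrete, here is a minimal sketch (not the Speechalator's actual front end; frame length and hop size are assumed values) of how a spectrogram can be computed from a digitized signal: the signal is cut into overlapping windowed frames, and the per-frequency energy of each frame comes from a Fourier transform.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Cut the signal into overlapping windowed frames and return
    the per-frequency energy of each frame (one row per frame)."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2

# e.g. spectrogram(np.random.randn(16000)) for one second of 16 kHz noise
```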

  9. Feature Extraction • Speech signal → microphone → analog signal. • The analog signal is digitized so it can be stored in the computer. • Digitization involves sampling (common sampling rates range from 8,000 Hz to 16,000 Hz). • Features are extracted from the digitized speech, resulting in a feature vector (numerical measurements of speech attributes [3]); a framing sketch follows below. • The speech recognizer uses the feature vectors to decode the digitized speech signal.
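As an illustration of the framing step, the toy sketch below (with assumed frame and hop sizes, not the recognizer's actual parameters) slices a digitized signal into frames and computes one log-energy feature per frame. Real recognizers extract richer features, such as MFCCs, per frame.

```python
import numpy as np

def frame_log_energy(samples, rate=16_000, frame_ms=25, hop_ms=10):
    """Digitized speech -> one log-energy feature per analysis frame."""
    frame = int(rate * frame_ms / 1000)   # samples per frame
    hop = int(rate * hop_ms / 1000)       # frame shift in samples
    feats = []
    for start in range(0, len(samples) - frame, hop):
        window = np.asarray(samples[start:start + frame], dtype=float)
        feats.append(np.log(np.sum(window ** 2) + 1e-10))  # log energy
    return np.array(feats)                # a (very small) feature vector

# e.g. frame_log_energy(np.random.randn(16000)) -> ~98 frame features
```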

  10. Acoustic Modeling • Numerical representation of sound (utterances of words in a language). • Comparison of the speech features of the digitized speech signal with the features of existing models. • Determination of the sound is probabilistic in nature. • The Hidden Markov Model (HMM) is a statistical technique that forms the basis for the development of acoustic models. • HMMs give the statistical likelihood of a particular sequence of words or phonemes [3]. • HMMs are used in both speech training and speech recognition.

  11. HMMs Cont’d • HMMs depend on the Markov chain: a sequence of random variables whose next values depend on the previous values [3]. • [Figure: diagram of a Markov chain] A forward-algorithm sketch follows below.
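A minimal sketch of how an HMM assigns a statistical likelihood to a sequence: the forward algorithm below uses a toy two-state model whose transition matrix A, emission matrix B, and initial distribution pi are made-up values for illustration, not taken from any real acoustic model.

```python
import numpy as np

A = np.array([[0.7, 0.3],   # transition probabilities between 2 states
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emission probabilities, one row per state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution

def sequence_likelihood(obs):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate, then weight by emission
    return alpha.sum()

print(sequence_likelihood([0, 1, 0]))    # likelihood of symbols 0, 1, 0
```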

  12. Other Speech Recognition Components • Pattern Classifier: the pattern classification component groups the patterns generated by the acoustic modeling component; speech patterns having similar speech features are grouped together. • The correctness of the words generated by the pattern classifier is measured by the utterance verification component. • What the Speechalator prototype [4] uses… • The prototype uses an HMM-based recognizer, designed and developed by Multimodal Technologies Inc. • The speech recognizer needs 1 MB of memory, and the acoustic models occupy 3 MB of memory.

  13. Speech Translation

  14. Speech Translation • What is Machine Translation (MT)? • Translation of speech from one language to another with the help of software. • Types of MT: • Direct Translation (word-to-word) • Transfer-Based Translation • Interlingua Translation

  15. Why MT is Difficult • Ambiguity: sentences and words can have multiple meanings: • Lexical ambiguity • Structural ambiguity • Semantic ambiguity • Structural differences between languages • Idioms cannot be translated literally

  16. Approaches in Machine Translation • [Figure: the Machine Translation Triangle, or Vauquois Triangle, showing the direct-translation, transfer, and interlingua (IL) paths between analysis of the source language and synthesis of the target language]

  17. Differences between the three translation architectures: • Direct translation: word-to-word translation. • Transfer-based: requires knowledge of both the source and target languages. • Suited to bilingual translation. • Intermediate representations are language dependent. • Parses the source-language sentence and applies transfer rules that map grammatical segments of the source language onto the target language (see the sketch after this slide).
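A toy sketch of one transfer rule, assuming a parser has already tagged each word with its part of speech: the rule reorders an English adjective-noun pair into the noun-adjective order of a French-like target language. The lexicon and rule here are illustrative only, not from any real transfer system.

```python
LEXICON = {"red": "rouge", "car": "voiture"}  # toy word-to-word dictionary

def transfer_translate(tagged):
    """tagged: list of (word, part_of_speech) pairs from a parser."""
    out, i = [], 0
    while i < len(tagged):
        word, pos = tagged[i]
        # Transfer rule: ADJ NOUN (source order) -> NOUN ADJ (target order)
        if pos == "ADJ" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            noun = tagged[i + 1][0]
            out += [LEXICON.get(noun, noun), LEXICON.get(word, word)]
            i += 2
        else:
            out.append(LEXICON.get(word, word))
            i += 1
    return " ".join(out)

print(transfer_translate([("red", "ADJ"), ("car", "NOUN")]))  # "voiture rouge"
```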

  18. Differences between the three translation architectures cont’d… • Interlingual Translation: • Generates a language-independent representation, called Interlingua (IL), for the meaning of sentences or segments of sentences in the source language. • A text in the source language can be converted into any target language; hence it suits multilingual translation.

  19. More on Machine Translation • Knowledge-Based MT (KBMT): • Completely analyzes and understands the meaning of the source text [5]. • Translates it into target-language text. • Performance relies heavily on the amount of world knowledge available to analyze the source language. • Knowledge is represented in the form of frames, e.g. [Event: Murder is a: Crime]; a sketch follows below.
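For instance, the slide's example frame could be represented as a simple nested mapping; the slot names here are hypothetical additions for illustration, not from the source.

```python
# The slide's example frame, sketched as a nested mapping.
# The "slots" entries are hypothetical, added for illustration.
murder_frame = {
    "Event": "Murder",
    "is_a": "Crime",
    "slots": {"agent": None, "victim": None, "instrument": None},
}
```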

  20. Machine Translation Cont’d • Example-Based MT (EBMT): • Sentences are analyzed on the basis of similar example sentences analyzed previously. • What does the Speechalator prototype use? • Statistical MT (SBMT) [5]: • Uses corpora that have been analyzed previously. • No linguistic information required. • N-gram modeling is used (see the bigram sketch below).
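A minimal sketch of the n-gram idea behind SBMT: bigram probabilities estimated by counting over a tiny, made-up corpus. A real system would train on large corpora and combine such a language model with a translation model.

```python
from collections import Counter

corpus = [["we", "need", "a", "doctor"], ["we", "need", "water"]]  # toy corpus

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent                  # sentence-start marker
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))  # count adjacent word pairs

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev) from the counts."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("we", "need"))  # -> 1.0 in this tiny corpus
```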

  21. Speech Synthesis

  22. Speech Synthesis • Generation of a human voice from a given text or phonetic description [6]. • Such systems are called Text-To-Speech (TTS) systems; a front-end sketch follows below.
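A sketch of the first stage of a TTS front end, assuming a small hand-made pronunciation dictionary (the ARPAbet-style entries below are illustrative, not from any real lexicon): text is mapped to a phonetic description, which a synthesizer back end would then turn into a waveform.

```python
# Minimal TTS front-end sketch: text -> phonetic description via lookup.
PRONUNCIATIONS = {
    "speech": "S P IY CH",
    "translation": "T R AE N S L EY SH AH N",
}

def text_to_phonemes(text):
    """Look up each word; a real front end would also handle unknown
    words, numbers, abbreviations, and prosody."""
    return [PRONUNCIATIONS.get(w.lower(), "<unknown>") for w in text.split()]

print(text_to_phonemes("Speech translation"))
```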

  23. Snapshot of Speechalator

  24. Conclusions • Speechalator is a notable achievement in both mobile technology and NLP. • Simple push-to-talk button interface. • Uses optimized speech recognizers and speech synthesizers. • The architecture allows components to be placed both on-device and on a server. • Presently, most of the components are ported to the device. • Performance: • 80% accuracy • Takes 2-3 seconds per translation • Presently restricted to a single domain (medical information)…

  25. Future Work • Increase the accuracy of the device in noisy environments. • Build more learning algorithms. • Multi-lingual speech recognizer. • Achieve domain independence.

  26. References • [1] Kimberley Patch. “PDA Translates Speech.” Technology Research News (TRN), 17/24 December 2003. • [2] Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. “Speech and Language Processing for Next-Millennium Communication Services.” Proceedings of the IEEE, 88(8):1314-1337, Feb 2000. • [3] ASR Home Page, http://www.isip.msstate.edu/projects/speech/ • [4] “Speechalator: Two-Way Speech-To-Speech Translation on a Consumer PDA.” Eurospeech 2003, Geneva, Switzerland, pp. 1-4. • [5] Joseph Seaseley. “Machine Translation: A Survey of Approaches.” University of Michigan, Ann Arbor. • [6] Thierry Dutoit. “A Short Introduction to Text-to-Speech Synthesis (TTS).” http://tcts.fpms.ac.be/synthesis/introtts.html
