Speech Translation on a PDA • By: Santan Challa • Instructor: Dr. Christel Kemke
Introduction • Based on an article “PDA translates Speech” by Kimberley Patch[1]. • Combined effort of researchers from CMU, Cepstral, LLC, Multimodal Technologies Inc. and Mobile Technologies Inc. • What is the Aim? • Two-way translation of medical information from English to Arabic and Arabic to English. • System Used: iPaq handheld computer
System • iPaq handheld computer • 64 MB memory • Requirements • Two recognizers • Translators • Synthesizers
Different Phases • Automatic Speech Recognition (ASR) • Speech Translation • Speech Synthesis
Automatic Speech Recognition • ASR: technology that recognizes and executes voice commands • Steps in ASR • Feature Extraction • Acoustic Modeling • Language Modeling • Pattern Classification • Utterance Verification • Decision
Speech Recognition Process [2] • [Figure: functions of a speech recognizer, with blocks labeled Feature Extraction, Pattern Classification, Language Modeling, Acoustic Modeling, Utterance Verification, and Decision]
Feature Extraction • Features: attributes of a speaker's voice that enable a speech recognizer to distinguish the phonemes in each word [3]. Energy:
Visual Display of Frequencies • Spectrogram. The energy levels are decoded to extract the features, which are stored in a feature vector for further processing [3].
Feature Extraction • Speech signal -> Microphone -> Analog signal. • The analog signal is digitized so that it can be stored in the computer. • Digitization involves sampling (common sampling rates: 8,000 Hz to 16,000 Hz). • Features are extracted from the digitized speech. • The result is a feature vector (numerical measurements of speech attributes [3]). • The speech recognizer uses the feature vectors to decode the digitized speech signal.
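The digitize-then-extract steps above can be illustrated with a minimal sketch. It is not the Speechalator's front end: the 25 ms frame length, the 10 ms hop, the choice of log energy and zero-crossing rate as features, and the synthetic 440 Hz tone standing in for microphone input are all assumptions made for illustration.

```python
import math

def frame_features(samples, sample_rate=8000, frame_ms=25, hop_ms=10):
    """Cut a digitized signal into overlapping frames and return one
    feature vector [log_energy, zero_crossing_rate] per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        # Short-time energy; the small constant avoids log(0) on silence.
        log_energy = math.log(sum(x * x for x in frame) + 1e-10)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append([log_energy, crossings / (frame_len - 1)])
    return features

# Assumed stand-in for microphone input: 0.1 s of a 440 Hz tone at 8,000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
vectors = frame_features(signal, sample_rate)
print(len(vectors), "frames; first feature vector:", vectors[0])
```

Real recognizers typically use richer spectral features, but the shape of the output is the same: one short numeric vector per frame of speech.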
Acoustic Modeling • Numerical representation of sound (utterances of words in a language). • The speech features of the digitized speech signal are compared with the features of existing models. • Determining which sound was spoken is probabilistic by nature. • The Hidden Markov Model (HMM) is a statistical technique that forms the basis for the development of acoustic models. • HMMs give the statistical likelihood of a particular sequence of words or phonemes [3]. • HMMs are used in both speech training and speech recognition.
HMMs Cont’d • Depend on the Markov chain: a sequence of random variables in which the next value depends on the previous values [3].
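To make "statistical likelihood of a sequence" concrete, here is a minimal sketch of the standard forward algorithm for a discrete HMM. The two states, the symbols "a" and "b", and every probability below are made-up toy values, not parameters of any real acoustic model.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model) for a discrete HMM using the
    forward algorithm (summing over all hidden state paths)."""
    # Initialization: start in each state and emit the first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Induction: each state depends only on the previous state (Markov chain).
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    # Termination: sum over the states the sequence may end in.
    return sum(alpha.values())

# Hypothetical toy model (all numbers are illustrative assumptions).
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 0.1, "b": 0.9}}

likelihood = forward_likelihood(["a", "b"], states, start_p, trans_p, emit_p)
print("P(a, b | model) =", likelihood)
```

In a recognizer, one such likelihood would be computed per candidate model, and the candidates compared; that comparison step is not shown here.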
Other Speech Recognition Components • Pattern Classifier: the pattern classification component groups the patterns generated by the acoustic modeling component. Speech patterns with similar speech features are grouped together. • The utterance verification component measures the correctness of the words generated by the pattern classifier. • What the Speechalator Prototype [4] uses… • The prototype uses an HMM-based recognizer, designed and developed by Multimodal Technologies Inc. • The speech recognizer needs 1 MB of memory and the acoustic models occupy 3 MB of memory.
Speech Translation • What is Machine Translation (MT)? • Translation of speech from one language to another with the help of software. • Types of MT: • Direct Translation (word-to-word) • Transfer-Based Translation • Interlingua Translation
Why MT is difficult • Ambiguity: sentences and words can have more than one meaning. • Lexical ambiguity • Structural ambiguity • Semantic ambiguity • Structural differences between languages • Idioms cannot be translated literally
Approaches in Machine Translation • Machine Translation Triangle, or Vauquois Triangle • [Figure: triangle with Source Language and Target Language at the base and IL at the apex; labels: Analysis, Synthesis, Transfer, Direct Translation]
Differences between the three translation architectures: • Direct translation: word-to-word translation. • Transfer-based: requires knowledge of both the source and the target language. • Suited to bilingual translation. • Intermediate representations are language dependent. • Parses the source language sentence and applies transfer rules that map grammatical segments of the source language onto the target language.
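The difference between the first two architectures can be sketched in a few lines. This is a toy illustration only: the English-to-French lexicon, the word-class sets, and the single adjective-noun reordering rule are hypothetical, and French is used purely for readability (the system described here translates between English and Arabic).

```python
# Hypothetical toy lexicon and word classes (assumptions for illustration).
LEXICON = {"the": "la", "red": "rouge", "car": "voiture"}
ADJECTIVES = {"red"}
NOUNS = {"car"}

def direct_translate(sentence):
    """Direct translation: replace each word, keep the source word order."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

def transfer_translate(sentence):
    """Transfer-based translation: apply a structural rule first
    (adjective + noun -> noun + adjective), then replace the words."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered += [words[i + 1], words[i]]  # swap to target-language order
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON.get(w, w) for w in reordered)

direct_result = direct_translate("the red car")
transfer_result = transfer_translate("the red car")
print("direct:  ", direct_result)
print("transfer:", transfer_result)
```

The direct version keeps English word order, while the transfer rule produces the order French expects. The rule encodes knowledge of both languages, which is why a transfer system is tied to one language pair.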
Differences between the three translation architectures cont’d.. • Interlingual translation. • Generates a language-independent representation called Interlingua (IL) for the meaning of sentences or segments of sentences in the source language. • A text in the source language can be converted into any target language; hence it is suited to multilingual translation.
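A common counting argument shows why an interlingua suits multilingual translation. The sketch below assumes one transfer module per ordered language pair and, for the interlingua design, one analyzer plus one generator per language; these assumptions are not stated in the source.

```python
def transfer_modules(num_languages):
    """One transfer module per ordered (source, target) pair of languages."""
    return num_languages * (num_languages - 1)

def interlingua_modules(num_languages):
    """One analyzer (language -> IL) and one generator (IL -> language) each."""
    return 2 * num_languages

for n in (2, 5, 10):
    print(n, "languages:", transfer_modules(n), "transfer modules vs",
          interlingua_modules(n), "interlingua modules")
```

Under these assumptions the two designs need a similar number of modules for a single language pair, but the transfer count grows quadratically as languages are added while the interlingua count grows linearly.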
More on Machine Translation • Knowledge-Based MT (KBMT): • Completely analyzes and understands the meaning of the source text [5]. • Translates it into target language text. • Performance relies heavily on the amount of world knowledge available for analyzing the source language. • Knowledge is represented in the form of frames, e.g. [Event: Murder, is-a: Crime].
Machine Translation Cont’d • Example-Based MT (EBMT): • Sentences are analyzed on the basis of similar example sentences analyzed previously. • What does the Speechalator prototype use? • Statistical-Based MT (SBMT) [5]: • Uses corpora that have been analyzed previously. • No linguistic information required. • N-gram modeling is used.
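As an illustration of n-gram modeling, here is a minimal bigram (n = 2) sketch using maximum-likelihood counts. The three-sentence corpus is a made-up example, and nothing here is claimed to match the model inside the Speechalator; a real system would also need smoothing for unseen word pairs.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Hypothetical toy corpus (an assumption, not data from the project).
corpus = ["where does it hurt", "where is the pain", "does it hurt here"]
unigrams, bigrams = train_bigrams(corpus)
print("P(does | where) =", bigram_prob("where", "does", unigrams, bigrams))
print("P(hurt | it)    =", bigram_prob("it", "hurt", unigrams, bigrams))
```

The model needs only counted text, with no grammar or dictionary, which is the sense in which no linguistic information is required.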
Speech Synthesis • Generation of human voice from a given text or phonetic description [6]. • Text-to-Speech (TTS) systems.
Conclusions • Speechalator is a good achievement in both mobile technology and NLP. • Simple push-to-talk button interface. • Uses optimized speech recognizers and speech synthesizers. • The architecture allows components to be placed both on-device and on a server. • Presently most of the components are ported to the device. • Performance: • 80% accuracy • Takes 2-3 seconds for translation • Presently restricted to a domain…
Future Work • Increase the accuracy of the device in noisy environments. • Build more learning algorithms. • Multilingual speech recognizer. • Achieve domain independence.
References • [1] Kimberley Patch. PDA Translates Speech. Technology Research News (TRN), 17/24 December 2003. • [2] Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. Speech and Language Processing for Next-Millennium Communication Services. Proceedings of the IEEE, 88(8):1314-1337, Aug. 2000. • [3] ASR Home Page. http://www.isip.msstate.edu/projects/speech/ • [4] Speechalator: Two-Way Speech-To-Speech Translation on a Consumer PDA. Eurospeech 2003, Geneva, Switzerland, pages 1-4. • [5] Joseph Seaseley. Machine Translation: A Survey of Approaches. University of Michigan, Ann Arbor. • [6] Thierry Dutoit. A Short Introduction to Text-to-Speech Synthesis (TTS). http://tcts.fpms.ac.be/synthesis/introtts.html