MULTI LINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES
NIXON PATEL, Bhrigus Inc, npatel@bhrigus.com
Multilingual & International Speech Applications, SpeechTek West 2007, Hilton San Francisco
ABSTRACT
• This paper describes our work in developing multilingual speech recognition and speech synthesis systems for Indian languages.
• Existing speech technologies (TTS and ASR) cover US English, Indian English, and Hindi; no such systems exist for any other Indian language.
Introduction
• Voice-enabled services are a rapidly growing, high-margin opportunity, particularly in a multilingual country such as India.
• It is very difficult to build and maintain a separate speech synthesizer for each language.
• The focus is therefore also on developing common multilingual corpora that support multiple Indian languages, and on building appropriate language-specific linguistic analysis modules for text-to-speech synthesis.
Important issues involved
• Enumerating a phone set to represent Indian languages.
• Selecting the basic unit for synthesis: half-phones, diphones, or syllables.
• Creating a generic acoustic database that covers language variations.
• Modeling language-specific prosody.
Our approaches
• A common notation for graphemes, developed using IT-3 transliteration.
• Diphone-based speech synthesis.
• Data-driven prosody modeling using Classification and Regression Trees (CART).
• Concatenative synthesis using cluster unit selection techniques with syllable-like units.
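The unit selection idea in the last bullet can be sketched as a dynamic-programming search that picks, for each syllable-like unit in the utterance, the recorded candidate that minimizes the sum of target costs and join costs. This is a minimal sketch under stated assumptions: the `Unit` fields, the two cost functions (pitch distance only), and the function names are hypothetical and are not taken from the Bhrigus engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded candidate for a syllable-like unit (hypothetical fields)."""
    label: str    # unit identity, e.g. "ka"
    pitch: float  # mean F0 in Hz


def target_cost(unit, wanted_pitch):
    # Assumed cost: distance from the pitch requested by a prosody model.
    return abs(unit.pitch - wanted_pitch)


def join_cost(left, right):
    # Assumed cost: pitch discontinuity at the concatenation point.
    return abs(left.pitch - right.pitch)


def select_units(targets, inventory):
    """Return (units, total_cost) for the cheapest candidate sequence.

    targets:   non-empty list of (label, wanted_pitch) pairs
    inventory: dict mapping label -> list of Unit candidates
    """
    # best[i][j] = (cost of the cheapest path ending in candidate j of
    #               target i, index of the previous candidate on that path)
    best = []
    for i, (label, wanted) in enumerate(targets):
        row = []
        for unit in inventory[label]:
            cost = target_cost(unit, wanted)
            if i == 0:
                row.append((cost, None))
            else:
                prev_units = inventory[targets[i - 1][0]]
                prev_cost, back = min(
                    (best[i - 1][k][0] + join_cost(p, unit), k)
                    for k, p in enumerate(prev_units)
                )
                row.append((prev_cost + cost, back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(inventory[targets[i][0]][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

A production system would use richer costs (spectral distance, duration, phonetic context) and prune candidates by clustering, which is what the cluster unit selection bullet refers to; the search structure stays the same.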
Our current research work
• Text-to-speech synthesis
  • The TTS is a multilingual text-to-speech engine that enables speech applications to be built in local Indian languages, using a unit selection algorithm and a large corpus.
  • A Telugu TTS system has been built, and a voice portal that reads out local-language news in Telugu has been developed.
• Speech recognition
  • The ASR is a multilingual automatic speech recognition system that, in conjunction with our TTS, will enable full-fledged speech solutions. The advanced features of this engine would allow customization to a vertical within a few hours.
• Search engine
  • This is a cross-lingual search engine capable of searching content in all Indian languages.
  • It makes use of several features of Indian language scripts, including their phonetic nature, common phonetic base, and syllabic structure.
  • It also uses phonetic-level units for indexing, which enables seamless search across languages.
• Phonetic typing tool
  • This tool uses an intuitive, readable transliteration scheme and phonetic properties to key in text in Indian language scripts.
  • The Bhrigus phonetic typing tool comes with a friendly user interface and with APIs for integration into applications such as email and blogging frameworks.
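To illustrate how a common phonetic base can support cross-lingual indexing, the sketch below maps each word to a script-neutral key and indexes on that key. It relies on one fact about Unicode: the blocks for Devanagari through Malayalam (U+0900 to U+0D7F) are each 128 code points wide and follow a parallel, ISCII-derived layout, so corresponding letters mostly share the same offset within their block. Using that offset as the "phonetic unit" is a simplifying assumption for illustration, not the indexing scheme of the Bhrigus search engine, and all function names are hypothetical.

```python
from collections import defaultdict

# Nine consecutive 128-code-point blocks: Devanagari, Bengali, Gurmukhi,
# Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam.
INDIC_FIRST, INDIC_LAST = 0x0900, 0x0D7F


def phonetic_key(word):
    """Map a word to a script-neutral tuple of per-character keys."""
    key = []
    for ch in word:
        cp = ord(ch)
        if INDIC_FIRST <= cp <= INDIC_LAST:
            # Offset inside the 128-wide block (blocks start on multiples of 128).
            key.append(cp & 0x7F)
        else:
            key.append(ch)  # leave non-Indic characters unchanged
    return tuple(key)


def build_index(docs):
    """docs: dict mapping document id -> text. Returns key -> set of ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[phonetic_key(word)].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing any word of the query."""
    hits = set()
    for word in query.split():
        hits |= index.get(phonetic_key(word), set())
    return sorted(hits)


docs = {
    "hindi_doc": "\u0915\u092e\u0932",   # Devanagari letters KA MA LA
    "telugu_doc": "\u0c15\u0c2e\u0c32",  # Telugu letters KA MA LA
}
index = build_index(docs)
matches = search(index, "\u0915\u092e\u0932")  # query typed in Devanagari
```

The block-offset correspondence is only approximate (for example, Tamil has fewer consonant letters, and some scripts have letters with no counterpart elsewhere), so a real system would need an explicit phone mapping per language.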
• Font converters
  • Indian-language text in electronic form is in a chaotic state.
  • One can neither exchange notes in Indian languages as conveniently as in English, nor search Indian-language texts available on the web.
  • This is because the texts are stored in font-dependent glyph codes.
  • The glyph coding scheme typically differs from font to font.
  • To view the content of such sites, one needs the corresponding fonts installed on the local machine.
  • We are building font converters for almost all Indian languages.
• Multilingual dictionary
  • We are developing a multilingual dictionary with English as the source language and Indian languages such as Telugu, Tamil, Gujarati, and Hindi as target languages.
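The font conversion problem can be sketched as a table lookup plus a reordering step. The glyph table below is invented for an imaginary legacy font (every real font needs its own table), and the function name is hypothetical. The reordering step reflects a real difference between the two encodings: glyph-coded text usually stores the Devanagari vowel sign I in visual order, before the consonant it is drawn to the left of, while Unicode stores it in logical order, after the consonant.

```python
# Hypothetical glyph codes for an imaginary legacy Devanagari font.
GLYPH_TO_UNICODE = {
    "d": "\u0915",  # letter KA
    "e": "\u092e",  # letter MA
    "f": "\u093f",  # vowel sign I (drawn to the left of its consonant)
    "k": "\u093e",  # vowel sign AA (drawn to the right)
}

# Signs that appear before the consonant in glyph order but must follow it
# in Unicode logical order.
PRE_BASE_SIGNS = {"\u093f"}


def convert(glyph_text):
    """Convert glyph-coded text to Unicode, moving pre-base signs after the
    next character. Handles single consonants only, not conjuncts."""
    out = []
    pending = None  # pre-base sign waiting for its consonant
    for glyph in glyph_text:
        ch = GLYPH_TO_UNICODE.get(glyph, glyph)  # pass unknown codes through
        if ch in PRE_BASE_SIGNS:
            pending = ch
            continue
        out.append(ch)
        if pending is not None:
            out.append(pending)
            pending = None
    if pending is not None:
        out.append(pending)  # stray sign at end of input: keep it
    return "".join(out)
```

With this table, `convert("fd")` yields KA followed by vowel sign I, while `convert("dk")` needs no reordering. Real converters also have to handle conjunct consonants, half forms, and glyphs that map to several code points.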
Bhrigus ASR and TTS Process Framework
• The components of a TTS system can be divided into a language-independent component (LIC) and a language-dependent component (LDC).
• The LIC consists of the speech synthesis engine, which handles the unit selection algorithm and signal processing.
• The LDC covers building the language-specific resources, such as the pronunciation dictionary and unit selection database, needed to build a synthetic voice.
LDC
• Linguistic resources
  • Text data collection
  • Text normalization
  • Pronunciation dictionary
  • Letter-to-sound rules
  • Syllabification, stress
  • Prosodic pause prediction
• Speech resources
  1. Unit-selection database
  2. Prosodic modeling
LIC (Bhrigus TTS)
• Unit-selection synthesis engine
Figure: Language-dependent (LDC) and language-independent (LIC) components of a TTS system
LDC
• Linguistic resources
  1. Text data collection
  2. Pronunciation dictionary
  3. Letter-to-sound rules
  4. Language model
• Speech resources
  1. Acoustic models
LIC (Bhrigus ASR)
• Speech recognition engine
Figure: Language-dependent (LDC) and language-independent (LIC) components of an ASR system
• The development time for building a TTS and an ASR system consists of the time to develop the LIC components and the LDC components.
• The LIC component of the ASR system is the Bhrigus ASR speech recognition engine, while the LIC component of the TTS system is the Bhrigus TTS unit-selection engine.
• We suggest building the LDC components for ASR and TTS together, because sharing language-dependent resources across the two systems reduces development time.
• The LDC resources that can be shared across TTS and ASR systems are the text data, the pronunciation dictionary, and the letter-to-sound rules.
• The collected text is used to build language models for ASR and, at the same time, to extract a set of optimal sentences to be recorded for the TTS system.
• Similarly, the pronunciation dictionary and letter-to-sound rules can be shared across the TTS and ASR systems.
• Note also that several modules inside the TTS and ASR engines can be shared too.
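The double use of the collected text can be sketched as follows: the same sentence list feeds word-bigram counts (the raw material of an ASR language model) and a greedy selection of sentences to record for TTS. This is a sketch under assumptions: the slides do not say how the "optimal" sentences are chosen, so greedy unit coverage is swapped in as a common heuristic, adjacent letter pairs stand in for diphones, and the romanized sample sentences and all function names are made up for illustration.

```python
from collections import Counter


def word_bigrams(sentences):
    """Count word bigrams, with sentence boundary markers, for an ASR LM."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


def units(sentence):
    """Adjacent letter pairs within each word (a stand-in for diphones)."""
    return {w[i:i + 2] for w in sentence.split() for i in range(len(w) - 1)}


def select_sentences(sentences, budget):
    """Greedily pick up to `budget` sentences that add the most new units."""
    covered, chosen = set(), []
    remaining = list(sentences)
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda s: len(units(s) - covered))
        if not units(best) - covered:
            break  # no remaining sentence adds a new unit
        chosen.append(best)
        covered |= units(best)
        remaining.remove(best)
    return chosen


# Made-up romanized sentences standing in for the collected text data.
corpus = ["nadi lo niiru undi", "nadi nadi", "paalu niiru"]
lm_counts = word_bigrams(corpus)              # shared text -> ASR side
to_record = select_sentences(corpus, budget=3)  # shared text -> TTS side
```

Here the second sentence is never selected for recording because it adds no unit the first one lacks, yet it still contributes to the bigram counts, which is the sense in which one text collection serves both systems.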
Demos
• Demos are at http://196.12.38.23/index.html
Conclusion
Our four basic principles are to:
• create and sustain the leading market solution for professional services in text-to-speech, speech-to-text, search, machine translation, and natural dialogue management for Indian languages, including Indian English;
• interface that solution into the vast majority of technical environments relevant to these types of applications;
• provide skilled services; and
• provide services at differentiated low rates.
Thank you