
December 19, 2005


kimn

Presentation Transcript


  1. FPMS December 19, 2005

  2. Acapela’s corporate profile

  3. Group Background
     • Babel Technologies
       > Created in 1995 in Mons (Belgium)
       > Spin-off of Mons Polytechnical University
       > In-house TTS & ASR technologies
       > TTS and ASR leader in embedded environments
     • Infovox
       > Created in 1983 in Stockholm (Sweden)
       > Spin-off of KTH (Royal Institute of Technology)
       > Integrated into Telia Promotor in 1993
       > Acquired by Babel Technologies in 2001
       > TTS leader in the Nordic countries, Germany and the Netherlands
       > Accessibility and Telecom expertise
     • Elan Speech
       > Created in 1980 in Toulouse (France)
       > Focused on TTS since 1996
       > Launch of in-house high-quality TTS in 2002 (Elan Sayso)
       > TTS leader in Telecom and Automotive

  4. Acapela’s locations
     • 3 sites: Stockholm (Sweden), Mons (Belgium), Toulouse (France)
     • 50 people
     • International team
     • Local support at each site
     • Merged organization

  5. Acapela’s multilingual offer ASR & TTS components in 23 languages

  6. Acapela’s technologies

  7. Technologies (TTS)
     • Architecture (module: data it relies on)
       > Text Preprocessor: set of rules
       > Tagger: dictionary based
       > Phonetizer: phonetic tree + dictionary
       > Prosody: prosodic patterns
       > Synthesizer: database (voice)

  8. Text Preprocessor
     > Function
       • Generation of standard text
     > Examples
       • Numbers: 100 → one hundred
       • Currencies: $20 → twenty dollars
       • Abbreviations: tel. → telephone
     > Implementation
       • Rules are defined in a standard format (BNF format)
     > Size of data
       • 20 Kbytes
     (Pipeline: Text Prepro. → Tagger → Phonetizer → Prosody → Speech Synth)
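The three examples on this slide can be sketched in a few lines of Python. This is only an illustration: the slide says the real rules are written in BNF, whereas the sketch below swaps in regular expressions, and the names `number_to_words`, `ABBREVIATIONS` and `normalize` are hypothetical. It handles English numbers from 0 to 999 only and does not deal with singular/plural agreement ("$1").

```python
import re

# Illustrative rule tables (assumptions, not the product's rule set).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
ABBREVIATIONS = {"tel.": "telephone"}


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Rewrite currencies, numbers and abbreviations as plain words."""
    # Currencies first, so "$20" is not consumed by the plain-number rule.
    text = re.sub(r"\$(\d{1,3})\b",
                  lambda m: number_to_words(int(m.group(1))) + " dollars",
                  text)
    # Stand-alone numbers of up to three digits.
    text = re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group(0))),
                  text)
    # Abbreviations, matched only at the start of a word.
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbreviation), expansion, text)
    return text
```

Calling `normalize("100")`, `normalize("$20")` and `normalize("tel.")` reproduces the three expansions listed on the slide.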

  9. Tagger (optional)
     > Function
       • Generation of the grammatical function of each word
       • Optional: not necessary for all languages
     > Examples
       • To read – I have read (same spelling, different pronunciation)
       • Les poules du couvent couvent (French: "the hens of the convent are brooding"; "couvent" is first a noun, then a verb, with different pronunciations)
     > Implementation
       • Dictionary based + set of rules
     > Size of data
       • 0 to 20 Kbytes
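A minimal sketch of the "dictionary + set of rules" idea, applied to the "To read – I have read" example. The lexicon, the tag names and the two context rules are assumptions made up for illustration; a real tagger would need a far larger dictionary and rule set.

```python
# Hypothetical mini-lexicon: each word lists its possible tags,
# the first one being the default.
LEXICON = {
    "i": ["PRON"],
    "to": ["TO"],
    "have": ["AUX"],
    "read": ["VERB_INF", "VERB_PP"],  # infinitive vs. past participle
}

# Pronunciation depends on the tag chosen for the homograph.
PRONUNCIATION = {
    ("read", "VERB_INF"): "r i: d",
    ("read", "VERB_PP"): "r E d",
}


def tag(words):
    """Return one tag per word, using the previous tag to resolve ambiguity."""
    tags = []
    for word in words:
        candidates = LEXICON.get(word.lower(), ["UNKNOWN"])
        choice = candidates[0]
        if len(candidates) > 1 and tags:
            previous = tags[-1]
            # Rule 1: after an auxiliary, prefer the past participle.
            if previous == "AUX" and "VERB_PP" in candidates:
                choice = "VERB_PP"
            # Rule 2: after "to", prefer the infinitive.
            elif previous == "TO" and "VERB_INF" in candidates:
                choice = "VERB_INF"
        tags.append(choice)
    return tags
```

With these rules, `tag(["to", "read"])` selects the infinitive and `tag(["I", "have", "read"])` selects the past participle, which is what lets the phonetizer pick the right entry in `PRONUNCIATION`.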

  10. Phonetizer
      > Function
        • Generation of a phonetic transcription for each word
      > Examples
        • Babel: b a b E l
      > Implementation
        • Decision tree + exception dictionary
      > Size of data (language dependent)
        • 5 to 350 Kbytes
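The lookup order implied by "decision tree + exception dictionary" can be sketched as follows: consult the exception dictionary first, then fall back to context-dependent letter-to-sound rules. The single hand-written rule below stands in for a trained decision tree, and the exception entry and function names are assumptions for illustration only.

```python
# Hypothetical exception dictionary for words the rules get wrong.
EXCEPTIONS = {"mons": ["m", "O~", "s"]}

VOWELS = set("aeiouy")


def letter_to_phoneme(word: str, i: int) -> str:
    """One-node 'decision tree': decide the phoneme from the letter context."""
    letter = word[i]
    next_letter = word[i + 1] if i + 1 < len(word) else ""
    if letter == "e":
        # Question: is the 'e' followed by a single word-final consonant?
        if next_letter and next_letter not in VOWELS and i + 2 == len(word):
            return "E"
        return "@"
    # Default leaf: the letter maps to itself.
    return letter


def phonetize(word: str):
    """Return a list of phoneme symbols for one word."""
    key = word.lower()
    if key in EXCEPTIONS:
        return list(EXCEPTIONS[key])
    return [letter_to_phoneme(key, i) for i in range(len(key))]
```

`phonetize("Babel")` goes through the rule path and yields the transcription shown on the slide, while `phonetize("Mons")` is answered directly by the exception dictionary.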

  11. Prosodic module
      > Function
        • Generation of intonation:
          • Phoneme duration
          • Pitch markers
      > Examples
        • See MBROLI application
      > Implementation
        • Prosodic patterns extracted from speech corpus
      > Size of data (language dependent)
        • 30 to 300 Kbytes
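As a rough illustration of what this module outputs, the sketch below attaches a duration and a pitch target to each phoneme. Everything numeric here is invented for the example: the slide says the real patterns are extracted from a speech corpus, whereas this sketch uses two fixed durations and a straight-line pitch contour per sentence type. The names `DURATION_MS`, `PATTERNS` and `add_prosody` are hypothetical.

```python
# Made-up values for illustration; real patterns come from a speech corpus.
DURATION_MS = {"vowel": 90, "consonant": 60}
VOWELS = {"a", "E", "i", "O", "u", "@"}
# Sentence type -> (pitch at first phoneme, pitch at last phoneme), in Hz.
PATTERNS = {"statement": (120.0, 90.0), "question": (110.0, 160.0)}


def add_prosody(phonemes, pattern="statement"):
    """Return (phoneme, duration_ms, pitch_hz) triples for a phoneme list."""
    start_hz, end_hz = PATTERNS[pattern]
    count = len(phonemes)
    result = []
    for i, phoneme in enumerate(phonemes):
        kind = "vowel" if phoneme in VOWELS else "consonant"
        # Linear interpolation of pitch from the first to the last phoneme.
        fraction = i / (count - 1) if count > 1 else 0.0
        pitch = start_hz + (end_hz - start_hz) * fraction
        result.append((phoneme, DURATION_MS[kind], round(pitch, 1)))
    return result
```

For `add_prosody(["b", "a", "b", "E", "l"])` the pitch falls steadily from 120 Hz to 90 Hz, a crude stand-in for a declarative intonation pattern.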

  12. Synthesizer
      > Function
        • Generation of speech samples from phoneme sequence + intonation
      > Implementation: 3 technologies
        • Formant-based = rules
        • Diphone concatenation
        • Unit Selection
      > Size of data: depends on
        • Technology
        • Sampling frequency
        • Compression rate
        • From 50 Kbytes to 50 Mbytes
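Two small illustrations of the points on this slide, both under stated assumptions. `diphones` shows how a phoneme sequence is cut into the phoneme-to-phoneme transitions that a diphone synthesizer concatenates (the `_` silence symbol and the `a-b` naming are conventions chosen for this sketch). `voice_data_bytes` shows why the data size depends on sampling frequency and compression: it is plain arithmetic on raw PCM audio, not a description of how any Acapela voice is actually stored.

```python
def diphones(phonemes, silence="_"):
    """List the diphone units needed to synthesize a phoneme sequence."""
    padded = [silence] + list(phonemes) + [silence]
    # One unit per pair of adjacent phonemes, including the silences.
    return [left + "-" + right for left, right in zip(padded, padded[1:])]


def voice_data_bytes(seconds, sample_rate_hz, bits_per_sample=16,
                     compression_ratio=1.0):
    """Estimate the storage needed for recorded speech units, in bytes."""
    raw_bytes = seconds * sample_rate_hz * (bits_per_sample / 8)
    return int(raw_bytes / compression_ratio)
```

For example, one minute of 16-bit speech at 16 kHz takes 60 × 16000 × 2 = 1,920,000 bytes uncompressed, and a quarter of that with a hypothetical 4:1 compression, which is the kind of trade-off behind the wide size range quoted above.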

  13. Technologies (ASR)
      • Speech Recognition
      • Hybrid models: Hidden Markov Models / Neural Networks
      • Diagram labels: acoustic analysis; neural network; discrimination; dynamic programming (decoder); HMM
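The sketch below shows one common way hybrid HMM/neural-network decoding is formulated: the network gives a posterior probability for each HMM state at each frame, the posteriors are divided by the state priors to obtain scaled likelihoods, and dynamic programming (the Viterbi algorithm) finds the best state path. This is the textbook formulation, offered as an assumption about what the diagram depicts; the slide does not give the details of the actual decoder, and all names and numbers here are illustrative.

```python
import math

NEG_INF = float("-inf")


def safe_log(x):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(x) if x > 0 else NEG_INF


def viterbi(posteriors, priors, transitions, initial):
    """Return the most likely HMM state path.

    posteriors[t][s]  : network output P(state s | frame t)
    priors[s]         : P(state s), used to turn posteriors into scaled likelihoods
    transitions[p][s] : P(next state s | current state p)
    initial[s]        : P(first state is s)
    """
    n = len(priors)
    scores = [safe_log(initial[s]) + safe_log(posteriors[0][s] / priors[s])
              for s in range(n)]
    backpointers = []
    for frame in posteriors[1:]:
        new_scores, pointers = [], []
        for s in range(n):
            # Best predecessor of state s at this frame.
            best = max(range(n),
                       key=lambda p: scores[p] + safe_log(transitions[p][s]))
            pointers.append(best)
            new_scores.append(scores[best] + safe_log(transitions[best][s])
                              + safe_log(frame[s] / priors[s]))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

A usage example with a two-state left-to-right model (state 1 cannot return to state 0):

```python
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
path = viterbi(posteriors,
               priors=[0.5, 0.5],
               transitions=[[0.5, 0.5], [0.0, 1.0]],
               initial=[1.0, 0.0])
```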

  14. Recognition
      • Vocabulary
      • Phonetic transcription
        Ex: French "reconnaissance": R [@] k O n E s a~ s
      • Consider all transcriptions!
        Ex: French "dix" (10) = dis – diz – di
      • Consider synonyms!
        Ex: Oui, ouais, ok, c’est cela, … (yes, yeah, ok, that's right, …)
        Ex: Télévision, TV, poste de télévision (television, TV, television set)
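One way to organize such a vocabulary is to store every accepted variant and map it back to a single canonical entry. The sketch below does this for the slide's own examples; the table layout and the function names are assumptions, and a real recognizer would compile the variants into its decoding network rather than look them up after the fact.

```python
# Several phonetic transcriptions for the same word (from the slide).
PRONUNCIATIONS = {"dix": ["d i s", "d i z", "d i"]}

# Several wordings for the same meaning (from the slide).
SYNONYMS = {
    "yes": ["oui", "ouais", "ok", "c'est cela"],
    "television": ["télévision", "tv", "poste de télévision"],
}


def normalize_phonemes(transcription):
    """Collapse whitespace; phoneme symbols stay case-sensitive."""
    return " ".join(transcription.split())


def normalize_phrase(phrase):
    """Collapse whitespace and ignore case for spoken phrases."""
    return " ".join(phrase.lower().split())


def build_index(table, normalize):
    """Invert {canonical: [variants]} into {normalized variant: canonical}."""
    index = {}
    for canonical, variants in table.items():
        for variant in variants:
            index[normalize(variant)] = canonical
    return index


PHONEME_INDEX = build_index(PRONUNCIATIONS, normalize_phonemes)
PHRASE_INDEX = build_index(SYNONYMS, normalize_phrase)


def word_for(transcription):
    """Return the vocabulary word for a transcription, or None."""
    return PHONEME_INDEX.get(normalize_phonemes(transcription))


def concept_for(phrase):
    """Return the canonical meaning for a recognized phrase, or None."""
    return PHRASE_INDEX.get(normalize_phrase(phrase))
```

All three transcriptions of "dix" resolve to the same word, and "TV" or "poste de télévision" resolve to the same concept; anything not listed returns `None`, which corresponds to the out-of-vocabulary case.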

  15. Recognition (continued): difficulties
      • Noise
      • Accents
      • Hesitations
      • Users
      • Incorrect syntax
      • Out-of-vocabulary words

  16. ASR : advantage of NN

  17. Acapela’s product overview

  18. Acapela’s Technologies Overview: 3 Technologies
      > High-Quality TTS: the pleasant and natural-sounding voice
        • Voice enabled by Sayso and BrightSpeech
        • Based on Unit Selection technology
      > High-Density TTS: the right choice for high density and small footprints
        • Voice enabled by Tempo and Babil
        • Based on Diphone technology
      > ASR: the robust speech recognizer
        • Voice enabled by Babear
        • Speaker-independent ASR based on Hidden Markov Models and Artificial Neural Networks

  19. High Density TTS: Voice enabled by Tempo & Babil
      • Two TTS technologies
      • Diphone-based concatenative TTS
      • Advantages:
        • Small footprint (2 to 6 Mbytes)
        • Flexibility (pitch and speed adjustment, prosody copying)
        • High intelligibility
        • 21 languages supported
      • Disadvantage:
        • Less natural sounding
      • Markets/Applications targeted:
        • Automotive & consumer electronics (low footprint)
        • High-density, short-ROI server-based TTS (telephony)
        • Multimedia software products

  20. High Density TTS language availability

  21. High Quality TTS: Voice enabled by Sayso & BrightSpeech
      • Unit selection concatenative TTS
      • Advantages:
        • Very high quality
        • Highly natural
        • Flexibility (pitch and speed adjustment, timbre alteration, whispering feature)
        • Support for custom voices (“SpeechBrand” Program)
      • Disadvantage:
        • Larger footprint (16 to 70 Mbytes)
      • Markets/Applications targeted:
        • High-end telephony applications (voice portal, news)
        • New generation of navigation terminals
        • Public address

  22. High Quality TTS language availability

  23. ASR: Voice enabled by Babear
      • Hybrid technology of Hidden Markov Models and Artificial Neural Networks
      • Advantages:
        • Very high accuracy in difficult contexts
        • High dialog flexibility, lip-sync and language learning capabilities through phoneme-level discrimination
        • Speaker independent
        • Accurate voice activation for noisy environments
      • Markets/Applications targeted:
        • Industrial data collection: inventories, picking…
        • Automotive
        • Name dialing
        • Multimedia command & control / language learning

  24. ASR language availability

  25. Acapela’s market coverage

  26. Acapela’s Markets: solutions for Telecom, Automotive, Accessibility, Mobility, Industry, Multimedia, Consumer Electronics.

  27. Acapela’s Markets: leading 3 major and mature markets: Telecom, Automotive, Accessibility

  28. Acapela’s main Markets
      • Telecom: server-based vocalization of content for multiple users over the phone
        • for Companies: unified messaging, auto attendant, CRM
        • for Telcos: unified messaging, voice portal, SMS2Voice, directory and reverse directory
        • for Contact centers: call automation, FAQ

  29. Acapela’s main Markets
      • Automotive: on-board and off-board speech solutions
        • On-board & off-board car navigation systems
        • Traffic information
        • PDA-based applications
        • Telematics

  30. Acapela’s main Markets
      • Accessibility
        • Assistive technologies
        • Screen readers
        • Reading machines
        • Voice-controlled mobile phones

  31. Acapela’s Markets: creating new speech market opportunities in
      >> Mobility
        • Cell phones
        • Navigation on PDAs

  32. Acapela’s Markets: creating new speech market opportunities in
      >> Industry
        • Public address
        • Alarm & supervision
        • Warehousing, production line

  33. Acapela’s Markets: creating new speech market opportunities in
      >> Multimedia
        • Edutainment
        • Education
        • Language learning
        • E-learning

  34. Acapela’s Markets: creating new speech market opportunities in
      >> Consumer Electronics, …
        • Talking dictionary devices
        • Toys

  35. giving you the say
