Rhetorical Group plc
Marc Moens
January 2001
Focus of Rhetorical Group
• Speech synthesis
  • Producing natural-sounding speech
  • Talking computers
  • Voices modelled on human voices, often almost indistinguishable from the original voice
• Different from speech recognition
  • Speech recognition is used in dictation and voice-controlled operating systems
  • Different companies, targeting different markets
• Core product: rVoice
Technological breakthrough
• Old technology:
  • Formant-based synthesis
  • Diphone-based synthesis
  • Lacks vitality
  • Monotonous
  • Not suitable for extended use
Technological breakthrough
• rVoice:
  • Unit selection
  • More natural sounding
  • Suitable for extended use
  • In many applications: almost indistinguishable from a human voice
  • “Welcome to our new speech synthesiser.”
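Unit selection is generally described as picking, for each position in the target utterance, one recorded unit from a database so that the sum of a target cost (how well the unit fits what is wanted) and a join cost (how smoothly it connects to its neighbour) is as low as possible. The sketch below illustrates that search with dynamic programming. It is not rVoice's implementation: the `Unit` class, the pitch-only cost functions and the unit names are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical recorded speech fragment, described by pitch only."""
    name: str
    start_pitch: float  # pitch (Hz) at the start of the fragment
    end_pitch: float    # pitch (Hz) at the end of the fragment


def target_cost(unit, wanted_pitch):
    # Assumption: a unit fits well if its mean pitch is near the wanted pitch.
    return abs((unit.start_pitch + unit.end_pitch) / 2 - wanted_pitch)


def join_cost(left, right):
    # Assumption: a join is smooth if the pitch does not jump at the boundary.
    return abs(left.end_pitch - right.start_pitch)


def select_units(candidates, wanted_pitches):
    """Return (total_cost, chosen_units) for the cheapest unit sequence.

    candidates[i] lists the database units available for position i.
    """
    # best holds, for each candidate at the current position, the cheapest
    # path that ends in that candidate.
    best = [(target_cost(u, wanted_pitches[0]), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda cp: cp[0],
            )
            new_best.append((cost + target_cost(u, wanted_pitches[i]), path + [u]))
        best = new_best
    return min(best, key=lambda cp: cp[0])


candidates = [
    [Unit("wel_a", 100, 110), Unit("wel_b", 120, 130)],
    [Unit("come_a", 112, 105), Unit("come_b", 140, 120)],
]
total, chosen = select_units(candidates, wanted_pitches=[105, 110])
print(total, [u.name for u in chosen])
```

A production system would use much richer features (spectral shape, duration, phonetic context) and a far larger database; the point here is only the shape of the search.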
Speech synthesis
• Human voice vs synthesised voice
  • Under controlled conditions
  • Mixing the human voice with the synthesised voice
• “Previously he was vice president of Eastern Edison.”
• “Mrs Hill said many of the 25 countries that she placed under varying degrees of scrutiny had made genuine progress on this touchy issue.”
Rhetorical Team
Senior management:
• Marc Moens (CEO)
• Paul Taylor (CTO)
• Peter Denyer (Chairman)
Other management:
• Keith Edwards (applications manager)
• Ian Hodson (product development manager)
• Art Blokland (consultancy manager)
Full team: 35 people
rVoice outlook
• A variety of applications and platforms:
  • Telephony industry
  • Games
  • Internet
  • Mobile communications
• A variety of input mechanisms:
  • Text (TTS)
  • Concept to speech, in conjunction with language generation
  • Domain-specific applications
• A variety of voices and languages:
  • The rVoice rapid voice prototyper allows new voices to be added to the system in a matter of weeks
  • Different accents and languages covered
• All within a single generic system
rVoice core capabilities: domain-specific synthesis
• Flexible, scalable domain-specific synthesis
  • Airline information
  • Car directions
  • Financial news
rVoice core capabilities: multilinguality
• Currently only English available
• Plans:
  • German and French by Q2 2001
  • Spanish by Q3 2001
  • Dutch and Italian by Q4 2001
• Same engine for all languages
rVoice core capabilities: text analysis
• Robust statistical methods for:
  • Text normalisation ($1.43 → one dollar forty three cents)
  • POS tagging
  • Phrase break prediction
  • Letter-to-sound rule transduction (including automatic training)
  • Syntactic parsing
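The text normalisation example on this slide ($1.43 read as "one dollar forty three cents") can be illustrated with a small sketch. The slide describes rVoice's text analysis as statistical; the code below is instead a plain rule-based stand-in, covering only dollar amounts under 1,000 with two decimal places. The function names and the regular expression are hypothetical.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


# Matches amounts such as $1.43; larger or differently formatted amounts
# are left untouched by this sketch.
DOLLAR_AMOUNT = re.compile(r"\$(\d{1,3})\.(\d{2})\b")


def normalise_currency(text):
    """Replace dollar amounts in text with their spoken form."""
    def expand(match):
        dollars, cents = int(match.group(1)), int(match.group(2))
        dollar_unit = "dollar" if dollars == 1 else "dollars"
        cent_unit = "cent" if cents == 1 else "cents"
        return (f"{number_to_words(dollars)} {dollar_unit} "
                f"{number_to_words(cents)} {cent_unit}")
    return DOLLAR_AMOUNT.sub(expand, text)


print(normalise_currency("It costs $1.43."))
# It costs one dollar forty three cents.
```

A real normaliser also has to disambiguate by context (dates, phone numbers, abbreviations), which is where statistical methods come in.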
Talking Heads
• Ongoing work on rFace
• Ability to capture a 3D model of any head and combine it with speech
System Overview
• Two systems:
  • rVoice developer
    • Single-user standalone system with scripting language and graphical tools
  • rVoice run
    • Compact, fast, multithreaded run-time system with client-server architecture and telephony hardware communications
Current Platforms
• Solaris 2.5, 2.5.1, 2.6, 2.7
• FreeBSD 2.2, 3.x
• Linux (Red Hat 4.1, 5.0, 5.1, 5.2, 6.0 and other Linux distributions)
• OSF (DEC Alphas), SGI (IRIX), HP (HP-UX)
• Windows 95, 98, NT 4.0, 2000: Visual C++ v5.0 and v6.0
Speed and Size
• rVoice 1.0 aims:
  • 10 simultaneous channels on a 1 GHz Pentium
  • 256 MB RAM, of which
    • 15 MB taken up by each channel
    • 75 MB of shared resources
  • Higher number of channels available with a proportionate reduction in voice quality
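The memory figures on this slide can be checked with simple arithmetic. The sketch below assumes that "M" means megabytes and that the 15 MB per channel comes on top of the 75 MB shared resource; the function names are hypothetical. It looks at memory only and says nothing about CPU load, which may be the tighter limit in practice.

```python
TOTAL_RAM_MB = 256     # RAM figure from the slide
SHARED_MB = 75         # shared resource, loaded once
PER_CHANNEL_MB = 15    # additional memory per channel (assumed additive)
TARGET_CHANNELS = 10   # rVoice 1.0 aim


def memory_needed_mb(channels):
    """Memory footprint for a given number of simultaneous channels."""
    return SHARED_MB + channels * PER_CHANNEL_MB


def max_channels(total_ram_mb):
    """Largest channel count whose footprint fits in total_ram_mb."""
    return max(0, (total_ram_mb - SHARED_MB) // PER_CHANNEL_MB)


needed = memory_needed_mb(TARGET_CHANNELS)
print(needed)                      # 225
print(TOTAL_RAM_MB - needed)       # 31
print(max_channels(TOTAL_RAM_MB))  # 12
```

Under these assumptions, the 10-channel target uses 225 MB and leaves 31 MB of headroom in a 256 MB machine.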
System Development Schedule
• Basic prototype: January 31, 2001
• Alpha release: February 28, 2001 (single-threaded)
• Beta release: April 15, 2001 (multithreaded)
• Full release: May/June 2001
Development Schedule: capabilities
• First basic British voice: December 6, 2000
• Five British voices: end of January 2001
• Five American voices: mid-February 2001
• VoiceXML: end of January 2001
• Fast unit selection: end of January 2001
Future Plans
• Extension to new platforms, including games and mobile devices
• Development and integration with
  • language generation
  • information extraction and retrieval
Contact

Peter Denyer
Rhetorical Group plc
4, Buccleuch Place
Edinburgh EH8 9LW
Tel: 07770 416 699
Fax: 0131 650 4587
Email: P_Denyer@yahoo.com

Marc Moens
Rhetorical Group plc
4, Buccleuch Place
Edinburgh EH8 9LW
Tel: 0131 650 4427 / 07979 596770
Fax: 0131 650 4587
Email: marc@rhetoricalsystems.com