IBM Research Lab in Haifa

Low-footprint, high-quality Text-to-Speech
Ron Hoory
Dec 11, 2001

Concatenative Text-to-Speech
• Current embedded TTS solutions:
  • mostly "formant" based, with a metallic, unnatural sound.
• Concatenative TTS:
  • speech segments are selected from a large database and concatenated together.
  • produces high-quality speech that sounds more natural and human.
  • currently part of IBM's WebSphere Voice Server offering.
• Implementation for embedded platforms:
  • the speech database size (> 50 MB) is an obstacle.
  • HRL is currently working on low-footprint concatenative TTS, aimed at enabling high-quality TTS on embedded platforms.
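To make the select-and-concatenate step concrete, here is a minimal Python sketch. It is not IBM's algorithm: it assumes a toy in-memory database that maps hypothetical unit labels (diphone-style names such as `"h-e"`) to candidate waveforms stored as plain lists of samples, picks candidates greedily by a simple join cost (squared sample mismatch over the overlap region), and joins them with a linear cross-fade. The names `join_cost`, `crossfade_concat` and `synthesize` are illustrative only.

```python
from typing import Dict, List

# Hypothetical representation: a waveform is a plain list of samples.
Waveform = List[float]


def join_cost(prev: Waveform, cand: Waveform, overlap: int) -> float:
    """Squared mismatch between the end of `prev` and the start of `cand`."""
    if not prev or overlap <= 0:
        return 0.0
    tail = prev[-overlap:]
    head = cand[:overlap]
    return sum((a - b) ** 2 for a, b in zip(tail, head))


def crossfade_concat(a: Waveform, b: Waveform, overlap: int) -> Waveform:
    """Append `b` to `a`, linearly cross-fading over up to `overlap` samples."""
    if not a:
        return list(b)
    n = max(0, min(overlap, len(a), len(b)))
    out = a[:len(a) - n]
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming segment grows toward 1
        out.append((1 - w) * a[len(a) - n + i] + w * b[i])
    out.extend(b[n:])
    return out


def synthesize(units: List[str], database: Dict[str, List[Waveform]],
               overlap: int = 2) -> Waveform:
    """Greedy selection: for each unit, keep the candidate with the lowest join cost."""
    speech: Waveform = []
    for unit in units:
        best = min(database[unit], key=lambda c: join_cost(speech, c, overlap))
        speech = crossfade_concat(speech, best, overlap)
    return speech


if __name__ == "__main__":
    # Toy database: two candidate recordings per (hypothetical) diphone label.
    toy_db = {
        "h-e": [[0.0, 0.5, 1.0, 1.0], [0.0, 0.1, 0.2, 0.3]],
        "e-l": [[1.0, 1.0, 0.5, 0.0], [5.0, 5.0, 5.0, 5.0]],
    }
    print(synthesize(["h-e", "e-l"], toy_db))
```

Greedy, join-cost-only selection keeps the sketch short; unit-selection systems typically also score how well each candidate matches the target prosody and search over the whole utterance rather than committing unit by unit.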

Activity at HRL
• Goals:
  • adapt the concatenative TTS technology developed at the IBM T.J. Watson Research Center to low-footprint operation.
  • speech database size: < 5 MB.
  • maintain the same level of quality as the server-based TTS.
  • integrate into IBM's embedded ViaVoice offering.
• Means:
  • speech segments are stored as compressed feature vectors, which can be reconstructed into speech after concatenation.
  • novel HRL compression and reconstruction techniques are used.
• Contact points and IBM partners:
  • HRL Audio/Video group: Zohar Sivan, Ron Hoory.
  • IBM Voice Systems: Tom Rutherfoord.
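The slide does not describe HRL's compression and reconstruction techniques, so the following Python sketch substitutes a deliberately simple, generic scheme: uniform scalar quantization of feature vectors to 8-bit codes, plus the matching reconstruction. The `Vectors` representation and the `quantize`/`dequantize` functions are hypothetical, and turning reconstructed vectors back into a waveform is out of scope. The arithmetic behind the goal is worth noting: going from over 50 MB to under 5 MB requires better than a 10:1 reduction, while 8-bit codes for 32-bit floating-point features give only 4:1, so a naive scheme like this would not reach the target on its own.

```python
from typing import List, Tuple

# Hypothetical representation: each speech segment is a list of feature vectors
# (for example, per-frame spectral coefficients stored as 32-bit floats).
Vectors = List[List[float]]


def quantize(vectors: Vectors, bits: int = 8) -> Tuple[List[List[int]], float, float]:
    """Map every coefficient to an integer code in [0, 2**bits - 1]."""
    flat = [x for v in vectors for x in v]
    lo, hi = min(flat), max(flat)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [[round((x - lo) / step) for x in v] for v in vectors]
    return codes, lo, step


def dequantize(codes: List[List[int]], lo: float, step: float) -> Vectors:
    """Reconstruct approximate feature vectors; per-coefficient error is about step / 2 at most."""
    return [[lo + c * step for c in v] for v in codes]


if __name__ == "__main__":
    segment = [[0.0, 1.0, -0.3], [0.5, 0.25, 0.8]]
    codes, lo, step = quantize(segment)
    restored = dequantize(codes, lo, step)
    worst = max(abs(a - b) for u, v in zip(segment, restored) for a, b in zip(u, v))
    print(codes, worst)
```

Only the codes plus two floats (`lo` and `step`) need to be stored per segment; schemes that actually reach a 10:1 budget would exploit correlation between coefficients and frames rather than coding each value independently.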
