Distributed Rendering Tool for Voices (DRTV)
Familiar, Expressive Voices & Personalities
Speech Technology & Media Solutions
By Dale Schalow, SCHALOW Innovations, Ashburn, Virginia (USA)
DRTV Goals
• Professionally Design, Produce and Develop Familiar-sounding Voices from today, tomorrow and the past
• Provide an Always-On Service to Consumers, Businesses and Government
• Provide a Hosted Solution (Client/Server) for Interactive and Linear Media Users
Description
• High-quality voices for use on the Internet and in content
• Managing Assets with New and Historic Sources
Description
• High-quality voices for use on the Internet and in content
  • Entertainment and Education
    • 3D animation, gaming
    • Film, TV, radio
  • Accessibility
    • Seniors
    • Low Vision
    • Motor-Impaired
Description
• Build and Manage Speech Assets:
  • Establish formal voice asset collection, storage and distribution
  • Facilitate asset preservation and restoration
  • Coordinate with Museums, Libraries, 3D Game/Film Studios, Radio, Foundations, Colleges, etc.
Description
• Build and Manage Assets:
  • Refactor inventory for both audio and audio-visual physical assets (tapes, digital, reels, master sound recordings)
  • Maintain digital asset libraries
  • Maintain product voice library with dictionary of terms (paired vocabulary)
  • Coordinate asset management IS/IT needs and initiatives with customer or partnering group
Technology
• New media technology used
  • NLP Toolkit (Natural Language Processing)
  • Cross-Encoding for Embedded Media (PCs, HD, AAC, MP3/Internet Radio, etc.)
• Standards being adopted
  • W3C (World Wide Web Consortium)
  • Java™ and VoiceXML, SSML (Speech Synthesis Markup Language)
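As an illustration of the SSML standard named on this slide, the following is a minimal sketch that wraps plain text in an SSML 1.0 document. The function name `build_ssml` and the voice name passed to it are hypothetical and are not part of DRTV; only the `speak` and `voice` elements and the W3C synthesis namespace come from the SSML specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Built-in XML namespace, needed to emit the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML string that asks for `text` in the named voice.

    `voice_name` is a placeholder: real voice names depend on the
    synthesis engine that receives the document.
    """
    # Serialize the SSML namespace as the default (prefix-less) namespace.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("hello", "example-voice"))
```

A document like this would be sent to whichever SSML-aware synthesis engine hosts the voice; how DRTV itself consumes SSML is not described in the slides.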
Team/Resources
• Resources allocated to this project
• Support & outside services
• Internal software development
• Internet Service Provider
• Pro Recording Studios
• 3rd party vendors (hardware/software)
Speech Tech Procedures
• Step 1 - New Voice as Source?
  • Professionally record using N-based “tape script”
  • Output format as PCM (e.g. Wave, 1-channel, 16-bit)
• Step 2 - Existing Voice as Source
  • Import audio source (PCM/16-bit quality)
  • “Auto-Extract” using N-based “tape script” to pull phonetic features, phonemes and transcriptions
  • Audio scanning with automatically generated text-based grammars
  • Retaining audio output
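The source format named in Steps 1 and 2 (PCM Wave, 1 channel, 16 bit) can be checked before a recording enters the pipeline. The sketch below uses Python's standard `wave` module; the function names and the 16 kHz sample rate of the test fixture are assumptions for illustration, since the slides do not state a sample rate.

```python
import wave


def check_source_format(path):
    """Return (ok, details) for a WAV file; ok means mono, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        frames = wav.getnframes()
    details = {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate": rate,
        "frames": frames,
    }
    ok = channels == 1 and sample_width == 2
    return ok, details


def write_silence(path, seconds=1.0, rate=16000):
    """Write a mono 16-bit PCM file of silence (a stand-in recording).

    The 16 kHz default is an assumed value, not taken from the slides.
    """
    frame_count = int(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # 1 channel
        wav.setsampwidth(2)   # 2 bytes = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frame_count)
```

Calling `check_source_format("take01.wav")` on a stereo or 8-bit file would return `ok == False` together with the measured properties, which makes it easy to reject or convert a source before extraction.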
Speech Tech Procedures
• Step 3 - Apply Vocabulary
  • Build a default dictionary of terms to allow automatic translation
  • Minimum 40k words (more is better)
• Step 4 - Process Text-to-Speech (TTS)
  • Take as input some text (e.g. “hello”)
  • Use the speech synthesis engine to generate audio with the applied vocabulary
• Step 5 - Use the URL/file of the generated voice from Step 4 in a vertical application (Web page, game, 3D import, etc.)
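Steps 3 to 5 can be sketched as a small pipeline: look each word up in the vocabulary, hand the result to a synthesis engine, and return a URL for the generated audio. Everything named here is hypothetical: the two-entry `VOCABULARY` (with illustrative ARPAbet-style phoneme strings), the `engine` callable, and the base URL stand in for DRTV components the slides do not specify.

```python
import re

# Hypothetical paired vocabulary: word -> phoneme string.
# A real dictionary would hold at least 40k entries (Step 3).
VOCABULARY = {
    "hello": "HH AH L OW",
    "world": "W ER L D",
}


def apply_vocabulary(text, vocabulary):
    """Split text into words and pair each known word with its phonemes.

    Returns (known, unknown): a list of (word, phonemes) tuples and a
    list of words missing from the vocabulary.
    """
    words = re.findall(r"[a-z']+", text.lower())
    known = [(w, vocabulary[w]) for w in words if w in vocabulary]
    unknown = [w for w in words if w not in vocabulary]
    return known, unknown


def synthesize(text, vocabulary, engine, base_url):
    """Run Steps 3-5 and return the URL of the generated audio.

    `engine` is an assumed callable that takes a list of phoneme
    strings and returns the file name of the audio it produced.
    """
    known, unknown = apply_vocabulary(text, vocabulary)
    if unknown:
        raise KeyError(f"words missing from vocabulary: {unknown}")
    phonemes = [p for _, p in known]
    file_name = engine(phonemes)                     # Step 4
    return f"{base_url.rstrip('/')}/{file_name}"     # Step 5
```

Raising on unknown words is one possible design choice; a production system might instead fall back to letter-to-sound rules, which is why a larger dictionary reduces failures.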
Speech Tech Procedures
• Benefits
  • Reduces the time and manual effort needed to redo fundamental tasks
  • Achieves high-quality output
  • Moves things forward on at least two fronts
    • 1) Voices we already know or recognize
    • 2) Voices and creations we are yet to discover in the process
  • Appeals to many demographics for marketability
DRTV Contact Information
• For more information:
  SCHALOW Innovations
  Dale B. Schalow
  Phone: (703) 625-7367
  Email: dale@schalow.com
  Web: http://schalow.com