Explore the potential applications, main technologies, and challenges of auditory user interfaces (AUI) in this comprehensive overview. Discover the benefits of audio interaction and why it has been underused till now. Learn about speech synthesis, speech recognition, and other key components of AUI.
Multimedia Auditory User Interfaces T.Sharon - A.Frank
Auditory User Interfaces • An auditory user interface (AUI) is an interface that relies primarily or exclusively on audio for interaction, including speech and sound. (Weinschenk & Barker 2000) • Examples: • Natural language/speech user interfaces. • Hands-free automobile navigation systems. • Interactive voice response (IVR) systems, such as an automated payment center. • Products for the visually impaired.
Why Audio I/O? • Hands busy • Eyes engaged • Disabilities
Potential Applications • Auditory interfaces can be used in many aspects of our lives: • Dictation systems • Navigation systems • Transaction systems • Operator services • Recording meetings and indexing them later on.
Why Has Audio I/O Been Underused Until Now? • Needs multiple I/O channels • Cost problems • Technical problems • Algorithmic problems
Audio I/O Main Technologies • Speech synthesis • Speech recognition • Speaker recognition • Non-speech audio
Speech Synthesis • Text-to-Speech • Phoneme-to-Speech • Stored Messages
Basic workflow of Text-to-Speech
Phoneme-to-Speech • Stored phonemes (pre-recorded). • Parameterization (male/female, old/young). • Combined in sequence to generate words/sentences. • Synthesizer chip. (Diagram: Parameters and Stored Phonemes feed the Synthesizer Chip.)
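The idea of combining stored phonemes under a voice parameter can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the phoneme store `STORED_PHONEMES`, the sine-burst stand-ins for recordings, and the nearest-neighbour resampling used as a "pitch" parameter are all hypothetical and are not taken from the slides or from any real synthesizer chip.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz

def tone(freq_hz, n_samples):
    """Stand-in for a pre-recorded phoneme: a short sine burst."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

# Hypothetical phoneme store; a real one would hold recorded speech units.
STORED_PHONEMES = {
    "HH": tone(300, 400),
    "AY": tone(500, 800),
}

def apply_pitch(samples, factor):
    """Crude parameterization: nearest-neighbour resampling.

    factor > 1 shortens the unit and raises its pitch; factor < 1 does
    the opposite. Real systems use far better signal processing.
    """
    n_out = int(len(samples) / factor)
    last = len(samples) - 1
    return [samples[min(int(i * factor), last)] for i in range(n_out)]

def synthesize(phonemes, pitch_factor=1.0):
    """Concatenate the stored units for a phoneme sequence."""
    out = []
    for name in phonemes:
        out.extend(apply_pitch(STORED_PHONEMES[name], pitch_factor))
    return out

word = synthesize(["HH", "AY"])
higher_voice = synthesize(["HH", "AY"], pitch_factor=2.0)
```

Plain concatenation like this produces audible discontinuities at unit boundaries, which is one reason practical systems smooth the joins.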
Stored Messages • Prerecorded parts • Message splicing • How to smooth speech? • Voice playback
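One common answer to the smoothing question is to crossfade the prerecorded parts where they meet instead of butting them together. The sketch below assumes each part is a plain list of audio samples and uses a linear blend; the function names and the `overlap` parameter are illustrative choices, not something the slides specify.

```python
def crossfade_splice(a, b, overlap):
    """Join two sample lists, blending the end of `a` into the start of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight moves gradually from a to b
        blended.append((1 - w) * a[start + i] + w * b[i])
    return list(a[:start]) + blended + list(b[overlap:])

def splice_message(parts, overlap=3):
    """Splice several prerecorded parts into one message."""
    out = list(parts[0])
    for part in parts[1:]:
        out = crossfade_splice(out, part, overlap)
    return out

# Two toy "recordings": a loud segment followed by silence.
message = splice_message([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
# message == [1.0, 0.75, 0.5, 0.25, 0.0]
```

The abrupt jump from 1.0 to 0.0 becomes a short ramp, at the cost of a message slightly shorter than the sum of its parts.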
Speech Synthesis Timeline
Speech Recognition • Get acoustic patterns (sampling) • Match to templates (map acoustic patterns to known templates). • Identify tokens
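The three steps above can be sketched with dynamic time warping (DTW), a classical way to compare an acoustic pattern against stored templates when speaking pace varies. DTW is a technique chosen here for illustration; the slides do not name a matching algorithm. The one-dimensional feature sequences and the `TEMPLATES` dictionary are made-up toy data.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: stretch x, stretch y, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(pattern, templates):
    """Identify the token whose template is closest to the pattern."""
    return min(templates, key=lambda word: dtw_distance(pattern, templates[word]))

# Hypothetical templates: one toy feature value per frame.
TEMPLATES = {
    "yes": [1, 3, 4, 3, 1],
    "no": [5, 5, 2, 1, 1],
}

spoken = [1, 1, 3, 4, 4, 3, 1]  # "yes" spoken more slowly than the template
token = recognize(spoken, TEMPLATES)
```

Because DTW lets frames repeat, the slow utterance still aligns with the "yes" template at zero cost, which a frame-by-frame comparison would not manage.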
Speech Recognition Problems • Fast talkers • Swallowed words • Speech impairments • Slang words (culture oriented) • Similar-sounding words • Environmental noise
Speech Recognition Factors • Speaker (in)dependent • Single voice training • Pre-train/generalize • Vocabulary size • Training cost • Database complexity • Pace of speech • Isolated words • Continuous speech • Connected speech
Factors affecting error rate of speech recognition • Vocabulary size • Background noise • Speech spontaneity • Sampling rate • Amount of training data available
Word error rate of speech recognition (Figure: word error rate, on a 0% to 40% axis, plotted against level of difficulty.) • Tasks plotted: Command and Control • Digits • Letters and Numbers • Continuous Digits • Read Speech • Broadcast News • Conversational Speech
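Word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. A minimal Python sketch follows; the function name and the example sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)

wer = word_error_rate("turn the radio on", "turn radio on")  # one deletion in four words
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs many more words than were spoken.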
Basic workflow of Speech-to-Text
Siri as an Example • Siri is an intelligent personal assistant that helps you get things done just by asking. • It allows you to use your voice to send messages, schedule meetings, place phone calls, search the web, and more. • Siri understands your natural speech, and it asks you questions if it needs more information to complete a task.