Auditory User Interfaces: A Comprehensive Overview of Audio Interaction

Multimedia Auditory User Interfaces T.Sharon - A.Frank

Auditory User Interfaces
• An auditory user interface (AUI) is an interface that relies primarily or exclusively on audio, including speech and sound, for interaction (Weinschenk & Barker 2000).
• Examples:
  • Natural-language/speech user interfaces
  • Hands-free automobile navigation systems
  • Interactive voice response (IVR) systems, such as automated payment centers
  • Products for the visually impaired

Why Audio I/O?
• Hands busy
• Eyes engaged
• Disabilities

Potential Applications
• Auditory interfaces can be used in many areas of daily life:
  • Dictation systems
  • Navigation systems
  • Transaction systems
  • Operator services
  • Recording meetings and indexing them later

Why Has Audio I/O Been Underused Until Now?
• Needs multiple I/O channels
• Cost problems
• Technical problems
• Algorithmic problems

Audio I/O Main Technologies
• Speech synthesis
• Speech recognition
• Speaker recognition
• Non-speech audio

Speech Synthesis
• Text-to-Speech
• Phoneme-to-Speech
• Stored Messages

Basic workflow of Text-to-Speech

Phoneme-to-Speech
• Stored phonemes (pre-recorded)
• Parameterization (male/female, old/young)
• Phonemes are combined in sequence to generate words and sentences
• Synthesizer chip
(Diagram labels: Parameters, Stored Phonemes, Synthesizer Chip)
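The phoneme-to-speech idea above (look up stored units, apply parameters, concatenate in sequence) can be sketched in a few lines. This is only an illustrative sketch: the phoneme store, the sine-burst stand-ins for recordings, the sample rate, and the `gain` parameter are all assumptions, not details from the slides. A real synthesizer would hold recorded speech units and expose voice parameters such as pitch or speaker type.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def tone(freq_hz, duration_s=0.05):
    """Stand-in for a pre-recorded phoneme: a short sine burst."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical phoneme store; a real system would hold recorded speech units.
STORED_PHONEMES = {
    "HH": tone(300),
    "EH": tone(500),
    "L": tone(400),
    "OW": tone(450),
}

def synthesize(phonemes, gain=1.0):
    """Concatenate stored phoneme waveforms in order, scaled by a gain parameter."""
    samples = []
    for p in phonemes:
        samples.extend(gain * s for s in STORED_PHONEMES[p])
    return samples

# "hello" as a phoneme sequence, played back slightly quieter
hello = synthesize(["HH", "EH", "L", "OW"], gain=0.8)
```

The `gain` argument stands in for the slide's parameterization step; swapping in a different phoneme store would be one simple way to model a male/female or old/young voice.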

Stored Messages
• Prerecorded parts
• Message splicing
• How to smooth speech?
• Voice playback
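One common way to smooth the joins when splicing prerecorded parts is a short crossfade. The slides do not say which smoothing method is meant, so the linear crossfade below is an assumption, and the function name `splice` and its `overlap` parameter are hypothetical.

```python
def splice(a, b, overlap):
    """Join two lists of samples, linearly crossfading over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return list(a) + list(b)  # nothing to blend: plain concatenation
    head = list(a[:len(a) - overlap])   # part of `a` before the blend region
    tail = list(b[overlap:])            # part of `b` after the blend region
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)     # weight of `b` ramps up across the overlap
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + tail

# Two "messages" at very different levels; the join ramps instead of jumping.
joined = splice([1.0] * 10, [0.0] * 10, overlap=4)
```

The crossfade shortens the result by `overlap` samples, which is usually acceptable for message playback but would matter if precise timing were required.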

Speech Synthesis Timeline

Speech Recognition
• Get acoustic patterns (sampling)
• Match to templates (map acoustic patterns to known templates)
• Identify tokens
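The three steps above (acquire a pattern, match it against templates, pick a token) can be sketched as follows. The slides do not name a matching algorithm; dynamic time warping (DTW) is used here as one classic choice for template matching, and the one-dimensional "features" and the `TEMPLATES` table are made-up illustrative values.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            d = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(x)][len(y)]

# Hypothetical stored templates: one feature value per frame.
TEMPLATES = {
    "yes": [1, 3, 4, 3, 1],
    "no": [4, 2, 1, 2, 4],
}

def recognize(pattern):
    """Return the token whose template is closest to the sampled pattern."""
    return min(TEMPLATES, key=lambda word: dtw_distance(pattern, TEMPLATES[word]))

# A slower utterance of "yes": same shape, stretched in time.
token = recognize([1, 1, 3, 4, 4, 3, 1])
```

DTW tolerates differences in speaking pace, which is why the stretched pattern still matches; it does nothing for noise or for similar-sounding words.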

Speech Recognition Problems
• Fast talkers
• Swallowed words
• Speech problems
• Slang (culture-dependent)
• Similar-sounding words
• Environmental noise

Speech Recognition Factors
• Speaker (in)dependent
• Single voice training
• Pre-train/generalize
• Vocabulary size
• Training cost
• Database complexity
• Pace of speech
• Isolated words
• Continuous speech
• Connected speech

Factors affecting error rate of speech recognition
• Vocabulary size
• Background noise
• Speech spontaneity
• Sampling rate
• Amount of training data available

Word error rate of speech recognition
(Figure: word error rate, on an axis from 0% to 40%, plotted against level of difficulty. Tasks shown, in the order they appear on the slide: Conversational Speech, Broadcast News, Read Speech, Continuous Digits, Letters and Numbers, Digits, Command and Control.)
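Word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. A minimal sketch, assuming whitespace tokenization and case-sensitive matching (the function name and the example sentences are illustrative, not from the slides):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion (reference word missing)
                curr[j - 1] + 1,         # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("my" -> "the") and one deletion ("now") over 4 reference words.
wer = word_error_rate("call my office now", "call the office")
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs many extra words.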

Basic workflow of Speech-to-Text

Siri as an Example
• Siri is an intelligent personal assistant that helps you get things done just by asking.
• It allows you to use your voice to send messages, schedule meetings, place phone calls, search the web, and more.
• Siri understands your natural speech, and it asks you questions if it needs more information to complete a task.
