
Auditory User Interfaces: A Comprehensive Overview of Audio Interaction

Explore the potential applications, main technologies, and challenges of auditory user interfaces (AUIs) in this comprehensive overview. Discover the benefits of audio interaction and why it has been underused until now. Learn about speech synthesis, speech recognition, and other key components of AUIs.

btorres




Presentation Transcript


  1. Multimedia Auditory User Interfaces T.Sharon - A.Frank

  2. Auditory User Interfaces • An auditory user interface (AUI) is an interface that relies primarily or exclusively on audio, including speech and sound, for interaction (Weinschenk & Barker 2000). • Examples: • Natural language/speech user interfaces. • Hands-free automobile navigation systems. • Interactive voice response (IVR) systems, such as an automated payment center. • Products for the visually impaired.

  3. Why Audio I/O? • Hands busy • Eyes engaged • Disabilities

  4. Potential Applications • Auditory interfaces can be used in many aspects of daily life: • Dictation systems • Navigation systems • Transaction systems • Operator services • Recording meetings and indexing them later

  5. Why Has Audio I/O Been Underused Until Now? • Needs multiple I/O channels • Cost problems • Technical problems • Algorithmic problems

  6. Audio I/O Main Technologies • Speech synthesis • Speech recognition • Speaker recognition • Non-speech audio

  7. Speech Synthesis • Text-to-Speech • Phoneme-to-Speech • Stored Messages

  8. Basic workflow of Text-to-Speech

  9. Phoneme-to-Speech • Stored phonemes (pre-recorded). • Parameterization (male/female, old/young). • Phonemes are combined in sequence to generate words and sentences. • Synthesizer chip. [Diagram labels: Parameters, Stored Phonemes, Synthesizer Chip]
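
The phoneme-to-speech idea above (stored units, a voice parameter, concatenation into a word) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sine tones stand in for pre-recorded phonemes, and `STORED_PHONEMES`, `apply_pitch`, and `synthesize` are hypothetical names, not part of the original slides or of any real synthesizer.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this sketch)

def make_tone(freq_hz, duration_ms):
    """Stand-in for a pre-recorded phoneme: a short sine tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical phoneme inventory; a real system would store recorded speech.
STORED_PHONEMES = {
    "HH": make_tone(200, 50),
    "AY": make_tone(300, 100),
}

def apply_pitch(samples, pitch_scale):
    """Crude voice parameter: nearest-index resampling, so a pitch_scale
    above 1 raises the pitch and shortens the sound."""
    n_out = max(1, int(len(samples) / pitch_scale))
    last = len(samples) - 1
    return [samples[min(last, int(i * pitch_scale))] for i in range(n_out)]

def synthesize(phonemes, pitch_scale=1.0):
    """Concatenate the stored phonemes, in order, into one waveform."""
    out = []
    for name in phonemes:
        out.extend(apply_pitch(STORED_PHONEMES[name], pitch_scale))
    return out

word = synthesize(["HH", "AY"])                     # default voice
higher = synthesize(["HH", "AY"], pitch_scale=1.25)  # higher-pitched voice
```

Plain concatenation like this leaves audible discontinuities at the joins, which is one reason practical systems smooth the transitions between units.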

  10. Stored Messages • Prerecorded parts • Message splicing • How to smooth speech? • Voice playback
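
One common way to address the smoothing question when splicing prerecorded parts is a short crossfade at each join. The sketch below assumes each part is a list of audio samples; the function names and the linear fade are illustrative choices, not something the slides prescribe.

```python
def crossfade_splice(a, b, overlap):
    """Join two sample lists, linearly crossfading over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    start = len(a) - overlap
    head = list(a[:start])   # part of `a` before the fade
    tail = list(b[overlap:])  # part of `b` after the fade
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises toward 1
        blended.append((1 - w) * a[start + i] + w * b[i])
    return head + blended + tail

def splice_message(parts, overlap=4):
    """Splice several prerecorded parts into one message."""
    out = list(parts[0])
    for part in parts[1:]:
        out = crossfade_splice(out, part, overlap)
    return out

# Two toy "recordings": a loud segment followed by silence.
message = splice_message([[1.0] * 10, [0.0] * 10], overlap=4)
```

Each join shortens the result by `overlap` samples, because the overlapping regions of the two parts are mixed rather than played back to back.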

  11. Speech Synthesis Timeline

  12. Speech Recognition • Get acoustic patterns (sampling) • Match to templates (map acoustic patterns to known templates) • Identify tokens
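
The three steps above can be sketched with dynamic time warping (DTW), a classic template-matching technique that tolerates differences in speaking pace. The slides do not name a specific algorithm, so DTW is an assumption here, as are the one-dimensional feature sequences and the toy `TEMPLATES` dictionary.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical templates: one made-up feature value per frame, per word.
TEMPLATES = {
    "yes": [1, 3, 4, 3, 1],
    "no": [4, 4, 2, 1, 1],
}

def recognize(pattern):
    """Return the token whose template is closest to the sampled pattern."""
    return min(TEMPLATES, key=lambda word: dtw_distance(pattern, TEMPLATES[word]))

slow_yes = [1, 1, 3, 4, 4, 3, 1]  # "yes" spoken more slowly
token = recognize(slow_yes)
```

Because DTW stretches or compresses the time axis, the slower utterance still aligns with the "yes" template at zero cost, while a frame-by-frame comparison would not even be defined for sequences of different lengths.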

  13. Speech Recognition Problems • Fast talkers • Swallowed words • Speech problems • Slang (culture-dependent) • Similar-sounding words • Environmental noise

  14. Speech Recognition Factors • Speaker (in)dependent • Single voice training • Pre-train/generalize • Vocabulary size • Training cost • Database complexity • Pace of speech • Isolated words • Continuous speech • Connected speech

  15. Factors affecting error rate of speech recognition • Vocabulary size • Background noise • Speech spontaneity • Sampling rate • Amount of training data available

  16. Word Error Rate • [Figure: word error rate of speech recognition (vertical axis, 0% to 40%) plotted against level of difficulty. Tasks shown: command and control, digits, letters and numbers, continuous digits, read speech, broadcast news, conversational speech.]
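
Word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below shows that calculation with a word-level edit distance; the function name and the example sentences are illustrative and do not come from the slides.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitute = prev[j - 1] + (ref_word != hyp_word)
            delete = prev[j] + 1      # reference word missing from hypothesis
            insert = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("mom" -> "tom") and one deletion ("her") in six words.
wer = word_error_rate("call mom on her mobile phone", "call tom on mobile phone")
```

Since insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.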

  17. Basic workflow of Speech-to-Text

  18. Siri as an Example • Siri is an intelligent personal assistant that helps you get things done just by asking. • It allows you to use your voice to send messages, schedule meetings, place phone calls, search the web, and more. • Siri understands your natural speech, and it asks you questions if it needs more information to complete a task.
