Speech and multimodal Jesse Cirimele
Papers • “Multimodal interaction,” Sharon Oviatt • “Designing SpeechActs,” Yankelovich et al.
Why multimodal? • More transparent, flexible, efficient, and powerfully expressive means of HCI
Flexibility • Modality choice for different situations • Modality choice for different functions • Broader range of users • Broader range of environments
Users prefer multimodal • “For example, 95% to 100% of users preferred to interact multimodally when they were free to use either speech or pen input in a map-based spatial domain (Oviatt, 1997).”
What do you gain? • Some speed and efficiency • Improved error handling • Simpler language leads to fewer recognition errors • Mutual disambiguation of different input modes
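Mutual disambiguation means the recognizers' n-best lists can correct each other: semantically incompatible pairings are pruned, so a lower-ranked hypothesis in one mode can end up winning. A minimal sketch, where all commands, scores, and the compatibility table are made-up illustrations rather than anything from Oviatt's systems:

```python
# Hypothetical n-best outputs from a speech and a gesture recognizer.
speech_nbest = [("delete", 0.48), ("repeat", 0.42), ("deselect", 0.10)]
gesture_nbest = [("circle_object", 0.55), ("cross_out", 0.45)]

# Which (speech, gesture) pairs form a meaningful joint command (assumed).
compatible = {
    ("delete", "cross_out"),
    ("repeat", "circle_object"),
}

def fuse(speech, gesture, compatible):
    """Score every compatible pair; incompatible pairs are pruned."""
    best, best_score = None, 0.0
    for s_word, s_p in speech:
        for g_sym, g_p in gesture:
            if (s_word, g_sym) in compatible:
                score = s_p * g_p  # naive independence assumption
                if score > best_score:
                    best, best_score = (s_word, g_sym), score
    return best, best_score

print(fuse(speech_nbest, gesture_nbest, compatible))
```

Here the speech recognizer's top hypothesis ("delete") loses: its only compatible gesture is second-ranked, so the fused interpretation pulls up "repeat", which pairs with the top gesture. That is the disambiguation effect in miniature.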
When do people use multimodal? • Manipulating spatial information • High task difficulty • Communicative complexity
Complementarity vs. redundancy • Very little redundancy of information across modes • Don’t rely on duplicate information from other modalities; instead, use the strengths of some modes to overcome the weaknesses of others
Multimodal language • Is often linguistically simpler than spoken language • “Hard-to-process disfluent language has been observed to decrease by 50% during multimodal interaction with a map.” • Often uses a different word order • LOC-S-V-O instead of S-V-O-LOC
GUI vs multimodal • GUI • Serial and discrete • Multimodal • Parallel and probabilistic
SpeechActs • User-study-style paper • Speech-only interface that controls mail, calendar, weather, and stock quotes for traveling professionals
The study • 22 tasks accomplished via telephone in a room set up to look like a hotel room • Users tested were traveling professionals (the same users who would use the end system)
Results • Users found SpeechActs promising as a concept and “eagerly awaited improvements”
What would they improve? • For Voice User Interfaces (VUIs) to be successful, they need to create a conversation with the user • This can be accomplished through • Shared context • When is the right time to input into the system? • Conversation pacing • How can information be shared or skipped at the right speed?
GUI to SUI? • No; it doesn’t make sense to directly translate a GUI experience into a SUI experience • Instead, take the information organization and information flow of the GUI and build the SUI from the ground up to accomplish the tasks that users want to accomplish
Recognition errors • Rejection errors • Find creative ways to get users to repeat input without frustrating them • Substitution errors • Confirm some commands before executing them • Insertion errors • Let users turn off the microphone; confirm as above
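The strategies above can be sketched as a simple dispatch on recognizer confidence. The thresholds, prompts, and function names here are illustrative assumptions, not the SpeechActs implementation:

```python
# Assumed confidence thresholds for this sketch.
REJECT_BELOW = 0.30   # recognizer too unsure to act: treat as a rejection
CONFIRM_BELOW = 0.70  # plausible but shaky: confirm before executing

# Varied re-prompts so repeated rejections don't feel like nagging.
REPROMPTS = [
    "Sorry, what was that?",
    "Could you say that again?",
    "I still didn't catch that. Try rephrasing?",
]

def handle(hypothesis, confidence, destructive, reject_count=0):
    """Map one recognition result to an action per error class."""
    if confidence < REJECT_BELOW:
        # Rejection error: cycle through prompts instead of repeating one.
        return ("reprompt", REPROMPTS[reject_count % len(REPROMPTS)])
    if destructive or confidence < CONFIRM_BELOW:
        # Guard against substitution errors: confirm risky or shaky commands.
        return ("confirm", f"Did you say '{hypothesis}'?")
    return ("execute", hypothesis)

print(handle("read next message", 0.85, destructive=False))  # executes
print(handle("delete message", 0.85, destructive=True))      # confirms
print(handle("???", 0.12, destructive=False))                # re-prompts
```

Insertion errors (background speech recognized as input) aren't modeled here beyond confirmation; per the paper, SpeechActs also let users switch the microphone off.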
New User Skills • SUIs have different challenges than GUIs • Users need to have different skills • Short term memory • Mental model of system state • Visualizing the organization of information
Conclusions: SUIs • Adhere to principles of conversation • Information must be delivered in a dense fashion for audio output to be fast enough • Immediate and informative feedback on input • Don’t directly translate a GUI into a SUI
Questions: multimodal • Oviatt’s paper lists many benefits of multimodal interaction, so why don’t we see many multimodal systems in commercial production? • Or do we?
SpeechActs • Does SpeechActs still make sense 10+ years later? • Do traveling professionals use these kinds of systems now? • Who might benefit from these kinds of systems?