Speech and multimodal

Speech and multimodal. Jesse Cirimele. Papers: “Multimodal interaction” by Sharon Oviatt and “Designing SpeechActs” by Yankelovich et al. Why multimodal? A more transparent, flexible, efficient, and powerfully expressive means of HCI. Flexibility: modality choice for different situations.

Presentation Transcript


  1. Speech and multimodal Jesse Cirimele

  2. Papers • “Multimodal interaction” Sharon Oviatt • “Designing SpeechActs” Yankelovich et al.

  3. Why multimodal? • More transparent, flexible, efficient, and powerfully expressive means of HCI

  4. Flexibility • Modality choice for different situations • Modality choice for different functions • Broader range of users • Broader range of environments

  5. Users prefer multimodal • “For example, 95% to 100% of users preferred to interact multimodally when they were free to use either speech or pen input in a map-based spatial domain (Oviatt, 1997).”

  6. What do you gain? • Some speed and efficiency • Improved error handling • Simpler language leads to fewer recognition errors • Mutual disambiguation of different input modes

  7. When do people use multimodal? • Manipulating spatial information • High task difficulty • Communicative complexity

  8. Complementarity vs. redundancy • Very little redundancy of information • Can’t rely on duplicate information from other modalities; instead, use the strengths of some modes to overcome the weaknesses of others

  9. Multimodal language • Is often linguistically simpler than spoken language • “hard to process disfluent language has been observed to decrease by 50% during multimodal interaction with a map.” • Word ordering is often different • LOC-S-V-O instead of S-V-O-LOC

  10. GUI vs. multimodal • GUI • Serial and discrete • Multimodal • Parallel and probabilistic

  11. SpeechActs • User-study-style paper • Speech-only interface to mail, calendar, weather, and stock quotes for traveling professionals

  12. The study • 22 tasks accomplished via telephone in a room set up to look like a hotel room • Users tested were traveling professionals (the same users who would use the end system)

  13. Results • Users found SpeechActs promising as a concept and “eagerly awaited improvements”

  14. What would they improve? • For Voice User Interfaces (VUIs) to be successful, they need to create a conversation with the user • This can be accomplished through • Shared context • When is the right time to input into the system? • Conversation pacing • How can information be shared or skipped at the right speed?

  15. GUI to SUI? • No. It doesn’t make sense to directly translate a GUI experience into a SUI experience. • Instead, take the information organization and information flow of the GUI and build the SUI from the ground up to accomplish the tasks that users want to accomplish

  16. Recognition errors • Rejection errors • Find creative ways to get users to repeat input without getting mad • Substitution errors • Confirm some commands • Insertion errors • Turn off mic, same as above

  17. New User Skills • SUIs have different challenges than GUIs • Users need to have different skills • Short term memory • Mental model of system state • Visualizing the organization of information

  18. Conclusions: SUIs • Adhere to principles of conversation • Information must be delivered in a dense fashion for audio output to be fast enough • Immediate and informative feedback on input • Don’t directly translate a GUI into a SUI

  19. Questions: multimodal • Oviatt’s paper lists many benefits of multimodal interaction. Why don’t we see many multimodal systems in commercial production? • Or do we?

  20. SpeechActs • Does SpeechActs still make sense 10+ years later? • Do traveling professionals use these kinds of systems now? • Who might benefit from these kinds of systems?
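
Slide 6 lists "mutual disambiguation of different input modes" as a gain. A minimal sketch of the idea follows, assuming each recognizer returns an n-best list with confidence scores and that a table says which spoken commands can pair with which pen gestures. The commands, gestures, scores, and the `fuse` helper are all hypothetical illustrations, not taken from Oviatt's paper or any particular system.

```python
from itertools import product

# Hypothetical n-best lists: (interpretation, recognizer confidence).
speech_nbest = [("zoom here", 0.45), ("move here", 0.40), ("remove here", 0.15)]
pen_nbest = [("arrow", 0.55), ("point", 0.45)]

# Hypothetical compatibility table: which spoken command can pair
# with which pen gesture.
COMPATIBLE = {
    ("zoom here", "point"),
    ("move here", "arrow"),
    ("remove here", "point"),
}


def fuse(speech, pen, compatible):
    """Return compatible (speech, pen, joint_score) triples, best first."""
    joint = [
        (s, p, s_score * p_score)
        for (s, s_score), (p, p_score) in product(speech, pen)
        if (s, p) in compatible
    ]
    return sorted(joint, key=lambda triple: triple[2], reverse=True)


ranked = fuse(speech_nbest, pen_nbest, COMPATIBLE)
best_speech, best_pen, best_score = ranked[0]
print(best_speech, best_pen, round(best_score, 4))
```

With these made-up scores, the top speech hypothesis alone would be "zoom here", but the best joint interpretation is "move here" with an arrow gesture: each mode pulls the other's lower-ranked hypothesis up, which is the sense in which the modes disambiguate one another.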
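
The error-handling advice on slide 16 can be sketched as a small response policy. This is only an illustration under stated assumptions: the recognizer is assumed to return `None` on a rejection, and the prompt wording, the `RISKY_COMMANDS` set, and the `respond` function are hypothetical rather than SpeechActs' actual design.

```python
# Hypothetical reprompts that vary and become more helpful, so a user who is
# rejected several times in a row does not hear the same sentence each time.
REPROMPTS = [
    "Sorry?",
    "What did you say?",
    "Sorry, I didn't understand. Please rephrase.",
]

# Hypothetical commands worth confirming, because acting on a substituted
# or inserted word would be costly to undo.
RISKY_COMMANDS = {"delete message", "send message"}


def respond(command, consecutive_rejections):
    """Choose the system's reply for one recognition result.

    command is the recognized command string, or None if the recognizer
    rejected the utterance.
    """
    if command is None:
        # Rejection error: ask again, escalating through the reprompts.
        index = min(consecutive_rejections, len(REPROMPTS) - 1)
        return REPROMPTS[index]
    if command in RISKY_COMMANDS:
        # Guard against substitution and insertion errors by confirming.
        return f"Do you really want to {command}?"
    # Low-risk commands run without confirmation to keep the pace up.
    return f"OK: {command}"


print(respond(None, 0))
print(respond("delete message", 0))
print(respond("read message", 0))
```

Confirming only some commands reflects the slide's "confirm some commands": confirming everything would slow the conversation, while confirming nothing leaves misrecognized destructive commands unchecked.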