Speech User Interfaces
Outline • Review • Motivation for speech UIs • Speech recognition • UI problems with speech UIs • SpeechActs: Guidelines for speech UIs • Speech UI design tools • Multimodal UIs
Review • Why do we prototype? • get feedback on our design from customers – faster & cheaper • Why use low-fi prototypes? • traditional methods take too long & focus designers & customers on the wrong (visual) issues • What is the Wizard of Oz technique? • faking the interaction • What is the advantage of using informal tools like SILK, DENIM, & SUEDE? • advantages of electronic medium (editing, reuse, distribution, etc.) • faster than traditional UI tools • do not focus designers/customers on the wrong issues • ability to support testing & analysis of resulting data
Motivation for Speech UIs: Pervasive Information Access • Information & Services • I-Land vision by Streitz et al.
UIs in the Pervasive Computing Era • Future computing devices won’t have the same UI as current PCs • wide range of devices • small or embedded in environment • often w/ “alternative” I/O & w/o screens • information appliances • I-Land vision by Streitz et al.
Information Access via Speech • “Read my important email”
Industry Leaders • Nuance Corporation • Applications: TellMe, … • Users: government, computer companies (Microsoft, IBM, …)
Speech UI Motivation • Smaller devices -> difficult I/O • people can talk at ~ 90 wpm -> high speed • “Virtually unlimited” set of commands • Freedom for other body parts • imagine you are working on your car & need to know something from the manual • Natural • evolutionarily selected for • reading, writing, & typing are not (too new)
Why are Speech UIs Hard to Get Right? • Speech recognition far from perfect • imagine inputting commands w/ the mouse & getting the wrong result 5-20% of the time • Speech UIs have no visible state • can’t see what you have done before or what effect your commands have had • Speech UIs are hard to learn • how do you explore the interface? how do you find out what you can say?
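To make the 5-20% figure above concrete, here is a minimal sketch of how per-command errors compound over a multi-step task. It assumes each command is misrecognized independently with the same probability, which is a simplification; the function name is illustrative only.

```python
def task_success_probability(error_rate: float, num_commands: int) -> float:
    """Chance that every command in a task is recognized correctly.

    Assumes (for illustration) that recognition errors are independent
    and equally likely for each command.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if num_commands < 0:
        raise ValueError("num_commands must be non-negative")
    return (1.0 - error_rate) ** num_commands


if __name__ == "__main__":
    # Compare the low and high ends of the 5-20% range from the slide.
    for rate in (0.05, 0.20):
        for n in (1, 5, 10):
            p = task_success_probability(rate, n)
            print(f"error rate {rate:.0%}, {n:2d} commands: {p:.1%} error-free")
```

Under these assumptions, even a 5% per-command error rate leaves only about a 60% chance of getting through a ten-command task without a single misrecognition.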
Speech UIs Require • Speech recognition • the computer understanding what the customer is saying • Speech production (or synthesis) • the computer talking to the customer
Speech Recognition • Continuous vs. non-continuous • Speaker independent vs. dependent • Speech often misunderstood by people • feedback via speech, facial expressions, & gesture • Recognizers trained with real samples • often get gender-based problems • Based on probabilities (HMMs - Bayes) • trigrams of sounds or words • Several popular recognizers • Nuance, SpeechWorks, IBM ViaVoice
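The "trigrams of words" idea above can be sketched as a tiny count-based language model. This is only an illustration: the training sentences are made up, the estimate is unsmoothed, and a real recognizer would combine such scores with an acoustic model (e.g., inside an HMM) rather than use them alone.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad so the first real word also has a two-word context.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts


def trigram_probability(counts, a, b, c):
    """Unsmoothed estimate of P(c | a, b); 0.0 if the context was never seen."""
    context = counts.get((a, b))
    if not context:
        return 0.0
    return context[c] / sum(context.values())


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    corpus = ["read my mail", "read my mail", "read my calendar", "check my mail"]
    model = train_trigrams(corpus)
    for word in ("mail", "male", "calendar"):
        print(word, trigram_probability(model, "read", "my", word))
```

With this toy corpus, "mail" is far more likely than the identically sounding "male" after "read my", which is the kind of context a recognizer can use to choose between acoustically similar candidates.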
Speech Production • Three frequency regions of great intensity (formants) visible on oscilloscope • come from larynx, throat, mouth • Two are needed for recognition, but the result sounds “tinny” • Can generate emotional affect in speech • Demo • anger, disgust, gladness, sadness, fear, & surprise http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
Recognition Problems • Good recognition • humans < 1% error rate on dictation • top recognition systems get <1-X% error rates • computers don’t use much context • Key is to be application specific for lower error rates • Background noise • even worse recognition rates (20-40% error) • Speed • improving as hardware gets faster • in 10 years, gone from requiring 5 high-end workstations to some speech systems running on laptops or even PDAs
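The slide does not say how its error rates were measured. One common metric for dictation is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words said. A minimal sketch, with made-up example sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "read my important email"
    heard = "reed my important mail"
    print(f"WER: {word_error_rate(said, heard):.0%}")
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs many more words than were spoken.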
More Recognition Problems • Isolated, short words difficult • common words become short • Segmentation • silly versus sill lea • Spelling • mail vs. male -> need to understand language
Speech UI Problems • Speech UI no-nos • modes (no feedback) • certain commands only work when in specific states • deep hierarchies (aka voice mail hell) • Verbose feedback wastes time/patience • only confirm consequential things • use meaningful, short cues • Interruption • half-duplex communication (i.e., no barge-in support) • Too much speech on the part of customer is tiring • Speech takes up space in working memory • can cause problems when problem solving
SpeechActs: Guidelines for Speech UIs • Speech interface to computer tools • email, calendar, weather, stock quotes • Establish common ground & shared context • make sure people know where they are in the conversation • Pacing • recognition delays are unnatural; make it clear when they occur • barge-in lets user interrupt like in real conversations • tapering of prompts • progressive assistance: short error messages at first, longer when user needs more help • implicit confirmation: include confirmation in next command
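Progressive assistance and implicit confirmation, as described above, can be sketched roughly as follows. This is not the SpeechActs implementation; the class name, function name, and prompt wording are all hypothetical.

```python
class ProgressiveHelp:
    """Hand out longer help messages as consecutive failures pile up."""

    def __init__(self, messages):
        if not messages:
            raise ValueError("need at least one help message")
        self.messages = list(messages)  # ordered shortest to longest
        self.failures = 0

    def on_failure(self) -> str:
        """Return the message for this failure, then escalate."""
        index = min(self.failures, len(self.messages) - 1)
        self.failures += 1
        return self.messages[index]

    def on_success(self) -> None:
        """A recognized command resets the help level."""
        self.failures = 0


def implicit_confirmation(understood: str, next_prompt: str) -> str:
    """Fold what was understood into the next prompt instead of asking
    a separate 'Did you say ...?' question."""
    return f"{understood}. {next_prompt}"


if __name__ == "__main__":
    # Hypothetical prompt wording, for illustration only.
    helper = ProgressiveHelp([
        "Sorry?",
        "Sorry, I didn't understand. Please repeat.",
        "I still didn't understand. You can say 'read', 'skip', or 'delete'.",
    ])
    for _ in range(4):
        print(helper.on_failure())
    print(implicit_confirmation("Deleting message 3", "What next?"))
```

Keeping the first error message short respects the pacing of the conversation; the longer messages only appear when the user has shown they need them.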
Announcements • Task analysis / Contextual inquiry HW • average = 79/100, stdev. 8.4 • Low-fi user test due Monday • questions • If you haven’t gotten a laptop yet, check with Wai-ling after class
SUEDE: Low-fi Prototyping for Speech-based UIs • Supports design practice • example scripts • Wizard of Oz • error simulation • iterative design (design-test-analysis) • Informal user interface • no speech recognition/synthesis • need not be programming expert • fast & fluid design
machine prompt user response
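The combination of Wizard of Oz testing and error simulation can be illustrated with a rough sketch: the wizard picks the response the participant actually gave, and the prototype sometimes pretends it was misrecognized. This is an assumption about how such a feature might work, not SUEDE's actual code; all names and the error prompt are hypothetical.

```python
import random

ERROR_PROMPT = "Sorry, I didn't catch that."  # hypothetical wording


def next_prompt(wizard_choice: str, transitions: dict, error_rate: float,
                rng: random.Random) -> str:
    """Pick the next machine prompt in a Wizard of Oz session.

    wizard_choice: the user response the human wizard selected.
    transitions:   maps user responses to the next machine prompt.
    error_rate:    fraction of turns that simulate a recognition error.
    """
    if rng.random() < error_rate:
        # Pretend the recognizer failed, even though the wizard understood.
        return ERROR_PROMPT
    return transitions[wizard_choice]


if __name__ == "__main__":
    script = {"read my email": "You have two new messages.",
              "check my calendar": "You have one meeting today."}
    rng = random.Random(0)  # seeded so a test session is repeatable
    for _ in range(5):
        print(next_prompt("read my email", script, error_rate=0.2, rng=rng))
```

Simulating errors this way lets a designer see how participants cope with misrecognition before any recognizer is integrated.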
SUEDE Summary • SUEDE supports speech-based UI design • moving from concrete examples to abstractions • allows designer to accept responses that aren’t exactly what they originally had in mind • embeds iterative design w/ design-test-analyze • Designers using SUEDE need not be experts in speech recognition technology
One Vision of Future User Interfaces • Star Trek style UI • verbally ask the computer for information • may be common in mobile/hands-busy situations • problem: hard to design, build, & use! • requires perfect speech recognition & language understanding
Our Vision of Future User Interfaces • Multimodal, Context-aware UIs • multimodal • uses multiple input modalities (speech & gesture) to disambiguate • user says “move it to this screen” while pointing • context-aware • apps can be aware of location, user, what they are doing, … • people are talking -> don’t rely on speech I/O • Problem: how to prototype & test new ideas? • Informal UI Design Tools! • combine Wizard of Oz & informal storyboarding
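The "move it to this screen" example above can be sketched as a simple fusion rule: resolve the spoken "this" to whichever pointing gesture occurred closest in time. The data structure, function name, and 1.5-second window are assumptions made for illustration, not part of any particular multimodal toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PointingEvent:
    time: float   # seconds since the session started
    target: str   # what the user pointed at


def resolve_deictic(spoken_time: float,
                    gestures: Sequence[PointingEvent],
                    max_gap: float = 1.5) -> Optional[str]:
    """Return the target of the gesture nearest in time to the spoken
    word, or None if no gesture falls within max_gap seconds."""
    nearby = [g for g in gestures if abs(g.time - spoken_time) <= max_gap]
    if not nearby:
        return None
    return min(nearby, key=lambda g: abs(g.time - spoken_time)).target


if __name__ == "__main__":
    gestures = [PointingEvent(1.0, "screen A"), PointingEvent(4.2, "screen B")]
    # The user says "this" at t = 4.0 s while pointing at screen B.
    target = resolve_deictic(4.0, gestures)
    if target is None:
        print("Which screen do you mean?")
    else:
        print(f"Moving it to {target}")
```

When no gesture is close enough, the sketch returns None so the application can ask a follow-up question instead of guessing.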
Multimodal Error Correction • Dictation error correction study • found users are better at correcting recognition errors with a different input modality • if the recognizer got it wrong the first time -> it will likely get it wrong the second time • hyperarticulating aggravates the problem • Correct dictation errors with • vocal spelling, writing, typing, etc.
Summary • Speech UIs • may permit more natural computer access • allow us to use computers in more situations • are hard to get to work well • lack of visible state, tax working memory, recognition problems, etc. • UI tools are needed for speech UI design • Multimodal UIs address some of the problems with pure speech UIs • help disambiguate • help w/ correction