Speech User Interfaces

Outline • Review • Motivation for speech UIs • Speech recognition • UI problems with speech UIs • SpeechActs: Guidelines for speech UIs • Speech UI design tools • Multimodal UIs

Review • Why do we prototype? • get feedback on our design from customers – faster & cheaper • Why use low-fi prototypes? • traditional methods take too long & focus designers & customers on the wrong (visual) issues • What is the Wizard of Oz technique? • faking the interaction • What is the advantage of using informal tools like SILK, DENIM, & SUEDE? • advantages of electronic medium (editing, reuse, distribution, etc.) • faster than traditional UI tools • do not focus designers/customers on the wrong issues • ability to support testing & analysis of resulting data

Motivation for Speech UIs: Pervasive Information Access • Information & Services • [Figure: I-Land vision by Streitz et al.]

UIs in the Pervasive Computing Era • Future computing devices won’t have the same UI as current PCs • wide range of devices • small or embedded in environment • often w/ “alternative” I/O & w/o screens • information appliances • [Figure: I-Land vision by Streitz et al.]

Information Access via Speech • “Read my important email”

Industry Leaders • Nuance Corporation • Applications: TellMe, … • Users: Government, Computers (Microsoft, IBM)

Speech UI Motivation • Smaller devices -> difficult I/O • people can talk at ~ 90 wpm -> high speed • “Virtually unlimited” set of commands • Freedom for other body parts • imagine you are working on your car & need to know something from the manual • Natural • evolutionarily selected for • reading, writing, & typing are not (too new)

Why are Speech UIs Hard to Get Right? • Speech recognition is far from perfect • imagine inputting commands w/ the mouse & getting the wrong result 5-20% of the time • Speech UIs have no visible state • can’t see what you have done before or what effect your commands have had • Speech UIs are hard to learn • how do you explore the interface? how do you find out what you can say?
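To make the 5-20% figure concrete, here is a minimal sketch of how per-command errors compound over a multi-step task. It assumes each command fails independently at the same rate, which is a simplification not stated in the slides; the function name `task_success_probability` is hypothetical.

```python
def task_success_probability(error_rate: float, commands: int) -> float:
    """Chance that every command in a task is recognized correctly.

    Assumption (not from the slides): each command fails independently
    with the same probability.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if commands < 0:
        raise ValueError("commands must be non-negative")
    return (1.0 - error_rate) ** commands


if __name__ == "__main__":
    for rate in (0.05, 0.20):
        p = task_success_probability(rate, 5)
        print(f"error rate {rate:.0%}: 5-command task succeeds {p:.1%} of the time")
```

Under this assumption, a five-command task completes without any misrecognition about 77% of the time at a 5% error rate, and only about 33% of the time at 20%.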

Speech UIs Require • Speech recognition • the computer understanding what the customer is saying • Speech production (or synthesis) • the computer talking to the customer

Speech Recognition • Continuous vs. non-continuous • Speaker independent vs. dependent • Speech often misunderstood by people • feedback via speech, facial expressions, & gesture • Recognizers trained with real samples • often get gender-based problems • Based on probabilities (HMMs - Bayes) • trigrams of sounds or words • Several popular recognizers • Nuance, SpeechWorks, IBM ViaVoice
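The "trigrams of words" idea can be illustrated with a minimal sketch. It assumes a tiny made-up corpus and add-one smoothing, neither of which comes from the slides, and it models only the language side, not the acoustic HMM. The names `train_trigrams`, `score`, and `best_hypothesis` are hypothetical. The point is that word context lets a recognizer prefer one of two identical-sounding hypotheses.

```python
from collections import Counter
import math


def train_trigrams(sentences):
    """Count word trigrams and their two-word contexts in a toy corpus."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for s in sentences:
        words = ["<s>", "<s>"] + s.lower().split() + ["</s>"]
        vocab.update(words)
        for a, b, c in zip(words, words[1:], words[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx, vocab


def score(sentence, model):
    """Log-probability of a sentence under an add-one-smoothed trigram model."""
    tri, ctx, vocab = model
    words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
    logp = 0.0
    for a, b, c in zip(words, words[1:], words[2:]):
        logp += math.log((tri[(a, b, c)] + 1) / (ctx[(a, b)] + len(vocab)))
    return logp


# Illustrative corpus only; a real recognizer is trained on far more data.
CORPUS = [
    "read my mail",
    "read my important mail",
    "check my mail",
    "the male voice is clear",
]
MODEL = train_trigrams(CORPUS)


def best_hypothesis(hypotheses, model=MODEL):
    """Pick the candidate transcription the language model finds most likely."""
    return max(hypotheses, key=lambda h: score(h, model))


if __name__ == "__main__":
    print(best_hypothesis(["read my male", "read my mail"]))
```

With this toy corpus, the model prefers "read my mail" because "my mail" appears in context while "my male" never does.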

Speech Production • Three frequency regions of great intensity visible on oscilloscope • come from larynx, throat, mouth • Two needed for recognition but “tinny” • Can generate emotional affect in speech • Demo • anger, disgust, gladness, sadness, fear, & surprise • http://cahn.www.media.mit.edu/people/cahn/emot-speech.html

Recognition Problems • Good recognition • humans < 1% error rate on dictation • top recognition systems get <1-X% error rates • computers don’t use much context • Key is to be application specific for lower error rates • Background noise • even worse recognition rates (20-40% error) • Speed • improving as hardware gets faster • in 10 years, went from requiring 5 high-end workstations to some speech systems running on laptops or even PDAs

More Recognition Problems • Isolated, short words difficult • common words become short • Segmentation • silly versus sill lea • Spelling • mail vs. male -> need to understand language

Speech UI Problems • Speech UI no-nos • modes (no feedback) • certain commands only work when in specific states • deep hierarchies (aka voice mail hell) • Verbose feedback wastes time/patience • only confirm consequential things • use meaningful, short cues • Interruption • half-duplex communication (i.e., no barge-in support) • Too much speech on the part of customer is tiring • Speech takes up space in working memory • can cause problems when problem solving

SpeechActs: Guidelines for Speech UIs • Speech interface to computer tools • email, calendar, weather, stock quotes • Establish common ground & shared context • make sure people know where they are in the conversation • Pacing • recognition delays are unnatural, make it clear when this occurs • barge-in lets user interrupt like in real conversations • tapering of prompts • progressive assistance: short error messages at first, longer when user needs more help • implicit confirmation: include confirmation in next command
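Progressive assistance and implicit confirmation can be sketched in a few lines. This is an illustration of the two guidelines only, not the SpeechActs implementation; the class `ProgressiveHelp`, the function `implicit_confirmation`, and all prompt wording are hypothetical.

```python
class ProgressiveHelp:
    """Escalate error prompts: brief at first, more detailed on repeated failures."""

    def __init__(self, prompts):
        if not prompts:
            raise ValueError("at least one prompt is required")
        self.prompts = list(prompts)  # ordered from shortest to most detailed
        self.failures = 0

    def on_error(self):
        """Return the prompt for the current failure count, then escalate."""
        index = min(self.failures, len(self.prompts) - 1)
        self.failures += 1
        return self.prompts[index]

    def on_success(self):
        """A recognized utterance resets assistance to the shortest prompt."""
        self.failures = 0


def implicit_confirmation(recognized_name, next_prompt):
    """Fold the confirmation into the next prompt instead of asking a separate question."""
    return f"Message for {recognized_name}. {next_prompt}"


if __name__ == "__main__":
    helper = ProgressiveHelp([
        "Sorry?",
        "Sorry, what did you say?",
        "I didn't understand. You can say 'read', 'reply', or 'skip'.",
    ])
    for _ in range(4):
        print(helper.on_error())
    print(implicit_confirmation("Bob", "What is the subject?"))
```

The user who is recognized on the first try never hears the long help text, and a misrecognized name surfaces in the next prompt, where the user can correct it, without costing an extra confirmation turn.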

SpeechActs Video

Announcements • Task analysis / Contextual inquiry HW • average = 79/100, stdev. 8.4 • Low-fi user test due Monday • questions • If you haven’t gotten a laptop yet, check with Wai-ling after class

SUEDE: Low-fi Prototyping for Speech-based UIs • Supports design practice • example scripts • Wizard of Oz • error simulation • iterative design (design-test-analysis) • Informal user interface • no speech recognition/synthesis • need not be programming expert • fast & fluid design
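One way error simulation could work in a Wizard of Oz test is sketched below: the wizard picks the intended response, and the test harness occasionally substitutes an error prompt to mimic an imperfect recognizer. This is an assumption about the general technique, not SUEDE's actual code; `simulate_recognition` and its parameters are hypothetical.

```python
import random


def simulate_recognition(wizard_choice, error_rate, error_response, rng=random):
    """Return the wizard's chosen response, or an error prompt at the given rate.

    rng is any object with a random() method returning a float in [0, 1);
    pass random.Random(seed) for repeatable test sessions.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if rng.random() < error_rate:
        return error_response  # pretend the recognizer failed
    return wizard_choice


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a session can be replayed
    for _ in range(5):
        print(simulate_recognition(
            "You have two new messages.",
            error_rate=0.2,
            error_response="Sorry, I didn't catch that.",
            rng=rng,
        ))
```

Setting the rate to match a realistic recognizer lets designers see how participants cope with errors before any recognition technology is in place.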

[Figure legend: machine prompt, user response]

SUEDE Summary • SUEDE supports speech-based UI design • moving from concrete examples to abstractions • allows designer to accept responses that aren’t exactly what they originally had in mind • embeds iterative design w/ design-test-analyze • Designers using SUEDE need not be experts in speech recognition technology

One Vision of Future User Interfaces • Star Trek style UI • verbally ask the computer for information • may be common in mobile/hands-busy situations • problem: hard to design, build, & use! • requires perfect speech recognition & language understanding

Our Vision of Future User Interfaces • Multimodal, Context-aware UIs • multimodal • uses multiple input modalities (speech & gesture) to disambiguate • user says “move it to this screen” while pointing • context-aware • apps can be aware of location, user, what they are doing, … • people are talking -> don’t rely on speech I/O • Problem: how to prototype & test new ideas? • Informal UI Design Tools! • combine Wizard of Oz & informal storyboarding
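The "move it to this screen" example can be sketched as a simple time-based fusion of speech and pointing. This is a minimal illustration under assumptions not in the slides: gestures arrive as timestamped events, a fixed 1.5-second window decides what counts as simultaneous, and the names `PointingEvent`, `DEICTIC_WORDS`, and `resolve_deictic` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PointingEvent:
    target: str   # what the user pointed at, e.g. "screen-2"
    time: float   # seconds since the session started


# Words whose meaning depends on an accompanying gesture (illustrative list).
DEICTIC_WORDS = {"this", "that", "here", "there"}


def resolve_deictic(utterance: str, spoken_at: float,
                    gestures: List[PointingEvent],
                    window: float = 1.5) -> Optional[str]:
    """Return the pointing target closest in time to a deictic utterance, if any."""
    if not DEICTIC_WORDS & set(utterance.lower().split()):
        return None  # nothing in the utterance needs a gesture to resolve
    nearby = [g for g in gestures if abs(g.time - spoken_at) <= window]
    if not nearby:
        return None  # the speech alone stays ambiguous
    return min(nearby, key=lambda g: abs(g.time - spoken_at)).target


if __name__ == "__main__":
    gestures = [PointingEvent("screen-1", 2.0), PointingEvent("screen-2", 10.2)]
    print(resolve_deictic("move it to this screen", 10.0, gestures))
```

Returning `None` when no gesture falls inside the window gives the application a clear cue to ask the user for clarification instead of guessing.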

Multimodal Error Correction • Dictation error correction study • found users are better at correcting recognition errors with a different input modality • if the recognizer got it wrong the first time -> it will likely get it wrong the second time • hyperarticulating aggravates the problem • Correct dictation errors with • vocal spelling, writing, typing, etc.

Summary • Speech UIs • may permit more natural computer access • allow us to use computers in more situations • are hard to get to work well • lack of visible state, tax working memory, recognition problems, etc. • UI tools are needed for speech UI design • Multimodal UIs address some of the problems with pure speech UIs • help disambiguate • help w/ correction
