Explore the need for mobile interfaces that support conversations and meetings, and discuss challenges and solutions in developing such interfaces. Includes a case study on a speech agent for scheduling appointments.
Interfaces for Augmenting Conversation Professor Thad Starner GVU Center, College of Computing Georgia Institute of Technology
Mobile Computing So Far … • Laptops and PDAs move the desktop from place to place
Desktop Design Limitations • Interfaces assume full attention • Physically complex point and click • Both eyes • Both hands
What is the NEED while mobile? • During work time • 35-80% in conversation • 7-82% in technical meetings • 14-93% in opportunistic communication • Managers are most likely to be at this high end [Whittaker94, Whittaker95] • Support serendipitous speech
Minimum Attention User Interfaces (MAUIs) • Maximum support for minimum investment of user resources • Attention and Memory • Huge background in cognitive science • Several models • Hard to experiment in mobile setting [Pascoe ISWC98]
Twiddler Keyboard • 70 words per minute on a mobile-phone-style keypad!
Challenges • Mobile speech very difficult • Noise • Lombard speech • Can only hear user, not other participants (privacy/physics) • Contextual Awareness • Beware “AI-hard” problems • Situation understanding • “User modeling-hard” problems • Mind reading (intentionality)
Example: Jane • Inspired by the character “Jane” in Orson Scott Card’s “Xenocide” • Continuous audio-based agent • Intelligent agent “listens in” to everyday conversations and provides information support • Internet search engines • News • E-mail
Wizard of Oz Experiment • Student wore cellular phone with open microphone for 2 weeks, 10am-10pm • “Jane” emulated by team of undergrads in 2-hour shifts • Results • Direct queries: “What was the score of the basketball game?” • Pro-activity: “Are you watching TV? Want to know what’s on?” • Reminders: “Time for Kung Fu”
Agent and Experimental Design Challenges • Agent audio output was interruptive during conversation • “Agent” could not respond quickly enough • Not enough context to be pro-active • Context could not accumulate due to experimental procedure • Privacy
Privacy: Managing Expectations • Noise cancelling microphones • Hear user, but not other participants • Cut sounds with energy lower than threshold • Speech recognition - store only text • User can repeat crucial information in a socially graceful way • Socially acceptable (at least at a tech school :-) • Video equivalent: Clarkson’s fisheye lens
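The energy-threshold bullet above can be sketched in a few lines: with a close-talking, noise-cancelling microphone, the wearer's speech is much louder than bystanders', so dropping low-energy frames approximates "hear user, but not other participants". This is a minimal illustration; the frame size and threshold value are arbitrary placeholders, not the parameters used in the actual system.

```python
import math

def gate_frames(samples, frame_len=256, threshold=0.01):
    """Keep only audio frames whose RMS energy exceeds a threshold.

    Low-energy frames (distant speakers, background sound) are
    discarded before any recording or recognition takes place.
    frame_len and threshold are illustrative values only.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept
```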
Personal Audio Recording • 6 month experiment • Speech->text (emacs buffer) using ViaVoice • Augmenting short term memory
Application: Calendaring • One of the most common PDA applications • One of the most desired functions • Occurs routinely in social conversation • One-on-one • Conferences • Meetings • Anecdotal observation of dissatisfaction
Attention and Access: Scheduling Device Survey • What sorts of devices are used for scheduling/remembering appointments while mobile? • What are the users’ perceptions of those devices? • Why don’t more people use these devices or have them with them? • (Georgia Tech GVU TR #02-17; submitted to Trans. Computer-Human Interaction)
Scheduling Device Survey • 158 subjects • Georgia Tech student center • 90% students; 88% age 18-25; 70% male • Survey • What is your primary scheduling system while mobile? • 8 Likert-scale questions on effectiveness, ease of use, speed, and reliability • Open-response questions • Videotape scheduling four appointments
Videotaped Interactions • [video stills: subject view and scheduling device]
Satisfaction • Subjects thought that their device was • Appropriate • Easy to use • Sufficient • Somewhat necessary • Fast to access
Speech Agent Goals • Design scheduling agents for wearables that • Minimize time to retrieve and navigate • Minimize cognitive load • Mimic buffering behavior • Use the user’s social dialog to cue the agent • Speech recognition on unconstrained language • User modeling • Common-sense reasoning • Tools: Restricted grammar and push-to-talk
“Let me see if I’m free on the 31st” “Yes, 3pm seems like a good time”
Calendar Navigator Agent • Interface used in parallel during conversation when scheduling an appointment • User’s speech performs dual roles: social communication and direction of interface • Might someday be faster than human secretary • High resolution screen for feedback • Not restricted to linear presentation like speech
“Optimal” Tests • Speech operated desktop calendar application • 47% faster than PDAs • 20% faster than paper calendars • Grammar • Move to month/week/day • Monthly/weekly/daily incremental move • Zoom in on week/day • Specify meeting time and participant • Finalize appointment
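The restricted-grammar approach above can be sketched as a small rule set that maps a push-to-talk utterance onto exactly one calendar action. The command patterns and action names below are illustrative, not the actual grammar used in the study; the point is that constraining the language sidesteps unconstrained speech recognition.

```python
import re

# Minimal sketch of a restricted calendar-navigation grammar.
# Patterns and action names are invented for illustration.
GRAMMAR = [
    (re.compile(r"(?:go|move) to (january|february|march|april|may|june|"
                r"july|august|september|october|november|december)"),
     "MOVE_MONTH"),
    (re.compile(r"next (month|week|day)"), "MOVE_NEXT"),
    (re.compile(r"previous (month|week|day)"), "MOVE_PREV"),
    (re.compile(r"zoom in on (?:the )?(week|day)"), "ZOOM"),
    (re.compile(r"meet (?:with )?(\w+) at (\d{1,2})(?::(\d{2}))? ?(am|pm)?"),
     "SET_MEETING"),
    (re.compile(r"(?:finalize|save) (?:the )?appointment"), "FINALIZE"),
]

def parse(utterance):
    """Map an utterance onto one grammar rule, or None if out of grammar."""
    text = utterance.lower().strip()
    for pattern, action in GRAMMAR:
        m = pattern.fullmatch(text)
        if m:
            return action, m.groups()
    return None
```

Because every in-grammar utterance matches exactly one rule, recognition errors degrade to "no match" rather than to a wrong action.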
Dialog Tabs: Augmenting Conversational Memory • Capture conversation for later processing • Low retrieval time and cognitive load • Low impact of speech recognition errors • Enable batch processing • Unable to identify scheduling conflicts directly • Always-visible feedback • Unintrusive during conversation • Continuous reminder of cached information • Quick access and search for processing
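The dialog-tab idea above can be sketched as a cache of recognized snippets: capture during conversation with minimal interruption, keep the pending items visible, and process them in a batch afterwards so recognition errors are handled when the user has attention to spare. Class and field names below are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DialogTab:
    text: str            # recognizer output (possibly errorful)
    processed: bool = False

class TabCache:
    """Illustrative cache of conversational snippets for batch processing."""

    def __init__(self):
        self.tabs: List[DialogTab] = []

    def capture(self, text):
        """Store a snippet without interrupting the conversation."""
        self.tabs.append(DialogTab(text))

    def pending(self):
        """Snippets awaiting batch processing, kept visible to the user."""
        return [t for t in self.tabs if not t.processed]
```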
Conference/Meeting Applications • “Good to meet you Mr. X …” • “That’s interesting - let me see if I can repeat it back to you …” • “My assistant will….” • “My student should know that …”
Enabling Conversation • Significant populations with communication difficulties: deaf, “locked-in”, etc. • Similar issues • Speed of interaction • <5wpm intolerable • 15wpm handwriting • 20-100wpm typing • >175wpm average speech/sign • Interface is a secondary task
Deaf Community • 28 million deaf and hard-of-hearing - the largest disability group in the United States • English is a SECOND language; American Sign Language is the first • Interpreters cost $80-$100/hour • Privacy • Scheduling inconvenience • At-risk population for medical care (HIV)
Machine Translator • Mobile • Controlled by signer • Long battery life • Not interfere with hands • Stylish & cool! • Inexpensive: covered by insurance or ADA
ASL “One-Way” Translator • Inspired by DARPA English->Arabic One Way • ASL -> English semi-automatic phrasebook of questions • Answers from English speaker: • Numbers (hold up fingers) • Yes/No • Pointing
Further Constraints: Apartment Hunting Domain • “How many bathrooms?” • “Can I have a pet?” • “Which way to the bedroom?”
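The one-way phrasebook can be sketched as a lookup table: each entry pairs a question the signer can select with the constrained answer type the hearing speaker can return (numbers, yes/no, pointing). The gloss strings and answer types below are invented examples from the apartment-hunting domain, not the system's actual phrase inventory.

```python
# Illustrative one-way phrasebook: recognized sign gloss -> (English
# question, constrained answer modality for the hearing speaker).
PHRASEBOOK = {
    "HOW-MANY BATHROOM": ("How many bathrooms?", "number"),
    "PET ALLOW": ("Can I have a pet?", "yes/no"),
    "BEDROOM WHERE": ("Which way to the bedroom?", "pointing"),
}

def translate(sign_gloss):
    """Look up a recognized sign sequence; None if out of phrasebook."""
    entry = PHRASEBOOK.get(sign_gloss)
    if entry is None:
        return None
    question, answer_type = entry
    return f"{question} (answer by {answer_type})"
```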
Human-Computer Interface • Easier to recognize the correct English phrase than to generate it • The signer can use a head-up display to • Help adjust the environment • Constrain vocabulary (“Contact Sign”) • Avoid errors • Goal: > 15 wpm • Writing speed: 15 wpm • Spoken English or ASL: 175 wpm
Mobile Computer Vision • Challenges • Lighting • 2D information • Compute intensive • Advantages • Absolute position • Hand shape • No gloves
Accelerometers • Relative movement • 3D orientation • Low processing • Wide range of motion
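The "3D orientation, low processing" bullet can be illustrated with the standard tilt computation: when the sensor is at rest, the measured acceleration is gravity, so its direction yields two of the three orientation angles with almost no computation (yaw is unobservable from gravity alone). This is a generic sketch, not code from the project.

```python
import math

def tilt_from_gravity(ax, ay, az):
    """Estimate pitch and roll (degrees) from a static 3-axis
    accelerometer reading, treating the measured vector as gravity.
    """
    pitch = math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll
```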
Pattern Recognition: Georgia Tech Gesture Toolkit (GT2K) • Toolkit for developing gesture-based recognition systems • Abstracts lower-level pattern recognition details • Applies speech recognition technology (hidden Markov models) to other domains • Based on an active speech research tool (HTK, Cambridge University)
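The HMM idea behind GT2K can be sketched as: score an observation sequence under each gesture's model with the forward algorithm and pick the most likely gesture. The toy models below (two states, discrete symbols) are purely illustrative; GT2K itself wraps HTK's continuous-density HMMs and training machinery.

```python
def forward_likelihood(obs, pi, A, B):
    """P(obs | model) via the forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(obs, models):
    """Return the gesture label whose HMM scores obs highest."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

Recognition then reduces to one likelihood evaluation per gesture model, which is why speech-recognition tooling transfers so directly to gesture domains.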
Breaking News: Acceleglove results • Accelerometer/potentiometer glove at GWU • 145 signs; 4 sign phrases • 95% accuracy with part-of-speech grammar • Combined methods? • Time for native signer user studies?
Conclusion • A lot of room for research in • Serendipitous speech • MAUIs • Socially appropriate interfaces, especially for groups • Enabling technology • Must live it to develop it
Acknowledgements • Tracy Westeyn, Helene Brashear, Kent Lyons, Marty McGuire, Daniel Plaisted, Valerie Henderson, Niels Snoeck, Ben Wong, Fleming Seay, Gibby Fusia, the “Jane” undergrad collective, and many other students • Chris Schmandt, Steve Whittaker, Ben Shneiderman, Terry Winograd, and Sharon Oviatt • NSF; NIDRR; DARPA seedling
Resources • IEEE International Symposium on Wearable Computers (ISWC) • www.iswc.net • October, 2004 • Washington, DC • IEEE Wearable Information Systems Technical Committee (computer.org) • Research mailing list: wearables@cc.gatech.edu