Explore the need for mobile interfaces that support conversations and meetings, and discuss challenges and solutions in developing such interfaces. Includes a case study on a speech agent for scheduling appointments.
Interfaces for Augmenting Conversation Professor Thad Starner GVU Center, College of Computing Georgia Institute of Technology
Mobile Computing So Far … • Laptops and PDAs move the desktop from place to place
Desktop Design Limitations • Interfaces assume full attention • Physically complex point and click • Both eyes • Both hands
What is the NEED while mobile? • During work time • 35-80% in conversation • 7-82% in technical meetings • 14-93% in opportunistic communication • Managers are most likely to be at this high end [Whittaker94, Whittaker95] • Support serendipitous speech
Minimum Attention User Interfaces (MAUIs) • Maximum support for minimum investment of user resources • Attention and Memory • Huge background in cognitive science • Several models • Hard to experiment in mobile setting [Pascoe ISWC98]
Twiddler Keyboard • 70 words per minute on a mobile-phone-style keypad!
Challenges • Mobile speech very difficult • Noise • Lombard speech • Can only hear user, not other participants (privacy/physics) • Contextual Awareness • Beware “AI-hard” problems • Situation understanding • “User modeling-hard” problems • Mind reading (intentionality)
Example: Jane • Inspired by the character “Jane” in Orson Scott Card’s “Xenocide” • Continuous audio-based agent • Intelligent agent “listens in” to everyday conversations and provides information support • Internet search engines • News • E-mail
Wizard of Oz Experiment • Student wore cellular phone with open microphone for 2 weeks, 10am-10pm • “Jane” emulated by team of undergrads in 2-hour shifts • Results • Direct queries: “What was the score of the basketball game?” • Pro-activity: “Are you watching TV? Want to know what’s on?” • Reminders: “Time for Kung Fu”
Agent and Experimental Design Challenges • Agent audio output was interruptive during conversation • “Agent” could not respond quickly enough • Not enough context to be pro-active • Context could not accumulate due to experimental procedure • Privacy
Privacy: Managing Expectations • Noise cancelling microphones • Hear user, but not other participants • Cut sounds with energy lower than threshold • Speech recognition - store only text • User can repeat crucial information in a socially graceful way • Socially acceptable (at least at a tech school :-) • Video equivalent: Clarkson’s fisheye lens
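The energy-threshold bullet above can be sketched in a few lines: with a close-talking, noise-cancelling microphone, the wearer's speech is much louder than bystanders', so dropping low-energy frames approximates "hear user, but not other participants". This is a minimal illustration; the frame size and threshold value are arbitrary placeholders, not the parameters used in the actual system.

```python
import math

def gate_frames(samples, frame_len=256, threshold=0.01):
    """Keep only audio frames whose RMS energy exceeds a threshold.

    Low-energy frames (distant speakers, background sound) are
    discarded before any recording or recognition takes place.
    frame_len and threshold are illustrative values only.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept
```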
Personal Audio Recording • 6 month experiment • Speech->text (emacs buffer) using ViaVoice • Augmenting short term memory
Application: Calendaring • One of the most common PDA applications • One of the most desired functions • Occurs routinely in social conversation • One-on-one • Conferences • Meetings • Anecdotal observation of dissatisfaction
Attention and Access: Scheduling Device Survey • What sorts of devices are used for scheduling/remembering appointments while mobile? • What are the users’ perceptions of those devices? • Why don’t more people use these devices or have them with them? • (Georgia Tech GVU TR #02-17; submitted to Trans. Computer-Human Interaction)
Scheduling Device Survey • 158 subjects • Georgia Tech student center • 90% students; 88% age 18-25; 70% male • Survey • What is your primary scheduling system while mobile? • 8 Likert-scale questions on effectiveness, ease of use, speed, and reliability • Open-response questions • Videotape scheduling four appointments
Videotaped Interactions • [video stills: subject view and scheduling device]
Satisfaction • Subjects thought that their device was • Appropriate • Easy to use • Sufficient • Somewhat necessary • Fast to access
Speech Agent Goals • Design scheduling agents for wearables that • Minimize time to retrieve and navigate • Minimize cognitive load • Mimic buffering behavior • Use the user’s social dialog to cue the agent • Speech recognition on unconstrained language • User modeling • Common-sense reasoning • Tools: Restricted grammar and push-to-talk
“Let me see if I’m free on the 31st” “Yes, 3pm seems like a good time”
Calendar Navigator Agent • Interface used in parallel during conversation when scheduling an appointment • User’s speech performs dual roles: social communication and direction of interface • Might someday be faster than human secretary • High resolution screen for feedback • Not restricted to linear presentation like speech
“Optimal” Tests • Speech operated desktop calendar application • 47% faster than PDAs • 20% faster than paper calendars • Grammar • Move to month/week/day • Monthly/weekly/daily incremental move • Zoom in on week/day • Specify meeting time and participant • Finalize appointment
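The restricted-grammar approach above can be sketched as a small rule set that maps a push-to-talk utterance onto exactly one calendar action. The command patterns and action names below are illustrative, not the actual grammar used in the study; the point is that constraining the language sidesteps unconstrained speech recognition.

```python
import re

# Minimal sketch of a restricted calendar-navigation grammar.
# Patterns and action names are invented for illustration.
GRAMMAR = [
    (re.compile(r"(?:go|move) to (january|february|march|april|may|june|"
                r"july|august|september|october|november|december)"),
     "MOVE_MONTH"),
    (re.compile(r"next (month|week|day)"), "MOVE_NEXT"),
    (re.compile(r"previous (month|week|day)"), "MOVE_PREV"),
    (re.compile(r"zoom in on (?:the )?(week|day)"), "ZOOM"),
    (re.compile(r"meet (?:with )?(\w+) at (\d{1,2})(?::(\d{2}))? ?(am|pm)?"),
     "SET_MEETING"),
    (re.compile(r"(?:finalize|save) (?:the )?appointment"), "FINALIZE"),
]

def parse(utterance):
    """Map an utterance onto one grammar rule, or None if out of grammar."""
    text = utterance.lower().strip()
    for pattern, action in GRAMMAR:
        m = pattern.fullmatch(text)
        if m:
            return action, m.groups()
    return None
```

Because every in-grammar utterance matches exactly one rule, recognition errors degrade to "no match" rather than to a wrong action.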
Dialog Tabs: Augmenting Conversational Memory • Capture conversation for later processing • Low retrieval time and cognitive load • Low impact of speech recognition errors • Enable batch processing • Unable to identify scheduling conflicts directly • Always-visible feedback • Unintrusive during conversation • Continuous reminder of cached information • Quick access and search for processing
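The dialog-tab idea above can be sketched as a cache of recognized snippets: capture during conversation with minimal interruption, keep the pending items visible, and process them in a batch afterwards so recognition errors are handled when the user has attention to spare. Class and field names below are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DialogTab:
    text: str            # recognizer output (possibly errorful)
    processed: bool = False

class TabCache:
    """Illustrative cache of conversational snippets for batch processing."""

    def __init__(self):
        self.tabs: List[DialogTab] = []

    def capture(self, text):
        """Store a snippet without interrupting the conversation."""
        self.tabs.append(DialogTab(text))

    def pending(self):
        """Snippets awaiting batch processing, kept visible to the user."""
        return [t for t in self.tabs if not t.processed]
```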
Conference/Meeting Applications • “Good to meet you Mr. X …” • “That’s interesting - let me see if I can repeat it back to you …” • “My assistant will….” • “My student should know that …”
Enabling Conversation • Significant populations with communication difficulties: deaf, “locked-in”, etc. • Similar issues • Speed of interaction • <5wpm intolerable • 15wpm handwriting • 20-100wpm typing • >175wpm average speech/sign • Interface is a secondary task
Deaf Community • 28 million deaf and hard-of-hearing - the largest disability group in the United States • English is a SECOND language; American Sign Language is the first • Interpreters cost $80-$100/hour • Privacy • Scheduling inconvenience • At-risk population for medical care (HIV)
Machine Translator • Mobile • Controlled by signer • Long battery life • Not interfere with hands • Stylish & cool! • Inexpensive: covered by insurance or ADA
ASL “One-Way” Translator • Inspired by DARPA English->Arabic One Way • ASL -> English semi-automatic phrasebook of questions • Answers from English speaker: • Numbers (hold up fingers) • Yes/No • Pointing
Further Constraints: Apartment Hunting Domain • “How many bathrooms?” • “Can I have a pet?” • “Which way to the bedroom?”
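The one-way phrasebook can be sketched as a lookup table: each entry pairs a question the signer can select with the constrained answer type the hearing speaker can return (numbers, yes/no, pointing). The gloss strings and answer types below are invented examples from the apartment-hunting domain, not the system's actual phrase inventory.

```python
# Illustrative one-way phrasebook: recognized sign gloss -> (English
# question, constrained answer modality for the hearing speaker).
PHRASEBOOK = {
    "HOW-MANY BATHROOM": ("How many bathrooms?", "number"),
    "PET ALLOW": ("Can I have a pet?", "yes/no"),
    "BEDROOM WHERE": ("Which way to the bedroom?", "pointing"),
}

def translate(sign_gloss):
    """Look up a recognized sign sequence; None if out of phrasebook."""
    entry = PHRASEBOOK.get(sign_gloss)
    if entry is None:
        return None
    question, answer_type = entry
    return f"{question} (answer by {answer_type})"
```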
Human-Computer Interface • Easier to recognize the correct English phrase than to generate it • The signer can use a head-up display to • Help adjust the environment • Constrain vocabulary (“Contact Sign”) • Avoid errors • Goal: > 15 wpm • Writing speed: 15 wpm • Spoken English or ASL: 175 wpm
Mobile Computer Vision • Challenges • Lighting • 2D information • Compute intensive • Advantages • Absolute position • Hand shape • No gloves
Accelerometers • Relative movement • 3D orientation • Low processing • Wide range of motion
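The "3D orientation, low processing" bullet can be illustrated with the standard tilt computation: when the sensor is at rest, the measured acceleration is gravity, so its direction yields two of the three orientation angles with almost no computation (yaw is unobservable from gravity alone). This is a generic sketch, not code from the project.

```python
import math

def tilt_from_gravity(ax, ay, az):
    """Estimate pitch and roll (degrees) from a static 3-axis
    accelerometer reading, treating the measured vector as gravity.
    """
    pitch = math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll
```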
Pattern Recognition: Georgia Tech Gesture Toolkit (GT2K) • Toolkit for developing gesture-based recognition systems • Abstracts lower-level pattern recognition details • Applies speech recognition technology (hidden Markov models) to other domains • Based on an active speech research tool (HTK, Cambridge University)
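The HMM idea behind GT2K can be sketched as: score an observation sequence under each gesture's model with the forward algorithm and pick the most likely gesture. The toy models below (two states, discrete symbols) are purely illustrative; GT2K itself wraps HTK's continuous-density HMMs and training machinery.

```python
def forward_likelihood(obs, pi, A, B):
    """P(obs | model) via the forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(obs, models):
    """Return the gesture label whose HMM scores obs highest."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

Recognition then reduces to one likelihood evaluation per gesture model, which is why speech-recognition tooling transfers so directly to gesture domains.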
Breaking News: Acceleglove results • Accelerometer/potentiometer glove at GWU • 145 signs; 4 sign phrases • 95% accuracy with part-of-speech grammar • Combined methods? • Time for native signer user studies?
Conclusion • A lot of room for research in • Serendipitous speech • MAUIs • Socially appropriate interfaces, especially for groups • Enabling technology • Must live it to develop it
Acknowledgements • Tracy Westeyn, Helene Brashear, Kent Lyons, Marty McGuire, Daniel Plaisted, Valerie Henderson, Niels Snoeck, Ben Wong, Fleming Seay, Gibby Fusia, the “Jane” undergrad collective, and many other students • Chris Schmandt, Steve Whittaker, Ben Shneiderman, Terry Winograd, and Sharon Oviatt • NSF; NIDRR; DARPA seedling
Resources • IEEE International Symposium on Wearable Computers (ISWC) • www.iswc.net • October, 2004 • Washington, DC • IEEE Wearable Information Systems Technical Committee (computer.org) • Research mailing list: wearables@cc.gatech.edu