
Interfaces for Augmenting Conversation

Explore the need for mobile interfaces that support conversations and meetings, and discuss challenges and solutions in developing such interfaces. Includes a case study on a speech agent for scheduling appointments.



Presentation Transcript


  1. Interfaces for Augmenting Conversation Professor Thad Starner GVU Center, College of Computing Georgia Institute of Technology

  2. Mobile Computing So Far … • Laptops and PDAs move desktop from place to place

  3. Desktop Design Limitations • Interfaces assume full attention • Physically complex point and click • Both eyes • Both hands

  4. What is the NEED while mobile? • During work time • 35-80% in conversation • 7-82% in technical meetings • 14-93% in opportunistic communication • Managers are most likely to be at this high end [Whittaker94, Whittaker95] • Support serendipitous speech

  5. Minimum Attention User Interfaces (MAUIs) • Maximum support for minimum investment of user resources • Attention and Memory • Huge background in cognitive science • Several models • Hard to experiment in mobile setting [Pascoe ISWC98]

  6. Twiddler Keyboard 70 words per minute on mobile phone keypad!

  7. Challenges • Mobile speech very difficult • Noise • Lombard speech • Can only hear user, not other participants (privacy/physics) • Contextual Awareness • Beware “AI-hard” problems • Situation understanding • “User modeling-hard” problems • Mind reading (intentionality)

  8. Example: Jane • Inspired by the character “Jane” in Orson Scott Card’s “Xenocide” • Continuous audio-based agent • Intelligent agent “listens in” to everyday conversations and provides information support • Internet search engines • News • E-mail

  9. Wizard of Oz Experiment • Student wore cellular phone with open microphone for 2 weeks, 10am-10pm • “Jane” emulated by team of undergrads in 2-hour shifts • Results • Direct queries: “What was the score of the basketball game?” • Pro-activity: “Are you watching TV? Want to know what’s on?” • Reminders: “Time for Kung Fu”

  10. Agent and Experimental Design Challenges • Agent audio output was interruptive during conversation • “Agent” could not respond quickly enough • Not enough context to be pro-active • Context could not accumulate due to experimental procedure • Privacy

  11. Privacy: Managing Expectations • Noise cancelling microphones • Hear user, but not other participants • Cut sounds with energy lower than threshold • Speech recognition - store only text • User can repeat crucial information in a socially graceful way • Socially acceptable (at least at a tech school :-) • Video equivalent: Clarkson’s fisheye lens
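The energy-threshold idea on this slide can be sketched in a few lines: frames whose energy falls below a cutoff (distant participants) are silenced, while the near-field user's louder speech passes through. The function names, frame representation, and threshold value here are illustrative assumptions, not the actual system.

```python
# Minimal sketch of energy-based gating: keep audio frames whose RMS energy
# exceeds a threshold (the nearby user), silence the rest (other participants).
# Frame format and the 0.2 threshold are assumptions for illustration.

def rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def gate_frames(frames, threshold):
    """Replace low-energy frames with silence, preserving timing."""
    return [f if rms(f) >= threshold else [0.0] * len(f) for f in frames]

# A loud (near-field) frame and a quiet (far-field) frame.
loud = [0.8, -0.7, 0.9, -0.8]
quiet = [0.05, -0.04, 0.03, -0.05]
gated = gate_frames([loud, quiet], threshold=0.2)
```

Coupled with storing only the recognized text rather than raw audio, this keeps other speakers out of the record unless the user deliberately repeats what they said.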

  12. Personal Audio Recording • 6 month experiment • Speech->text (emacs buffer) using ViaVoice • Augmenting short term memory

  13. Application: Calendaring • One of the most common PDA applications • One of the most desired functions • Occurs routinely in social conversation • One-on-one • Conferences • Meetings • Anecdotal observation of dissatisfaction

  14. Attention and Access: Scheduling Device Survey • What sort of devices are used for scheduling/remembering appointments while mobile? • What are the user’s perceptions of that device? • Why don’t more people use these devices, or have them with them? • (Georgia Tech GVU TR #02-17; submitted to ACM Transactions on Computer-Human Interaction)

  15. Scheduling Device Survey • 158 subjects • Georgia Tech student center • 90% students; 88% age 18-25; 70% male • Survey • What is your primary scheduling system while mobile? • 8 Likert-scale questions on effectiveness, ease of use, speed, and reliability • Open response questions • Videotape scheduling four appointments

  16. Videotaped Interactions (video frames: subject view; scheduling device view)

  17. Satisfaction • Subjects thought that their device was • Appropriate • Easy to use • Sufficient • Somewhat necessary • Fast to access

  18. Claimed Device Usage

  19. Actual Device Usage

  20. Disuse

  21. Speech Agent Goals • Design scheduling agents for wearables that • Minimize time to retrieve and navigate • Minimize cognitive load • Mimic buffering behavior • Use the user’s social dialog to cue the agent • Speech recognition on unconstrained language • User modeling • Common-sense reasoning • Tools: Restricted grammar and push-to-talk

  22. Calendar Navigator Agent

  23. “Can I see you next week sometime?”

  24. “Let me see if I’m free on the 24th”

  25. “Let me see if I’m free on the 31st” “Yes, 3pm seems like a good time”

  26. “OK, I’ll put ‘meet Maribeth’ at 3pm in my calendar”

  27. Calendar Navigator Agent • Interface used in parallel during conversation when scheduling an appointment • User’s speech performs dual roles: social communication and direction of interface • Might someday be faster than human secretary • High resolution screen for feedback • Not restricted to linear presentation like speech

  28. “Optimal” Tests • Speech operated desktop calendar application • 47% faster than PDAs • 20% faster than paper calendars • Grammar • Move to month/week/day • Monthly/weekly/daily incremental move • Zoom in on week/day • Specify meeting time and participant • Finalize appointment

  29. Dialog Tabs: Augmenting Conversational Memory • Capture conversation for later processing • Low retrieval time and cognitive load • Low impact of speech recognition errors • Enable batch processing • Unable to identify scheduling conflicts directly • Always-visible feedback • Unintrusive during conversation • Continuous reminder of cached information • Quick access and search for processing
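The dialog-tab behavior described above (capture now, batch-process later, with quick search over the cached snippets) can be sketched as a small buffer. The class and field names are assumptions made for this illustration.

```python
from dataclasses import dataclass
import time

# Toy sketch of dialog tabs: cache short conversation snippets during a
# conversation, then search and process them later in a batch.
@dataclass
class DialogTab:
    text: str
    timestamp: float
    processed: bool = False

class DialogTabs:
    def __init__(self):
        self.tabs = []

    def capture(self, text, timestamp=None):
        """Cache a snippet with minimal interruption to the conversation."""
        self.tabs.append(DialogTab(text, timestamp or time.time()))

    def search(self, keyword):
        """Quick retrieval over cached snippets."""
        return [t for t in self.tabs if keyword.lower() in t.text.lower()]

    def pending(self):
        """Snippets still awaiting batch processing into the calendar."""
        return [t for t in self.tabs if not t.processed]
```

Deferring processing this way also blunts speech-recognition errors: the user corrects a garbled snippet at review time rather than mid-conversation, though, as the slide notes, the system cannot flag scheduling conflicts at capture time.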

  30. Dialog Tabs for Scheduling

  31. Conference/Meeting Applications • “Good to meet you Mr. X …” • “That’s interesting - let me see if I can repeat it back to you …” • “My assistant will….” • “My student should know that …”

  32. Enabling Conversation • Significant populations with communication difficulties: deaf, “locked-in”, etc. • Similar issues • Speed of interaction • <5wpm intolerable • 15wpm handwriting • 20-100wpm typing • >175wpm average speech/sign • Interface is a secondary task

  33. BlinkI: Communicating via eyeblinks

  34. Deaf Community • 28 million deaf and hard-of-hearing people - the largest disability group in the United States • English is a SECOND language; American Sign Language is the first • Interpreters cost $80-$100/hour • Privacy • Scheduling inconvenience • At-risk population for medical care (HIV)

  35. Machine Translator • Mobile • Controlled by signer • Long battery life • Not interfere with hands • Stylish & cool! • Inexpensive: covered by insurance or ADA

  36. Challenges

  37. ASL “One-Way” Translator • Inspired by DARPA English->Arabic One Way • ASL -> English semi-automatic phrasebook of questions • Answers from English speaker: • Numbers (hold up fingers) • Yes/No • Pointing

  38. Further Constraints: Apartment Hunting Domain • “How many bathrooms?” • “Can I have a pet?” • “Which way to the bedroom?”

  39. Human-Computer Interface • Easier to recognize the correct English phrase than to generate it • Signer can use a head-up display to • Help adjust environment • Constrain vocabulary (“Contact Sign”) • Avoid errors • Goal: > 15 wpm • Writing speed: 15wpm • Spoken English or ASL: 175wpm
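The claim that recognizing a correct English phrase is easier than generating one can be made concrete with a toy matcher: score the recognized words against a fixed phrasebook and pick the closest entry. The phrases come from the apartment-hunting slide; the word-overlap scoring is an assumption for this sketch, not the translator's actual method.

```python
# Toy sketch of phrasebook-constrained recognition: instead of generating
# free-form English, pick the phrasebook entry with the most word overlap
# with the recognized input. Scoring method is an illustrative assumption.
PHRASEBOOK = [
    "how many bathrooms",
    "can i have a pet",
    "which way to the bedroom",
]

def best_phrase(recognized_words):
    """Return the phrasebook entry sharing the most words with the input."""
    def overlap(phrase):
        return len(set(phrase.split()) & set(recognized_words))
    return max(PHRASEBOOK, key=overlap)
```

Constraining the output space this way is what makes a noisy recognizer usable: even partial, error-laden input usually lands on the intended entry of a small, domain-specific phrasebook.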

  40. Mobile Computer Vision Challenges: • Lighting • 2D information • Compute intensive BUT Advantages • Absolute position • Hand shape • No gloves

  41. Accelerometers • Relative movement • 3D orientation • Low processing • Wide range of motion

  42. Pattern Recognition: Georgia Tech Gesture Toolkit (GT2K) • Toolkit for developing gesture-based recognition systems • Abstracts lower level pattern recognition details • Leverages speech recognition technology (hidden Markov models) to other domains • Based on active speech research tool (HTK, Cambridge University)
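GT2K builds on HMM machinery from speech recognition. As a minimal illustration of the underlying idea (not GT2K's or HTK's API), the sketch below scores a quantized observation sequence against one discrete HMM per gesture with the forward algorithm and classifies by maximum likelihood. The two toy gesture models and all their parameters are invented for the example.

```python
# Minimal HMM-based gesture classification sketch. Each gesture has a
# discrete-observation HMM (start probs, transition matrix, emission matrix);
# classify() picks the model that assigns the sequence the highest likelihood.
# All model parameters below are illustrative assumptions.

def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) via the forward algorithm for a discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[sp] * trans[sp][s] for sp in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)

# Two toy 2-state models over quantized accelerometer symbols:
# 0 = "still", 1 = "moving".
MODELS = {
    "wave": ([0.9, 0.1], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.1, 0.9]]),
    "rest": ([0.9, 0.1], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.9, 0.1]]),
}

def classify(obs):
    """Label a sequence with the gesture whose HMM best explains it."""
    return max(MODELS, key=lambda g: forward_likelihood(obs, *MODELS[g]))
```

The point of reusing speech-recognition machinery is exactly this: once gestures are reduced to symbol (or feature-vector) sequences, training and decoding are the same problems HTK already solves.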

  43. Breaking News: Acceleglove results • Accelerometer/potentiometer glove at GWU • 145 signs; 4 sign phrases • 95% accuracy with part-of-speech grammar • Combined methods? • Time for native signer user studies?

  44. Conclusion • A lot of room for research in • Serendipitous speech • MAUIs • Socially appropriate interfaces, especially for groups • Enabling technology • Must live it to develop it

  45. Acknowledgements • Tracy Westeyn, Helene Brashear, Kent Lyons, Marty McGuire, Daniel Plaisted, Valerie Henderson, Niels Snoeck, Ben Wong, Fleming Seay, Gibby Fusia, the “Jane” undergrad collective, and many other students • Chris Schmandt, Steve Whittaker, Ben Shneiderman, Terry Winograd, and Sharon Oviatt • NSF; NIDRR; DARPA seedling

  46. Resources • IEEE International Symposium on Wearable Computers (ISWC) • www.iswc.net • October, 2004 • Washington, DC • IEEE Wearable Information Systems Technical Committee (computer.org) • Research mailing list: wearables@cc.gatech.edu

  47. Extra Slides

  48. Wearable Computing: a lifestyle and a living experiment
