
What Recent Research Says about the Design of Effective Speech Menus -- You Might Be Surprised James R. Lewis, Ph.D., CHFP IBM Software Group jimlewis@us.ibm.com August 13, 2012

Introduction • Which dressing? We’ve got Thousand Island, French, honey mustard, oil and vinegar, raspberry vinaigrette, creamy Italian, ranch, or blue cheese. • Historical role of spoken menus • Recent arguments against using menus in speech recognition IVR apps • However, speech menus occur during normal human-human dialogs • For the foreseeable future, crafting effective speech menus will be part of practical VUI design – not a sexy topic, but a subtle one • Two important design topics • Number of options • Timing for extensions

Number of Options • Common guideline – no more than 4-5 options/menu • Recommendation from earliest days of IVR design • Two key characteristics of a menu • Breadth • Depth • For a given number of options presented in an auditory menu, which is the better strategy? • Fewer menus with more options/menu (Broad) • More menus with fewer options/menu (Deep)

Broad and Deep Menu Structures

Basis for Limiting the Number of Options • Short-term memory limitation (Magic Number 7 ± 2) • Miller’s famous paper from 1956 • Described experiments demonstrating that people have trouble holding more than about 7 (plus or minus 2) items at a time in working memory • The clear application of this for menu design is to limit the number of options presented in a menu – assuming that callers memorize menu options • For a given total number of options, however, restricting the number of options per menu necessarily leads to a deeper rather than a broader menu structure

However … • 1997 – Huguenard, Lerch, Junker, Patz, & Kass • Created a cognitive model of phone-based interaction • “It is not the number of options per menu level that determines the magnitude of WM [working memory] load, but rather the amount of processing and storage required to evaluate the ‘goodness’ of each individual option” • Study of 87 participants compared touchtone performance with broad (9x9) and deep (3x3x3x3) menu structures (81 terminals each) • Fewer navigation errors in broad condition
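
The arithmetic behind the two structures on this slide can be checked with a short sketch. It only multiplies the number of options at each level to get the number of terminal options, and counts the levels to get the depth. The function name `menu_shape` is an illustrative assumption, not something from the study.

```python
from math import prod


def menu_shape(options_per_level):
    """Return (depth, terminal_count) for a menu tree.

    options_per_level lists how many options each menu level presents,
    e.g. [9, 9] for a two-level menu with nine options per menu.
    """
    return len(options_per_level), prod(options_per_level)


# The two structures compared on the slide.
broad = menu_shape([9, 9])        # 9x9
deep = menu_shape([3, 3, 3, 3])   # 3x3x3x3

print("broad:", broad)
print("deep:", deep)
```

Both structures reach the same 81 terminal options, but the deep one needs four selections per task instead of two, which is where the extra opportunities for navigation error come from.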

What about a “More Options” Choice? • 1997 – Virzi & Huitema • “More Options” choice lists remaining options • 24 participants interacted with broad and deep versions of four different touchtone IVR menu applications, each of which had eight target options • Broad: 8 options • Deep: 4 options + “More Options” link to other 4 • Selection was significantly (10-20 sec) faster with the broad version • “There was clearly no advantage for the deep menus.”

Replication of Virzi & Huitema (1997) • 2001 – Suhm, Freeman, & Getty • Captured data from 2834 calls with a broad menu (7 options) and 2909 calls with a deep version (4+other) • Rates of timeouts and invalid responses for broad and deep versions were comparable • The reprompt rate was significantly higher for the deep version (5.1% for deep, 1.7% for broad) • “Presenting more choices in a menu allows designers of touch-tone voice interfaces to avoid multi-layered menus, which are clearly one of the most dreaded characteristics of touch-tone voice interfaces.”

Are Fewer Options Better when Memory Span is Low? • 2008 – Commarford, Lewis, Al-Awar Smither, & Gentzler • Compared broad and deep versions of a VUI menu • Broad menu contained list of 8–11 options • Deep had one layer of higher-level categories • All participants completed test of working memory • Users of the broad structure IVR performed better (completed more tasks in less time) and were more satisfied than users of the deep-structure IVR • Effect more pronounced for those with low working memory capacity

Commarford et al. (2008), continued • 2008 – Commarford, Lewis, Al-Awar Smither, & Gentzler • Key finding: interaction between menu structure and WMC for task completion time (n = 58)

Model of Auditory Menu Selection (Commarford et al., 2008)

Replication of Commarford et al. (2008) • 2009 – Wolters, Georgila, Moore, Logie, MacPherson, & Watson • WOZ study, n = 49, broad vs. deep voice menus • Assessed participants’ working memory span • When the application presented more options per turn and avoided explicit confirmations, participants booked appointments more quickly • “Thus, our results complement Commarford et al.’s (2008) finding that users with a lower WMS benefit from being presented with more options at a time, because at each step in the interaction, they are more likely to be presented with the correct choice.”

A Big, Fat (34-Option) Main Menu • 2010 – Hura • You can say. … Company Directory. Computer Assistance. Say Field Support to hear those options. For health benefits you can say Unified Healthcare, Rogers, Dental, … Or, you can say Representative if you know you need a person. • “What we have observed in practice is that frequent users barge in and never hear the menu. Infrequent users who have a term in mind for what they need also barge in without hearing much of the menu. It’s only when the user does not know what word to say that they bother listening to the menu. And in that situation, having a menu is an asset, not a punishment.”

Sometimes the Simple Answer is Hard to See • 2007 – Wilkie, McInnes, Jack, & Littlewood • Activated a “hidden” (unlisted) overdraft option in a banking main menu to avoid increasing menu breadth • Informed callers about the new overdraft option in one of three places: in the introduction, after caller identification, or at the completion of the first transaction • 37% of 114 participants failed the overdraft task • “A perhaps obvious solution … would be to simply add an overdraft option to the main menu listing. … This approach was employed in a follow-up experiment … which resulted in all participants successfully obtaining an overdraft. However, adding service options to the main menu in this way is not an ideal solution ... . An alternative method would be to … revisit the wording of the system-initiated proposal.”

Number of Options: Conclusions • No known experimental evidence of improved usability due to shorter menus and deeper structures • 7 studies since 1997 support broad over deep menus • Apparently cognitive demand of navigation exceeds that of selecting option from a menu • Designers must decide how best to organize options • Menu length is not the only factor • Need unambiguous labels and logical option grouping • If the options fall nicely into groups of four or fewer, it is reasonable to organize them in this manner • Do not artificially limit the number of options • Prefer broad over deep menu structures

Timing Extensions to Menus and Prompts • Sometimes designers provide extensions to menus and prompts (Weegels, 2000) • Which would you like? Intake Interviews, Genetic Testing, Court Date Info, or Upcoming Appointments? <pause> Or say, “It’s none of these.” • How long should the pre-extension pause be? • If much too short, not an effective turntaking cue • If slightly too short, deceptive turntaking cue that interrupts caller and can cause stuttering effect • If much too long, callers who are not sure how to respond may begin guessing – and never hear the extension • Timing is critical, but typically underspecified
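
One way to make the pre-extension pause explicit rather than underspecified is to carry it as a parameter of the prompt. The sketch below is an illustration only: it assumes the platform accepts SSML and uses the standard SSML `<break time="...ms"/>` element, neither of which the slides mention. The function name `prompt_with_extension` is hypothetical, and the 2000 ms value in the usage lines is taken from the conclusions slide of this deck.

```python
from xml.sax.saxutils import escape


def prompt_with_extension(menu_text, extension_text, pause_ms):
    """Build an SSML prompt: menu, an explicit pause, then the extension.

    pause_ms is the pre-extension pause in milliseconds. It is a required
    argument so the timing can never be left unspecified.
    """
    if pause_ms <= 0:
        raise ValueError("pause_ms must be a positive number of milliseconds")
    return (
        "<speak>"
        + escape(menu_text)
        + f'<break time="{pause_ms}ms"/>'
        + escape(extension_text)
        + "</speak>"
    )


# Usage with the example prompt from the slide.
ssml = prompt_with_extension(
    "Which would you like? Intake Interviews, Genetic Testing, "
    "Court Date Info, or Upcoming Appointments?",
    "Or say, It's none of these.",
    pause_ms=2000,
)
print(ssml)
```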

Findings from Analyses of Conversations • Pauses in a face-to-face setting rarely last more than 1 second (Clark, 1996; Wilson & Zimmerman, 1986) • When a participant in a telephone conversation pauses longer than 1 second and the other participant does not take the turn, the first speaker usually interprets this as a problem (Roberts, Francis, & Morgan, 2006) • The mean pause duration for dialog turns in service-based telephone conversations was 426 ms, with a 95% confidence interval ranging from 264–588 ms, an estimated 95th percentile of 1010 ms, and an estimated 99th percentile of 1303 ms (Beattie & Barnard, 1979; Lewis, 2011)

Findings from Analyses of Interactions with IVRs • 100 survey respondents provided information about how they knew when it was their turn to speak (Margulies, 2005) • Pause in the dialog (41.6%), prompt syntax (26.2%), inflection (21.0%), and earcons (11.2%) • Analysis of several dozen people using speech IVR – poorly timed pauses caused 18.5% of observed task failures (Margulies, 2005) • “when the machine seemingly yields a turn but then continues with instructions or a repeat of the declarative or interrogatory prompt—coincident with the subject either preparing to respond or in the act of responding”

Findings from Analyses of Interactions with IVRs • Commarford & Lewis, 2005 • Detailed analysis of six callers at task-terminal points in the completion of tasks with two different speech IVRs • Goal to find the optimal pause between presentation of a menu at a task-terminal point and the presentation of global navigation commands as an extension • Interesting differences in the distributions of caller response latencies as a function of whether the terminal menu included or did not include the target option for the task and whether it was the first or a subsequent presentation of the menu to the caller

Findings from Analyses of Interactions with IVRs • Commarford & Lewis, 2005

Timing Extensions to Menus and Prompts: Conclusions • Analyses of human-human conversations indicate: • Turntaking pauses should be at least 1 sec in duration • Longer pauses (1300 ms) provide better turntaking cues • 500 ms pause likely to cause conversational collisions • 250 ms pause not likely to trigger turntaking • Analyses of human-IVR interaction indicate: • A 2000 ms pause should balance tradeoffs • Try to design menus/prompts that do not need extension • Need for more research (open-ended prompt, familiarity)
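
The timing figures on this slide can be turned into a rough review check for a proposed pre-extension pause. This is a sketch under stated assumptions: the slide names only the points 250, 500, 1000, 1300, and 2000 ms, so the boundaries between bands below are interpolated guesses, and the function name `assess_pause` and the wording of the labels are illustrative.

```python
def assess_pause(pause_ms):
    """Classify a pre-extension pause (in milliseconds) against the slide's figures.

    Only 250, 500, 1000, 1300, and 2000 ms come from the slide; treating
    them as band edges is an assumption made for this sketch.
    """
    if pause_ms <= 250:
        return "too short: not likely to trigger turntaking"
    if pause_ms < 1000:
        return "risky: likely to cause conversational collisions"
    if pause_ms < 1300:
        return "minimum: meets the 1 sec lower bound"
    if pause_ms <= 2000:
        return "good: clear turntaking cue, up to the 2000 ms IVR recommendation"
    return "long: exceeds the 2000 ms recommendation, callers may start guessing"


for ms in (250, 500, 1000, 1300, 2000, 3000):
    print(ms, "->", assess_pause(ms))
```

A check like this is no substitute for testing with callers, but it flags prompts whose pause falls in the range the slide associates with collisions.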

Questions?

References • Beattie, G. W., & Barnard, P. J. (1979). The temporal structure of natural telephone conversations (directory enquiry calls). Linguistics, 17, 213–229. • Clark, H. H. (1996). Using language. Cambridge, UK: Cambridge University Press. • Commarford, P. M., & Lewis, J. R. (2005). Optimizing the pause length before presentation of global navigation commands. In Proceedings of HCI International 2005: Volume 2—The management of information: E-business, the Web, and mobile computing (pp. 1–7). St. Louis, MO: Mira Digital Publication. • Commarford, P. M., Lewis, J. R., Al-Awar Smither, J., & Gentzler, M. D. (2008). A comparison of broad versus deep auditory menu structures. Human Factors, 50(1), 77–89. • Huguenard, B. R., Lerch, F. J., Junker, B. W., Patz, R. J., & Kass, R. E. (1997). Working memory failure in phone-based interaction. ACM Transactions on Computer-Human Interaction, 4(2), 67–102. • Hura, S. L. (2010). My big fat main menu: The case for strategically breaking the rules. In W. Meisel (Ed.), Speech in the user interface: Lessons from experience (pp. 113–116). Victoria, Canada: TMA Associates. • Lewis, J. R. (2011). Practical speech user interface design. Boca Raton, FL: Taylor & Francis. • Margulies, E. (2005). Adventures in turn-taking: Notes on success and failure in turn cue coupling. In AVIOS 2005 proceedings (pp. 1–10). San Jose, CA: AVIOS. • Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63, 81–97.

References • Roberts, F., Francis, A. L., & Morgan, M. (2006). The interaction of inter-turn silence with prosodic cues in listener perceptions of “trouble” in conversation. Speech Communication, 48, 1079–1093. • Suhm, B., Freeman, B., & Getty, D. (2001). Curing the menu blues in touch-tone voice interfaces. In Proceedings of CHI 2001 (pp. 131–132). The Hague, Netherlands: ACM. • Virzi, R. A., & Huitema, J. S. (1997). Telephone-based menus: Evidence that broader is better than deeper. In Proceedings of the Human Factors and Ergonomics Society 41st Annual Meeting (pp. 315–319). Santa Monica, CA: Human Factors and Ergonomics Society. • Weegels, M. F. (2000). Users’ conceptions of voice-operated information services. International Journal of Speech Technology, 3, 75–82. • Wilkie, J., McInnes, F., Jack, M. A., & Littlewood, P. (2007). Hidden menu options in automated human-computer telephone dialogues: Dissonance in the user’s mental model. Behaviour & Information Technology, 26(6), 517–534. • Wilson, T. P., & Zimmerman, D. H. (1986). The structure of silence between turns in two-party conversation. Discourse Processes, 9, 375–390. • Wolters, M., Georgila, K., Moore, J. D., Logie, R. H., MacPherson, S. E., & Watson, M. (2009). Reducing working memory load in spoken dialogue systems. Interacting with Computers, 21, 276–287.
