Error Detection in Human-Machine Interaction
Dan Bohus, DoD Group, Oct 2002
Errors in Spoken-Language Interfaces • Speech recognition is problematic: • Input signal quality • Accents, non-native speakers • Spoken language disfluencies: stutters, false starts, /mm/, /um/ • Typical word error rates in SDS: 10-30% • Systems today lack the ability to gracefully recover from errors
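For reference (not from the slides): word error rate is the word-level edit distance between the reference transcript and the recognizer hypothesis, normalized by the number of reference words. A minimal sketch:

```python
# Minimal word error rate (WER) sketch: substitutions + deletions + insertions
# over the number of reference words, computed via word-level edit distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# E.g. one turn from the sample dialog below:
print(wer("hamilton ontario", "hilton ontario"))  # 0.5 -> 50% WER on this turn
```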
An example
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]
Pathway to a solution • Make systems aware of unreliability in their inputs • Confidence scores • Develop a model which learns to optimally choose between several prevention/repair strategies • Identify strategies • Express them in a computable manner • Develop the model
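As a toy illustration of "express them in a computable manner" (the thresholds and strategy names below are hypothetical, not from the talk), a system with a recognition confidence score for each concept might pick a prevention/repair strategy like this:

```python
# Toy illustration (hypothetical thresholds and strategy names): choose a
# prevention/repair strategy from a recognition confidence score.
def choose_strategy(confidence: float) -> str:
    if confidence >= 0.90:
        return "accept"             # trust the hypothesis, no confirmation
    elif confidence >= 0.60:
        return "implicit_confirm"   # "Traveling from Pittsburgh... when do you want to leave?"
    elif confidence >= 0.30:
        return "explicit_confirm"   # "Did you say Pittsburgh?"
    else:
        return "reprompt"           # "Sorry, where are you leaving from?"

print(choose_strategy(0.72))  # -> implicit_confirm
```

A learned model would replace the hand-set thresholds with a policy trained from labeled dialogs.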
Papers • Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels] • Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels] • The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
Goals • [Let's look at the example dialog above] • (1) Analysis of the positive and negative cues users give in response to implicit and explicit verification questions • (2) Explore the possibilities of spotting errors on-line
Explicit vs. Implicit • Explicit • Presumably easier for the system to verify • But there’s evidence that it’s not as easy … • Leads to more turns, less efficiency, frustration • Implicit • Efficiency • But induces a higher cognitive burden which can result in more confusion • ~ Systems don’t deal very well with it…
Clark & Schaefer • Grounding model • Presentation phase • Acceptance phase • Various indicators • Go ON / YES • Go BACK / NO • Can we detect them reliably (when following implicit and explicit verification questions)?
Experimental Setup / Data • 120 dialogs: Dutch SDS providing train timetable information • 487 utterances • 44 (~10%) not used • Users accepting a wrong result • Barge-in • Users starting their own contribution • Leaves 443 adjacent S/U utterance pairs
First conclusion • People use more negative cues when there are problems • And even more so for implicit confirmations (vs. explicit ones)
How well can you classify • Using individual features • Look at precision/recall • Explicit: absence of confirmation • Implicit: non-zero number of corrections • Multiple features • Used memory-based learning • 97% accuracy (majority baseline: 68%) • Confirm + Correct wins, although each feature is individually less good • This is overall, right? How about for explicit vs. implicit?
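The "memory-based learning" here is essentially nearest-neighbour classification over the annotated per-turn features. A rough sketch of the comparison against a majority baseline, with made-up toy feature vectors (a real evaluation would use the annotated corpus and held-out data or cross-validation, not the training set as this toy does):

```python
# Rough sketch (toy data, hypothetical feature encoding): memory-based
# learning ~ k-nearest-neighbour over per-turn features such as
# #confirmations, #disconfirmations, #corrections, #repetitions.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.neighbors import KNeighborsClassifier

# X: [n_confirm, n_disconfirm, n_correct, n_repeat], y: 1 = problem turn
X = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1],
              [0, 1, 2, 1], [0, 0, 1, 0], [1, 0, 0, 0]])
y = np.array([0, 1, 0, 1, 1, 0])

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

for name, clf in [("majority baseline", baseline), ("memory-based (1-NN)", knn)]:
    pred = clf.predict(X)  # toy: scored on the training data itself
    print(name,
          "acc=%.2f" % accuracy_score(y, pred),
          "prec=%.2f" % precision_score(y, pred, zero_division=0),
          "rec=%.2f" % recall_score(y, pred, zero_division=0))
```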
BUT !!! • How many of these features are available on-line?
What else can we throw at it? • Prosody (next paper) • Lexical information • Acoustic confidence scores • Maybe also of previous utterances • Repetitions / corrections / new info on the transcript? • …
Papers • Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels] • Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels] • The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
Goals • Investigate the prosodic correlates of disconfirmations • Is this slightly different from before? (i.e. now looking at any corrections? Answer: no) • Looked at prosody on "NO" as a go_on vs. a go_back: • Do you want to fly from Pittsburgh? • Shall I summarize your trip?
Human-human • Higher pitch range, longer duration • Preceded by a longer delay • High H% boundary tone • Expected to see the same behavior for disconfirmations in human-machine dialog
Prosodic correlates • Yes, the correlations are there as expected
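A minimal sketch of the kind of prosodic features involved, assuming an f0 contour (in Hz, at a fixed frame rate) for the "no" token is already available from a pitch tracker; the frame shift and the crude boundary-tone proxy are assumptions, not the paper's measurement procedure:

```python
# Minimal sketch (assumes an f0 contour in Hz at a fixed frame rate is
# already available from a pitch tracker): prosodic features of a "no".
import numpy as np

FRAME_SHIFT = 0.010  # 10 ms frames (assumption)

def prosodic_features(f0: np.ndarray, delay_frames: int) -> dict:
    voiced = f0[f0 > 0]                          # drop unvoiced frames
    return {
        "duration_s":  len(f0) * FRAME_SHIFT,                # length of the word
        "delay_s":     delay_frames * FRAME_SHIFT,            # pause before the word
        "pitch_range": float(voiced.max() - voiced.min()) if len(voiced) else 0.0,
        "final_rise":  bool(len(voiced) > 1 and voiced[-1] > voiced[0]),  # crude H% proxy
    }

# Disconfirmations ("go back") are expected to show larger values here.
f0_no = np.array([0, 0, 180, 195, 210, 230, 250, 0])  # toy contour in Hz
print(prosodic_features(f0_no, delay_frames=60))
```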
Perceptual analysis • Took 40 “No” from No+stuff, 20 go_on and 20 go_back (note that some features are lost this way…) • Forced choice randomized task, w/ no feedback; 25 native speakers of Dutch • Results • 17 go_on correctly identified above chance • 15 go_back correctly identified above chance; but also 1 incorrectly identified above chance.
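"Identified above chance" can be checked per item with a binomial test against the 50% chance level of the forced-choice task (25 listeners per item); a sketch with a made-up per-item count of correct listeners, since the slides only report how many items cleared the threshold:

```python
# Sketch: per-item binomial test against the 50% chance level of the
# two-alternative forced-choice task (25 listeners per item).
from scipy.stats import binomtest

n_listeners = 25
n_correct = 19          # hypothetical: listeners who labeled this "no" correctly
result = binomtest(n_correct, n_listeners, p=0.5, alternative="greater")
print(result.pvalue)    # < 0.05 -> this item is identified above chance
```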
Discussion • Q1: Blurred relationships … • Confidence annotation • Go_on / Go_back signal • Is that the same as corrections ? • Is that the most general case for responses to implicit/explicit verifications, or should we have a separate detector ? • Q2: What other features could we throw at these problems ? What are the “most juicy” ones ?
Discussion • Q3: For implicit confirms, are these different in terms of induced response behavior ? • When do you want to leave Pittsburgh ? • Travelling from Pittsburgh … when do you want to leave ? • When do you want to leave from Pittsburgh to Boston ?