a principled approach for rejection threshold optimization

Dan Bohus (www.cs.cmu.edu/~dbohus) • Alexander I. Rudnicky (www.cs.cmu.edu/~air) • Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217

understanding errors and rejection • systems often misunderstand • use confidence scores • common design pattern • compare input confidence against a threshold • reject utterance if confidence is too low • may lead to false rejections
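The design pattern on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the function name `accept_utterance` and the default threshold of 0.3 are hypothetical.

```python
def accept_utterance(confidence: float, threshold: float = 0.3) -> bool:
    """Return True if the input should be accepted, False if it should be rejected.

    `threshold` is an illustrative default, not a value taken from the talk.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Reject the utterance when the confidence score falls below the threshold.
    return confidence >= threshold


# A correctly recognized utterance with low confidence is still rejected:
# this is the false-rejection case mentioned on the slide.
print(accept_utterance(0.8))  # accepted
print(accept_utterance(0.1))  # rejected
```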

rejection tradeoff • misunderstandings vs. false rejections [figure: false rejections and misunderstandings (y-axis, 0% to 75%) as a function of the rejection threshold (x-axis, 0 to 1)]

rejection tradeoff • misunderstandings vs. false rejections • correctly vs. incorrectly transferred concepts [figure: correctly and incorrectly transferred concepts per turn as a function of the rejection threshold (x-axis, 0 to 1)]
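The tradeoff between misunderstandings and false rejections can be made concrete with a small sweep over thresholds. The sketch below assumes each turn is labeled with a confidence score and a correctness flag; the data in `toy_turns` is invented for illustration and is not from the RoomLine corpus.

```python
from typing import List, Tuple

# Each turn: (confidence score, whether the recognized input was actually correct).
Turn = Tuple[float, bool]


def error_rates(turns: List[Turn], threshold: float) -> Tuple[float, float]:
    """Return (false_rejection_rate, misunderstanding_rate) over all turns."""
    n = len(turns)
    # False rejection: a correct input whose confidence falls below the threshold.
    false_rejections = sum(1 for conf, ok in turns if conf < threshold and ok)
    # Misunderstanding: an incorrect input that is accepted anyway.
    misunderstandings = sum(1 for conf, ok in turns if conf >= threshold and not ok)
    return false_rejections / n, misunderstandings / n


# Invented toy data, for illustration only.
toy_turns = [(0.9, True), (0.7, True), (0.6, False), (0.4, True), (0.2, False)]

for th in (0.0, 0.25, 0.5, 0.75, 1.0):
    fr, mis = error_rates(toy_turns, th)
    print(f"threshold={th:.2f}  false rejections={fr:.0%}  misunderstandings={mis:.0%}")
```

Raising the threshold trades misunderstandings for false rejections; neither rate alone indicates where the threshold should sit.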

question given this trade-off, how can we optimize the rejection threshold in a principled fashion?

outline • current solutions • proposed approach • data • results • conclusion

current solutions • follow ASR manual [Nuance documentation] • acknowledge the tradeoff + postulate costs: “misunderstandings are X times more costly than false rejections” [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002] • costs are likely to differ: across domains / systems; across dialog states within a system

proposed approach • derive costs in a principled fashion
1. identify a set of variables involved in the tradeoff: correctly and incorrectly transferred concepts per turn (CTC, ITC)
2. choose a dialog performance metric: task completion (binary, kappa) – TC
3. build a regression model: logit(TC) ← C_0 + C_CTC · CTC + C_ITC · ITC
4. optimize threshold to maximize performance: th* = argmax_th (C_CTC · CTC + C_ITC · ITC)
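Step 4 can be sketched as a grid search. The sketch assumes the regression coefficients from step 3 are already available and that each turn is labeled with its confidence and its counts of correct and incorrect concepts. The coefficient values, the toy turns, and the names `ctc_itc` and `best_threshold` are hypothetical.

```python
from typing import Sequence, Tuple

# Each turn: (confidence, number of correct concepts, number of incorrect concepts).
Turn = Tuple[float, int, int]


def ctc_itc(turns: Sequence[Turn], threshold: float) -> Tuple[float, float]:
    """Average correctly / incorrectly transferred concepts per turn.

    Concepts are transferred only when the turn is accepted, i.e. when its
    confidence is at or above the threshold.
    """
    n = len(turns)
    ctc = sum(c for conf, c, _ in turns if conf >= threshold) / n
    itc = sum(i for conf, _, i in turns if conf >= threshold) / n
    return ctc, itc


def best_threshold(turns: Sequence[Turn], c_ctc: float, c_itc: float,
                   grid: Sequence[float]) -> float:
    """Return the grid point maximizing c_ctc * CTC + c_itc * ITC."""
    def utility(th: float) -> float:
        ctc, itc = ctc_itc(turns, th)
        return c_ctc * ctc + c_itc * itc
    # On ties, max() keeps the first (lowest) threshold in the grid.
    return max(grid, key=utility)


# Invented toy data and coefficients, for illustration only.
toy_turns = [(0.9, 2, 0), (0.7, 1, 0), (0.5, 0, 2), (0.3, 1, 1), (0.1, 0, 1)]
grid = [i / 20 for i in range(21)]
print(best_threshold(toy_turns, c_ctc=1.0, c_itc=-1.0, grid=grid))
```

The constant C_0 is omitted from the utility because it does not depend on the threshold and so does not change the argmax.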

state-specific costs • costs are different in different dialog states • CTC and ITC on a per-state basis:
logit(TC) ← C_0 + C_CTC,state1 · CTC_state1 + C_ITC,state1 · ITC_state1 + C_CTC,state2 · CTC_state2 + C_ITC,state2 · ITC_state2 + C_CTC,state3 · CTC_state3 + C_ITC,state3 · ITC_state3 + …
• optimize separate threshold for each state:
th*_state_x = argmax_th (C_CTC,state_x · CTC_state_x + C_ITC,state_x · ITC_state_x)
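The per-state optimization can be sketched by grouping turns by dialog state and running the same argmax within each group. The coefficient pairs below are the rounded values reported later in this talk for request(bool) and request(non-bool); the turns themselves and the function name `per_state_thresholds` are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Each turn: (dialog state, confidence, correct concepts, incorrect concepts).
Turn = Tuple[str, float, int, int]


def per_state_thresholds(turns: Sequence[Turn],
                         coeffs: Dict[str, Tuple[float, float]],
                         grid: Sequence[float]) -> Dict[str, float]:
    """Pick, for each state, the grid threshold maximizing that state's utility."""
    by_state: Dict[str, List[Tuple[float, int, int]]] = defaultdict(list)
    for state, conf, correct, incorrect in turns:
        by_state[state].append((conf, correct, incorrect))

    best: Dict[str, float] = {}
    for state, rows in by_state.items():
        c_ctc, c_itc = coeffs[state]
        n = len(rows)

        def utility(th: float) -> float:
            # Only turns accepted at this threshold transfer their concepts.
            ctc = sum(c for conf, c, _ in rows if conf >= th) / n
            itc = sum(i for conf, _, i in rows if conf >= th) / n
            return c_ctc * ctc + c_itc * itc

        best[state] = max(grid, key=utility)
    return best


# Coefficients (C_CTC, C_ITC) per state, rounded as reported in the talk.
coeffs = {"request(bool)": (3.31, -0.60), "request(non-bool)": (2.55, -3.44)}
# Invented toy turns, for illustration only.
toy_turns = [
    ("request(bool)", 0.8, 1, 0), ("request(bool)", 0.4, 1, 0),
    ("request(bool)", 0.2, 0, 1),
    ("request(non-bool)", 0.9, 1, 0), ("request(non-bool)", 0.6, 0, 1),
    ("request(non-bool)", 0.3, 1, 1),
]
grid = [i / 10 for i in range(11)]
print(per_state_thresholds(toy_turns, coeffs, grid))
```

On this toy data the state with the larger penalty on incorrect concepts ends up with the higher threshold, which is the kind of difference state-specific costs are meant to capture.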

outline • current solutions • proposed approach • data • results • conclusion

data • collected using RoomLine: phone-based, mixed-initiative spoken dialog system; conference room reservations; Sphinx-2; utterance-level confidence annotator [0-1] • 46 participants (first-time users) • 10 scenario-driven interactions • corpus: 449 dialog sessions; 8278 user turns; manually labeled decoded concept “correctness”

roomline states • 71 “dialog states” total • clustered into 3 classes • open-request: “How may I help you?” • request(bool): “Would you like a reservation for this room?”, “Would you like a room with a projector?” • request(non-bool): “For what time would you like to reserve the room?”

results: task success model • model predicting binary task success • cost coefficients
Variable                   Coeff     p        se
Const                      -2.3442   0.0416   1.1504
CTC / open-request          0.5518   0.0619   0.2955
ITC / open-request         -0.4067   0.3801   0.4634
CTC / request(bool)         3.3127   0.0010   1.0076
ITC / request(bool)        -0.5959   0.6491   1.3098
CTC / request(non-bool)     2.5514   0.0017   0.8137
ITC / request(non-bool)    -3.441    0.0018   1.1046
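Because the model is a logistic regression, the coefficients above act on the logit scale; a predicted probability of task success is obtained by applying the logistic function to the linear combination. The sketch below uses the reported coefficients, but the session values in `example` and the function name `predicted_task_success` are hypothetical.

```python
import math

# Coefficients as reported on this slide (logit scale).
CONST = -2.3442
COEFFS = {
    ("CTC", "open-request"): 0.5518,
    ("ITC", "open-request"): -0.4067,
    ("CTC", "request(bool)"): 3.3127,
    ("ITC", "request(bool)"): -0.5959,
    ("CTC", "request(non-bool)"): 2.5514,
    ("ITC", "request(non-bool)"): -3.441,
}


def predicted_task_success(rates: dict) -> float:
    """Predicted probability of task success for per-state CTC/ITC rates.

    `rates` maps (variable, state) to concepts per turn; missing keys count as 0.
    """
    z = CONST + sum(coef * rates.get(key, 0.0) for key, coef in COEFFS.items())
    # Inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical session: these rates are invented for illustration only.
example = {("CTC", "request(non-bool)"): 0.8, ("ITC", "request(non-bool)"): 0.1}
print(round(predicted_task_success(example), 3))
```

Several of the ITC coefficients have large p-values in the table, so predictions that lean on them should be read with corresponding caution.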

results: threshold optimization • open-request: utility = 0.55 x CTC – 0.40 x ITC [figure: correctly transferred concepts per turn, incorrectly transferred concepts per turn, and utility for the open-request state, as a function of the rejection threshold (x-axis, 0 to 1)]

results: threshold optimization • open-request: utility = 0.55 x CTC – 0.40 x ITC • request(bool): utility = 3.31 x CTC – 0.60 x ITC • request(non-bool): utility = 2.55 x CTC – 3.44 x ITC • utility profiles are different across the three states • task duration models lead to similar results [figure: three panels, one per state, each showing correctly and incorrectly transferred concepts per turn and utility as a function of the rejection threshold (x-axis, 0 to 1); one panel's threshold axis carries an additional tick at 0.6]

conclusion • principled method for optimizing rejection threshold • determine costs for various types of understanding errors • data-driven approach • can derive state-specific costs • bridge mismatches between off-the-shelf confidence annotators and domain

thank you

fit for task success model

expected changes in task success • remains to be seen …

task duration model

Model 2: resulting fit and coefficients • R^2 = 0.56
