a principled approach for rejection threshold optimization
Dan Bohus (www.cs.cmu.edu/~dbohus)
Alexander I. Rudnicky (www.cs.cmu.edu/~air)
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217
understanding errors and rejection
• systems often misunderstand
• use confidence scores
• common design pattern:
  • compare input confidence against a threshold
  • reject the utterance if confidence is too low
• may lead to false rejections
rejection tradeoff
• misunderstandings vs. false rejections
[figure: false rejections and misunderstandings (0% to 75%) vs. rejection threshold (0 to 1)]
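The tradeoff curve can be sketched by sweeping the threshold over labeled turns. This is a minimal sketch, assuming each turn is a (confidence, understood_correctly) pair and that both error rates are measured as a fraction of all turns (the slide does not state the denominator). A false rejection is a correctly understood turn that gets rejected; a misunderstanding is an incorrectly understood turn that gets accepted. The name `tradeoff_rates` and the sample turns are hypothetical.

```python
from typing import List, Tuple

# Hypothetical turn record: (confidence score in [0, 1], was the decoded input correct?)
Turn = Tuple[float, bool]


def tradeoff_rates(turns: List[Turn], threshold: float) -> Tuple[float, float]:
    """Return (false_rejection_rate, misunderstanding_rate) as fractions of all turns."""
    false_rejections = 0
    misunderstandings = 0
    for confidence, correct in turns:
        accepted = confidence >= threshold
        if correct and not accepted:
            false_rejections += 1  # good input thrown away
        elif not correct and accepted:
            misunderstandings += 1  # bad input let through
    n = len(turns)
    return false_rejections / n, misunderstandings / n


if __name__ == "__main__":
    sample = [(0.9, True), (0.7, True), (0.6, False), (0.4, True), (0.3, False), (0.1, False)]
    for th in (0.0, 0.25, 0.5, 0.75, 1.0):
        fr, mu = tradeoff_rates(sample, th)
        print(f"threshold={th:.2f}  false rejections={fr:.0%}  misunderstandings={mu:.0%}")
```

Raising the threshold moves turns from the misunderstanding column into the false-rejection column, which is exactly the tradeoff the figure plots.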
rejection tradeoff
• misunderstandings vs. false rejections
• correctly vs. incorrectly transferred concepts
[figure: correctly and incorrectly transferred concepts per turn vs. rejection threshold (0 to 1)]
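The concept-level view can be computed the same way. A sketch, assuming rejection is utterance-level (a rejected turn transfers none of its concepts) and that each turn carries counts of correctly and incorrectly decoded concepts, as a manually labeled corpus would provide. `LabeledTurn` and `ctc_itc` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledTurn:
    confidence: float  # utterance-level confidence score in [0, 1]
    correct_concepts: int  # decoded concepts labeled correct
    incorrect_concepts: int  # decoded concepts labeled incorrect


def ctc_itc(turns: List[LabeledTurn], threshold: float) -> Tuple[float, float]:
    """Correctly / incorrectly transferred concepts per turn at a given threshold.

    A turn whose confidence is below the threshold is rejected, so none of its
    concepts (right or wrong) reach the dialog manager.
    """
    accepted = [t for t in turns if t.confidence >= threshold]
    ctc = sum(t.correct_concepts for t in accepted) / len(turns)
    itc = sum(t.incorrect_concepts for t in accepted) / len(turns)
    return ctc, itc
```

For example, with turns `LabeledTurn(0.9, 2, 0)`, `LabeledTurn(0.6, 1, 1)` and `LabeledTurn(0.2, 0, 2)`, a threshold of 0.5 keeps all 3 correct concepts but only 1 of the 3 incorrect ones, giving CTC = 1.0 and ITC = 1/3.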
question
given this tradeoff, how can we optimize the rejection threshold in a principled fashion?
outline
• current solutions
• proposed approach
• data
• results
• conclusion
current solutions
• follow the ASR manual [Nuance documentation]
• acknowledge the tradeoff + postulate costs
  • “misunderstandings are X times more costly than false rejections” [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002]
• costs are likely to differ
  • across domains / systems
  • across dialog states within a system
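For contrast, the postulated-cost approach can be sketched as picking the threshold that minimizes a weighted error sum, with misunderstandings weighted X times more than false rejections. The rate arrays are made-up illustrative numbers, and `pick_threshold_fixed_cost` is a hypothetical name, not taken from any of the cited systems.

```python
from typing import Sequence


def pick_threshold_fixed_cost(
    thresholds: Sequence[float],
    false_rejection_rates: Sequence[float],
    misunderstanding_rates: Sequence[float],
    x: float,
) -> float:
    """Return the threshold minimizing x * misunderstandings + false rejections."""
    costs = [
        x * mu + fr
        for fr, mu in zip(false_rejection_rates, misunderstanding_rates)
    ]
    best = min(range(len(thresholds)), key=costs.__getitem__)
    return thresholds[best]


# Illustrative (not measured) error rates at five candidate thresholds.
ths = [0.0, 0.25, 0.5, 0.75, 1.0]
fr = [0.00, 0.08, 0.15, 0.35, 0.60]
mu = [0.30, 0.20, 0.10, 0.04, 0.00]
```

With these numbers, X = 0.5 selects 0.0, X = 3 selects 0.5 and X = 10 selects 1.0: the answer is driven entirely by a cost ratio that, as noted above, is likely to differ across domains and dialog states.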
proposed approach
• derive costs in a principled fashion
1. identify a set of variables involved in the tradeoff: correctly and incorrectly transferred concepts per turn (CTC, ITC)
2. choose a dialog performance metric: task completion (binary, kappa), TC
3. build a regression model:
   $\operatorname{logit}(TC) \leftarrow C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$
4. optimize the threshold to maximize performance:
   $th^* = \arg\max_{th} \left( C_{CTC} \cdot CTC(th) + C_{ITC} \cdot ITC(th) \right)$
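Steps 3 and 4 can be sketched with NumPy, assuming one row per dialog session holding that session's CTC and ITC, plus a binary task-completion label. The logistic regression is fit with plain Newton-Raphson, a stand-in for whatever statistics package was actually used. Because C_0 does not depend on the threshold and the logistic function is increasing, the threshold that maximizes C_CTC·CTC + C_ITC·ITC also maximizes predicted task completion under the model. `fit_logit`, `optimize_threshold` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_logit(features: np.ndarray, labels: np.ndarray, iters: int = 25) -> np.ndarray:
    """Fit logit(P(TC=1)) = C0 + features @ C by Newton-Raphson; returns [C0, C1, ...]."""
    X = np.column_stack([np.ones(len(features)), features])
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (labels - p)  # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        # Newton step; the tiny ridge term keeps the solve well-conditioned.
        w += np.linalg.solve(hess + 1e-6 * np.eye(len(w)), grad)
    return w


def optimize_threshold(thresholds, ctc_curve, itc_curve, c_ctc, c_itc) -> float:
    """th* = argmax_th (C_CTC * CTC(th) + C_ITC * ITC(th))."""
    utility = c_ctc * np.asarray(ctc_curve) + c_itc * np.asarray(itc_curve)
    return float(np.asarray(thresholds)[np.argmax(utility)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic sessions: (CTC, ITC) per session, task completion drawn from a known model.
    sessions = rng.uniform(0.0, 1.0, size=(2000, 2))
    true_w = np.array([-1.0, 2.0, -3.0])
    p_true = 1.0 / (1.0 + np.exp(-(true_w[0] + sessions @ true_w[1:])))
    completed = (rng.uniform(size=2000) < p_true).astype(float)
    c0, c_ctc, c_itc = fit_logit(sessions, completed)
    print(f"C0={c0:.2f}  C_CTC={c_ctc:.2f}  C_ITC={c_itc:.2f}")
```

In practice the CTC/ITC curves passed to `optimize_threshold` would be measured on the corpus at each candidate threshold.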
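state-specific costs
• costs are different in different dialog states
• CTC and ITC on a per-state basis:
  $\operatorname{logit}(TC) \leftarrow C_0 + C_{CTC}^{\mathrm{state}_1} \cdot CTC^{\mathrm{state}_1} + C_{ITC}^{\mathrm{state}_1} \cdot ITC^{\mathrm{state}_1} + C_{CTC}^{\mathrm{state}_2} \cdot CTC^{\mathrm{state}_2} + C_{ITC}^{\mathrm{state}_2} \cdot ITC^{\mathrm{state}_2} + C_{CTC}^{\mathrm{state}_3} \cdot CTC^{\mathrm{state}_3} + C_{ITC}^{\mathrm{state}_3} \cdot ITC^{\mathrm{state}_3} + \dots$
• optimize a separate threshold for each state:
  $th_{\mathrm{state}_x}^* = \arg\max_{th} \left( C_{CTC}^{\mathrm{state}_x} \cdot CTC^{\mathrm{state}_x}(th) + C_{ITC}^{\mathrm{state}_x} \cdot ITC^{\mathrm{state}_x}(th) \right)$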
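The state-specific model only changes the feature vector: each session contributes a CTC and an ITC value per state class, so the regression estimates one coefficient pair per class. A sketch of building one design-matrix row, assuming each turn is tagged with its state class and that per-state values are averaged over the turns the session spent in that class (the slides do not say how per-state CTC/ITC are normalized). `session_features` and the `Turn` layout are hypothetical; the rows can be fed to any logistic-regression routine.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

STATES = ("open-request", "request(bool)", "request(non-bool)")

# Hypothetical turn record: (state class, confidence, #correct concepts, #incorrect concepts)
Turn = Tuple[str, float, int, int]


def session_features(turns: List[Turn], thresholds: Dict[str, float]) -> List[float]:
    """Return [CTC_s1, ITC_s1, CTC_s2, ITC_s2, CTC_s3, ITC_s3] for one session.

    Each state class uses its own rejection threshold; per-state values are
    averaged over the session's turns in that state (0.0 if there are none).
    """
    n_turns: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    incorrect: Dict[str, int] = defaultdict(int)
    for state, confidence, n_ok, n_bad in turns:
        n_turns[state] += 1
        if confidence >= thresholds[state]:  # accepted: its concepts are transferred
            correct[state] += n_ok
            incorrect[state] += n_bad
    row: List[float] = []
    for state in STATES:
        n = n_turns[state]
        row.append(correct[state] / n if n else 0.0)
        row.append(incorrect[state] / n if n else 0.0)
    return row
```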
data
• collected using RoomLine
  • phone-based, mixed-initiative spoken dialog system
  • conference room reservations
  • Sphinx-2
  • utterance-level confidence annotator [0-1]
• 46 participants (first-time users)
• 10 scenario-driven interactions
• corpus
  • 449 dialog sessions
  • 8278 user turns
  • manually labeled decoded concept “correctness”
RoomLine states
• 71 “dialog states” total
• clustered into 3 classes:
  • open-request: How may I help you?
  • request(bool): Would you like a reservation for this room? / Would you like a room with a projector?
  • request(non-bool): For what time would you like to reserve the room?
results: task success model
• model predicting binary task success
• cost coefficients:

Variable                  Coefficient   p-value   std. error
Const                         -2.3442    0.0416       1.1504
CTC / open-request             0.5518    0.0619       0.2955
ITC / open-request            -0.4067    0.3801       0.4634
CTC / request(bool)            3.3127    0.0010       1.0076
ITC / request(bool)           -0.5959    0.6491       1.3098
CTC / request(non-bool)        2.5514    0.0017       0.8137
ITC / request(non-bool)       -3.441     0.0018       1.1046
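One way to read the table, as a worked example rather than an analysis from the talk: dividing |C_ITC| by C_CTC gives how many correctly transferred concepts offset one incorrectly transferred concept in each state class. The coefficients are copied from the table; the ITC coefficients for open-request (p = 0.3801) and request(bool) (p = 0.6491) are not significant at the 0.05 level, so those two ratios are rough. `itc_to_ctc_ratio` is a hypothetical helper.

```python
# Coefficients (C_CTC, C_ITC) copied from the task-success model table.
COEFFS = {
    "open-request": (0.5518, -0.4067),
    "request(bool)": (3.3127, -0.5959),
    "request(non-bool)": (2.5514, -3.441),
}


def itc_to_ctc_ratio(c_ctc: float, c_itc: float) -> float:
    """Correctly transferred concepts needed to offset one incorrect concept."""
    return abs(c_itc) / c_ctc


for state, (c_ctc, c_itc) in COEFFS.items():
    print(f"{state:18s} {itc_to_ctc_ratio(c_ctc, c_itc):.2f}")
```

This prints roughly 0.74, 0.18 and 1.35: in request(non-bool) states an incorrect concept costs more than a correct one gains, while in request(bool) states correct concepts dominate.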
results: threshold optimization
[figure: one panel per state class (open-request, request(bool), request(non-bool)); x-axis: rejection threshold (0 to 1); curves: correctly and incorrectly transferred concepts per turn, with the panel's utility]
• open-request: utility = 0.55 × CTC − 0.41 × ITC
• request(bool): utility = 3.31 × CTC − 0.60 × ITC
• request(non-bool): utility = 2.55 × CTC − 3.44 × ITC
• utility profiles are different across the three states
• task duration models lead to similar results
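The per-state search can be sketched as a grid search over candidate thresholds using the three utility formulas. The CTC/ITC curves below are hypothetical inputs (in practice they would be measured per state class on the labeled corpus at each candidate threshold), and `best_threshold` / `per_state_thresholds` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# Utility weights (C_CTC, C_ITC) per state class, rounded from the regression table.
UTILITY = {
    "open-request": (0.55, -0.41),
    "request(bool)": (3.31, -0.60),
    "request(non-bool)": (2.55, -3.44),
}


def best_threshold(
    thresholds: Sequence[float],
    ctc: Sequence[float],
    itc: Sequence[float],
    weights: Tuple[float, float],
) -> float:
    """Return the threshold maximizing weights[0] * CTC(th) + weights[1] * ITC(th)."""
    c_ctc, c_itc = weights
    scores = [c_ctc * a + c_itc * b for a, b in zip(ctc, itc)]
    return thresholds[max(range(len(scores)), key=scores.__getitem__)]


def per_state_thresholds(
    thresholds: Sequence[float],
    curves: Dict[str, Tuple[List[float], List[float]]],
) -> Dict[str, float]:
    """Optimize a separate rejection threshold for each state class."""
    return {
        state: best_threshold(thresholds, ctc, itc, UTILITY[state])
        for state, (ctc, itc) in curves.items()
    }
```

Feeding the same hypothetical curves (thresholds 0, 0.25, 0.5, 0.75, 1; CTC 1.00, 0.95, 0.80, 0.50, 0.0; ITC 0.40, 0.25, 0.10, 0.03, 0.0) to all three classes returns 0.25 for open-request, 0.0 for request(bool) and 0.5 for request(non-bool): the same confidence annotator warrants different thresholds once state-specific costs are applied.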
conclusion
• principled method for optimizing the rejection threshold
  • determine costs for various types of understanding errors
  • data-driven approach
  • can derive state-specific costs
• bridges mismatches between off-the-shelf confidence annotators and the target domain
expected changes in task success
• remains to be seen …
Model 2: resulting fit and coefficients
• R² = 0.56
intro : data collection : rejection threshold