Turn-taking: Discourse and Dialogue, CS 359, November 6, 2001
Agenda • Motivation • Silence in Human-Computer Dialogue • Turn-taking in human-human dialogue • Turn-change signals • Back-channel acknowledgments • Maintaining contact • Exploiting these cues to improve HCC • Automatic identification of disfluencies, jump-in points, and jump-ins
Turn-taking in HCI • Human turn end: detected by 250 ms of silence • System turn end: signaled by the end of system speech; also triggered by any human sound (barge-in) • Continued attention: no signal
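The silence rule for detecting the end of a human turn can be sketched as a simple energy-based endpointer. This is a minimal illustration, not the system from the slides: the function name `detect_turn_end`, the 10 ms frame step, and the per-frame energy threshold are all hypothetical assumptions; only the 250 ms silence duration comes from the slide.

```python
from typing import List, Optional


def detect_turn_end(frame_energies: List[float], threshold: float,
                    frame_ms: int = 10, silence_ms: int = 250) -> Optional[int]:
    """Return the index of the frame at which the user's turn is declared
    over, or None if no turn end has been detected yet.

    Assumption (hypothetical): a frame is silent when its energy is below
    `threshold`. The turn ends once `silence_ms` of consecutive silence
    follows some speech."""
    frames_needed = -(-silence_ms // frame_ms)  # ceiling division
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True   # speech resets the silence counter
            silent_run = 0
        elif heard_speech:
            silent_run += 1       # only count silence that follows speech
            if silent_run >= frames_needed:
                return i
    return None
```

A fixed silence timeout like this cannot tell a turn-final pause from a mid-turn hesitation, which is the gap the prosodic cues in the rest of the lecture are meant to fill.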
Gesture, Gaze & Voice • Range of gestural signals: head (nod, shake), shoulder, hand, leg, and foot movements; facial expressions; postures; artifacts • Signals align with syllables • Units: phonemic clause + change • Studied using recorded exchanges
Yielding the Floor • Turn-change signal: the speaker offers the floor to the auditor (hearer) • Cues: pitch fall, syllable lengthening, “but uh”, end of gesture, amplitude drop + “uh”, end of clause • The likelihood of a turn change increases with the number of cues displayed • Any gesticulation by the speaker negates the signal
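The cue-counting rule above can be sketched as follows. The cue identifiers are hypothetical string names paraphrasing the slide, and the function returns a raw cue count rather than a probability, because the slide gives no numeric mapping from cue count to likelihood of a turn change.

```python
from typing import Set

# Hypothetical identifiers paraphrasing the turn-yielding cues on the slide.
TURN_YIELD_CUES = frozenset({
    "pitch_fall",
    "lengthening",
    "but_uh",
    "end_gesture",
    "amplitude_drop_uh",
    "end_clause",
})


def turn_yield_strength(observed: Set[str], gesticulating: bool) -> int:
    """Number of turn-yielding cues displayed at a boundary.

    More cues -> a turn change is more likely. Any ongoing gesticulation
    negates the signal, so the strength drops to 0."""
    if gesticulating:
        return 0
    # Ignore observed events that are not turn-yielding cues.
    return len(observed & TURN_YIELD_CUES)
```

For example, `turn_yield_strength({"pitch_fall", "end_clause", "cough"}, gesticulating=False)` counts two cues, while the same call with `gesticulating=True` returns 0.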
Taking the Floor • Speaker-state signal • Indicates that the auditor is becoming the speaker • Occurs at the beginning of turns • Cues: shift in head direction and/or start of a gesture
Retaining the Floor • Within-turn signal: the speaker keeps the turn and looks at the hearer at the end of a clause • Continuation signal: the speaker keeps the turn and looks away after a within-turn signal or back-channel • Back-channel: “mm-hmm”, “okay”, etc.; nods; sentence completion; clarification request; restatement • A back-channel is NOT a turn: it signals attention, agreement, or confusion
Segmenting Turns • Speaker alone: within-turn signal → end of one unit; continuation signal → beginning of the next unit • Joint signals: speaker turn signal (end), auditor → speaker, speaker → auditor • Within-turn signal + back-channel + continuation signal • Back-channels signal understanding • Early back-channel + continuation signal
Regaining Attention • Gaze & Disfluency • Disfluency: a “perturbation” in speech, such as a silent pause, filled pause, or restart • Gaze: conversants don’t stare at each other constantly • However, the speaker expects to meet the hearer’s gaze to confirm the hearer’s attention • Disfluency occurs when the speaker realizes the hearer is NOT attending • The speaker pauses until the hearer begins gazing, or uses the disfluency to request attention
Improving Human-Computer Turn-taking • Identifying cues to turn change and turn start • Meeting conversations: recorded, natural research meetings; multi-party; overlapping speech • Units: “spurts”, stretches of speech delimited by pauses of 500 ms
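Splitting one speaker's time-aligned words into spurts can be sketched as below. This assumes word start/end times are available per speaker (natural for multi-party meetings recorded on separate channels) and treats a pause strictly longer than 0.5 s as a spurt boundary; whether the original study used "longer than" or "at least" 500 ms is an assumption. The `Word` tuple layout and `segment_spurts` name are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # hypothetical layout: (token, start_sec, end_sec)


def segment_spurts(words: List[Word], max_pause: float = 0.5) -> List[List[Word]]:
    """Group one speaker's time-aligned words into spurts: a new spurt
    begins whenever the pause since the previous word exceeds `max_pause`
    seconds."""
    spurts: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):  # order by start time
        if spurts and word[1] - spurts[-1][-1][2] <= max_pause:
            spurts[-1].append(word)   # short pause: same spurt
        else:
            spurts.append([word])     # long pause (or first word): new spurt
    return spurts
```

Segmenting each speaker separately keeps overlapping speech from another participant from breaking or merging spurts.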
Text + Prosody • Text sequence: modeled with an n-gram language model (LM), implemented as an HMM • Prosody: duration, pitch, pause, and energy features • Decision trees (DT): classify and output class probabilities • Integrate LM + DT
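The slide does not say how the LM and DT outputs are integrated. One simple option, shown here purely as an illustration, is to linearly interpolate the two models' per-boundary posteriors and renormalize; the HMM-based integration used in the actual work may differ. The weight `lam` and the label names are hypothetical.

```python
from typing import Dict


def combine_posteriors(p_lm: Dict[str, float], p_dt: Dict[str, float],
                       lam: float = 0.5) -> Dict[str, float]:
    """Interpolate LM and decision-tree posteriors for one inter-word
    boundary, then renormalize. `lam` weights the LM (hypothetical)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    labels = set(p_lm) | set(p_dt)
    # A label missing from one model gets probability 0 from that model.
    mixed = {k: lam * p_lm.get(k, 0.0) + (1 - lam) * p_dt.get(k, 0.0)
             for k in labels}
    total = sum(mixed.values())
    return {k: v / total for k, v in mixed.items()}
```

The predicted boundary type is then the label with the highest combined probability, e.g. `max(result, key=result.get)`. In practice `lam` would be tuned on held-out data.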
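Decision Trees • Example tree (figure): root node A splits on X (X=t, X=f) into nodes B and C • B and C each split on Y, with thresholds Y>1 / Y<=1 and Y<=2 / Y>2 • Leaves D, E, F, G are labeled None, Sentence End, Sentence End, Disfluency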
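A decision tree that both classifies and outputs probabilities can be sketched by storing, at each leaf, the counts of training labels that reach it. The tree below mirrors the example's shape (A splits on X, then B and C split on Y at thresholds 1 and 2), but which threshold belongs to which node and which leaf carries which label are not recoverable from the figure. So the leaves are filled from labeled data rather than hard-coded. All names and the toy data used with it are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]


@dataclass
class Node:
    """Internal node if `test` is set, leaf otherwise."""
    test: Optional[Callable[[Features], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    counts: Counter = field(default_factory=Counter)  # labels reaching this leaf

    def find_leaf(self, x: Features) -> "Node":
        node = self
        while node.test is not None:
            node = node.if_true if node.test(x) else node.if_false
        return node


def fit_leaf_counts(root: Node, data: List[Tuple[Features, str]]) -> None:
    """Route each labeled example to its leaf and count the labels there."""
    for x, label in data:
        root.find_leaf(x).counts[label] += 1


def predict_proba(root: Node, x: Features) -> Dict[str, float]:
    """Class probabilities = relative label frequencies at x's leaf."""
    counts = root.find_leaf(x).counts
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Hypothetical tree with the example's shape.
tree = Node(
    test=lambda x: x["X"] == 1,                      # A: X = t ?
    if_true=Node(test=lambda x: x["Y"] > 1,          # B: Y > 1 ? (assumed)
                 if_true=Node(), if_false=Node()),   # two leaves
    if_false=Node(test=lambda x: x["Y"] > 2,         # C: Y > 2 ? (assumed)
                  if_true=Node(), if_false=Node()),  # two leaves
)
```

The classification is the most probable label at the leaf. The leaf distribution itself is what a combined model would consume as the DT's probability output.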
Interpreting Breaks • For each inter-word position: is it a disfluency, a sentence end, or a continuation? • Key features: pause duration, vowel duration • 62% accuracy vs. a 50% chance baseline • ~90% accuracy overall • Best results combine LM & DT
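A 50% chance baseline is what a classifier faces on a test set with two equally frequent classes. The slide does not say how its test sets were built, so the sketch below is only one common way to obtain such a baseline: downsample every class to the size of the rarest one, then score the classifier. The function name and signature are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple


def balanced_accuracy(examples: List[Tuple[object, Hashable]],
                      classify: Callable[[object], Hashable],
                      seed: int = 0) -> Tuple[float, float]:
    """Downsample every class to the size of the rarest class, then score
    `classify`. Returns (accuracy, chance), where chance = 1 / #classes."""
    by_label: Dict[Hashable, List[object]] = defaultdict(list)
    for x, label in examples:
        by_label[label].append(x)
    n = min(len(xs) for xs in by_label.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    sample = [(x, label) for label, xs in by_label.items()
              for x in rng.sample(xs, n)]
    correct = sum(classify(x) == label for x, label in sample)
    return correct / len(sample), 1 / len(by_label)
```

On the natural distribution, where one class may dominate, plain accuracy can be much higher than on the balanced set. That is one possible reading of the ~90% overall figure, but the slide does not say.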
Jump-in Points • Possible turn changes that were actually used: points WITHIN a spurt where a new speaker starts • Key features: pause duration, low energy, pitch fall • Accuracy: 65% vs. a 50% baseline • Performance depends only on preceding prosodic features
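The three jump-in cues can be computed from frame-level pitch and energy tracks using only frames before the candidate point, matching the slide's point that only preceding prosody matters. This is a sketch under stated assumptions: the tracks share one frame rate, unvoiced frames have pitch 0, "pause" means consecutive low-energy frames just before the boundary, and pitch fall is measured as a least-squares F0 slope. The function name, window size, and silence threshold are hypothetical.

```python
from typing import Dict, Sequence


def preceding_prosody(pitch: Sequence[float], energy: Sequence[float],
                      boundary: int, window: int = 20,
                      silence_threshold: float = 1.0) -> Dict[str, float]:
    """Prosodic features for a candidate jump-in point at frame `boundary`,
    computed only from frames before it (no look-ahead)."""
    # Pause: consecutive low-energy frames immediately before the boundary.
    pause_frames = 0
    for value in reversed(energy[:boundary]):
        if value >= silence_threshold:
            break
        pause_frames += 1

    start = max(0, boundary - window)
    recent_energy = energy[start:boundary]
    mean_energy = sum(recent_energy) / len(recent_energy) if recent_energy else 0.0

    # Pitch slope: least-squares fit over voiced frames in the window;
    # a negative slope corresponds to a pitch fall.
    voiced = [(i, f0) for i, f0 in enumerate(pitch[start:boundary]) if f0 > 0]
    slope = 0.0
    if len(voiced) >= 2:
        n = len(voiced)
        mean_i = sum(i for i, _ in voiced) / n
        mean_f = sum(f for _, f in voiced) / n
        sxx = sum((i - mean_i) ** 2 for i, _ in voiced)
        sxy = sum((i - mean_i) * (f - mean_f) for i, f in voiced)
        slope = sxy / sxx
    return {"pause_frames": float(pause_frames),
            "mean_energy": mean_energy,
            "pitch_slope": slope}
```

These features could then feed a classifier such as the decision trees used elsewhere in the lecture.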
Jump-in Features • Do people speak differently when they jump in? • Do jump-ins differ from regular turn starts? • Examine only the first words of turns; no LM • Key features: raised pitch, raised amplitude • Accuracy: 77% vs. a 50% baseline, using prosody only
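"Raised" pitch and amplitude only make sense relative to the speaker's usual level. One way to express that, assumed here and not stated on the slide, is a z-score of the turn's first word against the same speaker's baseline values. How the first-word pitch and amplitude are measured is left open. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def raised_voice_features(first_word_pitch: float, first_word_amp: float,
                          speaker_pitch: Sequence[float],
                          speaker_amp: Sequence[float]) -> Dict[str, float]:
    """z-scores of a turn's first word against the speaker's own baseline.
    Positive values mean higher pitch / louder than the speaker's average."""
    def z(value: float, baseline: Sequence[float]) -> float:
        sd = pstdev(baseline)
        # A flat baseline gives no evidence of a raised voice.
        return (value - mean(baseline)) / sd if sd > 0 else 0.0

    return {"pitch_z": z(first_word_pitch, speaker_pitch),
            "amp_z": z(first_word_amp, speaker_amp)}
```

Normalizing per speaker keeps naturally high-pitched or loud speakers from looking like they are always jumping in.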
Summary • Prosodic features signal conversational moves • Pause and vowel duration distinguish sentence ends, disfluencies, and fluent continuations • Jump-ins occur at locations that sound like sentence ends • Speakers raise their voices when they jump in