
Turn-taking

Turn-taking. Discourse and Dialogue CS 359 November 6, 2001. Agenda. Motivation Silence in Human-Computer Dialogue Turn-taking in human-human dialogue Turn-change signals Back-channel acknowledgments Maintaining contact Exploiting to improve HCC


Presentation Transcript


  1. Turn-taking Discourse and Dialogue CS 359 November 6, 2001

  2. Agenda • Motivation • Silence in Human-Computer Dialogue • Turn-taking in human-human dialogue • Turn-change signals • Back-channel acknowledgments • Maintaining contact • Exploiting to improve HCC • Automatic identification of disfluencies, jump-in points, and jump-ins

  3. Turn-taking in HCI • Human turn end: • Detected by 250ms silence • System turn end: • Signaled by end of speech • Indicated by any human sound • Barge-in • Continued attention: • No signal
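The silence-based rule on this slide can be sketched as follows. This is a minimal illustration, not the detector of any particular system: the function name `detect_turn_end`, the 10 ms frame size, and the assumption that a separate voice-activity step has already labeled each frame as speech or non-speech are all hypothetical. Only the 250 ms threshold comes from the slide.

```python
def detect_turn_end(frame_is_speech, frame_ms=10, silence_ms=250):
    """Return the index of the frame at which the human turn is declared
    finished, or None if the silence threshold is never reached.

    frame_is_speech: iterable of booleans, one per audio frame
                     (assumed output of some voice-activity detector).
    frame_ms:        frame length in milliseconds (assumed value).
    silence_ms:      silence needed to end the turn (250 ms on the slide).
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            heard_speech = True
            silent_run = 0           # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i             # turn end declared here
    return None


# 200 ms of speech, a 100 ms pause, 50 ms of speech, then a long silence.
frames = [True] * 20 + [False] * 10 + [True] * 5 + [False] * 30
print(detect_turn_end(frames))
```

A rule like this treats any sufficiently long pause as a turn end, so a speaker who hesitates mid-utterance for longer than the threshold would be cut off, while continued attention produces no signal at all.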

  4. Missed turn example

  5. Gesture, Gaze & Voice • Range of gestural signals: • head (nod,shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts • Align with syllables • Units: phonemic clause + change • Study with recorded exchanges

  6. Yielding the Floor • Turn change signal • Offer floor to auditor/hearer • Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause • Likelihood of change increases with more cues • Negated by any gesticulation
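The two qualitative claims on this slide (more cues make a turn change more likely; any gesticulation cancels the signal) can be written down as a small scoring function. This is only a sketch: the cue identifiers and the function name `yield_score` are made up for illustration, and the score is a plain cue count, not a probability reported in the study.

```python
# Hypothetical identifiers for the six turn-yielding cues listed on the slide.
YIELD_CUES = {
    "pitch_fall",
    "lengthening",
    "but_uh",
    "end_gesture",
    "amplitude_drop_uh",
    "end_clause",
}


def yield_score(observed_cues, gesticulating):
    """Count the turn-yielding cues displayed by the current speaker.

    A higher count stands for a more likely turn change. Ongoing
    gesticulation negates the signal, so the score is 0 in that case.
    """
    if gesticulating:
        return 0
    return len(YIELD_CUES & set(observed_cues))


print(yield_score({"pitch_fall", "end_clause"}, gesticulating=False))
print(yield_score({"pitch_fall", "end_clause"}, gesticulating=True))
```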

  7. Taking the Floor • Speaker-state signal • Indicate becoming speaker • Occurs at beginning of turns • Cues: • Shift in head direction • AND/OR • Start of gesture

  8. Retaining the Floor • Within-turn signal • Still speaker: Look at hearer as clause ends • Continuation signal • Still speaker: Look away after within-turn signal/back-channel • Back-channel: • ‘mmhm’/okay/etc.; nods • Sentence completion; clarification request; restatement • NOT a turn: signals attention, agreement, confusion

  9. Segmenting Turns • Speaker alone: • Within-turn signal -> end of one unit; • Continuation signal -> beginning of next unit • Joint signal: • Speaker turn signal (end); auditor -> speaker; speaker -> auditor • Within-turn + back-channel + continuation • Back-channels signal understanding • Early back-channel + continuation

  10. Regaining Attention • Gaze & Disfluency • Disfluency: “perturbation” in speech • Silent pause, filled pause, restart • Gaze: • Conversants don’t stare at each other constantly • However, speaker expects to meet hearer’s gaze • Confirm hearer’s attention • Disfluency occurs when realize hearer NOT attending • Pause until begin gazing, or to request attention

  11. Improving Human-Computer Turn-taking • Identifying cues to turn change and turn start • Meeting conversations: • Recorded, natural research meetings • Multi-party • Overlapping speech • Units = “Spurts” between 500ms silence
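The "spurt" unit can be illustrated with a short segmentation routine. This is a sketch under assumptions: it takes word-level timestamps for a single speaker as input, and the name `segment_spurts` and the tuple layout are hypothetical. The 500 ms gap is the threshold given on the slide.

```python
def segment_spurts(words, min_gap=0.5):
    """Group one speaker's words into spurts.

    words:   list of (token, start_s, end_s) tuples sorted by start time.
    min_gap: silence in seconds that separates two spurts (500 ms on the slide).
    Returns a list of spurts, each a list of word tuples.
    """
    spurts = []
    current = []
    for word in words:
        # A pause of at least min_gap since the previous word closes the spurt.
        if current and word[1] - current[-1][2] >= min_gap:
            spurts.append(current)
            current = []
        current.append(word)
    if current:
        spurts.append(current)
    return spurts


words = [
    ("so", 0.0, 0.2),
    ("we", 0.25, 0.4),
    ("should", 1.0, 1.3),   # 0.6 s pause before this word
    ("go", 1.35, 1.5),
]
for spurt in segment_spurts(words):
    print([token for token, _, _ in spurt])
```

Because each speaker is segmented independently, spurts from different speakers may overlap in time, which fits the multi-party, overlapping speech described on the slide.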

  12. Text + Prosody • Text sequence: • Modeled as n-gram language model • Implement as HMM • Prosody: • Duration, Pitch, Pause, Energy • Decision trees: classify + probability • Integrate LM + DT
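The slide does not say how the language model and the decision tree are integrated. One simple possibility, shown here purely as an assumption, is a weighted log-linear interpolation of the two models' per-boundary probabilities. The label names, the function `combine`, the `weight` parameter, and the example numbers are all illustrative and not taken from the study.

```python
import math

LABELS = ("continuation", "sentence_end", "disfluency")


def combine(lm_probs, dt_probs, weight=0.5, floor=1e-12):
    """Log-linearly interpolate two distributions over boundary labels.

    lm_probs: label -> probability from the word-sequence (LM) model.
    dt_probs: label -> probability from the prosodic decision tree.
    weight:   share given to the LM (assumed tuning parameter, 0..1).
    floor:    lower bound on probabilities to avoid log(0).
    """
    scores = {}
    for label in LABELS:
        lm = max(lm_probs[label], floor)
        dt = max(dt_probs[label], floor)
        scores[label] = weight * math.log(lm) + (1 - weight) * math.log(dt)
    total = sum(math.exp(s) for s in scores.values())
    # Renormalize so the combined scores form a probability distribution.
    return {label: math.exp(s) / total for label, s in scores.items()}


# Made-up probabilities for a single inter-word position.
lm = {"continuation": 0.6, "sentence_end": 0.3, "disfluency": 0.1}
dt = {"continuation": 0.2, "sentence_end": 0.7, "disfluency": 0.1}
combined = combine(lm, dt)
print(max(combined, key=combined.get))
```

With equal weights, a strong prosodic vote for a sentence end can override a language model that mildly prefers continuation; setting `weight=1.0` reduces the result to the language model alone.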

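  13. Decision Trees • [Figure: example decision tree. Root node A splits on X (X=t / X=f) into nodes B and C; these split on thresholds of Y (Y>1 / Y<=1 and Y<=2 / Y>2) into leaves D, E, F, G, labeled None, Sentence End, Sentence End, Disfluency.]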

  14. Interpreting Breaks • For each inter-word position: • Is it a disfluency, sentence end, or continuation? • Key features: • Pause duration, vowel duration • 62% accuracy wrt 50% chance baseline • ~90% overall • Best combines LM & DT

  15. Jump-in Points • (Used) Possible turn changes • Points WITHIN spurt where new speaker starts • Key features: • Pause duration, low energy, pitch fall • Accuracy: 65% wrt 50% baseline • Performance depends only on preceding prosodic features

  16. Jump-in Features • Do people speak differently when they jump in? • Do jump-ins differ from regular turn starts? • Examine only first words of turns • No LM • Key features: • Raised pitch, raised amplitude • Accuracy: 77% wrt 50% baseline • Prosody only

  17. Summary • Prosodic features signal conversational moves • Pause and vowel duration distinguish sentence end, disfluency, or fluent continuation • Jump-ins occur at locations that sound like sentence ends • Speakers raise their voice when they jump in
