
Challenges in Dialogue

Explore the issues and models of discourse and dialogue in computational agents, focusing on turn-taking, collaboration, speech acts, and the management of multi-party spoken dialogue. Discover the role of gesture, gaze, and voice in turn-taking, and the signals speakers use to yield, take, and retain the floor. Learn about segmenting turns, regaining a hearer's attention, and the collaborative nature of communication. Finally, examine implicature and Grice's maxims, speech and dialogue acts, approaches to dialogue act recognition, and dialogue management for conversational agents.

blairr




Presentation Transcript


  1. Challenges in Dialogue • Discourse and Dialogue, CMSC 35900-1 • October 27, 2006

  2. Roadmap • Issues in Dialogue • Dialogue vs General Discourse • Dialogue Acts • Modeling • Recognition and Interpretation • Dialogue Management for Computational Agents

  3. Dialogue vs General Discourse • Key contrast: Two or more speakers • Primary focus on speech • Issues in multi-party spoken dialogue • Turn-taking – who speaks next, when? • Collaboration – clarification, feedback,… • Disfluencies • Adjacency pairs, dialogue acts

  4. Turn-Taking • Multi-party discourse • Need to trade off speaker/hearer roles • Interpret reference from sequential utterances • When? • End of sentence? • No: multi-utterance turns • Silence? • No: little silence in smooth dialogue (< 250 ms) • When the other starts speaking? • No: relatively little overlap face-to-face (~5%)
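The timing figures on this slide (gaps under roughly 250 ms, about 5% overlap face-to-face) are the kind of statistics computed from time-aligned turns. The Python sketch below shows one way to measure them; the `(speaker, start, end)` tuple format is an assumption, and overlap is only checked between consecutive turns.

```python
from typing import List, Tuple

# Hypothetical format: (speaker, start_sec, end_sec), sorted by start time.
Turn = Tuple[str, float, float]


def gap_overlap_stats(turns: List[Turn]) -> Tuple[List[float], float]:
    """Return gaps (seconds) at speaker changes and overlap as a fraction of total turn time."""
    gaps: List[float] = []
    overlap = 0.0
    for (spk_a, _, end_a), (spk_b, start_b, end_b) in zip(turns, turns[1:]):
        if spk_a == spk_b:
            continue  # same speaker continuing: not a turn change
        if start_b >= end_a:
            gaps.append(start_b - end_a)  # silence between the two turns
        else:
            overlap += min(end_a, end_b) - start_b  # both speakers talking at once
    total = sum(end - start for _, start, end in turns)
    return gaps, (overlap / total if total else 0.0)
```

For example, turns A 0.0-2.0 s, B 2.1-4.0 s, and A 3.8-6.0 s yield a single 0.1 s gap and 0.2 s of overlap.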

  5. Turn-taking: When • Rule-governed behavior • Possibly multiple legal turn change times • Aka transition-relevance places (TRPs) • Generally at utterance boundaries • Utterance not necessarily sentence • In fact, utterance/sentence boundaries not obvious in speech • Speakers don’t necessarily pause between sentences • Automatic utterance boundary detection • Cue words (okay, so, ...); POS sequences; prosody
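A rule-based Python sketch of automatic utterance boundary detection using two of the cues listed (cue words and pauses). The word-timing format, the cue-word set, and the 0.2 s pause threshold are illustrative assumptions, not values from the slides; practical detectors typically combine these with POS and prosodic features in a trained model.

```python
from typing import List, Tuple

CUE_WORDS = {"okay", "so", "well", "now"}  # assumed cue-word list
PAUSE_SEC = 0.2  # assumed pause threshold in seconds


def boundary_candidates(words: List[Tuple[str, float, float]]) -> List[int]:
    """Return indices i such that an utterance boundary is hypothesized before words[i].

    words: (token, start_sec, end_sec) tuples in time order (hypothetical ASR output format).
    """
    candidates = []
    for i in range(1, len(words)):
        token, start, _ = words[i]
        prev_end = words[i - 1][2]
        # A boundary is proposed before a cue word or after a long enough pause.
        if token.lower() in CUE_WORDS or start - prev_end >= PAUSE_SEC:
            candidates.append(i)
    return candidates
```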

  6. Turn-taking: Who & How • At each TRP in each turn (Sacks et al., 1974) • If the current speaker has selected A to speak, A must take the floor • If the current speaker has selected no one, anyone can self-select • If no one else takes the turn, the current speaker may continue • Selecting speaker A: • By explicit/implicit mention: What about it, Bob? • By gaze, function • Selecting others: questions, greetings, closing • (Traum et al., 2003)
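The allocation rules above can be read as an ordered decision procedure. A minimal Python sketch, assuming the caller already knows whom (if anyone) the current speaker selected and which participants began speaking at the TRP, listed in order of onset; giving the turn to the first starter is a tie-break assumed by this sketch.

```python
from typing import Optional, Sequence


def next_speaker(current: str, selected: Optional[str],
                 self_selectors: Sequence[str]) -> Optional[str]:
    """Apply the turn-allocation rules at a transition-relevance place (TRP).

    current: the current speaker.
    selected: participant selected by the current speaker (by mention or gaze), or None.
    self_selectors: participants who start speaking at this TRP, in order of onset.
    Returns the next speaker, or None if nobody speaks.
    """
    # Rule 1: a selected participant must take the floor.
    if selected is not None:
        return selected
    # Rule 2: with no one selected, anyone else may self-select; first starter wins.
    others = [p for p in self_selectors if p != current]
    if others:
        return others[0]
    # Rule 3: if no one else takes the turn, the current speaker may continue.
    if current in self_selectors:
        return current
    return None
```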

  7. Turn-taking in HCI • Human turn end: • Detected by 250 ms of silence • System turn end: • Signaled by end of speech • Indicated by any human sound • Barge-in • Continued attention: • No signal
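The HCI conventions above amount to a simple endpointing and barge-in policy. This Python sketch assumes 10 ms frames and a fixed energy threshold, both hypothetical choices; only the 250 ms silence criterion comes from the slide. Deployed systems usually rely on a trained voice-activity detector rather than raw frame energy.

```python
from typing import Optional, Sequence

FRAME_MS = 10            # assumed frame size
SILENCE_MS = 250         # end-of-turn silence threshold from the slide
ENERGY_THRESHOLD = 0.01  # assumed speech/silence energy threshold


def detect_turn_end(energies: Sequence[float]) -> Optional[int]:
    """Return the frame index at which the user's turn is declared over, or None."""
    needed = SILENCE_MS // FRAME_MS
    silent_run = 0
    heard_speech = False
    for i, energy in enumerate(energies):
        if energy >= ENERGY_THRESHOLD:
            heard_speech = True
            silent_run = 0
        elif heard_speech:  # only silence after speech counts toward the endpoint
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


def barge_in(system_speaking: bool, energy: float) -> bool:
    """Any user sound while the system is speaking counts as barge-in."""
    return system_speaking and energy >= ENERGY_THRESHOLD
```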

  8. Gesture, Gaze & Voice • Range of gestural signals: • head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts • Align with syllables • Units: phonemic clause + change • Study with recorded exchanges

  9. Yielding the Floor • Turn change signal • Offer floor to auditor/hearer • Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop + ‘uh’, end of clause • Likelihood of change increases with more cues • Negated by any gesticulation
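Since the likelihood of a turn change grows with the number of yielding cues and any gesticulation cancels the signal, the cue logic can be sketched as a score. In this Python sketch the cue labels are hypothetical names for the six cues listed, and the fraction of cues displayed is an assumed monotone score, not a calibrated probability.

```python
from typing import Set

# Hypothetical labels for the six turn-yielding cues listed on the slide.
YIELD_CUES = {"pitch_fall", "lengthening", "but_uh", "end_gesture",
              "amplitude_drop_uh", "end_of_clause"}


def turn_yield_score(observed: Set[str], gesticulating: bool) -> float:
    """Fraction of yielding cues displayed (an assumed score); 0 if the speaker is gesticulating."""
    if gesticulating:
        return 0.0  # any gesticulation negates the turn-yielding signal
    return len(observed & YIELD_CUES) / len(YIELD_CUES)
```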

  10. Taking the Floor • Speaker-state signal • Indicate becoming speaker • Occurs at beginning of turns • Cues: • Shift in head direction • AND/OR • Start of gesture

  11. Retaining the Floor • Within-turn signal • Still speaker: looks at hearer at end of clause • Continuation signal • Still speaker: looks away after within-turn signal/back-channel • Back-channel: • ‘mmhm’, okay, etc.; nods • Sentence completion; clarification request; restatement • NOT a turn: signals attention, agreement, confusion
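Because a back-channel signals attention without claiming the floor, a dialogue system should avoid treating it as a new turn. A heuristic Python sketch; the token list and the two-word limit are assumptions, and a word like "okay" remains ambiguous between a back-channel and an acceptance.

```python
import re

# Assumed list of back-channel tokens.
BACKCHANNEL_TOKENS = {"mmhm", "mm-hmm", "uh-huh", "okay", "ok", "yeah", "right"}


def is_backchannel(utterance: str, max_words: int = 2) -> bool:
    """Heuristic: a short utterance made only of back-channel tokens does not claim the turn."""
    words = re.findall(r"[a-z'-]+", utterance.lower())
    return 0 < len(words) <= max_words and all(w in BACKCHANNEL_TOKENS for w in words)
```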

  12. Segmenting Turns • Speaker alone: • Within-turn signal -> end of one unit • Continuation signal -> beginning of next unit • Joint signal: • Speaker turn signal (end); auditor -> speaker; speaker -> auditor • Within-turn + back-channel + continuation • Back-channels signal understanding • Early back-channel + continuation

  13. Regaining Attention • Gaze & Disfluency • Disfluency: “perturbation” in speech • Silent pause, filled pause, restart • Gaze: • Conversants don’t stare at each other constantly • However, speaker expects to meet hearer’s gaze • Confirms hearer’s attention • Disfluency occurs when speaker realizes hearer is NOT attending • Speaker pauses until hearer begins gazing, or to request attention

  14. Collaborative Communication • Speaker tries to establish and add to “common ground” – “mutual belief” • Presumed to be a joint, collaborative activity • Participants make sure they “mutually believe” the same thing • Hearer can acknowledge/accept/disagree • Clark & Schaefer: Degrees of grounding • Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention

  15. Computational Models • (Traum et al.) revised for computation • Involves both speaker and hearer • Initiate, Continue, Acknowledge, Repair, Request Repair, etc. • Common phenomena • “Back-Channel” – “uh-huh”, “okay”, etc. • Allows hearer to signal continued attention, acknowledgement • WITHOUT taking the turn • Requests for repair – common in human-human dialogue • Even more common in human-computer dialogue
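The grounding acts listed above can be tracked per discourse unit. The following Python sketch is a deliberately simplified illustration in the spirit of this model, not Traum's full finite-state formulation; the class name, field names, and act strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiscourseUnit:
    """Simplified grounding tracker for one discourse unit (illustrative only)."""
    initiator: str
    content: List[str] = field(default_factory=list)
    grounded: bool = False
    repair_requested: bool = False

    def act(self, speaker: str, act: str, text: str = "") -> None:
        is_initiator = speaker == self.initiator
        if act in ("initiate", "continue") and is_initiator:
            self.content.append(text)  # new material is not yet grounded
            self.grounded = False
        elif act == "repair":
            self.content.append(text)  # either party may repair
            self.repair_requested = False
            self.grounded = False
        elif act == "request_repair" and not is_initiator:
            self.repair_requested = True
            self.grounded = False
        elif act == "acknowledge" and not is_initiator:
            # An acknowledgement grounds the unit unless a repair is still pending.
            if not self.repair_requested:
                self.grounded = True
        else:
            raise ValueError(f"{act!r} by {speaker!r} is not handled in this sketch")
```

In this sketch, a hearer's acknowledgement grounds the unit, but only once any repair the hearer requested has been made.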

  16. Implicature & Grice’s Maxims • Inferences licensed by utterances • Grice’s Maxims • Quantity: Be as informative as required • “There are two classes per week” – implies exactly two, not 1 or 5 • Quality: Be truthful – don’t lie • Relevance: Be relevant • Manner: “Be perspicuous” • Don’t be obscure, ambiguous, prolix, or disorderly • “Flouting” maxims: Consciously violate for effect • Humor, emphasis, …

  17. Speech & Dialogue Acts • Speech Acts (Austin, Searle) • “Doing things with words” • E.g. performatives: “I dub thee Sir Lancelot” • Illocutionary acts: act of asking, answering, promising, etc., in saying an utterance • Include: Assertives: “I propose to...”, Directives: “Stop that”, Commissives: “I promise”, Expressives: “Thank you”, Declarations: “You’re fired”

  18. Dialogue Acts • (aka Conversational moves) • Enriched set of speech acts • Capture full range of conversational functions • Adjacency pairs: Many two-part structures • E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc… • Paired for speaker-hearer dyads • Contrast with rhetorical relations in monologue
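Adjacency pairs give a dialogue system expectations about what should come next. A Python sketch that finds first-pair parts left unanswered, assuming moves are already labeled with act types and using only the three example pairs from the slide. Letting a second part close the most recent matching open first part from the other speaker (so nested pairs work) is a design choice of this sketch.

```python
from typing import Dict, List, Tuple

# First-pair part -> expected second-pair part, from the examples on the slide.
ADJACENCY_PAIRS: Dict[str, str] = {"question": "answer", "greeting": "greeting", "request": "grant"}


def unanswered_first_parts(moves: List[Tuple[str, str]]) -> List[int]:
    """Return indices of first-pair parts never completed by the other speaker.

    moves: (speaker, act) pairs in dialogue order.
    """
    open_parts: List[int] = []
    for i, (speaker, act) in enumerate(moves):
        # Try to close the most recent matching first part from a different speaker.
        for j in reversed(open_parts):
            first_speaker, first_act = moves[j]
            if first_speaker != speaker and ADJACENCY_PAIRS[first_act] == act:
                open_parts.remove(j)
                break
        else:
            # Nothing was closed: if this move opens a pair, remember it.
            if act in ADJACENCY_PAIRS:
                open_parts.append(i)
    return open_parts
```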

  19. DAMSL • Dialogue Act Tagging framework • Adjacency pairs + grounding + repair • Forward-looking functions • Statement, info-request, commit, closing, etc. • Backward-looking functions • Focus on link to prior speaker utterance • Agreement, answer, accept, etc.

  20. Tagged Dialogue
     [assert] C1: . . . I need to travel in May.
     [inforeq,ack] A1: And, what day in May did you want to travel?
     [assert,answer] C2: OK uh I need to be there for a meeting that’s from the 12th to the 15th.
     [inforeq,ack] A2: And you’re flying into what city?
     [assert,answer] C3: Seattle.
     [inforeq,ack] A3: And what time would you like to leave Pittsburgh?
     [check,hold] C4: Uh hmm I don’t think there’s many options for nonstop.
     [accept,ack] A4: Right. [assert] There’s three non-stops today.
     [info-req] C5: What are they?
     [assert,open-option] A5: The first one departs PGH at 10:00am arrives Seattle at 12:05 their time. The second flight departs PGH at 5:55pm, arrives Seattle at 8pm. And the last flight departs PGH at 8:15pm arrives Seattle at 10:28pm.
     [accept,ack] C6: OK I’ll take the 5ish flight on the night before on the 11th.
     [check,ack] A6: On the 11th? [assert,ack] OK. Departing at 5:55pm arrives Seattle at 8pm, U.S. Air flight 115.
     [ack] C7: OK.
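To work with tagged data like this, each turn can be split into its bracketed tag groups and the tags sorted into DAMSL's forward- and backward-looking dimensions. In this Python sketch the assignment of tags to dimensions is an assumption based on the DAMSL categories named earlier (check is treated as a kind of info-request, hold as a backward-looking agreement tag), and the input format mirrors the tagged turns above.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed assignment of tags to DAMSL's two dimensions.
FORWARD = {"assert", "statement", "inforeq", "info-req", "info-request",
           "check", "open-option", "commit", "closing"}
BACKWARD = {"ack", "answer", "accept", "hold", "agreement"}

# A tag group, an optional speaker ID such as "A4:", then text up to the next tag group.
SEGMENT = re.compile(r"\[([^\]]+)\]\s*(?:([A-Z])\d+:)?\s*([^\[]*)")


class Segment(NamedTuple):
    speaker: Optional[str]  # speaker letter, e.g. "A" or "C"
    forward: List[str]      # forward-looking tags
    backward: List[str]     # backward-looking tags
    text: str


def parse_tagged(line: str, speaker: Optional[str] = None) -> List[Segment]:
    """Split a tagged turn such as '[accept,ack] A4: Right. [assert] ...' into segments."""
    segments = []
    for m in SEGMENT.finditer(line):
        tags = [t.strip() for t in m.group(1).split(",")]
        speaker = m.group(2) or speaker  # a segment without a speaker ID keeps the last one
        segments.append(Segment(
            speaker,
            [t for t in tags if t in FORWARD],
            [t for t in tags if t in BACKWARD],
            m.group(3).strip(),
        ))
    return segments
```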

  21. Dialogue Act Recognition • Goal: Identify dialogue act tag(s) from surface form • Challenge: Surface form can be ambiguous • “Can you X?” – yes/no question, or info-request • “Flying on the 11th, at what time?” – check, statement • Requires interpretation by hearer • Strategies: Plan inference, cue recognition

  22. Plan-inference-based • Classic AI (BDI) planning framework • Model Belief, Knowledge, Desire • Formal definition with predicate calculus • Axiomatization of plans and actions as well • STRIPS-style: Preconditions, Effects, Body • Rules for plan inference • Elegant, but... • Labor-intensive rule, KB, heuristic development • Effectively AI-complete
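The STRIPS-style representation can be sketched in a few lines. In this Python sketch facts are plain strings, the `body` field is stored but never expanded, and `plausible_goals` shows only the simplest kind of plan-inference rule (an agent observed performing an action likely wants its effects); a usable plan recognizer needs many more rules plus a knowledge base, which is the labor-intensive part the slide warns about.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

State = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    """STRIPS-style action schema with preconditions, effects, and an (unexpanded) body."""
    name: str
    preconditions: State
    add_effects: State
    delete_effects: State = frozenset()
    body: Tuple[str, ...] = ()  # sub-actions; stored but not expanded in this sketch

    def applicable(self, state: State) -> bool:
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return (state - self.delete_effects) | self.add_effects


def plausible_goals(observed: Operator) -> State:
    """Simplest plan-inference rule: an agent performing an action likely wants its effects."""
    return observed.add_effects
```

For example, a hypothetical INFORM operator with precondition `knows(S,P)` and effect `believes(H,P)`, applied to a state containing `knows(S,P)`, yields a state containing both facts.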

  23. Cue-based Interpretation • Employs sets of features to identify the dialogue act • Words and collocations: Please -> request • Prosody: Rising pitch -> yes/no question • Conversational structure: prior act • Example: Check: • Syntax: tag question “, right?” • Syntax + prosody: Fragment with rise • N-gram: argmax_d P(d) P(W|d) • So you, sounds like, etc. • Details later …
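The n-gram rule argmax_d P(d) P(W|d) can be illustrated with a small bigram classifier. This Python sketch assumes add-one smoothing, lowercase whitespace tokenization, and one extra vocabulary slot for unseen words; the toy training utterances are invented for illustration, not drawn from a corpus.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Counter], Dict[str, Counter], Set[str]]


def tokens(text: str) -> List[str]:
    return ["<s>"] + text.lower().split() + ["</s>"]


def train(data: List[Tuple[str, str]]) -> Model:
    """Estimate P(d) and per-act bigram counts from (utterance, act) pairs."""
    act_counts = Counter(act for _, act in data)
    bigrams: Dict[str, Counter] = defaultdict(Counter)
    contexts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, act in data:
        words = tokens(text)
        vocab.update(words)
        for w1, w2 in zip(words, words[1:]):
            bigrams[act][(w1, w2)] += 1
            contexts[act][w1] += 1
    total = sum(act_counts.values())
    priors = {act: n / total for act, n in act_counts.items()}
    return priors, bigrams, contexts, vocab


def classify(text: str, model: Model) -> str:
    """Return argmax_d P(d) * P(W | d), computed in log space."""
    priors, bigrams, contexts, vocab = model
    words = tokens(text)
    v = len(vocab) + 1  # one extra slot for unseen words

    def log_score(act: str) -> float:
        score = math.log(priors[act])
        for w1, w2 in zip(words, words[1:]):
            # Add-one smoothed P(w2 | w1, act)
            score += math.log((bigrams[act][(w1, w2)] + 1) / (contexts[act][w1] + v))
        return score

    return max(priors, key=log_score)


# Invented toy data, for illustration only.
TOY_DATA = [
    ("can you book a flight", "request"),
    ("please book a hotel", "request"),
    ("so you want to leave on the 11th", "check"),
    ("sounds like you need a nonstop", "check"),
    ("i need to travel in may", "statement"),
    ("i need to be in seattle", "statement"),
]

if __name__ == "__main__":
    model = train(TOY_DATA)
    print(classify("so you need a flight", model))  # prints: check
```

The cue phrases "so you" and "sounds like" from the slide end up as high-count bigrams in the check examples, which is what pulls the test utterance toward check.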

  24. From Human to Computer • Conversational agents • Systems that (try to) participate in dialogues • Examples: Directory assistance, travel info, weather, restaurant and navigation info • Issues: • Limited understanding: ASR errors, interpretation • Computational costs: • broader coverage -> slower, less accurate

  25. Dialogue Manager Tradeoffs • Flexibility vs Simplicity/Predictability • System vs User vs Mixed Initiative • Order of dialogue interaction • Conversational “naturalness” vs Accuracy • Cost of model construction, generalization, learning, etc • Models: FST, Frame-based, HMM, BDI • Evaluation frameworks
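The frame-based model in this list can be sketched as a set of slots plus a policy for which slot to ask about next. In this Python sketch the travel slots and prompts are hypothetical (the prompts echo the tagged dialogue), `update` stands in for an NLU component that is not shown, and asking for the first empty slot is one simple system-initiative strategy. Because `update` accepts any slots at once, the user can also volunteer information out of order, which moves the design toward mixed initiative.

```python
from typing import Dict, Optional

# Hypothetical travel frame: slot name -> prompt asked while the slot is empty.
PROMPTS: Dict[str, str] = {
    "origin": "What city are you leaving from?",
    "destination": "And you're flying into what city?",
    "date": "What day did you want to travel?",
    "time": "What time would you like to leave?",
}


def update(frame: Dict[str, str], parsed: Dict[str, str]) -> Dict[str, str]:
    """Merge slot values from a (hypothetical) NLU parse; the user may fill any slots in any order."""
    return {**frame, **{k: v for k, v in parsed.items() if k in PROMPTS}}


def next_prompt(frame: Dict[str, str]) -> Optional[str]:
    """Ask for the first unfilled slot; return None once the frame is complete."""
    for slot, prompt in PROMPTS.items():
        if not frame.get(slot):
            return prompt
    return None
```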
