Dialogue Acts Julia Hirschberg LSA07 353
Today • Recognizing structural information: Dialogue Acts vs. Discourse Structure • From Speech Acts to Dialogue Acts • Coding schemes (DAMSL) • Practical goals • Identifying DAs • Direct and indirect DAs: experimental results • Corpus studies of DA disambiguation • Automatic DA identification • More corpus studies
Speech Acts • Wittgenstein ’53, Austin ’62 and Searle ’75 • Contributions to dialogue are actions performed by speakers: • I promise to make you very very sorry for that. • Performative verbs (e.g. promise) • Locutionary act: the act of conveying the ‘meaning’ of the sentence uttered (e.g. committing the Speaker to making the Hearer sorry) • Illocutionary act: the act associated with the verb uttered (e.g. promising) • Perlocutionary act: the act of producing an effect on the Hearer (e.g. threatening)
Searle’s Classification Scheme (S = Speaker, H = Hearer) • Assertives: commit S to the truth of X (e.g. The world is flat) • Directives: attempt by S to get H to do X (e.g. Open the window please) • Commissives: commit S to do X (e.g. I’ll do it tomorrow) • Expressives: S expresses his/her own feelings about X (e.g. I’m sorry I screamed) • Declarations: S brings about a change in the world by virtue of uttering X (e.g. I divorce you, I resign)
Dialogue Acts • Roughly correspond to illocutionary acts • Motivation: Modeling Spoken Dialogue • Many coding schemes (e.g. DAMSL) • Many-to-many mapping between DAs and words • The Agreement DA can be realized by Okay, Um, Right, Yeah, … • But each of these can express multiple DAs, e.g. S: You should take the 10pm flight. U: Okay …that sounds perfect. …but I’d prefer an earlier flight. …(I’m listening)
A Possible Coding Scheme for ‘ok’
• Ritualistic?
  • Closing
  • You're welcome
  • Other
  • No
• 3rd-Turn-Receipt?
  • Yes
  • No
• If Ritualistic==No, code all of these as well:
  • Task Management:
    • I'm done
    • I'm not done yet
    • None
  • Topic Management:
    • Starting new topic
    • Finished old topic
    • Pivot: finishing and starting
  • Turn Management:
    • Still your turn (=traditional backchannel)
    • Still my turn (=stalling for time)
    • I'm done, it is now your turn
    • None
  • Belief Management:
    • I accept your proposition
    • I entertain your proposition
    • I reject your proposition
    • Do you accept my proposition? (=ynq)
    • None
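The scheme above can be read as a small multi-dimensional annotation record. The Python sketch below is one hypothetical way to encode it and check a single coded token against the "If Ritualistic==No" rule; the class, field, and set names are our own, and the assumption that ritualistic tokens leave the four management dimensions empty is our reading of that rule, not something the slide states.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values for each dimension, copied from the scheme.
RITUALISTIC = {"Closing", "You're welcome", "Other", "No"}
THIRD_TURN_RECEIPT = {"Yes", "No"}
TASK = {"I'm done", "I'm not done yet", "None"}
TOPIC = {"Starting new topic", "Finished old topic", "Pivot: finishing and starting"}
TURN = {"Still your turn", "Still my turn", "I'm done, it is now your turn", "None"}
BELIEF = {
    "I accept your proposition",
    "I entertain your proposition",
    "I reject your proposition",
    "Do you accept my proposition?",
    "None",
}


def _check(name: str, value: str, allowed: set) -> None:
    """Raise if a dimension carries a value outside its inventory."""
    if value not in allowed:
        raise ValueError(f"{name}: unexpected value {value!r}")


@dataclass
class OkLabel:
    """One coded token of 'ok' (hypothetical encoding of the scheme)."""
    ritualistic: str
    third_turn_receipt: str
    task: Optional[str] = None
    topic: Optional[str] = None
    turn: Optional[str] = None
    belief: Optional[str] = None

    def validate(self) -> None:
        _check("ritualistic", self.ritualistic, RITUALISTIC)
        _check("third_turn_receipt", self.third_turn_receipt, THIRD_TURN_RECEIPT)
        management = {
            "task": (self.task, TASK),
            "topic": (self.topic, TOPIC),
            "turn": (self.turn, TURN),
            "belief": (self.belief, BELIEF),
        }
        if self.ritualistic == "No":
            # Non-ritualistic tokens must be coded on every management dimension.
            for name, (value, allowed) in management.items():
                if value is None:
                    raise ValueError(f"{name} must be coded when ritualistic is 'No'")
                _check(name, value, allowed)
        else:
            # Assumption: ritualistic tokens leave the management dimensions empty.
            for name, (value, _) in management.items():
                if value is not None:
                    raise ValueError(f"{name} should be empty for a ritualistic token")
```

For example, OkLabel("Closing", "No").validate() passes, while OkLabel("No", "No").validate() raises ValueError because none of the four management dimensions has been coded.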
Practical Goals • In Spoken Dialogue Systems • Disambiguate current DA • Represent user input correctly • Respond appropriately • Predict next DA • Switch Language Models for ASR • Switch states in semantic processing • Produce DA for next system turn appropriately
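Predicting the next DA is commonly approached with n-gram statistics over DA sequences. The Python sketch below illustrates that idea under stated assumptions; it is not a method from these slides. It counts DA bigrams in tagged dialogues, predicts the most frequent successor of the previous DA, and maps the prediction to an ASR language model. The DA names, toy dialogues, and language-model names are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

START = "<start>"

# Hypothetical mapping from a predicted DA to the name of an ASR language model.
LM_FOR_DA = {
    "yes-answer": "lm_yes_no",
    "statement": "lm_task_content",
    "backchannel": "lm_short_responses",
}


def train_da_bigrams(dialogues: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each DA follows each previous DA (START opens a dialogue)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for das in dialogues:
        for prev, nxt in zip([START] + das, das):
            counts[prev][nxt] += 1
    return counts


def predict_next_da(counts: Dict[str, Counter], prev_da: str) -> Optional[str]:
    """Return the most frequent successor of prev_da, or None if it was never seen."""
    if prev_da not in counts:
        return None
    return counts[prev_da].most_common(1)[0][0]


def choose_lm(counts: Dict[str, Counter], prev_da: str, default: str = "lm_general") -> str:
    """Pick the language model associated with the predicted next DA."""
    return LM_FOR_DA.get(predict_next_da(counts, prev_da), default)


if __name__ == "__main__":
    toy_dialogues = [  # hypothetical tagged dialogues
        ["yn-question", "yes-answer", "statement"],
        ["yn-question", "yes-answer", "backchannel"],
        ["statement", "backchannel", "statement"],
    ]
    counts = train_da_bigrams(toy_dialogues)
    print(predict_next_da(counts, "yn-question"))  # yes-answer
    print(choose_lm(counts, "yn-question"))        # lm_yes_no
```

A deployed system would smooth these estimates and condition on more context; the sketch only shows how a DA prediction could drive the choice of language model.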
Disambiguating Ambiguous DAs Intonationally • Modal questions (can / would / would … be willing) • Can you move the piano? • Would you move the piano? • Would you be willing to move the piano? • Nickerson & Chu-Carroll ’99: Can info-requests be disambiguated reliably from action-requests? • By prosodic information? • Role of politeness
Production Studies • Design • Subjects read ambiguous questions in disambiguating contexts • Control for given/new and contrastiveness • Polite/neutral/impolite readings • ToBI-style labeling • Problems: • Cells imbalanced; little data • No pretesting • No distractors • Same speaker reads both contexts • No perception checks
Results • Indirect requests (e.g. for action) • If L%, more likely (73%) to be indirect • If H%, 46% were indirect: differences in height of boundary tone? • Politeness: ‘can’ questions differ between impolite (higher rise) and neutral readings • Speaker variability • Some production differences • Limited utility for producing indirect DAs • Beware too steep a rise
Corpus Studies: Jurafsky et al ’98 • Can we distinguish different DA functions for affirmative words? • Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um… • Functional categories to distinguish • Continuers: Mhmm (not taking floor) • Assessments: Mhmm (tasty) • Agreements: Mhmm (I agree) • Yes answers: Mhmm (That’s right) • Incipient speakership: Mhmm (taking floor)
Questions • Are these terms important cues to dialogue structure? • Does prosodic variation help to disambiguate them? • Is there any difference in syntactic realization of certain DAs, compared to others?
Corpus • Switchboard telephone conversation corpus • Hand-segmented and labeled with DA information (initially from text) using the SWBD-DAMSL dialogue tagset • ~60 labels that could be combined in different dimensions • 84% inter-labeler agreement on tags • Tagset reduced to 42 • 7 CU-Boulder linguistics grad students labeled Switchboard conversations of human-to-human interaction
Relabeling from speech: only 2% of labels changed (114/5757) • 43/987 continuers --> agreements • Why? • Shorter duration, lower F0, lower energy, longer preceding pause • DAs analyzed for • Lexical realization • F0 and intensity features • Syntactic patterns
Results: Lexical Differences • Agreements • yeah (36%), right (11%),... • Continuer • uhuh (45%), yeah (27%),… • Incipient speaker • yeah (59%), uhuh (17%), right (7%),… • Yes-answer • yeah (56%), yes (17%), uhuh (14%),...
Prosodic and Lexico/Syntactic Cues • Over all DAs, duration is the best differentiator • Highly correlated with DA length in words • Assessments: • Pro Term + Copula + (Intensifier) + Assessment Adjective • That’s X (good, great, fine,…)
Observations • Yeah (and its variants) is ambiguous; it realizes: • 36% of agreements • 59% of incipient-speaker turns • 56% of yes-answers • Uh-huh (and its variants) realizes: • 45% of continuers (vs. yeah at 27%) • Continuers (compared to agreements) are: • shorter in duration • less intonationally ‘marked’ • preceded by longer pauses
Hypothesis • Prosodic information may be particularly helpful in distinguishing DAs with less lexical content
Automatic DA Detection • Rosset & Lamel ’04: Can we detect DAs automatically w/ minimal reliance on lexical content? • Lexicons are domain-dependent • ASR output is errorful • Corpora (3912 utts total) • Agent/client dialogues in a French bank call center, in a French web-based stock exchange customer service center, in an English bank call center
DA tags (44) similar to DAMSL • Conventional (openings, closings) • Information level (items related to the semantic content of the task) • Forward Looking Function: • statement (e.g. assert, commit, explanation) • influence on Hearer (e.g. confirmation, offer, request) • Backward Looking Function: • Agreement (e.g. accept, reject) • Understanding (e.g. backchannel, correction) • Communicative Status (e.g. self-talk, change-mind) • NB: each utt could receive a tag for each class, so utts are represented as vectors • But… only 197 combinations observed
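Because each utterance can carry one tag per class, a natural encoding is a fixed-length vector with an empty slot where a class does not apply. The Python sketch below assumes seven slots following the classes listed above (the slot and function names are our own) and shows how distinct tag combinations, such as the 197 observed here, can be counted and flattened into single labels. The example tags are illustrative, not corpus data.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class DAVector(NamedTuple):
    """One utterance's tags, one slot per tag class (None = no tag in that class)."""
    conventional: Optional[str] = None
    information_level: Optional[str] = None
    statement: Optional[str] = None            # forward-looking
    influence_on_hearer: Optional[str] = None  # forward-looking
    agreement: Optional[str] = None            # backward-looking
    understanding: Optional[str] = None        # backward-looking
    communicative_status: Optional[str] = None


def as_label(v: DAVector) -> str:
    """Flatten a vector into one string, e.g. as the class for a single-label classifier."""
    return "+".join(f"{k}={val}" for k, val in v._asdict().items() if val is not None)


def observed_combinations(utts: Iterable[DAVector]) -> Counter:
    """Count how often each distinct tag combination occurs."""
    return Counter(utts)


if __name__ == "__main__":
    toy = [  # illustrative tags only
        DAVector(conventional="opening"),
        DAVector(influence_on_hearer="request"),
        DAVector(statement="assert", agreement="accept"),
        DAVector(influence_on_hearer="request"),
    ]
    combos = observed_combinations(toy)
    print(len(combos), "distinct combinations")
    for vec, n in combos.most_common():
        print(n, as_label(vec))
```

Counting combinations this way makes the sparsity visible: the space of possible vectors is far larger than the set that actually occurs.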
Method: Memory-based learning (TiMBL) • Uses all examples for classification • Useful for sparse data • Features • Speaker identity • First 2 words of each turn • # utts in turn • Previously proposed DA tags for utts in turn • Results • With true utt boundaries: • ~83% accuracy on test data from same domain • ~75% accuracy on test data from different domain
On automatically identified utt units: 3.3% insertions, 6.6% deletions, 13.5% substitutions • Which DAs are easiest/hardest to detect?
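Memory-based learning stores every training example and labels a new utterance by its most similar stored examples. The Python sketch below is a simplified nearest-neighbour classifier with an unweighted overlap metric over the feature types listed above. It is not TiMBL itself, which offers further options such as feature weighting (e.g. gain ratio). The feature encoding, helper names, and toy examples are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Features = Tuple[str, ...]


def make_features(speaker: str, words: Sequence[str], n_utts: int, prev_da: str) -> Features:
    """Hypothetical encoding: speaker, first two words of the turn, #utts in turn, previous DA."""
    first = words[0].lower() if len(words) > 0 else "<none>"
    second = words[1].lower() if len(words) > 1 else "<none>"
    return (speaker, first, second, str(n_utts), prev_da)


def overlap_distance(a: Features, b: Features) -> int:
    """Number of feature positions whose symbolic values differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    def __init__(self) -> None:
        self.memory: List[Tuple[Features, str]] = []

    def fit(self, examples: List[Tuple[Features, str]]) -> "MemoryBasedClassifier":
        # No abstraction step: every training example is kept in memory.
        self.memory = list(examples)
        return self

    def predict(self, x: Features) -> str:
        if not self.memory:
            raise ValueError("fit() must be called with at least one example")
        scored = [(overlap_distance(x, f), label) for f, label in self.memory]
        best = min(d for d, _ in scored)
        # Majority vote among all stored examples tied at the smallest distance.
        votes = Counter(label for d, label in scored if d == best)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [  # hypothetical (features, DA) pairs
        (make_features("client", ["hello", "I"], 2, "<start>"), "opening"),
        (make_features("agent", ["yes", "of"], 1, "<start>"), "accept"),
        (make_features("client", ["can", "you"], 1, "<start>"), "request"),
        (make_features("agent", ["okay", "so"], 2, "accept"), "assert"),
    ]
    clf = MemoryBasedClassifier().fit(train)
    print(clf.predict(make_features("client", ["can", "I"], 1, "<start>")))  # request
```

Keeping all examples instead of generalizing them into rules is the property the slide credits for handling sparse data; the price is that prediction cost grows with the size of the training set.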
Conclusions • Strong ‘grammar’ of DAs in Spoken Dialogue systems • The first few words of a turn perform as well as more words
Phonetic, Prosodic, and Lexical Context Cues to DA Disambiguation • Hypothesis: Prosodic information may be important for disambiguating shorter DAs • Observation: ASR errors suggest it would be useful to limit the role of lexical content in DA disambiguation as much as possible …and that this is feasible • Experiment: • Can people distinguish one (short) DA from another purely from phonetic/acoustic/prosodic cues? • Are they better with lexical context?
The Columbia Games Corpus: Collection • 12 spontaneous task-oriented dyadic conversations in Standard American English • 2 subjects playing a computer game, no eye contact • [Figure: game screens of the Follower and the Describer]
The Columbia Games Corpus: Affirmative Cue Words
• Cue words: alright, gotcha, huh, mm-hm, okay, right, uh-huh, yeah, yep, yes, yup
• Functions: Acknowledgment / Agreement; Backchannel; Cue beginning discourse segment; Cue ending discourse segment; Check with the interlocutor; Stall / Filler; Back from a task; Literal modifier; Pivot beginning; Pivot ending
• Word counts: the 4565; of 1534; okay 1151; and 886; like 753; …
Perception Study: Selection of Materials • Functions of ‘okay’ studied: Acknowledgment / Agreement, Backchannel, Cue beginning discourse segment • Example excerpt:
Speaker 1: yeah um there's like there's some space there's
Speaker 2: okay I think I got it
Speaker 1: but it's gonna be below the onion
Speaker 2: okay
Speaker 1: okay alright I'll try it okay
Speaker 2: okay the owl is blinking
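Word counts like those listed for the corpus, and the pool from which ‘okay’ tokens are sampled, start from tokenizing transcripts and counting cue words. The Python sketch below assumes lowercase, whitespace-separated transcripts (real corpus transcripts would need proper tokenization) and counts the affirmative cue words of the Columbia Games Corpus in the dialogue excerpt above. The function name is our own.

```python
from collections import Counter
from typing import Iterable

# Affirmative cue words listed for the Columbia Games Corpus.
CUE_WORDS = {
    "alright", "gotcha", "huh", "mm-hm", "okay", "right",
    "uh-huh", "yeah", "yep", "yes", "yup",
}


def count_cue_words(turns: Iterable[str]) -> Counter:
    """Count whole-token matches of cue words (so 'alright' is not counted as 'right')."""
    return Counter(
        tok for turn in turns for tok in turn.lower().split() if tok in CUE_WORDS
    )


excerpt = [
    "yeah um there's like there's some space there's",
    "okay I think I got it",
    "but it's gonna be below the onion",
    "okay",
    "okay alright I'll try it okay",
    "okay the owl is blinking",
]
print(count_cue_words(excerpt))
```

On this excerpt the sketch counts ‘okay’ five times and ‘yeah’ and ‘alright’ once each; matching whole tokens is what keeps ‘alright’ from also being counted as ‘right’.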
Perception Study: Experiment Design • 54 instances of ‘okay’ (18 for each function) • 2 tokens for each ‘okay’: • Isolated condition: only the word ‘okay’ • Contextualized condition: 2 full speaker turns: • the turn containing the target ‘okay’, and • the previous turn by the other speaker
Perception Study: Experiment Design • Two conditions: • Part 1: 54 isolated tokens • Part 2: 54 contextualized tokens • Subjects asked to classify each token of ‘okay’ as: • Acknowledgment / Agreement, or • Backchannel, or • Cue beginning discourse segment.
Perception Study: Definitions Given to the Subjects • Acknowledgment / Agreement: • The function of okay that indicates “I believe what you said” and/or “I agree with what you say”. • Backchannel: • The function of okay in response to another speaker's utterance that indicates only “I’m still here” or “I hear you and please continue”. • Cue beginning discourse segment: • The function of okay that marks a new segment of a discourse or a new topic. This use of okay could be replaced by now.
Perception Study: Subjects and Procedure • Subjects: • 20 paid subjects (10 female, 10 male) • Ages between 20 and 60 • Native speakers of English • No hearing problems • GUI on a laboratory workstation with headphones
Results: Inter-Subject Agreement • Kappa measure of agreement with respect to chance (Fleiss ’71)
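Fleiss’ kappa extends chance-corrected agreement to more than two raters: κ = (P̄ − P̄e) / (1 − P̄e), where P̄ is the mean proportion of agreeing rater pairs per item and P̄e is the agreement expected from the overall category proportions. The Python sketch below implements that formula; the function name and table layout are our own choices. In this study, each row would be one of the 54 ‘okay’ tokens (per condition), with three columns for the three functions, each row summing to the 20 subjects.

```python
from typing import Sequence


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    table[i][j] = number of raters who assigned item i to category j.
    """
    if not table:
        raise ValueError("need at least one item")
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item must be rated by the same number of raters")
    n_cats = len(table[0])

    # p_j: proportion of all assignments that went to category j.
    p = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)]
    # P_i: proportion of agreeing rater pairs for item i.
    P = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    P_bar = sum(P) / n_items        # observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa undefined: all ratings fall in a single category")
    return (P_bar - P_e) / (1 - P_e)
```

As a sanity check, perfect agreement such as fleiss_kappa([[3, 0], [0, 3]]) yields 1.0, while agreement at chance level yields values near 0.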
Results: Cues to Interpretation • Phonetic transcription of okay: • Isolated condition: strong correlation between the realization of the initial vowel and the perceived function (Backchannel vs. Ack/Agree and Cue Beginning) • Contextualized condition: no strong correlations found for phonetic variants
Results: Cues to Interpretation • S1 = utterer of the target ‘okay’ • S2 = the other speaker
Conclusions • Agreement: • Availability of context improves inter-subject agreement. • Cue beginnings easier to disambiguate than the other two functions. • Cues to interpretation: • Contextual features override word features • Exception: Final pitch slope of okay in both conditions. • Guide to generation…
Summary: Dialogue Act Modeling for SDS • DA identification • Looks potentially feasible, even when transcription is errorful • Prosodic and lexical cues useful • DA generation • Descriptive results may be more useful for generation than for recognition, ironically • Choice of DA realization, lexical and prosodic
Next Class • J&M 22.5 • Hirschberg et al ’04 • Goldberg et al ’03 • Krahmer et al ’01