Dialogue Structure and Pronoun Resolution. Joel Tetreault and James Allen, University of Rochester, Department of Computer Science. DAARC, September 23, 2004.
Reference in Spoken Dialogue • Resolving anaphoric expressions correctly is critical in task-oriented domains • Makes conversation easier for humans • Reference resolution module (RRM) provides feedback to other components in the system • i.e., Incremental Parsing, Interpretation Module • Investigate how to improve the RRM: • Discourse structure could be effective in reducing the search space of antecedents and improving accuracy (Grosz and Sidner, 1986) • Paucity of empirical work: Byron and Stent (1998), Eckert and Strube (2001), Byron (2002)
Goal • To evaluate whether shallow approaches to dialogue structure can improve a reference resolution algorithm (LRC used as the baseline model to augment) • Investigated two models: • Eckert & Strube (manual and automatic versions) • “Literal QUD” model (manual)
Outline • Background • Dialogue Act synchronization (Eckert and Strube model) • QUD (Craige Roberts) • Monroe Corpus • Algorithm • Results • 3rd person pronoun evaluation • Dialogue Structure • Summary
Past approaches in structure and reference • Veins: the nuclei of RST trees are the most salient discourse units; the entities in these units are thus more salient than others • Tetreault (2003): Penn Treebank subset annotated with RST. Used G&S approximations to try to improve on the LRC baseline. • Result: performed the same as baseline • Veins: decreased performance slightly • Problem: fine-grained approaches (RST) are difficult to annotate reliably and to apply in real time. • Perhaps shallow approaches can work?
Literal QUD • Questions Under Discussion (Craige Roberts, Jonathan Ginzburg) – “what are we talking about?”: topics create discourse segments • Literally: questions or modals can be viewed as creating a discourse segment • Result – questions provide a shallow discourse structuring, which may be enough to improve performance, especially in a task-oriented domain • Entities in the QUD main segment can be viewed as the topic • Segment is closed when the question is answered (use ack sequences, change in entities used) • Only entities from the answer and entities in the question remain accessible • Can be used in TRIPS to reduce the search space of entities – set context size
QUD Annotation Scheme • Annotate: • Start utterance • End utterance • Type (aside, repeated question, unanswered, open-ended, clarification) • Kappa (compared with reconciled data):
Example - QUD
utt06 U: Where is it?
utt07 U: Just a second
utt08 U: I can't find the Rochester airport
utt09 S: It's
--------------------------------------------------------
utt10 U: I think I have a disability with maps
utt11 U: Have I ever told you that before
utt12 S: It's located on brooks avenue
utt13 U: Oh thank you
utt14 S: Do you see it?
utt15 U: Yes
(QUD-entry :start utt06 :end utt13 :type clarification)
(QUD-entry :start utt10 :end utt11 :type aside)
Example - QUD (utt10-11 processed)
utt06 U: Where is it?
utt07 U: Just a second
utt08 U: I can't find the Rochester airport
utt09 S: It's
[utt10,11 removed]
--------------------------------------------------------
utt12 S: It's located on brooks avenue
utt13 U: Oh thank you
utt14 S: Do you see it?
utt15 U: Yes
(QUD-entry :start utt06 :end utt13 :type clarification)
(QUD-entry :start utt10 :end utt11 :type aside)
Example - QUD (s13 processed)
[utt06-13 collapsed: {the Rochester airport, brooks avenue}]
--------------------------------------------------------
utt14 S: Do you see it?
utt15 U: Yes
(QUD-entry :start utt06 :end utt13 :type clarification)
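The two processing steps in the examples above (dropping a closed aside, then collapsing a closed question segment) can be sketched roughly as follows. This is a minimal illustration, not the TRIPS implementation: the `Utterance` and `QudEntry` classes, the `close_segment` function, and in particular the `answer` field are hypothetical, since the annotation scheme itself records only start, end, and type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    uid: str
    entities: List[str]


@dataclass
class QudEntry:
    start: str
    end: str
    kind: str                      # e.g. "clarification" or "aside"
    answer: Optional[str] = None   # hypothetical: id of the answering utterance


def close_segment(history: List[Utterance], qud: QudEntry) -> List[Utterance]:
    """Return a new history in which the closed QUD segment is processed."""
    ids = [u.uid for u in history]
    i, j = ids.index(qud.start), ids.index(qud.end)
    segment = history[i:j + 1]
    if qud.kind == "aside":
        # Assumption: a closed aside is removed entirely.
        replacement = []
    else:
        # Keep only entities from the question and the answer utterance.
        keep = {qud.start, qud.answer}
        merged: List[str] = []
        for utt in segment:
            if utt.uid in keep:
                for ent in utt.entities:
                    if ent not in merged:
                        merged.append(ent)
        replacement = [Utterance(f"{qud.start}-{qud.end}", merged)]
    return history[:i] + replacement + history[j + 1:]
```

Closing the aside first and the enclosing segment second mirrors the order of the two "processed" slides; the entity lists passed in are assumed to hold already-resolved referents.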
QUD Issues • Issue 1: easy to detect Q’s (use Speech-Act information), but how do you know Q is answered? • Cue words, multiple acknowledgements, changes in entities discussed provide strong clues that question is finishing, but general questions such as “how are we going to do this?” can be ambiguous • Issue 2: what is more salient to a QUD pronoun – the QUD topic or a more recent entity?
Dialogue Act Segmentation • E&S: model to resolve all types of pronouns (3rd person and abstract) in spoken dialogue • Intuition: grounding is very important in spoken dialogue • Utterances that are not acknowledged by the listener may not be in common ground and thus not accessible to pronominal reference
Dialogue Act Segmentation • Each utterance marked as • (I): contains content (initiation), question • (A): acknowledgment • (C): combination of the above • (N): none of the above • Basic algorithm: utterances that are not ack’d and are not in a string of I’s are removed from the discourse before the next sentence is processed • Evaluation showed improvement for pronouns referring to abstract entities, and strong annotator reliability • Pronoun performance? Unclear: no comparison against performance without the DA model
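One literal reading of the basic algorithm above, as a minimal sketch. The function name `accessible_utterances` and the exact keep rule are assumptions for illustration, not Eckert and Strube's implementation: a content-bearing utterance (I or C) is kept only if the very next utterance acknowledges it (A or C) or continues a string of initiations (I).

```python
from typing import List, Tuple


def accessible_utterances(tagged: List[Tuple[str, str]]) -> List[str]:
    """tagged: (utterance id, DA tag) pairs, with tags in {"I", "A", "C", "N"}.

    Returns the ids of content utterances that stay in the discourse.
    """
    kept = []
    for idx, (uid, tag) in enumerate(tagged):
        if tag not in ("I", "C"):
            # Assumption: A and N utterances contribute no entities to keep.
            continue
        # The final utterance has no successor yet, so it is not kept so far.
        nxt = tagged[idx + 1][1] if idx + 1 < len(tagged) else None
        if nxt in ("A", "C", "I"):
            kept.append(uid)
    return kept
```

Because the rule looks only one utterance ahead, it is cheap to run incrementally, which is also why the model sees such a small window of context.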
Example – DA model
(I) utt06 U: Where is it?
(N) utt07 U: Just a second
(I) utt08 U: I can't find the Rochester airport
(N) utt09 S: It's
(I) utt10 U: I think I have a disability with maps (removed)
(I) utt11 U: Have I ever told you that before
(I) utt12 S: It's located on brooks avenue
(A) utt13 U: Oh thank you
(I) utt14 S: Do you see it?
(A) utt15 U: Yes
Parsing Monroe Domain • Domain: Monroe Corpus of 20 transcriptions (Stent, 2001) of human subjects collaborating on Emergency Rescue 911 tasks • Each dialogue was at least 10 minutes long, and most were over 300 utterances long • Work presented here focuses on 5 of the dialogues (1756 utterances, 278 3rd person pronouns) • Goals: develop a corpus of sentences parsed with rich syntactic, semantic, discourse information to • Able to parse the 5-dialogue sub-corpus with 84% accuracy • For more details, see ACL Discourse Annotation ‘04
TRIPS Parser • Broad-coverage, deep parser • Uses bottom-up algorithm with CFG and domain independent ontology combined with a domain model • Flat, unscoped LF with events and labeled semantic roles based on FrameNet • Semantic information for noun phrases based on EuroWordNet
Parser information for Reference • Rich parser output is helpful for discourse annotation and reference resolution: • Referring expressions identified (pronoun, NP, impros) • Verb roles and temporal information (tense, aspect) identified • Noun phrases have semantic information associated with them • Speech act information (question, acknowledgment) • Discourse markers (so, but) • Semi-automatic annotation increases reliability
Semantics Example: “an ambulance”
(TERM :VAR V213818
  :LF (A V213818 (:* LF::LAND-VEHICLE W::AMBULANCE) :INPUT (AN AMBULANCE))
  :SEM ($ F::PHYS-OBJ
    (SPATIAL-ABSTRACTION SPATIAL-POINT) (GROUP -) (MOBILITY LAND-MOVABLE)
    (FORM ENCLOSURE) (ORIGIN ARTIFACT) (OBJECT-FUNCTION VEHICLE)
    (INTENTIONAL -) (INFORMATION -) (CONTAINER (OR + -)) (TRAJECTORY -)))
Reference Annotation • Annotated dialogues for reference with undergraduate researchers (created a Java tool: PronounTool) • Markables determined by LF terms • Identification numbers determined by the :VAR field of the LF term • Used a stand-off file to encode what each pronoun refers to (refers-to) and the relation between pronoun and antecedent (relation) • Post-processing phase assigns a unique identification number to each coreference chain • Also annotated coreference between definite noun phrases
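The post-processing step above (turning pairwise refers-to links into numbered coreference chains) could look roughly like this. It is a sketch under assumptions: the function name `assign_chain_ids`, the dictionary input, and the short variable names are hypothetical, and the actual stand-off file format used by PronounTool is not shown in the slides.

```python
from typing import Dict


def assign_chain_ids(refers_to: Dict[str, str]) -> Dict[str, int]:
    """Map every markable to a chain id, given markable -> antecedent links."""

    def root(var: str) -> str:
        # Follow refers-to links back to the first mention of the chain.
        seen = set()
        while var in refers_to and var not in seen:
            seen.add(var)  # guard against accidental cycles in the annotation
            var = refers_to[var]
        return var

    chain_of_root: Dict[str, int] = {}
    chains: Dict[str, int] = {}
    for var in sorted(set(refers_to) | set(refers_to.values())):
        r = root(var)
        if r not in chain_of_root:
            chain_of_root[r] = len(chain_of_root) + 1
        chains[var] = chain_of_root[r]
    return chains
```

For example, with links V3 to V1, V5 to V3, and V9 to V7, the three markables V1, V3, V5 share one chain id and V7, V9 share another.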
Reference Annotation • Used a slightly modified MATE scheme: pronouns divided into the following types: • IDENTITY (Coreference) (278) • Includes set constructions (6) • FUNCTIONAL (20) • PROPOSITION/D.DEIXIS (41) • ACTION/EVENT (22) • INDEXICAL (417) • EXPLETIVE (97) • DIFFICULT (5)
LRC Algorithm • LRC: modified centering algorithm (Tetreault ’01) that does not use Cb or transitions, but keeps a Cf-list (history) for each utterance • While processing an utterance’s entities (left to right): push each entity onto Cf-list-new; for a pronoun p, attempt to resolve: • Search through Cf-list-new (left to right), taking the first candidate that meets gender, agreement, binding, and semantic feature constraints • If none is found, search past utterances’ Cf-lists, from the previous utterance back to the beginning of the discourse • When p is resolved, push the pronoun, with semantic features from its antecedent, onto Cf-list-new • For more details, see SemDial ‘04
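The search order described above can be sketched as follows. This is a simplified illustration, not the authors' code: the `Entity` class, `compatible`, and `lrc_process` are hypothetical names, and the constraint check is reduced to gender and number, leaving out the binding and semantic feature constraints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    text: str
    gender: str               # "m", "f", or "n"
    number: str               # "sg" or "pl"
    is_pronoun: bool = False
    antecedent: Optional["Entity"] = None


def compatible(pronoun: Entity, cand: Entity) -> bool:
    # Simplified stand-in for the full constraint set.
    return pronoun.gender == cand.gender and pronoun.number == cand.number


def lrc_process(utterance: List[Entity], history: List[List[Entity]]) -> List[Entity]:
    """Resolve pronouns in one utterance and append its Cf-list to history."""
    cf_new: List[Entity] = []
    for ent in utterance:
        if ent.is_pronoun:
            # Current utterance first, then past Cf-lists, most recent first.
            for cf in [cf_new] + history[::-1]:
                match = next((c for c in cf if compatible(ent, c)), None)
                if match is not None:
                    ent.antecedent = match
                    break
        # The pronoun is pushed only after the resolution attempt,
        # so it can never be chosen as its own antecedent.
        cf_new.append(ent)
    history.append(cf_new)
    return cf_new
```

Each Cf-list is searched left to right and the first compatible candidate wins, so the ordering of entities within an utterance serves as the salience ranking.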
LRC Algorithm with Structure Info • Augmented algorithm with extensions to handle QUD and E&S input • For QUD, at the start and end of processing an utterance, QUD’s are started (pushed on stack) or ended (entities are collapsed), so Cf-list history changes • For E&S, each utterance is assigned a DA code and then removed or kept depending on the next utterance (if it is an acknowledgement, or a series of I’s)
Error Analysis • Though QUD and the +sem baseline performed the same (89 errors), each got 3 pronouns right that the other did not • Baseline right, QUD wrong: • 3 where collapsing nodes removed the correct antecedent • QUD right, baseline wrong: • 2 associated with blocking off an aside • 1 associated with collapsing (intervening nodes blocked) • 15 pronouns both got wrong, but with different predictions • Remaining 71: both made the same error
Issues • Structuring methods are probably more trouble than they are worth with the corpora available right now • Also only affect a few pronouns • Segment ends are least reliable • What constitutes an end? • 3 errors show either boundaries are marked incorrectly if pronouns are accessing elements in a “closed” DS • Or perhaps collapsing routine is too harsh • Small corpus size • Hard to draw definite conclusions given only 3 criss-crossed errors • need more data for statistical evaluations
Issues • E&S model has the advantage over QUD of being easier to automate, but fares worse since it takes into account only a small window of utterances (extremely shallow) • QUD model can be semi-automated (detecting question starts is easy), but detecting ends and types is harder • QUD could be improved by taking into account plan initiations and suggestions, instead of limiting segments to questions only, but the tradeoff is reliability