Reference Resolution: Approaches and Applications

Reference Resolution: Approaches and Applications • Discourse and Dialogue, CS 359 • October 4, 2001

Agenda • Reference resolution approaches • Tree-based (Hobbs) • “Centering” (Grosz, Joshi, Weinstein; Brennan et al. 1987) • CogNIAC (Baldwin et al. 1995) • Reference resolution extensions/applications • Cross-document co-reference • Summarization

Discussion Questions • (Paraphrased) How computationally tractable or practical are these approaches? • What are the key differences/limitations of the different approaches? • What about “long distance” anaphora? • If you get 90% precision at 60% recall, how can you boost recall? • How WOULD one go about incorporating world knowledge into these approaches? Does any system do it successfully?

Hobbs’ Tree-based Resolution • Uses full syntactic analyses as structure • Ranking of possible antecedents based on: • Breadth-first left-to-right tree traversal • Moving backward through sentences
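The candidate ordering on this slide can be sketched in a few lines of Python. This is only a minimal illustration of the traversal order (breadth-first, left-to-right within a sentence, most recent sentence first), not the full Hobbs algorithm: it omits the steps that walk up from the pronoun inside its own sentence and all agreement and syntactic checks. The `Node` class and the hand-built parse trees are assumptions made for the example.

```python
from collections import deque


class Node:
    """A hypothetical parse-tree node: a label, child nodes, and optional text."""

    def __init__(self, label, children=None, text=None):
        self.label = label
        self.children = children or []
        self.text = text


def np_candidates(root):
    """Return the text of NP nodes in breadth-first, left-to-right order."""
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.label == "NP":
            found.append(node.text)
        queue.extend(node.children)  # children are stored left to right
    return found


def rank_antecedents(prior_sentences):
    """Rank candidates: most recent sentence first, BFS order within each."""
    ranked = []
    for tree in reversed(prior_sentences):
        ranked.extend(np_candidates(tree))
    return ranked


# "John saw a beautiful Acura Integra at the dealership."
s1 = Node("S", [
    Node("NP", text="John"),
    Node("VP", [
        Node("V", text="saw"),
        Node("NP", text="a beautiful Acura Integra"),
        Node("PP", [Node("P", text="at"), Node("NP", text="the dealership")]),
    ]),
])
# "Bill waved."
s2 = Node("S", [Node("NP", text="Bill"), Node("VP", [Node("V", text="waved")])])

print(rank_antecedents([s1, s2]))
```

The subject NP is visited before the object because it sits higher in the tree, which is how a tree-based traversal ends up preferring subjects without an explicit role ranking.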

Example

Centering • Identify the local “center” of attention • Pronominalization focuses attention, appropriate use establishes coherence • Identify entities available for reference • Describe shifts in what discourse is about • Prefer different types for coherence

Centering: Structures • Each utterance (Un) has: • List of forward-looking centers: Cf(Un) • Entities realized/evoked in Un • Rank by likelihood of focus of future discourse • Highest ranked element: Cp(Un) • Backward looking center (focus): Cb(Un)

Centering: Transitions

Centering: Constraints and Rules • Constraints: • Exactly ONE backward-looking center • Everything in Cf(Un) realized in Un • Cb(Un): highest ranked item in Cf(Un-1) realized in Un • Rules: • If any item in Cf(Un-1) is realized as a pronoun in Un, Cb(Un) must be realized as a pronoun • Transitions are ranked: • Continuing > Retaining > Smooth Shift > Rough Shift

Centering: Example • John saw a beautiful Acura Integra at the dealership. • Cf: (John, Integra, dealership); No Cb • He showed it to Bill. • Cf: (John/he, Integra/it*, Bill); Cb: John/he • He bought it. • Cf: (John/he, Integra/it); Cb: John/he
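The centering bookkeeping for this example can be sketched as follows. The sketch assumes the standard definitions: Cb(Un) is the highest-ranked element of Cf(Un-1) that is realized in Un, Cp(Un) is the first element of Cf(Un), and the transition type depends on whether Cb is unchanged and whether Cb equals Cp. The function names are hypothetical, and the Cf lists are supplied by hand rather than computed from text.

```python
def backward_center(cf_prev, cf_now):
    """Cb(Un): highest-ranked element of Cf(Un-1) realized in Un, else None."""
    realized = set(cf_now)
    for entity in cf_prev:  # cf_prev is ordered from highest to lowest rank
        if entity in realized:
            return entity
    return None


def classify_transition(cb_prev, cb_now, cp_now):
    """Name the transition from Un-1 to Un.

    An undefined previous Cb (None) is treated like an unchanged Cb.
    """
    cb_unchanged = cb_prev is None or cb_now == cb_prev
    if cb_unchanged:
        return "Continuing" if cb_now == cp_now else "Retaining"
    return "Smooth Shift" if cb_now == cp_now else "Rough Shift"


# Forward-looking center lists for the three utterances, highest rank first.
cf1 = ["John", "Integra", "dealership"]
cf2 = ["John", "Integra", "Bill"]
cf3 = ["John", "Integra"]

cb2 = backward_center(cf1, cf2)
cb3 = backward_center(cf2, cf3)
print(cb2, classify_transition(None, cb2, cf2[0]))  # John Continuing
print(cb3, classify_transition(cb2, cb3, cf3[0]))   # John Continuing
```

A resolver built on this would generate each candidate assignment of pronouns to entities, classify the resulting transition, and keep the assignment with the highest-ranked transition.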

CogNIAC • Goal: Resolve with high precision • Identify where ambiguous, use no world knowledge, simple syntactic analysis • Precision: # correct labelings/# of labelings • Recall: # correct labelings/# of anaphors • Uses simple set of ranked rules • Applied incrementally left-to-right • Designed to work on newspaper articles • Tune/rank rules
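The two measures defined on this slide translate directly into code. The counts in the example are invented to reproduce the "90% precision at 60% recall" case from the discussion questions; they are not results from any system.

```python
def precision_recall(num_correct, num_labeled, num_anaphors):
    """Precision = correct / labeled; recall = correct / all anaphors."""
    precision = num_correct / num_labeled if num_labeled else 0.0
    recall = num_correct / num_anaphors if num_anaphors else 0.0
    return precision, recall


# Hypothetical counts: 300 anaphors, 200 of them labeled, 180 labeled correctly.
p, r = precision_recall(num_correct=180, num_labeled=200, num_anaphors=300)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.60
```

A system that declines to label ambiguous anaphors keeps `num_labeled` small, which protects precision at the cost of recall.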

CogNIAC: Rules • Only resolve reference if unique antecedent • 1) Unique in discourse • 2) Reflexive: nearest legal antecedent in same sentence • 3) Unique in current & prior sentence • 4) Possessive pronoun: single exact possessive match in prior sentence • 5) Unique in current sentence • 6) Unique subject/subject pronoun

CogNIAC: Example • John saw a beautiful Acura Integra in the dealership. • He showed it to Bill. • He = John: Rule 1; it -> ambiguous (Integra) • He bought it. • He = John: Rule 6; it = Integra: Rule 3
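The "resolve only when the antecedent is unique" idea can be sketched as a cascade of ranked rules. This sketch covers only the three scope-based rules (1, 3, and 5) and leaves out the reflexive, possessive, and subject rules; the mention tuples and the single-string agreement feature are simplifying assumptions, not CogNIAC's actual representation.

```python
def resolve(pronoun_feats, mentions, current_idx):
    """Try ranked rules in order; return (antecedent, rule) or (None, None).

    mentions: (sentence_index, entity_name, agreement_feature) tuples for
    everything read so far. A rule fires only if exactly one compatible
    entity falls inside that rule's scope.
    """
    compatible = [m for m in mentions if m[2] == pronoun_feats]
    rules = [
        ("unique in discourse", lambda m: True),
        ("unique in current & prior", lambda m: m[0] >= current_idx - 1),
        ("unique in current", lambda m: m[0] == current_idx),
    ]
    for rule_name, in_scope in rules:
        names = {m[1] for m in compatible if in_scope(m)}
        if len(names) == 1:
            return names.pop(), rule_name
    return None, None  # ambiguous: leave the pronoun unresolved


# Mentions available when reading sentence 1, "He showed it to Bill."
mentions = [
    (0, "John", "masc"),
    (0, "Integra", "neut"),
    (0, "dealership", "neut"),
]
he = resolve("masc", mentions, current_idx=1)
it = resolve("neut", mentions, current_idx=1)
print(he)  # ('John', 'unique in discourse')
print(it)  # (None, None)
```

Returning no answer for "it" is the high-precision behavior: with two compatible neuter candidates, none of the sketched rules finds a unique antecedent.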

Reference Resolution: Differences • Require different levels of analysis • Different structures to capture focus • Different assumptions about: • # of foci, ambiguity of reference • Different combinations of features

Reference Resolution: Agreements • Enforce syntactic/semantic constraints • Preferences: • Recency • Grammatical Role Parallelism (e.g., Hobbs) • Role ranking • Frequency of mention • Local reference resolution • Little/No world knowledge • Similar levels of effectiveness

Reference Resolution: Extensions • Cross-document co-reference • (Baldwin & Bagga 1998) • Break “the document boundary” • Question: “John Smith” in A = “John Smith” in B? • Approach: • Integrate: • Within-document co-reference • with • Vector Space Model similarity

Cross-document Co-reference • Run within-document co-reference (CAMP) • Produce chains of all terms used to refer to entity • Extract all sentences with reference to entity • Pseudo per-entity summary for each document • Use Vector Space Model (VSM) distance to compute similarity between summaries
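The final comparison step can be sketched as a cosine similarity between bag-of-words vectors built from two per-entity summaries. The slide does not give the term weighting or the decision threshold, so the raw term counts, the whitespace tokenizer, and the `threshold=0.3` default below are all assumptions for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector with raw term counts (weighting is an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_entity(summary_a, summary_b, threshold=0.3):
    """Guess that two summaries describe one entity if they are similar enough."""
    score = cosine_similarity(term_vector(summary_a), term_vector(summary_b))
    return score >= threshold


# Invented summaries for two documents that mention a "John Smith".
doc_a = "john smith chairman of the motor company announced layoffs"
doc_b = "john smith the chairman said the motor company would cut jobs"
doc_c = "john smith pitched seven innings for the home team"
print(same_entity(doc_a, doc_b), same_entity(doc_a, doc_c))
```

Restricting the vectors to sentences from the co-reference chain is what keeps unrelated text in the same article from diluting the comparison.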

Cross-document Co-reference • Experiments: • 197 NYT articles referring to “John Smith” • 35 different people, 24 of them mentioned in only 1 article each • With CAMP: Precision 92%; Recall 78% • Without CAMP: Precision 90%; Recall 76% • Pure Named Entity: Precision 23%; Recall 100%

Co-reference Summarization • “Extract” summary: Pull out full sentences from document to form summary of x% • Query-oriented summary • Extract sentences that cover entities in query • Link NPs via co-reference, string match, acronym lookup, etc. • Select sentences to cover co-referring entities
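One way to "select sentences to cover" the query entities is a greedy set cover, sketched below. The slide does not say how selection is done, so the greedy strategy is an assumption, as is the input format: each sentence arrives already paired with the set of entity identifiers that co-reference linking found in it.

```python
def select_sentences(sentences, query_entities):
    """Greedily pick sentences until every coverable query entity is covered.

    sentences: list of (text, set_of_entity_ids) in document order.
    Returns the chosen sentence texts in document order.
    """
    remaining = set(query_entities)
    chosen = []
    while remaining:
        best_idx, best_gain = None, 0
        for i, (_, entities) in enumerate(sentences):
            gain = len(entities & remaining)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:
            break  # no sentence mentions any remaining entity
        chosen.append(best_idx)
        remaining -= sentences[best_idx][1]
    return [sentences[i][0] for i in sorted(chosen)]


# Hypothetical document: entity ids come from an upstream co-reference step.
doc = [
    ("Smith joined the firm in 1990.", {"smith"}),
    ("He led Acme through its first year.", {"smith", "acme"}),
    ("The merger closed in March.", {"merger"}),
]
print(select_sentences(doc, {"smith", "acme", "merger"}))
```

The second sentence counts as mentioning "smith" only because co-reference links "He" to the entity, which is the point of running resolution before extraction.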

Conclusions • Co-reference establishes coherence • Reference resolution depends on coherence • Variety of approaches: • Syntactic constraints, Recency, Frequency, Role • Similar effectiveness - different requirements • Co-reference can enable summarization within and across documents (and languages!)
