Using Syntax to Disambiguate Explicit Discourse Connectives in Text • Source: ACL-IJCNLP 2009 • Authors: Emily Pitler and Ani Nenkova • Reporter: Yong-Xiang Chen
Discourse connectives • Words or phrases that explicitly signal the presence of a discourse relation • such as • once • since • on the contrary • Implicit relations • no discourse connective is present; the relation is inferred by the reader • hard to identify automatically • Explicit relations are much easier to predict, but...
Two types of ambiguity • Discourse or non-discourse usage • For example, "once" • can be a temporal discourse connective • or simply a word meaning "formerly" • Some connectives are ambiguous in terms of the relation they mark • For example, "since" • can serve as a temporal connective • can serve as a causal connective
Goal • Explore the predictive power of syntactic features for both disambiguation tasks
Corpus and features • Corpus: Penn Discourse Treebank (PDTB) • Each discourse connective is assigned a sense from a three-level hierarchy of senses • Annotates 40,600 discourse relations (the largest public resource) • 18,459 Explicit relations • covering 100 explicit discourse connectives • 16,053 Implicit relations • Other relations • Annotators were allowed to provide two senses for a given connective
Relation categories of discourse connectives in PDTB • This work considers only the top-level categories • general enough to be annotated with high inter-annotator agreement • Expansion (extension: progression/explanation) • one clause elaborates on information in the other • Comparison (contrast: parallel) • information in the two clauses is compared or contrasted • Contingency (situation: cause/condition) • one clause expresses the cause of the other • Temporal (sequence: succession) • information in the two clauses is related because of its timing
Syntactic features • Syntax has not previously been used for discourse vs. non-discourse disambiguation • Syntax has been used extensively for dividing sentences into elementary discourse units • Idea: discourse connectives appear in specific syntactic contexts • Four feature categories: • Self Category • Parent Category • Left Sibling Category • Right Sibling Category • [Figure: parse-tree diagram labeling the Parent, Left Sibling, Self, and Right Sibling nodes]
Self Category • The highest node in the tree that dominates the words in the connective and nothing else • For single-word connectives • this might correspond to the POS tag of the word • For multi-word connectives • Example cue phrase "in addition" • Parsed as (PP (IN In) (NP (NN addition))) • Preposition + Noun • the Self Category of the phrase is PP (prepositional phrase)
Parent Category • The category of the immediate parent of the Self Category • Example: "My favorite colors are blue and green" • here "and" does not have a discourse function • the parent of "and" would be an NP ("blue and green")
Left Sibling Category • The syntactic category of the sibling immediately to the left of the Self Category • If the left sibling does not exist, this feature takes the value "NONE" • Compare a Self Category that has a discourse function with the example above • there, the left sibling of "and" is NP • so "and" does not have a discourse function
Right Sibling Category • The syntactic category of the sibling immediately to the right of the Self Category • English is a right-branching language • so the right sibling is often the dependent of the potential discourse connective • If the connective string has a discourse function • this dependent will often be a clause (SBAR) • Example: • "After I went to the store, I went home" ("After" is followed by a clause) • "After May, I will go on vacation" ("After" is followed by a noun phrase)
More features about the right sibling • Example: • "NASA won't attempt a rescue; instead, it will try to predict whether any of the rubble will smash to the ground and where." • Although the syntactic category of "where" is SBAR, "and" does not have a discourse function • So two additional features about the contents of the right sibling are included • Right Sibling Contains a VP • Right Sibling Contains a Trace • The trace in this example is a wh-trace
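The four category features and the two right-sibling features described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, `parse`, `self_node`, and `syntactic_features` are hypothetical helpers, the bracketed tree is hand-written rather than taken from the PDTB, and a trace is assumed to be any leaf tagged `-NONE-` (the Penn Treebank convention for null elements).

```python
class Node:
    """One constituency-tree node; leaves are POS nodes that carry a word."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        self.word = None

    def leaves(self):
        if self.word is not None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def subtree(self):
        yield self
        for child in self.children:
            yield from child.subtree()


def parse(text):
    """Read a Penn Treebank style bracketing, e.g. (PP (IN In) (NP (NN addition)))."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read(parent):
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos], parent)
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read(node))
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    return read(None)


def self_node(root, start, end):
    """Highest node whose leaves are exactly tokens start..end-1, else None."""
    target = root.leaves()[start:end]
    node = target[0]
    while len(node.leaves()) < len(target) and node.parent is not None:
        node = node.parent
    if node.leaves() != target:
        return None  # the connective words do not form a constituent
    while node.parent is not None and node.parent.leaves() == target:
        node = node.parent  # climb unary chains to the highest matching node
    return node


def syntactic_features(root, start, end):
    """Return the six features for the connective at token span [start, end)."""
    node = self_node(root, start, end)
    if node is None or node.parent is None:
        return None
    siblings = node.parent.children
    i = next(k for k, s in enumerate(siblings) if s is node)
    left = siblings[i - 1] if i > 0 else None
    right = siblings[i + 1] if i + 1 < len(siblings) else None
    inside = list(right.subtree()) if right is not None else []
    return {
        "self": node.label,
        "parent": node.parent.label,
        "left": left.label if left is not None else "NONE",
        "right": right.label if right is not None else "NONE",
        "right_has_vp": any(n.label == "VP" for n in inside),
        # Assumption: null elements (including wh-traces) are tagged -NONE-.
        "right_has_trace": any(n.label == "-NONE-" for n in inside),
    }


# Hand-written parse of the slide's example; "and" is token 5.
tree = parse("(S (NP (PRP$ My) (JJ favorite) (NNS colors)) "
             "(VP (VBP are) (NP (NP (JJ blue)) (CC and) (NP (JJ green)))))")
print(syntactic_features(tree, 5, 6))
```

Token spans are passed as half-open index ranges so that multi-word connectives such as "in addition" use the same call as single-word ones.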
Discourse vs. non-discourse usage • Only 11 PDTB connectives appear as a discourse connective more than 90% of the time • although, in turn, afterward, consequently, additionally, alternatively, whereas, on the contrary, if and when, lest, and on the one hand...on the other hand • In contrast, "or" serves a discourse function only 2.8% of the time it appears
Training and testing • Positive examples: • explicit discourse connectives annotated in the PDTB • Negative examples: • the same strings in the PDTB texts that were not annotated as explicit connectives • Results are reported using a maximum entropy classifier • 2 sections (0 and 1) of the PDTB were used for developing the features • 21 sections (2-22) were used for ten-fold cross-validation • Baseline: the string of the connective as the only feature • F-score = 75.33%, accuracy = 85.86%
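The evaluation setup above (a string-only baseline scored by accuracy and F-score under ten-fold cross-validation) can be illustrated with a small sketch. Note the substitutions: instead of a maximum entropy classifier, this uses a simple majority vote per connective string, the folds are interleaved rather than split by PDTB section, and the metrics are pooled over folds. The function names and the toy examples are invented for illustration and are not PDTB data.

```python
from collections import Counter, defaultdict


def train_baseline(examples):
    """Map each connective string to its majority label (True = discourse usage)."""
    counts = defaultdict(Counter)
    for string, label in examples:
        counts[string][label] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def predict(model, string):
    # Assumption: strings unseen in training default to non-discourse.
    return model.get(string, False)


def accuracy_and_f1(gold, predicted):
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return correct / len(pairs), f1


def cross_validate(examples, folds=10):
    """Hold out every folds-th example in turn and pool all predictions."""
    gold, predicted = [], []
    for k in range(folds):
        train = [ex for i, ex in enumerate(examples) if i % folds != k]
        model = train_baseline(train)
        for string, label in examples[k::folds]:
            gold.append(label)
            predicted.append(predict(model, string))
    return accuracy_and_f1(gold, predicted)


# Invented toy data: (connective string, is it a discourse usage?)
toy = ([("although", True)] * 9 + [("or", False)] * 9
       + [("or", True), ("although", False)])
acc, f1 = cross_validate(toy)
print(f"accuracy={acc:.2f} f1={f1:.2f}")
```

A string-only model can never separate the two usages of the same connective, which is the gap the syntactic features are meant to close.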
Combinations of features • Different connectives have different syntactic contexts • so pairwise interaction features are added • For example: connective=also-RightSibling=SBAR • Interaction terms between pairs of syntactic features are added as well
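One way to generate such interaction features is to concatenate feature names and values into strings, as in the slide's example. The sketch below is an assumption about the encoding, not the authors' code; `with_interactions` and the feature names in the sample dictionary are hypothetical.

```python
from itertools import combinations


def with_interactions(connective, syntax):
    """Expand a connective and its syntactic features with pairwise terms.

    syntax is a dict such as {"Self": "RB", "RightSibling": "SBAR"}.
    """
    names = sorted(syntax)
    feats = ["connective=" + connective]
    # The individual syntactic features.
    feats += [f"{n}={syntax[n]}" for n in names]
    # Connective paired with each syntactic feature.
    feats += [f"connective={connective}-{n}={syntax[n]}" for n in names]
    # Every pair of syntactic features.
    feats += [f"{a}={syntax[a]}-{b}={syntax[b]}"
              for a, b in combinations(names, 2)]
    return feats


sample = {"Self": "RB", "Parent": "ADVP",
          "LeftSibling": "NONE", "RightSibling": "SBAR"}
for feature in with_interactions("also", sample):
    print(feature)
```

With four syntactic features this yields 1 + 4 + 4 + 6 = 15 feature strings per instance, each of which a classifier can treat as a binary indicator.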
Sense classification • A few connectives are quite ambiguous • "since" can indicate Temporal or Contingency • Contingency and Temporal are the senses most often annotated together • The connectives most often doubly annotated are • when • and • as • Each explicit relation is classified into one of the four senses • using a Naive Bayes classifier
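As a rough illustration of four-way sense classification with Naive Bayes, the sketch below implements a small categorical Naive Bayes with add-one smoothing. The class, its smoothing scheme, the feature names, and the training rows are all invented for illustration; the slides do not specify the exact model variant or toolkit used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes over dict-valued rows, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> values
        self.values = defaultdict(set)            # feature -> values seen
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[(label, feature)][value] += 1
                self.values[feature].add(value)
        return self

    def log_scores(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for feature, value in row.items():
                seen = self.value_counts[(label, feature)][value]
                size = len(self.values[feature]) + 1  # one slot for unseen values
                score += math.log((seen + self.alpha) / (n + self.alpha * size))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Invented toy rows; real features would come from the PDTB and a parser.
rows = [
    {"connective": "but", "self": "CC"},
    {"connective": "but", "self": "CC"},
    {"connective": "also", "self": "RB"},
    {"connective": "because", "self": "IN"},
    {"connective": "because", "self": "IN"},
    {"connective": "after", "self": "IN"},
]
senses = ["Comparison", "Comparison", "Expansion",
          "Contingency", "Contingency", "Temporal"]
model = NaiveBayes().fit(rows, senses)
print(model.predict({"connective": "because", "self": "IN"}))
```

Working in log space avoids underflow when many features are multiplied together, and the extra smoothing slot keeps unseen feature values from zeroing out a class.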
Results • Human inter-annotator agreement on the top-level sense class was also 94% • suggesting that further improvements may not be possible
Error Analysis • Temporal relations are the least frequent of the four senses (19% of the explicit relations) • But more than half of the errors involve the Temporal class • The most commonly confused pairing was Contingency relations classified as Temporal relations • making up 29% of errors
Conclusion • Using a few syntactic features leads to state-of-the-art accuracy for discourse vs. non-discourse usage classification • Syntactic features also help sense class identification • Results there have already reached the level of human annotator agreement