Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories. Matthew Gerber*, Andrew S. Gordon+, Kenji Sagae+. *Department of Computer Science, Michigan State University. +Institute for Creative Technologies, University of Southern California.
Story-based envisionment • Envisionment tasks • Prediction: what might come next? • Explanation: how did the current state of affairs arise? • Imagination: moving beyond our own experiences • Research question • Can crowd-sourced stories support envisionment?
Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work
Related work • Weblogs and social media • International Conference on Weblogs and Social Media • Topics include sentiment, topic propagation, and others • Knowledge extraction • Factoid extraction (Schubert and Tong, 2003) • Weblog factoid extraction (Gordon, 2009) • Cause identification • Aviation incident reports (Persing and Ng, 2009)
Story identification • Spinn3r (Burton et al., 2009) • Tens of millions of weblog entries • August 1st, 2008 through October 1st, 2008 • Story extraction (Gordon and Swanson, 2009) • Supervised binary classification (75% precision) • 960,098 stories • Currently supports Say Anything, an open-domain collaborative storytelling system
Story analysis • Rhetorical Structure Theory (Carlson & Marcu, 2001) • Causation • [cause Packages often get buried in the load] [result and are delivered late.] • Temporal precedence • [before Three months after she arrived in L.A.] [after she spent $120 she didn’t have.] • Many other relations that were not used
Story analysis • RST parsing (Sagae, 2009) • Joint syntax/discourse dependency parsing • Linear runtime • 44.5% F1 on RST test section • RST for envisionment • Retrained parser on causal and temporal relations • Extracted 2.2 million causal relations • Extracted 220,000 precedes relations • Indexed two discourse units per relation using Lucene
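The slide states that two discourse units were indexed per relation using Lucene. Since Lucene is a Java library, the sketch below is a minimal in-memory Python stand-in showing just the index layout; the field names ("relation", "type", "slot", "text") are illustrative assumptions, not the actual schema.

```python
# Minimal in-memory stand-in for the Lucene index of discourse units:
# each extracted relation contributes two searchable entries, one per unit.
# Field names are illustrative assumptions, not the real index schema.
relations = [
    ("causes", "Packages often get buried in the load",
     "and are delivered late."),
    ("precedes", "Three months after she arrived in L.A.",
     "she spent $120 she didn't have."),
]

index = []
for rel_id, (rel_type, unit1, unit2) in enumerate(relations):
    index.append({"relation": rel_id, "type": rel_type,
                  "slot": "first", "text": unit1})
    index.append({"relation": rel_id, "type": rel_type,
                  "slot": "second", "text": unit2})
```

Retrieving either unit of a relation then also retrieves its partner unit, which is what makes forward and backward inference possible from the same index.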
Story-based inference for envisionment • Input • Free-text state/event description • Inference type (causal, temporal) • Inference direction (forward, backward) • Example inferences • Input description: John fell off the ski lift. • Forward causal: John broke his foot. • Backward causal: John drank too much beer at the lodge.
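The input to the inference system is a triple of description, inference type, and direction. A small sketch of that request structure (the class and field names are assumptions for illustration, not the authors' code):

```python
from dataclasses import dataclass
from enum import Enum

class InferenceType(Enum):
    CAUSAL = "causal"
    TEMPORAL = "temporal"

class Direction(Enum):
    FORWARD = "forward"
    BACKWARD = "backward"

@dataclass
class InferenceQuery:
    """One envisionment request: a free-text state/event description
    plus the inference type and direction to apply to it."""
    description: str
    inference_type: InferenceType
    direction: Direction

# the slide's example: forward causal inference from a fall
query = InferenceQuery("John fell off the ski lift.",
                       InferenceType.CAUSAL, Direction.FORWARD)
```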
Story-based inference for envisionment • Baseline inference model • The input description is issued as a TF-IDF query against the Lucene index • Lucene returns a ranked list of matching discourse units • Each matching unit is paired with the other unit of its relation, yielding a ranked list of inference results
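The baseline model is plain TF-IDF retrieval. A self-contained sketch of that ranking step, using a toy three-unit index in place of the real Lucene index (the tokenization and the toy units are assumptions for illustration):

```python
import math
from collections import Counter

def rank_units(query_tokens, docs):
    """Rank indexed discourse units against a TF-IDF query
    (a sketch of the baseline inference model)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))

    def vec(tokens):
        tf = Counter(tokens)
        # terms unseen in the corpus (df == 0) get no weight
        return {t: tf[t] * math.log(n / df[t]) for t in tf if df[t]}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    qv = vec(query_tokens)
    return sorted(range(n), key=lambda i: cosine(qv, vec(docs[i])),
                  reverse=True)

# toy index of three tokenized discourse units
units = ["john fell off the ski lift".split(),
         "she bought a new coat".split(),
         "the ski season started early".split()]
ranking = rank_units("ski lift".split(), units)
```

The unit sharing the most query terms ranks first; its partner unit in the indexed relation then serves as the inference result.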
Story-based inference for envisionment • Re-ranking the inference results • Centroid similarity • The result list often contains informative redundancies • Compute the centroid of the result vectors and re-rank by similarity to it • Example: original results <1,0,1>, <0,2,1>, <3,1,0> have centroid <1.33,1,0.66>; re-ranked by centroid similarity: 1. <3,1,0> (0.88) 2. <1,0,1> (0.79) 3. <0,2,1> (0.66)
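The centroid re-ranking step can be reproduced directly from the slide's worked example, assuming cosine similarity as the similarity measure (the slide does not name the measure, but cosine matches the printed scores):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid_rerank(vectors):
    """Re-rank result vectors by cosine similarity to their centroid,
    rewarding results that repeat what the rest of the list says."""
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    return sorted(((cosine(v, centroid), v) for v in vectors),
                  reverse=True)

# the three result vectors from the slide's example
reranked = centroid_rerank([[1, 0, 1], [0, 2, 1], [3, 1, 0]])
```

Running this reproduces the slide's ordering: <3,1,0> scores 0.88, <1,0,1> scores 0.79, and <0,2,1> scores 0.66.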
Story-based inference for envisionment • Re-ranking the inference results • Centroid similarity • Description score scaling • The re-ranking score should also be sensitive to the original query score • Score = centroid similarity * original query score • Log-length scaling • Favor longer inference results • Score = centroid similarity * log(length of inference) • Combined description score and log-length scaling • Score = centroid similarity * original query score * log(length of inference)
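The combined scaling on the slide multiplies three factors. A one-function sketch, where measuring length as token count is an assumption (the slide does not say whether length means tokens or characters):

```python
import math

def combined_score(centroid_sim, query_score, inference_tokens):
    """Combined scaling from the slide: centroid similarity
    * original query score * log(length of inference).
    Length is taken as token count here (an assumption)."""
    return centroid_sim * query_score * math.log(len(inference_tokens))

# e.g. a result with centroid similarity 0.88, query score 1.5,
# and a four-token inference text
score = combined_score(0.88, 1.5, "john broke his foot".split())
```

The log keeps the length preference mild: doubling an inference's length adds a constant to the log factor rather than doubling the score.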
Evaluation setting • 256 sentences (5 documents) • Automatically generated forward/backward causal/temporal inferences for each sentence • Kept the best-scoring inference for each sentence • Manually evaluated inference results • Inference must increase local coherence • Inference must be globally valid • Example (forward temporal inference): "I didn't even need a jacket (until I got there)." Incorrect inference: "it was a warm day"
Evaluation results • Tradeoff between inference rate and accuracy
Conclusions and future work • This is a difficult task and much work remains • Story analysis • Genre adaptation (newswire → weblog) • Penn Discourse TreeBank (Prasad et al., 2008) • 3.5-fold increase in training data for causal/temporal relations • Extraction of causal/temporal relations from traditional SRL analyses • Story-based inference • More sophisticated selection of inference type and direction • Incorporation of sentence context into the model • Exploitation of other redundancies in the story corpus