Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories. Matthew Gerber*, Andrew S. Gordon+, Kenji Sagae+. *Department of Computer Science, Michigan State University. +Institute for Creative Technologies, University of Southern California.
Story-based envisionment • Envisionment tasks • Prediction: what might come next? • Explanation: how did the current state of affairs arise? • Imagination: moving beyond our own experiences • Research question • Can crowd-sourced stories support envisionment?
Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work
Related work • Weblogs and social media • International Conference on Weblogs and Social Media • Topics include sentiment, topic propagation, and others • Knowledge extraction • Factoid extraction (Schubert and Tong, 2003) • Weblog factoid extraction (Gordon, 2009) • Cause identification • Aviation incident reports (Persing and Ng, 2009)
Story identification • Spinn3r (Burton et al., 2009) • Tens of millions of weblog entries • August 1st, 2008 through October 1st, 2008 • Story extraction (Gordon and Swanson, 2009) • Supervised binary classification (75% precision) • 960,098 stories • Currently supports Say Anything, an open-domain collaborative storytelling system
Story analysis • Rhetorical Structure Theory (Carlson & Marcu, 2001) • Causation • [cause Packages often get buried in the load] [result and are delivered late.] • Temporal precedence • [before Three months after she arrived in L.A.] [after she spent $120 she didn’t have.] • Many other relations that were not used
Story analysis • RST parsing (Sagae, 2009) • Joint syntax/discourse dependency parsing • Linear runtime • 44.5% F1 on RST test section • RST for envisionment • Retrained parser on causal and temporal relations • Extracted 2.2 million causal relations • Extracted 220,000 precedes relations • Indexed two discourse units per relation using Lucene
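The slide states that two discourse units were indexed per relation using Lucene. Since Lucene is a Java library, the sketch below is a minimal in-memory Python stand-in showing just the index layout; the field names ("relation", "type", "slot", "text") are illustrative assumptions, not the actual schema.

```python
# Minimal in-memory stand-in for the Lucene index of discourse units:
# each extracted relation contributes two searchable entries, one per unit.
# Field names are illustrative assumptions, not the real index schema.
relations = [
    ("causes", "Packages often get buried in the load",
     "and are delivered late."),
    ("precedes", "Three months after she arrived in L.A.",
     "she spent $120 she didn't have."),
]

index = []
for rel_id, (rel_type, unit1, unit2) in enumerate(relations):
    index.append({"relation": rel_id, "type": rel_type,
                  "slot": "first", "text": unit1})
    index.append({"relation": rel_id, "type": rel_type,
                  "slot": "second", "text": unit2})
```

Retrieving either unit of a relation then also retrieves its partner unit, which is what makes forward and backward inference possible from the same index.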
Story-based inference for envisionment • Input • Free-text state/event description • Inference type (causal, temporal) • Inference direction (forward, backward) • Example inferences • Input description: John fell off the ski lift. • Forward causal: John broke his foot. • Backward causal: John drank too much beer at the lodge.
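The input to the inference system is a triple of description, inference type, and direction. A small sketch of that request structure (the class and field names are assumptions for illustration, not the authors' code):

```python
from dataclasses import dataclass
from enum import Enum

class InferenceType(Enum):
    CAUSAL = "causal"
    TEMPORAL = "temporal"

class Direction(Enum):
    FORWARD = "forward"
    BACKWARD = "backward"

@dataclass
class InferenceQuery:
    """One envisionment request: a free-text state/event description
    plus the inference type and direction to apply to it."""
    description: str
    inference_type: InferenceType
    direction: Direction

# the slide's example: forward causal inference from a fall
query = InferenceQuery("John fell off the ski lift.",
                       InferenceType.CAUSAL, Direction.FORWARD)
```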
Story-based inference for envisionment • Baseline inference model • The input description is issued as a TF-IDF query against the Lucene index • Lucene returns a ranked list of matching discourse units • Each matching unit is paired with the other unit of its relation, yielding a ranked list of inference results
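The baseline model is plain TF-IDF retrieval. A self-contained sketch of that ranking step, using a toy three-unit index in place of the real Lucene index (the tokenization and the toy units are assumptions for illustration):

```python
import math
from collections import Counter

def rank_units(query_tokens, docs):
    """Rank indexed discourse units against a TF-IDF query
    (a sketch of the baseline inference model)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))

    def vec(tokens):
        tf = Counter(tokens)
        # terms unseen in the corpus (df == 0) get no weight
        return {t: tf[t] * math.log(n / df[t]) for t in tf if df[t]}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    qv = vec(query_tokens)
    return sorted(range(n), key=lambda i: cosine(qv, vec(docs[i])),
                  reverse=True)

# toy index of three tokenized discourse units
units = ["john fell off the ski lift".split(),
         "she bought a new coat".split(),
         "the ski season started early".split()]
ranking = rank_units("ski lift".split(), units)
```

The unit sharing the most query terms ranks first; its partner unit in the indexed relation then serves as the inference result.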
Story-based inference for envisionment • Re-ranking the inference results • Centroid similarity • The result list often contains informative redundancies • Compute the centroid of the result vectors and re-rank by similarity to it • Example: original results <1,0,1>, <0,2,1>, <3,1,0> have centroid <1.33,1,0.66>; re-ranked by centroid similarity: 1. <3,1,0> (0.88) 2. <1,0,1> (0.79) 3. <0,2,1> (0.66)
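The centroid re-ranking step can be reproduced directly from the slide's worked example, assuming cosine similarity as the similarity measure (the slide does not name the measure, but cosine matches the printed scores):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid_rerank(vectors):
    """Re-rank result vectors by cosine similarity to their centroid,
    rewarding results that repeat what the rest of the list says."""
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    return sorted(((cosine(v, centroid), v) for v in vectors),
                  reverse=True)

# the three result vectors from the slide's example
reranked = centroid_rerank([[1, 0, 1], [0, 2, 1], [3, 1, 0]])
```

Running this reproduces the slide's ordering: <3,1,0> scores 0.88, <1,0,1> scores 0.79, and <0,2,1> scores 0.66.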
Story-based inference for envisionment • Re-ranking the inference results • Centroid similarity • Description score scaling • The re-ranking score should also be sensitive to the original query score • Score = centroid similarity * original query score • Log-length scaling • Favor longer inference results • Score = centroid similarity * log(length of inference) • Combined description score and log-length scaling • Score = centroid similarity * original query score * log(length of inference)
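The combined scaling on the slide multiplies three factors. A one-function sketch, where measuring length as token count is an assumption (the slide does not say whether length means tokens or characters):

```python
import math

def combined_score(centroid_sim, query_score, inference_tokens):
    """Combined scaling from the slide: centroid similarity
    * original query score * log(length of inference).
    Length is taken as token count here (an assumption)."""
    return centroid_sim * query_score * math.log(len(inference_tokens))

# e.g. a result with centroid similarity 0.88, query score 1.5,
# and a four-token inference text
score = combined_score(0.88, 1.5, "john broke his foot".split())
```

The log keeps the length preference mild: doubling an inference's length adds a constant to the log factor rather than doubling the score.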
Evaluation setting • 256 sentences (5 documents) • Automatically generated forward/backward causal/temporal inferences for each sentence • Kept the best-scoring inference for each sentence • Manually evaluated inference results • Inference must increase local coherence • Inference must be globally valid • Example (forward temporal inference): "I didn't even need a jacket (until I got there)." Incorrect inference: "it was a warm day"
Evaluation results • Tradeoff between inference rate and accuracy
Conclusions and future work • This is a difficult task and much work remains • Story analysis • Genre adaptation (newswire → weblog) • Penn Discourse TreeBank (Prasad et al., 2008) • 3.5-fold increase in training data for causal/temporal relations • Extraction of causal/temporal relations from traditional SRL analyses • Story-based inference • More sophisticated selection of inference type and direction • Incorporation of sentence context into the model • Exploitation of other redundancies in the story corpus