
Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories


Presentation Transcript


  1. Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories Matthew Gerber* Andrew S. Gordon+ Kenji Sagae+ *Department of Computer Science Michigan State University +Institute for Creative Technologies University of Southern California

  2. Story-based envisionment • Envisionment tasks • Prediction: what might come next? • Explanation: how did the current state of affairs arise? • Imagination: moving beyond our own experiences • Research question • Can crowd-sourced stories support envisionment?

  3. Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work

  4. Related work • Weblogs and social media • International Conference on Weblogs and Social Media • Topics include sentiment, topic propagation, and others • Knowledge extraction • Factoid extraction (Schubert and Tong, 2003) • Weblog factoid extraction (Gordon, 2009) • Cause identification • Aviation incident reports (Persing and Ng, 2009)

  5. Story identification • Spinn3r (Burton et al., 2009) • Tens of millions of weblog entries • August 1st, 2008 through October 1st, 2008 • Story extraction (Gordon and Swanson, 2009) • Supervised binary classification (75% precision) • 960,098 stories • Currently supports SayAnything, an open-domain collaborative storytelling system

  6. Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work

  7. Story analysis • Rhetorical Structure Theory (Carlson & Marcu, 2001) • Causation • [cause Packages often get buried in the load] [result and are delivered late.] • Temporal precedence • [before Three months after she arrived in L.A.] [after she spent $120 she didn’t have.] • Many other relations that were not used

  8. Story analysis • RST parsing (Sagae, 2009) • Joint syntax/discourse dependency parsing • Linear runtime • 44.5% F1 on RST test section • RST for envisionment • Retrained parser on causal and temporal relations • Extracted 2.2 million causal relations • Extracted 220,000 precedes relations • Indexed two discourse units per relation using Lucene
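
To make the indexing step concrete: each extracted relation pairs two discourse units (as in the slide 7 examples), and both units are indexed so either one can be queried. Below is a minimal sketch of that layout; the Python dicts and field names are stand-ins of my own for the Lucene document/field API the paper actually uses.

# One searchable entry per extracted relation. Both discourse units are stored,
# so either can serve as the query side depending on the inference direction
# (a stand-in for a Lucene Document with two indexed text fields).
index_entries = [
    {"label": "causes",                                      # causal relation
     "first_unit": "Packages often get buried in the load",
     "second_unit": "and are delivered late."},
    {"label": "precedes",                                    # temporal relation
     "first_unit": "Three months after she arrived in L.A.",
     "second_unit": "she spent $120 she didn't have."},
]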

  9. Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work

  10. Story-based inference for envisionment • Input • Free-text state/event description • Inference type (causal, temporal) • Inference direction (forward, backward) • Example inferences • Description: John fell off the ski lift. • Forward causal: John broke his foot. • Backward causal: John drank too much beer at the lodge.
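
A minimal sketch of this input as a data structure, using the slide's ski-lift example; the class and field names are hypothetical, chosen here only for illustration.

from dataclasses import dataclass
from enum import Enum

class InferenceType(Enum):
    CAUSAL = "causal"      # draws on extracted cause/result relations
    TEMPORAL = "temporal"  # draws on extracted precedes relations

class Direction(Enum):
    FORWARD = "forward"    # what might follow the description?
    BACKWARD = "backward"  # what might have led to the description?

@dataclass
class InferenceRequest:
    description: str              # free-text state/event description
    inference_type: InferenceType
    direction: Direction

# Forward causal inference from the slide's example description.
request = InferenceRequest("John fell off the ski lift.",
                           InferenceType.CAUSAL, Direction.FORWARD)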

  11. Story-based inference for envisionment • Baseline inference model • The free-text description is turned into a TF-IDF query against the Lucene index • Lucene returns a ranked list of matching discourse units • The units paired with those matches form the ranked list of inference results • [Diagram: description → TF-IDF query → Lucene → ranked list of discourse units → ranked list of inference results]
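
A sketch of the baseline model under the flow above. The paper issues TF-IDF queries against a Lucene index; since Lucene is a Java library, scikit-learn's TF-IDF vectorizer and cosine similarity are used here as a rough stand-in, and the three-relation corpus is illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative set of causal relations: (cause unit, result unit).
causal_relations = [
    ("John drank too much beer at the lodge", "John fell off the ski lift"),
    ("John fell off the ski lift", "John broke his foot"),
    ("Packages often get buried in the load", "they are delivered late"),
]

def forward_causal_inferences(description, relations):
    # Baseline model: rank cause units by TF-IDF similarity to the description;
    # the result units paired with them become the ranked inference results.
    # (Backward causal inference would query the result units instead and
    # return the paired cause units.)
    cause_units = [cause for cause, _ in relations]
    vectorizer = TfidfVectorizer()
    unit_matrix = vectorizer.fit_transform(cause_units)
    query_vector = vectorizer.transform([description])
    scores = cosine_similarity(query_vector, unit_matrix)[0]
    ranked = sorted(zip(scores, relations), key=lambda pair: -pair[0])
    return [(result, score) for score, (_, result) in ranked]

# Top-ranked inference result should be "John broke his foot".
print(forward_causal_inferences("John fell off the ski lift.", causal_relations))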

  12. Story-based inference for envisionment • Re-ranking the inference results • Centroid similarity • The result list often contains informative redundancies • Compute the centroid of the result vectors and re-rank by similarity to it • Example: original results <1,0,1>, <0,2,1>, <3,1,0> have centroid <1.33,1,0.66> • Re-ranked: 1. <3,1,0> (0.88) 2. <1,0,1> (0.79) 3. <0,2,1> (0.66)
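
A small numeric sketch of the centroid re-ranking step, reusing the vectors from the slide. The slide does not name the similarity measure; cosine similarity is assumed here, and it reproduces the 0.88 / 0.79 / 0.66 ordering shown.

import numpy as np

# Term vectors of the three retrieved inference results (slide example).
results = np.array([[1, 0, 1],
                    [0, 2, 1],
                    [3, 1, 0]], dtype=float)

centroid = results.mean(axis=0)  # -> [1.33, 1.0, 0.67]

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Re-rank the results by their similarity to the centroid.
reranked = sorted(((cosine(r, centroid), r.tolist()) for r in results), reverse=True)
for score, vector in reranked:
    print(vector, round(score, 2))
# -> [3, 1, 0] 0.88, then [1, 0, 1] 0.79, then [0, 2, 1] 0.66, matching the slide.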

  13. Story-based inference for envisionment • Re-ranking the inference results • Centroid similarity • Description score scaling • The re-ranked score should also be sensitive to the original query score • Centroid similarity * original query score • Log-length scaling • Favor longer inference results • Centroid similarity * log(length of inference) • Combined description score and log-length scaling • Centroid similarity * original query score * log(length of inference)
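
The three variants reduce to simple score combinations. A sketch with names of my own: centroid_sim is the centroid similarity from the previous step, query_score the original TF-IDF retrieval score, and length the token count of the candidate inference result.

from math import log

def description_score_scaled(centroid_sim, query_score):
    # Centroid similarity scaled by the original query score.
    return centroid_sim * query_score

def log_length_scaled(centroid_sim, length):
    # Favor longer inference results.
    return centroid_sim * log(length)

def combined_scaled(centroid_sim, query_score, length):
    # Both scalings applied together.
    return centroid_sim * query_score * log(length)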

  14. Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work

  15. Evaluation setting • 256 sentences (5 documents) • Automatically generated forward/backward causal/temporal inferences for each sentence • Kept the best-scoring inference for each sentence • Manually evaluated inference results • Inference must increase local coherence • Inference must be globally valid • Example of an incorrect forward temporal inference • Description: I didn’t even need a jacket (until I got there). • Forward temporal inference: it was a warm day (incorrect)

  16. Evaluation results • Tradeoff between inference rate and accuracy

  17. Outline • Background • Related work • Story identification • Story analysis • Story-based inference for envisionment • Evaluation • Conclusions and future work

  18. Conclusions and future work • This is a difficult task and much work remains • Story analysis • Genre adaptation (newswire → weblog) • Penn Discourse TreeBank (Prasad et al., 2008) • 3.5-fold increase in training data for causal/temporal relations • Extraction of causal/temporal relations from traditional SRL analyses • Story-based inference • More sophisticated selection of inference type/direction • Incorporation of sentence context into the model • Exploitation of other redundancies in the story corpus
