Explore the mining of commonsense knowledge from personal stories in internet weblogs, as discussed by Andrew S. Gordon at AKBC Grenoble in 2010.
Mining commonsense knowledge from personal stories in Internet weblogs Andrew S. Gordon Institute for Creative Technologies University of Southern California AKBC Grenoble, 19 May 2010
Commonsense knowledge

If you believe that something would cause the thing that you want, then this causes you to want that something as well.

(forall (a e0 e1 e2 g2)
  (if (and (goal’ g2 e2 a) (cause’ e0 e1 e2) (believe a e0))
      (exists (g1) (and (goal’ g1 e1 a) (cause g2 g1)))))

If you believe that something you want will exist in some upcoming time, this causes you to be happy.

(forall (a g e1 e2 e3 t1 t2)
  (if (and (goal’ e1 g a) (atTime e1 t1) (atTime’ e2 g t2)
           (believe’ e3 a e2) (atTime e3 t1) (intMeets t1 t2))
      (exists (e4) (and (happy’ e4 a) (atTime e4 t1) (cause e3 e4)))))
Storytelling in Internet weblogs

... I arrived at 5pm on Monday evening and quickly made my way to the hotel. My electric razor is now in the hands of whoever took it out of my checked suitcase, but the joke is on them because I left the plug at home so they might get about a weeks worth of shaves out of it, not to mention the whole voltage difference. I was able to purchase a disposable razor at an "Auto Service". These are little convenient stores mostly run by Koreans down in Paraguay. ...

• "my life and experience" is the #1 topic of people’s weblogs
  • Pew Internet & American Life Project, 2006
• 1 million new non-spam weblog posts each day
  • Spinn3r.com
• Millions of stories of everyday human experience
• Roughly 4.8% of weblog posts are personal stories
• Another day at school, at the office, or on the road
• The occasional amazing occurrence
Automated story classification

[Figure: diagram with the labels "Classifier", "Story corpus", and "training data"]

• Confidence-weighted linear classifiers, 5K training, n-grams
• 63% precision, 39% recall, 0.48 F-score
• ICWSM 2009 corpus, 66M posts = 960,098 English stories
• 2010 Spinn3r daily feed: 2.4M English posts = 35K stories
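The reported F-score follows from the reported precision and recall as their harmonic mean. The short check below is only a worked calculation. The function name `f_score` and the `beta` parameter are illustrative and do not come from the slides.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures taken from the slide: 63% precision, 39% recall.
print(round(f_score(0.63, 0.39), 2))  # 0.48
```

The gap between precision and recall suggests the classifier is tuned to miss many stories rather than admit many non-stories, which seems a reasonable trade when the input stream is millions of posts.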
Stories as a commonsense knowledge base

[Figure: reasoning over stories, from the input antecedent “He lost control of the car” to the output consequence “and smashed into a tree”, with markers 1, 2, 3]

Manshadi, M., Swanson, R., and Gordon, A. (2008) Learning a Probabilistic Model of Event Sequences From Internet Weblog Stories. FLAIRS-2008.
Swanson, R. and Gordon, A. (2009) Open Domain Collaborative Storytelling With Say Anything. ICWSM-2009.
Gerber, M., Gordon, A., and Sagae, K. (2010) Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories. NAACL-2010, Learning by Reading Workshop.
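One simple way to read the antecedent/consequence picture, sketched here under stated assumptions: find the story sentence most similar to the input and return the sentence that follows it. The toy corpus, the function names, and the bag-of-words cosine similarity are illustrative choices, not the methods of the cited papers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_consequence(antecedent, stories):
    """Return the sentence following the best-matching story sentence."""
    query = Counter(tokenize(antecedent))
    best_score, best_next = 0.0, None
    for story in stories:
        # Pair every sentence with its successor in the same story.
        for current, following in zip(story, story[1:]):
            score = cosine(query, Counter(tokenize(current)))
            if score > best_score:
                best_score, best_next = score, following
    return best_next


# Hypothetical two-story corpus; a real corpus would hold millions of stories.
TOY_STORIES = [
    ["He lost control of the car", "and smashed into a tree"],
    ["I missed the last bus home", "so I had to walk"],
]

print(predict_consequence("She lost control of her car", TOY_STORIES))
```

With no lexical overlap at all the function returns `None`, which makes the dependence on surface wording explicit.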
SayAnything example: Stormy Seas

The weather broke, so we sailed out of the harbor. As Victoria grew nearer, the waves grew larger and we furled some foresail and turned to run. We sailed at about 9 knots with good trim, but the storm eventually caught up with us. With its big open cockpit and heavy nose, I didn't like its chances in the kind of sea you get out there almost continuously that time of year. Sure enough the boat was completely inadequate, and we were tossed into the cold ocean. Everyone in our group of seven tourists -- five locals and a Japanese couple -- was pretty excited about the experience. The Japanese couple were the ones that saved us though, with their expert swimming abilities. as far as that goes it was just the four of us. The last tourist was lost at sea, never to be found. Drowned or murdered, the bloated, stinking bodies that turn up by the hundreds will look much the same. Such is the way with storms like that!
Commonsense knowledge evaluations

• There was only French food on the menu,
  • So I ordered the sashimi with miso.
  • So I ordered the crepes with a glass of wine.
• It was a short hike mostly downhill,
  • So I was exhausted at the end.
  • So I hardly broke a sweat.
• I didn’t have mobile phone service in France,
  • So I called my office every fifteen minutes.
  • So I used email to stay in touch with my office.
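An evaluation of this shape can be scored as simple accuracy: for each premise, a system assigns a score to both alternatives and is credited when the higher-scoring one is the plausible one. The harness below is a minimal sketch. The `Item` type, the `accuracy` function, and the gold labels (the second alternative in each pair) are assumptions added for illustration, and `constant_scorer` is a placeholder baseline rather than any system from the talk.

```python
from typing import Callable, List, NamedTuple


class Item(NamedTuple):
    premise: str
    alternatives: List[str]
    correct: int  # index of the alternative assumed to be plausible


def accuracy(items: List[Item], scorer: Callable[[str, str], float]) -> float:
    """Fraction of items where the top-scoring alternative is the gold one."""
    hits = 0
    for item in items:
        scores = [scorer(item.premise, alt) for alt in item.alternatives]
        # Ties resolve to the first alternative.
        if scores.index(max(scores)) == item.correct:
            hits += 1
    return hits / len(items)


# Items from the slide; the gold indices are an assumption of this sketch.
ITEMS = [
    Item("There was only French food on the menu,",
         ["So I ordered the sashimi with miso.",
          "So I ordered the crepes with a glass of wine."], 1),
    Item("It was a short hike mostly downhill,",
         ["So I was exhausted at the end.",
          "So I hardly broke a sweat."], 1),
    Item("I didn't have mobile phone service in France,",
         ["So I called my office every fifteen minutes.",
          "So I used email to stay in touch with my office."], 1),
]


def constant_scorer(premise: str, alternative: str) -> float:
    """Placeholder baseline: scores everything equally."""
    return 0.0


print(accuracy(ITEMS, constant_scorer))  # 0.0, since ties pick the first alternative
```

Any model that maps a premise and a candidate consequence to a number can be passed in place of `constant_scorer`. With two alternatives per item, random guessing would be expected to score about 0.5.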
(1) I arrived at 5pm on Monday evening and (2) quickly made my way to the hotel. (3) My electric razor is now in the hands of whoever took it out of my checked suitcase, but (4) the joke is on them because (5) I left the plug at home so (6) they might get about a weeks worth of shaves out of it, not to mention (7) the whole voltage difference. (8) I was able to purchase a disposable razor at an "Auto Service". (9) These are little convenient stores mostly run by Koreans down in Paraguay.

[Figure: the nine numbered clauses placed in regions labeled Constants, Past, Present, and Future. Legend: constant states, fluent states, action events, exogenous events, causal links, temporal links.]
Clustering discourse evidence ...and quickly made my way to the hotel...
Current approach

• Scale
  • Every English-language personal story appearing in weblogs
  • 6M in 2010
• Representation
  • Clauses linked by discourse relations
• Aggregation
  • Textual similarity between clauses
• Inference model
  • Bayesian inference between cluster nodes
• Evaluation
  • Choice of plausible consequences
Andrew S. Gordon Institute for Creative Technologies University of Southern California gordon@ict.usc.edu