Entity-oriented filtering of large streams
Date: Tue, 13 Mar 2012 02:45:40 +0000
From: Google Alerts <googlealerts-noreply@google.com>
Subject: Google Alert - "John R. Frank"

=== Web - 2 new results for ["John R. Frank"] ===

John R. Frank
SPOKANE, Wash. - John R. Frank, 55, died March 4, 2012, in Coeur d' Alene, Idaho. Survivors include: his wife, Miki; daughter, Patricia Frank; ...
<http://www.hutchnews.com/obituaries/Frank--John-CP>

In Memory of John R Frank
Biography. John R. Frank, age 55, passed away at Sacred Heart Medical Center in Spokane, WA, on March 4, 2012. John was born in Hutchison, KS, ...
<http://www.englishfuneralchapel.com/sitemaker/sites/Englis1/obit.cgi?user=583335Frank>
2012 Task: Filtering to Recommend Citations
Entities in Wikipedia or another Knowledge Base
• Initialize with a target WP entity
  • state of WP from Jan 2012
• Iterate over a stream of text items
  • Oct-Dec 2011: train on labels
  • For each item, output a confidence between 0 and 1
  • Jan-Apr 2012: labels hidden
Automatically recommend new edits
Content Stream
• 462M texts, 40% English
• 4,973 hourly chunks of ~10^5 docs/hour
• News, blogs, forums, and link shortening
Your KBA System
Sponsors: Diffeo
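The task loop above can be sketched as a generator that walks the hourly chunks in order and emits one confidence in [0, 1] per (item, target entity) pair. This is a minimal illustration under stated assumptions, not the official KBA run format: `StreamItem`, `filter_stream`, `toy_score`, the entity id `John_R._Frank`, and the hour-label format are all hypothetical, and the train/test time split is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class StreamItem:
    """Hypothetical record for one text item in an hourly chunk."""
    stream_id: str
    hour: str  # hourly chunk label, e.g. "2012-03-13-02" (format assumed)
    text: str


def filter_stream(
    target_entity: str,
    chunks: Iterable[List[StreamItem]],
    score: Callable[[str, str], float],
) -> Iterator[Tuple[str, str, float]]:
    """Yield (stream_id, target_entity, confidence) for every item, in stream order."""
    for chunk in chunks:  # hourly chunks, assumed to arrive in time order
        for item in chunk:
            conf = score(target_entity, item.text)
            yield item.stream_id, target_entity, min(1.0, max(0.0, conf))  # clamp to [0, 1]


def toy_score(entity: str, text: str) -> float:
    """Hypothetical scorer: fraction of the entity's name tokens found in the text."""
    tokens = entity.replace("_", " ").lower().split()
    lowered = text.lower()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in lowered) / len(tokens)


chunks = [[
    StreamItem("doc-1", "2012-03-13-02", "SPOKANE, Wash. - John R. Frank, 55, died March 4, 2012"),
    StreamItem("doc-2", "2012-03-13-02", "Water covers 70% of our Blue planet"),
]]
results = list(filter_stream("John_R._Frank", chunks, toy_score))
print(results)
```

Because `score` is passed in as a function, a real system could swap the toy name matcher for a trained classifier without changing the streaming loop.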
Accelerate?
• rate of assimilation << stream size
• # editors << # entities << # mentions (definition of a "large" KB)
How many days must a news article wait before being cited in Wikipedia?
Has many interests, including trying to take over UK soccer teams. His empire includes many entities… Note: Usmanov is not mentioned in this text! Elaborate link trails… Citation #18
Example KBA Rating Task
Published: March 31, 2012
Impact of Thoughts on Water
By Denis Gorce-Bourge
Water covers 70% of our Blue planet and our body is made of about 70% water. Masaru Emoto is a Japanese Photographer and scientist. He is known over the world for his remarkable work on water and its deep connection with individual and collective consciousness. For decades, Masaru took pictures of frozen crystals of water and tested the direct influence of the environment on the quality of those crystals. Pollution has a direct impact on the beauty of a frozen crystal but as well words, music and thoughts. He tested the quality of water crystals by exposing it to various conditions: to written words like hate and violence and Love and gratitude. The results were just astonishing. The crystal exposed to Love and gratitude was beautiful and perfectly formed where the other one was severely degraded. He demonstrated as well the impact of Heavy Metal music versus Mozart or Beethoven and how the vibration of music impacts water. The very shape of water crystals is modified by violence, aggression, and negative words.
TRECing the continental divide between NLP and IR
• NLP:
  • Data-parsing centric
  • Universal annotation
  • Scores probabilities
  • Reductionist
• IR:
  • User-task centric
  • Variation in interpretation
  • Scores cascading lists
  • Constructionist, emergence
String matching as a task generator: 91% recall, 15% precision, 26% F1
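The F1 figure on this slide follows from the precision and recall as their harmonic mean, F1 = 2PR / (P + R). A minimal check, using only the two numbers stated on the slide (the function name `f1` is illustrative):

```python
def f1(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures from the slide: 15% precision, 91% recall.
score = f1(0.15, 0.91)
print(f"F1 = {score:.1%}")  # prints "F1 = 25.8%", which rounds to the slide's 26%
```

The harmonic mean is dominated by the smaller value, so high recall from name matching cannot compensate for its low precision.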
KBA 2013: more entity types, with an emphasis on temporality in the stream.
KBX
• Pool top-K filtered docs, or use each KBA run as a separate KBP input (1000x filter).
• Cold Start queries focused on: nil entities related to the target cluster and/or causality of an event
• KBA → KBP → Output KB
• Must coordinate the choice of KBA target entities with the desired content of KBs for Cold Start queries: clusters of related entities and/or event-type entities
• KBA Stream Corpus 2012 (or the new Stream Corpus 2013)
  • 462M texts, 40% English
  • 4,973 hourly chunks of ~10^5 docs/hour
  • News, blogs, forums, and link shortening
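The pooling option can be sketched as taking the K highest-confidence documents from each KBA run and unioning them into one candidate set for KBP. At ~10^5 docs/hour, a 1000x filter would leave on the order of 100 docs/hour. This is a minimal sketch under assumptions: the run format (run name mapped to `(stream_id, confidence)` pairs for one target entity), the function `pool_top_k`, and the toy run contents are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical run format: list of (stream_id, confidence) for one target entity.
Run = List[Tuple[str, float]]


def pool_top_k(runs: Dict[str, Run], k: int) -> Set[str]:
    """Union of the k highest-confidence stream_ids from each run."""
    pool: Set[str] = set()
    for run in runs.values():
        ranked = sorted(run, key=lambda pair: pair[1], reverse=True)
        pool.update(stream_id for stream_id, _ in ranked[:k])
    return pool


runs = {
    "runA": [("d1", 0.9), ("d2", 0.2), ("d3", 0.7)],
    "runB": [("d2", 0.95), ("d4", 0.1), ("d1", 0.3)],
}
pooled = pool_top_k(runs, k=2)
print(sorted(pooled))  # prints ['d1', 'd2', 'd3']
```

The alternative on the slide, feeding each KBA run to KBP separately, would skip the union and keep one candidate set per run.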
Thank you to our sponsor: Diffeo
Thanks for your time. John R. Frank jrf@mit.edu http://trec-kba.org