Knowledge Base Acceleration TREC 2012

November 17, 2011. Knowledge Base Acceleration, TREC 2012. John R. Frank (jrf@mit.edu) and Ian Soboroff (ian.soboroff@nist.gov).

Presentation Transcript


  1. November 17, 2011. Knowledge Base Acceleration, TREC 2012. John R. Frank (jrf@mit.edu), Ian Soboroff (ian.soboroff@nist.gov).

  2. Number of People Creating Representations of Knowledge. [Figure: timeline. Milestone labels: WWW, Expert Systems, Machine Learning, Transistor, Telegraph, Gutenberg Bible, Library at Alexandria, maps. Time-axis ticks: 300 BC, 140 AD, 1828, 1900, 1950, 1970, 1984, 1994, 2001, now.]

  3. Accelerate?
     • rate of assimilation << rate of new info
     • # editors << # active entities (definition of a “large” KB)

  4. Random Choices. How many days must a news article wait before eventually being cited in Wikipedia? [Figure: time lag in days between publication and eventual citation in Wikipedia, for a sample of 50,000 web pages (mostly news) cited in Category:LivingPeople. Axes: number of pages vs. days.]
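The lag measured on this slide is the difference between a page's publication date and the date it was cited. A minimal sketch of that computation follows, assuming each page is available as a `(published, cited)` pair of dates. The function names, the 30-day bin width, and the sample dates are illustrative assumptions, not taken from the presentation or its 50,000-page sample.

```python
from datetime import date


def citation_lag_days(published: date, cited: date) -> int:
    """Number of days between publication and eventual citation."""
    return (cited - published).days


def lag_histogram(pairs, bin_days=30):
    """Count pages per lag bin; keys are the first day of each bin."""
    counts = {}
    for published, cited in pairs:
        lag = citation_lag_days(published, cited)
        if lag < 0:
            # Cited before the recorded publication date: treat as bad data.
            continue
        bin_start = (lag // bin_days) * bin_days
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Made-up dates for illustration only.
sample = [
    (date(2011, 1, 1), date(2011, 1, 3)),
    (date(2011, 1, 1), date(2011, 3, 15)),
    (date(2010, 6, 1), date(2011, 6, 1)),
]
histogram = lag_histogram(sample)
```

Plotting the bin counts against the bin start days would give a chart with the same axes as the slide's figure (number of pages vs. days).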

  5. Even for entities mentioned frequently in the news, there is no correlation between mention and edit frequencies. Human analysts follow their personal interests, hunches, hobbies, habits. True for all large knowledge bases. [Figure axes: mean edit interval (days) vs. mean mention interval (hours).]
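One common way to check a claim like "no correlation" is the Pearson correlation coefficient between the two per-entity quantities. The slide does not say which statistic the authors used, so the sketch below is only an assumption about how such a check could be done; the function name and the input numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-entity values, chosen only to exercise the function.
mention_interval_hours = [1, 2, 3, 4]
edit_interval_days = [1, -1, -1, 1]  # centered toy values, not real intervals
r = pearson(mention_interval_hours, edit_interval_days)
```

A value of `r` near zero would be consistent with the slide's statement; values near 1 or -1 would indicate a strong linear relationship.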

  6. Methods in the Madness. [Figure axes: mean edit interval (days) vs. mean mention interval (hours).]

  7. First Year Task: Basic CCR, “Cumulative Citation Recommendation”
     • Steps:
        • Initialize with a single KB node:
           • Freebase & Wikipedia content
           • WP edits from Aug-Nov 2011
        • Begin iterating over news stream
        • For each article, output a “pertinence” confidence score between 0 and 1.
        • Aug-Sep: train on labels
        • Oct-Mar: labels hidden
        • Your system generates labels and excerpts Oct-Mar
     • Content Stream
        • ~500,000 English de-duplicated articles per day
        • Half news, half blogs & forums
     [Diagram label: Your System]
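The steps on this slide can be sketched as a loop over the article stream that emits one confidence score per article. This is a toy illustration, not the task's reference system: the `KBNode` and `Article` types, the `pertinence` scorer (fraction of the node's name tokens found in the article), and the sample articles are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class KBNode:
    name: str
    content: str = ""  # KB text for the node; this toy scorer ignores it


@dataclass
class Article:
    doc_id: str
    text: str


def tokenize(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def pertinence(node, article):
    """Toy score in [0, 1]: share of the node's name tokens in the article."""
    name_tokens = tokenize(node.name)
    if not name_tokens:
        return 0.0
    return len(name_tokens & tokenize(article.text)) / len(name_tokens)


def run_ccr(node, stream):
    """Iterate over the stream, yielding (doc_id, confidence) per article."""
    for article in stream:
        yield article.doc_id, pertinence(node, article)


# Hypothetical node and articles for illustration only.
node = KBNode(name="Gavin Rain")
stream = [
    Article("d1", "South African painter Gavin Rain will exhibit in Venice"),
    Article("d2", "Heavy rain expected in Cape Town"),
    Article("d3", "Controversy about the South African Pavilion at the Venice Biennale"),
]
scores = dict(run_ccr(node, stream))
```

Name matching alone gives `d3` a score of zero even though it may be pertinent, and gives `d2` a nonzero score for an unrelated use of "rain". A trained system would use the Aug-Sep labels to do better than this baseline.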

  8. Challenging Example 1
     • Gavin Rain
        • South African
        • Painter
     [Diagram: Gavin Rain "will have an exhibit in" Venice Biennale (art show). Labels: explicit mentions in news; no explicit co-occurrence … inference?; explicit mentions (2, 3); Controversy about South African Pavilion at Venice Biennale.]

  9. Annotations (guidelines under development)

  10. Future Tasks
     • Detect changes to infobox slot values
     • Detect new links between entities
     • Resurrection of old articles (archive mining)
     • Identify emerging entities (not yet in KB)
     • … many more ideas …

  11. Timeline
     • December 2011
        • Call for Participation
        • Test data
           • Three nodes
           • Four months
     • March/April 2012
        • Full data:
           • ~50 nodes
           • Eight months
     • Summer meet up
        • At a convenient conference?
     • January 2012
        • Tentative:
           • Eval results for baseline system
           • Data for more nodes
     • Monthly Skype calls and discussion in Google Groups
     [Figure: timeline. Axis ticks: Nov, Dec, Jan, Feb, Mar, Apr, Jun, Jul, Aug, Sep, Oct, Nov. Markers: now; Submit your runs for eval; TREC 2012.]

  12. Optional Output Values (not judged in 2012)
     • Novelty Group ID:
        • Output is a list of docIDs:
           • An empty list means this doc has new information
           • One or more previous docIDs means that all of this document’s pertinent info was already revealed in earlier docs
        • Would help us plan future tasks about novelty
     • Links to other nodes:
        • Output a list of other KB nodes that this content item associates with the target node
        • Would help us plan future tasks about link detection
     • Infobox slot name=value
        • Output a list of two-tuples of strings
        • [(slot name, slot value),…]
        • Would help us plan future tasks about detecting infobox changes
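One way to picture a per-document output record that carries the required confidence score together with the three optional values is sketched below. The class name, field names, and sample values are hypothetical; the slide describes the values but does not specify a data structure or file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CCROutput:
    doc_id: str
    confidence: float  # pertinence score between 0 and 1
    # Optional: earlier docIDs that already revealed this doc's pertinent info.
    novelty_group: List[str] = field(default_factory=list)
    # Optional: other KB nodes this document associates with the target node.
    linked_nodes: List[str] = field(default_factory=list)
    # Optional: (slot name, slot value) pairs of strings.
    infobox_slots: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def has_new_information(self) -> bool:
        """An empty novelty group means the document has new information."""
        return not self.novelty_group


# Hypothetical records for illustration only.
first = CCROutput("doc-42", 0.8, infobox_slots=[("occupation", "painter")])
repeat = CCROutput("doc-43", 0.6, novelty_group=["doc-42"])
```

Here `first.has_new_information()` is true because its novelty group is empty, while `repeat` points back to `doc-42` as the earlier source of the same information.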
