Distant Supervision for Knowledge Base Population
Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning
Definition and Approach
• We took part in TAC KBP 2010 this year (both tasks)
• Slot filling task: learning a pre-defined set of relations and attributes for target entities based on documents in a collection
  • “Warren Buffett began studying at the Wharton School of Finance at the University of Pennsylvania, but transferred to the University of Nebraska where he graduated.”
  • (per:schools_attended, Warren Buffett, University of Pennsylvania)
  • (per:schools_attended, Warren Buffett, University of Nebraska)
• Distant supervision approach: generate training data automatically from Wikipedia infoboxes
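A minimal sketch of how distant supervision can turn infobox facts into labeled training sentences. It assumes plain substring matching in place of real named-entity recognition; the `Example` type, the `label_sentences` helper, the "NONE" label, and the second sentence about Omaha are all hypothetical and only illustrate the idea, not the system described in these slides.

```python
from typing import NamedTuple


class Example(NamedTuple):
    sentence: str
    entity: str
    candidate: str
    label: str  # slot name, or "NONE" for a negative example


def label_sentences(sentences, entity, kb_facts, candidates):
    """Label (entity, candidate) pairs using known facts about the entity.

    kb_facts maps a slot name to the set of values known for `entity`
    (in the slides, these would come from Wikipedia infoboxes).
    Substring matching stands in for proper entity detection (assumption).
    """
    value_to_slot = {v: slot for slot, values in kb_facts.items() for v in values}
    examples = []
    for sentence in sentences:
        if entity not in sentence:
            continue  # only sentences mentioning the entity are useful
        for cand in candidates:
            if cand in sentence:
                # Positive if the candidate is a known slot value, else negative.
                label = value_to_slot.get(cand, "NONE")
                examples.append(Example(sentence, entity, cand, label))
    return examples


sentences = [
    "Warren Buffett began studying at the Wharton School of Finance at the "
    "University of Pennsylvania, but transferred to the University of Nebraska "
    "where he graduated.",
    "Warren Buffett lives in Omaha.",  # made-up sentence for a negative example
]
kb_facts = {
    "per:schools_attended": {"University of Pennsylvania", "University of Nebraska"}
}
candidates = ["University of Pennsylvania", "University of Nebraska", "Omaha"]

examples = label_sentences(sentences, "Warren Buffett", kb_facts, candidates)
for ex in examples:
    print(ex.label, "|", ex.candidate)
```

Labels produced this way are noisy: a sentence may mention both the entity and a slot value without expressing the relation, which is why data quality appears among the challenges.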
Training
• Infobox KB
• Map infobox fields to KBP slots (one-to-many mapping)
• IR: find relevant sentences (query: entity name + slot value)
• Extract +/- slot candidates
• Train multiclass classifier
Evaluation
• KBP query: entity name
• IR: find relevant sentences (query: entity name + trigger words)
• Extract slot candidates
• Classify candidates
• Inference (greedy, local)
• Extracted slots
Candidate extraction: map KBP slots to fine-grained NE labels
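The slides only name the inference step as "greedy, local" and give no details. The sketch below shows one possible reading, under stated assumptions: each slot is decided independently, candidates below a threshold are dropped, and single-valued slots keep only their top candidate. The function name, the threshold, the `SINGLE_VALUED` set, and the scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical subset of slots that accept only one value.
SINGLE_VALUED = {"per:date_of_birth"}


def greedy_local_inference(scored, threshold=0.5):
    """Pick slot values from classifier output, one slot at a time.

    scored is a list of (slot, value, probability) triples.
    Returns a dict mapping each slot to its accepted values, best first.
    """
    best = defaultdict(dict)
    for slot, value, prob in scored:
        if slot == "NONE" or prob < threshold:
            continue  # discard negatives and low-confidence candidates
        # Keep the highest score seen for each distinct value.
        best[slot][value] = max(prob, best[slot].get(value, 0.0))

    output = {}
    for slot, values in best.items():
        ranked = sorted(values, key=values.get, reverse=True)
        output[slot] = ranked[:1] if slot in SINGLE_VALUED else ranked
    return output


# Illustrative scores only; they are not results from the system.
scored = [
    ("per:schools_attended", "University of Pennsylvania", 0.9),
    ("per:schools_attended", "University of Nebraska", 0.8),
    ("per:schools_attended", "Omaha", 0.2),
    ("per:date_of_birth", "1930", 0.7),
    ("per:date_of_birth", "1931", 0.6),
]
slots = greedy_local_inference(scored)
print(slots)
```

Because every slot is resolved on its own, this kind of inference cannot use constraints across slots, which is what "local" is taken to mean here.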
Results
• Training on 2/3 of infoboxes, evaluating on 1/3
• Evaluating only on sentences that contain at least one valid slot
[Results table not preserved: rows for the top 10 most common slots and the total for all slots]
Challenges
• Improve the quality of the data generated through distant supervision
• Improve IR recall
• Use relation-specific trigger words (or n-grams, dependency paths, etc.) to rank sentences likely to contain answers at the top
  • How can these be acquired automatically?
• Better classifiers for noisy text (e.g., web snippets)
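The trigger-word idea above can be sketched as a simple re-ranking step. This is an illustration under assumptions, not the authors' method: the `rank_by_triggers` helper, the trigger list for per:schools_attended, and the example sentences are hypothetical, and a real system would more likely fold such a score into the IR query.

```python
import re


def rank_by_triggers(sentences, triggers):
    """Sort sentences so those with more trigger-word hits come first."""
    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(1 for tok in tokens if tok in triggers)

    # sorted() is stable, so sentences with equal scores keep their order.
    return sorted(sentences, key=score, reverse=True)


# Hypothetical triggers for per:schools_attended.
triggers = {"graduated", "studied", "attended"}
retrieved = [
    "Warren Buffett lives in Omaha.",
    "Warren Buffett graduated from the University of Nebraska.",
]
ranked = rank_by_triggers(retrieved, triggers)
print(ranked[0])
```

How to obtain the trigger lists automatically is left open, as the slide notes.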