Coupling Semi-Supervised Learning of Categories and Relations
Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr., and Tom M. Mitchell
Carnegie Mellon University
CS 652, Peter Lindes
The Problem
“We present an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors.”
Predefined Categories
• Unary predicates (instances are noun phrases)
• Mutually exclusive relationships
• Some subset relationships
• Flag: proper nouns, common nouns, or both
• 10-20 seed instances
• 5 seed patterns (automatically derived - Hearst, 1992)
Predefined Relations
• Binary predicates (an instance is a pair of noun phrases)
• Mutually exclusive relationships
• 10-20 seed instances
• No seed patterns
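The predicate definitions above can be pictured as a small data structure. This is a minimal sketch, not the authors' representation: the class names, field names, the `has_expected_seed_count` helper, and the example "city" category with its seeds and patterns are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Category:
    """Unary predicate: each instance is a single noun phrase."""
    name: str
    noun_flag: str                      # "proper", "common", or "both"
    seed_instances: FrozenSet[str]      # 10-20 seeds per the slide
    seed_patterns: FrozenSet[str]       # 5 seed patterns per the slide
    mutually_exclusive_with: FrozenSet[str] = frozenset()
    subset_of: Optional[str] = None     # name of a parent category, if any


@dataclass(frozen=True)
class Relation:
    """Binary predicate: each instance is a pair of noun phrases."""
    name: str
    seed_instances: FrozenSet[Tuple[str, str]]   # 10-20 seed pairs
    mutually_exclusive_with: FrozenSet[str] = frozenset()
    # No seed patterns: relations start from seed instances only.


def has_expected_seed_count(predicate, low=10, high=20):
    """Check that a predicate carries between `low` and `high` seeds."""
    return low <= len(predicate.seed_instances) <= high


# Hypothetical example category; names and seeds are illustrative only.
city = Category(
    name="city",
    noun_flag="proper",
    seed_instances=frozenset({
        "Pittsburgh", "Boston", "Chicago", "Seattle", "Denver",
        "Austin", "Paris", "London", "Tokyo", "Cairo",
    }),
    seed_patterns=frozenset({
        "cities such as _", "such cities as _", "cities including _",
        "_ and other cities", "_ or other cities",
    }),
    mutually_exclusive_with=frozenset({"company"}),
)
```

Freezing the dataclasses keeps the predefined ontology immutable, so anything learned later would live in separate structures rather than overwrite the seeds.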
The Predicates
• Taken from “a 200-million page web crawl”
• Filtered for English “using a stop word ratio threshold”
• Filtered out web spam and adult content “using a ‘bad word’ list”
• Segmented, tokenized, and tagged
• Noisy sentences filtered out
• 514 million sentences used for the experiment
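The English filter in the list above can be sketched as follows. The slides only say that a stop word ratio threshold was used; the stop word list, the whitespace tokenization, and the 0.2 threshold here are assumptions for illustration, not values from the paper.

```python
# Assumed, deliberately tiny stop word list; a real filter would use a fuller one.
STOP_WORDS = frozenset({
    "the", "a", "an", "of", "and", "to", "in", "is", "that",
    "it", "for", "on", "with", "as", "was",
})


def stop_word_ratio(text):
    """Fraction of whitespace-separated tokens that are English stop words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in STOP_WORDS for token in tokens) / len(tokens)


def looks_english(text, threshold=0.2):
    """Keep text whose stop word ratio meets the (assumed) threshold."""
    return stop_word_ratio(text) >= threshold
```

For example, `looks_english("the cat is on the mat")` passes because four of its six tokens are stop words, while a Spanish sentence with no tokens in the list is rejected.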
Evaluation
• 3 Questions:
  • “Can CBL iterate many times and still achieve high precision?”
  • “How helpful are the types of coupling that we employ?”
  • “Can we extend existing semantic resources?”
• 3 Configurations
  • Full
  • NS: no sharing of promoted items, seeds shared
  • NCR: no type checking
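To make the configurations concrete, here is a rough sketch of how the two kinds of coupling might filter a candidate before promotion. It is an illustration under stated assumptions, not the CBL implementation: the function name, the dictionary layout, the predicate names, and the strict reading of type checking (both arguments must already be known instances of the expected categories) are all hypothetical. Under these assumptions, passing `type_check=False` loosely corresponds to NCR, and passing only the seeds as `known` loosely corresponds to NS.

```python
def is_allowed(predicate, instance, known, mutex, arg_types, type_check=True):
    """Return True if `instance` may be promoted for `predicate`.

    known:     dict mapping predicate name -> set of accepted instances
    mutex:     dict mapping predicate name -> names it is mutually exclusive with
    arg_types: dict mapping relation name -> (category of arg 1, category of arg 2)
    """
    # Mutual exclusion: reject an instance already accepted by an exclusive predicate.
    for other in mutex.get(predicate, ()):
        if instance in known.get(other, set()):
            return False
    # Type checking (relations only): each argument must be a known
    # instance of the category expected in that position.
    if type_check and predicate in arg_types:
        for argument, category in zip(instance, arg_types[predicate]):
            if argument not in known.get(category, set()):
                return False
    return True


# Hypothetical toy ontology, for illustration only.
known = {"city": {"Pittsburgh"}, "company": {"Google"}}
mutex = {"city": {"company"}, "company": {"city"}}
arg_types = {"headquartered_in": ("company", "city")}
```

With this toy data, `is_allowed("city", "Google", known, mutex, arg_types)` is rejected by mutual exclusion, and the swapped pair `("Pittsburgh", "Google")` is rejected for `headquartered_in` unless type checking is switched off.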
Results - Precision
(Panels: Categories, Relations)
Precision estimated by human judging of correctness for 30 samples of each predicate.
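The sampling-based estimate described above amounts to simple arithmetic, sketched below. The function names and the fixed random seed are assumptions made for reproducibility of the illustration; the slides do not say how the 30 samples were drawn.

```python
import random


def sample_for_judging(instances, k=30, seed=0):
    """Draw up to k promoted instances of one predicate for human judging."""
    items = sorted(instances)           # fixed order so the draw is repeatable
    if len(items) <= k:
        return items
    return random.Random(seed).sample(items, k)


def estimated_precision(judgments):
    """Fraction of judged samples marked correct (True)."""
    if not judgments:
        raise ValueError("no judgments to estimate precision from")
    return sum(judgments) / len(judgments)
```

If a judge marks 27 of the 30 sampled instances as correct, the estimated precision for that predicate is 27/30, or 90%.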
Results - Recall
Promoted categories and relations – 15 iterations
“At this stage of development, obtaining high recall is not a priority … it is our hope that high recall will come with time.”
Example Extracted Facts
“We have presented a method of coupling the semi-supervised learning of categories and relations and demonstrated empirically that the coupling forestalls the problem of semantic drift associated with bootstrap learning methods.”
Comparison to Freebase
“… our methods can contribute new facts to existing resources.”