
Coupling Semi-Supervised Learning of Categories and Relations

Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr., and Tom M. Mitchell, Carnegie Mellon University.


Presentation Transcript


  1. Coupling Semi-Supervised Learning of Categories and Relations. Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr., and Tom M. Mitchell, Carnegie Mellon University. CS 652, Peter Lindes

  2. The Problem “We present an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors.”

  3. CS 652, Peter Lindes

  4. Predefined Categories
     • Unary predicates (instances are noun phrases)
     • Mutually exclusive relationships
     • Some subset relationships
     • Flag: proper nouns, common nouns, or both
     • 10-20 seed instances
     • 5 seed patterns (automatically derived; Hearst, 1992)
     Predefined Relations
     • Binary predicates (an instance is a pair of noun phrases)
     • Mutually exclusive relationships
     • 10-20 seed instances
     • No seed patterns
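The predicate definitions on this slide can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names, field names, example seeds, and the `violates_mutual_exclusion` helper are all hypothetical, and the example uses fewer seeds and patterns than the 10-20 instances and 5 patterns the slide mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Unary predicate: instances are noun phrases (hypothetical layout)."""
    name: str
    seed_instances: set
    seed_patterns: list
    noun_type: str = "both"  # "proper", "common", or "both"
    mutually_exclusive_with: set = field(default_factory=set)
    subset_of: set = field(default_factory=set)


@dataclass
class Relation:
    """Binary predicate: instances are pairs of noun phrases, no seed patterns."""
    name: str
    seed_instances: set
    mutually_exclusive_with: set = field(default_factory=set)


def violates_mutual_exclusion(noun_phrase, category, promoted):
    """Return True if noun_phrase already belongs to a category declared
    mutually exclusive with `category`. `promoted` maps category name
    to the set of instances accepted so far."""
    return any(
        noun_phrase in promoted.get(other, set())
        for other in category.mutually_exclusive_with
    )


# Illustrative data only; these seeds are assumptions, not from the slides.
city = Category(
    name="city",
    seed_instances={"Pittsburgh", "Boston"},
    seed_patterns=["cities such as X"],
    noun_type="proper",
    mutually_exclusive_with={"company"},
)
company = Category(
    name="company",
    seed_instances={"Google", "IBM"},
    seed_patterns=["companies such as X"],
    noun_type="proper",
    mutually_exclusive_with={"city"},
)
headquartered_in = Relation(
    name="companyHeadquarteredInCity",
    seed_instances={("Google", "Mountain View")},
)

promoted = {c.name: set(c.seed_instances) for c in (city, company)}
```

With this layout, a candidate such as "Google" would be rejected as a city because it is already a known company, which is one way the mutual exclusion constraint could couple the two extractors.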

  5. The Predicates

  6. • Taken from “a 200-million page web crawl”
     • Filtered for English “using a stop word ratio threshold”
     • Filtered out web spam and adult content “using a ‘bad word’ list”
     • Segmented, tokenized, and tagged
     • Noisy sentences filtered out
     • 514 million sentences used for the experiment
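The slide does not say how the stop word ratio filter was implemented. One plausible reading is sketched below: a text is kept as English if the fraction of its tokens that are common English stop words reaches a threshold. The stop word list, the whitespace tokenization, and the 0.2 threshold are all assumptions made for illustration.

```python
# Tiny illustrative stop word list; a real filter would use a fuller one.
STOP_WORDS = frozenset(
    {"the", "of", "and", "a", "to", "in", "is", "that", "it", "for"}
)


def stop_word_ratio(text):
    """Fraction of whitespace-separated tokens that are stop words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in STOP_WORDS for token in tokens) / len(tokens)


def looks_english(text, threshold=0.2):
    """Keep the text if its stop word ratio reaches the (assumed) threshold."""
    return stop_word_ratio(text) >= threshold
```

For example, `looks_english("the cat is in the house")` passes because four of its six tokens are stop words, while a German sentence with no matching tokens does not.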

  7. Evaluation
     3 Questions:
     • “Can CBL iterate many times and still achieve high precision?”
     • “How helpful are the types of coupling that we employ?”
     • “Can we extend existing semantic resources?”
     3 Configurations:
     • Full
     • NS: no sharing of promoted items, seeds shared
     • NCR: no type checking

  8. Results - Precision (shown separately for Categories and Relations). Precision was estimated by human judgment of correctness for 30 samples of each predicate.
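The sampling-based estimate described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function name, the fixed random seed, and the `is_correct` callback (standing in for a human judge) are hypothetical.

```python
import random


def estimate_precision(promoted, is_correct, sample_size=30, seed=0):
    """Estimate precision for one predicate from a random sample.

    promoted:   list of instances the learner promoted for the predicate
    is_correct: callable standing in for the human judgment of one instance
    Returns None when there is nothing to sample.
    """
    items = list(promoted)
    if not items:
        return None
    if len(items) <= sample_size:
        sample = items  # judge everything when there are few instances
    else:
        sample = random.Random(seed).sample(items, sample_size)
    correct = sum(1 for item in sample if is_correct(item))
    return correct / len(sample)
```

With 30 judgments per predicate, each estimate moves in steps of about 3.3 percentage points, so small differences between configurations should be read with that granularity in mind.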

  9. Results - Recall. Promoted categories and relations, 15 iterations. “At this stage of development, obtaining high recall is not a priority … it is our hope that high recall will come with time.”

  10. Example Extracted Facts “We have presented a method of coupling the semi-supervised learning of categories and relations and demonstrated empirically that the coupling forestalls the problem of semantic drift associated with bootstrap learning methods.”

  11. Comparison to Freebase “… our methods can contribute new facts to existing resources.”
