
Exploiting Background Knowledge for Relation Extraction


Presentation Transcript


  1. Exploiting Background Knowledge for Relation Extraction Yee Seng Chan and Dan Roth, University of Illinois at Urbana-Champaign

  2. Relation Extraction • Relation extraction (RE): identify semantic relations between pairs of entity mentions in a sentence • e.g. “David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team” • Supervised RE • Train on sentences annotated with entity mentions and predefined target relations • Common features: BOW, POS tags, syntactic/dependency parses, kernel functions based on structured representations of the sentence

  3. Background Knowledge • Features employed are usually restricted to being defined on the various representations of the target sentences • Humans rely on background knowledge to recognize relations • Overall aim of this work • Propose methods of using knowledge or resources that exist beyond the sentence • Wikipedia, word clusters, hierarchy of relations, entity type constraints, coreference • As additional features, or under the Constraint Conditional Model (CCM) framework with Integer Linear Programming (ILP)

  4. Using Background Knowledge David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team

  5. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge

  6. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge

  7. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge

  8. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge David Brian Cone (born January 2, 1963) is a former Major League Baseball pitcher. He compiled an 8–3 postseason record over 21 postseason starts and was a part of five World Series championship teams (1992 with the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with the New York Yankees). He had a career postseason ERA of 3.80. He is the subject of the book A Pitcher's Story: Innings With David Cone by Roger Angell. Fans of David are known as "Cone-Heads." Cone lives in Stamford, Connecticut, and is formerly a color commentator for the Yankees on the YES Network. Partly because of the resulting lack of leadership, after the 1994 season the Royals decided to reduce payroll by trading pitcher David Cone and outfielder Brian McRae, then continued their salary dump in the 1995 season. In fact, the team payroll, which was always among the league's highest, was sliced in half from $40.5 million in 1994 (fourth-highest in the major leagues) to $18.5 million in 1996 (second-lowest in the major leagues)

  9. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge

  10. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge

  11. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge

  12. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Using Background Knowledge 0.55

  13. Basic Relation Extraction (RE) System • Our BasicRE system • Given a sentence “... m1 ... m2 ...”, predict whether any predefined relation holds • Asymmetric relations, e.g. m1:r:m2 vs m2:r:m1

  14. Basic Relation Extraction (RE) System • Our BasicRE system • Given a sentence “... m1 ... m2 ...”, predict whether any predefined relation holds • Asymmetric relations, e.g. m1:r:m2 vs m2:r:m1
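As a concrete illustration of this label scheme, here is a minimal sketch (an assumption about how the labels are laid out, not the authors' code): each asymmetric relation type r is split into two directed labels plus one shared null label.

```python
# Sketch of the directed label space: each asymmetric relation type r yields
# two labels, m1:r:m2 and m2:r:m1, plus one null label for unrelated pairs.
def build_label_set(relation_types):
    labels = ["null"]
    for r in relation_types:
        labels.append(f"m1:{r}:m2")
        labels.append(f"m2:{r}:m1")
    return labels

print(build_label_set(["personal", "employment"]))
# ['null', 'm1:personal:m2', 'm2:personal:m1', 'm1:employment:m2', 'm2:employment:m1']
```

This is also where the label counts quoted later come from: N relation types give 2N + 1 labels.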

  15. Basic Relation Extraction (RE) System • Most of the features are based on the work of Zhou et al. (2005) • Lexical: head word (hw), BOW, bigrams, ... • Collocations: words to the left/right of the mentions, ... • Structural: m1-in-m2, #mentions between m1,m2, ... • Entity typing: m1,m2 entity-type, ... • Dependency: dep-path between m1,m2, ...
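To make the feature-based setup concrete, here is a minimal illustrative sketch (not the authors' implementation; the function and feature names are hypothetical) of extracting a few such mention-pair features:

```python
# Minimal sketch of common mention-pair features for supervised RE
# (hypothetical names; not the BasicRE system itself).
def extract_features(tokens, m1_span, m2_span, m1_type, m2_type):
    # m*_span are (start, end) token offsets; m*_type are coarse entity types
    m1_end, m2_start, m2_end = m1_span[1], m2_span[0], m2_span[1]
    feats = []
    # lexical: bag of words between the two mentions
    feats += [f"bow_between={w.lower()}" for w in tokens[m1_end:m2_start]]
    # lexical: head words of the mentions (approximated here by the last token of each span)
    feats.append(f"hw_m1={tokens[m1_end - 1].lower()}")
    feats.append(f"hw_m2={tokens[m2_end - 1].lower()}")
    # structural: number of tokens between the pair
    feats.append(f"num_between={m2_start - m1_end}")
    # entity typing: coarse-grained entity types of the pair
    feats.append(f"etypes={m1_type}_{m2_type}")
    return feats

tokens = ("David Cone , a Kansas City native , was originally signed "
          "by the Royals and broke into the majors with the team").split()
print(extract_features(tokens, (0, 2), (13, 14), "PER", "ORG"))
```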

  16. Knowledge Sources • As additional features • Wikipedia • Word clusters • As constraints • Hierarchy of relations • Entity type constraints • Coreference

  17. Knowledge1: Wikipedia1 (as additional feature) r ? mi mj • We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages • Introduce a new feature based on the output of the Wikifier for mi and mj • Introduce a second feature by combining the above with the coarse-grained entity types of mi, mj

  18. Knowledge1: Wikipedia2 (as additional feature) parent-child? mi mj • Given mi, mj, we use a Parent-Child system (Do and Roth, 2010) to predict whether they have a parent-child relation • Introduce a new feature based on this prediction • Combine the above with the coarse-grained entity types of mi, mj to form a second feature
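A sketch of the general pattern behind both Wikipedia features: a Wikipedia-derived signal is added as a feature on its own and again conjoined with the coarse-grained entity types of the pair. The predictor interface and feature names below are hypothetical stand-ins for the Wikifier (Ratinov et al., 2010) and the Parent-Child system (Do and Roth, 2010).

```python
# Hypothetical interface; not the actual Wikifier / Parent-Child systems.
def wiki_features(mi, mj, etype_i, etype_j, parent_child_predict):
    # parent_child_predict(mi, mj) is assumed to return True or False
    is_parent_child = parent_child_predict(mi, mj)
    feats = [f"wiki_parent_child={is_parent_child}"]
    # conjoin the Wikipedia-derived signal with the coarse-grained entity types
    feats.append(f"wiki_parent_child={is_parent_child}_{etype_i}_{etype_j}")
    return feats

# usage with a stub predictor
print(wiki_features("David Cone", "Kansas City", "PER", "GPE", lambda a, b: False))
# ['wiki_parent_child=False', 'wiki_parent_child=False_PER_GPE']
```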

  19.–21. Knowledge2: Word Class Information (as additional feature) [Figure animation: a binary Brown-cluster tree whose leaves are the words Apple, apple, pear, IBM, bought, run, of, in; following the 0/1 edges from the root gives each word a bit string, e.g. 011] • Supervised systems face an issue of data sparseness (of lexical features) • Use class information of words to support better generalization: instantiated as word clusters in our work • Automatically generated from unlabeled texts using the algorithm of (Brown et al., 1992)

  22. Knowledge2: Word Class Information [Figure: the cluster tree annotated with the bit-string prefixes 01, 10, 00, 11] • All lexical features consisting of single words will be duplicated with their corresponding bit-string representations
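A minimal sketch of that duplication step (the cluster bit strings below are made up; real ones come from running the Brown et al. (1992) algorithm on unlabeled text):

```python
# Toy cluster map; real bit strings come from Brown clustering of unlabeled text.
brown_clusters = {
    "apple": "0111", "ibm": "0110", "pear": "0101",
    "bought": "1001", "run": "1000",
}

def add_cluster_features(lexical_feats, prefix_len=4):
    """Duplicate every single-word lexical feature with a bit-string prefix version."""
    out = list(lexical_feats)
    for f in lexical_feats:
        name, _, word = f.partition("=")
        bits = brown_clusters.get(word.lower())
        if bits is not None:
            out.append(f"{name}_cluster={bits[:prefix_len]}")
    return out

print(add_cluster_features(["hw_m1=IBM", "bow_between=bought"]))
# ['hw_m1=IBM', 'bow_between=bought', 'hw_m1_cluster=0110', 'bow_between_cluster=1001']
```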

  23. Knowledge Sources • As additional features • Wikipedia • Word clusters • As constraints • Hierarchy of relations • Entity type constraints • Coreference

  24. Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008) [Formula callouts: weight vector for “local” models; collection of classifiers]

  25. Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008) [Formula callouts: weight vector for “local” models; collection of classifiers; penalty for violating the constraint; how far y is from a “legal” assignment]
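The objective function that these callouts annotate is lost in the transcript; in the notation commonly used in the cited papers, the CCM inference problem has roughly the form

\[
\hat{y} \;=\; \arg\max_{y} \; \mathbf{w} \cdot \phi(x, y) \;-\; \sum_{i} \rho_i \, d_{C_i}(y),
\]

where \(\mathbf{w} \cdot \phi(x, y)\) is the score of assignment \(y\) under the collection of local classifiers (with weight vector \(\mathbf{w}\)), \(d_{C_i}(y)\) measures how far \(y\) is from a “legal” assignment with respect to constraint \(C_i\), and \(\rho_i\) is the penalty for violating that constraint.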

  26. Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008) • hierarchy of relations • entity type constraints • coreference • Wikipedia • word clusters

  27. Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008) • Goal of CCMs: predict multiple variables while exploiting the fact that they are related • Encode knowledge as constraints to exploit the interaction between the multiple predictions • Impose constraints on the predictions of the various local models: this is a global inference problem • We learn separate models and then perform joint global inference to arrive at the final predictions

  28. David Cone , a Kansas City native , was originally signed by the Royals and broke into the majors with the team Constraint Conditional Models (CCMs)

  29. Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008) • Key steps • Write down a linear objective function • Write down constraints as linear inequalities • Solve using integer linear programming (ILP) packages
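A toy end-to-end example of those three steps for a single mention pair, using the pulp ILP package (the scores are made up, and this is only the skeleton of the formulation, not the paper's full model):

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# made-up local-classifier scores for one mention pair
scores = {"null": 0.2, "personal": 0.1, "employment": 0.7}

prob = LpProblem("relation_assignment", LpMaximize)
x = {r: LpVariable(f"x_{r}", cat="Binary") for r in scores}

# 1) linear objective: total score of the selected labels
prob += lpSum(scores[r] * x[r] for r in scores)
# 2) constraints as linear inequalities: exactly one label per mention pair
prob += lpSum(x[r] for r in scores) == 1
# 3) solve with an ILP solver
prob.solve()

print([r for r in scores if x[r].value() == 1])  # ['employment']
```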

  30. Knowledge3: Relations between our target relations [Figure: two-level hierarchy of target relations, with coarse-grained labels such as personal and employment and fine-grained children such as family, biz, executive, staff]

  31. Knowledge3: Hierarchy of Relations [Figure: a coarse-grained classifier predicts over the top level of the hierarchy (personal, employment, ...), while a fine-grained classifier predicts over the leaves (family, biz, executive, staff, ...)]

  32. Knowledge3: Hierarchy of Relations [Figure: for a mention pair mi, mj, both a coarse-grained and a fine-grained label must be predicted over the hierarchy (personal, employment, ... with children family, biz, executive, staff, ...)]

  33.–37. Knowledge3: Hierarchy of Relations [Figure animation: the same hierarchy, with coarse-grained labels and their fine-grained children highlighted step by step to illustrate the correspondence between the two levels]

  38. Knowledge3: Hierarchy of Relations • Write down a linear objective function over the coarse-grained and fine-grained prediction probabilities

  39. Knowledge3: Hierarchy of Relations • Write down a linear objective function • The coarse-grained and fine-grained prediction probabilities are paired with coarse-grained and fine-grained indicator variables • Setting an indicator variable == making a relation assignment

  40. Knowledge3: Hierarchy of Relations • Write down constraints • If a relation R is assigned a coarse-grained label rc, then we must also assign to R a fine-grained relation rf which is a child of rc. • (Capturing the inverse relationship) If we assign rf to R, then we must also assign to R the parent of rf, which is a corresponding coarse-grained label
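One standard way to write these two constraints as linear inequalities over the binary indicator variables (the paper's exact encoding may differ) is, for every coarse-grained label \(r_c\) with fine-grained children \(\mathrm{ch}(r_c)\):

\[
x_{r_c} \;\le\; \sum_{r_f \in \mathrm{ch}(r_c)} x_{r_f},
\qquad
x_{r_f} \;\le\; x_{r_c} \quad \text{for each } r_f \in \mathrm{ch}(r_c),
\]

where \(x_{r} \in \{0, 1\}\) indicates whether relation label \(r\) is assigned to R. The first inequality forces some child \(r_f\) to be selected whenever \(r_c\) is, and the second forces the parent \(r_c\) to be selected whenever one of its children is.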

  41. Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007) [Figure: candidate relation labels for a mention pair mi, mj: Employment:Staff, Employment:Executive, Personal:Family, Personal:Business, Affiliation:Citizen, Affiliation:Based-in] Entity types are useful for constraining the possible labels that a relation R can assume

  42. Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007) [Figure: the same candidate labels, now annotated with the argument entity types (per, org, gpe) that each relation allows for mi and mj] Entity types are useful for constraining the possible labels that a relation R can assume

  43. Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007) [Figure: the candidate labels annotated with their allowed argument entity types (per, org, gpe)] • We gather information on entity type constraints from the ACE-2004 documentation and impose them on the coarse-grained relations • By improving the coarse-grained predictions and combining with the hierarchical constraints defined earlier, the improvements propagate to the fine-grained predictions
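A small sketch of how such constraints can be applied; the ALLOWED table below is a made-up fragment standing in for the constraints gathered from the ACE-2004 documentation:

```python
# Made-up fragment of an entity-type constraint table (relation -> allowed
# (arg1, arg2) entity-type pairs); the real table comes from the ACE-2004 docs.
ALLOWED = {
    "Employment:Staff":    {("per", "org")},
    "Personal:Family":     {("per", "per")},
    "Affiliation:Citizen": {("per", "gpe")},
}

def type_consistent_labels(etype_i, etype_j):
    """Labels that a mention pair with these entity types is allowed to take."""
    ok = [r for r, pairs in ALLOWED.items() if (etype_i, etype_j) in pairs]
    return ok + ["null"]   # null is never ruled out

print(type_consistent_labels("per", "org"))  # ['Employment:Staff', 'null']
```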

  44. Knowledge5: Coreference [Figure: candidate relation labels for a mention pair mi, mj: Employment:Staff, Employment:Executive, Personal:Family, Personal:Business, Affiliation:Citizen, Affiliation:Based-in]

  45. Knowledge5: Coreference [Figure: when mi and mj corefer, the only label available to the pair is null] • In this work, we assume that we are given the coreference information, which is available from the ACE annotation.
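A minimal sketch of the coreference constraint, assuming (as on the slide) that gold coreference is given; masking the scores of non-null labels is just one simple way to enforce it:

```python
# If two mentions belong to the same entity, force the relation to null by
# masking the scores of all non-null labels.
def apply_coref_constraint(label_scores, same_entity):
    if not same_entity:
        return label_scores
    return {r: (s if r == "null" else float("-inf"))
            for r, s in label_scores.items()}

scores = {"null": 0.1, "Personal:Family": 0.8}
print(apply_coref_constraint(scores, same_entity=True))
# {'null': 0.1, 'Personal:Family': -inf}
```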

  46. Experiments • Used the ACE-2004 dataset for our experiments • Relations do not cross sentence boundaries • We model the argument order (of the mentions) • m1:r:m2 vs m2:r:m1 • Allow a null label prediction when mentions are not related • Classifiers: regularized averaged perceptrons implemented within the SNoW framework (Carlson et al., 1999) • Followed prior work (Jiang and Zhai, 2007) and performed 5-fold cross validation

  47. Performance of the Basic RE System • Build a BasicRE system using only the basic features • Compare against the state-of-the-art feature-based RE system of Jiang and Zhai (2007) • The authors performed their evaluation using undirected coarse-grained relations (7 relation labels + 1 null label) • Evaluation on nwire and bnews corpora of ACE-2004 • Performance (F1%)

  48. Experimental Settings • ACE-2004: 7 coarse-grained and 23 fine-grained relation types • Trained two classifiers: • coarse-grained (15 relation labels: 7 relations × 2 argument orders + null) • fine-grained (47 relation labels: 23 × 2 + null) • Focus on evaluation of fine-grained relations • Use the nwire corpus for our experiments • Two of our knowledge sources (Wiki system, word clusters) assume inputs of mixed-case text • The bnews corpus is in lower-cased text • 28,943 candidate relation instances, of which 2,226 are non-null

  49. Evaluation Settings more realistic

  50. Experimental Settings [Figure: mentions mi, mj, mk in a sentence; the relation r is annotated between one mention pair, while other coreferent mention pairs are left as null] • Evaluate our performance at the entity level • Prior work calculated RE performance at the level of mentions • ACE annotators rarely duplicate a relation link for coreferent mentions • Given a pair of entities, we establish the set of relation types existing between them, based on their mention annotations
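A sketch of this entity-level aggregation (the data layout is assumed, and this is not the official ACE scorer): the relation types between two entities are taken as the union of the non-null labels over all of their mention pairs.

```python
from collections import defaultdict

def entity_level_relations(mention_pair_predictions, entity_of):
    """mention_pair_predictions: {(mi, mj): label}; entity_of: mention -> entity id."""
    rels = defaultdict(set)
    for (mi, mj), label in mention_pair_predictions.items():
        if label != "null":
            rels[(entity_of[mi], entity_of[mj])].add(label)
    return dict(rels)

preds = {("m1", "m2"): "Employment:Staff", ("m1", "m3"): "null"}
entity_of = {"m1": "E1", "m2": "E2", "m3": "E2"}   # m2 and m3 corefer
print(entity_level_relations(preds, entity_of))
# {('E1', 'E2'): {'Employment:Staff'}}
```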
