
Semi-supervised Relation Extraction with Large-scale Word Clustering

Ang Sun, Ralph Grishman, Satoshi Sekine. New York University. June 20, 2011.

Presentation Transcript


  1. NYU. Semi-supervised Relation Extraction with Large-scale Word Clustering. Ang Sun, Ralph Grishman, Satoshi Sekine. New York University. June 20, 2011.

  2. Outline • Task • Problems • Solutions and Experiments • Conclusion

  3. 1. Task • Relation Extraction • Example: "The last U.S. president to visit …", with two entity mentions marked M1 and M2 (M := entity mention) • Is there a relation between M1 and M2? If so, what kind of relation?

  4. 1. Task • Relation Types (ACE 2004)

  5. 2. Problems • Sparsity of lexical features • Word cluster features to the rescue
     • Training instances: US president; US senator; Arkansas governor; Israeli government spokesman; …
     • Training features: HeadOfM2 = president; HeadOfM2 = spokesman; …
     • Testing instances: US ambassador; U.N. spokeswoman; …
     • Testing features: HM2 = ambassador; HM2 = spokeswoman; … (HM2 abbreviates HeadOfM2)
     • Cluster C1 = {president, ambassador, spokesman, spokeswoman}, which yields the shared feature WordClusterHM2 = C1 (abbreviated WC_HM2 = C1)
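
The sparsity argument on this slide can be sketched in a few lines. This is only an illustration, not the authors' feature extractor: the table `WORD_TO_CLUSTER` and the function `head_features` are hypothetical names, and the cluster membership is copied from the slide's cluster C1.

```python
# Hypothetical word-to-cluster table mirroring cluster C1 on the slide.
WORD_TO_CLUSTER = {
    "president": "C1",
    "ambassador": "C1",
    "spokesman": "C1",
    "spokeswoman": "C1",
}


def head_features(head_of_m2):
    """Return the lexical feature for the head of M2, plus a cluster feature if one is known."""
    feats = {f"HM2={head_of_m2}"}
    cluster = WORD_TO_CLUSTER.get(head_of_m2)
    if cluster is not None:
        feats.add(f"WC_HM2={cluster}")
    return feats


train = head_features("president")    # head seen in training
test = head_features("ambassador")    # head seen only at test time
shared = train & test                 # features the two instances have in common
print(shared)  # {'WC_HM2=C1'}
```

The lexical features differ (`HM2=president` vs. `HM2=ambassador`), so only the cluster feature lets a model trained on the first instance generalize to the second.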

  6. 2. Problems • Problem 1: How to choose effective clusters? • The Brown word hierarchy: where to cut?

  7. 2. Problems • Problem 2: Which lexical features should be augmented with clusters to improve generalization accuracy? • Named entity recognition augments every token with its cluster • Does the same hold for relation extraction? • A relation instance consists of LeftContext, M1, MidContext, M2, RightContext: where to generalize?

  8. 3. Solutions and Experiments: 3.1 Cluster Selection • Main idea • Rank each prefix length (from 1 to the length of the longest bit string) by an importance measure • Select a subset of lengths at which to cut the word hierarchy • Typically select 3 or 4 prefix lengths to avoid committing to a single cluster
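
Cutting the hierarchy at a prefix length amounts to truncating each word's Brown bit string. The sketch below assumes made-up bit strings (`BIT_STRINGS`) and a hypothetical feature naming scheme (`WC<n>=<prefix>`); it also assumes that a word whose bit string is shorter than the cut gets no feature at that length.

```python
# Hypothetical Brown-cluster bit strings; real ones come from running the Brown algorithm.
BIT_STRINGS = {
    "president": "110100",
    "ambassador": "110101",
    "apple": "0111",
}


def prefix_features(word, lengths):
    """Return one cluster feature per selected prefix length for which the word has a prefix."""
    bits = BIT_STRINGS.get(word)
    if bits is None:
        return []  # word not covered by the clustering
    # Assumption: a bit string shorter than the cut yields a null (omitted) feature.
    return [f"WC{n}={bits[:n]}" for n in lengths if len(bits) >= n]


print(prefix_features("president", [2, 4, 6]))   # ['WC2=11', 'WC4=1101', 'WC6=110100']
print(prefix_features("ambassador", [2, 4, 6]))  # ['WC2=11', 'WC4=1101', 'WC6=110101']
print(prefix_features("apple", [2, 4, 6]))       # ['WC2=01', 'WC4=0111']
```

With these toy strings the two title words share their length-2 and length-4 features and differ only at length 6, which is why using several lengths avoids committing to one granularity.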

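  9. 3.1 Cluster Selection • Importance measure 1: Information Gain (IG) • $IG(i) = -\sum_{c \in C} p(c) \log p(c) + \sum_{v \in V} p(v) \sum_{c \in C} p(c \mid v) \log p(c \mid v)$ • The first term is the prior entropy of the classes; the second term subtracts the posterior entropy, given the values V of the feature f • c := a relation class; f := the cluster feature with the length i to rank; v := a value of the cluster feature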
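
Information gain as described on this slide (prior entropy of the relation classes minus posterior entropy given the cluster feature's values) could be computed roughly as follows. The function names are hypothetical, the logarithm base (2) is an assumption, and a null cluster feature is simply treated as one more value.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Prior entropy of the classes minus the posterior entropy given the feature values."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    posterior = sum(len(group) / total * entropy(group) for group in by_value.values())
    return entropy(labels) - posterior


# Toy data: the cluster prefix of the M2 head perfectly separates two relation classes.
values = ["11", "11", "01", "01"]
classes = ["EMP-ORG", "EMP-ORG", "PHYS", "PHYS"]
print(information_gain(values, classes))  # 1.0
```

To rank prefix lengths, one would compute this score once per length i, using the cluster feature truncated to i bits as `feature_values`.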

  10. 3.1 Cluster Selection • Importance measure 2: Prefix Coverage (PC) • $PC(i) = \mathrm{Count}(f_c^i) / \mathrm{Count}(f_l)$ • i := length; $f_l$ := lexical feature; $f_c^i$ := non-null cluster feature for the lexical feature; Count(*) := number of occurrences
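
Read as a ratio, prefix coverage at length i is the share of lexical feature occurrences that still receive a non-null cluster feature at that length. A minimal sketch, assuming that "non-null" means the word has a bit string of at least i bits; `prefix_coverage` and the toy data are hypothetical.

```python
def prefix_coverage(head_words, bit_strings, length):
    """Fraction of lexical feature occurrences with a non-null cluster feature at `length`."""
    if not head_words:
        return 0.0
    # Assumption: the cluster feature is null when the word is unclustered
    # or its bit string is shorter than the requested prefix length.
    covered = sum(1 for word in head_words if len(bit_strings.get(word, "")) >= length)
    return covered / len(head_words)


# Toy data: four occurrences of the head-word feature, one of them unclustered.
toy_bits = {"president": "110100", "apple": "0111"}
heads = ["president", "apple", "president", "unknown"]
print(prefix_coverage(heads, toy_bits, 4))  # 0.75
print(prefix_coverage(heads, toy_bits, 6))  # 0.5
```

Coverage can only stay flat or drop as the length grows, so this measure favors shorter prefixes that apply to more instances.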

  11. 3.1 Cluster Selection • Other measures to compare with • Use All Prefixes (UA): consider every length, hoping that the underlying learning algorithm assigns proper weights • Exhaustive Search (ES): try every possible subset of lengths and pick the one that works best
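
Exhaustive search over length subsets can be sketched with `itertools.combinations`. The `score` callback is hypothetical: in practice it would train a model with the given lengths and return its development-set score, whereas here it is a toy function. Restricting subset sizes to 3 or 4 follows the setup described on the next slide and is an assumption about how ES was bounded.

```python
from itertools import combinations


def exhaustive_search(lengths, score, sizes=(3, 4)):
    """Try every subset of `lengths` with an allowed size and keep the best-scoring one."""
    best_subset, best_score = None, float("-inf")
    for k in sizes:
        for subset in combinations(lengths, k):
            current = score(subset)
            if current > best_score:  # strict comparison keeps the first of any ties
                best_subset, best_score = subset, current
    return best_subset, best_score


def toy_score(subset):
    """Toy stand-in for a development-set score: rewards lengths 4, 6, 10, penalizes size."""
    return len(set(subset) & {4, 6, 10}) - 0.1 * len(subset)


best, best_value = exhaustive_search(range(1, 11), toy_score)
print(best)  # (4, 6, 10)
```

The cost grows combinatorially with the number of candidate lengths, since every subset requires its own training run, which is the motivation for cheaper ranking measures such as IG and PC.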

  12. 3.1 Cluster Selection • Experiment
     • Setup • 348 ACE 2004 bnews and nwire documents • 70 are used for testing; the remaining 278 are split into training and development sets at a ratio of 7:3 • The development set is used to learn the best lengths • Choose only 3 or 4 lengths (to match prior work) • For simplicity, only augment the head of each mention with clusters • Induced 1,000 word clusters on the TDT 5 corpora using the Brown algorithm
     • Baseline • Feature-based MaxEnt classification model • A large feature set: the full set from Zhou et al. (2005); cherry-picked effective features from Zhao and Grishman (2005), Jiang and Zhai (2007) and others

  13. 3.1 Cluster Selection • Experiment • Effectiveness of cluster selection methods

  14. 3.2 Effectiveness of cluster features • Explore cluster features in a systematic way • Rank each lexical feature according to its importance • Importance is based on linguistic intuition and on performance contributions reported in previous research • Test the effectiveness of each lexical feature when augmented with word clusters, individually and incrementally

  15. 3.2 Effectiveness of cluster features • Importance of lexical features • Simplify an instance into a 3-tuple • M1: Other | Head • M2: Other | Head • Context: Other | Head

  16. 3.2 Effectiveness of cluster features • Experiment • Setup • 5-fold cross-validation • PC4 was used to select effective clusters • Performance
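
The slide does not say how the five folds were formed, so the following is only a generic sketch of 5-fold cross-validation: each item is held out exactly once while the other four folds serve as training data. `k_fold_splits` and the document ids are hypothetical.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs so that every item is held out exactly once."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


docs = [f"doc{n}" for n in range(10)]  # hypothetical document ids
splits = list(k_fold_splits(docs))
print(len(splits))    # 5
print(splits[0][1])   # ['doc0', 'doc5']
```

Scores from the five held-out folds would then be averaged to give the reported performance.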

  17. 3.2 Effectiveness of cluster features • The impact of training size (augmenting mention heads only) • Word cluster features can sometimes reduce the amount of annotation needed

  18. 3.2 Effectiveness of cluster features • Performance of each individual relation class • The five highlighted types share the same entity type, GPE; PER-SOC holds only between PERSON and PERSON • We may say that word clusters can also help to distinguish between ambiguous relation types • No improvement for the PHYS relation? It is just too hard!

  19. 4. Conclusion • Main contributions • Proposed a principled way of choosing clusters at an appropriate level of granularity • Systematically explored the effectiveness of word cluster features for relation extraction • Future work • Extend to phrase clustering (Lin and Wu, 2009) and pattern clustering (Sun and Grishman, 2010)

  20. Thanks!
