
Improved Relation Extraction with Feature-Rich Compositional Embedding Models


Presentation Transcript


  1. Improved Relation Extraction with Feature-Rich Compositional Embedding Models. Mo Yu*, Matt Gormley*, Mark Dredze. EMNLP, September 21, 2015. *Co-first authors

  2. FCM or: How I Learned to Stop Worrying (about Deep Learning) and Love Features. Mo Yu*, Matt Gormley*, Mark Dredze. EMNLP, September 21, 2015. *Co-first authors

  3. Handcrafted Features. p(y|x) ∝ exp(Θ_y · f(x)) [Figure: a parse tree (S → NP VP, with nested NP, VP, ADJP) over a sentence whose PER and LOC mentions stand in the born-in relation; the features f(x) are read off this structure.]

  4. Where do features come from? [Figure: a landscape spanning feature engineering and feature learning; hand-crafted features (Zhou et al., 2005; Sun et al., 2011) sit at the feature-engineering end.]

  5. Where do features come from? Word embeddings (Mikolov et al., 2013) come from unsupervised learning: similar words get similar embeddings (e.g., cat and dog). [Figure: the CBOW model of Mikolov et al. (2013): a look-up table maps input context words to embeddings, and a classifier predicts the missing word. The landscape gains word embeddings at the feature-learning end.]

  6. Where do features come from? String embeddings compose word embeddings over a word sequence such as "The [movie] showed [wars]": Convolutional Neural Networks with pooling (Collobert & Weston, 2008) and Recursive Auto-Encoders (Socher, 2011). [Figure: the landscape adds string embeddings (CNN, RAE) above word embeddings.]

  7. Where do features come from? Tree embeddings compose embeddings along a parse (S → NP VP), with composition matrices keyed by syntactic categories (W_{NP,VP}, W_{DT,NN}, W_{V,NN}): Socher et al., 2013; Hermann & Blunsom, 2013. [Figure: the landscape adds tree embeddings.]

  8. Where do features come from? Word embedding features refine embedding features with semantic/syntactic information (Koo et al., 2008; Turian et al., 2010; Hermann et al., 2014), moving back toward feature engineering. [Figure: the landscape adds word embedding features between the two ends.]

  9. Where do features come from? [Figure: our model (FCM) sits at the intersection of feature engineering and feature learning, alongside hand-crafted features, word embedding features, and word/string/tree embeddings.]

  10. Feature-rich Compositional Embedding Model (FCM). Goals for our model: • Incorporate semantic/syntactic structural information • Incorporate word meaning • Bridge the gap between feature engineering and feature learning – but remain as simple as possible

  11. Feature-rich Compositional Embedding Model (FCM). Per-word features: [Figure: one feature vector f_1, …, f_6 per word of "The [movie]_M1 I watched depicted [hope]_M2"; each word carries a class tag such as noun-other, noun-person, verb-percep., verb-comm., or nil.]

  12. Feature-rich Compositional Embedding Model (FCM). Per-word features: [Figure: the same sentence, zooming in on f_5, the feature vector for the word "depicted".]

  13. Feature-rich Compositional Embedding Model (FCM). Per-word features (with conjunction): [Figure: f_5 with each binary feature conjoined with the word "depicted" itself.]

  14. Feature-rich Compositional Embedding Model (FCM). Per-word features (with soft conjunction): [Figure: the hard conjunction is replaced by an outer product of f_5 with the embedding e_depicted.]

  15. Feature-rich Compositional Embedding Model (FCM). Per-word features (with soft conjunction): [Figure: the resulting outer-product matrix f_5 ⊗ e_depicted.]

  16. Feature-rich Compositional Embedding Model (FCM). Our full model sums over each word in the sentence, takes the dot-product with a parameter tensor, and finally exponentiates and renormalizes: p(y|x) ∝ exp( Σ_{i=1}^{n} f_i^⊤ T_y e_{w_i} )
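
A minimal sketch of this scoring function in numpy (our own shapes and names, not the authors' released code), assuming dense arrays for the features, embeddings, and parameter tensor:

```python
import numpy as np

def fcm_probs(F, E, T):
    """p(y|x) for one sentence under the FCM.

    F: (n, d_f) binary feature vector f_i for each of the n words
    E: (n, d_e) word embedding e_{w_i} for each word
    T: (L, d_f, d_e) parameter tensor, one slice T_y per label y
    """
    # Sum over words of f_i^T T_y e_{w_i}: the outer product
    # f_i (x) e_{w_i} contracted against each label's tensor slice.
    scores = np.einsum('nf,lfe,ne->l', F, T, E)
    # Exponentiate and renormalize (a softmax over the L labels).
    scores -= scores.max()
    p = np.exp(scores)
    return p / p.sum()
```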

  17. Features for FCM • Let M1 and M2 denote the left and right entity mentions • Our per-word binary features (a minimal sketch follows this list): • head of M1 • head of M2 • in-between M1 and M2 • -2, -1, +1, or +2 of M1 • -2, -1, +1, or +2 of M2 • on dependency path between M1 and M2 • Optionally: add the entity type of M1, M2, or both
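
A sketch of these per-word features; the string feature names and the dependency-path representation are our assumptions:

```python
def per_word_features(i, m1, m2, dep_path):
    """Binary feature names that fire for word i.

    i, m1, m2: word positions; m1/m2 are the head words of M1/M2.
    dep_path: set of positions on the dependency path between M1 and M2.
    """
    feats = []
    if i == m1:
        feats.append('head-of-M1')
    if i == m2:
        feats.append('head-of-M2')
    if min(m1, m2) < i < max(m1, m2):
        feats.append('between-M1-M2')
    if i - m1 in (-2, -1, 1, 2):
        feats.append('M1%+d' % (i - m1))  # e.g. 'M1+2'
    if i - m2 in (-2, -1, 1, 2):
        feats.append('M2%+d' % (i - m2))
    if i in dep_path:
        feats.append('on-dep-path')
    return feats or ['nil']
```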

  18. FCM as a Neural Network • Embeddings are (optionally) treated as model parameters • A log-bilinear model • We initialize with pre-trained embeddings, then fine-tune them (see the gradient sketch below) [Figure: the FCM equation annotated with its parts: parameter tensor T_y, binary features f_i, embeddings e_{w_i}.]
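
Why fine-tuning is cheap here: the score s_y = Σ_i f_i^⊤ T_y e_{w_i} is bilinear, so the gradient of each label's score with respect to an embedding has a closed form. A sketch in our notation (the full log-likelihood gradient also subtracts the model's expectation of this quantity over labels):

```python
import numpy as np

def score_grad_wrt_embedding(f_i, T_y):
    """d s_y / d e_{w_i} = T_y^T f_i.

    f_i: (d_f,) binary features for word i; T_y: (d_f, d_e) slice of T.
    """
    return T_y.T @ f_i
```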

  19. Baseline Model • Multinomial logistic regression (standard approach; sketched below) • Bring in all the usual binary NLP features (Sun et al., 2011): • type of the left entity mention • dependency path between mentions • bag of words in right mention • … [Figure: a factor graph with one relation variable Y_{i,j} per mention pair (i, j).]
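
For contrast with the FCM, a minimal sketch of the baseline: plain multinomial logistic regression over one sentence-level binary feature vector (feature extraction not shown):

```python
import numpy as np

def baseline_probs(f, Theta):
    """f: (d,) binary feature vector; Theta: (L, d) per-label weights."""
    scores = Theta @ f       # Theta_y . f(x) for every label y
    scores -= scores.max()   # stabilize the softmax
    p = np.exp(scores)
    return p / p.sum()
```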

  20. Hybrid Model: Baseline + FCM. Product of experts (sketched below): p(y|x) = (1/Z(x)) · p_Baseline(y|x) · p_FCM(y|x) [Figure: the relation variable Y_{i,j} is shared by both models.]
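
A sketch of the combination, reusing the two scoring functions above: each expert yields a distribution over the L relation labels, and their elementwise product is renormalized by Z(x):

```python
def hybrid_probs(p_baseline, p_fcm):
    """Product of experts over two (L,) probability vectors."""
    joint = p_baseline * p_fcm   # unnormalized product
    return joint / joint.sum()   # divide by Z(x)
```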

  21. Experimental Setup
  • ACE 2005
  • Data: 6 domains: Newswire (nw), Broadcast Conversation (bc), Broadcast News (bn), Telephone Speech (cts), Usenet Newsgroups (un), Weblogs (wl)
  • Train: bn+nw (~3600 relations); Dev: ½ of bc; Test: ½ of bc, cts, wl
  • Metric: Micro F1 (given entity mentions)
  • SemEval-2010 Task 8
  • Data: Web text
  • Train/Dev/Test: standard split from the shared task
  • Metric: Macro F1 (given entity boundaries)

  22. ACE 2005 Results

  23. SemEval-2010 Results Best in SemEval-2010 Shared Task

  24. SemEval-2010 Results Best in SemEval-2010 Shared Task

  25. SemEval-2010 Results

  26. SemEval-2010 Results

  27. SemEval-2010 Results

  28. SemEval-2010 Results

  29. Takeaways. FCM bridges the gap between feature engineering and feature learning. If you are allergic to deep learning: • Try the FCM for your task: it is simple, easy to implement, and effective on two relation extraction benchmarks. If you are a deep learning expert: • Inject the FCM (i.e., the outer product of features and embeddings) into your fancy deep network.

  30. Questions? Two open source implementations: • Java (within the Pacaya framework): https://github.com/mgormley/pacaya • C++ (from our NAACL 2015 paper on LRFCM): https://github.com/Gorov/ERE_RE
