1 / 31

Unsupervised Ontology Induction From Text

Unsupervised Ontology Induction From Text. Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with Pedro Domingos). Extracting Knowledge From Text. ……. Extracting Knowledge From Text. ……. Extracting Knowledge From Text.

cardea
Download Presentation

Unsupervised Ontology Induction From Text

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Unsupervised Ontology Induction From Text Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with Pedro Domingos)

  2. Extracting Knowledge From Text ……

  3. Extracting Knowledge From Text ……

  4. Extracting Knowledge From Text • Wanted: Automatic, end-to-end solution • Manual engineering: Costly and limited • Supervised learning • Bottleneck: Labeled examples • Infeasible for large-scale, open-domain knowledge extraction

  5. Unsupervised Learning for Knowledge Extraction • TextRunner [Banko et al. 2007] • State-of-the-art open information extraction • Only extracts triples • Extractions are largely unstructured and noisy • USP [Poon & Domingos 2009] • Form complete, detailed meaning representation • More robust to noise • Still limited to extractions with substantial evidence • Lacks ontological structures

  6. Why Ontology? • Compact representation and efficient reasoning [Staab & Studer 2004] • Better generalization Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells. REGULATE ISA Q: What does IL-2 regulate? A: The DEX-mediated IkappaBalpha induction INHIBIT

  7. Ontology Learning • Attracts increasing interest [Snow et al. 2006, Cimiano 2006, Suchanek et al. 2008, Wu & Weld 2008] • Induction: Construct an ontology • Population: Map textual expressions to concepts and relations in the ontology • Limitations in existing approaches • Require heuristic patterns or existing KBs • Pursue each task in isolation Knowledge representation NLP

  8. This Talk: OntoUSP • Jointly conducts: Ontology induction, population, and knowledge extraction • Learns ISA hierarchy over logical expressions • Populates it by translating sentences into logical forms • Extends USP with hierarchical clustering • Hierarchical smoothing • Encoded in a few high-order formulas in Markov Logic [Richardson & Domingos, 2006] • Sole input is dependency trees Five times as many correct answers as TextRunner Improves on the recall of USP by 47%

  9. Outline Background: USP Unsupervised ontology induction Conclusion 11

  10. Semantic Parsing • INDUCE(e1) • IL-4 protein induces CD11b • INDUCER(e1,e2) • INDUCED(e1,e3) • IL-4(e2) • CD11B(e3) Structured prediction:Partition + Assignment induces induces INDUCE nsubj dobj nsubj dobj INDUCED INDUCER protein CD11b protein CD11b nn CD11B nn IL-4 IL-4 IL-4

  11. Challenge: Same Meaning, Many Variations IL-4 inducesCD11b Protein IL-4 enhances the expression of CD11b CD11b expressionis induced by IL-4 protein The cytokin interleukin-4 inducesCD11b expression IL-4’s up-regulation ofCD11b, … ……

  12. Unsupervised Semantic Parsing USPRecursively cluster arbitrary expressions composed with / by similar expressions IL-4 induces CD11b Protein IL-4 enhances the expression of CD11b CD11b expression is enhanced by IL-4 protein The cytokin interleukin-4 induces CD11b expression IL-4’s up-regulation of CD11b, …

  13. Unsupervised Semantic Parsing USPRecursively cluster arbitrary expressions composed with / by similar expressions IL-4 inducesCD11b Protein IL-4enhances the expression of CD11b CD11b expressionis enhanced byIL-4 protein The cytokin interleukin-4inducesCD11b expression IL-4’s up-regulation of CD11b, … Cluster same forms at the atom level

  14. Unsupervised Semantic Parsing USPRecursively cluster arbitrary expressions composed with / by similar expressions IL-4 inducesCD11b Protein IL-4enhances the expression of CD11b CD11b expressionis enhanced byIL-4 protein The cytokin interleukin-4inducesCD11b expression IL-4’s up-regulation ofCD11b, … Cluster forms in composition with same forms

  15. Unsupervised Semantic Parsing USPRecursively cluster arbitrary expressions composed with / by similar expressions IL-4 inducesCD11b Protein IL-4 enhances the expression of CD11b CD11b expressionis enhanced by IL-4 protein The cytokin interleukin-4inducesCD11b expression IL-4’s up-regulation ofCD11b, … Cluster forms in composition with same forms

  16. Unsupervised Semantic Parsing USPRecursively cluster arbitrary expressions composed with / by similar expressions IL-4 inducesCD11b Protein IL-4 enhances the expression of CD11b CD11b expressionis enhanced by IL-4 protein The cytokin interleukin-4inducesCD11b expression IL-4’s up-regulation ofCD11b, … Cluster forms in composition with same forms

  17. Unsupervised Semantic Parsing USPRecursively cluster arbitrary expressions composed with / by similar expressions IL-4 inducesCD11b Protein IL-4 enhances the expression of CD11b CD11b expressionis enhanced by IL-4 protein The cytokin interleukin-4 inducesCD11b expression IL-4’s up-regulation ofCD11b, … Cluster forms in composition with same forms

  18. Probabilistic Model for USP • Joint probability distribution over input dependency trees and their semantic parses • Use Markov logic • A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where • Fi is a formula in higher-order logic • wiis a real number Number of true groundings of Fi

  19. Unsupervised Semantic Parsing • Exponential prior on number of parameters • Cluster mixtures: InClust(e,+c) ^ HasValue(e,+v) Object/Event Cluster: INDUCE Property Cluster: INDUCER induces 0.1 nsubj 0.5 IL-4 0.2 None 0.1 enhances 0.4 … agent 0.4 One 0.8 IL-8 0.1 … … … …

  20. Inference: Hill-Climb Probability induces ? nsubj dobj ? ? Initialize protein CD11B ? ? nn ? IL-4 ? Lambda reduction protein protein ? Search Operator nn ? nn ? IL-4 IL-4 ?

  21. Learning: Hill-Climb Likelihood … protein 1 1 IL-4 enhances 1 induces 1 Initialize MERGE COMPOSE enhances induces 1 1 1 IL-4 protein 1 Search Operator induces 0.2 IL-4 protein 1 enhances 0.8

  22. Outline Background: USP Unsupervised ontology induction Conclusion 24

  23. OntoUSP • USP + Hierarchical clustering + Shrinkage • Modify the cluster mixture formula InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)

  24. New Operator: Abstraction MERGE with REGULATE? 0.3 induces 0.1 enhances induces 0.6 0.2 inhibits suppresses 0.1 up-regulates 0.2 INDUCE … … ISA ISA INHIBIT INDUCE inhibits 0.4 inhibits 0.4 induces 0.6 suppresses INHIBIT 0.2 suppresses 0.2 up-regulates 0.2 … … … Captures substantial similarities

  25. Experiments • Evaluate on an end task:Question answering Applied OntoUSP to extract knowledge from text and answer questions • Evaluation:Number of answers and accuracy • GENIA dataset: 1999 Pubmed abstracts • Questions • Use simple questions in this paper, e.g.: • What does anti-STAT1 inhibit? • What regulates MIP-1 alpha? • Sample 2000 questions according to frequency

  26. Total vs. Correct Answers Improves recall over USP by 47% Five times as many correct answers as TextRunner Highest accuracy of 91% KW-SYN TextRunner RESOLVER DIRT USP OntoUSP

  27. Induced Ontology (Partial)

  28. Question-Answer: Example Q:What does IL-2 control? A:The DEX-mediated IkappaBalpha induction Sentence: Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

  29. Conclusion • OntoUSP: Unsupervised ontology induction • USP + hierarchical clustering / smoothing • Jointly conducts ontology induction, population, and knowledge extraction • See you at poster 46

More Related