Advanced Relation Extraction Techniques: A Comprehensive Overview

An overview of relation extraction in NLP and its applications, including structured knowledge bases such as Freebase and DBpedia. Covers why relation extraction matters, techniques for extracting binary relations, and how they are used to create new structured knowledge bases. Relation inventories discussed include Automated Content Extraction (ACE) and the Unified Medical Language System (UMLS).

Presentation Transcript


  1. Lecture 14 Relation Extraction CSCE 771 Natural Language Processing • Topics • Relation Extraction • Readings: Chapter 22 • NLTK 7.4-7.5 March 4, 2013

  2. Overview • Last Time • NER • NLTK • Chunking Example 7.4 (code_chunker1.py) • Chinking Example 7.5 (code_chinker.py) • Evaluation Example 7.8 (code_unigram_chunker.py) • Example 7.9 (code_classifier_chunker.py) • Today • Relation extraction • ACE, Freebase, DBpedia • Ontological relations • Rules for extracting IS-A • Supervised relation extraction • Relation bootstrapping • Unsupervised relation extraction • NLTK 7.5 Named Entity Recognition • Readings • NLTK Ch 7.4-7.5

  3. Dear Dr. Mathews, • I have the following questions: • 1. (c) Do you need the regular expression that will capture the link inside href="..."? •     (d) What kind of description you want? It is a python function with no argument. Do you want answer like that? • 3. (f-g) Do you mean top 100 in terms of count? • 4.(e-f) You did not show how to use nltk for HMM and Brill tagging. Can you please give an example? • -Thanks

  4. Relation Extraction • What is relation extraction? • Founded in 1801 as South Carolina College, USC is the flagship institution of the University of South Carolina System and offers more than 350 programs of study leading to bachelor's, master's, and doctoral degrees from fourteen degree-granting colleges and schools to an enrollment of approximately 45,251 students, 30,967 on the main Columbia campus. … [wiki] • complex relation = summarization • focus on binary relation predicate(subject, object) or triples <subj predicate obj>

  5. Wiki Info Box – structured data • template • standard things about Universities • Established • type • faculty • students • location • mascot

  6. Focus on extracting binary relations • predicate(subject, object) from predicate logic • triples <subj relation object> • Directed graphs
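A minimal sketch of the triple view, not taken from the slides: each `<subj relation object>` triple becomes a labeled edge in a directed graph, stored here as an adjacency dictionary. The triples loosely follow the USC sentence on slide 4; the relation names (`founded_in`, `flagship_of`, `located_in`) and the helper functions are made up for illustration.

```python
from collections import defaultdict

# Illustrative triples; the relation names are hypothetical.
triples = [
    ("USC", "founded_in", "1801"),
    ("USC", "flagship_of", "University of South Carolina System"),
    ("USC", "located_in", "Columbia"),
]

def build_graph(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

def objects_of(graph, subj, rel):
    """Answer predicate(subject, ?) by following the edges labeled rel."""
    return [obj for r, obj in graph.get(subj, []) if r == rel]

graph = build_graph(triples)
print(objects_of(graph, "USC", "founded_in"))
```

Storing edges per subject makes queries of the form predicate(subject, ?) a single dictionary lookup; a second index keyed by object would be needed for the reverse direction.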

  7. Why relation extraction? • Create new structured KBs • Augment existing ones: words -> WordNet, facts -> Freebase or DBpedia • Support question answering: Jeopardy • Which relations? • Automated Content Extraction (ACE) http://www.itl.nist.gov/iad/mig//tests/ace/ • 17 relations • ACE examples

  8. Unified Medical Language System (UMLS) • 134 entity types, 54 relations • http://www.nlm.nih.gov/research/umls/

  9. UMLS semantic network

  10. Current Relations in the UMLS Semantic Network
      isa
      associated_with
          physically_related_to
              part_of
              consists_of
              contains
              connected_to
              interconnects
              branch_of
              tributary_of
              ingredient_of
          spatially_related_to
              location_of
              adjacent_to
              surrounds
              traverses
          functionally_related_to
              affects
                  …
              …
          temporally_related_to
              co-occurs_with
              precedes
          conceptually_related_to
              evaluation_of
              degree_of
              analyzes
                  assesses_effect_of
              measurement_of
              measures
              diagnoses
              property_of
              derivative_of
              developmental_form_of
              method_of
              …

  11. Databases of Wikipedia Relations • DBpedia is a crowd-sourced community effort • to extract structured information from Wikipedia • and to make this information readily available • DBpedia allows you to make sophisticated queries http://dbpedia.org/About

  12. English version of the DBpedia knowledge base • 3.77 million things • 2.35 million are classified in an ontology, including: • 764,000 persons • 573,000 places (including 387,000 populated places) • 333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games) • 192,000 organizations (including 45,000 companies and 42,000 educational institutions) • 202,000 species • 5,500 diseases

  13. Freebase • Google (Freebase wiki) http://wiki.freebase.com/wiki/Main_Page

  14. Ontological relations • IS-A (hypernym) • Instance-of • Has-part • Hyponym (inverse of hypernym)

  15. How to build extractors

  16. Extracting IS-A relations • (Hearst 1992) Automatic Acquisition of Hyponyms • Naproxen sodium is a nonsteroidal anti-inflammatory drug (NSAID). [wiki]

  17. Hearst's Patterns for Extracting IS-A • Patterns for <X IS-A Y> • “Y such as X” • “X or other Y” • “Y including X” • “Y, especially X”
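The four patterns on this slide can be sketched as regular expressions. This is a simplified illustration, not Hearst's implementation: single words stand in for noun phrases (a real extractor would match NP chunks), and the name `extract_isa` and the sample text are assumptions made for the example.

```python
import re

# Single words stand in for noun phrases; a real extractor would match NP chunks.
HEARST_PATTERNS = [
    re.compile(r"(?P<y>\w+) such as (?P<x>\w+)"),       # "Y such as X"
    re.compile(r"(?P<x>\w+) or other (?P<y>\w+)"),      # "X or other Y"
    re.compile(r"(?P<y>\w+),? including (?P<x>\w+)"),   # "Y including X"
    re.compile(r"(?P<y>\w+),? especially (?P<x>\w+)"),  # "Y, especially X"
]

def extract_isa(text):
    """Return (X, Y) pairs, read as 'X IS-A Y', for every pattern match."""
    pairs = []
    for pattern in HEARST_PATTERNS:
        for match in pattern.finditer(text):
            pairs.append((match.group("x"), match.group("y")))
    return pairs

text = (
    "He studies drugs such as naproxen. "
    "Temples, treasuries or other buildings were damaged. "
    "She visited countries including Canada. "
    "He likes fruit, especially mangoes."
)
for x, y in extract_isa(text):
    print(x, "IS-A", y)
```

Only the item adjacent to the pattern is captured, so in "Temples, treasuries or other buildings" the sketch finds `treasuries` but misses `Temples`; handling coordinated lists is one reason real systems work over chunked text.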

  18. Extracting Richer Relations Using Specific Rules • Intuition: some relations commonly hold between particular kinds of entities: located-in, cures, owns • Which relations hold between two entities?

  19. Fig 22.16 Pattern and Bootstrapping

  20. Hand-built patterns for relations • Pros • Cons

  21. Supervised Relation Extraction • Treat relation extraction as supervised classification • 1. Find all pairs of named entities • 2. Decide whether they are related • 3.
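Steps 1 and 2 can be sketched as follows. The mention list stands in for NER output, and `toy_is_related` is a loudly hypothetical placeholder for a trained binary classifier; neither comes from the slides.

```python
from itertools import combinations

# Hypothetical NER output for one sentence: (mention text, entity type).
mentions = [
    ("American Airlines", "ORG"),
    ("Tim Wagner", "PER"),
    ("Monday", "DATE"),
]

def candidate_pairs(mentions):
    """Step 1: every unordered pair of entity mentions in the sentence."""
    return list(combinations(mentions, 2))

def toy_is_related(m1, m2):
    """Step 2 placeholder. A real system would call a trained classifier here;
    this stand-in only accepts ORG/PER and ORG/ORG type combinations."""
    allowed = {("ORG", "PER"), ("PER", "ORG"), ("ORG", "ORG")}
    return (m1[1], m2[1]) in allowed

pairs = candidate_pairs(mentions)
related = [(m1[0], m2[0]) for m1, m2 in pairs if toy_is_related(m1, m2)]
print(len(pairs), "candidate pairs;", related)
```

A sentence with n mentions yields n(n-1)/2 candidate pairs, which is why a cheap related/unrelated filter is usually run before any finer-grained classification.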

  22. ACE: Automated Content Extraction • http://projects.ldc.upenn.edu/ace/ • Linguistic Data Consortium • Entity Detection and Tracking (EDT) • Relation Detection and Characterization (RDC) • Event Detection and Characterization (EDC) • 6 classes of relations, 17 overall

  23. Word Features for Relation Extraction • Headwords of M1 and M2 • Named entity type and mention-level features • Mention level: name, pronoun, nominal
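A sketch of how the features on this slide might be assembled into a feature dictionary for a classifier. The mention representation (token spans with `type` and `level` fields) and the choice of the last token as headword are simplifying assumptions, not the lecture's definitions.

```python
def word_features(tokens, m1, m2):
    """Build word-level features for a mention pair.

    m1, m2: dicts with 'start' and 'end' (token span, end exclusive),
    'type' (named entity type) and 'level' (name, pronoun, or nominal).
    """
    # Assumption: the headword is approximated by the mention's last token.
    head1 = tokens[m1["end"] - 1]
    head2 = tokens[m2["end"] - 1]
    return {
        "head_m1": head1,
        "head_m2": head2,
        "heads": head1 + "-" + head2,
        "type_m1": m1["type"],
        "type_m2": m2["type"],
        "types": m1["type"] + "-" + m2["type"],
        "level_m1": m1["level"],
        "level_m2": m2["level"],
    }

tokens = ("American Airlines , a unit of AMR , immediately matched "
          "the move , spokesman Tim Wagner said").split()
m1 = {"start": 0, "end": 2, "type": "ORG", "level": "name"}
m2 = {"start": 14, "end": 16, "type": "PER", "level": "name"}
features = word_features(tokens, m1, m2)
print(features)
```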

  24. Parse Features for Relation Extraction • Base syntactic chunk sequence from one entity to the other • Constituent path • Dependency path

  25. Gazetteer and trigger word features for relation extraction • Trigger list for kinship relations • Gazetteer: name list

  26. Evaluation of Supervised Relation Extraction • P/R/F • Summary • + high accuracies • - requires a labeled training set • - models are brittle • - don't generalize well
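P/R/F over extracted relations can be computed by comparing the predicted triples against a gold set. A minimal sketch with made-up triples; the function name `prf` is an assumption.

```python
def prf(gold, predicted):
    """Precision, recall, and F1 of predicted triples against gold triples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up gold annotations and system output.
gold = {
    ("Twain", "wrote", "Huckleberry Finn"),
    ("Melville", "wrote", "Moby Dick"),
    ("Dickens", "wrote", "Bleak House"),
    ("Austen", "wrote", "Emma"),
}
predicted = {
    ("Twain", "wrote", "Huckleberry Finn"),
    ("Melville", "wrote", "Moby Dick"),
    ("Dickens", "wrote", "Moby Dick"),
}
p, r, f = prf(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Here two of the three predictions are correct (P = 2/3) and two of the four gold triples are found (R = 1/2), giving F1 = 4/7.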

  27. Semi-Supervised Relation Extraction • Seed-based or bootstrapping approaches to RE • No training set • Can you … do anything? • Bootstrapping

  28. Relation Bootstrapping • Relation Bootstrapping (Hearst 1992) • Gather seed pairs of relation R • iterate • find sentences with pairs, • look at context... • use patterns to search for more pairs

  29. Bootstrapping Example

  30. Extract <author, book> pairs • DIPRE: start with seeds • Find instances • Extract patterns • Now iterate
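The seed, instance, pattern loop can be sketched as below. This is a heavily simplified illustration in the spirit of DIPRE, not the published algorithm: a pattern is just the exact text between the two arguments, entity spans are approximated by runs of capitalized words, and the corpus is made up.

```python
import re

# Crude stand-in for a named-entity span: one or more capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

corpus = [
    "Mark Twain wrote Huckleberry Finn in 1884.",
    "Herman Melville wrote Moby Dick.",
    "Herman Melville, author of Moby Dick, was born in New York.",
    "Charles Dickens, author of Bleak House, lived in London.",
]

def find_patterns(pairs, corpus):
    """Collect the text found between each known pair's two arguments."""
    patterns = set()
    for author, book in pairs:
        for sent in corpus:
            i = sent.find(author)
            if i == -1:
                continue
            j = sent.find(book, i + len(author))
            if j != -1:
                patterns.add(sent[i + len(author):j])
    return patterns

def apply_patterns(patterns, corpus):
    """Search the corpus for NAME <pattern> NAME and return the pairs found."""
    pairs = set()
    for middle in patterns:
        regex = re.compile(f"({NAME}){re.escape(middle)}({NAME})")
        for sent in corpus:
            for match in regex.finditer(sent):
                pairs.add((match.group(1), match.group(2)))
    return pairs

def bootstrap(seeds, corpus, iterations=2):
    """Alternate between extracting patterns and extracting pairs."""
    pairs = set(seeds)
    for _ in range(iterations):
        pairs |= apply_patterns(find_patterns(pairs, corpus), corpus)
    return pairs

seeds = {("Mark Twain", "Huckleberry Finn")}
for pair in sorted(bootstrap(seeds, corpus)):
    print(pair)
```

The first iteration learns " wrote " from the seed and finds the Melville pair; the second learns ", author of " from that new pair and finds the Dickens pair. Without any confidence scoring on patterns, a loop like this can drift quickly on real text.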

  31. Snowball Algorithm (Agichtein, Gravano 2000) • Distant supervision paradigm • Like classified

  32. Unsupervised relation extraction • Banko et al 2007 “Open information extraction from the Web” • Extracting relations from the web with • no training data • no predetermined list of relations • The Open Approach • Use parse data to train a “trust-worthy” classifier • Extract trustworthy relations among NPs • Rank relations based on text redundancy
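The last step, ranking by redundancy, can be illustrated with simple counting. This is a stand-in under stated assumptions: raw frequency replaces whatever scoring model the actual system uses, and the extractions are invented.

```python
from collections import Counter

# Invented extractions, as if collected from many web sentences.
extractions = [
    ("Edison", "invented", "light bulb"),
    ("Einstein", "born in", "Ulm"),
    ("Edison", "invented", "light bulb"),
    ("Paris", "capital of", "Texas"),
    ("Einstein", "born in", "Ulm"),
    ("Edison", "invented", "light bulb"),
]

def rank_by_redundancy(extractions):
    """Order distinct triples by how many times each one was extracted."""
    return Counter(extractions).most_common()

ranked = rank_by_redundancy(extractions)
for triple, count in ranked:
    print(count, triple)
```

The intuition is that a triple extracted independently from many sentences is more likely to be correct than one seen only once.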

  33. Evaluation of Semi-supervised and Unsupervised RE • No gold standard: the web is not tagged • No way to compute precision or recall directly • Instead, only estimate precision • draw a sample and check precision manually • alternatively, choose several levels of recall and check the precision at each • No way to check recall? • randomly select a text sample and check it manually
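Both estimation strategies can be sketched in a few lines. The `judge` callback is a hypothetical stand-in for a human annotator checking each extraction, simulated here with a small set of triples assumed to be true; the function names are also assumptions.

```python
import random

def estimate_precision(extractions, judge, sample_size, seed=0):
    """Estimate precision from a random sample judged by hand."""
    if not extractions or sample_size <= 0:
        raise ValueError("need a non-empty list and a positive sample size")
    rng = random.Random(seed)
    sample = rng.sample(extractions, min(sample_size, len(extractions)))
    return sum(1 for e in sample if judge(e)) / len(sample)

def precision_at_k(ranked, judge, k):
    """Precision over the k highest-ranked extractions."""
    top = ranked[:k]
    if not top:
        raise ValueError("k must select at least one extraction")
    return sum(1 for e in top if judge(e)) / len(top)

# Extractions in ranked order, best first (invented data).
ranked = [
    ("Edison", "invented", "light bulb"),
    ("Einstein", "born in", "Ulm"),
    ("Paris", "capital of", "Texas"),
    ("Twain", "wrote", "Huckleberry Finn"),
]
# Simulated manual judgments: everything except the Paris triple is correct.
known_true = {ranked[0], ranked[1], ranked[3]}

def judge(extraction):
    return extraction in known_true

print(precision_at_k(ranked, judge, 2))
print(estimate_precision(ranked, judge, sample_size=4))
```

Taking the top k of a ranked list for increasing k approximates "several levels of recall" without knowing the true number of relations in the text.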

  34. NLTK Info. Extraction

  35. NLTK Review • NLTK 7.1-7.3 • Chunking Example 7.4 (code_chunker1.py) • Chinking Example 7.5 (code_chinker.py) • simple re_chunker • Evaluation Example 7.8 (code_unigram_chunker.py) • Example 7.9 (code_classifier_chunker.py)

  36. Review 7.4: Simple Noun Phrase Chunker
      import nltk

      grammar = r"""
        NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and noun
            {<NNP>+}                # chunk sequences of proper nouns
      """
      cp = nltk.RegexpParser(grammar)
      sentence = [("Rapunzel", "NNP"), ("let", "VBD"), ("down", "RP"),
                  ("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]
      print(cp.parse(sentence))

  37. (S
        (NP Rapunzel/NNP)
        let/VBD
        down/RP
        (NP her/PP$ long/JJ golden/JJ hair/NN))

  38. Review 7.5: Simple Noun Phrase Chinker
      import nltk

      grammar = r"""
        NP:
          {<.*>+}          # Chunk everything
          }<VBD|IN>+{      # Chink sequences of VBD and IN
      """
      sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
                  ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
      cp = nltk.RegexpParser(grammar)
      print(cp.parse(sentence))

  39. (S
        (NP the/DT little/JJ yellow/JJ dog/NN)
        barked/VBD
        at/IN
        (NP the/DT cat/NN))

  40. RegExp Chunker – conll2000
      import nltk
      from nltk.corpus import conll2000

      # Baseline: an empty grammar creates no chunks at all
      cp = nltk.RegexpParser("")
      test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
      print(cp.evaluate(test_sents))

      # Chunk runs of tags that start with C, D, J, N, or P (e.g. CD, DT, JJ, NN, PRP)
      grammar = r"NP: {<[CDJNP].*>+}"
      cp = nltk.RegexpParser(grammar)
      print(cp.evaluate(test_sents))

  41. ChunkParse score:
          IOB Accuracy:  43.4%
          Precision:      0.0%
          Recall:         0.0%
          F-Measure:      0.0%
      ChunkParse score:
          IOB Accuracy:  87.7%
          Precision:     70.6%
          Recall:        67.8%
          F-Measure:     69.2%

  42. Conference on Computational Natural Language Learning (CoNLL-2000) • http://www.cnts.ua.ac.be/conll2000/chunking/ • CoNLL 2013: Seventeenth Conference on Computational Natural Language Learning

  43. Evaluation Example 7.8 (code_unigram_chunker.py) • AttributeError: 'module' object has no attribute 'conlltags2tree'

  44. code_classifier_chunker.py • NLTK was unable to find the megam file! • Use software specific configuration paramaters or set the MEGAM environment variable. • For more information, on megam, see: • <http://www.cs.utah.edu/~hal/megam/>

  45. 7.4   Recursion in Linguistic Structure

  46. code_cascaded_chunker
      import nltk

      grammar = r"""
        NP: {<DT|JJ|NN.*>+}           # Chunk sequences of DT, JJ, NN
        PP: {<IN><NP>}                # Chunk prepositions followed by NP
        VP: {<VB.*><NP|PP|CLAUSE>+$}  # Chunk verbs and their arguments
        CLAUSE: {<NP><VP>}            # Chunk NP, VP
      """
      cp = nltk.RegexpParser(grammar)
      sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
                  ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
      print(cp.parse(sentence))

  47. (S
        (NP Mary/NN)
        saw/VBD
        (CLAUSE
          (NP the/DT cat/NN)
          (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

  48. A sentence having deeper nesting
      sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"),
                  ("saw", "VBD"), ("the", "DT"), ("cat", "NN"), ("sit", "VB"),
                  ("on", "IN"), ("the", "DT"), ("mat", "NN")]
      print(cp.parse(sentence))

      (S
        (NP John/NNP)
        thinks/VBZ
        (NP Mary/NN)
        saw/VBD
        (CLAUSE
          (NP the/DT cat/NN)
          (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

  49. Trees
      >>> print(tree4[1])
      (VP chased (NP the rabbit))
      >>> tree4[1].label()      # NLTK 3 replaced the .node attribute with label()
      'VP'
      >>> tree4.leaves()
      ['Alice', 'chased', 'the', 'rabbit']
      >>> tree4[1][1][1]
      'rabbit'
      >>> tree4.draw()

  50. Trees - code_traverse.py
      import nltk

      def traverse(t):
          try:
              t.label()
          except AttributeError:
              # t is a leaf (a plain string), so just print it
              print(t, end=" ")
          else:
              # Now we know that t.label() is defined
              print('(', t.label(), end=" ")
              for child in t:
                  traverse(child)
              print(')', end=" ")

      t = nltk.Tree.fromstring('(S (NP Alice) (VP chased (NP the rabbit)))')
      traverse(t)
