CERATOPS Center for Extraction and Summarization of Events and Opinions in Text

  1. CERATOPS: Center for Extraction and Summarization of Events and Opinions in Text. Janyce Wiebe, U. Pittsburgh; Claire Cardie, Cornell U.; Ellen Riloff, U. Utah

  2. Overview Rapidly re-trainable, robust components for: • Information extraction of facts and entities related to events from text • Extraction of opinions and motivations expressed in text • Tracking, linking, and summarizing events and opinions and their progressions over time

  3. Motivation for Event IE Systems • Rapid semantic processing of large volumes of unstructured text • Automatic merging of facts and entity relationships across sets of documents • Automatic population of large databases with factual information from many text sources

  4. Information Extraction from Text. "After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms. The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis." OUTBREAK template: Disease: avian flu / bird flu; Victims: poultry / 36 commercial premises; Location: Fraser Valley poultry farms; Country: Canada; Status: confirmed; Containment: depopulation

  5. Information Extraction of Events: extracting facts and entity relations associated with events of interest. • Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location • Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures • Keywords and named entity recognition are not sufficient: "Troops were vaccinated against anthrax, cholera, …" vs. "Researchers have discovered how anthrax toxin destroys cells and rapidly causes death ..."

  6. Syntactic Analysis → Extraction → Coreference Resolution → Template Generation. Syntactic analysis: "3 chickens died from avian flu." (SUBJ / VP / PP). Extraction: Fact: DEATH; Victim: 3 chickens; Disease: avian flu. Coreference resolution: "3 chickens died from avian flu. The birds were found in Canada." Template generation: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada.
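
The stages above can be illustrated with a small toy sketch; the regular-expression pattern, function names, and slot names here are illustrative assumptions, not the CERATOPS implementation:

    import re

    # Toy lexico-syntactic pattern for a DEATH fact (illustrative only).
    DEATH_PATTERN = re.compile(r"(?P<victim>[\w\s]+?) died from (?P<disease>[\w\s]+)")

    def extract_facts(sentence):
        """Extraction stage: apply the pattern and return slot fillers."""
        m = DEATH_PATTERN.search(sentence)
        if not m:
            return {}
        return {"Victim": m.group("victim").strip(), "Disease": m.group("disease").strip()}

    def generate_template(facts, coref_links):
        """Template generation: merge facts and coreference links into one event."""
        template = {"Event": "Outbreak", **facts}
        victim = template.get("Victim")
        if victim in coref_links:
            template["Victim"] = victim + " / " + coref_links[victim]
        return template

    facts = extract_facts("3 chickens died from avian flu.")
    print(generate_template(facts, {"3 chickens": "the birds"}))
    # {'Event': 'Outbreak', 'Victim': '3 chickens / the birds', 'Disease': 'avian flu'}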

  7. New Approach: Role-Identifying Nouns. Lexically role-identifying nouns are defined by the role that the noun plays in an event: kidnapper, arsonist, assassin → agent (perpetrator); casualty, fatality, victim → theme (victim). Semantically role-identifying nouns strongly evoke one event role in a domain based on semantics (intuition from Grice's Maxim of Relevance): disease reports: toddler, girl, boy → victim; crime reports: restaurant, store, hotel → location.

  8. Bootstrapped Learning of Role-Identifying Nouns. Starting from unannotated texts and seed nouns (e.g., murderer, sniper, criminal), the best extraction patterns are selected (e.g., <subj> was arrested; killed by <np>), and their best extractions become new role-identifying nouns (e.g., assassin, arsonist, kidnapper) for the next iteration.
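
A minimal sketch of this kind of mutual bootstrapping, assuming a toy corpus and simple frequency-based scoring (it is not the actual AutoSlog/Basilisk code):

    def bootstrap_role_nouns(corpus, seed_nouns, iterations=3, patterns_per_iter=2, nouns_per_iter=2):
        """Alternate between scoring patterns by how many known role nouns they
        extract and harvesting new nouns from the best-scoring patterns."""
        lexicon = set(seed_nouns)
        for _ in range(iterations):
            pattern_scores = {p: len(extractions & lexicon)
                              for p, extractions in corpus.items()}
            best = sorted(pattern_scores, key=pattern_scores.get, reverse=True)[:patterns_per_iter]
            candidates = {}
            for p in best:
                for noun in corpus[p] - lexicon:
                    candidates[noun] = candidates.get(noun, 0) + 1
            lexicon |= set(sorted(candidates, key=candidates.get, reverse=True)[:nouns_per_iter])
        return lexicon

    # Toy corpus: extraction pattern -> set of noun phrases it extracts.
    corpus = {
        "<subj> was arrested": {"murderer", "sniper", "assassin"},
        "killed by <np>":      {"sniper", "kidnapper", "explosion"},
        "<subj> ate lunch":    {"girl", "boy"},
    }
    print(bootstrap_role_nouns(corpus, {"murderer", "sniper"}))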

  9. Role-Identifying Expressions • Typically, a verb refers to an event and the verb's arguments identify the role players: <subject> was kidnapped by <np> in <np> → victim, perpetrator, location • But sometimes a verb identifies a role player in an event without identifying the event: <subject> participated; <subject> was implicated → perpetrator

  10. Bootstrapped Learning of Role-Identifying Expressions. STEP 1: AutoSlog generates event extraction patterns and Basilisk learns event nouns. STEP 2: a Candidate RIE Pattern Generator uses the relevant event nouns to propose candidate RIE patterns.

  11. Learning to Extract Perpetrators [Phillips & Riloff, RANLP-07] • Role-Identifying Nouns: assailants, attackers, cell, culprits, extremists, hitmen, kidnappers, militiamen, MRTA, narco-terrorists, sniper • Event-Specific Patterns: was kidnapped by <np>; was killed by <np> • Role-Identifying Patterns: EVENT was perpetrated by <np>; <subject> was involved in EVENT

  12. Decoupling Relevant Region Identification and Extraction. Local pattern matching has two drawbacks: • Facts can be missed if they do not occur with the event description. • False hits can be generated from irrelevant contexts. "…the explosion ripped through the busy neighborhood in New Delhi. A bomb was found under a parked car…" • Solution: • Identify relevant text regions. • Apply general, but semantically appropriate patterns.

  13. IE Pattern Learning with Relevant Regions and Semantic Affinity [Patwardhan & Riloff, EMNLP-07]. Relevant and irrelevant texts are used to self-train an SVM relevant-region classifier and to learn IE patterns with a semantic affinity pattern learner; the IE system then applies the learned patterns to the relevant sentences to produce extractions.
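
The self-training half of that pipeline could be sketched roughly as follows using scikit-learn; the features, margin threshold, and loop structure are assumptions for illustration, and the semantic affinity pattern learner is omitted:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import SVC

    def self_train_region_classifier(seed_sents, seed_labels, unlabeled, rounds=3, margin=1.0):
        """Iteratively add the classifier's most confident unlabeled sentences
        to the training set and retrain (a generic self-training loop)."""
        sents, labels = list(seed_sents), list(seed_labels)
        vectorizer = TfidfVectorizer()
        classifier = SVC(kernel="linear")
        for _ in range(rounds):
            X = vectorizer.fit_transform(sents)
            classifier.fit(X, labels)
            if not unlabeled:
                break
            scores = classifier.decision_function(vectorizer.transform(unlabeled))
            still_unlabeled = []
            for sentence, score in zip(unlabeled, scores):
                if abs(score) >= margin:           # confident prediction
                    sents.append(sentence)
                    labels.append(classifier.classes_[int(score > 0)])
                else:
                    still_unlabeled.append(sentence)
            unlabeled = still_unlabeled
        return vectorizer, classifier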

  14. Learned Extraction Patterns

  15. CERATOPS Text Extraction and Data Visualization for Animal Health Surveillance • Collaborative project between CERATOPS, PURVAC, and the Veterinary Information Network (VIN), with funding from LLNL. • Goal: proof-of-concept of an end-to-end NLP-based visual analytics system for unstructured text.

  16. Animal Health Surveillance Monitoring animal health is important to DHS’ mission: • 73% of emerging infectious diseases are zoonotic in origin. • Pets can provide early warning signs of disease outbreaks and exposures to toxic substances. • Adverse pet reactions can be early indicators of food chain contamination.

  17. The Veterinary Information Network • VIN is the largest on-line community, information resource, and on-line continuing education source for veterinarians. Over half of all veterinarians in the U.S. use VIN! • VIN hosts message boards where veterinarians discuss what they are seeing in their practices. 15 years of message board data has been archived! • VIN built a database of semantic information associated with pet health to support search. • Paul Pion, DVM, President and co-founder of VIN, served as our consultant.

  18. CERATOPS NLP-based Visual Analytics [diagram: NLP components extracting facts from unstructured text]

  19. Prototype System. We produced a prototype IE system to extract and visualize diseases, victims, dates, and locations from ProMed-mail disease outbreak reports. • Used the VIN database (248,108 entries) to create 3 new dictionaries for text analysis: syntactic and semantic lexicon, phrasal lexicon, synonym dictionary • Enhanced the template generation process to use new types of semantic information. • Converted our IE templates into a format appropriate for Purdue's visualization system.

  20. ProMed-mail Visualization Output

  21. NLP-based Visual Analytics for Animal Health Surveillance. Future Goals: • Rapid identification of new disease outbreaks. • Trends or spikes in disease outbreaks. • Unusual symptoms or clusters of symptoms. • Statistical associations between foods & adverse pet reactions. • Improved diagnostic tools to associate symptoms with diseases and external events.

  22. CERATOPS Semantic Class Learning from the Web [Kozareva, Riloff, & Hovy, ACL-08] • Goal: automatically create semantic dictionaries • Use a doubly-anchored hyponym pattern: <class name> such as <class member> and * • Construct pattern linkage graphs to capture the popularity and productivity of candidate terms and rank them. • Produces very accurate results with truly minimal supervision (class name and one seed)
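
A toy sketch of instantiating the doubly-anchored pattern over text; the corpus and ranking here are made up, and the paper's pattern linkage graph is omitted:

    import re

    def harvest_members(corpus, class_name, seed_member):
        """Instantiate '<class> such as <seed> and *' and count the starred term."""
        pattern = re.compile(
            rf"{class_name}\s+such as\s+{seed_member}\s+and\s+(\w+)",
            re.IGNORECASE,
        )
        counts = {}
        for sentence in corpus:
            for match in pattern.findall(sentence):
                counts[match.lower()] = counts.get(match.lower(), 0) + 1
        return sorted(counts, key=counts.get, reverse=True)

    corpus = [
        "He compared countries such as France and Germany.",
        "Exports grew in countries such as France and Spain.",
    ]
    print(harvest_members(corpus, "countries", "France"))   # ['germany', 'spain']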

  23. Semantic Class Learning Results

  24. Coreference Resolution • Links entities, events, and opinions within and across documents. Example entity chains: U.S. State Dept., President Bush, NIH, Inspector General.

  25. Build on Prior Work in NP Coreference Resolution (e.g., Ng & Cardie, ACL 2002) • Classification: given a description of two noun phrases, NPi and NPj, classify the pair as coreferent or not coreferent • Clustering: coordinates pairwise coreference decisions. Example: "[Queen Elizabeth] set about transforming [her] [husband], [King George VI], …", with pairwise coref? decisions over the noun phrases fed into a clustering algorithm.
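
A minimal sketch of the classify-then-cluster architecture; the pairwise "classifier" below is a stand-in head-match heuristic rather than a learned model:

    from itertools import combinations

    def head_noun(np):
        return np.lower().split()[-1]

    def classify_pair(np_i, np_j):
        """Toy pairwise decision: coreferent if the head nouns match."""
        return head_noun(np_i) == head_noun(np_j)

    def cluster(nps):
        """Coordinate pairwise decisions by merging clusters (transitive closure)."""
        clusters = [{np} for np in nps]
        for np_i, np_j in combinations(nps, 2):
            if classify_pair(np_i, np_j):
                ci = next(c for c in clusters if np_i in c)
                cj = next(c for c in clusters if np_j in c)
                if ci is not cj:
                    ci |= cj
                    clusters.remove(cj)
        return clusters

    print(cluster(["Queen Elizabeth", "her husband", "King George VI", "the husband"]))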

  26. Partially Supervised Clustering for Source Coreference Resolution [Stoyanov & Cardie, EMNLP 2006]. Labels for non-source NPs are unavailable. "Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given."

  27. State-of-the-Art Coreference Resolution • Cornell, Utah, & LLNL are collaboratively building a state-of-the-art coreference resolver based on the best features identified in prior work. • We plan to make the system publicly available. • On-going work and future plans include: • systematic evaluations of coreference subproblems • incorporating external knowledge about entities • non-anaphoric NP identification • unsupervised, automatic training • topic coreference for opinion analysis

  28. Extraction and Summarization of Opinions

  29. Subjectivity: opinions, emotions, motivations, speculations, sentiments • Information extraction of: NL expressions, components, properties. Example: "Angolans are terrified of the Marburg virus" → Opinion Frame: Source: Angolans; Attitude: emotion; Polarity: negative; Intensity: high; Target: Marburg virus.
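
As a data structure, an opinion frame of this kind might be represented as in the sketch below (field names follow the slide; the types are assumptions):

    from dataclasses import dataclass

    @dataclass
    class OpinionFrame:
        source: str      # opinion holder, e.g. "Angolans"
        target: str      # what the opinion is about, e.g. "Marburg virus"
        attitude: str    # e.g. "emotion" or "sentiment"
        polarity: str    # "positive", "negative", or "neutral"
        intensity: str   # e.g. "low", "medium", "high"

    frame = OpinionFrame(source="Angolans", target="Marburg virus",
                         attitude="emotion", polarity="negative", intensity="high")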

  30. Fine-grained Opinions. "Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given." [Stoyanov & Cardie, 2006]

  31. Fine-grained Opinion Extraction. "The Australian Press launched a bitter attack on Italy" → Opinion Frame: Source: Australian Press; Polarity: negative; Attitude: sentiment; Intensity: high; Target: Italy.

  32. Opinion Summary [graph of opinions linking the Australian Press, the Socceroos, Italy, Marcello Lippi, and the penalty]

  33. Summarization of Opinions + Events. The summary representation combines opinion frames and direct subjective annotations (Source, Polarity, Intensity) with event templates (e.g., Disease Outbreak: Victim, Location, Disease, Date, …).

  34. Why Opinions? • Provide technology that can aid analysts in: • extracting socio-behavioral information from text • monitoring public health awareness, knowledge, and speculations about disease outbreaks, … • Enrich Information Extraction, Question Answering, and Visualization tools

  35. E.g., are people extremely afraid or angry? → Opinion Frame with Polarity: negative and Intensity: high (Source, Attitude, and Target unspecified).

  36. Recognize motivations; predict actions. Example: "The industry is scared and so, even if they do find an ornamental carp with KHV, they will keep it secret" (to be analyzed into an opinion frame: Source, Polarity, Attitude, Intensity, Target).

  37. Search for opinions about particular named targets, e.g., the ban on British beef: "Brugere-Picoux backs the decision to ban British Beef" → Opinion Frame with Target: ban on British beef.

  38. Search for opinions held by particular named sources: "Brugere-Picoux backs the decision to ban British Beef" → Opinion Frame with Source: Brugere-Picoux.

  39. Motivation for the Summaries • Quickly determine the opinions of a person, organization, community, region, etc. • Quickly determine the opinions toward a person, organization, issue, event, … • Across an entire document • Across multiple documents • Over time • Reveal relationships and identify cliques and communities of interest • Complement work in social network analysis

  40. Outline • Motivations for opinion extraction • Extracting opinion frames and components • Lexicon of subjective expressions • Contextual disambiguation • Enriched tasks • Opinion summarization

  41. Lexicon • Explore different uses of words, to zero in on the subjective ones • Example: benefit

  42. Lexicon • Example: benefit • Very often objective, as a Verb: Children with ADHD benefited from a 15-course of fish oil

  43. Lexicon • Noun uses look more promising: The innovative economic program has shown benefits to humanity

  44. Lexicon • However, there are objective noun uses too: …tax benefits. …employee benefits. …tax benefits to provide a stable economy. …health benefits to cut costs.

  45. Lexicon • Pattern: benefits as the head of a noun phrase containing a prepositional phrase • Matches this: "The innovative economic program has shown proven benefits to humanity" • But none of these: …tax benefits. …employee benefits. …tax benefits to provide a stable economy. …health benefits to cut costs.
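
One way to approximate this pattern is with a dependency parse, as in the sketch below (it assumes spaCy with the en_core_web_sm model installed; how each example parses depends on that model, so this illustrates the check rather than a validated rule):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def subjective_benefits(sentence):
        """Flag 'benefits' only when the noun has a prepositional-phrase child
        (e.g., 'benefits to humanity'), taken here as evidence of the subjective use."""
        doc = nlp(sentence)
        for tok in doc:
            if tok.lemma_ == "benefit" and tok.pos_ == "NOUN":
                if any(child.dep_ == "prep" for child in tok.children):
                    return True
        return False

    print(subjective_benefits("The program has shown proven benefits to humanity."))
    print(subjective_benefits("He claimed tax benefits."))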

  46. Lexicon: Longer Constructions. Example: "be soft on crime"

  <item index="1">
    <itemMorphoSyntax>
      <lemma>be</lemma>
    </itemMorphoSyntax>
    <itemRelation xsi:type="ngramPattern">
      <distance>2</distance>
      <landmark>2</landmark>
    </itemRelation>
  </item>
  <item index="2">
    <itemMorphoSyntax>
      <word>soft</word>
      <majorClass>J</majorClass>
    </itemMorphoSyntax>
    <itemRelation xsi:type="ngramPattern">
      <distance>1</distance>
      <landmark>3</landmark>
    </itemRelation>
  </item>
  <item index="3">
    <itemMorphoSyntax>
      <word>on</word>
    </itemMorphoSyntax>
    <itemRelation xsi:type="ngramPattern">
      <distance>1</distance>
      <landmark>4</landmark>
    </itemRelation>
  </item>
  <item index="4">
    <itemMorphoSyntax>
      <word>crime</word>
      <majorClass>N</majorClass>
    </itemMorphoSyntax>
  </item>

  47. The entry contains a pattern for finding instances of the construction • Matches variations: • When I look into his past I see a man who is very soft on crime. • The data could also weaken her authority to criticize Patrick for being soft on crime.

  48. Attributive information

  <entryAttributes origin="j">
    <name>be soft on crime</name>
    <subjective>true</subjective>
    <reliability>h</reliability>
    <confidence>h</confidence>
    <subType>sen</subType>
    <example>The Obama campaign rejected the notion that the senator might be vulnerable to accusations that he is soft on crime.</example>
    <morphosyn>vp</morphosyn>
    <target>s</target>
    <polarity>n</polarity>
    <intensity>m</intensity>
    <confidence>h</confidence>
    <regex>1:[morph:[lemma="be"] order:[distance="2" landmark="2"]]
           2:[morph:[word="soft" majorClass="J"] order:[distance="1" landmark="3"]]
           3:[morph:[word="on"] order:[distance="1" landmark="4"]]
           4:[morph:[word="crime" majorClass="N"]]</regex>
    <patterntype>ngramPattern</patterntype>
  </entryAttributes>

  49. Lexicon: Summary • Uniform representation for different types of subjectivity clues • Word stem: benefit • Word: benefits • Word/POS: benefits/nouns • Fixed n-grams: benefits to • Syntactic patterns • Combinations of the above • Learn subjective uses from corpora (bodies of texts) • Capture longer subjective constructions • Add relevant knowledge about expressions • Riloff, Wiebe, Wilson 2003; Riloff & Wiebe 2003; Wiebe & Riloff 2005; Riloff, Patwardhan, Wiebe 2006; Ruppenhofer, Akkaya, Wiebe in preparation
