1.04k likes | 1.34k Views
Information Extraction. Shallow Processing Techniques for NLP Ling570 December 5, 2011. Roadmap. Information extraction: Definition & Motivation General approach Relation extraction techniques: Fully supervised Lightly supervised. Information Extraction. Motivation:
E N D
Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011
Roadmap • Information extraction: • Definition & Motivation • General approach • Relation extraction techniques: • Fully supervised • Lightly supervised
Information Extraction • Motivation: • Unstructured data hard to manage, assess, analyze
Information Extraction • Motivation: • Unstructured data hard to manage, assess, analyze • Many tools for manipulating structured data: • Databases: efficient search, agglomeration • Statistical analysis packages • Decision support systems
Information Extraction • Motivation: • Unstructured data hard to manage, assess, analyze • Many tools for manipulating structured data: • Databases: efficient search, agglomeration • Statistical analysis packages • Decision support systems • Goal: • Given unstructured information in texts • Convert to structured data • E.g., populate DBs
IE Example Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco. • Lead airline: • Amount: • Effective Date: • Follower:
IE Example Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco. • Lead airline: United Airlines • Amount: $6 • Effective Date: 2006-10-26 • Follower: American Airlines
Other Applications • Shared tasks: • Message Understanding Conferences (MUC)
Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers:
Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers: • CompanyA, CompanyB, CompanyC, amount, type • Terrorist incidents
Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers: • CompanyA, CompanyB, CompanyC, amount, type • Terrorist incidents • Incident type, Actor, Victim, Date, Location • General:
Other Applications • Shared tasks: • Message Understanding Conferences (MUC) • Joint ventures/mergers: • CompanyA, CompanyB, CompanyC, amount, type • Terrorist incidents • Incident type, Actor, Victim, Date, Location • General: • Pricegrabber • Stock market analysis • Apartment finder
Information ExtractionComponents • Incorporates range of techniques, tools
Information ExtractionComponents • Incorporates range of techniques, tools • FSTs (pattern extraction) • Supervised learning: • Classification • Sequence labeling
Information ExtractionComponents • Incorporates range of techniques, tools • FSTs (pattern extraction) • Supervised learning: • Classification • Sequence labeling • Semi-, un-supervised learning: clustering, bootstrapping
Information ExtractionComponents • Incorporates range of techniques, tools • FSTs (pattern extraction) • Supervised learning: • Classification • Sequence labeling • Semi-, un-supervised learning: clustering, bootstrapping • Part-of-Speech tagging • Named Entity Recognition • Chunking • Parsing
Information Extraction Process • Typically pipeline of major components
Information Extraction Process • Typically pipeline of major components • First step: Named Entity Recognition (NER) • Often application-specific • Generic type: Person, Organiztion, Location • Specific: genes to college courses
Information Extraction Process • Typically pipeline of major components • First step: Named Entity Recognition (NER) • Often application-specific • Generic type: Person, Organiztion, Location • Specific: genes to college courses • Identify NE mentions: • specific instances
Example: Pre-NER • Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco.
Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
Information Extraction:Reference Resolution • Builds on NER • Clusters entity mentions referring to same things • Creates entity equivalence classes
Information Extraction:Reference Resolution • Builds on NER • Clusters entity mentions referring to same things • Creates entity equivalence classes • Includes anaphora resolution • E.g. pronoun resolution (it, he, she…; one; this, that)
Information Extraction:Reference Resolution • Builds on NER • Clusters entity mentions referring to same things • Creates entity equivalence classes • Includes anaphora resolution • E.g. pronoun resolution (it, he, she…; one; this, that) • Also non-anaphoric, general mentions
Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text
Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations:
Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations: • Family, employment, part-whole, membership, geospatial
Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations: • Family, employment, part-whole, membership, geospatial • Generic relations: • United is part-of UAL Corp. • American Airlines is part-of AMR • Tim Wagner is an employee of American Airlines
Relation Detection & Classification • Relation detection & classification: • Find and classify relations among entities in text • Mix of application-specific and generic relations • Generic relations: • Family, employment, part-whole, membership, geospatial • Generic relations: • United is part-of UAL Corp. • American Airlines is part-of AMR • Tim Wagner is an employee of American Airlines • Domain-specific relations: • United serves Chicago; United serves Denver, etc
Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
Event Detection &Classification • Event detection and classification: • Find key events
Event Detection &Classification • Event detection and classification: • Find key events • Fare increase by United • Matching fare increase by American • Reporting of fare increases • How cued?
Event Detection &Classification • Event detection and classification: • Find key events • Fare increase by United • Matching fare increase by American • Reporting of fare increases • How cued? • Instances of “said” and ”cite”
Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text
Example: post-NER • Citing high fuel prices, [ORGUnited Airlines] said [TimeFriday]it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text • E.g. Friday, Thursday in example text
Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text • E.g. Friday, Thursday in example text • Others: • Date expressions: • Absolute: days of week, holidays • Relative: two days from now, next year • Clock times: • noon, 3:30pm
Temporal Expressions:Recognition and Analysis • Temporal expression recognition • Finds temporal events in text • E.g. Friday, Thursday in example text • Others: • Date expressions: • Absolute: days of week, holidays • Relative: two days from now, next year • Clock times: • noon, 3:30pm • Temporal analysis: • Map expressions to points/spans on timeline • Anchor: e.g. Friday, Thursday: tied to example dateline
Template Filling • Texts describe stereotypical situations in domain • Identify texts evoking template situations • Fill slots in templates based on text
Template Filling • Texts describe stereotypical situations in domain • Identify texts evoking template situations • Fill slots in templates based on text • In example: fare-raise is stereo-typical event
Information Extraction Steps • Named Entity Recognition • Reference Resolution • Relation Detection & Classification • Event Detection & Classification • Temporal Expression Recognition & Analysis • Template-filling
Supervised Approaches to Relation Analysis • Two stage process:
Supervised Approaches to Relation Analysis • Two stage process: • Relation detection: • Detect whether a relation holds between two entities
Supervised Approaches to Relation Analysis • Two stage process: • Relation detection: • Detect whether a relation holds between two entities • Positive examples: