Introduction to “Event Extraction” Jan 18, 2007
What is "Information Extraction" As a task: filling slots in a database from sub-segments of text.
October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels -- the coveted code behind the Windows operating system -- to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying...
IE fills the slots:
NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Software Foundation
What is "Information Extraction" As a family of techniques: Information Extraction = segmentation + classification + association + clustering
[The same news passage is repeated on each build of this slide, with one more stage applied each time.]
Segmentation and classification (together also known as "named entity extraction") yield the typed segments:
Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
Association and clustering then group these segments into the table:
NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Software Foundation
A toy end-to-end sketch of the four stages follows.
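The sketch below runs the four stages on a shortened version of the passage. Every rule in it (the regular expression, the gazetteer sets, the sentence-level pairing heuristic) is an illustrative assumption, not the lecture's actual system:

```python
import re

# Toy sketch of IE = segmentation + classification + association + clustering.
# All rules below are illustrative assumptions, not a real NER system.

TEXT = ('Microsoft Corporation CEO Bill Gates railed against open source. '
        'Gates says Microsoft will disclose code. '
        '"We love shared source," said Bill Veghte, a Microsoft VP. '
        'Richard Stallman, founder of the Free Software Foundation, countered.')

TITLES = {"CEO", "VP", "founder"}
ORGS = {"Microsoft", "Microsoft Corporation", "Free Software Foundation"}

def segment(sentence):
    """Segmentation: candidates = Titlecase runs, ALL-CAPS words, 'founder'."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*|[A-Z]{2,}|founder",
                      sentence)

def classify(span):
    """Classification: coarse type for each segment, or None to discard."""
    if span in TITLES:
        return "TITLE"
    if span in ORGS:
        return "ORGANIZATION"
    if len(span.split()) >= 2 or span == "Gates":
        return "PERSON"
    return None  # e.g. sentence-initial words like "We"

def associate(text):
    """Association: within each sentence, pair every PERSON with the
    sentence's last TITLE and ORGANIZATION (a crude heuristic)."""
    rows = []
    for sentence in re.split(r"(?<=[.!?]) ", text):
        typed = [(s, classify(s)) for s in segment(sentence)]
        titles = [s for s, t in typed if t == "TITLE"]
        orgs = [s for s, t in typed if t == "ORGANIZATION"]
        for person in (s for s, t in typed if t == "PERSON"):
            rows.append((person, titles[-1] if titles else None,
                         orgs[-1] if orgs else None))
    return rows

def cluster(rows):
    """Clustering: merge mentions sharing a surname ('Gates' ~ 'Bill Gates'),
    keeping the first (most complete) row for each merged entity."""
    canon = {r[0].split()[-1]: r[0] for r in rows if len(r[0].split()) > 1}
    merged = {}
    for name, title, org in rows:
        merged.setdefault(canon.get(name.split()[-1], name), (title, org))
    return [(n,) + to for n, to in merged.items()]

for row in cluster(associate(TEXT)):
    print(row)
# ('Bill Gates', 'CEO', 'Microsoft Corporation')
# ('Bill Veghte', 'VP', 'Microsoft')
# ('Richard Stallman', 'founder', 'Free Software Foundation')
```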
Some things to think about • We’ve seen sliding windows, non-sequential token tagging, and sequential token tagging. • Which of these are likely to work best, and when? • Are there other ways to formulate NER as a learning task? • Is there a benefit from using more complex graphical models? What potentially useful information does a linear-chain CRF not capture? • Can you combine sliding windows with a sequential model? • Next lecture will survey IE of sets of related entities (e.g., person and his/her affiliation). • How can you formalize that as a learning task? • Some case studies…
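To make the first of those formulations concrete, here is a minimal sketch of NER as sliding-window classification: every span up to a fixed width becomes a candidate example, labeled by whether it exactly matches a gold entity span. The feature set is an illustrative assumption:

```python
# Sliding-window NER as a learning task: enumerate candidate windows and
# label them against gold spans. Feature choices here are assumptions.

def window_examples(tokens, gold_spans, max_width=3):
    """Yield ((start, end), features, label) for every candidate window."""
    examples = []
    for start in range(len(tokens)):
        for width in range(1, max_width + 1):
            end = start + width
            if end > len(tokens):
                break
            feats = {
                "first": tokens[start],
                "last": tokens[end - 1],
                "left": tokens[start - 1] if start > 0 else "<s>",
                "right": tokens[end] if end < len(tokens) else "</s>",
                "width": width,
                "all_cap": all(t[0].isupper() for t in tokens[start:end]),
            }
            examples.append(((start, end), feats, (start, end) in gold_spans))
    return examples

toks = "said Bill Veghte , a Microsoft VP .".split()
for span, feats, label in window_examples(toks, {(1, 3), (5, 6)}):
    if label:
        print(span, toks[span[0]:span[1]], feats)
```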
ACE: Automatic Content Extraction A case study, or: yet another NIST bake-off
About ACE • http://www.nist.gov/speech/tests/ace/ and http://projects.ldc.upenn.edu/ace/ • The five-year mission: "develop technology to extract and characterize meaning in human language"…in newswire text, speech, and images • EDT: Develop NER for: people, organizations, geo-political entities (GPE), locations, facilities, vehicles, weapons, times, values … plus subtypes (e.g., educational organizations) • RDC: identify relations between entities: located, near, part-whole, membership, citizenship, … • EDC: identify events like interaction, movement, transfer, creation, destruction and their arguments
[Table of ACE event types and their arguments (entities)]
Events, entities and mentions • In ACE there is a distinction between an entity—a thing that exists in the Real World—and an entity mention—which is something that exists in the text (a substring). • Likewise, an event is something that (will, might, or did) happen in the Real World, and an event mention is some text that refers to that event. • An event mention lives inside a sentence (the "extent") • with a "trigger" (or anchor) • An event mention is defined by its type and subtype (e.g., Life:Marry, Transaction:TransferMoney) and its arguments • Every argument is an entity mention that has been assigned a role.
Events, entities and mentions • An event mention lives inside a sentence (the "extent") • with a "trigger" (or anchor) • An event mention is defined by its type and subtype (e.g., Life:Marry, Transaction:TransferMoney) and its arguments • Every argument is an entity mention that has been assigned a role. ITHACA, N.Y. -- The John D. and Catherine T. MacArthur Foundation today (Sept. 20) named Jon Kleinberg, Cornell professor of computer science, among the 25 new MacArthur Fellows -- the so-called "Genius Awards" -- for 2005. Dr. Kleinberg will receive $500,000 in no-strings-attached support over the next five years. In short: Jon Kleinberg will receive $500,000 from the MacArthur Foundation over the next five years. (A data-structure sketch of this example follows.)
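The sketch below writes the ACE distinctions out as plain data structures. The field names and the Recipient/Money role labels are illustrative choices, not the official ACE annotation DTD:

```python
from dataclasses import dataclass, field

# ACE-style event/entity mentions as plain data structures (an illustrative
# encoding, not the official ACE format).

@dataclass
class EntityMention:
    text: str              # the substring that appears in the document
    entity_id: str         # the real-world entity this mention refers to
    role: str = None       # role assigned when used as an event argument

@dataclass
class EventMention:
    event_type: str        # e.g. "Transaction"
    subtype: str           # e.g. "TransferMoney"
    extent: str            # the sentence containing the mention
    trigger: str           # the anchor word(s)
    arguments: list = field(default_factory=list)

# The MacArthur example, encoded by hand:
mention = EventMention(
    event_type="Transaction", subtype="TransferMoney",
    extent="Jon Kleinberg will receive $500,000 from the MacArthur "
           "Foundation over the next five years.",
    trigger="receive",
    arguments=[
        EntityMention("Jon Kleinberg", "E1-Kleinberg", role="Recipient"),
        EntityMention("$500,000", "E2-award-money", role="Money"),
    ],
)
print(mention.trigger, [(a.text, a.role) for a in mention.arguments])
```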
How to find events? The simple approach in Ahn, "Stages of event extraction": • Find all the entities and sentences • Ahn uses ground-truth labels for entities (an ACE thing) • Entities = candidate event arguments; sentences = candidate event extents; these will be classified and paired up • Find the event mentions • Classify words as anchors (for event type T:S) or not: 35 classes, mostly None • Classify (event-mention, entity-mention) pairs as arguments (with role R) or not: 36 classes, mostly None • Q: Why not just classify entity-mentions by Role? • Classify event mentions by modality, polarity, … • Classify (event-mention_i, event-mention_j) pairs as co-referent or not • Treat all of these tasks as separate classification problems • POS tag and parse everything, convert the parse trees to dependency relations, and use all of these as features. A schematic sketch of the staged pipeline follows.
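Here is the staged design as a schematic sketch, with each stage a separate classifier. The classifier arguments are placeholders standing in for learned models; the interface is my assumption, not Ahn's code:

```python
# Schematic sketch of Ahn's staged event extraction. The anchor_clf and
# argument_clf callables stand in for learned classifiers (assumptions).

def extract_events(sentences, entities_in, anchor_clf, argument_clf):
    """entities_in maps a sentence to its (ground-truth) entity mentions."""
    event_mentions = []
    for sent in sentences:
        # Stage 1 (anchor classification): label each word with an event
        # type:subtype or None -- 35 classes, mostly None.
        for word in sent.split():
            etype = anchor_clf(word, sent)
            if etype is None:
                continue
            # Stage 2 (argument classification): label each
            # (event-mention, entity-mention) pair with a role or None --
            # 36 classes, mostly None.
            args = [(em, argument_clf(word, etype, em, sent))
                    for em in entities_in.get(sent, [])]
            event_mentions.append({
                "extent": sent, "trigger": word, "type": etype,
                "args": [(em, r) for em, r in args if r is not None],
            })
    # Later stages (modality/polarity classification and event
    # co-reference) would run over event_mentions here.
    return event_mentions

# Tiny hand-written stand-ins for the learned classifiers:
sent = "Kleinberg will receive $500,000 over five years."
anchor = lambda w, s: "Transaction:TransferMoney" if w == "receive" else None
arg = lambda w, t, em, s: {"Kleinberg": "Recipient", "$500,000": "Money"}.get(em)
print(extract_events([sent], {sent: ["Kleinberg", "$500,000"]}, anchor, arg))
```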
Argument identification: the decision is a function of both the anchor and the entity mention.
Event co-reference …using greedy left-to-right clustering: for each new event mention M_new, decide "should I link M_new with a previous mention M_1, M_2, …?" based on Pr(M_new co-referent with M_j). A sketch follows.
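A minimal sketch of that greedy clustering, assuming a pairwise probability model `coref_prob` (a placeholder for the learned model):

```python
# Greedy left-to-right event co-reference. `coref_prob` is a placeholder
# assumption standing in for a learned pairwise model.

def greedy_coref(mentions, coref_prob, threshold=0.5):
    """Link each new mention to the highest-probability earlier mention's
    cluster, or start a new cluster if no pair clears the threshold."""
    cluster_of = {}
    n_clusters = 0
    for i, m_new in enumerate(mentions):
        best_j, best_p = None, threshold
        for j in range(i):                    # previous mentions M1..M(i-1)
            p = coref_prob(m_new, mentions[j])
            if p > best_p:
                best_j, best_p = j, p
        if best_j is None:
            cluster_of[i] = n_clusters        # start a new event cluster
            n_clusters += 1
        else:
            cluster_of[i] = cluster_of[best_j]
    return cluster_of

# Toy model: mentions co-refer if they share a trigger word.
mentions = ["receive award", "buy company", "receive $500,000"]
prob = lambda a, b: 0.9 if set(a.split()) & set(b.split()) else 0.1
print(greedy_coref(mentions, prob))   # {0: 0, 1: 1, 2: 0}
```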
The Webmaster Project: Yet Another Case Study with Einat Minkov (LTI), Anthony Tomasic (ISRI) See IJCAI-2005 paper
Overview and Motivations
What's new: adaptive NLP components
• Learn to adapt to changes in the domain of discourse: deep analysis in a limited but evolving domain.
• Compared to past NLP systems: • deep analysis in a narrow domain (Chat-80, SHRDLU, ...) • shallow analysis in a broad domain (POS taggers, NE recognizers, NP-chunkers, ...) • learning used as a tool to develop non-adaptive NLP components • this is ...something in between...
Details:
• Assume a DB-backed website, where the schema changes over time; no other changes allowed (yet).
• Interaction: the user requests (via NL email) changes in the factual content of the website (assume an update of one tuple); the system analyzes the request and presents a preview page and an editable form version of the request.
Key points:
• partial correctness is useful
• the user can verify correctness (vs. the case for DB queries, Q/A, ...) => a source of training data
[System architecture diagram: an email msg passes through shallow NLP (POS tags, NP chunks) and feature building; classifiers trained offline by the LEARNER predict requestType, targetRelation, and targetAttrib; NER finds entity1, entity2, ..., which are classified into newEntity, oldEntity, keyEntity, and otherEntity roles; update request construction combines these with web page templates and the database to produce a preview page and a user-editable form version of the request, which the user confirms.]
Outline • Training data/corpus • look at feasibility of learning the components that need to be adaptive, using a static corpus • Analysis steps: • request type • entity recognition • role-based entity classification • target relation finding • target attribute finding • [request building] • Conclusions/summary
Training data – the same request, phrased by different users
• User1: "On the staff page, change Mike to Michael in the listing for 'Mike Roberts'."
• User2: "Mike Roborts should be Micheal Roberts in the staff listing, pls fix it. Thanks - W"
• User3: ....
Training data
• User1: "Add this as Greg Johnson's phone number: 412 281 2000"
• User2: "Please add '412-281-2000' to greg johnson's listing on the staff page."
• User3: ....
Training data – entity names are made distinct
• User1: "Add this as Greg Johnson's phone number: 412 281 2000"
• User2: "Please add '543-341-8999' to fred flintstone's listing on the staff page."
• User3: ....
Modification: to make entity extraction reasonable, remove duplicate entities by replacing them with alternatives (preserving case, typos, etc.). A sketch of this replacement follows.
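A small sketch of that corpus modification: occurrences of an entity are swapped for an alternative value while carrying over each occurrence's casing. The case-transfer rule is a simple assumption, not the project's actual script:

```python
import re

# Swap an entity for an alternative value, preserving the user's casing.
# The match_case rule below is an illustrative assumption.

def match_case(template, replacement):
    """Render `replacement` in the same case style as `template`."""
    if template.islower():
        return replacement.lower()
    if template.isupper():
        return replacement.upper()
    if template.istitle():
        return replacement.title()
    return replacement

def replace_entity(text, old, new):
    """Replace every (case-insensitive) occurrence of `old` with `new`,
    carrying over the casing of each matched occurrence."""
    return re.sub(re.escape(old),
                  lambda m: match_case(m.group(), new),
                  text, flags=re.IGNORECASE)

msg = 'Please add "412-281-2000" to greg johnson\'s listing on the staff page.'
msg = replace_entity(msg, "greg johnson", "Fred Flintstone")
msg = replace_entity(msg, "412-281-2000", "543-341-8999")
print(msg)  # -> ... "543-341-8999" to fred flintstone's listing ...
```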
Training data
The corpus forms a users × requests matrix: each request (Request1, Request2, Request3, ...) was shown to every user (User1, User2, User3, ...), who restated it in an email, giving message(user i, req j) for every pair.
Training data – always test on a novel user?
Hold out one user's messages for testing and train on the remaining users' messages. This simulates a distribution of many users (harder to learn).
Training data – always test on a novel request?
Hold out all messages expressing one request for testing and train on the messages for the other requests. This simulates a distribution of many requests (much harder to learn). Corpus size: 617 emails total + 96 similar ones. A sketch of both splits follows.
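Both splits are leave-one-out partitions of the users × requests matrix. A minimal sketch, assuming each message is tagged with (user, request) ids (the tuple layout is my assumption):

```python
# Novel-user and novel-request evaluation splits over a users x requests
# matrix of messages, represented as (user_id, request_id, text) tuples.

def split_by(messages, key_index, held_out):
    """Leave-one-out split: test on messages whose user (key_index=0) or
    request (key_index=1) id equals `held_out`, train on the rest."""
    train = [m for m in messages if m[key_index] != held_out]
    test = [m for m in messages if m[key_index] == held_out]
    return train, test

messages = [(u, r, f"msg(user {u}, req {r})")
            for u in range(1, 4) for r in range(1, 4)]

train, test = split_by(messages, 0, held_out=1)   # novel-user split
print(len(train), len(test))                      # 6 3
train, test = split_by(messages, 1, held_out=2)   # novel-request split
print(len(train), len(test))                      # 6 3
```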
Training data – limitations • One DB schema, one off-line dataset • May differ from data collected on-line • So, no claims made for tasks where data will be substantially different (i.e., entity recognition) • No claims made about incremental learning/transfer • All learning problems considered separate • One step of request-building is trivial for the schema considered: • Given entity E and relation R, to which attribute of R does E correspond? • So, we assume this mapping is trivial (general case requires another entity classifier)
Entity Extraction Results • We assume a fixed set of entity types • no adaptivity needed (unclear if data can be collected) • Evaluated: • hand-coded rules (approx. cascaded FSTs in the "Mixup" language) • learned classifiers with a standard feature set and also a "tuned" feature set, which Einat tweaked • results are in F1 (harmonic mean of recall and precision; see the sketch below) • two learning methods, both based on "token tagging" • Conditional Random Fields (CRF) • Voted-perceptron discriminative training for an HMM (VP-HMM)
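For reference, entity-level F1 over predicted vs. gold spans (the span encoding is my assumption; the metric itself is standard):

```python
# Entity-level F1: harmonic mean of precision and recall over spans.

def f1(predicted_spans, gold_spans):
    predicted, gold = set(predicted_spans), set(gold_spans)
    tp = len(predicted & gold)           # exact-match true positives
    if not predicted or not gold or tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# e.g. two of three predicted spans are correct, out of four gold spans:
print(f1({(0, 2), (5, 6), (8, 9)}, {(0, 2), (5, 6), (10, 12), (14, 15)}))
# precision 2/3, recall 2/4 -> F1 ~ 0.571
```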
Entity Classification Results • Entity "roles": • keyEntity: value used to retrieve a tuple that will be updated ("delete greg's phone number") • newEntity: value to be added to the database ("William's new office # is 5307 WH") • oldEntity: value to be overwritten or deleted ("change mike to Michael in the listing for ...") • irrelevantEntity: not needed to build the request ("please add .... – thanks, William") • Features (a sketch follows): • closest preceding preposition • closest preceding "action verb" (add, change, delete, remove, ...) • closest preceding word which is a preposition, action verb, or determiner (in a "determined" NP) • whether the entity is followed by 's
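A sketch of those role-classification features, computed for one entity occurrence in a tokenized message. The word lists are illustrative assumptions, not the paper's exact lexicons:

```python
# Role-classification features for an entity span in a tokenized message.
# The PREPOSITIONS/ACTION_VERBS/DETERMINERS sets are assumptions.

PREPOSITIONS = {"to", "for", "in", "on", "of", "from", "with"}
ACTION_VERBS = {"add", "change", "delete", "remove", "update", "fix"}
DETERMINERS = {"the", "a", "an", "this", "that"}

def role_features(tokens, ent_start, ent_end):
    """Features for the entity spanning tokens[ent_start:ent_end]."""
    before = [t.lower() for t in tokens[:ent_start]]
    def closest(words):
        # closest preceding token drawn from the given word set
        return next((t for t in reversed(before) if t in words), None)
    return {
        "prev_prep": closest(PREPOSITIONS),
        "prev_action_verb": closest(ACTION_VERBS),
        "prev_prep_verb_or_det": closest(PREPOSITIONS | ACTION_VERBS
                                         | DETERMINERS),
        "followed_by_s": ent_end < len(tokens) and tokens[ent_end] == "'s",
    }

toks = "Please add 412-281-2000 to greg johnson 's listing".split()
print(role_features(toks, 4, 6))   # the entity "greg johnson"
# -> {'prev_prep': 'to', 'prev_action_verb': 'add', ..., 'followed_by_s': True}
```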
Target relation finding: reasonable results with "bag of words" features.
Request type classification: addTuple, alterValue, deleteTuple, or deleteValue?
• Can usually be determined from the entity roles, except for distinguishing deleteTuple from deleteValue: "Delete the phone # for Scott" vs. "Delete the row for Scott"
• Features (a sketch follows): • counts of each entity role • action verbs • nouns in NPs which are (probably) objects of the action verb • (optionally) the same nouns, tagged with a dictionary
• Target attributes are handled similarly
• Comments: • Very little data is available • Twelve words of schema-specific knowledge: a dictionary of terms like phone, extension, room, office, ...
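A simplified sketch of those request-type features. The dictionary and verb list are illustrative stand-ins (not the paper's exact twelve words), and the "object of the action verb" test is reduced to a dictionary lookup:

```python
from collections import Counter

# Request-type features: counts of entity roles, action verbs, and
# dictionary-tagged nouns. SCHEMA_DICT and ACTION_VERBS are assumptions.

SCHEMA_DICT = {"phone", "extension", "room", "office", "number", "email"}
ACTION_VERBS = {"add", "change", "delete", "remove", "update"}

def request_type_features(tokens, entity_roles):
    """`entity_roles` is the list of roles predicted by the previous stage."""
    toks = [t.lower() for t in tokens]
    feats = Counter({f"role={r}": n for r, n in Counter(entity_roles).items()})
    feats.update(f"verb={t}" for t in toks if t in ACTION_VERBS)
    # nouns tagged by the schema dictionary (simplification of "nouns in NPs
    # which are probably objects of the action verb")
    feats.update("dict_noun" for t in toks if t in SCHEMA_DICT)
    return dict(feats)

toks = "Delete the phone # for Scott".split()
print(request_type_features(toks, ["keyEntity"]))
# -> {'role=keyEntity': 1, 'verb=delete': 1, 'dict_noun': 1}
```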