
Semantics and Information Extraction

Douglas E. Appelt, Artificial Intelligence Center, SRI International

Presentation Transcript


  1. Semantics and Information Extraction
     Douglas E. Appelt, Artificial Intelligence Center, SRI International

  2. What is Semantics? • Theory of the relationship between formal aspects of language and objects and facts in the world. Semantics and Info. Extraction

  3. Traditional Approach in NLP (and Linguistics)
     • Define a well-behaved logical language, e.g.:
        • Intensional logic
        • Dynamic predicate logic
        • Discourse Representation Structures
     • Define a semantics for the logical language (using model theory).
     • Devise rules for translating natural-language structures into the logical language that preserve truth conditions.
     • Apply principles of compositionality to build larger structures from smaller ones.

  4. Successes and Failures
     • Successes
        • Database query applications (e.g., ATIS systems)
        • Dialog systems with a narrow domain of application (e.g., TRAINS)
     • Failures
        • Extracting information from large corpora:
           • Real syntax is too complex.
           • Coverage is too weak for large corpora.

  5. Semantics and Information Extraction
     • General requirements of a semantic theory for information extraction
     • ACE as a specific approach to semantics for information extraction
     • Specific issues to examine:
        • Basic ontology
        • Coreference
        • Generic/specific reference
        • Metonymy
        • Relations and events

  6. Information Extraction: A Pragmatic Approach
     • Let application requirements drive semantic analysis.
     • Identify the types of entities that are relevant to a particular task.
     • Identify the range of facts of interest for those entities.
     • Ignore everything else.

  7. MUC and Scenario Templates
     • Define a set of “interesting entities”:
        • Persons, organizations, locations…
     • Define a complex scenario involving interesting events and relations over entities.
        • Example: management succession (persons, companies, positions, reasons for succession)
     • This collection of entities and relations is called a “scenario template.”

  8. Problems with Scenario Templates
     • Encouraged development of highly domain-specific ontologies, rule systems, heuristics, etc.
     • Most of the effort expended on building a system for one scenario template was not directly applicable to a different scenario template.

  9. Addressing the Problem
     • Address a large number of smaller, more focused scenario templates (Event-99).
     • Develop a more systematic, ground-up approach to semantics by focusing on elementary entities, relations, and events (ACE).

  10. The ACE Program
     • “Automated Content Extraction”
     • Develop core information extraction technology by focusing on extracting specific semantic entities and relations over a very wide range of texts.
     • Corpora: newswire and broadcast transcripts, covering a broad range of topics and genres
        • Genres: third-person reports, interviews, editorials
        • Topics: foreign relations, significant events, human interest, sports, weather
     • Discourage highly domain- and genre-dependent solutions.

  11. Components of a Semantic Model
     • Entities: individuals in the world that are mentioned in a text
        • Simple entities: singular objects
        • Collective entities: sets of objects of the same type, where the set is explicitly mentioned in the text
     • Attributes: timeless unary properties of entities (e.g., name)
     • Temporal points and intervals
     • Relations: properties that hold of two entities over a time interval
     • Events: a particular kind of relation among entities, implying a change in relation state at the end of the time interval
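     The components above map naturally onto a handful of record types. The following is a minimal Python sketch, not ACE's own data format: the class names (`Interval`, `Entity`, `Relation`, `Event`) and the type labels such as `"PER"` and `"GPE"` are illustrative assumptions, and time points are simplified to integers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass(frozen=True)
class Interval:
    """A temporal interval; None marks an unspecified (or "timeless") bound."""
    start: Optional[int] = None
    end: Optional[int] = None

@dataclass
class Entity:
    """An individual mentioned in a text; collective=True marks a set of
    same-type objects that the text explicitly mentions as a set."""
    entity_id: str
    entity_type: str                 # illustrative labels, e.g. "PER", "ORG", "GPE"
    collective: bool = False
    attributes: Dict[str, str] = field(default_factory=dict)  # timeless unary properties

@dataclass
class Relation:
    """A property holding of two entities over a time interval."""
    rel_type: str
    arg1: Entity
    arg2: Entity
    interval: Interval = field(default_factory=Interval)

@dataclass
class Event:
    """A relation among entities that implies a change in relation state
    at the end of its interval."""
    event_type: str
    participants: Dict[str, Entity]  # role name -> entity
    interval: Interval = field(default_factory=Interval)

# Usage: a relation whose temporal interval the text leaves unspecified.
blair = Entity("e1", "PER", attributes={"name": "Tony Blair"})
britain = Entity("e2", "GPE", attributes={"name": "Britain"})
pm_role = Relation("ROLE", blair, britain)
```

     Keeping `Interval` bounds optional lets the same `Relation` type cover both timeless relations and relations whose interval the text simply does not state.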

  12. Semantic Analysis: Relating Language to the Model
     • Linguistic mention
        • A particular linguistic phrase
        • Denotes a particular entity, relation, or event
        • For entities: a noun phrase, name, or possessive pronoun
        • For relations and events: a verb, nominalization, compound nominal, or other linguistic construct relating other linguistic mentions
     • Linguistic entity
        • An equivalence class of mentions with the same meaning
        • Coreferring noun phrases
        • Relations and events derived from different mentions but conveying the same meaning

  13. Language and World Model
     [Diagram: Linguistic Mention and Linguistic Entity, each linked to the world model by a “Denotes” arrow]

  14. NLP Tasks in an Extraction System
     [Diagram of tasks: Recognition, Type Classification, Coreference, Event Recognition, and Cross-Document Coreference, connecting Linguistic Mention, Linguistic Entity, and Events and Relations]

  15. The Basic Semantic Tasks of an IE System
     • Recognition of linguistic entities
     • Classification of linguistic entities into semantic types
     • Identification of coreference equivalence classes of linguistic entities
     • Identifying the actual individuals that are mentioned in an article:
        • Associating linguistic entities with predefined individuals (e.g., in a database or knowledge base)
        • Forming equivalence classes of linguistic entities from different documents
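     Forming coreference equivalence classes, within one document or across documents, is a natural fit for a union-find (disjoint-set) structure. The sketch below is one possible implementation, not the method of any particular ACE system; `MentionClusters` and its methods are hypothetical names, and deciding *which* mentions corefer (the hard part) is left to the caller.

```python
from typing import Dict, Hashable, List

class MentionClusters:
    """Union-find over mention keys; each root stands for one linguistic entity."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def add(self, mention: Hashable) -> None:
        self.parent.setdefault(mention, mention)

    def find(self, mention: Hashable) -> Hashable:
        self.add(mention)
        root = mention
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while mention != root:
            nxt = self.parent[mention]
            self.parent[mention] = root
            mention = nxt
        return root

    def corefer(self, a: Hashable, b: Hashable) -> None:
        """Merge the classes of two mentions judged to denote the same entity."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def entities(self) -> List[List[Hashable]]:
        """One list of mentions per equivalence class."""
        groups: Dict[Hashable, List[Hashable]] = {}
        for mention in list(self.parent):
            groups.setdefault(self.find(mention), []).append(mention)
        return list(groups.values())

clusters = MentionClusters()
clusters.corefer("Gen. Augusto Pinochet", "him")
clusters.corefer("him", "Pinochet")
clusters.add("Chile")
```

     For cross-document classes, the same structure works if each mention key carries a document identifier, e.g. `("doc1", "Pinochet")`.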

  16. Choosing an Ontology for IE Semantics
     • Ordinary native speakers should be able to annotate text with minimal training.
     • People should have well-developed intuitions about type classification.
        • Is a “museum” an organization or a facility? (A FOG?)
     • People should have well-developed intuitions about entity coreference.
        • “Peace in the Middle East”
     • Entities should be extensional, not abstract, generic, counterfactual, or fictional.

  17. The ACE Ontology and Annotation Standards
     • Documents available online:
        • http://www.ldc.upenn.edu/Projects/ACE/
     • Entity standards
     • Relation standards
     • Proposed event standards (still under development)

  18. The ACE Ontology
     • Persons
        • A natural kind, and hence self-evident
     • Organizations
        • Should have some persistent existence that transcends a mere set of individuals
     • Locations
        • Geographic places with no associated governments
     • Facilities
        • Objects from the domain of civil engineering
     • Geopolitical Entities (GPEs)
        • Geographic places with associated governments

  19. Why GPEs?
     • An ontological problem: certain entities have attributes of physical objects in some contexts, of organizations in others, and of collections of people in still others.
     • Sometimes it is difficult or impossible to determine which aspect is intended.
     • In some contexts, the same phrase appears to play different roles in different clauses.

  20. Aspects of GPEs
     • Physical
        • San Francisco has a mild climate.
     • Organization
        • The United States is seeking a solution to the North Korean problem.
     • Population
        • France makes a lot of good wine.

  21. Metonymy
     • Metonymy occurs when a speaker uses a mention to refer, in a systematic way, to an entity with a different name or type than the one mentioned.
     • Metonymy is a property of mentions.
        • A “literal” mention uses the name or type of the entity it refers to.
        • A “metonymic” mention violates that in some way.
     • A single entity can have both literal and metonymic mentions.

  22. Examples
     • Name metonymy
        • Beijing announced a new policy toward North Korea.
        • Baltimore hit a home run in the ninth inning.
        • SRI was severely damaged in the 1989 earthquake.
     • Type metonymy
        • John works for the restaurant on the corner.
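     Because metonymy is a property of mentions, one simple way to flag it is to compare the type a phrase names literally with the type of the entity it refers to. This is a simplified Python sketch: `Mention`, `ENTITY_TYPES`, the entity ids, and the type assignments (e.g. treating "the restaurant" as naming a facility while referring to an organization) are assumptions for illustration. It checks only the type half of the definition; a name-based case such as *Beijing* standing for the Chinese government, where both sides may be GPEs, would need an additional check on names.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical entity table for this sketch: entity id -> type of the referent.
ENTITY_TYPES: Dict[str, str] = {
    "e_restaurant": "ORG",   # the business John works for
    "e_sri_site": "FAC",     # the buildings damaged in the earthquake
    "e_john": "PER",
}

@dataclass
class Mention:
    text: str
    literal_type: str   # the type the phrase names literally
    entity_id: str      # the entity the phrase actually refers to

def is_metonymic(mention: Mention) -> bool:
    """Simplified test: metonymic if the literal type differs from the referent's type."""
    return mention.literal_type != ENTITY_TYPES[mention.entity_id]

examples = [
    Mention("the restaurant", "FAC", "e_restaurant"),  # type metonymy
    Mention("SRI", "ORG", "e_sri_site"),                # name metonymy
    Mention("John", "PER", "e_john"),                   # literal mention
]
```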

  23. Problem Cases: Literal and Metonymic Mentions Both Not Types of Interest
     • John bought a Picasso. It set him back $1 million. He is his favorite artist.

  24. Role Ambiguity: Why Isn’t It Just Metonymy?
     • Iraq attacked Kuwait.
        • Was the attack on the physical territory?
        • Was the attack on the government?
        • Was the attack on the people of Kuwait?
        • The answer is “yes”.

  25. Multiple Roles
     • Iraq disputed its border with Kuwait.
        • Governments dispute things.
        • Physical real estate has borders.

  26. Role Classification and the Sparse Data Problem
     • Role determination through predicate-argument constraints
        • China announced a new policy regarding North Korea.
     • ACE corpus: about 20K words in the training corpus
        • GPE-PER: 84 configurations
        • GPE-LOC: 432 configurations
        • GPE-ORG: 504 configurations
        • GPE-GPE: 789 configurations
     • Only 131 configurations have more than 2 instances in the corpus (about 7%).
     • Many of those involve weakly constrained predicates (have, be, of, etc.).
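     The “about 7%” figure can be checked directly, assuming the four counts are disjoint and together make up all the configurations from which the percentage was computed:

```python
# GPE predicate-argument configuration counts, copied from the slide.
configs = {"GPE-PER": 84, "GPE-LOC": 432, "GPE-ORG": 504, "GPE-GPE": 789}
total = sum(configs.values())      # all observed configurations
frequent = 131                     # configurations with more than 2 instances
share = frequent / total
print(total, f"{share:.1%}")       # 1809 7.2%
```

     Under that assumption, roughly 93% of configurations occur at most twice, which is the sparse-data problem the slide describes.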

  27. Generic vs. Specific
     • The assumed application is building a database from extracted information.
     • Databases typically represent concrete entities.
     • Specificity is a critical attribute of linguistic entities.
     • Specificity is a property of the entity, not the mention:
        • John is looking for a Java programmer.
        • He must have three years of experience.
     • Problem: assessing specificity is a nuanced distinction, subject to substantial inter-annotator disagreement.

  28. Types of Linguistic Mentions
     • Name mentions
        • The mention uses a proper name to refer to the entity.
     • Nominal mentions
        • The mention is a noun phrase whose head is a common noun.
     • Pronominal mentions
        • The mention is a headless noun phrase, a noun phrase whose head is a pronoun, or a possessive pronoun.
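     The three mention types can be approximated from the part-of-speech tag of a mention’s head word. This Python sketch assumes Penn Treebank tags and a caller that has already found the head (passing `None` for a headless noun phrase); `mention_level` and the tag groupings are this sketch’s own choices, not an ACE-specified procedure.

```python
from typing import Optional

# Penn Treebank tag groups assumed by this sketch.
PRONOUN_TAGS = {"PRP", "PRP$"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}
COMMON_NOUN_TAGS = {"NN", "NNS"}

def mention_level(head_tag: Optional[str]) -> str:
    """Classify a mention from the POS tag of its head word.

    head_tag is None for a headless noun phrase, which counts as pronominal.
    """
    if head_tag is None or head_tag in PRONOUN_TAGS:
        return "PRONOMINAL"
    if head_tag in PROPER_NOUN_TAGS:
        return "NAME"
    if head_tag in COMMON_NOUN_TAGS:
        return "NOMINAL"
    raise ValueError(f"no mention level for head tag {head_tag!r}")
```

     A tag-based rule is only a first approximation: in the next slide’s example, “a Chilean” is a nominal mention even though a tagger may well label “Chilean” as a proper noun.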

  29. Entity and Mention Example
     [COLOGNE, [Germany]] (AP) _ [A [Chilean] exile] has filed a complaint against [former [Chilean] dictator Gen. Augusto Pinochet] accusing [him] of responsibility for [her] arrest and torture in [Chile] in 1973, [prosecutors] said Tuesday. [The woman, [[a Chilean] who has since gained [German] citizenship]], accused [Pinochet] of depriving [her] of personal liberty and causing bodily harm during [her] arrest and torture.
     [Legend: Person, Organization, Geopolitical Entity]
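     The bracket notation in this example nests mentions inside mentions, so recovering the spans needs a stack rather than a simple pattern match. The Python sketch below extracts only the mention strings; the entity types shown by the slide’s legend are not encoded in the brackets, and `bracketed_mentions` is a hypothetical helper.

```python
from typing import List

def bracketed_mentions(text: str) -> List[str]:
    """Return the text of every [bracketed] mention, nested ones included,
    in the order in which their closing brackets appear."""
    starts: List[int] = []   # stack of start offsets into `plain`
    plain: List[str] = []    # characters seen so far, with brackets removed
    mentions: List[str] = []
    for ch in text:
        if ch == "[":
            starts.append(len(plain))
        elif ch == "]":
            if not starts:
                raise ValueError("unmatched ']'")
            start = starts.pop()
            mentions.append("".join(plain[start:]).strip())
        else:
            plain.append(ch)
    if starts:
        raise ValueError("unmatched '['")
    return mentions

sample = "[A [Chilean] exile] accused [him]"
print(bracketed_mentions(sample))
# ['Chilean', 'A Chilean exile', 'him']
```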

  30. Relations
     • Relations hold between two entities over a time interval.
     • Relations may be “timeless,” or their temporal interval may be unspecified.
     • Relations have inertia, i.e., they don’t change unless a relevant event happens.
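     One way to model inertia is to store only the events that change a relation and let each value persist until the next such event. This is a minimal Python sketch, assuming integer time points and a relation that is false before its first recorded change; `InertialRelation` is a hypothetical name, and the times in the usage lines are made up.

```python
from bisect import bisect_right
from typing import List

class InertialRelation:
    """A relation whose truth value changes only at recorded events;
    between events it keeps its last value (inertia)."""

    def __init__(self, initially: bool = False) -> None:
        self.initially = initially
        self.times: List[int] = []    # sorted event times
        self.values: List[bool] = []  # relation value after each event

    def record(self, time: int, holds: bool) -> None:
        """Record an event at `time` that sets the relation to `holds`."""
        i = bisect_right(self.times, time)
        self.times.insert(i, time)
        self.values.insert(i, holds)

    def holds_at(self, time: int) -> bool:
        """Value set by the latest event at or before `time`."""
        i = bisect_right(self.times, time)
        return self.initially if i == 0 else self.values[i - 1]

# Usage: e.g. a hiring event at t=3 and a resignation at t=7.
employment = InertialRelation()
employment.record(3, True)
employment.record(7, False)
```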

  31. Explicit and Implicit Relations
     • Many relations are true in the world, and reasonable knowledge bases used by extraction systems will include many of them. Semantic analysis requires focusing on the ones that are directly motivated by the text.
     • Example:
        • Baltimore is in Maryland, which is in the United States.
        • “Baltimore, MD”
        • The text mentions Baltimore and the United States. Is there a relation between Baltimore and the United States?
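     A knowledge base can derive the Baltimore/United States relation by chaining located-in facts, which is exactly why the question is whether the *text* motivates it. The Python sketch below shows only what such a knowledge base could infer, not what should be annotated; `located_in_closure` and the single-container dictionary format are assumptions.

```python
from typing import Dict, Set, Tuple

def located_in_closure(direct: Dict[str, str]) -> Set[Tuple[str, str]]:
    """All (place, container) pairs implied by chaining direct
    located-in facts (each place maps to its immediate container)."""
    pairs: Set[Tuple[str, str]] = set()
    for place in direct:
        container = direct.get(place)
        seen: Set[str] = set()            # guards against cyclic data
        while container is not None and container not in seen:
            pairs.add((place, container))
            seen.add(container)
            container = direct.get(container)
    return pairs

kb = {"Baltimore": "Maryland", "Maryland": "United States"}
implied = located_in_closure(kb)
```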

  32. Another Example
     • Prime Minister Tony Blair attempted to convince the British Parliament of the necessity of intervening in Iraq.
        • Is there a role relation specifying Tony Blair as prime minister of Britain?
     • A test: a relation is implicit in the text if the text provides convincing evidence that the relation actually holds.

  33. Explicit Relations
     • Explicit relations are expressed by certain surface linguistic forms:
        • Copular predication: Clinton was the president.
        • Prepositional phrase: The CEO of Microsoft…
        • Prenominal modification: The American envoy…
        • Possessive: Microsoft’s chief scientist…
        • SVO relations: Clinton arrived in Tel Aviv…
        • Nominalizations: Annan’s visit to Baghdad…
        • Apposition: Tony Blair, Britain’s prime minister…

  34. Types of ACE Relations
     • ROLE: relates a person to an organization or a geopolitical entity
        • Subtypes: member, owner, affiliate, client, citizen
     • PART: generalized containment
        • Subtypes: subsidiary, physical part-of, set membership
     • AT: permanent and transient locations
        • Subtypes: located, based-in, residence
     • SOC: social relations among persons
        • Subtypes: parent, sibling, spouse, grandparent, associate
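     The relation inventory above can serve as a simple schema check on candidate relations. This Python sketch copies the types and subtypes from the slide but spells them in its own way (they are not the official ACE identifiers), and it enforces argument types only where the slide states them (ROLE and SOC); `is_valid_relation` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

# Relation types and subtypes as listed on the slide.
SUBTYPES: Dict[str, Set[str]] = {
    "ROLE": {"member", "owner", "affiliate", "client", "citizen"},
    "PART": {"subsidiary", "physical part-of", "set membership"},
    "AT": {"located", "based-in", "residence"},
    "SOC": {"parent", "sibling", "spouse", "grandparent", "associate"},
}

# Argument-type constraints stated on the slide; PART and AT are left open here.
ARG_TYPES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "ROLE": ({"PER"}, {"ORG", "GPE"}),
    "SOC": ({"PER"}, {"PER"}),
}

def is_valid_relation(rel_type: str, subtype: str, arg1_type: str, arg2_type: str) -> bool:
    """Check a candidate against the subtype inventory and argument constraints."""
    if subtype not in SUBTYPES.get(rel_type, set()):
        return False
    constraint = ARG_TYPES.get(rel_type)
    if constraint is None:
        return True
    allowed1, allowed2 = constraint
    return arg1_type in allowed1 and arg2_type in allowed2
```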

  35. Event Types (Preliminary)
     • Movement
        • Travel, visit, move, arrive, depart…
     • Transfer
        • Give, take, steal, buy, sell…
     • Creation/Discovery
        • Birth, make, discover, learn, invent…
     • Destruction
        • Die, destroy, wound, kill, damage…
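     A first cut at event recognition might look up trigger words in a lexicon built from these lists. The Python sketch below assumes the input is already lemmatized; `EVENT_TRIGGERS` and `event_type` are hypothetical names, and the word lists are only the examples on this preliminary slide.

```python
from typing import Dict, List, Optional

# Example triggers per preliminary event type, copied from the slide.
EVENT_TRIGGERS: Dict[str, List[str]] = {
    "Movement": ["travel", "visit", "move", "arrive", "depart"],
    "Transfer": ["give", "take", "steal", "buy", "sell"],
    "Creation/Discovery": ["birth", "make", "discover", "learn", "invent"],
    "Destruction": ["die", "destroy", "wound", "kill", "damage"],
}

# Invert to lemma -> event type for constant-time lookup.
TRIGGER_TO_TYPE: Dict[str, str] = {
    word: etype for etype, words in EVENT_TRIGGERS.items() for word in words
}

def event_type(lemma: str) -> Optional[str]:
    """Event type for a lemmatized trigger word, or None if it is not listed."""
    return TRIGGER_TO_TYPE.get(lemma.lower())
```

     A lexicon alone overgenerates: “take” or “make” in most of their senses do not denote transfer or creation, so a practical system would also need context.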

  36. Problem: Collective and Distributive Reference
     [Schematic text: …John… …Bill… …they…]
     • There are at least three distinct entities in this text.
     • We need a way to relate the John and Bill entities to the collective mention “they”.

  37. Solution: Relations
     [Schematic text: …John… …Bill… …they…]
        PartOf.part(e(John), e(they))
        PartOf.part(e(Bill), e(they))
     • Three of the men…
        PartOf.part(e(three), e(the men))
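     The PartOf relations can be stored as an index from each collective entity to its known members. In this Python sketch, plain strings stand in for the slide’s entity terms e(John), e(they), and so on; `PartOfIndex`, `part`, and `members_of` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set

class PartOfIndex:
    """Stores PartOf.part(member, collective) relations between entities."""

    def __init__(self) -> None:
        self._members: Dict[str, Set[str]] = defaultdict(set)

    def part(self, member: str, collective: str) -> None:
        """Record that the member entity is part of the collective entity."""
        self._members[collective].add(member)

    def members_of(self, collective: str) -> List[str]:
        """Known members of a collective entity, sorted for stable output."""
        return sorted(self._members.get(collective, set()))

idx = PartOfIndex()
idx.part("John", "they")
idx.part("Bill", "they")
idx.part("three", "the men")
```

     Because the collective remains a distinct entity, “they” can still participate in its own relations and events, while a distributive reading can be recovered by looking up its members.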

  38. Summary
     • The motivation for a semantic theory is a practical one, driven by database-filling needs.
     • Pick a limited ontology of core concepts, and build out as motivated by application needs.
     • Address a broad spectrum of semantic problems, but from a limited ontology that simplifies data annotation issues.
