Building Ontologies Automatically: Theory and Demonstration • Dan Moldovan • Human Language Technology Research Institute • University of Texas at Dallas
Outline • Introduction to Ontologies • Automatic Ontology Building • Applications • OWL/RDF Representation • Jaguar-Jager Demo • CHiPS Demo ABBYY - 2012
Ontology • An ontology is an organization of concepts and semantic relations within a given domain • Ontologies explicitly represent knowledge about domains of interest, i.e., which concepts are important and how they relate to each other • Ontologies serve as the backbone of semantic technologies and applications • Ontologies can help users achieve a unified understanding of concepts • Ontologies facilitate dealing with acronyms • Ontologies can be used as interchange formats to enable common access to data
Ontology • Ontologies facilitate exchange of knowledge between machines and between people and machines • Ontologies allow easier visualization of documents, i.e., which concepts are important and how far apart they are semantically • Once an ontology is created, it can be used to tag new texts to enable better retrieval and further processing [this is the idea of the semantic web] • Ontologies help browsing, searching and question answering; it is possible to understand questions and provide semantic connections between question concepts and text words
Ontologies for Question Answering • QP (question processing): determine the expected answer type and select the keywords used to retrieve relevant passages • Question classification • Answer type detection • PR (passage retrieval): retrieve and rank passages that are relevant to the input question • Query formulation • Keyword expansion • AP (answer processing): extract an exact answer by evaluating all answer candidates • Answer surface form • Answer redundancy
How to Create an Ontology? • Manual ontology creation • Time consuming • Error prone • Requires subject matter experts • The end product is difficult to maintain • Hard to cope with the rapidly changing and vast amount of information available for a domain • Automatic/semi-automatic ontology generation • Leverage existing domain models to seed the process of extracting semantically rich ontologies from unstructured text • Automatically update the ontology when new documents are made available or the domain model changes • Communicate ontology content across multiple applications using OWL/RDF as the common interchange format • Allow the user to easily review, update, and maintain the ontology • Customize ontology relations using semantic calculus and/or user-defined rules
Ontologies for Question Answering • QA system integrated with an automatic ontology building system
Outline • Introduction to Ontologies • Ontologies for Question Answering • Automatic Ontology Building • Applications • OWL/RDF Representation • Jaguar-Jager Demo • CHiPS Demo
Knowledge Acquisition from Text • KAT: automatically builds ontologies and knowledge bases (KBs) from concepts and semantic relationships found in text • Constituents of an ontology/KB • Concepts/Vocabulary • Key domain concepts (often missing from general purpose machine-readable dictionaries, e.g., WordNet) • “weapon”, “WMD”, “launcher” • Relations between ontological concepts • “anthrax” ISA “biological weapon”, “anthrax” CAUSE “death” • Organization of Relations • Hierarchical (universally true transitive relations, e.g. ISA, PART-WHOLE) • Contextual (text-conveying relations identified by a semantic parser)
Types of Knowledge • Universal (or ontological) • Represented in hierarchies • Simple binary relations between concepts • “Chemical weapons such as nerve gas, …” • Contextual • Represented in individual (semantic) contexts • Groups of relations centered on a common concept • “The forces launched a full-scale attack on Monday”
Knowledge Base Constituents
Knowledge Acquisition from Text • Functionality • Produce ontologies • Link concepts and relations to text • Visualize ontology • Edit ontology • Enhance an existing ontology • Merge two ontologies into a consistent ontology
Automatically Building Ontologies • Ontology/KB creation • Knowledge extraction from text • Pattern recognition; semantic parsing • Knowledge representation and storage • Contextual vs. universal • XML; relational database • Knowledge base maintenance • Conflict resolution • Ontology mapping; ontology merging • User interaction; ontology modification
KAT Modules – Text Processing • Input: Documents, Seeds • Extract “concepts” of interest • Extract binary relations (universal) • Use semantic parser to obtain contextual knowledge • Output: Concepts, Contexts, Binary Relations • The rebels had access to chemical weapons, such as nerve gas and other poisonous gases.
Text Processing • Step 1, candidate concepts: NPs that contain seed concepts (e.g., <modifier> <seed_word>) and NPs semantically linked to seed concepts • Step 2, concept selection: discard candidates that match certain criteria (e.g., <modifier_descriptive_adjective> <seed_word>) • Step 3, seed enrichment: enhance the current set of seeds with Step 2’s domain concepts and return to Step 1 • Step 4, relation selection: collect all semantic relations that link domain concepts with other concepts (inside or outside the domain). The relations between domain concepts will become part of the ontology.
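The candidate, selection, and enrichment loop described on this slide can be sketched roughly as follows. This is only an illustration under simplifying assumptions: noun phrases are already extracted as token tuples, only the `<modifier> <seed_word>` pattern is handled (not NPs that are semantically linked to seeds), and `DESCRIPTIVE_ADJECTIVES` and `grow_concepts` are hypothetical names, not part of KAT.

```python
# Hypothetical stop-list of descriptive adjectives; the slides do not give one.
DESCRIPTIVE_ADJECTIVES = {"big", "new", "dangerous", "other"}


def grow_concepts(noun_phrases, seeds, max_rounds=5):
    """Iteratively grow a set of domain concepts from seed concepts.

    noun_phrases: list of NPs, each a tuple of lowercase tokens.
    seeds: initial set of seed concepts (strings).
    """
    concepts = set(seeds)
    for _ in range(max_rounds):
        added = set()
        for tokens in noun_phrases:
            phrase = " ".join(tokens)
            if phrase in concepts or len(tokens) < 2:
                continue
            # Candidate pattern: <modifier> <known concept>.
            if " ".join(tokens[1:]) not in concepts:
                continue
            # Concept selection: drop <descriptive adjective> <seed> candidates.
            if tokens[0] in DESCRIPTIVE_ADJECTIVES:
                continue
            added.add(phrase)
        if not added:
            break
        # Seed enrichment: accepted concepts become seeds for the next round.
        concepts |= added
    return concepts


phrases = [
    ("chemical", "weapon"),
    ("dangerous", "weapon"),
    ("binary", "chemical", "weapon"),
    ("nerve", "gas"),
]
result = grow_concepts(phrases, {"weapon"})
```

In this toy run, "binary chemical weapon" is only accepted in the second round, after "chemical weapon" has been promoted to a seed, which is the point of returning to the first step.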
Semantic Relations Stored in KB
Examples of Semantic Relations in Text • Semantic relations are the interconnections between words or concepts that define the meaning of text. They are used as elements of knowledge bases. • Example: John went to the park yesterday because he saw hot air balloons taking off from there • [Figure: the example sentence annotated with the relations Agent, At-Time, At-Location, Cause, Part-Whole, ISA, Value, Stimulus, Experiencer, and Experience]
Semantic Parser • Various syntactic patterns: verb-argument, complex nominals, genitives, adjectival phrases/clauses, etc. • Semantic restrictions on relation arguments R(x,y) • Domain and range restrictions defined using an ontology of sorts • KINSHIP: [AnimateConcreteObject] [AnimateConcreteObject] • Filter relations that cannot exist between certain arguments
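A minimal sketch of domain and range filtering for a relation R(x,y), assuming a tiny hand-made ontology of sorts. Only the KINSHIP restriction comes from the slide; the sort hierarchy, the word-to-sort table, and the function names are hypothetical.

```python
# Hypothetical ontology of sorts: each sort maps to its parent sort.
SORT_PARENT = {
    "Person": "AnimateConcreteObject",
    "Animal": "AnimateConcreteObject",
    "AnimateConcreteObject": "ConcreteObject",
    "Artifact": "ConcreteObject",
}

# Hypothetical lexicon assigning a sort to each word.
WORD_SORT = {"mother": "Person", "son": "Person", "dog": "Animal", "table": "Artifact"}

# (domain, range) restriction per relation; KINSHIP is taken from the slide.
RESTRICTIONS = {"KINSHIP": ("AnimateConcreteObject", "AnimateConcreteObject")}


def has_sort(sort, required):
    """True if `sort` equals `required` or is one of its descendants."""
    while sort is not None:
        if sort == required:
            return True
        sort = SORT_PARENT.get(sort)
    return False


def is_admissible(relation, x, y):
    """Keep R(x, y) only if x fits the domain and y fits the range of R."""
    domain, range_ = RESTRICTIONS[relation]
    return has_sort(WORD_SORT.get(x), domain) and has_sort(WORD_SORT.get(y), range_)
```

With these tables, `is_admissible("KINSHIP", "mother", "son")` holds, while a candidate such as KINSHIP(mother, table) is filtered out before any classifier sees it.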
Semantic Parser • Bracketer – determine semantic dependencies within compound nominals of three or more nouns • Sugar industry analyst vs. Female industry analyst • Argument detection – identify argument pairs likely to encode a semantic relation based on lexico-syntactic patterns • Domain and range filtering – filter candidate arguments based on their semantic classes and relation definitions • Feature extraction – extract features corresponding to each pattern • Semantic class of modifier noun, syntactic path, voice, etc. • Machine learning classifiers – per-relation and per-pattern approaches • Support vector machines, Decision trees, Naïve Bayes, Semantic Scattering • Conflict resolution – resolve relation conflicts between classifiers
KAT Modules – Classification/Hierarchy Creation • Input: Concepts, Binary Relations • Classify each concept against every other using defined procedures, obtaining a set of ISA relations • Add all ISA and other binary relations to the hierarchy using conflict resolution • Output: Hierarchy of relations • “Scud missile” ISA “missile” • “Iraqi standing_army” ISA “Asian army” • “weapons inspection team” ISA “inspection team”
Subsumption used for Knowledge Classification • Proposition: Let C = A1 ⊓ ⋯ ⊓ Am ⊓ ∀R1.C1 ⊓ ⋯ ⊓ ∀Rn.Cn be the normal form of the concept description C, and D = B1 ⊓ ⋯ ⊓ Bk ⊓ ∀S1.D1 ⊓ ⋯ ⊓ ∀Sl.Dl be the normal form of the concept description D. Then C ⊑ D iff both of the following conditions hold: • For all i, 1 ≤ i ≤ k, there exists j, 1 ≤ j ≤ m, such that Bi = Aj • For all i, 1 ≤ i ≤ l, there exists j, 1 ≤ j ≤ n, such that Si = Rj and Cj ⊑ Di • This formulation of subsumption is • Sound (the “if” part holds) • Complete (the “only if” part holds) • The algorithm has polynomial complexity.
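The two conditions of the proposition translate almost directly into a recursive check. The sketch below assumes concepts are already in normal form, with a set of atomic concept names and at most one value restriction ∀R.C per role; the `Concept` class and `subsumed_by` function are illustrative names, not KAT's API.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Normal-form concept: atoms A1..Am plus one restriction (forall R.C) per role."""
    atoms: frozenset = frozenset()
    restrictions: dict = field(default_factory=dict)  # role name -> Concept


def subsumed_by(c, d):
    """Return True if c ⊑ d under the structural conditions of the proposition."""
    # Condition 1: every atom Bi of d also occurs among the atoms Aj of c.
    if not d.atoms <= c.atoms:
        return False
    # Condition 2: every restriction (forall Si.Di) of d is matched by a
    # restriction (forall Rj.Cj) of c with Rj = Si and Cj ⊑ Di.
    for role, d_filler in d.restrictions.items():
        c_filler = c.restrictions.get(role)
        if c_filler is None or not subsumed_by(c_filler, d_filler):
            return False
    return True


weapon = Concept(frozenset({"Weapon"}))
chemical_weapon = Concept(frozenset({"Weapon", "Chemical"}))
army = Concept(frozenset({"Army"}), {"hasWeapon": weapon})
chemical_army = Concept(frozenset({"Army"}), {"hasWeapon": chemical_weapon})
```

Each restriction of d is looked up once and the recursion only descends into fillers, so the running time stays polynomial in the size of the two descriptions.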
Classification/Hierarchy Creation • Classification procedures • For domain concepts modifier1 head1 and modifier2 head2, create relations by the following rules • If ISA(modifier1,modifier2) and ISA(head1,head2), then ISA(modifier1 head1, modifier2 head2) • Japan discount rate ISA Asian country interest rate • If ISA(modifier1,modifier2) and SYNONYMY(head1,head2), then ISA(modifier1 head1, modifier2 head2) • Japan discount rate ISA Asian country discount rate • If SYNONYMY(modifier1,modifier2) and ISA(head1,head2), then ISA(modifier1 head1, modifier2 head2) • Japan discount rate ISA Japan interest rate • If SYNONYMY(modifier1,modifier2) and SYNONYMY(head1,head2), then SYNONYMY(modifier1 head1, modifier2 head2)
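The four rules can be sketched as one small function. It assumes, as the slide's examples suggest, that identical words count as synonymous, and that ISA and SYNONYMY facts about individual modifiers and heads are given as sets of pairs; `classify_pair` and its argument names are hypothetical.

```python
def classify_pair(c1, c2, isa, synonyms):
    """Relate two compound concepts given as (modifier, head) tuples.

    isa: set of (specific, general) pairs between single modifiers or heads.
    synonyms: set of unordered synonym pairs, stored as tuples.
    Returns "ISA", "SYNONYMY", or None when no rule applies.
    """
    (m1, h1), (m2, h2) = c1, c2

    def is_isa(x, y):
        return (x, y) in isa

    def is_syn(x, y):
        # Assumption: identical words are treated as synonymous.
        return x == y or (x, y) in synonyms or (y, x) in synonyms

    mod_isa, mod_syn = is_isa(m1, m2), is_syn(m1, m2)
    head_isa, head_syn = is_isa(h1, h2), is_syn(h1, h2)

    if mod_syn and head_syn:
        return "SYNONYMY"
    if (mod_isa or mod_syn) and (head_isa or head_syn):
        return "ISA"
    return None


ISA_FACTS = {("Japan", "Asian country"), ("discount rate", "interest rate")}
```

For example, `classify_pair(("Japan", "discount rate"), ("Asian country", "interest rate"), ISA_FACTS, set())` returns `"ISA"`, matching the first example on the slide.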
Classification/Hierarchy Creation • Classification procedures • For domain concepts modifier head and head, create the relation ISA(modifier head, head) • nontaxable dividends ISA dividends • For domain concepts modifier1 modifier2 head, create relations by the following rules • If modifier1 head exists, then ISA(modifier1 modifier2 head, modifier1 head) • nuclear weapon testing ISA nuclear testing • If modifier2 head exists, then ISA(modifier1 modifier2 head, modifier2 head) • nuclear weapon testing ISA weapon testing
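These label-based rules can be sketched over a set of concept strings. The sketch assumes every modifier and head is a single whitespace-separated token, which is a simplification; `compound_isa` is a hypothetical helper name.

```python
def compound_isa(concepts):
    """Derive ISA(child, parent) pairs from the internal structure of concept labels.

    concepts: set of concept labels; each word is assumed to be one token.
    """
    relations = set()
    for concept in concepts:
        tokens = concept.split()
        if len(tokens) == 2:
            # <modifier> <head> ISA <head>, if <head> is itself a domain concept.
            head = tokens[1]
            if head in concepts:
                relations.add((concept, head))
        elif len(tokens) == 3:
            # <m1> <m2> <head> ISA <m1> <head> and ISA <m2> <head>, when they exist.
            m1, m2, head = tokens
            for parent in (f"{m1} {head}", f"{m2} {head}"):
                if parent in concepts:
                    relations.add((concept, parent))
    return relations


DOMAIN = {
    "nontaxable dividends",
    "dividends",
    "nuclear weapon testing",
    "nuclear testing",
    "weapon testing",
}
derived = compound_isa(DOMAIN)
```

Here `derived` contains the three ISA pairs given as examples on the slide and nothing else, because "testing" alone is not among the domain concepts.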
Classification/Hierarchy Creation • Textual entailment for concept subsumption • monetary policy ? fiscal policy ISA economic policy ISA policy (WordNet hierarchy)
Domain Ontology/KB Creation - Example
“Our Balancing Act” • Quantity • Making sure that the available information is actually extracted • Beauty • Making sure that the ontology concepts are real concepts, not just sentence fragments • Relevance • Not including every concept mentioned in a sentence
“Striking the Balance” • Tuning text exploration aggressiveness • Pruning sentence phrases down to the “real concept” • Filtering out “ugly” sentence fragments • Handling conjunctions • “Tom and Bill” went to “Dallas and Fort Worth” • “Hank or Susan” went to “Chicago or New York”
Ontology - Example • International Economics Ontology • Document collection: International Economics Book • 2.8 MB of plain text • Seed ontology: economics reference taxonomy • 558 seed concepts, e.g. aggregate demand, ATC curve, budget deficit, commodity money, etc. • 791 semantic relations • 5,678 ontological concepts • 13,878 semantic relations • AGENT, CAUSE, INFLUENCE, INSTRUMENT, ISA, AT-LOCATION, MAKE-PRODUCE, MANNER, PROPERTY, PURPOSE, PART-WHOLE, QUANTITY, SYNONYMY, THEME, AT-TIME, VALUE
KAT Modules – Knowledge Base Maintenance • Knowledge base merging • Visualization • Knowledge base editing • User interaction • Modifications
Knowledge Base Maintenance • New concept integration: concepts and relations extracted from incoming documents are added to the existing ontology • Establish a mapping between the new set of concepts/relations and the existing ontology • Add non-mapped concepts and relations to the ontology • Ontology mapping: identify a set of rules that link concepts from one ontology to analogous concepts (in another ontology) • Calculate semantic similarity of concepts • Similarity between the semantic models of concepts • Degree of textual entailment between the concepts’ glosses • Concept label-based similarity • Calculate semantic similarity of relations • Function of their arguments’ similarity degree
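As a rough illustration of the label-based part of ontology mapping, the sketch below scores concept labels by token overlap (Jaccard similarity) and scores two relations by the weaker of their argument similarities. Both choices are assumptions made for this example; the slides do not say which similarity functions KAT uses, and the semantic-model and gloss-entailment measures are not covered here.

```python
def label_similarity(a, b):
    """Jaccard overlap between the lowercase tokens of two concept labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def relation_similarity(r1, r2):
    """Similarity of two (name, arg1, arg2) relations as a function of their arguments."""
    (name1, x1, y1), (name2, x2, y2) = r1, r2
    if name1 != name2:
        return 0.0
    # Assumption: a relation pair is only as similar as its weaker argument pair.
    return min(label_similarity(x1, x2), label_similarity(y1, y2))


def map_concepts(new_concepts, existing_concepts, threshold=0.5):
    """Map each new concept to its most similar existing concept, if similar enough."""
    mapping = {}
    for concept in new_concepts:
        best = max(existing_concepts,
                   key=lambda e: label_similarity(concept, e),
                   default=None)
        if best is not None and label_similarity(concept, best) >= threshold:
            mapping[concept] = best
    return mapping
```

Concepts left out of the returned mapping are the non-mapped ones that new concept integration would add to the ontology as new entries.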
Knowledge Base Maintenance • Ontology merging: create a new ontology by combining information from two or more ontologies • Map the ontologies (two at a time) • Combine domain concepts (use a single copy for mapped concepts) • Merge the relation sets of mapped concepts • Conflict resolution algorithm • Re-classify the new set of ontological concepts • Classification/hierarchy creation procedures
Conflict Resolution • Approach used – prevention • Start from an empty hierarchy and an input relation set • Add a relation from the input set to the hierarchy, if • It does not form a cycle • It is not redundant (does not duplicate a path) • Remove jump links • Properties of hierarchical relations • Transitive • If R(A,B) and R(B,C), then R(A,C) • ISA(cat,mammal) and ISA(mammal,animal) ⇒ ISA(cat,animal) • Strictly non-symmetric • If R(A,B), then NOT R(B,A) • ISA(cat,mammal) ⇒ ¬ISA(mammal,cat)
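The prevention approach can be sketched as an incremental build that rejects a relation when it would close a cycle or duplicate an existing path. This is a minimal illustration, assuming a single hierarchical relation stored as child-to-parents adjacency sets; jump link removal is left out, and the function names are hypothetical.

```python
def has_path(graph, start, goal):
    """Depth-first search: True if goal is reachable from start (or equal to it)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def build_hierarchy(relations):
    """Add R(child, parent) pairs one by one, rejecting cycles and redundant links."""
    graph = {}      # child -> set of direct parents
    rejected = []   # (child, parent, reason)
    for child, parent in relations:
        if has_path(graph, parent, child):
            # parent already reaches child, so child -> parent would close a cycle.
            rejected.append((child, parent, "cycle"))
        elif has_path(graph, child, parent):
            # child already reaches parent, so the new link duplicates a path.
            rejected.append((child, parent, "redundant"))
        else:
            graph.setdefault(child, set()).add(parent)
    return graph, rejected


hierarchy, conflicts = build_hierarchy([
    ("cat", "mammal"),
    ("mammal", "animal"),
    ("cat", "animal"),   # already implied by transitivity
    ("animal", "cat"),   # would violate non-symmetry
])
```

The result depends on input order: had ISA(cat,animal) arrived before the two links that imply it, it would have been accepted, which is the case jump link removal is meant to clean up.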
Types of Conflict • Inconsistencies • Simple loops • Cycles • Redundancies • Duplicate relations • Jump links
Jump Links • Multiple paths from one node to another are acceptable • As long as no single link duplicates a path • Jump link removal • When it is safe to add R(A,B), remove links from direct descendants of B to B, if they have a path to A
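A sketch of the jump link removal rule, assuming the hierarchy is stored as child-to-parents adjacency sets and that the caller has already verified that adding R(A,B) is safe (no cycle, not redundant). The function names are hypothetical.

```python
def has_path(graph, start, goal):
    """True if goal is reachable from start by following child -> parent links."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def add_with_jump_link_removal(graph, a, b):
    """Add the link a -> b, first removing links x -> b that it turns into jump links.

    Assumes cycle and redundancy checks for (a, b) were done by the caller.
    """
    for x, parents in graph.items():
        # x is a direct descendant of b that also reaches a: once a -> b exists,
        # the direct link x -> b only duplicates the path x -> ... -> a -> b.
        if x != a and b in parents and has_path(graph, x, a):
            parents.discard(b)
    graph.setdefault(a, set()).add(b)


links = {"cat": {"mammal", "animal"}}
add_with_jump_link_removal(links, "mammal", "animal")
```

After the call, the direct link from "cat" to "animal" is gone, because the same fact is now carried by the path through "mammal".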
Do fewer links mean less knowledge? • Number of links: 4 • Assertions • a → b • a → c • b → d • c → d • a → d • Number of links: 3 • Assertions • a → b • b → c • c → d • a → c • b → d • a → d
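Reading each pair on this slide as a directed link of a transitive relation, the assertions of a hierarchy are the transitive closure of its links. The short sketch below (the function name is illustrative) computes that closure for the two graphs on the slide.

```python
def assertions(links):
    """Transitive closure of a set of (x, y) links, computed by repeated joining."""
    closure = set(links)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


diamond = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}  # 4 links
chain = {("a", "b"), ("b", "c"), ("c", "d")}                # 3 links
diamond_count = len(assertions(diamond))
chain_count = len(assertions(chain))
```

The four-link diamond yields five assertions, while the three-link chain yields six, so fewer links do not necessarily mean fewer assertions.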
Ontology Merging - Example
Domain Ontology/KB Evaluation • Compare KAT’s automatically generated ontologies against gold annotations • Evaluation focuses on • Lexical level • Vocabulary/data layer level • Other semantic relations level • Viewing an ontology as a set of semantic relations between two concepts, the human annotators: • Labeled an entry correct if the concepts and the semantic relation are correctly detected by the system, else marked the entry as incorrect • Labeled a correct entry as irrelevant if any of the concepts or the semantic relation are irrelevant to the domain • Added new entries for concepts and semantic relations omitted by KAT (from input documents)
Ontology/KB Evaluation - Metrics • NK(*) gives the counts from KAT’s output • NG(*) gives the counts from the gold annotations
Domain Ontology/KB Evaluation - Results
Jager™: Ontology Visualization and Editing • Web application - scalable, multi-user visualization and editing of KAT’s ontologies/KBs • Based on the Django framework and written in a mix of Python, HTML and JavaScript • Jager (pronounced yeager) is a corruption of the German word Jäger (hunter) • Capabilities • Jager admin tool • Import/Export/Delete/Trim ontology • Compare two ontologies • Edit ontology name • For a given ontology • Edit/Delete/Insert concept/semantic relation
Jager™: Ontology Visualization and Editing
Outline • Introduction to Ontologies • Ontologies for Question Answering • Automatic Ontology Building • Applications • OWL/RDF Representation • Jaguar-Jager Demo • CHiPS Demo
Collaborative High Precision Search • CHiPS™: ontology-guided search • More powerful than keyword search • Search from the perspective of a given ontology • Document matching • Semantic profiles are generated for documents based on a given ontology • Ontology concepts are identified in the text • Each identified concept is assigned a weight • Semantic profile matching • Semantic profiles for each document in a repository are generated in advance • Semantic profile for input search text is generated on the fly • Search algorithm finds a list of repository documents whose profiles most closely match the profile of the input search text
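The profile-matching idea can be sketched as follows. The slides do not specify how concepts are weighted or how profiles are compared, so this example makes two loud assumptions: a concept's weight is the number of times its label occurs in the text (naive substring matching), and profiles are compared by cosine similarity. All function names are hypothetical.

```python
import math


def semantic_profile(text, ontology_concepts):
    """Map each ontology concept found in the text to a weight (assumed: raw count)."""
    lowered = text.lower()
    profile = {}
    for concept in ontology_concepts:
        count = lowered.count(concept.lower())  # naive substring match
        if count:
            profile[concept] = count
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse concept-weight profiles."""
    dot = sum(w * q.get(c, 0) for c, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def search(query, repository_profiles, ontology_concepts, top_k=3):
    """Rank precomputed document profiles against the profile of the query text."""
    query_profile = semantic_profile(query, ontology_concepts)
    ranked = sorted(repository_profiles.items(),
                    key=lambda item: cosine(query_profile, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


CONCEPTS = ["eye pain", "surgery", "steroid"]
DOCS = {"d1": "Eye pain after surgery.", "d2": "Steroid injection had no effect."}
# Repository profiles are generated in advance; the query profile on the fly.
PROFILES = {doc_id: semantic_profile(text, CONCEPTS) for doc_id, text in DOCS.items()}
hits = search("eye pain following surgery", PROFILES, CONCEPTS, top_k=1)
```

Because only ontology concepts enter a profile, two texts can match without sharing any other wording, which is where ontology-guided search departs from plain keyword search.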
CHiPS™ Architecture
Document Similarity • Possible applications in medical domain • For diagnosis – patient data vs medical knowledge • For research – text snippet vs Medline • Match decision rules to KB • Others • Approaches • Statistical approaches: Latent Dirichlet Allocation, Pachinko Allocation, others • Semantic approaches: • Event based • Ontology based – outlined here • Others
Sample Search • Search: The patient’s eye pain was associated with the surgical procedure and poly-L-lactic acid • Result: She describes this area as looking like a "bug bite" & was located "on top of" (above) gortex implant, near the lateral canthus. Its shape is round about one-fourth inch in diameter w/a rise w/a peak "maybe" one-eighth of an inch in height total. She said her phys has treated the "bug bite" area w/an unknown type of steroid injection, w/o effect. He now wants to remove this surgically, however, she is not certain if she wants this done. She noted that she did not massage for first week, as had no instruction to do so; she also had lid lift surgery at the time (of the face lift,) & surgeon did not want any pressure on surgical site. She reported her concomitant medications as estradiol, gabapentin (neurontin), for trigeminal neuralgia & facial non-specific neuralgia; also a multivitamin. Add'l medical history included trigeminal neuralgia & facial non-specific neuralgia both following the accident. No further medical info reported. Add'l info for sculptra from ptc report case (b)(4) dated (b)(6)2008, received by (b)(6) on 25mar08: b/c no lot # is available, an investigation has been performed on the documentation of all potentially involved manufactured batches. The review of the device history reports & of the analytical results of these batches did not show any anomaly that could be related to the event which occurred. • Repository: Manufacturer and User Facility Device Experience (MAUDE)
Sample Search – Supporting Ontologies • Medical Subject Headings (MeSH) controlled vocabulary • Encyclopedic knowledge
CHiPS™ Demo • Hybrid MeSH-MedDRA ontology • NIH Medical Subject Headings (MeSH) taxonomy • http://www.nlm.nih.gov/mesh/ • Medical Dictionary for Regulatory Activities (MedDRA) • http://www.meddramsso.com/ • 29,302 concepts • 38,828 semantic relations (ISA) • Document repositories • FDA MAUDE document repository • Manufacturer And User facility Device Experience • Database of adverse medical events • http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm • NIH MEDLINE document repository • journal citations and abstracts for biomedical literature from around the world • http://www.nlm.nih.gov/bsd/pmresources.html