Leveraging Data and Structure in Ontology Integration

Leveraging Data and Structure in Ontology Integration Octavian Udrea1 Lise Getoor1 Renée J. Miller2 1University of Maryland College Park 2University of Toronto

Contents • Motivation and goals • Short overview of OWL Lite • The ILIADS method • Experimental evaluation

ILIADS • Goal: • Produce high-quality integration via a flexible method able to adapt to a wide variety of ontology sizes and structures • Method: • Combining statistical and logical inference • Use schema (structure) and data (instances) effectively • Solution: • Integrated Learning In Alignment of Data and Schema (ILIADS)

Contributions • Show how to combine statistical and logical inference effectively • Show that a small amount of inference yields high qualitative gain • Show that parameters needed to perform inference over data and structure are robust • Provide a thorough evaluation on 30 pairs of real-world ontologies (with ground truth)

Example OWL Lite ontologies (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsF rom, rdfs:subPropertyOf, associatedWith)

Example OWL Lite ontologies • Anentitycan be a: • Class (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsF rom, rdfs:subPropertyOf, associatedWith)

Example OWL Lite ontologies • Anentitycan be a: • Class • Instance (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsF rom, rdfs:subPropertyOf, associatedWith)

Example OWL Lite ontologies • Anentitycan be a: • Class • Instance • Property (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsF rom, rdfs:subPropertyOf, associatedWith)

Example OWL Lite ontologies (discoveredBy, owl:inverseOf, discoverer) (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer) (associatedWith, owl:type, owl:TransitiveProperty) (resultsF rom, rdfs:subPropertyOf, associatedWith)

Inference in OWL Lite

The integration problem

State of the art • Robust statistical methods • Well-known similarity measures • Used for matching data (entities) and schema • May use graph structure of schema • Logical inference • Not combined with statistical inference • Basis for most schema mapping and ontology integration methods • Approaches integrate schema, but not data

Issues • How to combine statistical inference with logical inference • Takes into account data, structure, etc. so it’s no longer obvious • In particular, how to quantify the results of logical inference into a similarity-like form? • How to do logical inference in a tractable manner • For OWL-Lite, EXPTIME-complete for the worst case for the entire ontology

The ILIADS algorithm repeat until no more candidates • Compute local similarities • Select promising candidates • For each candidate • Select relationship • Perform logical inference • Update score with the inference similarity • Select the candidate with the best score end

Computing local similarities • simlexical: Jaro-Winkler and Wordnet • simstructural: Jaccard for neighborhoods • simextensional: Jaccard on extensions • parameters: λx, λs, λe • different for classes, instances and properties

Selecting promising candidates • Select candidates with sim(e,e’) > λt • Use a policy based on entity type to order, e.g.: • Class alignments first • Instance alignments first • Alternate between classes and instances

Selecting relationship • Must decide on relation type • subClassOf vs. equivalentClass • subPropertyOf vs. equivalentProperty • Determination is difficult, especially under the OWL open-world semantics • Use a simple extension based technique based on a threshold λr

Selecting relationship

The ILIADS algorithm repeat until no more candidates • Compute local similarities • Select promising candidates • For each candidate • Represent candidate relationship • Perform logical inference • Update score with the inference similarity • Select the candidate with the best score end

Performing logical inference For the candidate pair (e,e’): • Select an axiom to apply • The logical consequences are the pairs of entities (e(i), e(j)) that have just become equivalent • Repeat a small number of times (5) to maintain tractability

Performing logical inference

Performing logical inference (TheodorEscherich, owl:sameAs, T.S. Escherich) is a logical consequence of the candidate (E-ColiPoisoning, owl:sameAs, E-Coli)

The ILIADS algorithm repeat until no more candidates • Compute local similarities • Select promising candidates • For each candidate • Represent candidate relationship • Perform logical inference • Update score with the inference similarity • Select the candidate with the best score end

Updating score For the candidate pair (e,e’): • Initial local similarity sim(e,e’) • Inference similarity over all consequences: • Updated similarity:

Updating score

Consistency • The constructed alignment is not guaranteed to be consistent • ILIADS can only detect inconsistencies that appear in the few logical inference steps • Pellet used to check consistency after ILIADS • Experimentally, inconsistent ontologies in less than .5% of runs

Experimental framework • 30 pairs of real-world ontologies • From 194 to over 20,000 triples • From a variety of domains: medical, geographical, economical, biological • Ground truth provided by human reviewers • Multiple iterations to ensure the best human-provided alignment • Datasets available: • http://www.cs.umd.edu/linqs/projects/iliads

Experimental framework • Evaluation: precision, recall and F1 quality • F1 = 2 * Precision * Recall / (Precision + Recall) • 7 independent runs • ILIADS Variations: • ILIADS-tailored uses the best set of parameters for each pair of ontologies • ILIADS-fixed uses one set of parameters for all pairs of ontologies • Used to evaluate robustness of the parameters

Experimental framework • ILIADS compared to two leading tools: • FCA-merge [Stumme and Maedche, IJCAI 2001] • uses formal concept analysis and an external document corpus • COMA++ [Aumueller et al., SIGMOD 2005] • implements multiple match strategies, including fragment and reuse-based matching

Precision/recall

Precision/recall comparison

Precision/recall for ontologies with substantial instance data

Leveraging Data and Structure in Ontology Integration