Ontology Mapping and Link Discovery Framework

Ontology Mapping and Link Discovery • Kunal Narsinghani • Ashwini Lahane

Agenda • Introduction • Levels of heterogeneity • Previous work in the field • PROMPT Suite of Tools • Prompt on Protégé • The Web of Data • CRS: Managing Co-references • Silk – A link discovery framework

Introduction • Can a single ontology suffice for various applications? • Definition – The task of relating the vocabularies of two ontologies that share the same domain of discourse • It is a morphism that consists of a collection of functions assigning the symbols used in one vocabulary to the symbols in the other[1] • This provides a common layer through which ontologies can be accessed and can exchange information • Translation is different from mapping

Introduction • An analogy to the problem – Clocks

Levels of Heterogeneity in Ontologies • Syntactic • Structural • Semantic

Mapping discovery • The first approach is to use a reference ontology • Example – the upper ontologies SUMO and DOLCE • What if a shared ontology is not available? • Structural and definitional information can then be used to discover mappings • Example tools – IF-Map, QOM, MAFRA and PROMPT

IF-Map architecture • Fig: The steps in IF-Map

PROMPT Suite of Tools • Interactive tools for ontology merging and mapping • Ontology • Formal specification of domain information • Facilitates knowledge sharing and reuse • Different ontologies may overlap and need to be reconciled • Determine correlations • Find all concepts • Determine similarities • Change the source ontologies or remove the overlap • Record the mapping for future reference

Ontology Management • Tasks • Finding correlations • Merging ontologies • Version management • Factoring ontologies • Tools • Benefit from being tightly integrated into a single framework • Uniform user interface • Same interaction paradigms • Easy access from one tool to another

PROMPT Knowledge Model • Based on the knowledge model of Protégé • Frame-based • Types of frames • Classes • Set of entities specifying a concept • Slots • Attributes of a class • Have a domain and a range • Must have unique names • Instances • Elements of a class
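The three frame types can be illustrated with a minimal in-memory sketch. The names `Cls`, `Slot`, `Instance` and `KnowledgeBase` are hypothetical and are not the Protégé API; slot inheritance is deliberately ignored to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Cls:
    name: str
    superclasses: list = field(default_factory=list)  # names of parent classes

@dataclass
class Slot:
    name: str
    domain: str  # class the slot is attached to
    range: str   # type or class of the allowed values

@dataclass
class Instance:
    name: str
    of_class: str
    values: dict = field(default_factory=dict)  # slot name -> value

class KnowledgeBase:
    """Hypothetical frame store for illustration only."""

    def __init__(self):
        self.classes, self.slots, self.instances = {}, {}, {}

    def add_class(self, cls):
        self.classes[cls.name] = cls

    def add_slot(self, slot):
        # Slot names must be unique, and the domain must be a known class.
        if slot.name in self.slots:
            raise ValueError(f"duplicate slot name: {slot.name}")
        if slot.domain not in self.classes:
            raise ValueError(f"unknown domain class: {slot.domain}")
        self.slots[slot.name] = slot

    def add_instance(self, inst):
        # An instance is an element of an existing class and may only use
        # slots whose domain is that class (inheritance ignored here).
        if inst.of_class not in self.classes:
            raise ValueError(f"unknown class: {inst.of_class}")
        for slot_name in inst.values:
            slot = self.slots.get(slot_name)
            if slot is None or slot.domain != inst.of_class:
                raise ValueError(f"slot {slot_name} not allowed on {inst.of_class}")
        self.instances[inst.name] = inst
```

For example, `kb.add_slot(Slot("name", domain="Person", range="String"))` succeeds once `Person` exists, and adding a second slot called `name` raises `ValueError`.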

PROMPT Framework • Tools for multiple-ontology management • Extension to the Protégé ontology-editing environment • Open architecture allows easy extension with plugins • Tools in PROMPT • IPROMPT – an interactive ontology-merging tool • ANCHORPROMPT – a graph-based tool for finding similarities between ontologies • PROMPTDIFF – a tool for finding the differences between two versions of the same ontology • PROMPTFACTOR – a tool for extracting a part of an ontology

PROMPT Framework

IPROMPT • Interactive ontology-merging tool • Leads the user through the merging process • Makes suggestions for merging • Identifies inconsistencies and potential problems • Suggests strategies for resolving them • Uses the structure of concepts and their relations, along with user input • Decisions are based on local context • Iterative

IPROMPT Algorithm

IPROMPT Algorithm • Creates initial suggestions based on the lexical similarity of names • The merged ontology contains frames that are similar to the frames in the input ontologies • Two ontologies O1 and O2 are merged to form Om • Merging decisions depend on the designer and the task • A set of knowledge-based operations is defined • For each operation: • Changes are performed automatically • New merging suggestions are made • Inconsistencies and potential problems are identified
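The first step, initial suggestions from the lexical similarity of names, can be sketched as below. The similarity measure (`difflib.SequenceMatcher.ratio` on lower-cased names with separators removed) and the 0.8 threshold are assumptions chosen for illustration, not the measure IPROMPT actually uses.

```python
from difflib import SequenceMatcher

def normalize(name):
    # Assumed normalisation: ignore case and the separators '-' and '_'.
    return name.lower().replace("-", "").replace("_", "")

def initial_suggestions(o1, o2, threshold=0.8):
    """Pair every class name in o1 with every class name in o2 and keep
    the pairs whose lexical similarity reaches the threshold."""
    suggestions = []
    for a in o1:
        for b in o2:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                suggestions.append((a, b, round(score, 2)))
    # Most similar pairs first, so the user reviews the strongest candidates.
    return sorted(suggestions, key=lambda s: -s[2])

o1 = ["TRIAL", "PERSON", "STUDY-SITE"]
o2 = ["Trial", "Person", "Design", "StudySite"]
print(initial_suggestions(o1, o2))
```

In the interactive loop, each suggestion the user accepts would trigger a merge operation, which in turn produces new suggestions and conflict reports.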

Class hierarchies

Suggestion for merging

IPROMPT Operations • Merge classes • Merge slots • Merge instances • Shallow copy of a class • Copies the class from the source ontology to the merged ontology • Deep copy of a class • Also copies all the parents of the class up to the root of the hierarchy
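The difference between the two copy operations can be sketched with each ontology represented as a plain dictionary from class name to its list of parents (a hypothetical representation, not PROMPT's data model).

```python
def shallow_copy(cls, source, merged):
    # Copy only the class itself. Its parent references are kept as-is,
    # even if those parents are not (yet) in the merged ontology.
    merged[cls] = list(source[cls])

def deep_copy(cls, source, merged):
    # Copy the class and, recursively, all of its parents up to the root.
    if cls in merged:
        return  # already copied (also guards against cycles)
    merged[cls] = list(source[cls])
    for parent in source[cls]:
        deep_copy(parent, source, merged)

source = {"Thing": [], "Person": ["Thing"], "Employee": ["Person"]}

m1 = {}
shallow_copy("Employee", source, m1)  # {'Employee': ['Person']}
m2 = {}
deep_copy("Employee", source, m2)     # Employee, Person and Thing
```

Note that after the shallow copy, `Employee` refers to `Person`, which does not exist in the merged ontology: this is exactly the kind of dangling reference IPROMPT reports.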

Inconsistencies & Potential Problems • Name conflicts • Dangling references • Redundancy in the class hierarchy • Slot values violating slot-value restrictions
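The first three problem types can be detected mechanically after each operation. The sketch below assumes the merged ontology is a list of `(class name, [direct parents])` pairs (a hypothetical representation); checking slot-value restrictions would also need slot definitions and is omitted.

```python
from collections import Counter

def find_problems(frames):
    """frames: list of (class_name, [direct parent names]) in the merged ontology."""
    counts = Counter(name for name, _ in frames)
    # Name conflicts: the same class name defined more than once.
    name_conflicts = sorted(n for n, c in counts.items() if c > 1)

    # Dangling references: a parent that is not defined in the merged ontology.
    dangling = sorted({(n, p) for n, ps in frames for p in ps if p not in counts})

    # Redundancy: a direct parent that is also reachable via another direct parent.
    parents_of = {}
    for n, ps in frames:
        parents_of.setdefault(n, set()).update(ps)

    def ancestors(c):
        seen, stack = set(), list(parents_of.get(c, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents_of.get(p, ()))
        return seen

    redundant = sorted(
        (n, p) for n, ps in parents_of.items() for p in ps
        if any(p in ancestors(q) for q in ps if q != p)
    )
    return name_conflicts, dangling, redundant

frames = [("Thing", []), ("Person", ["Thing"]), ("Employee", ["Person", "Thing"]),
          ("Person", ["Agent"]), ("Manager", ["Employee"])]
print(find_problems(frames))
```

Here `Person` is defined twice, `Agent` is referenced but never defined, and `Employee` lists `Thing` directly although it already inherits it through `Person`.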

Additional features • Setting a preferred ontology • Maintaining the user's focus • Providing feedback to the user • Logging of ontology merging and editing operations

ANCHORPROMPT • Graph-based tool for finding similarities • Compares larger portions of the ontologies than the local context used by IPROMPT • Goal: Augment IPROMPT by determining additional points of similarity • Input: Anchors – a set of pairs of related terms • Anchor identification – manual or automatic • Each ontology is viewed as a directed labeled graph

ANCHORPROMPT representation

ANCHORPROMPT algorithm

Algorithm • Begins with anchor pairs • TRIAL, Trial • PERSON, Person • Path 1: TRIAL -> PROTOCOL -> STUDY-SITE -> PERSON • Path 2: Trial -> Design -> Blinding -> Person • Determine a similarity score for each pair of related terms • If two pairs of terms from the source ontologies are similar and there are paths connecting the terms, then the elements in those paths are often similar as well
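The path heuristic can be sketched as follows, under simplifying assumptions: graphs are plain successor dictionaries (edge labels dropped), only paths of equal length are compared, and each pair of terms at the same position on two such paths gets its score incremented by 1. The real ANCHORPROMPT scoring is more refined; this only shows the core idea using the paths above.

```python
from collections import defaultdict
from itertools import product

def paths(graph, start, end, max_len):
    """All simple paths from start to end with at most max_len edges.
    graph: node -> list of successor nodes."""
    result, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            result.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:
                stack.append(path + [nxt])
    return result

def anchor_scores(g1, g2, anchors, max_len=3):
    scores = defaultdict(int)
    for (a1, a2), (b1, b2) in product(anchors, repeat=2):
        if (a1, a2) == (b1, b2):
            continue
        for p1 in paths(g1, a1, b1, max_len):
            for p2 in paths(g2, a2, b2, max_len):
                if len(p1) != len(p2):
                    continue  # only equal-length paths are compared
                # Inner terms at the same position become more similar.
                for x, y in zip(p1[1:-1], p2[1:-1]):
                    scores[(x, y)] += 1
    return dict(scores)

g1 = {"TRIAL": ["PROTOCOL"], "PROTOCOL": ["STUDY-SITE"], "STUDY-SITE": ["PERSON"]}
g2 = {"Trial": ["Design"], "Design": ["Blinding"], "Blinding": ["Person"]}
print(anchor_scores(g1, g2, [("TRIAL", "Trial"), ("PERSON", "Person")]))
```

Pairs whose accumulated score exceeds some cut-off would then be proposed to the user as new points of similarity.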

PROMPTDIFF • Tool for comparing ontology versions • Version comparison in software code is based on comparing text files • The same ontology can have different text representations, so comparing text files is not sufficient • Heuristic algorithm that produces a structural diff between two versions • Compares the structure of the two ontology versions • Identifies which frames changed and what changes were made

PromptDiff Algorithm • An extensible set of heuristic matchers • Fixed-point algorithm to combine the results of the matchers to produce a structural diff between two versions
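A minimal sketch of the fixed-point idea: each matcher looks at the matches found so far and proposes new ones, and the loop stops when a full pass adds nothing. The two matchers here (identical names, and "the only unmatched child of two matched parents") are hypothetical examples, not PROMPTDIFF's actual heuristics; versions are dictionaries from class name to parent list.

```python
def exact_name(v1, v2, matches):
    # Frames with the same name in both versions are assumed to correspond.
    return {c: c for c in v1 if c in v2}

def single_unmatched_child(v1, v2, matches):
    # If matched parents each have exactly one unmatched child, pair them
    # (a likely rename).
    found, matched_new = {}, set(matches.values())
    for p1, p2 in matches.items():
        kids1 = [c for c, ps in v1.items() if p1 in ps and c not in matches]
        kids2 = [c for c, ps in v2.items() if p2 in ps and c not in matched_new]
        if len(kids1) == 1 and len(kids2) == 1:
            found[kids1[0]] = kids2[0]
    return found

def prompt_diff(v1, v2, matchers):
    matches, changed = {}, True
    while changed:  # fixed point: stop when no matcher adds a new pair
        changed = False
        for matcher in matchers:
            for old, new in matcher(v1, v2, matches).items():
                if old not in matches and new not in matches.values():
                    matches[old] = new
                    changed = True
    return matches

def summarize(v1, v2, matches):
    return {
        "renamed": sorted((o, n) for o, n in matches.items() if o != n),
        "deleted": sorted(set(v1) - set(matches)),
        "added": sorted(set(v2) - set(matches.values())),
    }

v1 = {"Thing": [], "Person": ["Thing"], "Employe": ["Person"], "Car": ["Thing"]}
v2 = {"Thing": [], "Person": ["Thing"], "Employee": ["Person"],
      "Vehicle": ["Thing"], "Boat": ["Thing"]}
print(summarize(v1, v2, prompt_diff(v1, v2, [exact_name, single_unmatched_child])))
```

Because matchers can feed on each other's results (the rename is only found after `Person` is matched), new matchers can be added without changing the loop, which is what makes the set extensible.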

PROMPTFACTOR • Tool for factoring out a semantically independent part of a large ontology into a new sub-ontology • Ensures that severed links do not introduce ill-defined concepts in the sub-ontology • The user can specify the concepts of interest • Computes the transitive closure of the superclass relation and of all the relations defined by slots • The target ontology works as a stand-alone ontology

PromptFactor Algorithm • The user specifies the concepts of interest • PromptFactor traverses the ontology starting from the selected terms • Determines the transitive closure of all relations, including the subclass-of relation • Determines all the parents of the selected terms in the hierarchy • User-interactive • Determines inconsistencies
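The closure computation can be sketched as a breadth-first traversal (an iterative equivalent of recursion). The ontology layout, a dictionary from class name to its superclasses and slot ranges, is an assumption made for this example; slot ranges that are not classes (such as `"String"`) are not followed.

```python
from collections import deque

def factor(ontology, concepts):
    """ontology: class name -> {"superclasses": [...], "slots": {slot: range}}.
    Returns the sub-ontology closed under superclass and slot-range references."""
    keep, queue = set(), deque(concepts)
    while queue:
        name = queue.popleft()
        if name in keep:
            continue
        keep.add(name)
        frame = ontology[name]
        # Follow the superclass relation up to the root ...
        queue.extend(frame.get("superclasses", []))
        # ... and every relation defined by a slot whose range is a class.
        queue.extend(r for r in frame.get("slots", {}).values() if r in ontology)
    return {name: ontology[name] for name in keep}

onto = {
    "Thing": {},
    "Person": {"superclasses": ["Thing"], "slots": {"name": "String"}},
    "Trial": {"superclasses": ["Thing"],
              "slots": {"investigator": "Person", "site": "Site"}},
    "Site": {"superclasses": ["Thing"], "slots": {"address": "String"}},
    "Drug": {"superclasses": ["Thing"], "slots": {}},
}
print(sorted(factor(onto, ["Trial"])))
```

Every class referenced by a kept class is kept as well, so the extracted sub-ontology has no severed links; `Drug` is dropped because nothing reachable from `Trial` refers to it.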

Prompt Demo • It is available as a plug-in for Protégé 3.4 • Uses linguistic similarity to match concepts • Also matches slot names and slot value types • Where automation is not possible, user intervention is needed and possible actions are suggested • Alignment is followed by merging • Alignment establishes links between the ontologies • Merging creates a single coherent ontology

Prompt Demo

The Web of Data • Data sources span a large range of domains • The RDF data model is used to publish structured data on the web • Explicit RDF links exist between entities in different data sources • However, there is a lack of tools for setting RDF links to other data sources

Silk • Provides a link specification language (Silk-LSL) • Allows specification of the links that should be discovered between data sources, as well as the conditions that entities must fulfil to be linked • Link conditions are specified using similarity metrics, whose scores can be combined with aggregation functions • Data is accessed using SPARQL
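A link condition of this kind can be sketched in Python: two similarity metrics, a weighted-average aggregation, and a threshold that decides whether an `owl:sameAs` link is emitted. Real Silk link specifications are written declaratively in Silk-LSL, not in Python; the metric functions, weights, threshold, property names and `example.org` URIs below are all hypothetical.

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    # Stand-in string metric (case-insensitive SequenceMatcher ratio).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def numeric_similarity(a, b, max_diff):
    # 1.0 for equal values, falling linearly to 0.0 at max_diff apart.
    return max(0.0, 1.0 - abs(a - b) / max_diff)

def weighted_average(scores_and_weights):
    total = sum(w for _, w in scores_and_weights)
    return sum(s * w for s, w in scores_and_weights) / total

def link(source, target, threshold=0.9):
    score = weighted_average([
        (string_similarity(source["label"], target["label"]), 2),
        (numeric_similarity(source["population"], target["population"], 100_000), 1),
    ])
    if score >= threshold:
        return (source["uri"], "owl:sameAs", target["uri"], round(score, 3))
    return None

berlin = {"uri": "http://example.org/city/Berlin", "label": "Berlin",
          "population": 3_500_000}
candidate = {"uri": "http://example.com/resource/Berlin", "label": "berlin",
             "population": 3_490_000}
print(link(berlin, candidate))
```

Weighting the label more heavily than the population is just one design choice; other aggregation functions (for example taking the minimum or maximum score) express stricter or looser conditions.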

Silk Features • Support for owl:sameAs links and other types of RDF links • Provides a declarative language to specify link conditions • Datasets need not be replicated locally • Caching, indexing and entity pre-selection are used to enhance performance

Silk LSL example

Silk LSL example (contd.)

Silk similarity metrics • Similarity metrics can be combined using aggregation functions • Sets of resources can be selected using the Silk RDF path selector language
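The idea behind a path selector can be shown with a simplified, forward-only evaluator over a set of triples: starting from one resource, each step follows a property and collects the objects it reaches. The path syntax `?a/ex:country/rdfs:label` and the `ex:` data are illustrative assumptions; the actual Silk path language is richer than this sketch.

```python
def select(graph, resource, path):
    """Evaluate a simplified forward-only path such as '?a/ex:country/rdfs:label'.
    graph: set of (subject, predicate, object) triples."""
    steps = path.split("/")[1:]  # drop the variable name, e.g. '?a'
    current = {resource}
    for prop in steps:
        # Follow one property from every resource reached so far.
        current = {o for (s, p, o) in graph if s in current and p == prop}
    return current

triples = {
    ("ex:Berlin", "rdfs:label", "Berlin"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Germany", "rdfs:label", "Germany"),
    ("ex:Germany", "rdfs:label", "Deutschland"),
}
print(select(triples, "ex:Berlin", "?a/ex:country/rdfs:label"))
```

Because a path can yield several values (here two labels), the selected set is what a similarity metric would then compare.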

Silk Pre-Matching • Comparing every entity in source S with every entity in target T requires O(|S| × |T|) comparisons • Pre-matching finds, for each source entity, a limited set of target entities that are likely to match it • This is done by indexing the target resources on their property values • This scheme reduces the runtime to O(|S| + |T|)
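A minimal sketch of pre-matching, assuming a simple token index on one property value (the tokenisation and the `label` property are assumptions, not Silk's actual indexing scheme). The index is built once over the targets, and each source entity is compared only with the targets that share a token with it.

```python
from collections import defaultdict

def tokens(value):
    # Assumed normalisation: lower-case, whitespace-separated words.
    return set(value.lower().split())

def build_index(targets, prop):
    """Index target entities by the tokens of one property value."""
    index = defaultdict(set)
    for uri, entity in targets.items():
        for tok in tokens(entity[prop]):
            index[tok].add(uri)
    return index

def candidates(entity, index, prop):
    """Targets sharing at least one token with the source entity."""
    found = set()
    for tok in tokens(entity[prop]):
        found |= index.get(tok, set())
    return found

def prematch_links(sources, targets, prop, compare):
    index = build_index(targets, prop)
    links, comparisons = [], 0
    for s_uri, s in sources.items():
        for t_uri in sorted(candidates(s, index, prop)):
            comparisons += 1
            if compare(s, targets[t_uri]):
                links.append((s_uri, t_uri))
    return links, comparisons

sources = {"s1": {"label": "Berlin"}, "s2": {"label": "Hamburg"}}
targets = {"t1": {"label": "berlin"}, "t2": {"label": "Hamburg Altona"},
           "t3": {"label": "Munich"}, "t4": {"label": "Cologne"}}
same_label = lambda a, b: a["label"].lower() == b["label"].lower()
print(prematch_links(sources, targets, "label", same_label))
```

Here only 2 detailed comparisons are made instead of |S| × |T| = 8; the trade-off is that a true match sharing no token with its source entity would be missed.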

Silk Implementation

Managing Co-references • Semantic Web vision – large quantities of information • Readily available • Interlinked • Machine-readable • Fragmented web • Significant overlap • Need to identify ‘duplicates’ • Co-reference resolution – determining “equivalent” URIs

Co-reference Resolution Service (CRS) • Systematic analysis and heuristic-based approach: • Identifying • Publishing • Managing • Using co-reference information • Most prevalent way – owl:sameAs • Equivalence is context-dependent

CRSes • Maintain sets of equivalent URIs • Store co-reference data separately • URI definitions and synonyms are kept separate • Management techniques – history, rollback, annotation • Multiple CRSes can exist, and applications can use any of them • Core functionality in PHP for easy integration • Backed by MySQL

Data representation in CRS • Equivalent URIs are stored in bundles • One URI in each bundle is designated the canon (the preferred URI) • Formation of bundles: • Check whether the URI already exists in any bundle • If not, create a ‘singleton’ bundle for the new URI • Perform a merge – the union of the bundles containing “equivalent” URIs • The constituent bundles that were merged are marked inactive
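The bundle-formation steps can be sketched in memory as follows. The class name `CRS`, the dictionary-based bundles and the policy of keeping the first bundle's canon after a merge are assumptions for illustration; the real service is implemented in PHP on MySQL.

```python
class CRS:
    """Hypothetical in-memory co-reference store."""

    def __init__(self):
        self.bundles = []    # each: {"uris": set, "canon": str, "active": bool}
        self.bundle_of = {}  # uri -> index of its active bundle

    def _bundle_for(self, uri):
        if uri not in self.bundle_of:  # new URI: create a singleton bundle
            self.bundles.append({"uris": {uri}, "canon": uri, "active": True})
            self.bundle_of[uri] = len(self.bundles) - 1
        return self.bundle_of[uri]

    def assert_equivalent(self, a, b):
        i, j = self._bundle_for(a), self._bundle_for(b)
        if i == j:
            return i
        # Merge: a new bundle holding the union; the canon policy is assumed.
        merged = {"uris": self.bundles[i]["uris"] | self.bundles[j]["uris"],
                  "canon": self.bundles[i]["canon"], "active": True}
        # Constituent bundles are kept but marked inactive (history).
        self.bundles[i]["active"] = self.bundles[j]["active"] = False
        self.bundles.append(merged)
        k = len(self.bundles) - 1
        for uri in merged["uris"]:
            self.bundle_of[uri] = k
        return k

    def canon(self, uri):
        return self.bundles[self.bundle_of[uri]]["canon"]

    def duplicates(self, uri):
        return self.bundles[self.bundle_of[uri]]["uris"]

crs = CRS()
crs.assert_equivalent("A", "B")
crs.assert_equivalent("C", "D")
crs.assert_equivalent("B", "C")  # joins the two bundles
print(crs.canon("D"), sorted(crs.duplicates("D")))
```

Keeping the inactive bundles instead of deleting them is what makes history and rollback possible.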

Examples of bundle formation

Data representation • Data storage – indexed tables of hashed URIs • Permits fast lookup to find: • The canon of a given URI • All URIs in a bundle • URIs are deprecated using flags • Finding all equivalences – follow the coref:coreferenceData link to the bundle for a URI, then recursively repeat the process for each URI in that bundle
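The lookup and the recursive search for all equivalences can be sketched with one hashed table per CRS. The MD5 hash, the `Store` class and the short URIs are hypothetical choices for illustration; the recursion over bundle members is written as an equivalent worklist loop.

```python
import hashlib

def h(uri):
    # Hypothetical choice of hash for the URI index.
    return hashlib.md5(uri.encode("utf-8")).hexdigest()

class Store:
    """One CRS: hashed URI -> bundle id, bundle id -> member URIs."""

    def __init__(self, bundles):
        self.bundle_of, self.members = {}, {}
        for bid, uris in enumerate(bundles):
            self.members[bid] = set(uris)
            for u in uris:
                self.bundle_of[h(u)] = bid

    def lookup(self, uri):
        bid = self.bundle_of.get(h(uri))
        return self.members[bid] if bid is not None else {uri}

def all_equivalences(uri, stores):
    # Follow each URI to its bundle in every CRS, then repeat for every
    # newly found URI until nothing new appears.
    found, frontier = set(), [uri]
    while frontier:
        u = frontier.pop()
        if u in found:
            continue
        found.add(u)
        for store in stores:
            frontier.extend(store.lookup(u) - found)
    return found

crs1 = Store([["acm:p1", "dblp:p9"]])
crs2 = Store([["dblp:p9", "wiki:hugh"], ["x:1", "x:2"]])
print(sorted(all_equivalences("acm:p1", [crs1, crs2])))
```

The `found` set stops the traversal from revisiting URIs, so overlapping bundles in different CRSes do not cause infinite recursion.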

<rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <coref:Bundle>
    <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898" />
    <coref:duplicate rdf:resource="http://citeseer.rkbexplorer.com/id/resource-CSP109002" />
    <coref:duplicate rdf:resource="http://dblp.rkbexplorer.com/id/people-27aedbcb" />
    <coref:duplicate rdf:resource="http://eprints.rkbexplorer.com/id/kfupm/person-27aed0c1" />
    <coref:duplicate rdf:resource="http://southampton.rkbexplorer.com/id/person-00021" />
    <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser" />
    <coref:lastUpdated>2009-01-16 11:11:40</coref:lastUpdated>
  </coref:Bundle>
</rdf:RDF>
RDF description of equivalent URIs in a bundle

Ways to speed up • Look up only one URI from each CRS • Follow only the coref:canon predicate • A lookup then needs O(log|S| + log|T|)

References
[1] Natalya F. Noy and Mark A. Musen. The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping. Stanford Medical Informatics, Stanford University.
[2] Hugh Glaser, Afraz Jaffri and Ian C. Millard. Managing Co-reference on the Semantic Web. School of Electronics and Computer Science, University of Southampton, Southampton, Hampshire, UK.
[3] Yannis Kalfoglou and Marco Schorlemmer. Ontology Mapping: The State of the Art.
[4] Yannis Kalfoglou and Marco Schorlemmer (2003a). IF-Map: An Ontology Mapping Method Based on Information Flow Theory. Journal on Data Semantics, 1(1):98–127.
[5] Julius Volz, Christian Bizer, et al. Silk – A Link Discovery Framework for the Web of Data.
