Ontology Mapping and link discovery



  1. Ontology Mapping and link discovery Kunal Narsinghani Ashwini Lahane

  2. Agenda • Introduction • Levels of heterogeneity • Previous work in the field • PROMPT Suite of Tools • Prompt on Protégé • The Web of Data • CRS : Managing Co-references • Silk – A link discovery framework

  3. Introduction • Can a single ontology suffice for various applications? • Definition – the task of relating the vocabularies of two ontologies that share the same domain of discourse • A mapping is a morphism: a collection of functions assigning the symbols used in one vocabulary to the symbols in the other [1] • Mappings provide a common layer through which ontologies can be accessed and can exchange information • Translation is different from mapping

  4. Introduction • An analogy to the problem – clocks • Levels of heterogeneity in ontologies • Syntactic • Structural • Semantic

  5. Mapping discovery • The first approach is to use a reference ontology • Example – the upper ontologies SUMO and DOLCE • What if a shared ontology is not available? • Structural and definitional information can be used to discover mappings • Example tools – IF-Map, QOM, MAFRA and PROMPT

  6. IF-MAP architecture Fig: The steps in IF-MAP

  7. PROMPT Suite of Tools • Interactive tools for ontology merging and mapping • Ontology • Formal specification of domain information • Facilitates knowledge sharing and reuse • Different ontologies may overlap and need to be reconciled • Determine correlation • Find all concepts • Determine similarities • Change source ontologies or remove overlap • Record mapping for future reference

  8. Ontology Management • Tasks • Finding correlations • Merging ontologies • Version management • Factoring ontologies • Tools • Benefit from being tightly integrated into single framework • Uniform user interface • Same interaction paradigms • Easy access from one tool to another

  9. PROMPT Knowledge Model • Based on knowledge model of Protégé • Frame based • Types of frames • Class • Set of entities specifying a concept • Slots • Attributes of class • Has domain and range • Must have unique names • Instances • Elements of class

  10. PROMPT Framework • Tools for multiple-ontology management • Extension to the Protégé ontology-editing environment • Open architecture allows easy extension with plugins • Tools in PROMPT • IPROMPT – an interactive ontology merging tool • ANCHORPROMPT – a graph-based tool for finding similarities between ontologies • PROMPTDIFF – a tool for finding a diff between two versions of the same ontology • PROMPTFACTOR – a tool for extracting a part of an ontology

  11. PROMPT Framework

  12. IPROMPT • Interactive ontology merging tool • Leads user through merging process • Suggestions for merging • Identifies inconsistencies and potential problems • Suggests strategies for resolving • Uses structure of concepts and their relation along with user input • Decision based on local context • Iterative

  13. IPROMPT Algorithm

  14. IPROMPT Algorithm • Creates initial suggestions based on lexical similarity of names • The merged ontology contains frames that are similar to frames in the input ontologies • Two ontologies O1 and O2 are merged to form Om • Merging decisions are designer and task dependent • A set of knowledge-based operations is defined • For each operation, IPROMPT: • Performs changes automatically • Generates new merging suggestions • Identifies inconsistencies and potential problems
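
The initial-suggestion step on this slide could be sketched as follows. This is only an illustration, not IPROMPT's actual implementation: the function names, the 0.8 threshold, and the use of Python's `difflib` as the lexical similarity measure are all assumptions. The class names are borrowed from the ANCHORPROMPT example later in the deck.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def initial_suggestions(o1_names, o2_names, threshold=0.8):
    """Return (name1, name2, score) merge candidates, best first.

    The threshold is a hypothetical cut-off chosen for this sketch.
    """
    pairs = [
        (n1, n2, name_similarity(n1, n2))
        for n1 in o1_names
        for n2 in o2_names
    ]
    candidates = [p for p in pairs if p[2] >= threshold]
    return sorted(candidates, key=lambda p: -p[2])


o1 = ["TRIAL", "PERSON", "PROTOCOL"]
o2 = ["Trial", "Person", "Design"]
suggestions = initial_suggestions(o1, o2)
for n1, n2, score in suggestions:
    print(f"suggest merging {n1} with {n2} (score {score:.2f})")
```

In an interactive tool the user would accept or reject each suggestion, and each accepted merge would trigger a new round of suggestions.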

  15. Class hierarchies

  16. Suggestion for merging

  17. IPROMPT Operations • Merge classes • Merge slots • Merge instances • Shallow copy of a class • Copies the class from the source ontology to the merged ontology • Deep copy of a class • Also copies all the parents of the class up to the root of the hierarchy
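
The difference between the shallow and deep copy operations can be sketched like this. The representation is an assumption made for illustration: an ontology is a plain dictionary mapping each class name to the list of its direct parents, and `shallow_copy` and `deep_copy` are hypothetical helper names, not PROMPT functions.

```python
def shallow_copy(source, merged, cls):
    """Copy only the class itself; its parent links are not carried over."""
    merged.setdefault(cls, [])


def deep_copy(source, merged, cls):
    """Copy the class and all of its ancestors up to the root."""
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in merged:
            continue  # already present in the merged ontology
        merged[current] = list(source[current])
        stack.extend(source[current])


# Hypothetical source ontology: class name -> direct parents.
source = {
    "Thing": [],
    "Agent": ["Thing"],
    "Person": ["Agent"],
    "Document": ["Thing"],
}

merged_shallow = {}
shallow_copy(source, merged_shallow, "Person")

merged_deep = {}
deep_copy(source, merged_deep, "Person")

print(merged_shallow)
print(merged_deep)
```

A shallow copy leaves the copied class without parents in the merged ontology, which is one way the dangling references mentioned on the next slide can arise.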

  18. Inconsistencies & Potential Problems • Name conflicts • Dangling references • Redundancy in the class hierarchy • Slot values violating slot-value restrictions

  19. Additional features • Setting up preferred ontology • Maintaining user focus • Providing feedback to user • Logging of ontology merging and editing operations

  20. ANCHORPROMPT • Graph-based tool for finding similarities • Compares larger portions of the ontologies • Goal: augment IPROMPT by determining additional points of similarity • Input: anchors – a set of pairs of related terms • Anchor identification – manual or automatic • Each ontology is viewed as a directed labeled graph

  21. ANCHORPROMPT representation

  22. ANCHORPROMPT algorithm

  23. Algorithm • Begins with anchor pairs • TRIAL, Trial • PERSON, Person • Path 1: TRIAL -> PROTOCOL -> STUDY-SITE -> PERSON • Path 2: Trial -> Design -> Blinding -> Person • Determines a similarity score for each pair of related terms • If two pairs of terms from the source ontologies are similar and there are paths connecting the terms, then the elements in those paths are often similar as well
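
The path-based scoring idea can be sketched with the two paths from this slide. This is a simplified illustration under stated assumptions: only paths of equal length are compared, and every pair of terms at the same position gets its score incremented by one. `score_paths` is a hypothetical name, and the real tool considers many paths between many anchor pairs.

```python
from collections import Counter


def score_paths(path1, path2, scores):
    """Increment the score of terms that sit at the same position in two
    equal-length paths connecting a pair of anchors."""
    if len(path1) != len(path2):
        return  # assumption: only equal-length paths are compared
    # Skip the endpoints: they are the anchors and are already matched.
    for term1, term2 in zip(path1[1:-1], path2[1:-1]):
        scores[(term1, term2)] += 1


scores = Counter()
score_paths(
    ["TRIAL", "PROTOCOL", "STUDY-SITE", "PERSON"],
    ["Trial", "Design", "Blinding", "Person"],
    scores,
)
for pair, score in scores.most_common():
    print(pair, score)
```

Pairs that keep appearing at matching positions across many paths accumulate high scores and become candidates for merging suggestions.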

  24. PROMPTDIFF • Tool for comparing ontology versions • Version comparison in software code is based on comparing text files • The same ontology can have different text representations, so a text diff is not enough • Heuristic algorithm that produces a structural diff between two versions • Compares the structure of the two ontology versions • Identifies which frames changed and what changes were made

  25. PromptDiff Algorithm • An extensible set of heuristic matchers • Fixed-point algorithm to combine the results of the matchers to produce a structural diff between two versions
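
The fixed-point combination of matchers can be sketched as a loop that reruns every matcher until none of them finds a new match. The two matchers below are simplified stand-ins written for this example, not the actual PromptDiff heuristics. Each version is assumed to be a dictionary mapping a frame name to the list of its direct subclasses.

```python
def same_name_matcher(v1, v2, matches):
    """Match frames whose names are identical in both versions."""
    return {(frame, frame) for frame in v1 if frame in v2}


def single_unmatched_child_matcher(v1, v2, matches):
    """If two matched frames each have exactly one unmatched child,
    assume those children correspond (for example, after a rename)."""
    matched1 = {a for a, _ in matches}
    matched2 = {b for _, b in matches}
    found = set()
    for parent1, parent2 in matches:
        rest1 = [c for c in v1.get(parent1, []) if c not in matched1]
        rest2 = [c for c in v2.get(parent2, []) if c not in matched2]
        if len(rest1) == 1 and len(rest2) == 1:
            found.add((rest1[0], rest2[0]))
    return found


def fixed_point_diff(v1, v2, matchers):
    """Run all matchers repeatedly until no matcher adds a new match."""
    matches = set()
    changed = True
    while changed:
        changed = False
        for matcher in matchers:
            new = matcher(v1, v2, matches) - matches
            if new:
                matches |= new
                changed = True
    return matches


old = {"Thing": ["Person"], "Person": ["Student"], "Student": []}
new = {"Thing": ["Person"], "Person": ["Pupil"], "Pupil": []}
result = fixed_point_diff(
    old, new, [same_name_matcher, single_unmatched_child_matcher]
)
print(sorted(result))
```

The second matcher can only fire once the first has matched the parents, which is why the results are fed back in until nothing changes.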

  26. PROMPTFACTOR • Tool for factoring out a semantically independent part of a large ontology into a new sub-ontology • Ensures that severed links do not introduce ill-defined concepts in the sub-ontology • User can specify concepts of interest • Performs the transitive closure of the superclass relation and all the relations defined by slots • Target ontology works as a stand-alone ontology

  27. PromptFactor Algorithm • User specifies the concept of interest • PromptFactor traverses the ontology starting from that term • Determines the transitive closure of all relations, including the subclass-of relation • Determines all the parents of the selected term in the hierarchy • User interactive • Determines inconsistencies
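
The transitive-closure step can be sketched as a graph traversal. The representation is an assumption for illustration: `relations` maps each term to the set of terms it refers to, whether through its superclasses or through the ranges of its slots. `factor` is a hypothetical name.

```python
def factor(relations, concepts_of_interest):
    """Return every term reachable from the concepts of interest by
    following superclass and slot references (transitive closure)."""
    selected = set()
    frontier = list(concepts_of_interest)
    while frontier:
        term = frontier.pop()
        if term in selected:
            continue
        selected.add(term)
        frontier.extend(relations.get(term, ()))
    return selected


# Hypothetical ontology: term -> terms it depends on.
relations = {
    "Trial": {"Study", "Person"},
    "Study": {"Thing"},
    "Person": {"Thing"},
    "Drug": {"Thing"},
    "Thing": set(),
}

sub_ontology = factor(relations, ["Trial"])
print(sorted(sub_ontology))
```

Because every referenced term is pulled in, the extracted part has no severed links and can stand alone. Terms that the concept of interest never reaches, such as `Drug` here, are left out.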

  28. Prompt Demo • It is available as a plug-in for Protégé 3.4 • Uses linguistic similarity matches between concepts • Also matches slot names and slot value types • In cases where automation is not possible, user intervention is needed; possible actions are suggested • Alignment is followed by merging • Alignment is establishing links between the ontologies • Merging is the creation of a single coherent ontology

  29. Prompt Demo

  30. The Web of Data • Data sources span a large range of domains • RDF data model is used to publish structured data on the web • Explicit RDF links exist between entities in different data sources • However, there is a lack of tools to set RDF links to other data sources

  31. Silk • A link discovery framework built around a link specification language • Allows specification of the links that should be discovered between data sources, as well as the conditions that entities must fulfill to be linked • Link conditions are specified using similarity metrics; they can use aggregation functions to combine similarity scores • Data access is performed using SPARQL

  32. Silk Features • Support for owl:sameAs links and other types of RDF links • Provides a declarative language to specify link conditions • Datasets need not be replicated locally • Caching, indexing and entity pre-selection are used to enhance performance

  33. Silk LSL example

  34. Silk LSL example (contd.)

  35. Silk similarity metrics • Similarity metrics can be combined using aggregation functions • Sets of resources can be selected using Silk RDF path selector language
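
Combining similarity metrics with an aggregation function can be sketched as below. This is plain Python, not Silk-LSL syntax. The two metrics, the weights, the 0.9 threshold, and the `label` and `population` properties are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a, b, max_distance):
    """1.0 for equal numbers, falling linearly to 0.0 at max_distance."""
    return max(0.0, 1.0 - abs(a - b) / max_distance)


def weighted_average(scored):
    """Aggregate (score, weight) pairs into one score."""
    total_weight = sum(weight for _, weight in scored)
    return sum(score * weight for score, weight in scored) / total_weight


def should_link(source, target, threshold=0.9):
    """Decide whether two entities satisfy the (hypothetical) link condition."""
    score = weighted_average([
        (string_similarity(source["label"], target["label"]), 2.0),
        (numeric_similarity(source["population"], target["population"], 10000), 1.0),
    ])
    return score >= threshold


city = {"label": "Berlin", "population": 3400000}
same_city = {"label": "berlin", "population": 3401000}
other_city = {"label": "Bern", "population": 130000}

print(should_link(city, same_city))
print(should_link(city, other_city))
```

A declarative link specification expresses the same choice of metrics, weights, and threshold as configuration rather than code.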

  36. Silk Pre-Matching • Comparing all entities in source S with all entities in target T would need O(|S|·|T|) comparisons • Pre-matching finds a limited set of target entities that are likely to match a given source entity • It is performed by indexing the target resources based on their property values • This scheme reduces the runtime to O(|S| + |T|)
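
The pre-matching idea can be sketched with a small inverted index. This illustrates the general technique only and is not Silk's implementation: the whitespace tokenization, the `label` property, and the `t:` URIs are assumptions made for the example.

```python
from collections import defaultdict


def build_index(targets, key):
    """Index target URIs by the lowercase tokens of one property value."""
    index = defaultdict(set)
    for uri, properties in targets.items():
        for token in properties[key].lower().split():
            index[token].add(uri)
    return index


def candidates(index, source_properties, key):
    """Return only the targets that share at least one token with the source."""
    found = set()
    for token in source_properties[key].lower().split():
        found |= index.get(token, set())
    return found


# Hypothetical target dataset.
targets = {
    "t:1": {"label": "Berlin Germany"},
    "t:2": {"label": "Paris France"},
    "t:3": {"label": "Berlin New Hampshire"},
}

index = build_index(targets, "label")          # one pass over the targets
likely = candidates(index, {"label": "berlin"}, "label")
print(sorted(likely))
```

The index is built once over the targets, and each source entity then needs only a few lookups. The expensive similarity metrics run on the small candidate set instead of on every target.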

  37. Silk Implementation

  38. Managing coreferences • Semantic web vision - Large quantities of information • Readily available • Interlinked • Machine readable • Fragmented web • Significant overlap • Need to identify ‘duplicates’ • Co-reference resolution – determining “equivalent” URIs

  39. Co-reference Resolution Service (CRS) • Systematic analysis and heuristic based approach : • Identifying • Publishing • Managing • Using co-reference information • Most prevalent way – owl:sameAs • Equivalence – context dependent

  40. CRSes • Maintain sets of equivalent URIs • Storing co-reference data separately • URI definition and synonyms are kept separate • Management techniques - history, rollback, annotation • Use of multiple CRSes that applications can use • Core functionality in PHP – easy integration • Backed by MySQL

  41. Data representation in CRS • Equivalent URIs are stored in bundles • One URI in each bundle is designated the canon, i.e. the preferred URI • Formation of bundles: • Check whether the URI already exists in any bundle • If not, create a ‘singleton’ bundle for the new URI • Perform merge – union of bundles with “equivalent” URIs • Constituent bundles that were merged are marked inactive
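
The bundle-formation steps above could be sketched in memory as follows. The real CRS is described as PHP backed by MySQL, so this Python class is only an illustration. The class and method names are hypothetical, and keeping the first bundle's canon after a merge is an assumption. The URIs are taken from the RDF example later in the deck.

```python
class CoreferenceService:
    """Minimal in-memory sketch of bundle storage (names are hypothetical)."""

    def __init__(self):
        self.bundle_of = {}  # URI -> id of its current bundle
        self.bundles = {}    # id -> {"uris": set, "canon": str, "active": bool}
        self._next_id = 0

    def _new_bundle(self, uris, canon):
        bundle_id = self._next_id
        self._next_id += 1
        self.bundles[bundle_id] = {"uris": set(uris), "canon": canon, "active": True}
        for uri in uris:
            self.bundle_of[uri] = bundle_id
        return bundle_id

    def add(self, uri):
        """Create a singleton bundle unless the URI is already known."""
        if uri not in self.bundle_of:
            self._new_bundle({uri}, uri)
        return self.bundle_of[uri]

    def merge(self, uri_a, uri_b):
        """Union the two bundles and mark the constituents inactive."""
        a, b = self.add(uri_a), self.add(uri_b)
        if a == b:
            return a
        self.bundles[a]["active"] = False
        self.bundles[b]["active"] = False
        union = self.bundles[a]["uris"] | self.bundles[b]["uris"]
        # Assumption: the merged bundle keeps the first bundle's canon.
        return self._new_bundle(union, self.bundles[a]["canon"])

    def canon(self, uri):
        return self.bundles[self.bundle_of[uri]]["canon"]


soton = "http://southampton.rkbexplorer.com/id/person-00021"
acm = "http://acm.rkbexplorer.com/id/person-102898"
dblp = "http://dblp.rkbexplorer.com/id/people-27aedbcb"

crs = CoreferenceService()
crs.merge(soton, acm)
crs.merge(acm, dblp)
print(crs.canon(dblp))
```

Keeping the inactive bundles instead of deleting them is what makes the history and rollback features from the previous slide possible.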

  42. Examples of bundle formation

  43. Data representation • Data storage – indexed tables of hashed URIs • Permits fast lookup to find: • Canon of a given URI • All URIs in a bundle • URIs are deprecated by setting flags • Finding all equivalences – follow the coref:coreferenceData link to the bundle for a URI, then recursively repeat the process for each URI in that bundle
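
The recursive search for all equivalences across several CRSes can be sketched as a graph traversal. The setup is assumed for illustration: each service is a dictionary mapping a URI to the set of URIs in its bundle, which stands in for dereferencing coref:coreferenceData. The short `a:`, `b:`, `c:` URIs are made up.

```python
def all_equivalents(uri, services):
    """Collect every URI reachable by repeatedly looking up bundles
    in any of the given co-reference services."""
    seen = {uri}
    queue = [uri]
    while queue:
        current = queue.pop()
        for service in services:
            for other in service.get(current, ()):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


# Two hypothetical services that know about overlapping URIs.
crs_a = {
    "a:1": {"a:1", "b:7"},
    "b:7": {"a:1", "b:7"},
}
crs_b = {
    "b:7": {"b:7", "c:3"},
    "c:3": {"b:7", "c:3"},
}

equivalents = all_equivalents("a:1", [crs_a, crs_b])
print(sorted(equivalents))
```

The `seen` set guarantees termination even though bundles refer back to URIs that have already been visited.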

  44. RDF description of equivalent URIs in a bundle

      <rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <coref:Bundle>
          <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
          <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898" />
          <coref:duplicate rdf:resource="http://citeseer.rkbexplorer.com/id/resource-CSP109002" />
          <coref:duplicate rdf:resource="http://dblp.rkbexplorer.com/id/people-27aedbcb" />
          <coref:duplicate rdf:resource="http://eprints.rkbexplorer.com/id/kfupm/person-27aed0c1" />
          <coref:duplicate rdf:resource="http://southampton.rkbexplorer.com/id/person-00021" />
          <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser" />
          <coref:lastUpdated>2009-01-16 11:11:40</coref:lastUpdated>
        </coref:Bundle>
      </rdf:RDF>

  45. Ways to speed up • Look up only one URI from each CRS • Follow only the coref:canon predicate • Lookup would need O(log|S| + log|T|)

  46. References
      [1] Natalya F. Noy and Mark A. Musen. The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping. Stanford Medical Informatics, Stanford University.
      [2] Hugh Glaser, Afraz Jaffri and Ian C. Millard. Managing Co-reference on the Semantic Web. School of Electronics and Computer Science, University of Southampton, Southampton, Hampshire, UK.
      [3] Yannis Kalfoglou and Marco Schorlemmer. Ontology Mapping: The State of the Art.
      [4] Yannis Kalfoglou and Marco Schorlemmer (2003a). IF-Map: An Ontology Mapping Method Based on Information Flow Theory. Journal on Data Semantics, 1(1):98–127.
      [5] Julius Volz, Christian Bizer et al. Silk – A Link Discovery Framework for the Web of Data.