Ontology Merging Kyriakos Kritikos (ΥΔ) Miltos Stratakis (MET) HY-566 Semantic Web
Representation Matching
• The problem of creating semantic mappings between two data representations
• Mapping examples
  • Element location of one representation maps to element address of the other
  • Contact-phone maps to agent-phone
  • Listed-price maps to price * (1 + tax-rate)
• A fundamental step in numerous data management applications
• However, the manual effort needed for semantic mapping has become increasingly intensive as these applications have spread
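The three mapping examples above can be written down as a small translation function. This is only a sketch: the record layout, the function name `map_record` and the sample values are assumptions made for illustration; only the three correspondences come from the slide.

```python
def map_record(source: dict) -> dict:
    """Translate a record from one representation into the other."""
    return {
        # One-to-one renaming: location -> address
        "address": source["location"],
        # One-to-one renaming: contact-phone -> agent-phone
        "agent-phone": source["contact-phone"],
        # Computed mapping: listed-price = price * (1 + tax-rate)
        "listed-price": source["price"] * (1 + source["tax-rate"]),
    }


# Hypothetical sample record, not taken from the original slides.
example = {
    "location": "Heraklion",
    "contact-phone": "555-0100",
    "price": 100.0,
    "tax-rate": 0.25,
}
target = map_record(example)
```

The first two mappings are plain renamings, while the third needs an expression over two source elements, which is part of what makes mappings hard to discover automatically.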
Applications of Representation Matching (I)
• Schema integration (early 1980s)
  • Need to merge a set of given schemas into a single global schema
• Data warehousing and data mining (early 1990s)
  • Need to translate data between multiple databases
  • Data coming from multiple sources must be transformed into data conforming to a single target schema
• Knowledge base construction (late 1980s and throughout the 1990s)
  • Used in AI
  • KBs store complex types of entities and relationships, using “extended database schemas” (ontologies)
  • Requires semantic mappings between the ontologies involved (the ontology matching problem)
Applications of Representation Matching (II)
• Data integration systems (recent years)
  • Provide a uniform query interface to a large number of data sources by letting users pose queries against a mediated schema
  • Need a set of semantic mappings between the mediated schema and the local schemas of the data sources
• Peer data management systems (recent years)
  • Allow peers to query and retrieve data directly from each other
  • Need semantic mappings among the peers
Using Ontologies as Representations
• Ontology: “Explicit specification of a conceptualization”
• Can be used
  • In an integration task to
    • Describe the semantics of the information sources
    • Make the content explicit
  • For the identification and association of semantically corresponding information concepts
Content Explication
• Ontologies can be employed for content explication in different ways
• We can identify three different directions
  • Single ontology approaches
  • Multiple ontology approaches
  • Hybrid ontology approaches
Single Ontology Approaches
• Use one global ontology providing a shared vocabulary for the specification of the semantics
• Can be applied to integration problems where all information sources to be integrated provide nearly the same view on a domain
• Not effective if one information source has a different view on a domain
Multiple Ontology Approaches
• Each information source is described by its own ontology
• Each source ontology can be developed without regard to other sources or their ontologies
• Can simplify the integration task
• Supports changes to the sources
• Not effective for comparing different source ontologies, due to the lack of a common vocabulary
Hybrid Ontology Approaches
• The semantics of each source is described by its own ontology, but these ontologies are built from a global shared vocabulary to make them comparable
• The shared vocabulary contains basic terms of a domain, which are combined in the local ontologies to describe more complex semantics
• New sources can easily be added without the need for modification
• However, existing ontologies cannot easily be reused
The Need for Ontology Matching (Integration)
Semantic Web evolution:
• Requires formal descriptions of parts of our human environment (i.e. descriptions of parts of the real world)
• These descriptions, in various degrees of formality and specificity, are the ontologies
• To form a real web of semantics, ontologies from different sources should be linked and related to each other
• Problem: the reuse of existing ontologies is often not possible without considerable effort
• Ontologies need to
  • Be integrated (i.e. merged into a new ontology)
  • Be aligned (i.e. brought into mutual agreement)
Ontology Integration Process
Consists of three steps:
• Find the places in the ontologies where they overlap
• Relate concepts that are semantically close via equivalence and subsumption relations (aligning)
• Check the consistency, coherency and non-redundancy of the result
Technical Problems with Ontology Combination
• The technical problems that underlie the difficulties in ontology merging and aligning are:
  • The mismatches that may exist between separate ontologies (Mismatches between Ontologies)
  • The synchronization of the changes made to an ontology with the revisions to the applications and data sources that use it (Ontology Versioning)
Mismatches between Ontologies
• The key type of problem that hinders the combined use of independently developed ontologies
• We distinguish two levels at which these mismatches may appear:
  • Language or meta-model level
    • The level of the language primitives that are used to specify an ontology
    • Mismatches at this level are between the mechanisms used to define classes, relations etc.
  • Ontology or model level
    • The level of the actual ontology of a domain
    • A mismatch at this level is a difference in the way the domain is modelled
Language Level Mismatches
• Occur when combining ontologies written in different ontology languages
• We distinguish four types of mismatch at this level
  • Syntax
    • Different ontology languages often use different syntaxes
    • Probably the simplest kind of language level mismatch
  • Logical representation
    • Logical notions can be represented in different ways
    • The question is which language constructs should be used to express something, not whether it can be expressed
  • Semantics of primitives
    • Sometimes the same name is used for a language construct in two languages, yet the semantics differ (e.g. when there are several interpretations of A equalTo B)
  • Language expressivity
    • Some languages can express things that are not expressible in other languages (e.g. some languages have constructs to express negation and others do not)
Ontology Level Mismatches
• Occur when combining two or more ontologies that describe (partly) overlapping domains
• We can classify the mismatches at this level into four kinds:
  • Conceptualization mismatch
    • A difference in the way a domain is interpreted, which results in different ontological concepts or different relations between those concepts
  • Explication mismatch
    • A difference in the way the conceptualization is specified
  • Terminological mismatch
    • A difference in the way the terms are described
  • Encoding mismatch
    • Values in the ontologies may be encoded in different formats (e.g. a date may be represented as “dd/mm/yyyy” or as “mm-dd-yy”)
• Terminological and encoding mismatches can be considered specialized explication mismatches
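An encoding mismatch such as the date example above is usually resolved with a value-level conversion. The sketch below uses Python's standard `datetime` module; the function name `convert_date` and the sample date are assumptions for illustration.

```python
from datetime import datetime


def convert_date(value: str) -> str:
    """Re-encode a "dd/mm/yyyy" date string as "mm-dd-yy"."""
    parsed = datetime.strptime(value, "%d/%m/%Y")
    return parsed.strftime("%m-%d-%y")


# Hypothetical sample value.
converted = convert_date("25/12/2003")
```

Note that the conversion loses information in this direction, because the two-digit year drops the century; converting back would need an extra assumption.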
Conceptualization Mismatches
• We distinguish two types of these mismatches:
  • Scope
    • Two classes seem to represent the same concept, but do not have exactly the same instances (e.g. several administrations use slightly different concepts of employee)
  • Model coverage and granularity
    • The mismatch is in the part of the domain that is covered by the ontology, or in the level of detail to which that domain is modelled
    • For example, one ontology might model cars but not trucks, another might represent trucks but only classify them into a few categories, while a third might make very specific distinctions between types of trucks based on their general physical structure, weight etc.
Explication Mismatches
• We distinguish two types of these mismatches, both concerning the style of modeling:
  • Paradigm
    • Different paradigms can be used to represent concepts such as time, action, plans etc.
    • For example, the use of different “top-level” ontologies is a mismatch of this type
  • Concept description
    • Several choices can be made for the modeling of concepts in the ontology
    • For example, consider where the distinction between scientific and non-scientific publications is made
    • A dissertation can be modelled as dissertation < book < scientific publication < publication, or as dissertation < scientific book < book < publication
Terminological Mismatches
• We distinguish two kinds of terms that give rise to these mismatches:
  • Synonym terms
    • The same concept may be represented by different names
    • For example, one ontology may use the term “car” and another the term “automobile”
  • Homonym terms
    • The meaning of a term may be different in another context
    • For example, the term “conductor” has a different meaning in a music domain than in an electrical engineering domain
Ontology Versioning
• In an open domain, changes to the ontologies in use are unavoidable, so it becomes very important to keep track of these changes
• Although the problem is introduced by subsequent changes to one specific ontology, the most important problems are caused by the dependencies on that ontology
• A versioning scheme should pay attention to the following aspects
  • The relation between succeeding revisions of one ontology
  • The relation between the ontology and its dependencies:
    • Instance data that conforms to the ontology
    • Other ontologies that are built from or import the ontology
    • Applications that use the ontology
Versioning Scheme Requirements
• Identification
  • For every use of a concept or a relation, a versioning framework should provide a distinct reference to the intended definition
• Change tracking
  • A versioning framework should make explicit the relation of one version of a concept or relation to other versions of that construct
• Transparent translation
  • A versioning framework should, as far as possible, automatically perform conversions from one version to another, to enable transparent access
Practical Problems with Ontology Combination
• Finding alignments
  • It is difficult to find the terms that need to be aligned
• Diagnosis
  • The consequences of a specific mapping (unforeseen implications) are difficult to see
• Repeatability of merges
  • The sources that are used for the merging continue to evolve
  • The alignments that are created for the merging should be as reusable as possible for merging the revised ontologies
  • Very important in the context of ontology maintenance
Problems Overview
Super-imposed Metamodel
• Transforms information between representations.
• Approach:
  • Represent information from different models in a uniform way.
  • Provide a mapping formalism.
• Technique:
  • Ontology languages are represented in a meta-model through RDF triples.
  • Mappings are specified by production rules over RDF triples.
• +:
  • Mapping rules provide integration at schema and instance level.
• -:
  • Handles only language mismatches, and not expressivity.
  • Mappings are specified manually.
OKBC
• A generic interface to knowledge representation systems (KRS).
• A KR language is mapped to the OKBC Knowledge Model (KM).
• +:
  • Interoperability is achieved at the level of the OKBC KM.
  • Solves language mismatches, but not expressivity.
• -:
  • Notions requiring a higher level of expressivity are lost.
  • Does not express terminological axioms such as covering, disjointness, partition and exclusion.
OntoMorph (I)
• Transformation system for symbolic knowledge.
• Facilitates:
  • Ontology merging.
  • Rapid generation of KB translators.
• Provides 2 mechanisms:
  • Syntactic rewriting via pattern-directed rewrite rules.
  • Semantic rewriting that modulates:
    • syntactic rewriting via semantic models.
    • logical inference via an integrated KR system.
• The OntoMorph architecture facilitates incremental development and scripted replay of transforms.
OntoMorph (II)
• Focuses on aligning ontologies through 3 steps:
  • 1. Designing transforms to bring the sources into mutual agreement.
  • 2. Editing the sources to carry out the transforms.
  • 3. Taking the union of the morphed sources.
• Steps:
  • Step 2 is facilitated by transforming the ontologies into a common format.
  • Step 1 is less automatable and involves human negotiation.
• +:
  • Handles language mismatches, but not expressivity.
  • Handles ontology level mismatches, but not model coverage.
  • Repeatability.
• -:
  • Transforms are expressed manually.
  • Merging is not dealt with at all.
Scalable Knowledge Composition
• Developed an algebra for ontology composition that:
  • Operates on directed labeled graphs, such as ontologies.
  • Is composable: each operator takes a graph of semi-structured data as input and transforms it into a graph.
• Operations are knowledge driven, using articulation rules, which are:
  • Logical rules (semantic implication between terms)
  • Functional rules (conversion between terms across ontologies)
• The intersection operator produces an articulation ontology that contains the related terms and their relations.
• +:
  • Solves conceptual and terminological mismatches.
  • Rules are supplied by the engineer and by lexical knowledge.
  • Repeatability.
• -:
  • Most rules are specified manually.
  • No support for merging.
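The roles of the two kinds of articulation rules can be loosely illustrated as follows. This is a toy sketch and not the SKC algebra itself: the two source graphs, the term names, the `convertsTo` label, the placeholder conversion factor and the function names are all assumptions made for illustration.

```python
# Hypothetical source ontologies as directed labeled graphs: (node, label, node) edges.
carrier = {("Truck", "subClassOf", "Vehicle"), ("Truck", "hasPrice", "PriceUSD")}
factory = {("Lorry", "subClassOf", "GoodsVehicle"), ("Lorry", "hasPrice", "PriceEUR")}

# Logical articulation rules: semantic implication between terms of the two ontologies.
logical_rules = [("Truck", "implies", "Lorry"), ("Lorry", "implies", "Truck")]

# Functional articulation rule: a conversion between terms across ontologies.
PLACEHOLDER_RATE = 0.5  # illustrative factor only, not a real exchange rate
functional_rules = {("PriceUSD", "PriceEUR"): lambda usd: usd * PLACEHOLDER_RATE}


def nodes(graph):
    """Collect every node that appears as subject or object of an edge."""
    return {node for subj, _, obj in graph for node in (subj, obj)}


def intersection(g1, g2, logical, functional):
    """Build an articulation graph holding only related terms and their relations."""
    edges = set(logical)
    edges |= {(left, "convertsTo", right) for (left, right) in functional}
    known = nodes(g1) | nodes(g2)
    # Keep a rule only if both of its terms really occur in the source graphs.
    return {e for e in edges if e[0] in known and e[2] in known}


articulation = intersection(carrier, factory, logical_rules, functional_rules)
```

Because the result is again a set of labeled edges, it could be fed to a further operator, which is the sense in which the operators are composable.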
Chimaera (I)
• Chimaera is an ontology merging and diagnosis tool.
• Supports ontology browsing and editing.
• It is targeted at lightweight ontologies.
• Supports 2 merging tasks:
  • Joining two similar terms under the same name.
  • Identifying terms that should be related by subsumption, disjointness or instance relations, and supporting the introduction of these relations.
• Chimaera also generates, by heuristics:
  • Name resolution lists for related terms.
  • Taxonomy resolution lists, in which it suggests taxonomy areas for reorganization.
Chimaera (II)
• Has diagnostic support for:
  • Verifying ontologies.
  • Validating ontologies.
  • Critiquing ontologies.
• +:
  • Solves mismatches at the terminological and concept-scope level.
  • Helps alignment by providing possible edit points.
  • Diagnosis of the merging process.
• -:
  • Not automatic: everything requires user interaction.
  • No repeatability.
  • Uses only local context for edit points.
Prompt
• Prompt is an interactive ontology-merging tool.
• Guides the user by:
  • Making suggestions based on linguistic-similarity matches and syntactic clues.
  • Detecting the conflicts that result from carrying out a suggestion.
  • Proposing conflict resolution strategies.
• For every operation it populates 3 sets:
  • Changes performed automatically.
  • New suggestions for the user.
  • Conflicts introduced, such as name conflicts, dangling references, redundancy in the class hierarchy and inconsistencies.
• Prompt points to places requiring change, and for every place it proposes new actions.
• Advantages and disadvantages are the same as Chimaera's, except that Prompt supports repeatability.
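Suggestions based on linguistic similarity can be approximated with a plain string-similarity score. The sketch below is not Prompt's actual matcher: it assumes Python's `difflib.SequenceMatcher` as the similarity measure, and the function name, the 0.8 threshold and the class names are hypothetical.

```python
from difflib import SequenceMatcher


def suggest_merges(names_a, names_b, threshold=0.8):
    """Propose (name_a, name_b, score) pairs whose names look alike."""
    suggestions = []
    for a in names_a:
        for b in names_b:
            # Case-insensitive character-level similarity in [0, 1].
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                suggestions.append((a, b, round(score, 2)))
    # Most similar pairs first, so the user reviews the best candidates early.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical class names from two ontologies.
candidates = suggest_merges(["Author", "Publication"], ["Authors", "Book"])
```

A purely lexical score like this finds near-identical names but misses synonyms such as “car” and “automobile”, which is why the tool keeps the user in the loop.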
FCA-Merge (I)
• FCA-Merge:
  • A bottom-up approach to ontology merging.
  • Offers a global structural description of the merge process.
  • Its mechanism is based on instances of the two ontologies.
• The merge process consists of 3 steps:
  • Instance extraction by natural language techniques, and computation of 2 formal contexts based on the extracted instances.
  • Derivation of a common context and computation of a pruned concept lattice using the mathematical techniques of Formal Concept Analysis (FCA).
  • Generation of the merged ontology from the concept lattice, with the help of the engineer and OntoEdit.
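The second step rests on the notion of a formal concept: a pair (extent, intent) in which the extent is exactly the set of objects sharing all attributes of the intent, and vice versa. The sketch below enumerates the formal concepts of a tiny common context by brute force. It does not implement the pruned-lattice computation of FCA-Merge, and the documents, the `O1:`/`O2:` prefixes and the concept names are assumptions for illustration.

```python
from itertools import combinations


def shared_attributes(objects, context, all_attributes):
    """Attributes common to all given objects (all attributes for no objects)."""
    result = set(all_attributes)
    for obj in objects:
        result &= context[obj]
    return result


def objects_having(attributes, context):
    """Objects that carry every one of the given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    all_attributes = set().union(*context.values())
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = shared_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical common context: documents (objects) x concepts of both ontologies.
common_context = {
    "d1": {"O1:Hotel", "O2:Accommodation"},
    "d2": {"O1:Hotel", "O2:Accommodation", "O1:Event"},
    "d3": {"O1:Event", "O2:Concert"},
}
concepts = formal_concepts(common_context)
```

In this toy context `O1:Hotel` and `O2:Accommodation` always occur in the same documents, so they end up in the same intent, which marks them as candidates for merging. The brute-force enumeration is exponential in the number of documents and is meant only to show the definition.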
FCA-Merge (II)
• Restrictions:
  • Input documents should be domain-specific.
  • Each document should cover all concepts from the source ontologies.
  • The documents must separate the concepts well enough; if the method does not separate the concepts properly, the engineer should provide more and better documents.
• +’s and -’s:
  • Handles terminological and concept-scope mismatches.
  • Finds alignments with the help of the lattice.
  • Diagnosis of results by using OntoEdit.
  • Repeatability by storing the pruned concept lattice.
GLUE (I)
• Applies machine learning techniques to alignment.
• 3 main points:
  • Computes the joint probability distribution of the concepts involved. In this way:
    • Any similarity measure can be computed from the joint distribution.
    • The approach is applicable to a broad range of ontology-matching problems.
  • Uses multi-strategy learning to compute the joint distribution. In this way:
    • Many types of information can be used to maximize the matching accuracy.
    • The system is extensible with new learners.
  • Exploits domain restrictions and general heuristics to maximize matching accuracy, using relaxation labeling.
• The process consists of 3 main steps, performed by the automated components: Distribution Estimator, Similarity Estimator and Relaxation Labeler.
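To see how a similarity measure falls out of a joint distribution, consider two concepts A and B and the four probabilities P(A,B), P(A,¬B), P(¬A,B) and P(¬A,¬B). The sketch below estimates them from membership flags and derives the Jaccard coefficient P(A∩B) / P(A∪B), used here as one example measure. The membership flags are assumed to be given; in a learning-based system they would come from classifiers, which this sketch does not model.

```python
def joint_distribution(memberships):
    """Estimate the four joint probabilities from (in_a, in_b) flags per instance."""
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for in_a, in_b in memberships:
        counts[(in_a, in_b)] += 1
    total = len(memberships)
    return {key: count / total for key, count in counts.items()}


def jaccard(joint):
    """Jaccard coefficient: P(A and B) / P(A or B)."""
    union = joint[(True, True)] + joint[(True, False)] + joint[(False, True)]
    return joint[(True, True)] / union if union else 0.0


# Hypothetical sample: 3 instances in both concepts, 1 only in A,
# 2 only in B, and 4 in neither.
sample = (
    [(True, True)] * 3
    + [(True, False)] * 1
    + [(False, True)] * 2
    + [(False, False)] * 4
)
similarity = jaccard(joint_distribution(sample))
```

Any other measure defined over the same four probabilities could replace `jaccard` without touching the estimation step, which is the point of computing the joint distribution first.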
GLUE (II)
• Restrictions:
  • Only 1-1 mappings of concepts.
  • Some nodes are not matched because of insufficient training data.
  • The base learners were implemented only as general-purpose text classifiers.
  • Some nodes are not matched because they are ambiguous; user interaction is needed in such cases.
  • Some pairs of nodes should not be examined at all.
• +’s and -’s:
  • Handles local scope of concepts and proper classification.
  • Finding alignments and repeatability are automatic.
  • Different encodings are handled by adding an appropriate learner.
Anchor-Prompt (I)
• Takes as input a set of pairs of similar terms (anchors), provided by the user or by heuristics.
• Its algorithm analyzes the paths in the ontology sub-graphs and determines which classes frequently appear in similar positions.
• Extends the approaches used in Prompt.
• It is implemented on top of the OKBC protocol.
• It finds only 1-1 mappings between concepts.
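The path analysis can be sketched as follows: for every two anchor pairs, enumerate the paths between the anchors in each ontology, and whenever two paths have the same length, increase the score of the classes that sit at the same position. This is a simplified illustration and not the tool's implementation; the graphs, class names, function names and the `max_len` bound are assumptions.

```python
from collections import Counter


def simple_paths(graph, start, end, max_len):
    """Yield cycle-free paths from start to end with at most max_len edges."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            yield path
            continue
        if len(path) - 1 == max_len:
            continue  # path is already as long as allowed
        for successor in graph.get(node, ()):
            if successor not in path:
                stack.append(path + [successor])


def anchor_scores(g1, g2, anchors, max_len=4):
    """Score class pairs that occupy the same position on equal-length paths."""
    scores = Counter()
    for a1, a2 in anchors:
        for b1, b2 in anchors:
            if (a1, a2) == (b1, b2):
                continue
            for p1 in simple_paths(g1, a1, b1, max_len):
                for p2 in simple_paths(g2, a2, b2, max_len):
                    if len(p1) == len(p2):
                        # Compare only interior nodes; the endpoints are the anchors.
                        for c1, c2 in zip(p1[1:-1], p2[1:-1]):
                            scores[(c1, c2)] += 1
    return scores


# Hypothetical ontologies as adjacency lists (class -> classes reachable via a slot).
onto1 = {"Publication": ["Author"], "Author": ["Institution"]}
onto2 = {"Paper": ["Writer"], "Writer": ["Organization"]}
anchors = [("Publication", "Paper"), ("Institution", "Organization")]
scores = anchor_scores(onto1, onto2, anchors)
```

Here `Author` and `Writer` are never compared by name; they are paired only because they lie at the same position between two anchor pairs.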
Anchor-Prompt (II)
• Limitations:
  • Very long paths do not produce accurate results.
  • Path-length=0 (Chimaera), Path-length=1 (Prompt).
  • Incidental matches can be produced (similarity limit).
  • When comparing a deep ontology with many slots against a shallow ontology whose slots relate top classes, the results are the same as Prompt's.
• +’s:
  • Concept scope mismatches are dealt with.
  • Finding alignments and repeatability are automatic tasks.
SHOE
• An HTML-based ontology language.
• Provides a rule mechanism for alignment:
  • Common items are mapped by inference rules.
  • Terminological differences are mapped by if-and-only-if rules.
  • Scope differences require mapping categories where one subsumes the other.
  • Encoding differences are handled by mapping individual values.
• Assigns version numbers to ontologies, which supports both identification of revisions and explicit specification of a revision's relation to other revisions (change tracking).
Conclusions (I)
• We found 4 different approaches that handle interoperability at the language level:
  • Aligning the meta-model.
  • Layered interoperability.
  • Transformation rules.
  • Mapping onto a common knowledge model.
• We found tools that suggest alignments and mappings with the use of heuristics. There are two types of heuristics:
  • Linguistic-based matches (FCA-Merge).
  • Structural and model similarity (Chimaera and Prompt).
Conclusions (II)
• We found tools that semi-automate or fully automate the merging process, although they produce only 1-1 mappings of concepts. They use different techniques:
  • Computation of a pruned concept lattice, using linguistic and FCA techniques (FCA-Merge).
  • Machine learning techniques (GLUE).
  • Using global instead of local context (Anchor-Prompt).
• Interoperability at the model level can be achieved through a common top-level ontology, i.e. by conforming to a common standard.
Conclusions (III)
• There are different approaches for diagnosing or checking the results of alignments:
  • Domain-independent verification and validation checks: name conflicts, dangling references etc.
  • Validation that requires reasoning: redundancy in the class hierarchy, violated value restrictions etc.
• Several tools support an executable specification of mappings and transforms (SKC, OntoMorph, Prompt, FCA-Merge, GLUE, Anchor-Prompt).
• Most techniques and tools do not deal with versioning.