The MOMIS-STASIS approach for Ontology-Based Data Integration


Presentation Transcript


  1. The MOMIS-STASIS approach for Ontology-Based Data Integration D. Beneventano, M. Orsini, L. Po, S. Sorrentino DII, University of Modena and Reggio Emilia

  2. Outline • Introduction • Ontology Based Data Integration • MOMIS & STASIS • The goal of the MOMIS-STASIS approach • Semantic Link Generation • Global Schema Generation • The MOMIS-STASIS architecture • An Application Example • Future Work • Conclusion

  3. Ontology Based Data Integration • Data integration: to combine data residing at distributed heterogeneous sources • Integration System: wrapper/mediator architecture based on a Global Virtual Schema (Global Virtual View - GVV) and a set of data sources • The data sources contain the real data • The GVV provides a reconciled, integrated, and virtual view of the underlying sources • Mapping among sources and the GVV • Ontologies can be used in the integration task to describe the semantics of the information sources • Ontology-Based Data Integration: use of ontologies to effectively combine data coming from multiple heterogeneous sources

  4. MOMIS & STASIS • MOMIS (Mediator EnvirOnment for Multiple Information Sources) is a Data Integration System which performs information extraction and integration from both structured and semi-structured data sources • single ontology approach: the lexical ontology WordNet (WN) is used as a shared vocabulary for: • the specification of the semantics of data sources • the identification and association of semantically corresponding information concepts • STASIS is a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics • Ontology-driven Semantic Mapping: identification of mappings between concepts of different schemas based on the annotation of the schemas with respect to a set of ontologies (multiple ontology approach)

  5. The goal of the MOMIS-STASIS approach • To combine the MOMIS and STASIS frameworks to obtain an effective approach for Ontology-Based Data Integration • extension of the MOMIS system by using the Ontology-driven Semantic Mapping framework of STASIS: • enabling the MOMIS system to employ generic OWL ontologies, overcoming the limitation of using only the WordNet lexical ontology • developing a new method to compute semantic mappings among source schemas in the MOMIS system. • Macro-steps of the MOMIS-STASIS approach: • Semantic Link Generation (STASIS) • Global Schema Generation (MOMIS)

  6. Semantic Link Generation • Easy to use GUI allowing users • to identify semantic elements in an easy way • to create mappings by considering the meaning of elements rather than their syntactical structure • Distributed registry and repository network: • intelligent mapping suggestions by reusing mapping information from earlier semantic links • Ontology-driven Semantic Mapping definition • mappings between entities of different schemas based on annotations linking the entities with concepts of an ontology

  7. Semantic Link Generation • The Semantic Link Generation process is composed of 3 main steps: • 1- obtaining a neutral schema representation • 2- local source annotation • 3- semantic mapping discovery

  8. Semantic Link Generation • Step 1. Obtaining a neutral schema representation Local schemas are described by a unified data model called Logical Data Model (LDM). It allows the representation of the following semantic entities: classes (or concepts), relationships (or object properties), and attributes (or data-type properties); classes are organized in an is-a hierarchy • Step 2. Local Source Annotation • The semantics of the data is expressed by semantic correspondences between the schema and ontologies. Semantic entities need to be annotated with respect to one or more ontologies. • An annotation element is a tuple <SE, R, C> • SE is a semantic entity of the schema • C is a concept of the ontology • R is the semantic relationship between SE and C: • equivalence (AR EQUIV) • more general (AR SUP) • less general (AR SUB) • disjointness (AR DISJ)
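
The annotation element <SE, R, C> described above can be sketched as a small data structure. This is only an illustration: the names `Relation` and `Annotation`, the entity `S.EMPLOYEE`, and the concept `Person` are hypothetical and do not come from STASIS or from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Semantic relationship R between a schema entity and an ontology concept."""
    AR_EQUIV = "equivalence"
    AR_SUP = "more general"
    AR_SUB = "less general"
    AR_DISJ = "disjointness"


@dataclass(frozen=True)
class Annotation:
    """Annotation element <SE, R, C> (hypothetical representation)."""
    se: str       # semantic entity of the schema: class, relationship, or attribute
    r: Relation   # semantic relationship between SE and C
    c: str        # concept of the ontology


# Hypothetical example: the entity EMPLOYEE is less general than the concept Person.
ann = Annotation(se="S.EMPLOYEE", r=Relation.AR_SUB, c="Person")
```

Making the tuple immutable (`frozen=True`) lets annotations be stored in sets, which is convenient when the same entity is annotated against several ontologies.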

  9. Semantic Link Generation • Step 3. Semantic Mapping Discovery Based on the annotation made with respect to the ontologies and on the logic relationships identified between these aligned ontologies, reasoning can identify correspondences among the semantic entities. Semantic mappings between entities of two source schemas (called semantic links - SL): • equivalence (EQUIV) • more general (SUP) • less general (SUB) • disjointness (DISJ)
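
A minimal sketch of how a semantic link could be derived from annotations, under strong simplifying assumptions: a single toy ontology, and both entities annotated as equivalent (AR EQUIV) to their concepts. The ontology content (`IS_A`, `DISJOINT`) and the function names are hypothetical; the actual STASIS reasoning over aligned OWL ontologies is not shown in the slides.

```python
from typing import Optional

# Hypothetical toy ontology: child concept -> parent concept (is-a hierarchy).
IS_A = {
    "Employee": "Person",
    "Manager": "Employee",
    "Customer": "Person",
    "Person": "Agent",
}
# Hypothetical disjointness axioms between concepts.
DISJOINT = {frozenset({"Person", "Product"})}


def ancestors(concept: str) -> set:
    """Return all strict ancestors of a concept by walking up the is-a chain."""
    result = set()
    while concept in IS_A:
        concept = IS_A[concept]
        result.add(concept)
    return result


def semantic_link(c1: str, c2: str) -> Optional[str]:
    """Link between two entities annotated as equivalent to concepts c1 and c2."""
    if c1 == c2:
        return "EQUIV"
    if c1 in ancestors(c2):
        return "SUP"   # first entity is more general than the second
    if c2 in ancestors(c1):
        return "SUB"   # first entity is less general than the second
    # Disjointness declared on ancestors is inherited by their descendants.
    up1 = ancestors(c1) | {c1}
    up2 = ancestors(c2) | {c2}
    if any(frozenset({a, b}) in DISJOINT for a in up1 for b in up2):
        return "DISJ"
    return None        # no semantic link can be derived
```

For instance, `semantic_link("Person", "Manager")` yields `"SUP"`, while two sibling concepts such as `Customer` and `Employee` produce no link.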

  10. Global Schema Generation • Global-As-View (GAV) approach where each global class of the Global Schema is characterized in terms of a view over the local sources • INPUT: the Common Thesaurus of SLs generated by the STASIS framework • METHOD: clustering techniques • Given a set of data sources, MOMIS synthesizes in a semi-automatic way a Global Schema (Global Virtual View - GVV): • a global class G=(L,GA) is generated for each cluster C, where L are the local classes of the cluster C and GA are the global attributes of G • Union of the local attributes • Fusion of “similar attributes” (by using the Common Thesaurus) • a Mapping Table (MT) is generated for each global class, which contains the mappings that connect the global attributes with the local source attributes. MT is a table GA × L: an element MT[GA][L] represents the attributes of the local class L mapped into the global attribute GA.
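
The construction of a global class and its Mapping Table for one cluster can be sketched as follows. This is an illustrative sketch, not the MOMIS implementation: the cluster, the attribute names, the `equiv` pairs standing in for Common Thesaurus entries, and the function `build_global_class` are all hypothetical. Similar attributes are fused with a simple union-find.

```python
from collections import defaultdict

# Hypothetical cluster: local class -> its local attributes.
cluster = {
    "L1.ORDER": ["id", "date", "total"],
    "L2.PURCHASE": ["order_id", "date", "amount"],
}
# Hypothetical Common Thesaurus entries: pairs of "similar" local attributes.
equiv = [
    (("L1.ORDER", "id"), ("L2.PURCHASE", "order_id")),
    (("L1.ORDER", "date"), ("L2.PURCHASE", "date")),
    (("L1.ORDER", "total"), ("L2.PURCHASE", "amount")),
]


def build_global_class(cluster, equiv):
    """Return (L, MT): the local classes and the Mapping Table MT[GA][L]."""
    # Union of the local attributes: each one starts as its own group.
    parent = {(l, a): (l, a) for l, attrs in cluster.items() for a in attrs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Fusion of similar attributes: merge the groups of each thesaurus pair.
    for a, b in equiv:
        parent[find(b)] = find(a)

    groups = defaultdict(lambda: defaultdict(list))
    for local_class, attr in list(parent):
        groups[find((local_class, attr))][local_class].append(attr)

    # Name each global attribute after its group's root attribute,
    # falling back to the qualified name if that name is already taken.
    mt = {}
    for root, cols in groups.items():
        name = root[1] if root[1] not in mt else f"{root[0]}.{root[1]}"
        mt[name] = dict(cols)
    return list(cluster), mt


local_classes, mapping_table = build_global_class(cluster, equiv)
```

Here `mapping_table["total"]["L2.PURCHASE"]` holds `["amount"]`, i.e. the local attribute of `L2.PURCHASE` mapped into the global attribute `total`.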

  11. The MOMIS-STASIS Architecture

  12. An Application Example • As a simple example let us consider two relational local sources L1 and L2, where each schema contains a relation describing purchase orders. • We will describe step by step the application of the MOMIS-STASIS approach: • Step 1. Obtaining a neutral schema representation • Local sources L1 and L2 are translated into the neutral representation, i.e. they are represented in the LDM data model. L1.PURCHASE_ORDER, L1.BILLING_ADDRESS, L1.DELIVERY_ADDRESS are represented as semantic entities. • Step 2. Local Source Annotation • We consider the annotation of schemas and the derivation of mappings w.r.t. a single common ontology: the Purchase Order Ontology.

  13. An Application Example Some examples of simple annotations discovered by applying the automatic “name-based” technique: [annotations shown on the slide]. An example of a complex annotation is [annotation shown on the slide], which can be considered a designer refinement of the above simple annotations, stating that the address in the PURCHASE_ORDER table is the “address of the Shipping in a Purchase Order”.

  14. An Application Example • Step 3. Semantic Mapping Discovery From the previous annotations, the following semantic link, for example, is derived: [semantic link shown on the slide], whereas no semantic link between CUSTOMER_LOCATION and BILLING_ADDRESS is generated. • Step 4. Global Schema Generation • Given the set of semantic links described above and collected in the Common Thesaurus, the GVV is automatically generated, and the classes describing the same or semantically related concepts in different sources are identified and clustered in the same global class. Moreover, the Mapping Table is automatically generated.

  15. Future Work • An advantage of the proposed approach: an accurate schema annotation. Problem: this annotation is performed manually by the integration designer. • We propose a preliminary idea to overcome this problem which can be summed up in three steps: • 1- Annotation w.r.t. WordNet (WN): both the ontologies and the local sources are annotated, w.r.t. WN, by using Automatic Lexical Annotation techniques based on Word Sense Disambiguation • 2- WN semantic relationship discovery: starting from the previous annotations, a set of WN semantic relationships (synonym (equivalence), hypernym (more general) etc.) is discovered among semantic entities and ontology concepts • 3- Local source annotation for Ontology-Driven Semantic Mapping: starting from the set of WN semantic relationships, a corresponding set of annotations for Ontology-Driven Semantic Mapping can be discovered
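
The third step above, turning discovered WN relationships into annotations, can be sketched as a simple lookup. This is a sketch under assumptions: the slide only names the synonym and hypernym cases, so the hyponym entry (the inverse of hypernym) is added here as an assumption, and the names `WN_TO_ANNOTATION`, `annotations_from_wn`, and the example triples are hypothetical. No WordNet library is called.

```python
# WN relationship between a semantic entity and an ontology concept
# -> annotation relationship R of the tuple <SE, R, C>.
WN_TO_ANNOTATION = {
    "synonym": "AR_EQUIV",   # equivalence (from the slide)
    "hypernym": "AR_SUP",    # more general (from the slide)
    "hyponym": "AR_SUB",     # less general (assumed inverse of hypernym)
}


def annotations_from_wn(wn_relations):
    """Convert (entity, wn_relation, concept) triples into <SE, R, C> tuples.

    Triples whose WN relationship has no annotation counterpart are skipped.
    """
    return [
        (se, WN_TO_ANNOTATION[rel], concept)
        for se, rel, concept in wn_relations
        if rel in WN_TO_ANNOTATION
    ]


# Hypothetical WN relationships discovered in step 2.
discovered = [
    ("S.CLIENT", "synonym", "Customer"),
    ("S.PERSON", "hypernym", "Customer"),
    ("S.CLIENT", "meronym", "Company"),   # no counterpart: ignored
]
annotations = annotations_from_wn(discovered)
```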

  16. Future Work

  17. Conclusions • We described an early effort to obtain an effective Global Schema Generation approach for Ontology-Based Data Integration, by combining the techniques provided by the MOMIS and STASIS frameworks • Extension of the MOMIS system to perform Ontology-driven Semantic Mapping discovery: the annotation of data source elements w.r.t. generic ontologies (expressed in OWL) • Extension of the MOMIS system to overcome the limitation of using only the lexical ontology WN by introducing a multiple ontology approach in place of the previous single ontology approach • Even if this work needs to be further investigated, it represents a fundamental starting point towards a fully automatic Ontology-Based Data Integration System
