Learn how to integrate schemas using ontological matching, schema-to-ontology mapping, and global view construction. Explore the benefits and challenges of this approach, as well as the automation possibilities. Discover the features of COMA schema matching system and the steps involved in ontological matching.
Composing Mappings between Schemas using a Reference Ontology Eduard Dragut, Ramon Lawrence Iowa Database and Emerging Applications (IDEA) Laboratory University of Iowa {eduard-dragut, ramon-lawrence}@uiowa.edu
Outline • Motivation • Integration Approach • Background • Architecture Overview • Ontological Matching • Composing Mappings • Global View Construction • Experimental Results • Future Work and Conclusions
Motivation • Many organizations have pre-existing ontologies that are not suitable as global views but are suitable as reference ontologies to aid integration. • Example: The National Cancer Institute (NCI) and National Institutes of Health (NIH) have the caBIG grid prototype, which standardizes terminology (EVS, caDSR) and data elements in the cancer domain. • Schema-to-ontology matching requires that integrators understand only their own schema instead of all schemas that they may want to integrate.
Integration Approach [Figure labels: Expression Database, NCBI Database, Schema matching (twice), Schema-to-ontology mapping (twice), Reference Ontology, Compose & Merge, Global View, User Queries] Composing Mappings between Schemas using a Reference Ontology - ODBASE’04 - Eduard Dragut, Ramon Lawrence
Background: Ontologies and Integration • Ontologies as the integrated, global view • Carnot project (Collet91) with Cyc ontology (Lenat90) • ONTOBROKER (Decker98), OBSERVER (Mena00) • Tools for semi-automatically merging ontologies • PROMPT (Noy00), Ontobuilder (Gal04) • Use ontologies as matching/integration aids • MOMIS (Beneventano03) using WordNet • Indirect (Xu03), CUPID (Madhavan01), COMA (Do02) • Matching ontologies (Doan02) • “Discovering” ontologies (Madhavan03) • Corpus-based matching
Background: Model Management • Model management as proposed by (Bernstein03) is intended to allow high-level schema operations. • Operators include: Invert, Compose, Match, Merge. • Warning: The semantics of the operators are not yet all fully defined, and some operators are not completely automatic. • Definitions: • A match is a semantic correspondence between schema elements. • A mapping between schema elements is an expression that relates the elements. • Note that most schema matching systems, such as COMA, produce matches, not mappings.
Architecture Overview • We assume the existence of a pre-existing reference ontology that has been “accepted” in a domain. • The ontology is NOT a global view and may not cover the information in all schemas. It cannot be edited. • Global view construction is a 3-step process: • 1) Independently match each schema to the ontology. • 2) Compose schema-to-ontology matches to produce schema-to-schema mappings. • 3) Merge the schema mappings to produce the global view. • The challenge is to automate this as much as possible.
Benefits of Approach • Even with manual integration there are several benefits to using a reference ontology: • 1) An integrator must only understand their schema and the ontology and not other schemas to be integrated. • 2) Most validation is performed once during schema-to-ontology matching and not for every schema integrated. • 3) Schema-to-ontology matchings can be re-used every time a new schema is integrated into the federation. • Automation can: • 1) Help construct schema-to-ontology matchings. • 2) Perform composition of mappings. • 3) Build a global view from the composed mappings.
Automation Challenges • There are several challenges in automating this process: • 1) Schema matching systems such as COMA are designed for simpler relational schemas. Ontologies must be mapped into a suitable format for use with COMA. • 2) Schema-to-ontology matching is less accurate due to more complicated ontological structure and because the ontology may not model the entire domain or may model it differently. • 3) Composing matchings often results in many false matches which must be handled. • 4) A method for merging schemas using model management primitive operators is required. • **Even with these operators, Merge is not fully automatic.
Background: COMA • COMA (Do02) is a schema matching system that can flexibly combine different match algorithms and re-use match results. • Match algorithms use names, paths, and schema properties in various ways. • The mapping format between two schemas R and S is a triple (r, s, v) where r in R, s in S, and v is the similarity value in [0..1] between elements r and s. • A schema in COMA is represented as a rooted directed acyclic graph. Schema elements are nodes which may be connected by links of different types.
Ontological Matching • The first step is to convert ontologies in OWL/DAML format into COMA’s graph representation format. • We wrote a program that used the JENA parser. • During the conversion: • 1) Explicitly converted each named relationship in the ontology into a node and several edges in the graph. • 2) Explicitly encoded attributes inherited over IS-A links, since COMA does not support IS-A. • After conversion, COMA automatically produces a schema-to-ontology match, because to COMA the task appears to be matching two relational schemas.
Converting Ontology to a Graph [Figure panels: Making IS-A Explicit; Converting Named Relationships] * Also create a single root POOntology as required by COMA.
Ontological Matching: Max versus noMax • One challenge is deciding what the schema-to-ontology match should look like. • Two choices: • 1) Max - For each schema element, keep the best match with the ontology (if any). • 2) noMax - For each schema element, keep all the matches that are above the cutoff threshold. • Since Max generates at most one match per schema element, it is probably the best choice in semi-automated settings. noMax generates many matches, which must be filtered out by the user or during composition.
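The two selection strategies can be sketched in a few lines of Python. This is only an illustration of the idea on this slide, not COMA's implementation: the `Match` type, the function names, the sample candidates, and the inclusive cutoff comparison are all assumptions made for the example.

```python
from typing import NamedTuple


class Match(NamedTuple):
    source: str        # schema element
    target: str        # ontology element
    similarity: float  # similarity value in [0, 1]


def no_max(matches, threshold):
    """noMax: keep every match at or above the cutoff (inclusive cutoff is an assumption)."""
    return [m for m in matches if m.similarity >= threshold]


def max_select(matches, threshold):
    """Max: keep only the best surviving match for each schema element, if any."""
    best = {}
    for m in no_max(matches, threshold):
        current = best.get(m.source)
        if current is None or m.similarity > current.similarity:
            best[m.source] = m
    return list(best.values())


# Hypothetical candidate matches between a schema and an ontology.
candidates = [
    Match("postalCode", "Zip", 0.8),
    Match("postalCode", "City", 0.55),
    Match("contactName", "Name", 0.6),
    Match("telephone", "Email", 0.2),
]

kept_all = no_max(candidates, 0.5)
kept_best = max_select(candidates, 0.5)
```

With these candidates, noMax keeps both matches for `postalCode`, while Max keeps only the stronger one, which is why noMax leaves more filtering for the user or for composition.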
Composing Mappings • Schema-to-ontology mappings must be composed to produce direct schema-to-schema mappings. • Since mappings carry no semantics, two objects are assumed to be identical if they map to the same ontological concept. Composition is performed transitively and is implemented using a natural join. • That is, if element r is similar to o and o is similar to s, then we assume that r is similar to s. • For example: • <postalCode,Zip,0.8> and <Zip, postCode,0.7> can be composed to yield <postalCode,postCode,0.75>. • The similarity values may be combined using various functions, although average is the most common.
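A minimal Python sketch of this composition step follows. It joins two lists of (element, element, similarity) triples on the shared ontology element and averages the similarity values, as in the postalCode example above. The function name and the pluggable `combine` parameter are assumptions for illustration, not part of COMA or of the authors' code.

```python
def compose(left, right, combine=lambda a, b: (a + b) / 2):
    """Join (r, o, v1) triples with (o, s, v2) triples on the shared element o.

    `combine` merges the two similarity values; averaging is the default here.
    """
    # Index the right-hand triples by their first element so the join is a lookup.
    by_shared = {}
    for o, s, v in right:
        by_shared.setdefault(o, []).append((s, v))

    return [
        (r, s, combine(v1, v2))
        for r, o, v1 in left
        for s, v2 in by_shared.get(o, [])
    ]


s1_to_o = [("postalCode", "Zip", 0.8)]
o_to_s2 = [("Zip", "postCode", 0.7)]
composed = compose(s1_to_o, o_to_s2)
```

Passing `combine=min` or `combine=lambda a, b: a * b` would give other common ways to combine the similarity values.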
Composition Example [Figure: schemas S1 and S2 are each shown matched to the ontology O, and Compose produces direct correspondences between S1 and S2. Element labels in the figure include Contact, Organization, Person, name, Name, contact, FirstName, LastName, CompanyName, Email, and Position.]
Global View Construction • One of the possible applications of constructing schema-to-schema mappings in this way is using them to build a global view. • We have given a script in the paper that uses model management operators to compose any number of schema-to-ontology mappings into a single global view for all sources. • Note that this algorithm is not perfect nor fully automatic as the mappings are not perfect and the Merge operator may require human intervention.
Experimental Setup • Matched the 5 sample order schemas used to evaluate COMA: CIDR, Excel, Noris, Paragon, and Apertum. Numbered these schemas 1, 2, 3, 4, and 5. • Created a reference ontology that models some of the domain (but not all of it) and is quite different from the schemas (it uses IS-A, for example). • Used the matchings specified with COMA as ground truth. • Evaluation metrics: • Precision = # of correct matches / # of suggested matches • Recall = # of correct matches returned / # of matches in the ground truth • Overall = Recall * (2 - 1 / Precision)
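The three metrics can be computed directly from a set of suggested matches and a ground-truth set. The sketch below is illustrative only: the function name and sample pairs are made up, and returning `None` for Overall when precision is zero is an assumption, since the formula is undefined there.

```python
def evaluate(suggested, ground_truth):
    """Return (precision, recall, overall) for a set of suggested matches."""
    suggested, ground_truth = set(suggested), set(ground_truth)
    correct = len(suggested & ground_truth)

    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    # Overall = Recall * (2 - 1 / Precision); undefined for precision == 0,
    # so None is returned in that case (an assumption of this sketch).
    overall = recall * (2 - 1 / precision) if precision > 0 else None
    return precision, recall, overall


# Hypothetical data: 4 true matches, 3 suggested, 2 of them correct.
truth = {("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")}
suggested = {("a", "A"), ("b", "B"), ("c", "X")}
p, r, o = evaluate(suggested, truth)
```

Here precision is 2/3, recall is 1/2, and Overall is 0.5 * (2 - 1.5) = 0.25. Note that Overall becomes negative whenever precision falls below 0.5.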
Experiment #1: Schema-to-Ontology Matching • Goal: Evaluate the accuracy of schema-to-ontology matching. • Method: • Automatically convert ontology into COMA format and match each schema with ontology. • Evaluation: • Measured the percent overlap of the schema and ontology. For many schemas, only 60% of their concepts were in the ontology. • Evaluated the precision, recall, and overall measures relative to the number of matches that could be found. • E.g. If overlap was 60% and recall was 50%, then only 30% of all schema elements were matched BUT of all the possible matches, 50% were found.
Experiment #1: Results * noMax is poor for schema 5 as Buyer incorrectly matched to ontology.
Experiment #2: Schema-to-Schema Mappings • Goal: Determine the accuracy of producing schema-to-schema mappings by composing schema-to-ontology matchings. • Method: • Used automatically generated schema-to-ontology matchings and composed them. Evaluated composition result against COMA answers for direct matching. • Evaluated noMax and Max techniques and manual mappings.
Experiment #2: Results (Overall) * 1 <-> 2 is poor because of Street mapping. * 4 <-> 5 is poor because of Buyer mapping.
Experiment #3: Improving Direct Matches • Goal: Determine if the accuracy of producing direct schema-to-schema mappings can be improved by re-using schema-to-ontology matches. • Method: • Generate schema-to-schema mappings by composing schema-to-ontology matchings and then use this as past matching information for COMA. • Allow COMA to perform direct match given this information. • Evaluated noMax and Max techniques and manual mappings.
Experiment #3: Results (Overall) * 1 <-> 2 is poor because of Street mapping.
Discussion and Conclusions • Major findings: • 1) Schema-to-ontology mappings can be constructed with good accuracy (70-80% precision, 60% recall). • 2) The composition of schema-to-ontology matchings produces similar results to direct matching with COMA. • 3) Max has higher precision than noMax but lower recall. Max is probably best when the user must filter incorrect matches, because it always saves work. • 4) It is valuable to re-use schema-to-ontology matchings (either automatically or manually constructed) to improve the accuracy of direct matchings. • Major conclusion: There is a benefit to building semi-automatic schema-to-ontology matchings for use in integration and global view construction.
Future Work and Challenges • The major challenge is that the mappings carry no semantics which often results in incorrect matches suggested after composition. • We are currently working on extending the mappings to capture semantics to avoid many of these cases. • The approach is not fully automatic (nor will it ever be). However, most manual work is in the schema-to-ontology matching stage. We need better algorithms and tools to support this matching. • Want to perform experimental evaluation on larger ontologies such as those from NCI. • Issue: Many ontologies are not in suitable form for intermediate mapping with schemas. (just taxonomies)
Extra Slides
Ontology Conversion Algorithm • 1) Each ontology concept (class) becomes a node in the graph. • 2) For each property (attribute) of a class, add a node to the graph and connect it to its class. • 3) Non-basetype properties (those with domain and range in ontology) are converted by: • 3a) Creating a node in the graph for the relationship. • 3b) Adding an edge from the class domain to this node. • 3c) Adding an edge from the new node to the range class. • Note: Do not currently support properties that have a domain or range that is union/intersection of concepts. • 4) IS-A expanded by graph traversal.
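The four steps above can be sketched in Python over a simplified in-memory ontology. This is not the authors' JENA-based program: the input shapes, the `Class.attribute` node naming, the default root name, the edges from the root to every class, and the restriction to single inheritance are all assumptions made for the example.

```python
def ontology_to_graph(classes, attributes, relationships, is_a, root="OntologyRoot"):
    """Convert a simplified ontology into (nodes, edges) of a rooted graph.

    classes:       list of class names
    attributes:    dict mapping class name -> list of base-type property names
    relationships: list of (name, domain_class, range_class) triples
    is_a:          dict mapping child class -> parent class (single inheritance)
    """
    nodes, edges = {root}, set()

    def all_attributes(cls):
        # Step 4: walk up the IS-A chain and collect inherited attributes.
        collected, seen = [], set()
        while cls is not None and cls not in seen:
            seen.add(cls)
            collected.extend(attributes.get(cls, []))
            cls = is_a.get(cls)
        return collected

    for cls in classes:
        nodes.add(cls)              # Step 1: one node per class
        edges.add((root, cls))      # assumption: root links to every class
        for attr in all_attributes(cls):
            attr_node = f"{cls}.{attr}"   # Step 2: one node per attribute
            nodes.add(attr_node)
            edges.add((cls, attr_node))

    for name, domain, range_ in relationships:
        nodes.add(name)             # Step 3a: node for the relationship
        edges.add((domain, name))   # Step 3b: domain class -> relationship
        edges.add((name, range_))   # Step 3c: relationship -> range class
    return nodes, edges


# Hypothetical ontology: Person IS-A Contact, Organization has a contact Person.
nodes, edges = ontology_to_graph(
    classes=["Contact", "Person", "Organization"],
    attributes={"Contact": ["Email"], "Person": ["FirstName", "LastName"],
                "Organization": ["CompanyName"]},
    relationships=[("contact", "Organization", "Person")],
    is_a={"Person": "Contact"},
)
```

In this example `Person` receives an explicit `Person.Email` node because `Email` is inherited from `Contact`, which is the effect of making IS-A explicit.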
Mapping Composition Challenges Composing N:1 match with 1:N match results in a cross-product Cannot handle these cases as mappings have no semantics.
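A small Python sketch makes the cross-product effect concrete. The element names are hypothetical, and similarity values are omitted to keep the join visible.

```python
def compose(left, right):
    """Join (r, o) pairs with (o, s) pairs on the shared ontology element o."""
    return [(r, s) for r, o in left for o2, s in right if o == o2]


# N:1 match: two S1 elements map to the single ontology concept Name.
s1_to_o = [("S1.FirstName", "Name"), ("S1.LastName", "Name")]
# 1:N match: the same concept maps to two S2 elements.
o_to_s2 = [("Name", "S2.FirstName"), ("Name", "S2.LastName")]

pairs = compose(s1_to_o, o_to_s2)

# Pairs whose local names differ are false matches introduced by the join.
false_pairs = [(r, s) for r, s in pairs if r.split(".")[1] != s.split(".")[1]]
```

The join yields all four combinations, two of which (FirstName with LastName and the reverse) are false matches. Nothing in the match triples distinguishes them from the correct pairs.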
Global View Construction Script
Computes Global View of N Source Schemas (with ontology mappings)
Operator GlobalView(ArraySchemas, ArrayMappings, O, n)
  // ArraySchemas stores the n schemas
  // ArrayMappings stores the n schema-to-ontology mappings
  If n <= 0 Then Return empty schema;
  If n == 1 Then Return ArraySchemas[0];
  S1 = ArraySchemas[0];
  S2 = ArraySchemas[1];
  map1 = ArrayMappings[0];
  map2 = ArrayMappings[1];
  // Argument order follows the GlobalView2(S1, S2, O, S1_O, S2_O) signature
  < S, map > = GlobalView2(S1, S2, O, map1, map2);
  For (i = 2; i <= n-1; i++)
    S1 = S;
    map1 = map;
    S2 = ArraySchemas[i];
    map2 = ArrayMappings[i];
    < S, map > = GlobalView2(S1, S2, O, map1, map2);
  end for;
  Return < S, map >;
Global View Construction Script (2)
Computes Global View of Two Source Schemas (with ontology mappings)
Operator GlobalView2(S1, S2, O, S1_O, S2_O)
  S1_S2 = S1_O * Invert(S2_O);
  < M, S1_M, S2_M > = Merge(S1, S2, S1_S2);
  M_O = Invert(S1_M) * S1_O + Invert(S2_M) * S2_O;
  Return < M, M_O >;
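The two scripts can be tried out on toy data with the Python sketch below. It treats a schema as a set of element names and a mapping as a set of pairs, with `*` as a relational join and `+` as set union. The `toy_merge` function is a hypothetical stand-in: it simply collapses each matched element of the second schema onto its partner in the first, whereas a real Merge operator may need human intervention. All names and sample data are assumptions.

```python
def invert(mapping):
    """Invert: swap the two sides of every pair."""
    return {(b, a) for a, b in mapping}


def compose(m1, m2):
    """Compose (the * in the script): join pairs on the shared middle element."""
    return {(a, c) for a, b in m1 for b2, c in m2 if b == b2}


def toy_merge(s1, s2, s1_s2):
    """Hypothetical Merge: matched s2 elements collapse onto their s1 partner."""
    s2_to_s1 = {}
    for a, b in sorted(s1_s2):
        s2_to_s1.setdefault(b, a)  # keep the first partner deterministically
    merged = set(s1) | {e for e in s2 if e not in s2_to_s1}
    s1_m = {(e, e) for e in s1}
    s2_m = {(e, s2_to_s1.get(e, e)) for e in s2}
    return merged, s1_m, s2_m


def global_view2(s1, s2, o, s1_o, s2_o):
    """Mirror of GlobalView2; o is unused here but kept to match the signature."""
    s1_s2 = compose(s1_o, invert(s2_o))
    m, s1_m, s2_m = toy_merge(s1, s2, s1_s2)
    m_o = compose(invert(s1_m), s1_o) | compose(invert(s2_m), s2_o)
    return m, m_o


def global_view(schemas, mappings, o):
    """Mirror of GlobalView: fold global_view2 over all source schemas."""
    if not schemas:
        return set(), set()
    view, view_o = set(schemas[0]), set(mappings[0])
    for schema, schema_o in zip(schemas[1:], mappings[1:]):
        view, view_o = global_view2(view, schema, o, view_o, schema_o)
    return view, view_o


s1 = {"S1.postalCode", "S1.city"}
s2 = {"S2.postCode", "S2.phone"}
s1_o = {("S1.postalCode", "Zip"), ("S1.city", "City")}
s2_o = {("S2.postCode", "Zip")}
view, view_o = global_view([s1, s2], [s1_o, s2_o], o={"Zip", "City"})
```

In this run `S2.postCode` is absorbed into `S1.postalCode` because both map to `Zip`, while the unmatched `S2.phone` is carried into the view unchanged.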
Sample Order Schema: Excel XML Schema
<?xml version="1.0"?>
<Schema name="PurchaseOrder.biz" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="PurchaseOrder" content="eltOnly">
    <element type="Header"/>
    <element type="Items"/>
    <element type="Footer"/>
    <element type="InvoiceTo"/>
    <element type="DeliverTo"/>
  </ElementType>
  <ElementType name="Items" content="eltOnly">
    <AttributeType name="itemCount" dt:type="int"></AttributeType>
    <attribute type="itemCount"/>
    <element type="Item" maxOccurs="*" minOccurs="1"/>
  </ElementType>
  <ElementType name="Item" content="empty">
    <AttributeType name="yourPartNumber" dt:type="string"></AttributeType>
    <AttributeType name="unitPrice" dt:type="number"></AttributeType>
    <AttributeType name="unitOfMeasure" dt:type="string"></AttributeType>
    <AttributeType name="salesValue" dt:type="number"></AttributeType>
    <AttributeType name="quantity" dt:type="number"></AttributeType>
    <AttributeType name="partNumber" dt:type="string"></AttributeType>
    <AttributeType name="partDescription" dt:type="string"></AttributeType>
    <AttributeType name="itemNumber" dt:type="int"></AttributeType>
Sample Order Schema: Excel XML Schema (2)
    <attribute type="itemNumber"/>
    <attribute type="yourPartNumber"/>
    <attribute type="partNumber"/>
    <attribute type="partDescription"/>
    <attribute type="quantity"/>
    <attribute type="unitOfMeasure"/>
    <attribute type="unitPrice"/>
    <attribute type="salesValue"/>
  </ElementType>
  <ElementType name="InvoiceTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Header" content="eltOnly">
    <AttributeType name="yourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="ourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="orderNum" dt:type="string"></AttributeType>
    <AttributeType name="orderDate" dt:type="date"></AttributeType>
    <attribute type="orderNum"/>
    <attribute type="orderDate"/>
    <attribute type="ourAccountCode"/>
    <attribute type="yourAccountCode"/>
    <element type="Contact"/>
  </ElementType>
Sample Order Schema: Excel XML Schema (3)
  <ElementType name="Footer" content="empty">
    <AttributeType name="totalValue" dt:type="number"></AttributeType>
    <attribute type="totalValue"/>
  </ElementType>
  <ElementType name="DeliverTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Contact" content="empty">
    <AttributeType name="telephone" dt:type="string"></AttributeType>
    <AttributeType name="e-mail" dt:type="string"></AttributeType>
    <AttributeType name="contactName" dt:type="string"></AttributeType>
    <AttributeType name="companyName" dt:type="string"></AttributeType>
    <attribute type="contactName"/>
    <attribute type="companyName"/>
    <attribute type="e-mail"/>
    <attribute type="telephone"/>
  </ElementType>
Sample Order Schema: Excel XML Schema (4)
  <ElementType name="Address" content="empty">
    <AttributeType name="street4" dt:type="string"></AttributeType>
    <AttributeType name="street3" dt:type="string"></AttributeType>
    <AttributeType name="street2" dt:type="string"></AttributeType>
    <AttributeType name="street1" dt:type="string"></AttributeType>
    <AttributeType name="stateProvince" dt:type="string"></AttributeType>
    <AttributeType name="postalCode" dt:type="string"></AttributeType>
    <AttributeType name="country" dt:type="string"></AttributeType>
    <AttributeType name="city" dt:type="string"></AttributeType>
    <attribute type="street1"/>
    <attribute type="street2"/>
    <attribute type="street3"/>
    <attribute type="street4"/>
    <attribute type="city"/>
    <attribute type="stateProvince"/>
    <attribute type="postalCode"/>
    <attribute type="country"/>
  </ElementType>
</Schema>