280 likes | 396 Views
Resolving Schematic Discrepancy in the Integration of Entity-Relationship Schemas. Qi He Tok Wang Ling Dept. of Computer Science School of Computing National Univ. of Singapore. Outline. Schema integration – background Schematic discrepancy
E N D
Resolving Schematic Discrepancy in the Integration of Entity-Relationship Schemas Qi He Tok Wang Ling Dept. of Computer Science School of Computing National Univ. of Singapore
Outline • Schema integration – background • Schematic discrepancy • Representation of meta information in ER schemas • Resolution of schematic discrepancy in schema integration • Related work • Conclusion
Schema Integration • In DB integration, produce an integrated view which provides a unified access to heterogeneous data in source schemas. • In DB design, produce a global schema of a proposed DB by integrating user views in DB design.
Challenges in schema integration • Many types of conflicts among different source schemas need to be resolved in schema integration: • Naming conflicts • Domain mismatch • Structural conflicts • Cardinality conflicts • Local constraints vs global constraints (e.g. local vs global functional dependencies) • Schematic discrepancy …
Schematic Discrepancy • Schematic discrepancyoccurs when a metadata in one database corresponds to attribute values in the other. • An example (next page) • monthsandsupplier numbers (i.e., S1, …, Sn) are modeled differently as attributevaluesorschemalabels (in general, metadata which will be introduced later)in databases DB1, DB2, and DB3.
Motivation Example price is an attribute of the ternary relationship type PMS PM is a relationship type between product and month
Contexts of schema constructs • Conceptual modeling is always done within a particular context which is explicitly represented as a set of meta attributeswith values (called metadata). • Meta attributes with values specify the conditions satisfied by the instances of a schema construct (i.e., entity type, relationship type, or attribute).
Ontology • A representational vocabulary for a shared domain of discourse which includes the definitions of entity types, relationship types, and attributes. • We use an ontology to describe the meta information of the ER schemas of the supply example: • Entity types: product, supplier, month • Attributesof entity types: p#, pname, s#, month • Relationship types: PMS (a ternary supply relationship type among product, month and supplier) PM(a binary relationship type between product and month)PMis a projectionof PMS. • Attributes of relationship types:price (an attribute of PMS)
Example of Context • In DB2, the entity type JAN_PROD is represented as: JAN_PROD = PM [month = ‘jan’] where PMand monthare resp. a relationship type and an entity type from the ontology. It means that JAN_PROD is derived from the product-month binary relationship type (i.e. PM) when the month value is ‘jan’. monthis a meta-attribute and jan the metadata of JAN_PROD.
Inheritance of Context • Context could be specified at 4 levels of • Databases • Entity types • Relationship types • Attributes • The context of a higher level schema construct could be inherited by a lower level schema construct. The inheritance hierarchy of contexts is: relationship type attribute of relationship type database entity type attribute of entity type
Example of context inheritance • In DB2, the attribute S1_PRICEof the entity type JAN_PROD is represented as: S1_PRICE=price [supplier=’s1’, inherit ALL] S1_PRICE inherits ‘all’, i.e. the contextmonth=’jan’, from the entity type JAN_PROD. The representation means that each value of S1_PRICE of the entity type JAN_PROD is a price of a product supplied by supplier s1 in the month of jan.
Resolution of schematic discrepancy in the integration of ER schemas
Basic Idea: Remove the contexts of schema constructs by transformingmeta-attributes into entity types. • Only meta-attributes causing schematic discrepancy need to be transformed. • Schema transformation should keep the semantics (information and constraints) of source schemas.
Resolve schematic discrepancy for entity types, relationship types, attributes of entity types and attributes of relationship types in order (the order conforms to the hierarchical order of context inheritance). • The context at database level is handled in the entity types which inherit it.
An example • Transforming DB2 into DB1 in 2 steps • Step 1: Resolve discrepancies for the entity types JAN_PROD, …, DEC_PROD • Step 1.1: Transform meta-attributes into entity types • Step 1.2: Merge equivalent entity types, relationship types and attributes • Step 2: resolve discrepancies for the attributes S1_PRICE, …, SN_PRICE
PM is a relationship type between product and month • Step 1.1: Transform the meta-attribute month of the entity type JAN_PROD (the other entity types are similar): • Construct an entity type MONTH to model the meta info • JAN_PROD becomes PROD after removing the context • Construct a relationship type PM to relate PROD and MONTH • Attributes S1_PRICE, …, SN_PRICE are moved to PM, as they inherit the context (i.e., the month) of the entity type JAN_PROD.
Step 1.2: Merge the equivalent entity types, relationship types and attributes which refer to the same ontology names. Note the domains of the MONTH attributes are united.
An example (cont.) • Transforming DB2 into DB1 in 2 steps • Step 1: Resolve discrepancies for the entity types JAN_PROD, …, DEC_PROD • Step 2: Resolve discrepancies for the attributes S1_PRICE, …, SN_PRICE • Step 2.1: Transform meta-attributes into entity types. • Step 2.2: Merge equivalent entity types, relationship types and attributes. • Step 2.3: Remove redundant relationship types.
price is an attribute of the relationship type PMS • Step 2.1: Transform the meta-attribute supplier of the attribute S1_PRICE (the other attributes are similar): • Construct an entity type SUPPLIER to model the meta information. • Construct a relationship type PMS to relate PROD, MONTH and SUPPLIER. • S1_PRICE becomes PRICE after removing the context, and is moved to PMS.
Step 2.2: Merge the equivalent entity types, relationship types and attributes. The domains of the S# attributes are united.
Step 2.3: Remove the redundant relationship type PM that is a projection of PMS.
Semantic preservation • Our solution to schematic discrepancy preserves the semantics of source schemas in schema transformation: • Information preservation. The instance of a schema can be losslessly converted into the instance of another schema, and conversely. • Constraint preservation. Cardinality constraints of ER schemas can be preserved in schema transformation, but in different forms in the source and transformed schemas (an example is given in the next page).
Constraint Preservation (E.g.) • Functional dependency (FD) is preserved in the transformation from DB2 to DB1. • Suppose in each entity typeJAN_PROD, …, DEC_PROD of DB2, the FD holds: P# {S1_PRICE, …, SN_PRICE} • In DB1, the FD is preserved, but in a different form: {P#, S#, MONTH} PRICE • In [3], we gave inference rules to derive FDs in schema transformation. [3] Qi He and Tok Wang Ling: Extending and inferring functional dependency in schema transformation. CIKM, 2004.
Related work • The definition of context as a set of meta-attributes with values is originally adopted in [2, 9]. • They defined context at the attribute level only. • We consider contexts at the levels of database, entity types and attributes, as well as the inheritance of context. [2] C. H. Goh, S. Bressan, S. Madnick, and M. Siegel: Context interchange: new features and formalisms for the intelligent integration of information. TOIS, 1999 [9] E. Sciore, M. Siegel, A. Rosenthal: Using semantic values to facilitate interoperability among heterogeneous information systems, TODS, 1994
Related work • Existing work in schema integration focused on the resolution of structural conflicts [1, 7] and constraint conflicts [6, 8]. • Our solution to schematic discrepancy complements those works. • The resolution of schematic discrepancy is followed by the resolution of other conflicts. [1] C. Batini, M. Lenzerini: A methodology for data schema integration in the Entity-Relationship model. IEEE Trans. on Software Engineering, 10(6), 1984 [6] Mong Li Lee, Tok Wang Ling: Resolving constraint conflicts in the integration of entity-relationship schemas. ER, 1997 [7] Mong Li Lee, Tok Wang Ling: A methodology for structural conflicts resolution in the integration of entity-relationship schemas. Knowledge and Information Sys., 5, 2003 [8] M. P. Reddy, B.E.Prasad, Amar Gupta: Formulating global integrity constraints during derivation of global schema. Data & Knowledge Engineering, 16, 1995
Related work • Schematic discrepancy in relational model is solved in some multidatabase languages [4, 5]. • They solved a special problem in schematic discrepancy: they transform relation names or attribute names to attribute values, or converse. • They did not consider the constraint issue in schema transformation. • Our work solves a general problem, and preserves cardinality constraints of ER schemas in the schema transformation. [4] R. Krishnamurthy, W. Litwin, W. Kent: Language features for interoperability of databases with schematic discrepancies. SIGMOD, 1991 [5] L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL—an extension to SQL for multidatabase interoperability. TODS, 2001
Conclusion • ER model supports cardinality constraints, which facilitates the derivation of constraints in schema transformation and integration. • Context is used to explicitly represent meta information of entity types, relationship types and attributes in ER schemas. • Schematic discrepancy is resolved by removing context. • The solution to schematic discrepancy preserves information and constraints.
Reference [1] C. Batini, M. Lenzerini: A methodology for data schema integration in the Entity-Relationship model. IEEE Trans. on Software Engineering, 10(6), 1984 [2] C. H. Goh, S. Bressan, S. Madnick, and M. Siegel: Context interchange: new features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems, 17(3), 1999, pp 270-293 [3] Qi He and Tok Wang Ling: Extending and inferring functional dependency in schema transformation. CIKM, 2004. [4] R. Krishnamurthy, W. Litwin, W. Kent: Language features for interoperability of databases with schematic discrepancies. SIGMOD, 1991, pp 40-49 [5] L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL—an extension to SQL for multidatabase interoperability. TODS, 2001, pp 476-519 [6] Mong Li Lee, Tok Wang Ling: Resolving constraint conflicts in the integration of entity-relationship schemas. ER, 1997, pp 394-407 [7] Mong Li Lee, Tok Wang Ling: A methodology for structural conflicts resolution in the integration of entity-relationship schemas. Knowledge and Information Sys., 5, 2003, pp 225-247 [8] M. P. Reddy, B.E.Prasad, Amar Gupta: Formulating global integrity constraints during derivation of global schema. Data & Knowledge Engineering, 16, 1995, pp 241-268 [9] E. Sciore, M. Siegel, A. Rosenthal: Using semantic values to facilitate interoperability among heterogeneous information systems, TODS, 19(2), 1994, pp 254-290