270 likes | 404 Views
AutoMed: Automatic generation of Mediator tools for heterogeneous database integration. Alex Poulovassilis (Birkbeck College) Joint project with Peter McBrien (Imperial College) EPSRC Grants GR/N38107, GR/N35915. Integrated Schema. Schema. Schema. Schema. Background.
E N D
AutoMed: Automatic generation of Mediator tools for heterogeneous database integration Alex Poulovassilis (Birkbeck College) Joint project with Peter McBrien (Imperial College) EPSRC Grants GR/N38107, GR/N35915 DIMNet Workshop 7 & 8/10/2002
Integrated Schema Schema Schema Schema DIMNet Workshop 7 & 8/10/2002
Background • In earlier work (ER’97, IS’98, DKE’98) we developed a new framework to support transformation and integration of heterogeneous database schemas. • Our framework consisted of: • a new notion of schema equivalence • a set of primitive schema transformations which can be composed to define unconditional or conditional equivalences between schemas DIMNet Workshop 7 & 8/10/2002
Background • We represent the modelling constructs of higher-level data models (e.g. relational, object-oriented, semi-structured, XML) in terms of a hypergraph data model (HDM) • The HDM common data model provides a unifying semantics for such higher-level modelling constructs DIMNet Workshop 7 & 8/10/2002
Background • Our schema transformations allow constructs from different modelling languages to be mixed within the same intermediate schema (CAiSE’99) • Our schema transformations are automatically reversible, setting up a two-way transformation pathway between pairs of schemas: DIMNet Workshop 7 & 8/10/2002
addClass Series [p|(p,S)category] addClass Doc [p|(p,D)category] addClass Film [p|(p,F)category] addClass Prog [p|(p,c)category] DIMNet Workshop 7 & 8/10/2002
addSubClass Film Prog addSubClass Doc Prog addSubClass Series Prog addClass Series [p|(p,S)category] addClass Doc [p|(p,D)category] addClass Film [p|(p,F)category] addClass Prog [p|(p,c)category] DIMNet Workshop 7 & 8/10/2002
addSubClass Film Prog addSubClass Doc Prog addSubClass Series Prog addClass Series [p|(p,S)category] addClass Doc [p|(p,D)category] addClass Film [p|(p,F)category] addClass Prog [p|(p,c)category] delRel category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] DIMNet Workshop 7 & 8/10/2002
delSubClass Film Prog delSubClass Doc Prog delSubClass Series Prog delClass Series [p|(p,S)category] delClass Doc [p|(p,D)category] delClass Film [p|(p,F)category] delClass Prog [p|(p,c)category] addRel category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] DIMNet Workshop 7 & 8/10/2002
addConstraint subset Film Prog addConstraint subset Doc Prog addConstraint subset Series Prog addNode Series [p|(p,S)category] addNode Doc [p|(p,D)category] addNode Film [p|(p,F)category] addNode Prog [p|(p,c)category] delEdge category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] delNode Programme Prog delNode Category [F,D,S] DIMNet Workshop 7 & 8/10/2002
delConstraint subset Film Prog delConstraint subset Doc Prog delConstraint subset Series Prog delNode Series [p|(p,S)category] delNode Doc [p|(p,D)category] delNode Film [p|(p,F)category] delNode Prog [p|(p,c)category] addEdge category [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] addNode Programme Prog addNode Category [F,D,S] DIMNet Workshop 7 & 8/10/2002
Query and Data Translation • These pathways can thus be used to automatically translate data and queries between schemas (ER’99) • From a pathway T:S –> S’ we: • compose the queries in the add steps to derive a definition of each construct in S’ as a view over S, and • compose the queries in the del steps to derive a definition of each construct in S as a view over S’ DIMNet Workshop 7 & 8/10/2002
Query and Data Translation • Thus Prog = [p | (p,c)category] Film = [p|(p,F)category] Doc = [p|(p,D)category] Series = [p|(p,S)category] category = [(p,F)|pFilm] U [(p,D)|pDoc] U [(p,S)|pSeries] • These view definitions can then be used to automatically translate data and queries between S and S’ DIMNet Workshop 7 & 8/10/2002
Both-As-View integration • Our schema transformation pathways capture at least the information available from global-as-view(GAV) or local-as-view (LAV) • We discuss this in a forthcoming paper (ICDE’03) and term our integration approach both-as-view (BAV) • In particular, we discuss how • GAV and LAV view definitions can be derived from a BAV specification • a BAV specification can be partially derived from a set of GAV or LAV view definitions DIMNet Workshop 7 & 8/10/2002
Schema Evolution • Unlike GAV and LAV, our framework readily supports the evolution of both local and global schemas (CAiSE’02, ICDE’03) • The first step is to define the evolution of the global or local schema as a schema transformation pathway from the old to the new schema • There is then a systematic way of evolving, as opposed to re-generating, the transformation pathways – and perhaps the global schema in the case of a local schema evolution DIMNet Workshop 7 & 8/10/2002
Schema Evolution • In particular (see CAiSE’02 and ICDE’03 for details): • if the evolved schema is semantically equivalent to the original schema, then the transformation network can be repaired automatically • if the evolved schema is a contraction of the original schema, the transformation network can again be repaired automatically • if the evolved schema is an extension of the original schema, then domain knowledge may be required (but again the network is evolved rather than regenerated) DIMNet Workshop 7 & 8/10/2002
The AutoMed Project • The aims of the AutoMed project are to investigate: • how our theoretical framework can be practically applied real data integration problems • how much of a mediator’s global query processing functionality can be automatically generated from our transformation pathways • evolutionary and heuristic techniques for schema improvement and global query optimisation DIMNet Workshop 7 & 8/10/2002
AutoMed Architecture Schema and Transformation Repository Schema Transformation and Integration Tool Global Query Processor Global Query Optimiser Model Definitions Repository Model Definition Tool Schema Evolution Tool DIMNet Workshop 7 & 8/10/2002
Query Processing and Optimisation • We are handling query language heterogeneity by translation into/from a functionalintermediate query language – IQL; Edgar Jasper (BNCOD’02 poster, BNCOD’02 summer school paper) • A query Q expressed in a high-level query language on a global schema S is first translated into IQL • GAV view definitions are derived from the transformation pathways from the local schemas to S, and are used to reformulate the query into an IQL query over the local schema constructs • A LAV query processing approach would also be possible DIMNet Workshop 7 & 8/10/2002
Query Processing and Optimisation • Query optimisation and query evaluation then occur • Specific issues for query optimisation in AutoMed include: • optimising the view definitions derived from the transformation pathways, and • handling heterogeneous modelling constructs appearing within these view definitions • For query evaluation, wrappers will undertake translation of IQL sub-queries into the local query language, and translation of results back into the IQL type system. Further post-processing is possible. DIMNet Workshop 7 & 8/10/2002
XML Data Sources • As well as integration of structured data sources, we have done some preliminary work on translating and integrating XML data (CAiSE’01) • We have defined a representation of XML in terms of the nodes, edges and constraints of the HDM • We capture the ordering of XML elements by an order node and a hyperedge to it from the edge representing the parent-child relationship DIMNet Workshop 7 & 8/10/2002
Translating XML into HDM <customer name=“Jones”> <account number=“A14”/> <account number=“B37”/> </customer> <customer name=“Smith”> <account number=“C514”/> <account number=“D438”/> </customer> root order customer name order account number DIMNet Workshop 7 & 8/10/2002
XML Data Sources • We have also defined a set of primitive transformations on XML (in terms of the underlying transformations on the equivalent HDM representation) • XML documents are then translated into a simple ER representation, which allows them to be integrated with each other and with other structured data sources • The above work has been implemented by Tanvir Faqueer • He is now looking at automatic or semi-automatic transformation and integration of the ER models arising from XML documents DIMNet Workshop 7 & 8/10/2002
Unstructured Text Sources • We are also working on extracting structure from unstructured text sources– Dean Williams • The aim here is to integrate information extracted from unstructured text with structured or semi-structured information available from other sources • We are using existing IE technology (the GATE tool from Sheffield) for text annotation. Natural language and domain ontologies will be used to extend these annotations • The extracted information will be matched with existing information in order to derive new facts and perhaps new global schema constructs DIMNet Workshop 7 & 8/10/2002
Materialised integration • Finally, as well as virtual integration of data sources, we are also investigating using the AutoMed framework for materialised integrationi.e. a data warehousing approach • In particular, we are looking at incremental view maintenance and data lineage tracing using the AutoMed schema transformation pathways – Hao Fan DIMNet Workshop 7 & 8/10/2002