160 likes | 344 Views
Automatic Data. Ramon Lawrence University of Manitoba umlawren@cs.umanitoba.ca. base Integration. Ken Barker University of Calgary barker@cpsc.ucalgary.ca. Summary. Database integration allows systems within a company or on the WWW to interoperate.
E N D
Automatic Data Ramon Lawrence University of Manitoba umlawren@cs.umanitoba.ca
base Integration Ken Barker University of Calgary barker@cpsc.ucalgary.ca
Summary • Database integration allows systems within a company or on the WWW to interoperate. • Current integration solutions require manual coding which is costly and time-consuming. • Our integration architecture uses XML and a standard dictionary to automate integration. • Database systems are integrated using 3 processes: capture, integration, and query. • The architecture is implemented in a software package called Unity. • Unity is capable of automatic database integration and querying using ODBC. • Applications include data warehouse design, web site integration, and systems integration.
Integration Architecture Integrated Context View X-Spec Editor Standard Dictionary Integration Algorithm Query Processor and ODBC Manager Client Client • Architecture Components: • 1) Integrated Context View • user’s view of integration • 2) X-Spec Editor • stores schema & metadata • uses XML • 3) Standard Dictionary • terms to express semantics • 4) Integration Algorithm • combines X-Specs into integrated context view • 5) Query Processor • accepts query on view • determines data source mappings and joins • executes queries and formats results Multidatabase Layer Subtransactions X-Spec X-Spec Database Database Local Transactions
Integration Processes • The integration architecture consists of three separate processes: • Capture process: independently extracts database schema information and metadata into a XML document called a X-Spec. • Integration process: combines X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view. • Query process: allows the user to formulate queries on the integrated view that are mapped by the query processor to structural queries (SQL) and the results are integrated and formatted.
Architecture Components • The architecture consists of four components: • A standard dictionary (SD) to capture data semantics • SD terms are used to build semantic names describing semantics of schema elements. • X-Specs for storing data semantics. • relational database info. stored and transmitted using XML • stores semantic names to describe schema elements • Integration Algorithm • Identical concepts in different databases are identified by similar semantic names. • Produces an integrated view of all database concepts. • Query Processor • allows the user to formulate queries on the view • translates from semantic names in integrated view to SQL queries and integrates and formats results • involves determining correct field and table mappings • and discovery of join conditions and join paths
Architecture Discussion • The architecture automatically integrates relational schemas into a multidatabase • Desirable properties: • individual mappings - information sources integrated one-at-a-time and independently • integrated view constructed for query transparency - user queries system by semantics instead of structure • handles schema conflicts - including semantic, structural, and naming conflicts • automated integration - integrated view constructed efficiently and automatically • no wrapper or mediator software is required • transparent querying - users issue semantic queries which are translated to SQL by the query processor • no changes are performed on underlying databases
Conclusions • Automatic database integration is possible by using a standard term dictionary and defining semantic names for schema elements. • Integration of data sources has applications to the WWW and construction of data warehouses. • Users are able to transparently query integrated systems by concept instead of structure. • Future Work: • Test the system in large industrial projects. • Improve the Unity implementation and develop an integration component for a web browser. • Increase the efficiency of the query processor. • Allow distributed updates and global updates on all databases.
Background & Motivation
Why is Integration Required? • There are many integration environments: • operational systems within an organization • system integration during company merger • data warehouses, Intranets, and the WWW • Users require information from many data sources which often do not work together. • Companies require a global view of their entire operations which may be present in numerous operational databases for different departments and distributed geographically. • Business-to-Business (B-to-B) and Business-to-Consumer (B-to-C) E-commerce demand integration of web databases with production systems for efficiency and productivity.
What is Integration? • Database integration requires conflict resolution during schema integration. • A schema is a description of the data organization and format in a database. • Schema integration combines database schemas into a unified view of the data. • Constructing an integrated view is difficult because databases will store different types of data, in varying formats, with different meanings, and reference it using different names. Subsequently, the construction of the integrated view must handle the different mechanisms for storing data (structural conflicts), for referencing data (naming conflicts), and for attributing meaning to the data (semantic conflicts). Data is integrated at both the schema level (the description of the data) and the data level (individual data instances).
What is the Current Solution? • SQL and multidatabase query languages: • querying is difficult for numerous databases • Research systems: • do not use standardization so cannot be automated • Industrial standards: • XML, BizTalk, E-commerce portals • apply to limited domains/industries • require standard structures and database changes • Current techniques for database integration have some of these problems: • require integrator to understand all databases • integration process is manual • do not hide system complexity from the user • force changes on the existing database systems
Unity Software
Unity Overview • Unity is a software package that implements the integration architecture with a GUI. • Developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC). • Unity allows the user to: • construct and modify standard dictionaries • build X-Specs to describe data sources • integrate X-Specs into an integrated view • transparently query integrated systems using ODBC and automatically generate SQL transactions • A screen-shot of Unity editing a standard dictionary is below. • Unity is available for demonstration and distribution.
References • Publications: • Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, January 2000. • Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000. • Integrating Relational Database Schemas using a Standardized Dictionary, To appear in SAC’2001 - ACM Symposium on Applied Computing, March, 2001. • Sponsors: • NSERC, TRLabs • Further Information: • http://www.cs.umanitoba.ca/~umlawren/