370 likes | 505 Views
Multidatabase and Distributed Manipulations Part 1. Witold Litwin. Plan. Introduction Technical problems Origin of the concept Approach : Centralized DB (ANSI-SPARC) Approach : DDB (top-down) Approach : Global Schema (bottom-up) Reference architectures
E N D
Multidatabase and Distributed ManipulationsPart 1 Witold Litwin
Plan • Introduction • Technical problems • Origin of the concept • Approach : Centralized DB (ANSI-SPARC) • Approach : DDB (top-down) • Approach : Global Schema (bottom-up) • Reference architectures • Multidatabase architecture • Federated architecture • Autonomy, semantic heterogeneity , common model
Multidatabase Model • Single database model ANSI-SPARC : • The real universe is modeled as one DB • The real universe is modeled as multiple dbs • Autonomous • Semantically heterogeneous • Manipulated through a multibase language • "Multidatabase interoperability". Litwin, W. Abdellatif, A. Multidatabase systems: An advanced solution for global information sharing. Hurson, A., R., Bright, M., W., Pakzad, S., H., (Ed.). IEEE press, 1993
Multidatabase Model Autres BDs sur Internet Rest. Cours & étudiants Mes-amis Bibliothèque Privé Paris 9 Employés Folio Cine Teletel
Problems • Reference architecture • Semantic heterogeneity in presence of local autonomy • Common data model • Fonctions of MDB langage • Transactions • Protocols & standards • Performance
Reference Architectures • Multidatabase architecture • Generalization of the ANSI-SPARC architecture • Federated dbs architecture • Generalization of the federated DB architecture • Others
ANSI-SPARC Architecture Centralized Integrated BD (1960-70) ES - External Schema CS - Conceptual Schema PS - Physical or Internal Schema
Distributed DB • Origin of the concept (years 1970) • WAN development ( 20 kb/s) • Overload of centralized dbs
Ddb • Idea : distribution of functions other than local communication ("top-down" approach) • Which ones ? • Distr. Execution (OS) • File access • The DB • Then, what data model for CS ? • Network (codasyl) ? • Relational
Relation Fragmentation Hotels (H#, Ville, Cat, #Chambres) 1 Fragment (H#, Cat, #Chambres) (H#, Ville)
Problems • GS scalability • GS utility for a local user • Query performance (bad case) • Data migration from ldbs • IMS, IDMS, Socrate • Local applications
Problemes With the "Bottom-up" Approach ES ES • Creation of GS • Semantic heterogeneity • Time for the integration / local restructuring autonomy • In other words: scalability of GS • Updates • Performance • Heterogeneos views GS CS CS CS PS PS PS GS Approach ("bottom-up")
MDB Architecture Absence of (GS) ES ES • User may have data in multiple dabs ANSI-SPARC compatibles • One may face multiple css • In general, it will be impossible to create a GS GS CS CS CS PS PS PS GS Approach ("bottom-up")
MBD Architecture Concept of the MDB Language • Language for definition and manipulationof collections of dbs (multidatabases) • Definition of MDB ESs • Perhaps GSs • Definition of MDB dependancies • Semantics, integrity, security, manipulation... • Formulation of MDB queries (explicitly) • Referencing DB names • With MDB joins... • Find in DB michelin and in DB gaumont all restaurants '**' and cinemas at the same street
MBD Architecture Multibases • A multidatabase (MDB) is a collection of dbs with MDB language • E.G. MSQL • A collection of dbs without MDB langage is not an MDB, but just a collection of dbs • As a collection of flat files (tables) without a DB language , SQL for example, is not a DB
Potential Multibases Other DBs sur Internet Rest. Cours & étudiants Mes-amis Bibliothèque Privé Paris 9 Employés Folio Cine Teletel
MDB Architecture Concept of Internal Logical Sublayer • The legacy data models can be heterogeneous • Different SQL dialects • Relational, hierarchical, network • OO and object-relational • It is preferable to have one model at MDB layer • One needs a sub-layer for translations • Also local DBA may wish not to show some of the data at MDB layer • Solution: ILS - internal logical schema • Unknown of ANSI-SPARC • Called also gateway
Multibase Architecture Result of All This : (W. Litwin & Al, Années 1980) Req. MDB ES multibase
Federated Architecture (Hambiger & Mcleod, Années 1980) • Every DB should be autonomous • In general, there will be no GS • Global integration is against the autonomy • The dbs used in common should form a federation of autonomous dbs • Every DB in a federation should be provided with three schemes: • ES: export schema • IS: import schema • PS: private schema : for all the private data, of ES and of IS included • There should be some federation dictionary (FD)
Comparison • MDB architecture focuses on the concept of MDB language • Federated architecture focuses on the concept of autonomy • No MDB language • But there is the notion of autonomy also in MDB arch. • MDB architecture is + decentralized • No equivalent of FD • Several DSs • Both architectures are popular
DB Autonomy(localautonomy) • Capability to control local DB data by the DBA • Naming • Value Types • Data Structures • Physical Structures • Query Execution • Security • Priority to local queries
Multidatabase Autonomy • Capability to controle multiple DBs by a DBA • Same aspects as for the local autonomy • Naming... • May create a conflict with a DB autonomy • Priority to local autonomy B1 B2 B3
Semantic Heterogeneity • Differences in representations of the same real properties • Names André Andrew • Value Types • Representations • units of measure cm/s pied/h • precision 1 g 1 Kg • Data Structures One table in 2 NF several tables in 3 NF
Solutions (partial) • Schemas + descriptive • Protocols + descriptive • Data Dictionaries • Thesaurus • Automatic Representation Conversions • Automatic Unit Conversion • Implicit Joints • Higher level models and manip. languages • IDL (Krishnamourthy, Litwin, Kent, ACM-Sigmod 92
Common Models • Ext. Relational • EDA-SQL • MSQL (research) • ODBC Microsoft SQL • Object-Relational • SQL 3 • CCS language for inf. retr. DBs • Numerous gateways towards SQL • IMS SQL • Codasyl SQL
UniSQL/M UniSQL/M IMS UniSQL Sybase Oracle
Other gateways UniSQL/M IMS UniSQL Sybase Oracle
Yet other gateways UniSQL/M IMS UniSQL Sybase Oracle EDA-SQL
Conclusion • Modern DBMSs are in general MDBMSs • UniSQL/M, Oracle, Sybase, MsAccess, SQL Server, InterBase, DB2... • MDB access requires new functions for the management of • autonomy • semantic heterogeneity • physical distribution of data
Conclusion • technical solutions are basd on : • new reference architectures • multidatabase architecture • federated architecture • common data models • relational and object-relational • Gateways in rapid development • Any DBMS towards any other using ODBC, JDBC… • Relational or XML wrappers
Conclusion • MDB Languages • MSQL et SQL-x ; x > 2 • New transaction models • Protocoles et Standards • ODBC, JDBC… • All this is dealt with more in depth • later on • in the class books & several others
Exercises & Research Problems • All these already in the text • Difference between the concepts of a DB, DDB,MDB and FDB. • What does it mean « reference architecture », as ANSI-SPARC for example ? • Differences between the architectures: « top-down » « bottom-up », multibase et federated. • Comment on the actual architecture of federated DBs in DB2 V. 6 and later (see DB2 Help or RedBooks at the IBM web site). • Comment on the actual architecture of federated DBs in SQL Server 2000 (see SQL Server 2000 Help or the papers at the MS web site). • Design SQL SELECT statements realizing the fragmentation of the Hotels DB. • Can you propose then how to dealwith the updates ? • Comment on the concept of ILS, of gateway and of mediator • What multidatabase common model is most used today ? • Comment the concept of local autonomy (what, why, how) • Provide examples of various types of semantic heterogeneity • Prove that usual associability of equijoints does not hold if units of measure of values to join may have different precisions • What are consequences for relational DBMSs ? • Propose an extension to SQL for units of measure and the corresponding query optimization (PH.D Thesis).