Learn about the fundamentals, objectives, implementations, and challenges of distributed databases. Explore the benefits of local autonomy, location independence, fragmentation, and replication independence.
Outline • generalities • objectives • problems
Introduction [figure: applications and servers at several sites, connected by a communication network; each site is a DBMS in its own right]
Introduction • distributed database = a collection of sites connected by a communication network • each site is a DB in its own right (1) • it has its own DBMS and its own users • operations can be performed locally as if the DB were not distributed • the sites collaborate (transparently from the user’s point of view) • the union of all the local DBs = the DB of the whole organisation (institution) • (as opposed to (1)) • physical or logical distribution • strict homogeneity (assumption: every site runs the same DBMS)
Motivation • advantages • matches the structure of the organisation • example • efficiency of processing • data is stored close to where it is used • increased accessibility • remote DBs can be accessed • disadvantage • complexity
Implementations (systems) • commercial • ORACLE (Oracle Corporation) • INGRES/STAR (Ask Group Inc. Ingres Division) • DB2 (IBM) • they all provide some support for distributed databases
Fundamental principle • to the user, a distributed DB system should look exactly like a non-distributed DB system
Objectives • local autonomy • no reliance on central site • location independence • fragmentation independence • replication independence • distributed query processing • distributed transaction management
Objectives are: • not independent of each other • not exhaustive • sometimes conflicting • of differing importance (for the user)
Local autonomy • all operations at a given site are fully controlled by that site • not fully achievable (why?) • therefore, autonomy should be achieved to the maximum extent possible • local data is locally owned and managed • local data belongs to the local server even if it is accessible from other servers • security, integrity, ... are the responsibility of the local server
No reliance on a central site • reasons • bottleneck • vulnerability • conclusion • all sites must be equal
Location independence • users should not have to know where data is physically stored • why do you think this is needed? • think of application programs • what does this objective look like?
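One way to see what location independence looks like for an application program: the program names only the relation, and a catalogue lookup resolves the site. The sketch below is a minimal in-memory model; `CATALOGUE`, `SITES`, `fetch` and `move` are hypothetical names, and a dict stands in for each site's local DBMS.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical global catalogue: logical relation name -> site storing it.
CATALOGUE: Dict[str, str] = {"Emp": "London", "Dept": "Leeds"}

# Hypothetical per-site storage, standing in for each site's local DBMS.
SITES: Dict[str, Dict[str, List[Row]]] = {
    "London": {"Emp": [{"Emp_id": 1, "Name": "Ada"}]},
    "Leeds": {"Dept": [{"Dept_id": "Dev"}]},
}

def fetch(relation: str) -> List[Row]:
    """Resolve the relation's site from the catalogue, then read it there.
    The caller supplies only the relation name, never a site."""
    site = CATALOGUE[relation]
    return SITES[site][relation]

def move(relation: str, new_site: str) -> None:
    """Relocate a relation; only the catalogue entry changes, not callers."""
    old_site = CATALOGUE[relation]
    SITES.setdefault(new_site, {})[relation] = SITES[old_site].pop(relation)
    CATALOGUE[relation] = new_site
```

A call such as `fetch("Emp")` returns the same rows before and after `move("Emp", "Leeds")`, so the program does not have to change when the data is relocated.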
Data fragmentation • a relation can be divided into “fragments” for storage purposes • motivation: performance - data is stored where it is most frequently used • definition • fragment = any subrelation derivable via restriction (horizontal fragment) or projection (vertical fragment)
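A projection-based (vertical) fragment can be sketched as follows, assuming a small in-memory Emp relation whose primary key is `Emp_id`. The attribute values and the fragment names `EMP_PUBLIC` and `EMP_PAY` are hypothetical. Each vertical fragment keeps the key, so joining the fragments on that key rebuilds the original relation.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical Emp relation; Emp_id is assumed to be its primary key.
EMP: List[Row] = [
    {"Emp_id": 1, "Name": "Ada", "Dept_id": "Sales", "Salary": 40000},
    {"Emp_id": 2, "Name": "Bob", "Dept_id": "Dev", "Salary": 45000},
    {"Emp_id": 3, "Name": "Cy", "Dept_id": "Dev", "Salary": 50000},
]

def project(rel: List[Row], attrs: List[str]) -> List[Row]:
    """Projection: keep only attrs, eliminating duplicate tuples."""
    out: List[Row] = []
    for r in rel:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def join_on(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equijoin on a single key attribute, merging the matching tuples."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Vertical fragments: both keep the key so the join can rebuild Emp.
EMP_PUBLIC = project(EMP, ["Emp_id", "Name", "Dept_id"])
EMP_PAY = project(EMP, ["Emp_id", "Salary"])

rebuilt = join_on(EMP_PUBLIC, EMP_PAY, "Emp_id")
```

If a fragment dropped `Emp_id`, the join could no longer match tuples, and the fragmentation would lose information.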
Data fragmentation - example
FRAGMENT Emp INTO
    Lo_Emp AT SITE 'London' WHERE Dept_id = 'Sales',
    Le_Emp AT SITE 'Leeds'  WHERE Dept_id = 'Dev' ;
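The FRAGMENT statement above defines two horizontal (restriction) fragments. It is not standard SQL but illustrative syntax. The Python model below mirrors it under stated assumptions: the sample tuples and the helpers `fragment` and `reconstruct` are hypothetical. Each fragment is a restriction of Emp stored at a named site, and the UNION of the fragments gives Emp back.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Pred = Callable[[Row], bool]

# Hypothetical sample data for Emp.
EMP: List[Row] = [
    {"Emp_id": 1, "Name": "Ada", "Dept_id": "Sales"},
    {"Emp_id": 2, "Name": "Bob", "Dept_id": "Dev"},
    {"Emp_id": 3, "Name": "Cy", "Dept_id": "Dev"},
]

# Fragment name -> (site, restriction predicate), as in the FRAGMENT statement.
FRAGMENTS: Dict[str, Tuple[str, Pred]] = {
    "Lo_Emp": ("London", lambda r: r["Dept_id"] == "Sales"),
    "Le_Emp": ("Leeds", lambda r: r["Dept_id"] == "Dev"),
}

def fragment(rel: List[Row]) -> Dict[str, Tuple[str, List[Row]]]:
    """Horizontal fragmentation: each fragment is a restriction of rel."""
    return {name: (site, [r for r in rel if pred(r)])
            for name, (site, pred) in FRAGMENTS.items()}

def reconstruct(frags: Dict[str, Tuple[str, List[Row]]]) -> List[Row]:
    """UNION of all fragments, with duplicates removed."""
    out: List[Row] = []
    for _site, rows in frags.values():
        for r in rows:
            if r not in out:
                out.append(r)
    return out

stored = fragment(EMP)
```

The reconstruction is only complete because every employee here belongs to 'Sales' or 'Dev'. A tuple with any other Dept_id would fall into no fragment and would be missing from the UNION.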
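Fragmentation independence / transparency • users should perceive data as if it were not fragmented • why? • it is the optimiser’s responsibility to determine which fragments need to be physically accessed • similar to views • the relation the user sees is a view over its fragments: a UNION of horizontal fragments or a JOIN of vertical fragments • both retrieving and updating go through that view (UNION and JOIN views)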
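As a sketch of how the optimiser might decide which fragments to access, assume that each fragment of Emp is defined by a single restriction `Dept_id = v`, as in the FRAGMENT example. The catalogue layout `FRAGMENT_DEFS` and the function `fragments_to_access` are hypothetical. A query restricted on Dept_id needs only the fragment whose defining predicate can match. A query without that restriction must read every fragment.

```python
from typing import Dict, List, Optional

# Hypothetical catalogue entry per fragment: its site and the Dept_id
# value in its defining restriction.
FRAGMENT_DEFS: Dict[str, Dict[str, str]] = {
    "Lo_Emp": {"site": "London", "dept": "Sales"},
    "Le_Emp": {"site": "Leeds", "dept": "Dev"},
}

def fragments_to_access(dept_filter: Optional[str]) -> List[str]:
    """Return the fragments a query on Emp must physically read.

    A predicate Dept_id = v can only be satisfied by a fragment whose
    defining restriction is Dept_id = v. With no filter, the query runs
    against the UNION of all fragments, so all of them are read."""
    if dept_filter is None:
        return sorted(FRAGMENT_DEFS)
    return sorted(name for name, d in FRAGMENT_DEFS.items()
                  if d["dept"] == dept_filter)
```

The user writes the query against Emp in both cases. Only the optimiser's choice of fragments differs.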
Data replication • copies of the same fragment can exist at different sites • reasons • better availability • better performance • disadvantage • update propagation
Replication independence / transparency • users should not have to be aware of data replication • it is the optimiser’s responsibility to choose which replica to use • commercial systems • no full support for replication independence (update problems); they typically rely on a primary copy
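A minimal sketch of replica selection, assuming the optimiser simply picks the copy cheapest to reach from the site where the query was issued. The site list, the symmetric link costs, and the function names are all hypothetical. Real optimisers weigh more factors.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue: fragment -> sites holding a replica of it.
REPLICAS: Dict[str, List[str]] = {"Lo_Emp": ["London", "Leeds", "York"]}

# Hypothetical symmetric cost of reaching one site from another.
LINK_COST: Dict[Tuple[str, str], int] = {
    ("Leeds", "London"): 5,
    ("Leeds", "York"): 2,
    ("London", "York"): 4,
    ("Hull", "Leeds"): 3,
    ("Hull", "London"): 6,
    ("Hull", "York"): 1,
}

def cost(a: str, b: str) -> int:
    """Cost of reaching site b from site a; local access is free."""
    if a == b:
        return 0
    x, y = sorted((a, b))
    return LINK_COST[(x, y)]

def choose_replica(fragment: str, query_site: str) -> str:
    """Pick the replica cheapest to reach from the query's site.
    The user names only the fragment, never a particular copy."""
    return min(REPLICAS[fragment], key=lambda s: cost(query_site, s))
```

For example, a query on Lo_Emp issued at Hull would read the York copy, and one issued at London would read the local copy. This choice is only straightforward for reads. Updates must reach every copy, which is where the primary copy comes in.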
Distributed query processing • the system must have set-level operators • one record at a time: too many messages (network traffic) • relational operators: the natural choice • optimisation • particularly relevant! • find the best way to move data across the network
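The traffic argument for set-level operators can be shown with a rough message count. The model below is an assumption, not a measurement. Record-at-a-time access costs one request and one reply per tuple, plus a final pair that reports end-of-set. A set-level query costs one request carrying the whole query and one reply carrying the whole result.

```python
def messages_record_at_a_time(n_tuples: int) -> int:
    # Assumption: one "fetch next" request plus one reply per tuple,
    # and a final request/reply pair that reports end-of-set.
    return 2 * n_tuples + 2

def messages_set_level(n_tuples: int) -> int:
    # Assumption: one request with the relational query and one reply
    # with the entire result set, regardless of its size.
    return 2
```

Under this model, retrieving 1000 tuples takes 2002 messages one record at a time, but only 2 at set level.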
Problems • aim • minimise network utilisation • problems occur, due to network utilisation, in • query processing • catalogue management • update propagation • recovery control • concurrency control
Query processing • in a distributed environment • query execution is distributed • query optimisation is distributed • global optimisation • local optimisation • example • query on relation R issued at site X • part of R, say Ry, stored at Y • part of R, say Rz, stored at Z • where is the query going to be executed?
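For the example above, two candidate strategies can be compared by the number of tuples shipped to site X. The fragment sizes and selectivities below are hypothetical numbers chosen for illustration. Strategy 1 moves Ry and Rz whole to X. Strategy 2 evaluates the restriction locally at Y and Z and ships only the qualifying tuples.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Fragment:
    site: str
    tuples: int         # fragment cardinality (hypothetical)
    selectivity: float  # fraction of tuples satisfying the query's restriction

def ship_fragments(frags: List[Fragment], query_site: str) -> int:
    """Strategy 1: move whole remote fragments to the query site."""
    return sum(f.tuples for f in frags if f.site != query_site)

def ship_results(frags: List[Fragment], query_site: str) -> int:
    """Strategy 2: restrict at each fragment's own site (local
    optimisation), then move only the qualifying tuples."""
    return sum(round(f.tuples * f.selectivity)
               for f in frags if f.site != query_site)

# Ry stored at Y, Rz stored at Z; the query is issued at X.
R = [Fragment("Y", 100_000, 0.01), Fragment("Z", 50_000, 0.02)]
```

With these figures, strategy 1 ships 150,000 tuples and strategy 2 ships 2,000. Choosing between such plans is the job of global optimisation. Evaluating each local restriction efficiently is the job of local optimisation.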
Catalogue management • what ‘other’ data does the catalogue include? • fragmentation, replication ... • where should the catalogue be stored? • centralised • fully replicated • loss of autonomy - update propagation! • partitioned • non-local operations - very expensive! • combination of first and third
Central Catalogue • all updates, including local updates, have to be recorded in the central catalogue • disadvantages: • bottleneck • conflicts with the “no reliance on a central site” objective
Fully Replicated Catalogue • the entire database catalogue (not only the local one) is stored at each site • every time an update is made, it has to be recorded at each site • disadvantages • loss of local autonomy • updates that consume time and network traffic
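The trade-offs between the catalogue storage options can be summarised with a deliberately rough message count. This is an assumed model: one message per remote site contacted, replies not counted, and the requesting site is not itself the central site. For each scheme it gives the messages needed to record a change to a site's own local entry, and the worst-case messages needed to look up an entry held elsewhere.

```python
from typing import Dict, Tuple

def catalogue_messages(n_sites: int) -> Dict[str, Tuple[int, int]]:
    """Rough model: scheme -> (messages to record a local catalogue update,
    worst-case messages to look up a non-local entry)."""
    others = n_sites - 1
    return {
        # every update and every lookup goes through the central site
        "centralised": (1, 1),
        # an update must reach every other site; lookups are local
        "fully replicated": (others, 0),
        # updates stay local; a lookup may have to ask every other site
        "partitioned": (0, others),
        # local partition plus a central copy (first and third combined)
        "partitioned + central": (1, 1),
    }
```

With 10 sites, a fully replicated catalogue needs 9 messages per update, while a partitioned one may need 9 messages for a single lookup. This matches the “update propagation” and “very expensive non-local operations” remarks above.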
Update propagation • problems because of replication • data might become less available (an update cannot complete while any copy is unreachable) • primary copy scheme • one copy is designated the primary copy (unique) • primary copies of different objects exist at different sites (distributed) • an update is logically complete once the primary copy has been updated • the site holding the primary copy is then responsible for propagating the update to the other copies • violation of local autonomy
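The primary copy scheme for a single replicated item can be sketched as a small class. The class name, the site names, and the one-message-per-copy propagation model are assumptions for illustration. The update is logically complete as soon as the primary copy is written. Until `propagate` runs, the other copies still hold the old value.

```python
from typing import Dict, Iterable, List

class PrimaryCopyItem:
    """One replicated data item under a primary copy scheme (sketch)."""

    def __init__(self, primary: str, sites: Iterable[str], value: object):
        self.primary = primary
        self.copies: Dict[str, object] = {s: value for s in sites}
        self.pending: List[str] = []  # copies not yet brought up to date

    def update(self, value: object) -> None:
        # Logically complete once the primary copy has been written.
        self.copies[self.primary] = value
        self.pending = [s for s in self.copies if s != self.primary]

    def propagate(self) -> int:
        """The primary's site pushes the new value to every other copy.
        Returns the number of messages sent (one per secondary copy)."""
        value = self.copies[self.primary]
        for s in self.pending:
            self.copies[s] = value
        sent = len(self.pending)
        self.pending = []
        return sent
```

For example, after `item = PrimaryCopyItem("London", ["London", "Leeds", "York"], 1)` and `item.update(2)`, the Leeds copy still reads 1 until `item.propagate()` sends two messages. The work done by London on behalf of the other sites is the loss of autonomy noted above.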
Concurrency control • locking • overhead - increased number of messages • primary copy strategy • locking only the primary copy • the primary copy’s site will propagate the update • severe loss of autonomy • global deadlock • two transactions at different sites, each waiting for the other • cannot be detected from any single site’s local wait-for graph; detection requires combining the graphs of several sites, hence communication overhead
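Why a global deadlock escapes local detection can be shown with a hypothetical two-site scenario. T1 and T2 each have an agent at sites X and Y. At X, T1 waits for a lock held by T2. At Y, T2 waits for a lock held by T1. Each agent also waits for its own transaction's agent at the other site. The cycle check below is an ordinary depth-first search. Neither site's edges form a cycle on their own, but their union does.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (waiting agent, agent it waits for)

def has_cycle(edges: List[Edge]) -> bool:
    """Detect a cycle in a wait-for graph by depth-first search,
    using three-colour marking (unvisited / on stack / finished)."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}

    def visit(n: str) -> bool:
        colour[n] = GREY
        for m in graph.get(n, []):
            c = colour.get(m, WHITE)
            if c == GREY:  # back edge: m is on the current path
                return True
            if c == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))

# Edges known to each site (hypothetical scenario).
SITE_X = [("T1@X", "T2@X"), ("T2@X", "T2@Y")]
SITE_Y = [("T2@Y", "T1@Y"), ("T1@Y", "T1@X")]
```

`has_cycle(SITE_X)` and `has_cycle(SITE_Y)` are both false. `has_cycle(SITE_X + SITE_Y)` is true, and building that combined graph is what costs the extra messages.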
Conclusion • generalities • objectives – in brief • problems – in brief