200 likes | 366 Views
Distributed databases. A brief introduction (Figure numbers may not be the same as in the book). Distributed database concepts. Distributed database (DDB) Collection of multiple logically interrelated databases distributed over a computer network
E N D
Distributed databases A brief introduction (Figure numbers may not be the same as in the book) Distributed databases
Distributed database concepts • Distributed database (DDB) • Collection of multiple logically interrelated databases distributed over a computer network • Distributed database management systems (DDBMS) • Software systems managing a distributed database, making distribution transparent to the users Distributed databases
Transparency • Hiding implementation details from the users of the database • Data organization transparency • Location transparency • Use does not depend on location • Naming transparency • Naming is independent from location • Replication transparency • Copies can be kept for availability, performance, and availability • User are unaware of the existence of these copies • Fragmentation transparency • One table is divided into more locations • Horizontal fragmentation • Table divided by rows • Vertical fragmentation • Table divided by columns Distributed databases
Example: Replication and horizontal fragmentation Distributed databases
Reliability and Availability • Two common advantages of distributed databases • Reliability • The probability that a system is running at a certain time point • Availability • The probability that a system is continuously available during a time interval Distributed databases
Advantages of distributed databases • Improved ease and flexibility of application development • Transparency: Developers do not have to know … • Increased reliability and availability • Faults are isolated to a single site • Improved performance • Data localization, means less network traffic • Parallelism • Easier expansion • Easy to add more data, processors, etc. Distributed databases
Types of distributed database systems • Degree of homogeneity • Homogeneous: All local DBMSs run identical software • Heterogeneous: Local DMBSs run different software • Autonomy • Local autonomy: Local site can function as a standalone DBMS • No autonomy: Local site can not function as a standalone DBMS Distributed databases
Classification of distributed databases Distributed databases
Database system architectures Distributed databases
General architecture Distributed databases
Component architecture of distributed databases Distributed databases
Data fragmentation • Which site should store which portion of the database? • Simple fragmentation • Each site has a whole relation • Horizontal fragmentation • Subset of rows in each site • Sometimes based on location • Vertical fragmentation • Subset of columns in each site • Primary key must be in all sites • Mixed / hybrid fragmentation • Horizontal + vertical fragmentation • Described by fragmentation schema Distributed databases
Example fragmentation Distributed databases
Example fragmentation, continued Distributed databases
Data replication • Replication to improve availability • Fully replicated database • All data is replicated to each site • Non replication • All data is stored at exactly one site • Partial replication • Some data is replicated to some sites • Described by replication schema Distributed databases
Distributed query processing • Query mapping • Query mapped from SQL to relational algebra using the global conceptual schema • Localization • Map query on the global schema to separate queries on the local schemas • Using fragmentation and replication information • Global query optimization • Cost = CPU time + I/O time + communication time • Local query optimization • Same as in centralized databases Distributed databases
Distributed transaction management, Two-phase commit protocol (2PC) • Global transaction manager / coordinator • Coordinates the results of local transaction managers. • All local transaction managers must be able to ”commit”, before actually doing the ”commit” • Two-Phase commit protocol (2PC) • Phase 1 • Individual databases tell the coordinator that they have finished transaction • All individual databases have finished: Coordinator sends ”prepare for commit” to all databases • Individual databases answer ”ready to commit” or ”cannot commit” • Phase 2 • If all databases answered ”ready to commit”, coordinator sends ”commit” to all databases • If one (or more) databases answered ”cannot commit”, coordinator sends ”abort” to all databases. • Timeout: if one (or more) databases does not answer within a given amount of time, coordinator sends ”abort”. Distributed databases
Two-phase commit protocol (2PC) • Problems with 2PC • Coordinator crashes: All participating sites are waiting • No way of knowing whether participating sites really got the ”commit” / ”abort” Distributed databases
Three-phase commit (3PC) Distributed databases