Distributed DBMSs – Concepts and Design

Distributed DBMSs – Concepts and Design Chapter 24 in Textbook

Overview • Concepts. • What is a distributed DBMS? • Distributed Processing. • Homogeneous vs. Heterogeneous. • Functions of a DDBMS. • Components of a DDBMS. • Advantages and Disadvantages. • DDBMS Design. • Fragmentation. • Replication. • Allocation. • DDBMS Transparencies. • Date’s 12 Rules for a DDBMS.

Concepts • Centralized DBMS systems with a single logical database located at one site under the control of a single DBMS. • Distributed DBslogically interrelated collection of shared data physically distributed over a computer network. • Applications can be classified into: • Local applications. • Global applications.

Distributed DBMS • Distributed DBMS The software system that: • manages the distributed DBs. • makes distribution transparent to users. • allows users to access data on their own site as well as remote sites. • Transparent distribution is the fundamental principle of DDBMS.

Characteristics of DDBMS • A collection of logically related shared data. • The data is split into a number of fragments. • Fragments may be replicated. • Fragments/replicas are allocated to sites. • The sites are linked by a communications networks. • The data at each site is under the control of a DBMS. • The DBMS at each site can handle local applications. • Each DBMS participates in at least one global application.

Distributed DBMS Topology Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is distributed and access to it can be local or remote.

Distributed Processing Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is centralized but access to it can be local or remote.

Homogeneous vs. Heterogeneous DDBMS • Homogenous system: all sites use the same DBMS product. • Heterogeneous system: sites may run different DBMS products & data model. • Possible differences between data in different DBS: • Data type difference. • Value difference. • Semantic difference.

Functions of a DDBMS • Provide access to remote sites and allow transfer of queries & data among the network’s site. • Store data distribution details. • Distributed data processing. • Security control. • Concurrency control. • Recovery services.

Components of a DDBMS Global system catalog Site 1 DDBMS DC LDBMS GSC DDBMS DC Computer Network GSC DB Data communication component Site 3

Advantages of DDBMS • Reflects organizational structure. • Improve sharability & local autonomy. • Improved availability. • Improved reliability. • Improved performance.

Disadvantages of DDBMS • Complexity. • Cost. • Security. • Integrity control more difficult. • Lack of standards. • Lack of experience. • DB design more complex.

Distributed Relational DB Design • We have a group of tables and we want to distribute them between a group of sites. • Consists of 3 major steps: • Fragmentation divide a relation into a number of sub-relations (fragments). (Horizontal & vertical). • Replication make a copy of a fragment. • Allocation decide where (which site) each of the fragments and replicas are to be stored.

Distributed Relational DB Design • When we fragment, replicate and allocate, we try to achieve: • Locality of reference. • Improved reliability and availability. • Good performance. • Balanced storage capacities and costs. • Minimal communication costs.

Rules of Fragmentation • Completeness: Nothing (rows or columns) gets lost while we fragment. • Reconstruction: We can get back the original table after we fragmented it. • Dis-jointness: No row or column appears in 2 fragments (there is 1 exception).

Types of Fragmentation Horizontal fragmentation Vertical fragmentation Mixed fragmentation

Original PropertyForRent Table

Horizontal Fragmentation Based on type of property. P1: Type=‘House’ (PropertyForRent) P2: Type=‘Flat’ (PropertyForRent) Fragment P1 Fragment P2 BranchNo

Original Staff Table

Vertical Fragmentation S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffno,fname,lname,BranchNo(STAFF) Fragment S2 Fragment S1

Mixed Fragmentation – Vertical then Horizontal Fragment S2.1 S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffoo,fname,lname,BranchNo(STAFF) StaffNo LName BranchNo FName S2.1:  BranchNo=‘B005’ (S2) S2.2:  BranchNo=‘B003’ (S2) S2.3:  BranchNo=‘B007’ (S2) Fragment S2.2 StaffNo LName BranchNo Fragment S1 Fragment S2.3 LName BranchNo FName

Derived Horizontal Fragmentation • Derived Horizontal Fragmentation is the horizontal fragmentation of a table (child), T1, because we horizontally fragmented another related table (parent), T2. • It is not explicitly specified in design but implied from fragmentation of T2. • T1 (child) has a foreign key that belongs to T2 (parent). • Relationship between T1 and T2 either 1-to-1 or Many-to-1. • Use Semi-join operation: 

Derived Horizontal Fragmentation • You were required by the design to horizontally fragment Staff table. • S1:  BranchNo=‘B003’ (Staff) • S2:  BranchNo=‘B005’ (Staff) • S3:  BranchNo=‘B007’ (Staff)

Derived Horizontal Fragmentation Fragment S1 Fragment S2 Fragment S3

Derived Horizontal Fragmentation • After we fragmented Staff, we found out that there is a table related to it, PropertyForRent. • Because Staff is now fragmented, it makes sense to fragment PropertyForRent too. Staff PropertyForRent handles 1 N S1:  BranchNo=‘B003’ (Staff) S2:  BranchNo=‘B005’ (Staff) Pi: PropertyForRent staffNo Si S3:  BranchNo=‘B007’ (Staff)

Original PropertyForRent Table

Derived Horizontal Fragmentation Fragment P1 Fragment P2 Fragment P3

Transparencies in a DDBMS • 4 main transparencies: • Distribution Transparency. • Fragmnetation. • Location. • Replication. • Local Mapping. • Naming. • Transaction Transparency. • Performance Transparency. • DBMS Transparency.

1. Distribution Transparency • Allows the user to perceive the DB as a single, logical entity. • Types: • Fragmentation:the user does not need to know the data is fragmented. • Location: the user does not need to know the location of fragments. • Replication: the user does not need to know the fragments are replicated. • Local Mapping: the user specifies the fragment and its location. • Naming:DDBMS makes sure every item name is unique. • Consider the distribution of the STAFF relation: • S1: staffno,Position,sex,DOB, Salary(STAFF) • S2: staffno,fname,lname,BranchNo(STAFF) • S21:  BranchNo=‘B003’ (S2) • S22:  BranchNo=‘B005’ (S2) • S22:  BranchNo=‘B007’ (S2)

a. Fragmentation Transparency • Highest level of distribution transparency. • The user does not need to know that the data is fragmented. • User treats DDB like a centralized DB. • The database access are based on the global schema. • Fragmentation of the data can be changed without impacting the user. • Example: SELECTFname, Lname FROM Staff WHERE position = ‘Manager’;

b. Location Transparency • The middle level of distribution transparency. • The user must know that the data is fragmented but still does not need to know the location of the data. • Data location can be changed without impact on the user. • Example: SELECT Fname, Lname FROMS21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’)

c. Replication Transparency • User unaware of replication and location but knows that data is fragmented. • On the same level with location transparency.

d. Local Mapping Transparency • The lowest level of distribution transparency. • The user knows that the data is fragmented and the location of the data. Example: SELECT Fname, Lname FROMS21AT SITE 3 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22AT SITE 5 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23AT SITE 7 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)

e. Naming Transparency • Each item in distributed database must have a unique name. • DDBMS must ensure that no two sites violate that. • Solutions • Create a central name server. • Bottleneck. • against local autonomy. • Prefix an object with the identifier of the site. • loss of distribution transparency.

2. Transaction Transparency • All transactions must ensure the consistency and integrity of the DDB. • Each transaction that needs to access data in multiple sites is divided into multiple sub-transactions. • Even if transaction is split, atomicity has to be maintained.

3. Performance Transparency • DDBMS performs as if it were a centralized DBMS. • Should not suffer because it is distributed (network communication cost). • When a site issues a query, the system must figure out the fastest way of executing it. • Distributed Query Processor (DQP) must figure out: • Which fragment to access. • Which copy of fragment to access (if replication is used). • Where are the fragments.

3. Performance Transparency • Consider the following distributed DB: • Property(PropertyNo, city) 10,000 records in London • Client(ClientNo, maxPrice) 100,000 records in Glasgow • Viewing(PropertNo, ClientNo) 1,000,000 records in London • London site wants to list properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000. SELECTp.propertyNo FROM Property P INNER JOIN (Client c INNER JOIN Viewing v ONc.clientNo = v.clientNo) ONp.propertyNo = v.propertyNo WHEREp.city = ‘Aberdeen’ AND c.maxprice > 200000;

3. Performance Transparency • After the query is issued, DDBMS must determine the most cost-effective strategy to execute the query. • Strategies: • Move Client table to London and process query there. • Move Property and Viewing relation to Glasgow and process query there then return result. • Join Property and Viewing at London, project only property number and client number and move result to Glasgow to join with clients with maxPrice> 200,000 then return results. • Select clients at Glasgow with salary > 200000, move them to London and join with viewing and Aberdeen property.

4. DBMS Transparency • Hides the fact that different sites have different local DBMSs. • Heterogeneous DDBMSs.

Date’s 12 Rules for a DDBMS • Local autonomy. • No reliance on a central site. • Continuous operation. • Location independence. • Fragmentation independence. • Replication independence. • Distributed query processing. • Distributed transaction processing. • Hardware independence. • Operating system independence. • Network independence. • Database independence.

Distributed DBMSs – Concepts and Design