410 likes | 467 Views
Distributed DBMSs – Concepts and Design. Chapter 22 in Textbook. Overview. Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous. Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages. DDBMS Design. Fragmentation. Replication.
E N D
Distributed DBMSs – Concepts and Design Chapter 22 in Textbook
Overview • Concepts. • What is a distributed DBMS? • Distributed Processing. • Homogeneous vs. Heterogeneous. • Functions of a DDBMS. • Components of a DDBMS. • Advantages and Disadvantages. • DDBMS Design. • Fragmentation. • Replication. • Allocation. • DDBMS Transparencies. • Date’s 12 Rules for a DDBMS.
Concepts • Centralized DBMS systems with a single logical database located at one site under the control of a single DBMS. • Distributed DBslogically interrelated collection of shared data physically distributed over a computer network. • Applications can be classified into: • Local applications. • Global applications.
Distributed DBMS • Distributed DBMS The software system that: • manages the distributed DBs. • makes distribution transparent to users. • allows users to access data on their own site as well as remote sites. • Transparent distribution is the fundamental principle of DDBMS.
Characteristics of DDBMS • A collection of logically related shared data. • The data is split into a number of fragments. • Fragments may be replicated. • Fragments/replicas are allocated to sites. • The sites are linked by a communications networks. • The data at each site is under the control of a DBMS. • The DBMS at each site can handle local applications. • Each DBMS participates in at least one global application.
Distributed DBMS Topology Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is distributed and access to it can be local or remote.
Distributed Processing Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is centralized but access to it can be local or remote.
Homogeneous vs. Heterogeneous DDBMS • Homogenous system: all sites use the same DBMS product. • Heterogeneous system: sites may run different DBMS products & data model. • Possible differences between data in different DBS: • Data type difference. • Value difference. • Semantic difference.
Functions of a DDBMS • Provide access to remote sites and allow transfer of queries & data among the network’s site. • Store data distribution details. • Distributed data processing. • Security control. • Concurrency control. • Recovery services.
Components of a DDBMS Global system catalog Site 1 DDBMS DC LDBMS GSC DDBMS DC Computer Network GSC DB Data communication component Site 3
Advantages of DDBMS • Reflects organizational structure. • Improve sharability & local autonomy. • Improved availability. • Improved reliability. • Improved performance.
Disadvantages of DDBMS • Complexity. • Cost. • Security. • Integrity control more difficult. • Lack of standards. • Lack of experience. • DB design more complex.
Distributed Relational DB Design • We have a group of tables and we want to distribute them between a group of sites. • Consists of 3 major steps: • Fragmentation divide a relation into a number of sub-relations (fragments). (Horizontal & vertical). • Replication make a copy of a fragment. • Allocation decide where (which site) each of the fragments and replicas are to be stored.
Distributed Relational DB Design • When we fragment, replicate and allocate, we try to achieve: • Locality of reference. • Improved reliability and availability. • Good performance. • Balanced storage capacities and costs. • Minimal communication costs.
Rules of Fragmentation • Completeness: Nothing (rows or columns) gets lost while we fragment. • Reconstruction: We can get back the original table after we fragmented it. • Dis-jointness: No row or column appears in 2 fragments (there is 1 exception).
Types of Fragmentation Horizontal fragmentation Vertical fragmentation Mixed fragmentation
Original PropertyForRent Table Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007 PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005
Horizontal Fragmentation Based on type of property. P1: Type=‘House’ (PropertyForRent) P2: Type=‘Flat’ (PropertyForRent) Fragment P1 Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 Fragment P2 PropertyNo Street City PostCode Type Rooms Rent OwnerNo StaffNo BranchNo PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005 PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003
Original Staff Table StaffNo sex Salary LName BranchNo Position FName DOB SL21 1 Oct 93 Manager M John White 30000 B005 10 Nov 60 SG37 Assistant F Ann Beech 12000 B003 24 Mar 58 SG14 Supervisor David M Ford 18000 B003 3 Jun 40 SG5 Assistant F Susan Brand 24000 B007
Vertical Fragmentation S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffno,fname,lname,BranchNo(STAFF) Fragment S1 Fragment S2 StaffNo sex Salary StaffNo LName BranchNo Position FName DOB SL21 SL21 1 Oct 93 Manager M John White 30000 B005 10 Nov 60 SG37 SG37 Assistant F Ann Beech 12000 B003 24 Mar 58 SG14 SG14 Supervisor David M Ford 18000 B003 3 Jun 40 SG5 SG5 Assistant F Susan Brand 24000 B007
Mixed Fragmentation – Vertical then Horizontal Fragment S2.1 S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffoo,fname,lname,BranchNo(STAFF) StaffNo LName BranchNo FName SL21 John White B005 S2.1: BranchNo=‘B005’ (S2) S2.2: BranchNo=‘B003’ (S2) S2.3: BranchNo=‘B007’ (S2) Fragment S2.2 StaffNo LName BranchNo FName Fragment S1 SG37 Ann Beech B003 StaffNo sex Salary Position DOB SG14 David Ford B003 SL21 1 Oct 93 Manager M 30000 Fragment S2.3 10 Nov 60 SG37 Assistant F 12000 StaffNo LName BranchNo FName 24 Mar 58 SG14 Supervisor M 18000 SG5 Susan Brand B007 3 Jun 40 SG5 Assistant F 24000
Derived Horizontal Fragmentation • Derived Horizontal Fragmentation is the horizontal fragmentation of a table (child), T1, because we horizontally fragmented another related table (parent), T2. • It is not explicitly specified in design but implied from fragmentation of T2. • T1 (child) has a foreign key that belongs to T2 (parent). • Relationship between T1 and T2 either 1-to-1 or Many-to-1. • Use Semi-join operation:
Derived Horizontal Fragmentation • You were required by the design to horizontally fragment Staff table. • S1: BranchNo=‘B003’ (Staff) • S2: BranchNo=‘B005’ (Staff) • S3: BranchNo=‘B007’ (Staff) StaffNo sex Salary LName BranchNo Position FName DOB SL21 1 Oct 93 Manager M John White 30000 B005 10 Nov 60 SG37 Assistant F Ann Beech 12000 B003 24 Mar 58 SG14 Supervisor David M Ford 18000 B003 3 Jun 40 SG5 Assistant F Susan Brand 24000 B007
Derived Horizontal Fragmentation Fragment S1 LName BranchNo StaffNo sex Salary FName Position DOB 10 Nov 60 SG37 Ann Beech Assistant F B003 12000 24 Mar 58 SG14 David Supervisor Ford M B003 18000 Fragment S2 LName BranchNo StaffNo sex Salary FName Position DOB SL21 1 Oct 93 John White Manager M B005 30000 Fragment S3 LName BranchNo StaffNo sex Salary FName Position DOB 3 Jun 40 SG5 Susan Brand Assistant F B007 24000
Derived Horizontal Fragmentation • After we fragmented Staff, we found out that there is a table related to it, PropertyForRent. • Because Staff is now fragmented, it makes sense to fragment PropertyForRent too. Staff PropertyForRent handles 1 N S1: BranchNo=‘B003’ (Staff) S2: BranchNo=‘B005’ (Staff) Pi: PropertyForRent staffNo Si S3: BranchNo=‘B007’ (Staff)
Original PropertyForRent Table Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007 PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005
Derived Horizontal Fragmentation Fragment P1 Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003 Fragment P2 PropertyNo Street City PostCode Type Rooms Rent OwnerNo StaffNo BranchNo PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005 Fragment P3 PropertyNo Street City PostCode Type Rooms Rent OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007
Transparencies in a DDBMS • 4 main transparencies: • Distribution Transparency. • Fragmnetation. • Location. • Replication. • Local Mapping. • Naming. • Transaction Transparency. • Performance Transparency. • DBMS Transparency.
1. Distribution Transparency • Allows the user to perceive the DB as a single, logical entity. • Types: • Fragmentation:the user does not need to know the data is fragmented. • Location: the user does not need to know the location of fragments. • Replication: the user does not need to know the fragments are replicated. • Local Mapping: the user specifies the fragment and its location. • Naming:DDBMS makes sure every item name is unique. • Consider the distribution of the STAFF relation: • S1: staffno,Position,sex,DOB, Salary(STAFF) • S2: staffno,fname,lname,BranchNo(STAFF) • S21: BranchNo=‘B003’ (S2) • S22: BranchNo=‘B005’ (S2) • S22: BranchNo=‘B007’ (S2)
a. Fragmentation Transparency • Highest level of distribution transparency. • The user does not need to know that the data is fragmented. • User treats DDB like a centralized DB. • The database access are based on the global schema. • Fragmentation of the data can be changed without impacting the user. • Example: SELECTFname, Lname FROM Staff WHERE position = ‘Manager’;
b. Location Transparency • The middle level of distribution transparency. • The user must know that the data is fragmented but still does not need to know the location of the data. • Data location can be changed without impact on the user. • Example: SELECT Fname, Lname FROMS21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’)
c. Replication Transparency • User unaware of replication and location but knows that data is fragmented. • On the same level with location transparency.
d. Local Mapping Transparency • The lowest level of distribution transparency. • The user knows that the data is fragmented and the location of the data. Example: SELECT Fname, Lname FROMS21AT SITE 3 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22AT SITE 5 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23AT SITE 7 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)
e. Naming Transparency • Each item in distributed database must have a unique name. • DDBMS must ensure that no two sites violate that. • Solutions • Create a central name server. • Bottleneck. • against local autonomy. • Prefix an object with the identifier of the site. • loss of distribution transparency.
2. Transaction Transparency • All transactions must ensure the consistency and integrity of the DDB. • Each transaction that needs to access data in multiple sites is divided into multiple sub-transactions. • Even if transaction is split, atomicity has to be maintained.
3. Performance Transparency • DDBMS performs as if it were a centralized DBMS. • Should not suffer because it is distributed (network communication cost). • When a site issues a query, the system must figure out the fastest way of executing it. • Distributed Query Processor (DQP) must figure out: • Which fragment to access. • Which copy of fragment to access (if replication is used). • Where are the fragments.
3. Performance Transparency • Consider the following distributed DB: • Property(PropertyNo, city) 10,000 records in London • Client(ClientNo, maxPrice) 100,000 records in Glasgow • Viewing(PropertNo, ClientNo) 1,000,000 records in London • London site wants to list properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000. SELECT p.propertyNo FROM Property P INNER JOIN (Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo) ON p.propertyNo = v.propertyNo WHERE p.city = ‘Aberdeen’ AND c.maxprice > 200000;
3. Performance Transparency • After the query is issued, DDBMS must determine the most cost-effective strategy to execute the query. • Strategies: • Move Client table to London and process query there. • Move Property and Viewing relation to Glasgow and process query there then return result. • Join Property and Viewing at London, project only property number and client number and move result to Glasgow to join with clients with salary > 200,000 then return results. • Select clients at Glasgow with salary > 200000, move them to London and join with viewing and Aberdeen property.
4. DBMS Transparency • Hides the fact that different sites have different local DBMSs. • Heterogeneous DDBMSs.
Date’s 12 Rules for a DDBMS • Local autonomy. • No reliance on a central site. • Continuous operation. • Location independence. • Fragmentation independence. • Replication independence. • Distributed query processing. • Distributed transaction processing. • Hardware independence. • Operating system independence. • Network independence. • Database independence.