1 / 40

Distributed DBMSs – Concepts and Design

Distributed DBMSs – Concepts and Design. Chapter 22 in Textbook. Overview. Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous. Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages. DDBMS Design. Fragmentation. Replication.

rhys
Download Presentation

Distributed DBMSs – Concepts and Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed DBMSs – Concepts and Design Chapter 22 in Textbook

  2. Overview • Concepts. • What is a distributed DBMS? • Distributed Processing. • Homogeneous vs. Heterogeneous. • Functions of a DDBMS. • Components of a DDBMS. • Advantages and Disadvantages. • DDBMS Design. • Fragmentation. • Replication. • Allocation. • DDBMS Transparencies. • Date’s 12 Rules for a DDBMS.

  3. Concepts • Centralized DBMS systems with a single logical database located at one site under the control of a single DBMS. • Distributed DBslogically interrelated collection of shared data physically distributed over a computer network. • Applications can be classified into: • Local applications. • Global applications.

  4. Distributed DBMS • Distributed DBMS The software system that: • manages the distributed DBs. • makes distribution transparent to users. • allows users to access data on their own site as well as remote sites. • Transparent distribution is the fundamental principle of DDBMS.

  5. Characteristics of DDBMS • A collection of logically related shared data. • The data is split into a number of fragments. • Fragments may be replicated. • Fragments/replicas are allocated to sites. • The sites are linked by a communications networks. • The data at each site is under the control of a DBMS. • The DBMS at each site can handle local applications. • Each DBMS participates in at least one global application.

  6. Distributed DBMS Topology Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is distributed and access to it can be local or remote.

  7. Distributed Processing Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is centralized but access to it can be local or remote.

  8. Homogeneous vs. Heterogeneous DDBMS • Homogenous system: all sites use the same DBMS product. • Heterogeneous system: sites may run different DBMS products & data model. • Possible differences between data in different DBS: • Data type difference. • Value difference. • Semantic difference.

  9. Functions of a DDBMS • Provide access to remote sites and allow transfer of queries & data among the network’s site. • Store data distribution details. • Distributed data processing. • Security control. • Concurrency control. • Recovery services.

  10. Components of a DDBMS Global system catalog Site 1 DDBMS DC LDBMS GSC DDBMS DC Computer Network GSC DB Data communication component Site 3

  11. Advantages of DDBMS • Reflects organizational structure. • Improve sharability & local autonomy. • Improved availability. • Improved reliability. • Improved performance.

  12. Disadvantages of DDBMS • Complexity. • Cost. • Security. • Integrity control more difficult. • Lack of standards. • Lack of experience. • DB design more complex.

  13. Distributed Relational DB Design • We have a group of tables and we want to distribute them between a group of sites. • Consists of 3 major steps: • Fragmentation divide a relation into a number of sub-relations (fragments). (Horizontal & vertical). • Replication make a copy of a fragment. • Allocation decide where (which site) each of the fragments and replicas are to be stored.

  14. Distributed Relational DB Design • When we fragment, replicate and allocate, we try to achieve: • Locality of reference. • Improved reliability and availability. • Good performance. • Balanced storage capacities and costs. • Minimal communication costs.

  15. Rules of Fragmentation • Completeness: Nothing (rows or columns) gets lost while we fragment. • Reconstruction: We can get back the original table after we fragmented it. • Dis-jointness: No row or column appears in 2 fragments (there is 1 exception).

  16. Types of Fragmentation Horizontal fragmentation Vertical fragmentation Mixed fragmentation

  17. Original PropertyForRent Table Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007 PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005

  18. Horizontal Fragmentation Based on type of property. P1: Type=‘House’ (PropertyForRent) P2: Type=‘Flat’ (PropertyForRent) Fragment P1 Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 Fragment P2 PropertyNo Street City PostCode Type Rooms Rent OwnerNo StaffNo BranchNo PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005 PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003

  19. Original Staff Table StaffNo sex Salary LName BranchNo Position FName DOB SL21 1 Oct 93 Manager M John White 30000 B005 10 Nov 60 SG37 Assistant F Ann Beech 12000 B003 24 Mar 58 SG14 Supervisor David M Ford 18000 B003 3 Jun 40 SG5 Assistant F Susan Brand 24000 B007

  20. Vertical Fragmentation S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffno,fname,lname,BranchNo(STAFF) Fragment S1 Fragment S2 StaffNo sex Salary StaffNo LName BranchNo Position FName DOB SL21 SL21 1 Oct 93 Manager M John White 30000 B005 10 Nov 60 SG37 SG37 Assistant F Ann Beech 12000 B003 24 Mar 58 SG14 SG14 Supervisor David M Ford 18000 B003 3 Jun 40 SG5 SG5 Assistant F Susan Brand 24000 B007

  21. Mixed Fragmentation – Vertical then Horizontal Fragment S2.1 S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffoo,fname,lname,BranchNo(STAFF) StaffNo LName BranchNo FName SL21 John White B005 S2.1:  BranchNo=‘B005’ (S2) S2.2:  BranchNo=‘B003’ (S2) S2.3:  BranchNo=‘B007’ (S2) Fragment S2.2 StaffNo LName BranchNo FName Fragment S1 SG37 Ann Beech B003 StaffNo sex Salary Position DOB SG14 David Ford B003 SL21 1 Oct 93 Manager M 30000 Fragment S2.3 10 Nov 60 SG37 Assistant F 12000 StaffNo LName BranchNo FName 24 Mar 58 SG14 Supervisor M 18000 SG5 Susan Brand B007 3 Jun 40 SG5 Assistant F 24000

  22. Derived Horizontal Fragmentation • Derived Horizontal Fragmentation is the horizontal fragmentation of a table (child), T1, because we horizontally fragmented another related table (parent), T2. • It is not explicitly specified in design but implied from fragmentation of T2. • T1 (child) has a foreign key that belongs to T2 (parent). • Relationship between T1 and T2 either 1-to-1 or Many-to-1. • Use Semi-join operation: 

  23. Derived Horizontal Fragmentation • You were required by the design to horizontally fragment Staff table. • S1:  BranchNo=‘B003’ (Staff) • S2:  BranchNo=‘B005’ (Staff) • S3:  BranchNo=‘B007’ (Staff) StaffNo sex Salary LName BranchNo Position FName DOB SL21 1 Oct 93 Manager M John White 30000 B005 10 Nov 60 SG37 Assistant F Ann Beech 12000 B003 24 Mar 58 SG14 Supervisor David M Ford 18000 B003 3 Jun 40 SG5 Assistant F Susan Brand 24000 B007

  24. Derived Horizontal Fragmentation Fragment S1 LName BranchNo StaffNo sex Salary FName Position DOB 10 Nov 60 SG37 Ann Beech Assistant F B003 12000 24 Mar 58 SG14 David Supervisor Ford M B003 18000 Fragment S2 LName BranchNo StaffNo sex Salary FName Position DOB SL21 1 Oct 93 John White Manager M B005 30000 Fragment S3 LName BranchNo StaffNo sex Salary FName Position DOB 3 Jun 40 SG5 Susan Brand Assistant F B007 24000

  25. Derived Horizontal Fragmentation • After we fragmented Staff, we found out that there is a table related to it, PropertyForRent. • Because Staff is now fragmented, it makes sense to fragment PropertyForRent too. Staff PropertyForRent handles 1 N S1:  BranchNo=‘B003’ (Staff) S2:  BranchNo=‘B005’ (Staff) Pi: PropertyForRent staffNo Si S3:  BranchNo=‘B007’ (Staff)

  26. Original PropertyForRent Table Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007 PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005

  27. Derived Horizontal Fragmentation Fragment P1 Type Rent PropertyNo Street City PostCode Rooms OwnerNo StaffNo BranchNo PG4 6 Lawrence Glasgow G11 9QX Flat 3 350 CO40 SG14 B003 PG21 18 Dale Rd Glasgow G12 House 5 600 CO87 SG37 B003 PG36 2 Manor Rd Glasgow G32 4QX Flat 3 375 CO93 SG37 B003 PG16 5 Novar Dr Glasgow G12 9AX Flat 4 450 CO93 SG14 B003 Fragment P2 PropertyNo Street City PostCode Type Rooms Rent OwnerNo StaffNo BranchNo PL94 6 Argy11 St London NW2 Flat 4 400 CO87 SL41 B005 Fragment P3 PropertyNo Street City PostCode Type Rooms Rent OwnerNo StaffNo BranchNo PA14 16 Holhead Aberdeen AB7 5SU House 6 650 CO46 SA9 B007

  28. Transparencies in a DDBMS • 4 main transparencies: • Distribution Transparency. • Fragmnetation. • Location. • Replication. • Local Mapping. • Naming. • Transaction Transparency. • Performance Transparency. • DBMS Transparency.

  29. 1. Distribution Transparency • Allows the user to perceive the DB as a single, logical entity. • Types: • Fragmentation:the user does not need to know the data is fragmented. • Location: the user does not need to know the location of fragments. • Replication: the user does not need to know the fragments are replicated. • Local Mapping: the user specifies the fragment and its location. • Naming:DDBMS makes sure every item name is unique. • Consider the distribution of the STAFF relation: • S1: staffno,Position,sex,DOB, Salary(STAFF) • S2: staffno,fname,lname,BranchNo(STAFF) • S21:  BranchNo=‘B003’ (S2) • S22:  BranchNo=‘B005’ (S2) • S22:  BranchNo=‘B007’ (S2)

  30. a. Fragmentation Transparency • Highest level of distribution transparency. • The user does not need to know that the data is fragmented. • User treats DDB like a centralized DB. • The database access are based on the global schema. • Fragmentation of the data can be changed without impacting the user. • Example: SELECTFname, Lname FROM Staff WHERE position = ‘Manager’;

  31. b. Location Transparency • The middle level of distribution transparency. • The user must know that the data is fragmented but still does not need to know the location of the data. • Data location can be changed without impact on the user. • Example: SELECT Fname, Lname FROMS21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’)

  32. c. Replication Transparency • User unaware of replication and location but knows that data is fragmented. • On the same level with location transparency.

  33. d. Local Mapping Transparency • The lowest level of distribution transparency. • The user knows that the data is fragmented and the location of the data. Example: SELECT Fname, Lname FROMS21AT SITE 3 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22AT SITE 5 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23AT SITE 7 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)

  34. e. Naming Transparency • Each item in distributed database must have a unique name. • DDBMS must ensure that no two sites violate that. • Solutions • Create a central name server. • Bottleneck. • against local autonomy. • Prefix an object with the identifier of the site. • loss of distribution transparency.

  35. 2. Transaction Transparency • All transactions must ensure the consistency and integrity of the DDB. • Each transaction that needs to access data in multiple sites is divided into multiple sub-transactions. • Even if transaction is split, atomicity has to be maintained.

  36. 3. Performance Transparency • DDBMS performs as if it were a centralized DBMS. • Should not suffer because it is distributed (network communication cost). • When a site issues a query, the system must figure out the fastest way of executing it. • Distributed Query Processor (DQP) must figure out: • Which fragment to access. • Which copy of fragment to access (if replication is used). • Where are the fragments.

  37. 3. Performance Transparency • Consider the following distributed DB: • Property(PropertyNo, city) 10,000 records in London • Client(ClientNo, maxPrice) 100,000 records in Glasgow • Viewing(PropertNo, ClientNo) 1,000,000 records in London • London site wants to list properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000. SELECT p.propertyNo FROM Property P INNER JOIN (Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo) ON p.propertyNo = v.propertyNo WHERE p.city = ‘Aberdeen’ AND c.maxprice > 200000;

  38. 3. Performance Transparency • After the query is issued, DDBMS must determine the most cost-effective strategy to execute the query. • Strategies: • Move Client table to London and process query there. • Move Property and Viewing relation to Glasgow and process query there then return result. • Join Property and Viewing at London, project only property number and client number and move result to Glasgow to join with clients with salary > 200,000 then return results. • Select clients at Glasgow with salary > 200000, move them to London and join with viewing and Aberdeen property.

  39. 4. DBMS Transparency • Hides the fact that different sites have different local DBMSs. • Heterogeneous DDBMSs.

  40. Date’s 12 Rules for a DDBMS • Local autonomy. • No reliance on a central site. • Continuous operation. • Location independence. • Fragmentation independence. • Replication independence. • Distributed query processing. • Distributed transaction processing. • Hardware independence. • Operating system independence. • Network independence. • Database independence.

More Related