1 / 40

Distributed DBMSs – Concepts and Design

Distributed DBMSs – Concepts and Design. Chapter 24 in Textbook. Overview. Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous. Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages. DDBMS Design. Fragmentation. Replication.

feng
Download Presentation

Distributed DBMSs – Concepts and Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed DBMSs – Concepts and Design Chapter 24 in Textbook

  2. Overview • Concepts. • What is a distributed DBMS? • Distributed Processing. • Homogeneous vs. Heterogeneous. • Functions of a DDBMS. • Components of a DDBMS. • Advantages and Disadvantages. • DDBMS Design. • Fragmentation. • Replication. • Allocation. • DDBMS Transparencies. • Date’s 12 Rules for a DDBMS.

  3. Concepts • Centralized DBMS systems with a single logical database located at one site under the control of a single DBMS. • Distributed DBslogically interrelated collection of shared data physically distributed over a computer network. • Applications can be classified into: • Local applications. • Global applications.

  4. Distributed DBMS • Distributed DBMS The software system that: • manages the distributed DBs. • makes distribution transparent to users. • allows users to access data on their own site as well as remote sites. • Transparent distribution is the fundamental principle of DDBMS.

  5. Characteristics of DDBMS • A collection of logically related shared data. • The data is split into a number of fragments. • Fragments may be replicated. • Fragments/replicas are allocated to sites. • The sites are linked by a communications networks. • The data at each site is under the control of a DBMS. • The DBMS at each site can handle local applications. • Each DBMS participates in at least one global application.

  6. Distributed DBMS Topology Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is distributed and access to it can be local or remote.

  7. Distributed Processing Site 1 Site 4 Site 2 Computer Network Site 3 Data itself is centralized but access to it can be local or remote.

  8. Homogeneous vs. Heterogeneous DDBMS • Homogenous system: all sites use the same DBMS product. • Heterogeneous system: sites may run different DBMS products & data model. • Possible differences between data in different DBS: • Data type difference. • Value difference. • Semantic difference.

  9. Functions of a DDBMS • Provide access to remote sites and allow transfer of queries & data among the network’s site. • Store data distribution details. • Distributed data processing. • Security control. • Concurrency control. • Recovery services.

  10. Components of a DDBMS Global system catalog Site 1 DDBMS DC LDBMS GSC DDBMS DC Computer Network GSC DB Data communication component Site 3

  11. Advantages of DDBMS • Reflects organizational structure. • Improve sharability & local autonomy. • Improved availability. • Improved reliability. • Improved performance.

  12. Disadvantages of DDBMS • Complexity. • Cost. • Security. • Integrity control more difficult. • Lack of standards. • Lack of experience. • DB design more complex.

  13. Distributed Relational DB Design • We have a group of tables and we want to distribute them between a group of sites. • Consists of 3 major steps: • Fragmentation divide a relation into a number of sub-relations (fragments). (Horizontal & vertical). • Replication make a copy of a fragment. • Allocation decide where (which site) each of the fragments and replicas are to be stored.

  14. Distributed Relational DB Design • When we fragment, replicate and allocate, we try to achieve: • Locality of reference. • Improved reliability and availability. • Good performance. • Balanced storage capacities and costs. • Minimal communication costs.

  15. Rules of Fragmentation • Completeness: Nothing (rows or columns) gets lost while we fragment. • Reconstruction: We can get back the original table after we fragmented it. • Dis-jointness: No row or column appears in 2 fragments (there is 1 exception).

  16. Types of Fragmentation Horizontal fragmentation Vertical fragmentation Mixed fragmentation

  17. Original PropertyForRent Table

  18. Horizontal Fragmentation Based on type of property. P1: Type=‘House’ (PropertyForRent) P2: Type=‘Flat’ (PropertyForRent) Fragment P1 Fragment P2 BranchNo

  19. Original Staff Table

  20. Vertical Fragmentation S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffno,fname,lname,BranchNo(STAFF) Fragment S2 Fragment S1

  21. Mixed Fragmentation – Vertical then Horizontal Fragment S2.1 S1: staffno,Position,sex,DOB, Salary(STAFF) S2: staffoo,fname,lname,BranchNo(STAFF) StaffNo LName BranchNo FName S2.1:  BranchNo=‘B005’ (S2) S2.2:  BranchNo=‘B003’ (S2) S2.3:  BranchNo=‘B007’ (S2) Fragment S2.2 StaffNo LName BranchNo Fragment S1 Fragment S2.3 LName BranchNo FName

  22. Derived Horizontal Fragmentation • Derived Horizontal Fragmentation is the horizontal fragmentation of a table (child), T1, because we horizontally fragmented another related table (parent), T2. • It is not explicitly specified in design but implied from fragmentation of T2. • T1 (child) has a foreign key that belongs to T2 (parent). • Relationship between T1 and T2 either 1-to-1 or Many-to-1. • Use Semi-join operation: 

  23. Derived Horizontal Fragmentation • You were required by the design to horizontally fragment Staff table. • S1:  BranchNo=‘B003’ (Staff) • S2:  BranchNo=‘B005’ (Staff) • S3:  BranchNo=‘B007’ (Staff)

  24. Derived Horizontal Fragmentation Fragment S1 Fragment S2 Fragment S3

  25. Derived Horizontal Fragmentation • After we fragmented Staff, we found out that there is a table related to it, PropertyForRent. • Because Staff is now fragmented, it makes sense to fragment PropertyForRent too. Staff PropertyForRent handles 1 N S1:  BranchNo=‘B003’ (Staff) S2:  BranchNo=‘B005’ (Staff) Pi: PropertyForRent staffNo Si S3:  BranchNo=‘B007’ (Staff)

  26. Original PropertyForRent Table

  27. Derived Horizontal Fragmentation Fragment P1 Fragment P2 Fragment P3

  28. Transparencies in a DDBMS • 4 main transparencies: • Distribution Transparency. • Fragmnetation. • Location. • Replication. • Local Mapping. • Naming. • Transaction Transparency. • Performance Transparency. • DBMS Transparency.

  29. 1. Distribution Transparency • Allows the user to perceive the DB as a single, logical entity. • Types: • Fragmentation:the user does not need to know the data is fragmented. • Location: the user does not need to know the location of fragments. • Replication: the user does not need to know the fragments are replicated. • Local Mapping: the user specifies the fragment and its location. • Naming:DDBMS makes sure every item name is unique. • Consider the distribution of the STAFF relation: • S1: staffno,Position,sex,DOB, Salary(STAFF) • S2: staffno,fname,lname,BranchNo(STAFF) • S21:  BranchNo=‘B003’ (S2) • S22:  BranchNo=‘B005’ (S2) • S22:  BranchNo=‘B007’ (S2)

  30. a. Fragmentation Transparency • Highest level of distribution transparency. • The user does not need to know that the data is fragmented. • User treats DDB like a centralized DB. • The database access are based on the global schema. • Fragmentation of the data can be changed without impacting the user. • Example: SELECTFname, Lname FROM Staff WHERE position = ‘Manager’;

  31. b. Location Transparency • The middle level of distribution transparency. • The user must know that the data is fragmented but still does not need to know the location of the data. • Data location can be changed without impact on the user. • Example: SELECT Fname, Lname FROMS21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’)

  32. c. Replication Transparency • User unaware of replication and location but knows that data is fragmented. • On the same level with location transparency.

  33. d. Local Mapping Transparency • The lowest level of distribution transparency. • The user knows that the data is fragmented and the location of the data. Example: SELECT Fname, Lname FROMS21AT SITE 3 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS22AT SITE 5 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’) UNION SELECT Fname, Lname FROMS23AT SITE 7 WHERE staffNo IN (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)

  34. e. Naming Transparency • Each item in distributed database must have a unique name. • DDBMS must ensure that no two sites violate that. • Solutions • Create a central name server. • Bottleneck. • against local autonomy. • Prefix an object with the identifier of the site. • loss of distribution transparency.

  35. 2. Transaction Transparency • All transactions must ensure the consistency and integrity of the DDB. • Each transaction that needs to access data in multiple sites is divided into multiple sub-transactions. • Even if transaction is split, atomicity has to be maintained.

  36. 3. Performance Transparency • DDBMS performs as if it were a centralized DBMS. • Should not suffer because it is distributed (network communication cost). • When a site issues a query, the system must figure out the fastest way of executing it. • Distributed Query Processor (DQP) must figure out: • Which fragment to access. • Which copy of fragment to access (if replication is used). • Where are the fragments.

  37. 3. Performance Transparency • Consider the following distributed DB: • Property(PropertyNo, city) 10,000 records in London • Client(ClientNo, maxPrice) 100,000 records in Glasgow • Viewing(PropertNo, ClientNo) 1,000,000 records in London • London site wants to list properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000. SELECTp.propertyNo FROM Property P INNER JOIN (Client c INNER JOIN Viewing v ONc.clientNo = v.clientNo) ONp.propertyNo = v.propertyNo WHEREp.city = ‘Aberdeen’ AND c.maxprice > 200000;

  38. 3. Performance Transparency • After the query is issued, DDBMS must determine the most cost-effective strategy to execute the query. • Strategies: • Move Client table to London and process query there. • Move Property and Viewing relation to Glasgow and process query there then return result. • Join Property and Viewing at London, project only property number and client number and move result to Glasgow to join with clients with maxPrice> 200,000 then return results. • Select clients at Glasgow with salary > 200000, move them to London and join with viewing and Aberdeen property.

  39. 4. DBMS Transparency • Hides the fact that different sites have different local DBMSs. • Heterogeneous DDBMSs.

  40. Date’s 12 Rules for a DDBMS • Local autonomy. • No reliance on a central site. • Continuous operation. • Location independence. • Fragmentation independence. • Replication independence. • Distributed query processing. • Distributed transaction processing. • Hardware independence. • Operating system independence. • Network independence. • Database independence.

More Related