530 likes | 728 Views
PMIT-6102 Advanced Database Systems. By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University. Lecture 06 Distributed Database Design. Outline. Distributed Database Design Distributed Design Problem Distributed Design Issues Fragmentation Data Allocation.
E N D
PMIT-6102Advanced Database Systems By- JesminAkhter Assistant Professor, IIT, Jahangirnagar University
Outline • Distributed Database Design • Distributed Design Problem • Distributed Design Issues • Fragmentation • Data Allocation
Distributed Design Problem • The design of a distributed computer system involves • making decisions on the placement of data and programs across the sites of a computer network • as well as possibly designing the network itself • In the case of distributed DBMSs, the distribution of applications involves two things: • distribution of the distributed DBMS software • distribution of the application programs that run on it. Are not significant problem • Assume a copy of the distributed DBMS software exists at each site where data are stored • Network has already been designed • We concentrate on distribution of data
The following set of interrelated questions covers the entire issue. Why fragment at all? How should we fragment? How much should we fragment? How to test correctness? How should we allocate? What is the necessary information for fragmentation and allocation? Distribution Design Issues
Reasons for Fragmentation • The important issue is the appropriate unit of distribution. • A relation is not a suitable unit, for a number of reasons. • First, application views are usually subsets of relations. • Therefore, the locality of accesses of applications is defined not on entire relations but on their subsets • Hence consider subsets of relations as distribution units.
Reasons for Fragmentation • The relation is not replicated and is stored at only one site, • results in an unnecessarily high volume of remote data accesses • The relation is replicated at all or some of the sites where the applications reside. • May has unnecessary replication, which causes problems in executing updates • may not be desirable if storage is limited. • Finally, the decomposition of a relation into fragments, each being treated as a unit, permits a number of transactions to execute concurrently. • Thus fragmentation typically increases the level of concurrency and therefore the system throughput.
Fragmentation Alternatives • Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones. • There are clearly two alternatives for this: • dividing it horizontally or dividing it vertically.
PROJ1 : projects with budgets less than $200,000 PROJ2 : projects with budgets greater than or equal to $200,000 P1 Instrumentation 150000 Montreal P2 Database Develop. 135000 New York Fragmentation Alternatives – Horizontal PROJ PNO PNAME BUDGET LOC P1 Instrumentation 150000 Montreal P2 Database Develop. 135000 New York P3 CAD/CAM 250000 New York P4 Maintenance 310000 Paris P5 CAD/CAM 500000 Boston PROJ1 PROJ2 LOC LOC PNO PNAME BUDGET PNO PNAME BUDGET P3 CAD/CAM 250000 New York P4 Maintenance 310000 Paris P5 CAD/CAM 500000 Boston Example of Horizontal Partitioning
PROJ1: information about project budgets PROJ2: information about project names and locations Fragmentation Alternatives – Vertical PROJ PNO PNAME BUDGET LOC P1 Instrumentation 150000 Montreal P2 Database Develop. 135000 New York P3 CAD/CAM 250000 New York P4 Maintenance 310000 Paris P5 CAD/CAM 500000 Boston PROJ1 PROJ2 PNO PNAME LOC PNO BUDGET P1 Instrumentation Montreal P1 150000 P2 135000 P2 Database Develop. New York P3 CAD/CAM New York P3 250000 P4 310000 P4 Maintenance Paris P5 CAD/CAM Boston P5 500000 Example of Vertical Partitioning
Correctness of Fragmentation • Completeness • Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri • This property, which is identical to the lossless decomposition property of normalization • it ensures that the data in a global relation are mapped into fragments without any loss
Correctness of Fragmentation • Reconstruction • If relation R is decomposed into fragments FR={R1, R2, ..., Rn}, then there should exist some relational operator ∇such that R = ∇ Ri , • The reconstructability of the relation from its fragments ensures that constraints defined on the data in the form of dependencies are preserved.
Correctness of Fragmentation • Disjointness • If relation R is horizontally decomposed into fragments FR={R1, R2, ..., Rn}, and data item djis in Rj, then djshould not be in any other fragment Rk (k ≠ j ). • This criterion ensures that the horizontal fragments are disjoint. • If relation R is vertically decomposed, its primary key attributes are typically repeated in all its fragments (for reconstruction). • Therefore, in case of vertical partitioning, disjointness is defined only on the non-primary key attributes of a relation.
Allocation Alternatives • Non-replicated • partitioned : each fragment resides at only one site • Replicated • fully replicated : each fragment at each site • partially replicated : each fragment at some of the sites
Information Requirements • The information needed for distribution design can be divided into four categories: • Database information • Application information • Communication network information • Computer system information
Fragmentation • Horizontal Fragmentation (HF) • Vertical Fragmentation (VF) • Hybrid Fragmentation (HF)
Horizontal Fragmentation (HF) • Horizontal fragmentation partitions a relation along its tuples. • each fragment has a subset of the tuples of the relation. • There are two versions of horizontal partitioning: • Primary horizontal fragmentation • Derived horizontal fragmentation • Primary horizontal fragmentation of a relation is performed • using predicates that are defined on that relation. • Derived horizontal fragmentation is the partitioning of a relation • results from predicates being defined on another relation.
Database Information how the database relations are connected to one another, especially with joins. In the relational model, these relationships are also depicted as relations. cardinality of each relation: card(R) Information Requirements of HF SKILL TITLE, SAL L1 PROJ EMP ENO, ENAME, TITLE PNO, PNAME, BUDGET, LOC L2 L3 ASG ENO, PNO, RESP, DUR Expression of Relationships Among Relations Using Links
Information Requirements of HF • Application Information • simple predicates: Given R[A1, A2, …, An], a simple predicate pj is pj : AiθValue where θ {=,<,≤,>,≥,≠}, Value Diand Di is the domain of Ai. For relation R we define Pr= {p1, p2, …,pm} Example : PNAME = "Maintenance" BUDGET ≤ 200000
Information Requirements of HF • Application Information • mintermpredicates: Given R and Pr = {p1, p2, …,pm} define M = {m1,m2,…,mr} as M = { mi | mi= pjPrpj* }, 1≤j≤m, 1≤i≤z wherepj* = pj orpj* = ¬(pj). Example m1: PNAME="Maintenance" BUDGET≤200000 m2: NOT(PNAME="Maintenance") BUDGET≤200000 m3: PNAME= "Maintenance" NOT(BUDGET≤200000) m4: NOT(PNAME="Maintenance")NOT(BUDGET≤200000)
Information Requirements of HF • Application Information • In terms of quantitative information about user applications, we need to have two sets of data. • mintermselectivities: sel(mi) • The number of tuples of the relation that would be accessed by a user query which is specified according to a given minterm predicate mi. • access frequencies: acc(qi) • The frequency with which a user application qi accesses data. • Access frequency for a minterm predicate can also be defined.
Primary Horizontal Fragmentation • Definition: A primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema. Therefore, given relation R, its horizontal fragments are given by Ri= Fi(R), 1 ≤ i≤ w • where Fiis a selection formula used to obtain fragment Ri. If Fi is in conjunctive normal form, it is a minterm predicate (mi). • A horizontal fragment Riof relation R consists of all the tuples of R which satisfy a minterm predicate mi. • Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates. • Set of horizontal fragments also referred to as minterm fragments.
Primary Horizontal Fragmentation • PROJ1 = LOC=“Montreal” (PROJ) • PROJ2 = LOC=“New York” (PROJ) • PROJ3 = LOC=“Paris” (PROJ) Primary Horizontal Fragmentation of Relation PROJ
PHF – Algorithm Given: A relation R, the set of simple predicates Pr Output: The set of fragments of R = {R1,R2,…,Rw} which obey the fragmentation rules. Preliminaries : • Pr should be complete • Pr should be minimal
Completeness of Simple Predicates • A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to Pr. • Example : • Assume PROJ[PNO,PNAME,BUDGET,LOC] has two applications defined on it. • Find the budgets of projects at each location. (1) • Find projects with budgets less than $200000. (2)
Completeness of Simple Predicates According to (1), Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”} which is not complete with respect to (2). Modify Pr ={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”, BUDGET≤200000,BUDGET>200000} which is complete.
Minimality of Simple Predicates • If a predicate influences how fragmentation is performed, (i.e., causes a fragment f to be further fragmented into, say, fiandfj) then there should be at least one application that accesses fi andfj differently. • In other words, the simple predicate should be relevant in determining a fragmentation. • If all the predicates of a set Pr are relevant, then Pr is minimal. • Pi is relevant if and only if • acc(mi) the access frequency of a mintermmi.
Minimality of Simple Predicates Example : Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”, BUDGET≤200000,BUDGET>200000} is minimal (in addition to being complete). However, if we add PNAME = “Instrumentation” then Pr is not minimal.
COM_MIN Algorithm Given: a relation R and a set of simple predicates Pr Output: a complete and minimal set of simple predicates Pr' for Pr Rule 1: a relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.
COM_MIN Algorithm • Initialization : • find a pi Prsuch that pi partitions R according to Rule 1 • set Pr' = pi ; PrPr– {pi}; F {fi} • Iteratively add predicates to Pr' until it is complete • find a pj Prsuch that pj partitions some fk defined according to minterm predicate over Pr' according to Rule 1 • set Pr' = Pr' {pj}; Pr Pr– {pj}; F F {fj} • if pk Pr' which is nonrelevant then Pr'Pr' – {pk} FF – {fk}
PHORIZONTAL Algorithm Makes use of COM_MIN to perform fragmentation. Input: a relation R and a set of simple predicates Pr Output: a set of minterm predicates M according to which relation R is to be fragmented • Pr' COM_MIN (R,Pr) • determine the set M of mintermpredicates • determine the set I of implications among piPr‘ • Iteratively eliminate the contradictory minterms from M • M M-mi
PHF – Example • Two candidate relations : PAY and PROJ. • Fragmentation of relation PAY • Application: Check the salary info and determine raise. • Employee records kept at two sites application run at two sites • Simple predicates p1 : SAL ≤ 30000 p2 : SAL > 30000 Pr = {p1,p2} which is complete and minimal Pr'=Pr • Minterm predicates m1 : (SAL ≤ 30000) m2 : NOT(SAL ≤ 30000) = (SAL > 30000)
SAL TITLE Mech. Eng. 27000 Programmer 24000 PHF – Example PAY1 PAY2 SAL TITLE Elect. Eng. 40000 Syst. Anal. 34000
PHF – Example • Fragmentation of relation PROJ • Applications: • Find the name and budget of projects given their location. • Issued at three sites • Access project information according to budget • one site accesses ≤200000 other accesses >200000 • Simple predicates • For application (1) p1 : LOC = “Montreal” p2 : LOC = “New York” p3 : LOC = “Paris” • For application (2) p4 : BUDGET ≤ 200000 p5 : BUDGET > 200000 • Pr = Pr' = {p1,p2,p3,p4,p5}
Fragmentation of relation PROJ continued Minterm fragments left after elimination m1 : (LOC = “Montreal”) (BUDGET ≤ 200000) m2 : (LOC = “Montreal”) (BUDGET > 200000) m3 : (LOC = “New York”) (BUDGET ≤ 200000) m4 : (LOC = “New York”) (BUDGET > 200000) m5 : (LOC = “Paris”) (BUDGET ≤ 200000) m6 : (LOC = “Paris”) (BUDGET > 200000) PHF – Example
PNO PNAME BUDGET LOC PHF – Example The result of the primary horizontal fragmentation of PROJ forms six fragments FPROJ = {PROJ1, PROJ2, PROJ3, PROJ4, PROJ5, PROJ} according to the minterm predicates M . Since fragments PROJ2, and PROJ5 are empty, they are not depicted in Figure PROJ3 PROJ1 PNO PNAME BUDGET LOC PNO PNAME BUDGET LOC New York P2 Database Develop. P1 Instrumentation 150000 Montreal 135000 PROJ4 PROJ6 P3 CAD/CAM 250000 New York P4 Maintenance 310000 Paris PNO PNAME BUDGET LOC
Completeness Since Pr' is complete and minimal, the selection predicates are complete Reconstruction If relation R is fragmented into FR= {R1,R2,…,Rr} R = RiFRRi Disjointness Minterm predicates that form the basis of fragmentation should be mutually exclusive. PHF – Correctness
Derived Horizontal Fragmentation • Defined on a member relation of a link according to a selection operation specified on its owner. • Each link is an equijoin. • Equijoin can be implemented by means of semijoins. SKILL TITLE, SAL L1 EMP PROJ ENO, ENAME, TITLE PNO, PNAME, BUDGET, LOC L2 L3 ASG ENO, PNO, RESP, DUR
DHF – Definition Given a link L where owner(L)=S and member(L)=R, the derived horizontal fragments of R are defined as Ri = R ⋉Si, 1≤i≤w where w is the maximum number of fragments that will be defined on R and Si= Fi(S) where Fi is the formula according to which the primary horizontal fragment Si is defined.
Given link L1 where owner(L1)=SKILL and member(L1)=EMP EMP1 = EMP ⋉SKILL1 EMP2 = EMP ⋉SKILL2 where SKILL1 = SAL≤30000(SKILL) SKILL2 = SAL>30000(SKILL) EMP1 ENO ENAME TITLE E3 A. Lee Mech. Eng. E2 M. Smith Syst. Anal. E4 J. Miller Programmer E5 B. Casey Syst. Anal. E7 R. Davis Mech. Eng. E6 L. Chu Elect. Eng. E8 J. Jones Syst. Anal. DHF – Example EMP2 ENO ENAME TITLE E1 J. Doe Elect. Eng.
DHF – Correctness • Completeness • Referential integrity • ensures that the tuples of any fragment of the member relation are also in the owner relation. • Let R be the member relation of a link whose owner is relation S which is fragmented as FS= {S1, S2, ..., Sn}. Furthermore, let A be the join attribute between R and S. Then, for each tuplet of R, there should be a tuplet' of S such that t[A] = t' [A] • Reconstruction • Let a relation R with fragmentation FR = { R1,R2,……Rw } • Disjointness • Simple join graphs between the owner and the member fragments.
More difficult than horizontal, because more alternatives exist. Example: In horizontal partitioning, if the total number of simple predicates in Pr is n, there are 2n possible minterm predicates that can be defined on it. some of these will contradict the existing implications, further reducing the candidate fragments that need to be considered In the case of vertical partitioning if a relation has m non-primary key attributes, the number of possible fragments is equal to B(m), which is the mth Bell number. For large values of m;B(m)= approximately (mm) for m=10, B(m) =115,000, for m=15, B(m) =109, for m=30, B(m) = 1023 Vertical Fragmentation
Hybrid Fragmentation • In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications. • In this case a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree structured Partitioning. • Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation. • It has also been named mixed fragmentation or nested fragmentation.
Hybrid Fragmentation It is also called mixed fragmentation or nested fragmentation. R HF HF R1 R2 VF VF VF VF VF R11 R12 R21 R22 R23
Correctness of Hybrid Fragmentation • To reconstruct the original global relation in case of hybrid fragmentation, one starts at the leaves of the partitioning tree and moves upward by performing joins and unions. • The fragmentation is complete if the intermediate and leaf fragments are complete. • Similarly, disjointness is guaranteed if intermediate and leaf fragments are disjoint.
Allocation • Allocation Problem Given F = {F1, F2, …, Fn} fragments S ={S1, S2, …, Sm} network sites on which a set of applications Q= {q1, q2,…, qq} is running. • The allocation problem involves finding the “optimal” distribution of F to S. • Optimality can be defined with respect to two measures: • Minimal cost • The cost function consists of the cost of storing each Fi at a site Sj, • the cost of querying Fi at site Sj , the cost of updating Fi at all sites where it is stored, • the cost of data communication. • Performance • minimize the response time. • maximize the system throughput at each site.
General Form min(Total Cost) subject to response time constraint storage constraint processing constraint Decision Variable Allocation Model 1 if fragment Fi is stored at site Sj xij 0 otherwise
Total Cost Storage Cost (of fragment Fjat Sk) We choose a different approach in our model of the database allocation problem (DAP) and specify it as consisting of the processing cost (PC) and the transmission cost (TC). Thus the query processing cost (QPC) for application qi is: processing component + transmission component Allocation Model query processing cost (unit storage cost at Sk) (size of Fj) xjk all queries cost of storing a fragment at a site all sites all fragments
Allocation Model • Query Processing Cost • Processing component PC, consists of three cost factors • the access cost (AC) + the integrity enforcement cost (IE) + the concurrency control cost (CC) • Access cost • The first two terms calculate the number of accesses of user query qi to fragment Fj. • We assume that the local costs of processing them are identical. • The summation gives the total number of accesses for all the fragments referenced by qi. Multiplication by LPCk gives the cost of this access at site Sk. • We again use xijto select only those cost values for the sites where fragments are stored. • Integrity enforcement and concurrency control costs • Can be similarly calculated xij local processing cost at a site (no. of update accesses+ no. of read accesses) all sites all fragments