PMI T-6101 Advanced Database Systems

PMIT-6101Advanced Database Systems By- JesminAkhter Assistant Professor, IIT, Jahangirnagar University

Lecture 07 Distributed Database Design

Outline • Distributed Database Design • Distributed Design Issues • Data Allocation Model

A vertical fragmentation of a relation R produces fragments R1, R2,…. Rr, each of which contains a subset of R’s attributes as well as the primary key of R. The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the user applications will run on only one fragment. In this context, an “optimal” fragmentation is one that produces a fragmentation scheme which minimizes the execution time of user applications that run on these fragments. Vertical Fragmentation

More difficult than horizontal, because more alternatives exist. Example: In horizontal partitioning, if the total number of simple predicates in Pr is n, there are 2n possible minterm predicates that can be defined on it. some of these will contradict the existing implications, further reducing the candidate fragments that need to be considered In the case of vertical partitioning if a relation has m non-primary key attributes, the number of possible fragments is equal to B(m), which is the mth Bell number. For large values of m;B(m)= approximately (mm) for m=10, B(m) =115,000, for m=15, B(m) =109, for m=30, B(m) = 1023 Vertical Fragmentation

Vertical Fragmentation • Two types of heuristic approaches exist for the vertical fragmentation of global relations: • Grouping: starts by assigning each attribute to one fragment, and at each step, joins some of the fragments until some criteria is satisfied. • Grouping was first suggested for centralized databases [Hammer and Niamir, 1979], and was used later for distributed databases [Sacca and Wiederhold, 1985]. • Splitting: starts with a relation and decides on beneficial partitionings based on the access behavior of applications to the attributes. • The technique was also first discussed for centralized database design [Hoffer and Severance,1975]. It was then extended to the distributed environment [Navathe et al.,1984].

Hybrid Fragmentation • In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications. • In this case a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree structured Partitioning. • Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation. • It has also been named mixed fragmentation or nested fragmentation.

Hybrid Fragmentation It is also called mixed fragmentation or nested fragmentation. R HF HF R1 R2 VF VF VF VF VF R11 R12 R21 R22 R23

Correctness of Hybrid Fragmentation • To reconstruct the original global relation in case of hybrid fragmentation, one starts at the leaves of the partitioning tree and moves upward by performing joins and unions. • The fragmentation is complete if the intermediate and leaf fragments are complete. • Similarly, disjointness is guaranteed if intermediate and leaf fragments are disjoint.

Allocation • Allocation Problem Given F = {F1, F2, …, Fn} fragments S ={S1, S2, …, Sm} network sites on which a set of applications Q= {q1, q2,…, qq} is running. • The allocation problem involves finding the “optimal” distribution of F to S. • Optimality can be defined with respect to two measures: • Minimal cost • The cost function consists of the cost of storing each Fi at a site Sj, • the cost of querying Fi at site Sj , the cost of updating Fi at all sites where it is stored, • the cost of data communication. • Performance • minimize the response time. • maximize the system throughput at each site.

General Form min(Total Cost) subject to response time constraint storage constraint processing constraint Decision Variable Allocation Model 1 if fragment Fi is stored at site Sj xij 0 otherwise

Total Cost Storage Cost (of fragment Fjat Sk) We choose a different approach in our model of the database allocation problem (DAP) and specify it as consisting of the processing cost (PC) and the transmission cost (TC). Thus the query processing cost (QPC) for application qi is: processing component + transmission component Allocation Model  query processing cost  (unit storage cost at Sk) (size of Fj) xjk all queries   cost of storing a fragment at a site all sites all fragments

Allocation Model • Query Processing Cost • Processing component PC, consists of three cost factors • the access cost (AC) + the integrity enforcement cost (IE) + the concurrency control cost (CC) • Access cost • The first two terms calculate the number of accesses of user query qi to fragment Fj. • We assume that the local costs of processing them are identical. • The summation gives the total number of accesses for all the fragments referenced by qi. Multiplication by LPCk gives the cost of this access at site Sk. • We again use xijto select only those cost values for the sites where fragments are stored. • Integrity enforcement and concurrency control costs • Can be similarly calculated   xij local processing cost at a site (no. of update accesses+ no. of read accesses)  all sites all fragments

Query Processing Cost Transmission component cost of processing updates + cost of processing retrievals In update queries it is necessary to inform all the sites where replicas exist, while in retrieval queries, it is sufficient to access only one of the copies. In addition, at the end of an update request, there is no data transmission back to the originating site other than a confirmation message, whereas the retrieval-only queries may result in significant data transmission. Cost of updates Retrieval Cost Allocation Model     update message cost  acknowledgment cost all sites all fragments all sites all fragments  min (cost of retrieval command  cost of sending back the result) all sites all fragments

Allocation Model • Constraints • Response Time execution time of query ≤ max. allowable response time for that query • Storage Constraint (for a site) • Processing constraint (for a site)  storage requirement of a fragment at that site  storage capacity at that site all fragments  processing load of a query at that site  processing capacity of that site all queries

Solved Problem

Sample Questions • Lecture – 1: • Q-1: What is the distributed Database? • Q-2: What is not distributed Database? • Q-3: Problems area and Disadvantages of Distributed Database System? • Q-4: Distributed Database reality or Real view of Distributed DBMS? • Q-5: Implicit assumptions, application and Promises of Distributed Database Management system? • Q-6: Network Transparency, Replication Transparency, Data independence? • Q-7: Network Transparency • Location Transparency • Naming Transparency (Slide 32, 33)? • Q-8: Logical Data Independency, Physical Data Independency? • Q-9: Fully Transparent Access (slide 28, Lec-1)? • Q-10: Improve Performance, Reliability of distributed Database (slide 38, 39)? • Q-11: What is being distributed? • Q-12 What do you mean by distributed processing system?

Sample Questions • Lecture 2 – 3: • Normalization (all slides) • Relational Algebra (only algebra) • Lecture -4: • Reasons for Fragmentation • Slides, 6, 7, 9, 10 • Lecture -5: • Degree of Fragmentation • Slides 4, 5, 6, 7 • Comparisons of Relocation Alternatives (Slide 9) • Database Information (Slide 13) • Application Information (Slide 17, 18, 19) • Algorithm (Slides 25, 26, 27, 28, 29) • Example solving (Slides 34, 38) • Lecture -6+7: • Important aspect or desirable property of simple predicates: • Slides 7, 8, 10, 11, 12, 13, 14, 16, 22-25 • Problems solving (Slides 18, 19, 20) • Slides: 33-37(Allocation Model)

Thank You Slide 20

PMI T-6101 Advanced Database Systems

PMI T-6101 Advanced Database Systems

Presentation Transcript

INF5100 Advanced Database Systems

CS216 Advanced Database Systems

Advanced Database Systems

COP5725 Advanced Database Systems

COP5725 Advanced Database Systems

COP5725 Advanced Database Systems

COP5725 Advanced Database Systems

Advanced Database Systems

COP5725 Advanced Database Systems

Advanced Database Systems

Advanced Database Systems

Advanced Database Systems

CPS216: Advanced Database Systems

CPS216: Advanced Database Systems

Advanced Database Systems

Advanced Database Systems

CS216 Advanced Database Systems

CPS216: Advanced Database Systems

Advanced Database Systems

ระบบฐานข้อมูลขั้นสูง (Advanced Database Systems)

CPS216: Advanced Database Systems

Advanced Database Systems