670 likes | 844 Views
ICS 214B: Transaction Processing and Distributed Data Management. Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li. Which simple predicates should we use in Pr?. Desired property of Pr: - minimality - uniformity. Return to example:. E(#, NM, LOC, SAL,…)
E N D
ICS 214B: Transaction Processing and Distributed Data Management Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li
Which simple predicates should we use in Pr? Desired property of Pr: - minimality - uniformity Notes 09
Return to example: E(#, NM, LOC, SAL,…) Common queries: Qa: select * Qb: select * from E from E where LOC=Sa where LOC=Sb and … and ... Notes 09
Three choices: (1) Pr = { } F1 ={ E } (2) Pr = {LOC=Sa, LOC=Sb} F2={ loc=Sa E, loc=Sb E } (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} F3={ loc=Sa sal<10 E, loc=Sa sal10 E, loc=Sb sal<10E, loc=Sb sal10 E } Notes 09
F2 is good… (not F1 , F3 ) In other words: Qa: Select … loc = Sa ... Loc=Sa sal < 10 Qb: Select … loc = Sb ... Loc=Sa sal 10 F3 F1 F2 Loc=Sb sal < 10 Loc=Sb sal 10 Notes 09
Informal definition Set of predicates Pr is uniform if: for every Fi F[Pr], every t Fi has equal probability of access by every major application. Note: F[Pr] is fragmentation defined by minterm predicates generated by Pr. Notes 09
Back to example: Qa: Select … loc = Sa ... Loc=Sa sal < 10 tuples here have higher probability of access Qb: Select … loc = Sb ... Loc=Sa sal 10 F1 Loc=Sb sal < 10 tuples here have lower probability of access Loc=Sb sal 10 so F1 is not “good”... Notes 09
Back to example: Qa: Select … loc = Sa ... Loc=Sa sal < 10 tuples here have same probability of access Qb: Select … loc = Sb ... Loc=Sa sal 10 F2 Loc=Sb sal < 10 so F2 is “good”... so is F3 ... Loc=Sb sal 10 Notes 09
Informal definition Set of predicates Pr is minimal if no Pr’ Pr is uniform Notes 09
Back to example: uniform? (1) Pr = { } N (2) Pr = {LOC=Sa, LOC=Sb} Y (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} N Pr(2) is a subset of Pr(3), so Pr(3) is not minimal... Notes 09
Is Pr uniform and minimal a good thing? Not necessarily! But it does simplify allocation problem... Notes 09
(Owner) (Member) Derived horizontal fragmentation E(ENO, NAME, SAL, LOC) J(ENO, DESCRIPTION,…) E F={ E1, E2} by LOC Common query: Given an employee name, list projects (s)he works in Notes 09
E2 E1 (at Sa) (at Sb) J Notes 09
J1 = J E1 J2 = J E2 E2 E1 (at Sa) (at Sb) J2 J1 Notes 09
Derived horizontal fragmentation R, F = { F1, F2, ... Fn} S, D = {D1, D2, …Dn} where Di =S Fi Convention: R is owner S is member F could be primary or derived Notes 09
Checking completeness and disjointness of derived fragmentation But no #= 33in E1 nor in E2! Example: Say J is: This J tuple will not be in J1 nor J2 Fragmentation not complete Notes 09
To get completeness: Need to enforce referential integrity constraint: join attr(#) of member relation join attr(#) of owner relation Notes 09
Example: E2 E1 Fragmentation is not disjoint! J J1 J2 Notes 09
To get disjointness: Join attribute(#) should be key of owner relation Notes 09
Summary: horizontal fragmentation • Type: primary, derived • Properties: completeness, disjointness • Predicates: minimal, uniform Notes 09
Vertical fragmentation Example: E E2 E1 Notes 09
R[T] R1[T1] Ti T Rn[Tn] Just like normalization of relations ... Notes 09
Properties: R[T] Ri[Ti] (1) Completeness U Ti = T all i Notes 09
Not a desirable property!! (could not reconstruct R!) (2) Disjointness Ti Tj = for all i,j ij E(#,LOC,SAL) E1(#,LOC) E2(SAL) Notes 09
(3) Reconstruction: Lossless join Ri = R all i One way to achieve lossless join: Repeat key in all fragments, i.e., Key Ti for all i Notes 09
Hybrid Fragmentation R Horizontal R2 R1 Vertical R22 R12 R21 R11 Notes 09
Hybrid Fragmentation -- Reconstruction U Horizontal Vertical R22 R12 R21 R11 Notes 09
Allocation Example: E(#,NM,LOC,SAL) F1 = loc=Sa E ; F2 = loc=Sb E Qa: select … where loc=Sa... Qb: select … where loc=Sb… Where do F1,F2 go? Site b Site a ? Notes 09
Issues • Where do queries originate? • What is communication cost? and size of answers, relations,… • What is storage capacity, cost at sites? and size of fragments? • What is processing power at sites? • What is query processing strategy? • How are joins done? • Where are answers collected? Notes 09
Do we replicate fragments? • Cost of updating copies? • Writes and concurrency control? • ... Notes 09
This is an incredibly hard problem Optimization problem: • What is best placement of fragments and/or best number of copies to: • minimize query response time • maximize throughput • minimize “some cost” • ... • Subject to constraints? • Available storage • Available bandwidth, power,… • Keep 90% of response time below X • ... • Often, can use common sense • Place fragments where they are most heavily accessed Notes 09
Summary • Horizontal and vertical fragmentation • Designing good fragmentations and allocation Next: • Query processing in distributed databases Notes 09
(1) Decomposition Algebraic query tree on relations (2) Localization Algebraic query tree on relation fragments (3) Optimization Query Query Plan Notes 09
Decomposition • Same as in centralized system • Normalization • Eliminating redundancy • Algebraic rewriting Notes 09
Normalization • Convert from query language to relational algebra Notes 09
(R.A = S.A) R.A,S.D (R.B=1 S.C=2) R S Example SELECT R.A, S.D FROM R, S WHERE (R.B=1 and S.C=2) and (R.A = S.A) Notes 09
Eliminate redundancy E.g.: in conditions: (S.A=1) (S.A>5) False (S.A<10) (S.A<5) S.A<5 Notes 09
E.g.: Common sub-expressions U U S condcondTS condT R R R Notes 09
Algebraic rewriting E.g.: Push conditions down cond3 cond cond1cond2 R S R S Notes 09
(1) Decomposition Algebraic query tree on relations (2) Localization Algebraic query tree on relation fragments Query Notes 09
Localization steps (1) Start with query tree (2) Replace relations by fragments (3) Push : up , : down (4) Simplify – eliminate unnecessary operations Notes 09
Notation for fragment [R: cond] fragment conditions its tuples satisfy Notes 09
Example A (1) E=3 R Notes 09
(2) E=3 [R1: E < 10] [R2: E 10] Notes 09
Ø (3) E=3 E=3 [R1: E < 10] [R2: E 10] Notes 09
(4) E=3 [R1: E < 10] Notes 09
Rule 1 C1[R: c2] C1[R: c1 c2] [R: False] Ø A B Notes 09
In example A: E=3[R2: E10] [E=3 R2: E=3 E10] [E=3 R2: False] Ø Notes 09
Example B (1) A=common attribute R S A Notes 09
A (2) [R1: A<5] [R2: 5 A 10] [R3: A>10] [S1: A<5] [S2: A 5] Notes 09