810 likes | 989 Views
CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design. Hector Garcia-Molina. Chapter 5 Ozsu & Valduriez. Distributed DB Design. Top-down approach: - have DB… - how to split and allocate the sites
E N D
CS 347: Parallel and DistributedData ManagementNotes02: Distributed DB Design Hector Garcia-Molina Notes 02
Chapter 5 Ozsu & Valduriez Distributed DB Design Top-down approach: - have DB… - how to split and allocate the sites Multi-DBs (or bottom-up): no design issues! Notes 02
Two issues in DDB design: • Fragmentation • Allocation Note: issues not independent, but will cover separately Notes 02
Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where loc=Sa where loc=Sb and… and ... Notes 02
Motivation: Two sites: Sa, Sb Qa Qb Sa Sb Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where loc=Sa where loc=Sb and… and ... Notes 02
It does not take a rocket scientist to figure out fragmentation... Notes 02
5 Joe Sa 10 # NM Loc Sal E 7 Sally Sb 25 8 Tom Sa 15 .. .. F # NM Loc Sal # NM Loc Sal 5 Joe Sa 10 7 Sally Sb 25 .. 8 Tom Sa 15 .. At Sb At Sa Notes 02
F = { F1, F2 } F1 = loc=SaE F2 = loc=SbE Notes 02
F = { F1, F2 } F1 = loc=SaE F2 = loc=SbE called primary horizontal fragmentation Notes 02
Fragmentation • Horizontal Primary depends on local attributes R Derived depends on foreign relation • Vertical R Notes 02
Fragmentation • Horizontal Primary depends on local attributes R Derived depends on foreign relation • Vertical R Fragmentation also called Sharding Notes 02
Three common horizontal partitioning techniques • Round robin • Hash partitioning • Range partitioning Notes 02
Round robin R D0 D1 D2 t1 t1 t2 t2 t3 t3 t4 t4 ... t5 • Evenly distributes data • Good for scanning full relation • Not good for point or range queries Notes 02
Hash partitioning R D0 D1 D2 t1h(k1)=2 t1 t2h(k2)=0 t2 t3h(k3)=0 t3 t4h(k4)=1 t4 ... • Good for point queries on key; also for joins • Not good for range queries; point queries not on key • If hash function good, even distribution Notes 02
Range partitioning R D0 D1 D2 t1: A=5 t1 t2: A=8 t2 t3: A=2 t3 t4: A=3 t4 ... • Good for some range queries on A • Need to select good vector: else unbalance data skew execution skew partitioning vector 4 7 V0 V1 Notes 02
Which are good fragmentations? Example: F = { F1, F2 } F1 = sal<10 E F2 = sal>20 E Notes 02
Which are good fragmentations? Example: F = { F1, F2 } F1 = sal<10 E F2 = sal>20 E Problem: Some tuples lost! Notes 02
Which are good fragmentations? Second example: F = { F3, F4 } F3 = sal<10 E F4 = sal>5 E Notes 02
Which are good fragmentations? Second example: F = { F3, F4 } F3 = sal<10 E F4 = sal>5 E Tuples with 5 < sal < 10 are duplicated... Notes 02
Prefer to deal with replication explicitly Example: F = { F5, F6, F7 } F5 = sal 5 E F6 = 5< sal <10 EF7 = sal 10 E Then replicate F6 if convenient (part of allocation problem) Notes 02
Desired properties for horizontal fragmentation R F ={ F1, F2, … } (1) Completeness t R, Fi Fsuch that t Fi Notes 02
(2) Disjointness t Fi, Fj such that tFj, i j, Fi, FjF (3) Reconstruction - ignore Notes 02
How do we get completeness and disjointness? (1) Check it “manually”! e.g., F1 = sal<10 E ; F2 = sal10 E Notes 02
How do we get completeness and disjointness? (2) “Automatically” generate fragments with these properties Desired simple predicates Fragments Notes 02
Example of generation • Say queries use predicates: A<10, A>5, Loc = SA, Loc = SB • Next: - generate “minterm” predicates - eliminate useless ones Notes 02
Minterm predicates (part I) (1) A<10 A>5 Loc=SA Loc=SB (2) A<10 A>5 Loc=SA ¬(Loc=SB) (3) A<10 A>5 ¬(Loc=SA) Loc=SB (4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB) (5) A<10 ¬(A>5) Loc=SA Loc=SB (6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB) (7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB (8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02
Minterm predicates (part I) (1) A<10 A>5 Loc=SA Loc=SB (2) A<10 A>5 Loc=SA ¬(Loc=SB) (3) A<10 A>5 ¬(Loc=SA) Loc=SB (4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB) (5) A<10 ¬(A>5) Loc=SA Loc=SB (6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB) (7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB (8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02
5 < A < 10 A 5 Minterm predicates (part I) (1) A<10 A>5 Loc=SA Loc=SB (2) A<10 A>5 Loc=SA ¬(Loc=SB) (3) A<10 A>5 ¬(Loc=SA) Loc=SB (4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB) (5) A<10 ¬(A>5) Loc=SA Loc=SB (6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB) (7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB (8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02
Minterm predicates (part II) (9) ¬(A<10) A>5 Loc=SA Loc=SB (10) ¬(A<10) A>5 Loc=SA¬(Loc=SB) (11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB (12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB) (13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB (14) ¬(A<10) ¬(A>5) Loc=SA¬(Loc=SB) (15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB (16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02
A 10 Minterm predicates (part II) (9) ¬(A<10) A>5 Loc=SA Loc=SB (10) ¬(A<10) A>5 Loc=SA¬(Loc=SB) (11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB (12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB) (13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB (14) ¬(A<10) ¬(A>5) Loc=SA¬(Loc=SB) (15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB (16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02
Final fragments: F2: 5 < A < 10 Loc=SA F3: 5 < A < 10 Loc=SB F6: A 5 Loc=SA F7: A 5 Loc=SB F10: A 10 Loc=SA F11: A 10 Loc=SB Notes 02
Note: elimination of useless fragments depends on application semantics: e.g.: if LOC could be SA, SB, we need to add fragments F4: 5 <A <10 Loc SA Loc SB F8: A 5 Loc SA Loc SB F12: A 10 Loc SA Loc SB Notes 02
Why does this work? Predicates: p1 p2 p3 p4 p1 p2 p3 ¬p4 ¬ p1 ¬p2 ¬p3 ¬p4 ... Notes 02
(1) Completeness: Take t R pi(t) must be T or F! Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F Then t is in fragment with predicate p1 p2 ¬p3 ¬p4 Notes 02
(2) Disjointness Say t Fragment p1 p2 ¬p3 ¬p4 Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F t cannot be in any other fragment! Notes 02
Summary • Given simple predicates Pr= { p1, p2,.. pm } minterm predicates are M={m | m = pk*, 1 k m } where pk* is pk or is ¬ pk pkPr • Fragments m R for all m M are • complete and disjoint Notes 02
Another Desired Fragmentation Property:Match Access Patterns data A try to place in same fragment frequently accessed together data B data C Notes 02
Return to example: E(#, NM, LOC, SAL,…) Common queries: Qa: select * Qb: select * from E from E where LOC=Sa where LOC=Sb and … and ... Notes 02
Three choices: (1) Pr = { } F1 ={ E } (2) Pr = {LOC=Sa, LOC=Sb} F2={ loc=Sa E, loc=Sb E } (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} F3={ loc=Sa sal<10 E, loc=Sa sal10 E, loc=Sb sal<10E, loc=Sb sal10 E } Notes 02
In other words: Qa: Select … loc = Sa ... Loc=Sa sal < 10 Qb: Select … loc = Sb ... Loc=Sa sal 10 F3 F1 F2 Loc=Sb sal < 10 Loc=Sb sal 10 Notes 02
F2 is good… (not F1 , F3 ) In other words: Qa: Select … loc = Sa ... Loc=Sa sal < 10 Qb: Select … loc = Sb ... Loc=Sa sal 10 F3 F1 F2 Loc=Sb sal < 10 Loc=Sb sal 10 Notes 02
Derived horizontal fragmentation Example: E(#, NM, SAL, LOC)F={ E1, E2} by LOC J(#, DES,…) Common query for project: [Given employee name, list projects (s)he works in] Notes 02
E2 E1 (at Sa) (at Sb) J Notes 02
J1 = J E1 J2 = J E2 E2 E1 (at Sa) (at Sb) J2 J1 Notes 02
Derived horizontal fragmentation F could be primary or derived R, F = { F1, F2, ... Fn} S, D = {D1, D2, …Dn} where Di =S Fi Convention: R is owner S is member Notes 02
Checking completeness and disjointness of derived fragmentation Example: Say J is: But no #= 33in E1 nor in E2! This J tuple will not be in J1 nor J2 Fragmentation not complete Notes 02
To get completeness Need to enforce referential integrity constraint: join attr(#) of member relation joint attr(#) of owner relation Notes 02
Example: E2 E1 Fragmentation is not disjoint! J J1 J2 Notes 02
To get disjointness Join attribute(#) should be key of owner relation Notes 02
Summary: horizontal fragmentation • Type: primary, derived • Properties: completeness, disjointness Notes 02