1 / 67

ICS 214B: Transaction Processing and Distributed Data Management

ICS 214B: Transaction Processing and Distributed Data Management. Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li. Which simple predicates should we use in Pr?. Desired property of Pr: - minimality - uniformity. Return to example:. E(#, NM, LOC, SAL,…)

luigi
Download Presentation

ICS 214B: Transaction Processing and Distributed Data Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ICS 214B: Transaction Processing and Distributed Data Management Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li

  2. Which simple predicates should we use in Pr? Desired property of Pr: - minimality - uniformity Notes 09

  3. Return to example: E(#, NM, LOC, SAL,…) Common queries: Qa: select * Qb: select * from E from E where LOC=Sa where LOC=Sb and … and ... Notes 09

  4. Three choices: (1) Pr = { } F1 ={ E } (2) Pr = {LOC=Sa, LOC=Sb} F2={  loc=Sa E,  loc=Sb E } (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} F3={ loc=Sa  sal<10 E, loc=Sa  sal10 E, loc=Sb  sal<10E, loc=Sb  sal10 E } Notes 09

  5. F2 is good… (not F1 , F3 ) In other words: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 Qb: Select … loc = Sb ... Loc=Sa  sal  10 F3 F1 F2 Loc=Sb  sal < 10 Loc=Sb  sal  10 Notes 09

  6. Informal definition Set of predicates Pr is uniform if: for every Fi F[Pr], every t  Fi has equal probability of access by every major application. Note: F[Pr] is fragmentation defined by minterm predicates generated by Pr. Notes 09

  7. Back to example: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 tuples here have higher probability of access Qb: Select … loc = Sb ... Loc=Sa  sal  10 F1 Loc=Sb  sal < 10 tuples here have lower probability of access Loc=Sb  sal  10 so F1 is not “good”... Notes 09

  8. Back to example: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 tuples here have same probability of access Qb: Select … loc = Sb ... Loc=Sa  sal  10 F2 Loc=Sb  sal < 10 so F2 is “good”... so is F3 ... Loc=Sb  sal  10 Notes 09

  9. Informal definition Set of predicates Pr is minimal if no Pr’ Pr is uniform Notes 09

  10. Back to example: uniform? (1) Pr = { } N (2) Pr = {LOC=Sa, LOC=Sb} Y (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} N Pr(2) is a subset of Pr(3), so Pr(3) is not minimal... Notes 09

  11. Is Pr uniform and minimal a good thing? Not necessarily! But it does simplify allocation problem... Notes 09

  12. (Owner) (Member) Derived horizontal fragmentation E(ENO, NAME, SAL, LOC) J(ENO, DESCRIPTION,…) E F={ E1, E2} by LOC Common query: Given an employee name, list projects (s)he works in Notes 09

  13. E2 E1 (at Sa) (at Sb) J Notes 09

  14. J1 = J E1 J2 = J E2 E2 E1 (at Sa) (at Sb) J2 J1 Notes 09

  15. Derived horizontal fragmentation R, F = { F1, F2, ... Fn}  S, D = {D1, D2, …Dn} where Di =S Fi Convention: R is owner S is member F could be primary or derived Notes 09

  16. Checking completeness and disjointness of derived fragmentation  But no #= 33in E1 nor in E2! Example: Say J is: This J tuple will not be in J1 nor J2 Fragmentation not complete Notes 09

  17. To get completeness: Need to enforce referential integrity constraint: join attr(#) of member relation  join attr(#) of owner relation Notes 09

  18. Example: E2 E1 Fragmentation is not disjoint! J J1 J2 Notes 09

  19. To get disjointness: Join attribute(#) should be key of owner relation Notes 09

  20. Summary: horizontal fragmentation • Type: primary, derived • Properties: completeness, disjointness • Predicates: minimal, uniform Notes 09

  21. Vertical fragmentation Example: E E2 E1 Notes 09

  22. R[T]  R1[T1] Ti  T Rn[Tn] Just like normalization of relations ... Notes 09

  23. Properties: R[T]  Ri[Ti] (1) Completeness U Ti = T all i Notes 09

  24. Not a desirable property!! (could not reconstruct R!) (2) Disjointness Ti  Tj =  for all i,j ij E(#,LOC,SAL) E1(#,LOC) E2(SAL) Notes 09

  25. (3) Reconstruction: Lossless join Ri = R all i  One way to achieve lossless join: Repeat key in all fragments, i.e., Key  Ti for all i Notes 09

  26. Hybrid Fragmentation R Horizontal R2 R1 Vertical R22 R12 R21 R11 Notes 09

  27. Hybrid Fragmentation -- Reconstruction U Horizontal Vertical R22 R12 R21 R11 Notes 09

  28. Allocation Example: E(#,NM,LOC,SAL)  F1 = loc=Sa E ; F2 = loc=Sb E Qa: select … where loc=Sa... Qb: select … where loc=Sb… Where do F1,F2 go? Site b Site a ? Notes 09

  29. Issues • Where do queries originate? • What is communication cost? and size of answers, relations,… • What is storage capacity, cost at sites? and size of fragments? • What is processing power at sites? • What is query processing strategy? • How are joins done? • Where are answers collected? Notes 09

  30. Do we replicate fragments? • Cost of updating copies? • Writes and concurrency control? • ... Notes 09

  31. This is an incredibly hard problem Optimization problem: • What is best placement of fragments and/or best number of copies to: • minimize query response time • maximize throughput • minimize “some cost” • ... • Subject to constraints? • Available storage • Available bandwidth, power,… • Keep 90% of response time below X • ... • Often, can use common sense • Place fragments where they are most heavily accessed Notes 09

  32. Summary • Horizontal and vertical fragmentation • Designing good fragmentations and allocation Next: • Query processing in distributed databases Notes 09

  33. (1) Decomposition Algebraic query tree on relations (2) Localization Algebraic query tree on relation fragments (3) Optimization Query Query Plan Notes 09

  34. Decomposition • Same as in centralized system • Normalization • Eliminating redundancy • Algebraic rewriting Notes 09

  35. Normalization • Convert from query language to relational algebra Notes 09

  36. (R.A = S.A) R.A,S.D (R.B=1  S.C=2) R S Example SELECT R.A, S.D FROM R, S WHERE (R.B=1 and S.C=2) and (R.A = S.A) Notes 09

  37. Eliminate redundancy E.g.: in conditions: (S.A=1)  (S.A>5)  False (S.A<10)  (S.A<5)  S.A<5 Notes 09

  38. E.g.: Common sub-expressions U U S condcondTS condT R R R Notes 09

  39. Algebraic rewriting E.g.: Push conditions down cond3 cond cond1cond2 R S R S Notes 09

  40. (1) Decomposition Algebraic query tree on relations (2) Localization Algebraic query tree on relation fragments Query Notes 09

  41. Localization steps (1) Start with query tree (2) Replace relations by fragments (3) Push : up , : down (4) Simplify – eliminate unnecessary operations Notes 09

  42. Notation for fragment [R: cond] fragment conditions its tuples satisfy Notes 09

  43. Example A (1) E=3 R Notes 09

  44. (2) E=3  [R1: E < 10] [R2: E  10] Notes 09

  45.  Ø (3) E=3 E=3 [R1: E < 10] [R2: E  10] Notes 09

  46. (4) E=3 [R1: E < 10] Notes 09

  47. Rule 1 C1[R: c2]  C1[R: c1  c2] [R: False]  Ø A B Notes 09

  48. In example A: E=3[R2: E10]  [E=3 R2: E=3  E10]  [E=3 R2: False]  Ø Notes 09

  49. Example B (1) A=common attribute R S A Notes 09

  50. A (2)  [R1: A<5] [R2: 5 A 10] [R3: A>10] [S1: A<5] [S2: A  5] Notes 09

More Related