ICS 214B: Transaction Processing and Distributed Data Management

ICS 214B: Transaction Processing and Distributed Data Management Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li

Which simple predicates should we use in Pr? Desired property of Pr: - minimality - uniformity Notes 09

Return to example: E(#, NM, LOC, SAL,…) Common queries: Qa: select * Qb: select * from E from E where LOC=Sa where LOC=Sb and … and ... Notes 09

Three choices: (1) Pr = { } F1 ={ E } (2) Pr = {LOC=Sa, LOC=Sb} F2={  loc=Sa E,  loc=Sb E } (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} F3={ loc=Sa  sal<10 E, loc=Sa  sal10 E, loc=Sb  sal<10E, loc=Sb  sal10 E } Notes 09

F2 is good… (not F1 , F3 ) In other words: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 Qb: Select … loc = Sb ... Loc=Sa  sal  10 F3 F1 F2 Loc=Sb  sal < 10 Loc=Sb  sal  10 Notes 09

Informal definition Set of predicates Pr is uniform if: for every Fi F[Pr], every t  Fi has equal probability of access by every major application. Note: F[Pr] is fragmentation defined by minterm predicates generated by Pr. Notes 09

Back to example: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 tuples here have higher probability of access Qb: Select … loc = Sb ... Loc=Sa  sal  10 F1 Loc=Sb  sal < 10 tuples here have lower probability of access Loc=Sb  sal  10 so F1 is not “good”... Notes 09

Back to example: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 tuples here have same probability of access Qb: Select … loc = Sb ... Loc=Sa  sal  10 F2 Loc=Sb  sal < 10 so F2 is “good”... so is F3 ... Loc=Sb  sal  10 Notes 09

Informal definition Set of predicates Pr is minimal if no Pr’ Pr is uniform Notes 09

Back to example: uniform? (1) Pr = { } N (2) Pr = {LOC=Sa, LOC=Sb} Y (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} N Pr(2) is a subset of Pr(3), so Pr(3) is not minimal... Notes 09

Is Pr uniform and minimal a good thing? Not necessarily! But it does simplify allocation problem... Notes 09

(Owner) (Member) Derived horizontal fragmentation E(ENO, NAME, SAL, LOC) J(ENO, DESCRIPTION,…) E F={ E1, E2} by LOC Common query: Given an employee name, list projects (s)he works in Notes 09

E2 E1 (at Sa) (at Sb) J Notes 09

J1 = J E1 J2 = J E2 E2 E1 (at Sa) (at Sb) J2 J1 Notes 09

Derived horizontal fragmentation R, F = { F1, F2, ... Fn}  S, D = {D1, D2, …Dn} where Di =S Fi Convention: R is owner S is member F could be primary or derived Notes 09

Checking completeness and disjointness of derived fragmentation  But no #= 33in E1 nor in E2! Example: Say J is: This J tuple will not be in J1 nor J2 Fragmentation not complete Notes 09

To get completeness: Need to enforce referential integrity constraint: join attr(#) of member relation  join attr(#) of owner relation Notes 09

Example: E2 E1 Fragmentation is not disjoint! J J1 J2 Notes 09

To get disjointness: Join attribute(#) should be key of owner relation Notes 09

Summary: horizontal fragmentation • Type: primary, derived • Properties: completeness, disjointness • Predicates: minimal, uniform Notes 09

Vertical fragmentation Example: E E2 E1 Notes 09

R[T]  R1[T1] Ti  T Rn[Tn] Just like normalization of relations ... Notes 09

Properties: R[T]  Ri[Ti] (1) Completeness U Ti = T all i Notes 09

Not a desirable property!! (could not reconstruct R!) (2) Disjointness Ti  Tj =  for all i,j ij E(#,LOC,SAL) E1(#,LOC) E2(SAL) Notes 09

(3) Reconstruction: Lossless join Ri = R all i  One way to achieve lossless join: Repeat key in all fragments, i.e., Key  Ti for all i Notes 09

Hybrid Fragmentation R Horizontal R2 R1 Vertical R22 R12 R21 R11 Notes 09

Hybrid Fragmentation -- Reconstruction U Horizontal Vertical R22 R12 R21 R11 Notes 09

Allocation Example: E(#,NM,LOC,SAL)  F1 = loc=Sa E ; F2 = loc=Sb E Qa: select … where loc=Sa... Qb: select … where loc=Sb… Where do F1,F2 go? Site b Site a ? Notes 09

Issues • Where do queries originate? • What is communication cost? and size of answers, relations,… • What is storage capacity, cost at sites? and size of fragments? • What is processing power at sites? • What is query processing strategy? • How are joins done? • Where are answers collected? Notes 09

Do we replicate fragments? • Cost of updating copies? • Writes and concurrency control? • ... Notes 09

This is an incredibly hard problem Optimization problem: • What is best placement of fragments and/or best number of copies to: • minimize query response time • maximize throughput • minimize “some cost” • ... • Subject to constraints? • Available storage • Available bandwidth, power,… • Keep 90% of response time below X • ... • Often, can use common sense • Place fragments where they are most heavily accessed Notes 09

Summary • Horizontal and vertical fragmentation • Designing good fragmentations and allocation Next: • Query processing in distributed databases Notes 09

(1) Decomposition Algebraic query tree on relations (2) Localization Algebraic query tree on relation fragments (3) Optimization Query Query Plan Notes 09

Decomposition • Same as in centralized system • Normalization • Eliminating redundancy • Algebraic rewriting Notes 09

Normalization • Convert from query language to relational algebra Notes 09

(R.A = S.A) R.A,S.D (R.B=1  S.C=2) R S Example SELECT R.A, S.D FROM R, S WHERE (R.B=1 and S.C=2) and (R.A = S.A) Notes 09

Eliminate redundancy E.g.: in conditions: (S.A=1)  (S.A>5)  False (S.A<10)  (S.A<5)  S.A<5 Notes 09

E.g.: Common sub-expressions U U S condcondTS condT R R R Notes 09

Algebraic rewriting E.g.: Push conditions down cond3 cond cond1cond2 R S R S Notes 09

(1) Decomposition Algebraic query tree on relations (2) Localization Algebraic query tree on relation fragments Query Notes 09

Localization steps (1) Start with query tree (2) Replace relations by fragments (3) Push : up , : down (4) Simplify – eliminate unnecessary operations Notes 09

Notation for fragment [R: cond] fragment conditions its tuples satisfy Notes 09

Example A (1) E=3 R Notes 09

(2) E=3  [R1: E < 10] [R2: E  10] Notes 09

 Ø (3) E=3 E=3 [R1: E < 10] [R2: E  10] Notes 09

(4) E=3 [R1: E < 10] Notes 09

Rule 1 C1[R: c2]  C1[R: c1  c2] [R: False]  Ø A B Notes 09

In example A: E=3[R2: E10]  [E=3 R2: E=3  E10]  [E=3 R2: False]  Ø Notes 09

Example B (1) A=common attribute R S A Notes 09

A (2)  [R1: A<5] [R2: 5 A 10] [R3: A>10] [S1: A<5] [S2: A  5] Notes 09

ICS 214B: Transaction Processing and Distributed Data Management

ICS 214B: Transaction Processing and Distributed Data Management

Presentation Transcript

Electronic Data Processing Systems

SAP FI Accounts Payable

Order Management 訂單管理

Toward Scalable Transaction Processing

Large-scale Incremental Processing Using Distributed Transactions and Notifications

Scalable Integration and Processing of Linked Data

MULTIUSER DATABASES : Concurrency and Transaction Management

Notes 09: Transaction Processing

P roject Management

Security and Internet Commerce

Distributed Layout Management

Chapter 19: Distributed Databases

Hector Garcia-Molina

C File Processing

Distributed Transaction Management – 2003

Revenue Management and Dynamic Pricing: Part I

Energy Insurance and Risk Management The insurance transaction distribution system

The Architecture of Transaction Processing Systems

Ch 6. Transaction Processing Monitors

Database Systems: Transaction Management