290 likes | 392 Views
CS4432: Database Systems II. Logical Plan Rewriting. SQL query. parse. parse tree. convert. answer. logical query plan. execute. apply laws. statistics. Pi. “improved” l.q.p. pick best. estimate result sizes. {(P1,C1),(P2,C2)...}. l.q.p. +sizes. estimate costs.
E N D
CS4432: Database Systems II Logical Plan Rewriting
SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} query processing
Query in SQL Query Plan in Algebra (logical) Other Query Plan in Algebra (logical)
Query plan 1 (in relational algebra) B,D sR.A=“c” S.E=2 R.C=S.C X R S
Query plan 2 (in relational algebra) B,D sR.A = “c”sS.E = 2 R S natural join on R.C=S.C
Relational algebra optimization • What are transformation rules ? • preserve equivalence • What are good transformations? • reduce query execution costs
Can also write as trees, e.g.: T R R S S T Rules:Natural join rewriting. R S = S R (R S) T = R (S T)
Rules: Other binary operators ? R S = S R (R S) T = R (S T) • What about : • Cross product? • Condition join? • Union? • Intersection ? • Difference ?
Rules: Selects sp1p2(R)= sp1 [ sp2 (R)] [ sp1 (R)] U [ sp2 (R)] sp1vp2(R) =
Bags vs. Sets R = {a,a,b,b,b,c} S = {b,b,c,c,d} What about union R U S = ? • Option 1 SUM • R U S = {a,a,b,b,b,b,b,c,c,c,d} • Option 2 MAX • R U S = {a,a,b,b,b,c,c,d}
Which option makes this rule work ?sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c • sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,c} Let us try MAX():
Which option makes this rule work ?sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c • sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,b,b,b,c} What about Sum()?
Which option makes this rule work ?sp1p2(R)= sp1 [ sp2 (R)] Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c What about MAX versus SUM ?
Yet another example ! Senators (……) Reps (……) T1 = pyr,state Senators; T2 = pyr,state Reps T1 Yr State T2 Yr State 97 CA 99 CA 99 CA 99 CA 98 AZ 98 CA Union? “Sum” option makes more sense!
Executive Decision -> Use “SUM” option for bag unions -> CAREFUL ! Some rules cannot be used for bags
Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)]
[sp (R)] S R [sq (S)] Rules:s + combined Let p = predicate with only R attributes q = predicate with only S attributes m = predicate with both R and S attribs sp (R S) = sq (R S) =
Rules:s + combined spq (R S) = ? Rule can be derived !
Derivation for rule : spq (R S) = sp [sq (R S) ] = sp[ R sq (S) ] = [sp (R)] [sq (S)]
Rules:s + combined (continued) More Rules can be Derived: spq (R S) = spqm (R S) = spvq (R S) =
We did one, do others on your own : spq (R S) = [sp (R)] [sq (S)] spqm (R S) = sm[(sp R) (sq S)] spvq (R S) = [(sp R) S] U [R(sq S)]
pxz px Rules:p,s combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = {sp [ px (R) ]}
pxy{[pxz (R) ][pyz (S) ]} Rules:p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S)=
Conventional wisdom: do projects early Example: relation R(A,B,C,D,E) predicate P: (A=3) (B=“cat”) pE {sp(R)} vs. pE {sp{pABE(R)}}
What if we have A, B indexes? But B = “cat” A=3 Intersect pointers to get pointers to matching tuples! Then better to do projection later !
Which are “good” transformations? sp1p2 (R) sp1 [sp2 (R)] sp (R S) [sp (R)] S R S S R px [sp(R)] px {sp [pxz(R)]}
Bottom line: • Some heuristics : • Early selection is usually good • No transformation is always good • Rule application defines a search space • Need cost criteria to make decision
In textbook: more transformations • Chapter 16.2, 16.3.3 • More rewrite rules • Other operations, such as, duplicate elimination, etc.