1 / 83

CS 245: Database System Principles

Learn about query processing in relational database systems, including query plans, relational algebra, optimization, and SQL parsing. Understand logical query planning, cost estimation, physical plans, and query optimization techniques.

Download Presentation

CS 245: Database System Principles

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina Notes 6

  2. Focus: Relational System • Others? Query Processing Q  Query Plan Notes 6

  3. Example Select B,D From R,S Where R.A = “c”  S.E = 2  R.C=S.C Notes 6

  4. Answer B D 2 x R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 Notes 6

  5. How do we execute query? - Do Cartesian product - Select tuples - Do projection One idea Notes 6

  6. Bingo! Got one... RXS R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . C 2 10 10 x 2 . . Notes 6

  7. Relational Algebra - can be used to describe plans... Ex: Plan I B,D sR.A=“c” S.E=2  R.C=S.C X R S OR: B,D [sR.A=“c” S.E=2  R.C = S.C (RXS)] Notes 6

  8. Another idea: Plan II B,D sR.A = “c”sS.E = 2 R S natural join Notes 6

  9. R S A B C s (R) s(S) C D E a 1 10 A B C C D E 10 x 2 b 1 20 c 2 10 10 x 2 20 y 2 c 2 10 20 y 2 30 z 2 d 2 35 30 z 2 40 x 1 e 3 45 50 y 3 Notes 6

  10. Plan III Use R.A and S.C Indexes (1) Use R.A index to select R tuples with R.A = “c” (2) For each R.C value found, use S.C index to find matching tuples (3) Eliminate S tuples S.E  2 (4) Join matching R,S tuples, project B,D attributes and place in result Notes 6

  11. =“c” <c,2,10> <10,x,2> check=2? output: <2,x> next tuple: <c,7,15> R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 Notes 6

  12. Overview of Query Optimization Notes 6

  13. SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} Notes 6

  14. Example: SQL query SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ ); (Find the movies with stars born in 1960) Notes 6

  15. Example: Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthDate ‘%1960’ Notes 6

  16. Example: Generating Relational Algebra title  StarsIn <condition> <tuple> IN name <attribute> birthdate LIKE ‘%1960’ starName MovieStar Fig. 7.15: An expression using a two-argument , midway between a parse tree and relational algebra Notes 6

  17. Example: Logical Query Plan title starName=name  StarsIn name birthdate LIKE ‘%1960’ MovieStar Fig. 7.18: Applying the rule for IN conditions Notes 6

  18. Example: Improved Logical Query Plan title Question: Push project to StarsIn? starName=name StarsIn name birthdate LIKE ‘%1960’ MovieStar Fig. 7.20: An improvement on fig. 7.18. Notes 6

  19. Example: Estimate Result Sizes Need expected size StarsIn MovieStar P s Notes 6

  20. Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: Select Condition,... StarsIn MovieStar Notes 6

  21. Example: Estimate costs L.Q.P P1 P2 …. Pn C1 C2 …. Cn Pick best! Notes 6

  22. Textbook outline Chapter 6 6.1 Algebra for queries [bags vs sets] - Select, project, join, …. [project list a,a+b->x,…] - Duplicate elimination, grouping, sorting 6.2 Physical operators - Scan,sort, … 6.3-6.10 Implementing operators + estimating their cost [Ch 5] [15.1] [15.2-15.9] Notes 6

  23. Chapter 7 7.1[16.1] Parsing 7.2[16.2] Algebraic laws 7.3[16.3] Parse tree -> logical query plan 7.4[16.4] Estimating result sizes 7.5-7[16.5-7] Cost based optimization Notes 6

  24. Reading textbook - Chapters 6,7 Optional: 6.8, 6.9, 6.10, 7.6, 7.7 [15.7, 15.8, 15.9, 16.6, 16.7] Optional: Duplicate elimination operator grouping, aggregation operators Notes 6

  25. Query Optimization - In class order • Relational algebra level • Detailed query plan level • Estimate Costs • without indexes • with indexes • Generate and compare plans Notes 6

  26. Relational algebra optimization • Transformation rules (preserve equivalence) • What are good transformations? Notes 6

  27. Rules:Natural joins & cross products & union R S = S R (R S) T = R (S T) Notes 6

  28. Note: • Carry attribute names in results, so order is not important • Can also write as trees, e.g.: T R R S S T Notes 6

  29. Rules:Natural joins & cross products & union R S = S R (R S) T = R (S T) R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T Notes 6

  30. Rules: Selects sp1p2(R) = sp1vp2(R) = sp1 [ sp2 (R)] [ sp1 (R)] U [ sp2 (R)] Notes 6

  31. Bags vs. Sets R = {a,a,b,b,b,c} S = {b,b,c,c,d} RUS = ? • Option 1 SUM RUS = {a,a,b,b,b,b,b,c,c,c,d} • Option 2 MAX RUS = {a,a,b,b,b,c,c,d} Notes 6

  32. Option 2 (MAX) makes this rule work:sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,c} Notes 6

  33. “Sum” option makes more sense: Senators (……) Rep (……) T1 = pyr,state Senators; T2 = pyr,state Reps T1 Yr State T2 Yr State 97 CA 99 CA 99 CA 99 CA 98 AZ 98 CA Union? Notes 6

  34. Executive Decision -> Use “SUM” option for bag unions -> Some rules cannot be used for bags Notes 6

  35. Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)] Notes 6

  36. [sp (R)] S R [sq (S)] Rules:s + combined Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with only R,S attribs sp (R S) = sq (R S) = Notes 6

  37. Rules:s + combined (continued) Some Rules can be Derived: spq (R S) = spqm (R S) = spvq (R S) = Notes 6

  38. Do one, others for homework: spq (R S) = [sp (R)] [sq (S)] spqm (R S) = sm[(sp R) (sq S)] spvq (R S) = [(sp R) S] U [R(sq S)] Notes 6

  39. --> Derivation for first one: spq (R S) = sp [sq (R S) ] = sp[ R sq (S) ] = [sp (R)] [sq (S)] Notes 6

  40. pxz px Rules:p,s combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = {sp [ px (R) ]} Notes 6

  41. pxy{[pxz (R) ][pyz (S) ]} Rules:p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S)= Notes 6

  42. pxy {sp[pxz’ (R) pyz’ (S)]} z’ = z U {attributes used in P } pxy {sp(R S)} = Notes 6

  43. Rules for s,p combined with X similar... e.g., sp (R X S) = ? Notes 6

  44. Rules s,U combined: sp(R U S)= sp(R) U sp(S) sp(R - S)= sp(R) - S = sp(R) - sp(S) Notes 6

  45. Which are “good” transformations? sp1p2 (R) sp1 [sp2 (R)] sp (R S)  [sp (R)] S R S  S R px [sp(R)] px {sp [pxz(R)]} Notes 6

  46. Conventional wisdom: do projects early Example: R(A,B,C,D,E) x={E} P: (A=3)  (B=“cat”) px {sp(R)} vs. pE {sp{pABE(R)}} Notes 6

  47. What if we have A, B indexes? But B = “cat” A=3 Intersect pointers to get pointers to matching tuples Notes 6

  48. Bottom line: • No transformation is always good • Usually good: early selections Notes 6

  49. In textbook: more transformations • Eliminate common sub-expressions • Other operations: duplicate elimination Notes 6

  50. Outline - Query Processing • Relational algebra level • transformations • good transformations • Detailed query plan level • estimate costs • generate and compare plans Notes 6

More Related