1 / 106

CSE544 Query Execution

CSE544 Query Execution. Thursday, February 2 nd , 2011. Outline. Relational Algebra: Ch. 4.2 Overview of query evaluation: Ch. 12 Evaluating relational operators: Ch. 14 Shapiro’s paper. The WHAT and the HOW. In SQL we write WHAT we want to get form the data

jon
Download Presentation

CSE544 Query Execution

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE544Query Execution Thursday, February 2nd, 2011 Dan Suciu -- 544, Winter 2011

  2. Outline • Relational Algebra: Ch. 4.2 • Overview of query evaluation: Ch. 12 • Evaluating relational operators: Ch. 14 • Shapiro’s paper Dan Suciu -- 544, Winter 2011

  3. The WHAT and the HOW • In SQL we write WHAT we want to get form the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra Physical Data Independence Dan Suciu -- 544, Winter 2011

  4. SQL = WHAT Product(pid, name, price)Purchase(pid, cid, store) Customer(cid, name, city) SELECTDISTINCTx.name, z.name FROM Product x, Purchase y, Customer z WHEREx.pid = y.pid and y.cid = y.cid andx.price > 100 and z.city = ‘Seattle’ It’s clear WHAT we want, unclear HOW to get it Dan Suciu -- 544, Winter 2011

  5. Relational Algebra = HOW Final answer Product(pid, name, price)Purchase(pid, cid, store) Customer(cid, name, city) δ T4(name,name) Π x.name,z.name T3(. . . ) T2( . . . .) σ price>100 and city=‘Seattle’ T1(pid,name,price,pid,cid,store) cid=cid Temporary tablesT1, T2, . . . pid=pid Customer Purchase Product

  6. Relational Algebra = HOW The order is now clearly specified: Iterate over PRODUCT… …join with PURCHASE… …join with CUSTOMER… …select tuples with Price>100 and City=‘Seattle’… …eliminate duplicates… …and that’s the final answer ! Dan Suciu -- 544, Winter 2011

  7. Sets v.s. Bags • Sets: {a,b,c}, {a,d,e,f}, { }, . . . • Bags: {a, a, b, c}, {b, b, b, b, b}, . . . Relational Algebra has two semantics: • Set semantics • Bag semantics Dan Suciu -- 544, Winter 2011

  8. Extended Algebra Operators • Union , intersection , difference - • Selection s • ProjectionΠ • Join ⨝ • Rename  • Duplicate elimination d • Grouping and aggregation g • Sorting t Dan Suciu -- 544, Winter 2011

  9. Relational Algebra (1/3) The Basic Five operators: • Union:  • Difference: - • Selection:s • Projection: P • Join: ⨝ Dan Suciu -- 544, Winter 2011

  10. Relational Algebra (2/3) Derived or auxiliary operators: • Renaming: ρ • Intersection, complement • Variations of joins • natural, equi-join, theta join, semi-join, cartesian product Dan Suciu -- 544, Winter 2011

  11. Relational Algebra (3/3) Extensions for bags: • Duplicate elimination: δ • Group by: γ • Sorting: τ Dan Suciu -- 544, Winter 2011

  12. Union and Difference R1  R2 R1 – R2 What do they mean over bags ? Dan Suciu -- 544, Winter 2011

  13. What about Intersection ? • Derived operator using minus • Derived using join (will explain later) R1  R2 = R1 – (R1 – R2) R1  R2 = R1 ⨝ R2 Dan Suciu -- 544, Winter 2011

  14. Selection • Returns all tuples which satisfy a condition • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <> sc(R) Dan Suciu -- 544, Winter 2011

  15. Employee sSalary > 40000(Employee) Dan Suciu -- 544, Winter 2011

  16. Projection • Eliminates columns • Example: project social-security number and names: • PSSN, Name (Employee) • Answer(SSN, Name) P A1,…,An(R) Semantics differs over set or over bags Dan Suciu -- 544, Winter 2011

  17. Employee PName,Salary (Employee) Set semantics Bag semantics Which is more efficient to implement ? Dan Suciu -- 544, Winter 2011

  18. Cartesian Product • Each tuple in R1 with each tuple in R2 • Very rare in practice; mainly used to express joins R1  R2 Dan Suciu -- 544, Winter 2011

  19. Employee Dependent Employee ✕ Dependent Dan Suciu -- 544, Winter 2011

  20. Renaming • Changes the schema, not the instance • Example: • rN, S(Employee)  Answer(N, S) rB1,…,Bn (R) Dan Suciu -- 544, Winter 2011

  21. Natural Join R1 ⨝R2 • Meaning: R1⨝R2 = PA(s(R1 × R2)) • Where: • The selection schecks equality of all common attributes • The projection eliminates the duplicate common attributes Dan Suciu -- 544, Winter 2011

  22. Natural Join R S R ⨝ S= PABC(sR.B=S.B(R × S)) Dan Suciu -- 544, Winter 2011

  23. Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⨝ S ? • Given R(A, B, C), S(D, E), what is R ⨝ S ? • Given R(A, B), S(A, B), what is R ⨝ S ? Dan Suciu -- 544, Winter 2011

  24. Theta Join • A join that involves a predicate • Here q can be any condition R1 ⨝q R2 = sq (R1  R2) Dan Suciu -- 544, Winter 2011

  25. Eq-join • A theta join where q is an equality • This is by far the most used variant of join in practice R1 ⨝A=B R2 = sA=B (R1  R2) Dan Suciu -- 544, Winter 2011

  26. So Which Join Is It ? • When we write R ⨝ S we usually mean an eq-join, but we often omit the equality predicate when it is clear from the context Dan Suciu -- 544, Winter 2011

  27. Semijoin • Where A1, …, An are the attributes in R R ⋉C S = PA1,…,An (R ⨝C S) • Formally, R ⋉C S means this: retain from R only thosetuples that have some matching tuple in S • Duplicates in R are preserved • Duplicates in S don’t matter Dan Suciu -- 544, Winter 2011

  28. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) Assumptions: Very few Employees have dependents. Very few dependents have age > 71. “Stuff” is big. Task: compute the query with minimum amount of data transfer

  29. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T(SSN) = PSSNsage>71 (Dependents)

  30. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T(SSN) = PSSNsage>71 (Dependents) R = Employee⨝SSN=EmpSSNT = Employee ⋉SSN=EmpSSN (sage>71 (Dependents))

  31. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T(SSN) = PSSNsage>71 (Dependents) R = Employee ⋉SSN=EmpSSNT Answer = R ⨝SSN=EmpSSNDependents

  32. Joins R US • The join operation in all its variants (eq-join, natural join, semi-join, outer-join) is at the heart of relational database systems • WHY ? Dan Suciu -- 544, Winter 2011

  33. Operators on Bags • Duplicate elimination d d(R) = select distinct * from R • Grouping g gA,sum(B) (R) = select A,sum(B) from R group by A • Sorting t Dan Suciu -- 544, Winter 2011

  34. y.seller-ssn=z.ssn y.pid=u.pid x.ssn=y.buyer-ssn Complex RA Expressions gu.name, count(*) Person x Purchase y Person z Product u Pssn Ppid sname=fred sname=gizmo Dan Suciu -- 544, Winter 2011

  35. RA = Dataflow Program • Several operations, plus strictly specified order • In RDBMS the dataflow graph is always a tree • Novel applications (s.a. PIG), dataflow graph may be a DAG Dan Suciu -- 544, Winter 2011

  36. Limitations of RA • Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write Java program • Remember the Bacon number ? Needs TC too ! Dan Suciu -- 544, Winter 2011

  37. Steps of the Query Processor SQL query Parse & Rewrite Query Logicalplan Select Logical Plan Queryoptimization Select Physical Plan Physicalplan Query Execution Disk

  38. Example Database Schema Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) View: Suppliers in Seattle CREATE VIEW NearbySupp AS SELECT sno, sname FROM Supplier WHERE scity='Seattle' AND sstate='WA' Dan Suciu -- 544, Winter 2011

  39. Example Query Find the names of all suppliers in Seattle who supply part number 2 SELECT sname FROM NearbySupp WHERE sno IN ( SELECT sno FROM Supplies WHERE pno = 2 ) Dan Suciu -- 544, Winter 2011

  40. Steps in Query Evaluation • Step 0: Admission control • User connects to the db with username, password • User sends query in text format • Step 1: Query parsing • Parses query into an internal format • Performs various checks using catalog • Correctness, authorization, integrity constraints • Step 2: Query rewrite • View rewriting, flattening, etc. Dan Suciu -- 544, Winter 2011

  41. Rewritten Version of Our Query Original query: Rewritten query: SELECT sname FROM NearbySupp WHERE sno IN ( SELECT sno FROM Supplies WHERE pno = 2 ) SELECT S.sname FROM Supplier S, Supplies U WHERE S.scity='Seattle' AND S.sstate='WA’ AND S.sno = U.sno AND U.pno = 2; Dan Suciu -- 544, Winter 2011

  42. Continue with Query Evaluation • Step 3: Query optimization • Find an efficient query plan for executing the query • A query plan is • Logical query plan: an extended relational algebra tree • Physical query plan: with additional annotations at each node • Access method to use for each relation • Implementation to use for each relational operator Dan Suciu -- 544, Winter 2011

  43. Extended Algebra Operators • Union , intersection , difference - • Selection s • Projection  • Join ⨝ • Duplicate elimination d • Grouping and aggregation g • Sorting t • Rename  Dan Suciu -- 544, Winter 2011

  44. Logical Query Plan Πsname sscity=‘Seattle’ sstate=‘WA’  pno=2 sno = sno Suppliers Supplies Dan Suciu -- 544, Winter 2011

  45. Query Block • Most optimizers operate on individual query blocks • A query block is an SQL query with no nesting • Exactly one • SELECT clause • FROM clause • At most one • WHERE clause • GROUP BY clause • HAVING clause Dan Suciu -- 544, Winter 2011

  46. Typical Plan for Block (1/2)  fields selection condition SELECT-PROJECT-JOIN Query join condition … join condition R S Dan Suciu -- 544, Winter 2011

  47. Typical Plan For Block (2/2) σhaving-ondition fields, sum/count/min/max(fields) selection condition join condition … … Dan Suciu -- 544, Winter 2011

  48. Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) How about Subqueries? SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ and not exists SELECT * FROM Supply PWHEREP.sno = Q.sno and P.price > 100 Dan Suciu -- 544, Winter 2011

  49. Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) How about Subqueries? SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ and notexists SELECT * FROM Supply PWHEREP.sno = Q.sno and P.price > 100 Correlation ! Dan Suciu -- 544, Winter 2011

  50. Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) How about Subqueries? SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ and not exists SELECT * FROM Supply PWHEREP.sno = Q.sno and P.price > 100 De-Correlation SELECTQ.sno FROM Supplier Q WHEREQ.sstate = ‘WA’ andQ.snonotin SELECTP.sno FROM Supply PWHEREP.price > 100 Dan Suciu -- 544, Winter 2011

More Related