1 / 98

Lecture 7: Query Execution

Lecture 7: Query Execution. Wednesday, November 9, 2011. Outline. Relational Algebra Read: Ch. 4.2; “Three query languages formalisms” (lecture 1) Overview of query evaluation Read: Ch. 12 Evaluating relational operators: Read: Ch. 14, Shapiro’s Grace Join paper. Reading for Next Lectures.

taji
Download Presentation

Lecture 7: Query Execution

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 7:Query Execution Wednesday, November 9, 2011 Dan Suciu -- CSEP544 Fall 2011

  2. Outline • Relational Algebra • Read: Ch. 4.2; “Three query languages formalisms” (lecture 1) • Overview of query evaluation • Read: Ch. 12 • Evaluating relational operators: • Read: Ch. 14, Shapiro’s Grace Join paper Dan Suciu -- CSEP544 Fall 2011

  3. Reading for Next Lectures • Short, optional reading for next lecture • Chaudhuri on query optimization • Lots of papers to read for the following two lectures (and more may be added); start early! Dan Suciu -- CSEP544 Fall 2011

  4. 1. Relational Algebra (Ch. 4.2) Dan Suciu -- CSEP544 Fall 2011

  5. The WHAT and the HOW • In SQL we write WHAT we want to get form the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra Physical Data Independence

  6. SQL = WHAT Product(pid, name, price)Purchase(pid, cid, store) Customer(cid, name, city) SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = y.cid andx.price > 100 and z.city = ‘Seattle’ It’s clear WHAT we want, unclear HOW to get it

  7. Relational Algebra = HOW Final answer Product(pid, name, price)Purchase(pid, cid, store) Customer(cid, name, city) δ T4(name,name) Π x.name,z.name T3(. . . ) T2( . . . .) σ price>100 and city=‘Seattle’ T1(pid,name,price,pid,cid,store) cid=cid Temporary tablesT1, T2, . . . pid=pid Customer Purchase Product

  8. Relational Algebra = HOW The order is now clearly specified: Iterate over PRODUCT… …join with PURCHASE… …join with CUSTOMER… …select tuples with Price>100 and City=‘Seattle’… …eliminate duplicates… …and that’s the final answer ! Dan Suciu -- CSEP544 Fall 2011

  9. Sets v.s. Bags • Sets: {a,b,c}, {a,d,e,f}, { }, . . . • Bags: {a, a, b, c}, {b, b, b, b, b}, . . . Relational Algebra has two semantics: • Set semantics (paper “Three languages…”) • Bag semantics Dan Suciu -- CSEP544 Fall 2011

  10. Extended Algebra Operators • Union , intersection , difference - • Selection s • ProjectionΠ • Join ⨝ • Rename  • Duplicate elimination d • Grouping and aggregation g • Sorting t Dan Suciu -- CSEP544 Fall 2011

  11. Union and Difference R1  R2 R1 – R2 What do they mean over bags ? Dan Suciu -- CSEP544 Fall 2011

  12. What about Intersection ? • Derived operator using minus • Derived using join (will explain later) R1  R2 = R1 – (R1 – R2) R1  R2 = R1 ⨝ R2 Dan Suciu -- CSEP544 Fall 2011

  13. Selection • Returns all tuples which satisfy a condition • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <> sc(R) Dan Suciu -- CSEP544 Fall 2011

  14. Employee sSalary > 40000(Employee) Dan Suciu -- CSEP544 Fall 2011

  15. Projection • Eliminates columns • Example: project social-security number and names: • PSSN, Name (Employee) • Answer(SSN, Name) P A1,…,An(R) Semantics differs over set or over bags Dan Suciu -- CSEP544 Fall 2011

  16. Employee PName,Salary (Employee) Set semantics Bag semantics Which is more efficient?

  17. Cartesian Product • Each tuple in R1 with each tuple in R2 • Very rare in practice; mainly used to express joins R1  R2 Dan Suciu -- CSEP544 Fall 2011

  18. Employee Dependent Employee ✕ Dependent Dan Suciu -- CSEP544 Fall 2011

  19. Renaming • Recall: needed in the named perspective • Changes the schema, not the instance • Example: • rN, S(Employee)  Answer(N, S) rB1,…,Bn (R) Dan Suciu -- CSEP544 Fall 2011

  20. Natural Join R1 ⨝R2 • Meaning: R1⨝R2 = PA(s(R1 × R2)) • Where: • The selection schecks equality of all common attributes • The projection eliminates the duplicate common attributes Dan Suciu -- CSEP544 Fall 2011

  21. Natural Join R S R ⨝ S= PABC(sR.B=S.B(R × S)) Dan Suciu -- CSEP544 Fall 2011

  22. Natural Join • Given schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⨝ S ? • Given R(A, B, C), S(D, E), what is R ⨝ S ? • Given R(A, B), S(A, B), what is R ⨝ S ? Dan Suciu -- CSEP544 Fall 2011

  23. Theta Join • A join that involves a predicate • Here q can be any condition R1 ⨝q R2 = sq (R1  R2) Dan Suciu -- CSEP544 Fall 2011

  24. Eq-join • A theta join where q is an equality • This is by far the most used variant of join in practice R1 ⨝A=B R2 = sA=B (R1  R2) Dan Suciu -- CSEP544 Fall 2011

  25. So Which Join Is It ? • When we write R ⨝ S we usually mean an eq-join, but we often omit the equality predicate when it is clear from the context Dan Suciu -- CSEP544 Fall 2011

  26. Semijoin • Where A1, …, An are the attributes in R R ⋉C S = PA1,…,An (R ⨝C S) • Formally, R ⋉C S means this: retain from R only thosetuples that have some matching tuple in S • Duplicates in R are preserved • Duplicates in S don’t matter

  27. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) • Assumptions • Very few dependents have age > 71. • “Stuff” is big. Task: compute the query with minimum amount of data transfer

  28. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T = PEmpSSNs age>71 (Dependents) Dan Suciu -- CSEP544 Fall 2011

  29. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T = PEmpSSNs age>71 (Dependents) R = Employee⨝SSN=EmpSSNT = Employee ⋉SSN=EmpSSN (sage>71 (Dependents))

  30. Semijoins in Distributed Databases Dependent Employee network Employee ⨝SSN=EmpSSN (sage>71 (Dependent)) T = PEmpSSNs age>71 (Dependents) R = Employee ⋉SSN=EmpSSNT Answer = R ⨝SSN=EmpSSNs age>71 Dependents

  31. Joins R US • The join operation in all its variants (eq-join, natural join, semi-join, outer-join) is at the heart of relational database systems • WHY ? Dan Suciu -- CSEP544 Fall 2011

  32. Operators on Bags • Duplicate elimination d d(R) = SELECT DISTINCT * FROM R • Grouping g gA,sum(B) (R) = SELECTA,sum(B) FROM R GROUP BY A • Sorting t

  33. y.seller-id=z.id y.pid=u.pid x.id=z.id Complex RA Expressions gu.name, count(*) Product(pid, name, price)Purchase(pid, id, seller-id) Person(id, name) Person x Purchase y Person z Product u SELECT u.name, count(*)FROM Person x, Purchase y, Person z, Product uWHERE z.name=‘fred’ and u.name=‘gizmo’ and y.seller-id = z.id and y.pid = u.pid and x.id=z.idGROUP BY u.name Pid Ppid sname=fred sname=gizmo

  34. RA = Dataflow Program • Several operations, plus strictly specified order • In RDBMS the dataflow graph is always a tree • Novel languages (s.a. PIG), dataflow graph may be a DAG

  35. Limitations of RA • Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write Java program • Remember the Bacon number ? Needs TC too ! Dan Suciu -- CSEP544 Fall 2011

  36. 2. Overview of query evaluation Dan Suciu -- CSEP544 Fall 2011

  37. Steps of the Query Processor SQL query Parse & Rewrite Query Logicalplan Select Logical Plan Queryoptimization Select Physical Plan Physicalplan Query Execution Disk

  38. Example Database Schema Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) View: Suppliers in Seattle CREATE VIEW NearbySupp AS SELECT x.sno, x.sname FROM Supplier x WHERE x.scity='Seattle' AND x.sstate='WA' Dan Suciu -- CSEP544 Fall 2011

  39. Example Query Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) Find the names of all suppliers in Seattle who supply part number 2 NearbySupp(sno, sname) SELECT y.sname FROM NearbySuppy WHERE y.sno IN ( SELECT z.sno FROM Supplies z WHERE z.pno = 2 ) Dan Suciu -- CSEP544 Fall 2011

  40. Steps in Query Evaluation • Step 0: Admission control • User connects to the db with username, password • User sends query in text format • Step 1: Query parsing • Parses query into an internal format • Performs various checks using catalog • Correctness, authorization, integrity constraints • Step 2: Query rewrite • View rewriting, flattening, etc. Dan Suciu -- CSEP544 Fall 2011

  41. Rewritten Version of Our Query Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) NearbySupp(sno, sname) SELECT y.sname FROM NearbySuppy WHERE y.sno IN ( SELECT z.sno FROM Supplies z WHERE z.pno = 2 ) Original query: SELECT x.sname FROM Supplier x, Supplies z WHERE x.scity='Seattle' ANDx.sstate='WA’ AND x.sno = z.sno AND z.pno = 2; Rewritten query:

  42. Continue with Query Evaluation • Step 3: Query optimization • Find an efficient query plan for executing the query • A query plan is • Logical query plan: an extended relational algebra tree • Physical query plan: with additional annotations at each node • Access method to use for each relation • Implementation to use for each relational operator Dan Suciu -- CSEP544 Fall 2011

  43. Logical Query Plan Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) Πsname NearbySupp(sno, sname) sscity=‘Seattle’ sstate=‘WA’  pno=2 sno = sno Suppliers Supplies Dan Suciu -- CSEP544 Fall 2011

  44. Physical Query Plan • Logical query plan with extra annotations • Access path selection for each relation • Use a file scan or use an index • Implementation choice for each operator • Scheduling decisions for operators Dan Suciu -- CSEP544 Fall 2011

  45. Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) Physical Query Plan (On the fly) sname (On the fly) sscity=‘Seattle’ sstate=‘WA’  pno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  46. Final Step in Query Processing • Step 4: Query execution • How to synchronize operators? • How to pass data between operators? • Synchronization techniques: • Pipelined execution • Materialized relations for intermediate results • Passing data between operators: • Iterator interface • One thread per operator Dan Suciu -- CSEP544 Fall 2011

  47. Iterator Interface • Each operator implements this interface • Interface has only three methods • open() • Initializes operator state • Sets parameters such as selection condition • get_next() • Operator invokes get_next() recursively on its inputs • Performs processing and produces an output tuple • close(): cleans-up state Dan Suciu -- CSEP544 Fall 2011

  48. Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) Pipelined Execution (On the fly) sname (On the fly) sscity=‘Seattle’ sstate=‘WA’  pno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  49. Pipelined Execution • Applies parent operator to tuples directly as they are produced by child operators • Benefits • No operator synchronization issues • Saves cost of writing intermediate data to disk • Saves cost of reading intermediate data from disk • Good resource utilizations on single processor • This approach is used whenever possible Dan Suciu -- CSEP544 Fall 2011

  50. Supplier(sno, sname, scity, sstate) • Supply(sno, pno, price) Part(pno, pname, psize, pcolor) Intermediate TupleMaterialization sname (On the fly) (Sort-merge join) sno = sno (Scan: write to T1) (Scan: write to T2) sscity=‘Seattle’ sstate=‘WA’  pno=2 Suppliers Supplies (File scan) (File scan)

More Related