1 / 36

CS4432: Database Systems II

CS4432: Database Systems II. Query Operator & Algebraic Expressions. Why SQL. SQL is a very-high-level language. Say “ what to do ” rather than “ how to do it. ” Avoid a lot of data-manipulation details needed in procedural languages like C++ or Java.

frankw
Download Presentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II Query Operator & Algebraic Expressions

  2. Why SQL • SQL is a very-high-level language. • Say “what to do” rather than “how to do it.” • Avoid a lot of data-manipulation details needed in procedural languages like C++ or Java. • Database management system figures out “best” way to execute query. • Called “query optimization.”

  3. Query Processing SQL Query SELECT pNumber, count(*) AS CNT FROM Student WHERE sNumber > 1 GROUP BY pNumber; Query Plans

  4. Query Example SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C

  5. How do we execute query? - Form Cartesian product of all tables in FROM-clause - Select tuples that match WHERE-clause - Project columns that occur in SELECT-clause One idea

  6. Bingo! Got one... R X S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . C 2 10 10 x 2 . . SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C

  7. But ? • Performance would be unacceptable! • We need a better approach for reasoning about queries, their execution orders and their respective costs

  8. Formal Relational Query Languages • Relational Algebra: More operational, very useful for representing execution plans. • Operators working on relations

  9. Core Relational Algebra (Recap) • Union, intersection, and difference. • Usual set operations, both operands have the same relation schema. • Selection: picking certain rows. • Projection: picking certain columns. • Productsand joins: compositions of relations. • Renamingof relations and attributes. • Groupingand Aggregation: Grouping matching tuples • Duplicate Elimination: eliminates identical copies except one • Sorting: Orders tuples based on a given criteria

  10. Relational Algebra Express Query Plans B,D sR.A=“c” S.E=2  R.C=S.C X R S

  11. Recap on Relational Algebra & Operators

  12. Algebra Behind the Query Language Relational Algebra • Set of operators that operate on relations • Operator semantics based on Setor Bagtheory • Relational algebra form underlying basis (and optimization rules) for SQL SELECT pNumber, count(*) AS CNT FROM Student WHERE sNumber > 1 GROUP BY pNumber;

  13. Relational Algebra • Basic operators • Set Operations (Union: ∪, Intersection: ∩ ,difference: – ) • Select: σ • Project: π • Cartesian product: x • rename: ρ • More advanced operators, e.g., grouping and joins • The operators take one or two relations as inputs and produce a new relation as an output • One input  unary operator, two inputs  binary operator

  14. Union over sets:  Binary Op. • Consider two relations R and S that are union-compatible (same schema) R S R S

  15. Difference over sets: – Binary Op. • R – S are the tuples that appear in R and not in S • R & S must be union-compatible • Defined as: • R – S = {t | t ∈R and t∈ S} R-S ≠ S-R R S R – S

  16. Intersection over sets: ∩ Binary Op. • Consider two Relations R and S that are union-compatible R S R∩S

  17. Selection: σ Unary Op. • Select: σc (R): • c is a condition on R’s attributes • Select subset of tuples from R that satisfy selection condition c σ(C ≥ 6) (R) R

  18. Selection: Example σ ((A=B) ^ (D>5))(R) R σ (D > C)(R)

  19. Project: π Unary Op. • πA1, A2, …, An (R), with A1, A2, …, An  attributes AR • returns all tuples in R, but only columns A1, A2, …, An • A1, A2, …, An are called Projection List πA, C (R) R

  20. Extended Projection: πL (R)Example Compute this expression and call it X Rename column A to V π C, VA, X C*3+B(R) R

  21. Cross Product (Cartesian Product): X Binary Op. • Each tuple in R joined with each tuple is S • R x S = {t q | t ∈ R and q ∈ S} R X S S R

  22. Natural Join: R ⋈ S Binary Op. • An implicit equality condition on the common columns Implicit condition (R.B = S.B and R.D = S.D) S R R ⋈ S

  23. Theta Join: R ⋈C S Binary Op. • A join based on any arbitrary condition C • It is defined as : R ⋈C S = (σC (R X S)) Recommendation: Always use Theta join (more explicit and more clear) R S R ⋈R.A>=S.CS

  24. Duplicate Elimination:  (R) Unary Op. • Delete all duplicate records • Convert a Bag(allows duplicates) to a Set(does not allow duplicates) R  (R)

  25. Grouping & Aggregation operator:  Unary Op. • Grouing & Aggregate operation in relational algebra g1,g2, …gm,F1(A1), F2(A2), …Fn(An) (R) Aggregation functions applied over each group Group by these columns (can be empty) avg: average value min: minimum value max: maximum value sum: sum of values count: number of values

  26. Grouping & Aggregation Operator: Example S R • branch_name,sum(balance)(S) • sum(c)(R)

  27. Assignment Operator:  • Write query as a sequence of line consisting of: • Series of assignments • Result expression containing the final answer • May use a variable multiple times in subsequent expressions • Example: • R1  (σ ((A=B) ^ (D>5))(R – S)) ∩ W • R2  R1 ⋈(R.A = T.C)T • Result  R1 U R2

  28. Banking Example • branch (branch_name, branch_city, assets) • customer (customer_name, customer_street, customer_city) • account (account_number, branch_name, balance) • loan (loan_number, branch_name, amount) • depositor (customer_name, account_number) • borrower (customer_name, loan_number)

  29. Example Queries Find customer names having account balance below 100 or above 10,000 • πcustomer_name (depositor ⋈ • πaccount_number(σbalance <100 OR balance > 10,000(account)))

  30. Example Queries

  31. Example Queries (Cont’d)

  32. Example Queries Find customers’ names who have neither accounts nor loans • πcustomer_name(customer) - • (πcustomer_name(borrower) U πcustomer_name(depositer))

  33. Example Queries For branches that gave loans > 100,000 or hold accounts with balances >50,000, report the branch name along whether it is reported because of a loan or an account • R1  πbranch_name, ‘Loan’ As Type (σamount >100,000 (loan)) • R2  πbranch_name, ‘Account’ As Type(σbalance > 50,000 (account))) • Result  R1 U R2

  34. Example Queries Find customer names having loans with sum > 20,000 • πcustomer_name (σsum > 20,000 (customer_name, sum sum(amount)(loan ⋈ borrower)))

  35. Example Queries Find the branch name with the largest number of accounts • R1  branch_name, countAccounts count(account_number)(account) • R2  Max  max(countAccounts)(R1) • Result  πbranch_name(R1 ⋈countAccounts = Max R2)

  36. Summary of Relational-Algebra Operators • Set operators • Union, Intersection, Difference • Selection & Projection & Extended Projection • Joins • Natural, Theta, Outer join • Rename & Assignment • Duplicate elimination • Grouping & Aggregation

More Related