360 likes | 373 Views
Learn how relational algebra forms the basis for SQL query optimization and execution, with detailed explanations of operators and relational algebra expressions.
E N D
CS4432: Database Systems II Query Operator & Algebraic Expressions
Why SQL • SQL is a very-high-level language. • Say “what to do” rather than “how to do it.” • Avoid a lot of data-manipulation details needed in procedural languages like C++ or Java. • Database management system figures out “best” way to execute query. • Called “query optimization.”
Query Processing SQL Query SELECT pNumber, count(*) AS CNT FROM Student WHERE sNumber > 1 GROUP BY pNumber; Query Plans
Query Example SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C
How do we execute query? - Form Cartesian product of all tables in FROM-clause - Select tuples that match WHERE-clause - Project columns that occur in SELECT-clause One idea
Bingo! Got one... R X S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . C 2 10 10 x 2 . . SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C
But ? • Performance would be unacceptable! • We need a better approach for reasoning about queries, their execution orders and their respective costs
Formal Relational Query Languages • Relational Algebra: More operational, very useful for representing execution plans. • Operators working on relations
Core Relational Algebra (Recap) • Union, intersection, and difference. • Usual set operations, both operands have the same relation schema. • Selection: picking certain rows. • Projection: picking certain columns. • Productsand joins: compositions of relations. • Renamingof relations and attributes. • Groupingand Aggregation: Grouping matching tuples • Duplicate Elimination: eliminates identical copies except one • Sorting: Orders tuples based on a given criteria
Relational Algebra Express Query Plans B,D sR.A=“c” S.E=2 R.C=S.C X R S
Algebra Behind the Query Language Relational Algebra • Set of operators that operate on relations • Operator semantics based on Setor Bagtheory • Relational algebra form underlying basis (and optimization rules) for SQL SELECT pNumber, count(*) AS CNT FROM Student WHERE sNumber > 1 GROUP BY pNumber;
Relational Algebra • Basic operators • Set Operations (Union: ∪, Intersection: ∩ ,difference: – ) • Select: σ • Project: π • Cartesian product: x • rename: ρ • More advanced operators, e.g., grouping and joins • The operators take one or two relations as inputs and produce a new relation as an output • One input unary operator, two inputs binary operator
Union over sets: Binary Op. • Consider two relations R and S that are union-compatible (same schema) R S R S
Difference over sets: – Binary Op. • R – S are the tuples that appear in R and not in S • R & S must be union-compatible • Defined as: • R – S = {t | t ∈R and t∈ S} R-S ≠ S-R R S R – S
Intersection over sets: ∩ Binary Op. • Consider two Relations R and S that are union-compatible R S R∩S
Selection: σ Unary Op. • Select: σc (R): • c is a condition on R’s attributes • Select subset of tuples from R that satisfy selection condition c σ(C ≥ 6) (R) R
Selection: Example σ ((A=B) ^ (D>5))(R) R σ (D > C)(R)
Project: π Unary Op. • πA1, A2, …, An (R), with A1, A2, …, An attributes AR • returns all tuples in R, but only columns A1, A2, …, An • A1, A2, …, An are called Projection List πA, C (R) R
Extended Projection: πL (R)Example Compute this expression and call it X Rename column A to V π C, VA, X C*3+B(R) R
Cross Product (Cartesian Product): X Binary Op. • Each tuple in R joined with each tuple is S • R x S = {t q | t ∈ R and q ∈ S} R X S S R
Natural Join: R ⋈ S Binary Op. • An implicit equality condition on the common columns Implicit condition (R.B = S.B and R.D = S.D) S R R ⋈ S
Theta Join: R ⋈C S Binary Op. • A join based on any arbitrary condition C • It is defined as : R ⋈C S = (σC (R X S)) Recommendation: Always use Theta join (more explicit and more clear) R S R ⋈R.A>=S.CS
Duplicate Elimination: (R) Unary Op. • Delete all duplicate records • Convert a Bag(allows duplicates) to a Set(does not allow duplicates) R (R)
Grouping & Aggregation operator: Unary Op. • Grouing & Aggregate operation in relational algebra g1,g2, …gm,F1(A1), F2(A2), …Fn(An) (R) Aggregation functions applied over each group Group by these columns (can be empty) avg: average value min: minimum value max: maximum value sum: sum of values count: number of values
Grouping & Aggregation Operator: Example S R • branch_name,sum(balance)(S) • sum(c)(R)
Assignment Operator: • Write query as a sequence of line consisting of: • Series of assignments • Result expression containing the final answer • May use a variable multiple times in subsequent expressions • Example: • R1 (σ ((A=B) ^ (D>5))(R – S)) ∩ W • R2 R1 ⋈(R.A = T.C)T • Result R1 U R2
Banking Example • branch (branch_name, branch_city, assets) • customer (customer_name, customer_street, customer_city) • account (account_number, branch_name, balance) • loan (loan_number, branch_name, amount) • depositor (customer_name, account_number) • borrower (customer_name, loan_number)
Example Queries Find customer names having account balance below 100 or above 10,000 • πcustomer_name (depositor ⋈ • πaccount_number(σbalance <100 OR balance > 10,000(account)))
Example Queries Find customers’ names who have neither accounts nor loans • πcustomer_name(customer) - • (πcustomer_name(borrower) U πcustomer_name(depositer))
Example Queries For branches that gave loans > 100,000 or hold accounts with balances >50,000, report the branch name along whether it is reported because of a loan or an account • R1 πbranch_name, ‘Loan’ As Type (σamount >100,000 (loan)) • R2 πbranch_name, ‘Account’ As Type(σbalance > 50,000 (account))) • Result R1 U R2
Example Queries Find customer names having loans with sum > 20,000 • πcustomer_name (σsum > 20,000 (customer_name, sum sum(amount)(loan ⋈ borrower)))
Example Queries Find the branch name with the largest number of accounts • R1 branch_name, countAccounts count(account_number)(account) • R2 Max max(countAccounts)(R1) • Result πbranch_name(R1 ⋈countAccounts = Max R2)
Summary of Relational-Algebra Operators • Set operators • Union, Intersection, Difference • Selection & Projection & Extended Projection • Joins • Natural, Theta, Outer join • Rename & Assignment • Duplicate elimination • Grouping & Aggregation