520 likes | 759 Views
SQL, RA, Sets, Bags. Fundamental difference between theoretical RA and practical application of it in DBMSs and SQL RA uses sets SQL uses bags ( multisets ) There are good performance reasons for using bags:
E N D
SQL, RA, Sets, Bags • Fundamental difference between theoretical RA and practical application of it in DBMSs and SQL • RA uses sets • SQL uses bags (multisets) • There are good performance reasons for using bags: • Queries involve 2+ join, union, etc., which would require an extra pass through the relation being built • There are times we WANT every instance, particularly for aggregate functions (e.g. taking an average) • Downside: • Extra memory
Section 5.1 Topics include: • Union, Difference, Intersection and how they are affected by operation over bags • Projection operator over bags • Selection operator over bags • Product and join over bags • All the above follow what you would expect • Other topics in 5.1: • Algebraic laws of set operators applied to bags
Examples: set operators over bags • {1,2,1} ∪ {1,1,2,3,1} = • {1,1,1,1,1,2,2,3} • {1,2,1,1} ∩ {1,2,1,3} = • {1, 1, 2} • {1,2,1,1,1} – {1,1,2,3} = • {1,1}
Exercise 5.1.3b • πbore(Ships |><| Classes)
δ – Duplicate elimination • δ(R) • Eliminate duplicates from relation R • (i.e. converts a relation from a bag to set representation) • R2 := δ(R1) • R2 consists of one copy of each tuple that appears in R2 one or more times • DISTINCT modifier in SELECT stmt
δ(R) = A B 1 2 3 4 δ - Example R = ( A B ) 1 2 3 4 1 2
τ – Sorting • R2 := τL(R1) • L – list of some attributes of R1 • L specifies the order of sorting • Increasing order • Tuples with identical components in L specify no order • Benefit: • Obvious – ordered output • Not so obvious – stored sorted relations can have substantial query benefit • Recall running time for binary search • O(logn) is far superior than O(n)
Aggregation Operators • Use to summarize something about the values in attribute of a relation • Produces a single value as a result • SUM(attr) • AVG(attr) • MIN(attr) • MAX(attr) • COUNT(attr)
Example: Aggregation R = ( A B ) 1 3 3 4 3 2 SUM(A), COUNT(A), MAX(B), AVG(B) = ? SUM(A) = 7 COUNT(A) = 3 MAX(B) = 4 AVG(B) = 3
Grouping Operator • R2 := γL(R1) • L is a list of elements that are: • Individual attributes of R1 • Called grouping attributes • Aggregated attribute of R1 • Use an arrow and a new name to rename the component • R2 projects only what is in L
How does γL(R) work? • Form one group for each distinct list of values for those attributes in R • Within each group, compute AGG(A) for each aggregation on L • Result has one tuple for each group • The grouping attributes' values for the group • The aggregations over all tuples of the group (for the aggregated attributes)
Then, average C within groups: A B X 1 2 4 4 56 1 3 5 First, partition R by A and B : A B C 1 2 3 1 2 54 5 6 1 3 5 Example: Grouping / Aggregation R = ( A B C ) 1 2 3 4 5 6 1 2 5 1 3 5 γA,B,AVG(C)->X (R) = ??
Note about aggregation • If R is a relation, and R has attributes A1…An, then • δ(R) == γA1,A2,…,An(R) • Grouping on ALL attributes in R eliminates duplicates • i.e. δ is not really necessary • Also, if relation R is also a set, then • πA1,A2,…,An(R) = γA1,A2,…,An(R)
Extended Projection • Recall R2 := πL(R1) • R2 contains only L attributes from R1 • L can be extended to allow arbitrary expressions: • Renaming (e.g., A -> B) • Arithmetic expressions (e.g., A + B -> SUM) • Duplicate attributes (i.e., include in L multiple times)
πA+B->C,A,A (R) = C A1 A2 3 1 1 7 3 3 Example: Extended Projection R = ( A B ) 1 2 3 4
Outer joins • Recall that thestandard natural join occurs only if there is a match from both relations • A tuple of R that has NO tuple of S with which it can join is said to be dangling • Vice versa applies • Outer join: preserves dangling tuples in join • Missing components set to NULL • R|>◦<|CS. • This is a bad approximation of the symbol – see text • NO C? Natural outer join
R |>◦<| S = A B C 1 2 3 4 5 NULL NULL 6 7 Example: Outer Join R = ( A B ) S =( B C ) 1 2 2 3 4 5 6 7 (1,2) joins with (2,3), but the other two tuples are dangling.
Types of outer joins • R |>◦<| S • No condition, requires matching attributes • Pads dangling tuples from both side • R |>◦<| LS • Pad dangling tupes of R only • R |>◦<| RS • Pad dangling tuples of S only • SQL: • R NATURAL {LEFT | RIGHT} JOIN S • R {LEFT | RIGHT} JOIN S • NOTE MySQL does not allow a FULL OUTER JOIN! Only LEFT or RIGHT • Just UNION a left outer join and a right outer join… mostly
A+B A2 B2 1 0 1 5 4 9 1 0 1 6 4 16 7 9 16 B+1 C-1 1 0 3 3 3 4 4 3 1 1 4 3
A B 0 1 2 3 2 4 3 4 A SUM(B) 0 2 2 7 3 4 SELECT A,SUM(B) FROM R GROUP BY A
A 0 2 3 SELECT A FROM R GROUP BY A; SELECT DISTINCT A FROM R;
What if MAX(C) was SUM(C)? A MAX(C) 2 4 SELECT A,MAX(C) FROM R NATURAL JOIN SGROUP BY A;
A B C 2 3 4 2 3 4 0 1 ┴ 0 1 ┴ 2 4 ┴ 3 4 ┴ SELECT * FROM R NATURAL LEFT JOIN S;
A B C 2 3 4 2 3 4 ┴ 0 1 ┴ 2 4 ┴ 2 5 ┴ 0 2 SELECT * FROM R NATURAL RIGHT JOIN S;
SELECT * FROM R NATURAL LEFT JOIN SUNIONSELECT * FROM R NATURAL RIGHT JOIN S; A B C 2 3 4 2 3 4 0 1 ┴ 0 1 ┴ 2 4 ┴ 3 4 ┴ ┴ 0 1 ┴ 2 4 ┴ 2 5 ┴ 0 2 Right?
SELECT * FROM R NATURAL LEFT JOIN SUNION ALLSELECT * FROM R NATURAL RIGHT JOIN SWHERE A IS NULL;
A R.B S.B C 0 1 2 4 0 1 2 5 0 1 3 4 0 1 3 4 0 1 2 4 0 1 2 5 0 1 3 4 0 1 3 4 2 3 ┴ ┴ 2 4 ┴ ┴ 3 4 ┴ ┴ ┴ ┴ 0 1 ┴ ┴ 0 2
Aggregations • SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause • Produces an aggregation on the attribute • COUNT(*) count the number of tuples • Use DISTINCT inside of an aggregation to eliminate duplicates in the function
Example: • Sells(bar, beer, price) • Find the average price of Guinness • SELECT AVG(price) • FROM Sells • WHERE beer = 'Guinness'; • Find the number of different prices charged for Guinness • SELECT COUNT(DISTINCT price) AS "# Prices" • FROM Sells • WHERE beer = 'Guinness';
Grouping • SELECT attr(s)FROM tblsWHERE cond_exprGROUP BY attr(s) • The resulting SELECT-FROM-WHERE relation determined FIRST, then grouped according to GROUP BY clause • MySQL will also sort the relations according to attributes listed in GROUP BY clause • Therefore, allows optional ASC or DESC (just like ORDER BY) • Aggregations are applied only within each group
Note on NULL and Aggregation • NULL values in a tuple: • never contribute to a sum, average or count • can never be a min or max of an attribute • If all values for an attribute are NULL, then the result of an aggregation is NULL • Exception: COUNT of an empty set is 0 • NULL values are treated as ordinary values when forming groups
Example: Grouping • Sells(bar, beer, price)Frequents(drinker, bar) • Find the average price for each beer • SELECT beer, AVG(price) • FROM Sells • GROUP BY beer; • Find for each drinker the average price of Guinness at the bars they frequent • SELECT drinker, AVG(price) • FROM Frequents • NATURAL JOIN Sells • WHERE beer = 'Guinness' • GROUP BY drinker;
Restrictions • Example: • Find the bar that sells Guinness the cheapest • SELECT bar, MIN(price)FROM SellsWHERE beer = 'Guinness'; • Is this correct? • Book states that this is illegal SQL • if an aggregation used, then each SELECT element should be aggregated or be an attribute in GROUP BY • MySQL allows the above, but such queries will give meaningless results
Example of confusing aggregation • Find the country of the ship with bore of 15 with the smallest displacement • SELECT country, MIN(displacement)FROM ClassesWHERE bore = 15;
Not quite the correct answer! Be sure to follow the rules for aggregation.
HAVING Clause • HAVING cond • Follows a GROUP BY clause • Condition applies to each possible group • Groups not satisfying condition are eliminated • Rules for conditions in HAVING clause: • Aggregated attributes: • Any attribute in relation in FROM clause can be aggregated • Only applies to the group being tested • Unaggregated attributes • Only attributes in GROUP BY list • mySQL is more lenient with this, though they result in meaningless information
Example: HAVING • Sells(bar, beer, price) • Find the average price of those beers that are served in at least three bars • SELECT beer, AVG(price)FROM SellsGROUP BY beerHAVING COUNT(*) >= 3;
Example: HAVING • Sells(bar, beer, price)Beers(name, manf) • Find the average price of beers that are either served in at least three bars or are manufactured by Sam Adams • SELECT beer, AVG(price) • FROM Sells • GROUP BY beer • HAVING COUNT(*) >= 3 OR • beer IN • (SELECT name FROM Beers WHERE manf = 'Sam Adams');
Find the average displacement of ships from each country having at least two classes • SELECT country, AVG(displacement) • FROM Classes • GROUP BY country • HAVING count(*) >= 2;
Summary so far • SELECT S • FROM R1,…,Rn • WHERE C1 • GROUP BY a1,…,ak • HAVING C2 • ORDER BY b1,…,bk; • S attributes from R1,…,Rn or aggregates • C1 are conditions on R1,…,Rn • a1,…,ak are attributes from R1,…,Rn • C2 are conditions based on any attribute, or on any aggregation in GROUP BY clause • b1,…,bk are attributes on R1,…,Rn
Exercise 6.2.3f SELECT battle FROM Outcomes INNER JOIN Ships ON Outcomes.ship = Ships.name NATURAL JOIN Classes GROUP BY country, battle HAVING COUNT(ship) >= 3;
Exercise 6.4.7a • SELECT COUNT(type)FROM ClassesWHERE type = 'bb';
Exercise 6.4.7b • SELECT AVG(numGuns) AS 'Avg Guns'FROM ClassesWHERE type = 'bb';
Exercise 6.4.7c • SELECT AVG(numGuns) AS 'Avg Guns'FROM Classes NATURAL JOIN ShipsWHERE type = 'bb';