360 likes | 405 Views
Learn about application of aggregate operators and GROUP BY, HAVING clauses in SQL to handle complex queries efficiently and effectively.
E N D
SQL: Structured Query Language(‘Sequel’) Chapter 5 Part 2.
Running Example R1 • Instances of the Sailors and Reserves relations in our examples. S1 S2
Aggregate Operators • Significant extension of relational algebra. COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) Why no Distinct?
COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) Aggregate Operators SELECT COUNT (*) FROM Sailors S SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10 SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’
Find name and age of the oldest sailor(s) SELECT S.sname, MAX (S.age) FROM Sailors S • What does this query do ? Is this query legal? • No! Why not ? SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2) • What does this query do ? Is this query legal?
Find name and age of the oldest sailor(s) SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2) SELECT S.sname, S.age FROM Sailors S WHERE (SELECT MAX (S2.age) FROM Sailors S2) = S.age • example queries equivalent in SQL/92 • but third NOT supported in some systems
Motivation for Grouping • So far, aggregate operators to all (qualifying) tuples. • Question: • What if want to apply aggregate to each group of tuples ? • Example : • Find the age of the youngest sailor for each rating level. • Example Procedure : • Suppose rating values {1,2,…,10}, 10 queries: SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10:
Motivation for Grouping • What are the problems with above ? • We may not know how many rating levels exist. • Nor what the rating values for these levels are. • Performance issue (why ?) SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10:
Queries With GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list HAVING group-qualification • A group is a set of tuples that each have the same value for all attributes in grouping-list. • HAVING clause is a restriction on each group.
Are Group By queries valid or not ? SELECT avg ( S.salary) FROM Sailors S GROUP BY S.rating SELECT S.name FROM Sailors S GROUP BY S.rating
Queries With GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list HAVING group-qualification • A group is a set of tuples that each have the same value for all attributes in grouping-list. • target-listcontains :(i) attribute names (ii) aggregate-op (column-name) (e.g., MIN (S.age)). • REQUIREMENT: - target-list (i) grouping-list. - Why? - Each answer tuple of a group must have single value.
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors that are over 18. Is Query Valid ? SELECTS.rating, MIN (S.age) AS min-age FROM Sailors S WHERES.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1 What does query below mean ?
GroupBy --- Conceptual Evaluation • Compute the cross-product of relation-list (From) • Discard tuples that fail qualification (Where) • Delete `unnecessary’ fields • Partition the remaining tuples into groups by the value of attributes in grouping-list. (GroupBy) • Eliminate groups using the group-qualification (Having) • Apply selection to each group to produce output tuple (Select) We want to have a single value per group, that is, one answer tuple is generated per qualifying group.
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors Sailors instance: SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors.
Again: Find age of youngest sailor with age 18, for each rating with at least 2 such sailors SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1 Now: Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors and with every sailor under 60. • Options: • Put 60 age condition into WHERE clause ? • Put 60 age condition into HAVING clause ?
Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors and with every sailor under 60. HAVING COUNT (*) > 1 AND EVERY (S.age <=60) EVERY : Must hold for all tuples in the group.
Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60. Sailors instance: SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 ??? GROUP BY S.rating HAVINGCOUNT (*) > 1 ??? Now check age<=60 before making groups.
Answer relation: Check age<=60 before making groups. Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60. Sailors instance: SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 AND S.age <= 60 GROUP BY S.rating HAVINGCOUNT (*) > 1
For each red boat, find the number of reservations for this boat Sailors : sid, name, … Boats : bid, color, … Reserves: sid, bid, day
For each red boat, find the number of reservations for this boat SELECT B.bid, COUNT (*) AS s-count FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid • Grouping over Join of three relations.
For each red boat, find the number of reservations for this boat Q: What if we move B.color=‘red’ from WHERE to HAVING? • Illegal. • Only column in GroupBy can appear in Having clause, unless in aggregate operator of Having over group; • E.g., HAVING count (B.color = ‘red’ ) > 1; SELECT B.bid, COUNT (*) AS s-count FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid GROUP BY B.bid HAVING (B.color=‘red’)
Find age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age) • Hint : HAVING clause can also contain a subquery. SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating)
Find age of the youngest sailor, for each rating with at least 2 sailors of age 18; SELECT S.rating, MIN (S.age) FROM Sailors S GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S2.age > 18 and S.rating=S2.rating)
Find those ratings for which the average age is the minimum over all ratings • Above query has a problem ! What ? • Aggregate operations cannot be nested! SELECT S.rating FROM Sailors S WHERE S.age = (SELECT MIN (AVG (S2.age)) FROM Sailors S2)
Find those ratings for which the average age is the minimum over all ratings • Correct solution (in SQL/92): SELECT Temp.rating, Temp.avg-age FROM (SELECT S.rating, AVG (S.age) AS avg-age FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avg-age = (SELECT MIN (Temp.avg-age) FROM Temp)
Outer Joins : Special Operators • Left Outer Join • Right Outer Join • Full Outer Join SELECT S.sid, R.bid FROM Sailors S LEFT OUTER JOIN Reserves R WHERE S.sid = R.sid Sailors rows (left) without a matching Reserves row (right) appear in result, but not vice versa. SELECT S.sid, R.bid FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R
Null Values • Field values in a tuple are sometimes : • Unknown (e.g., a rating has not been assigned) or • Inapplicable(e.g., no maiden-name when a male person) • SQL provides a special value null for such situations. • The presence of null complicates many issues.
Allowing Null Values • SQL special operators IS NULL or IS NOT NULL to check if value is/is not null. • Disallow NULL value : rating INTEGER NOT NULL • We need a 3-valued logic : condition can be true, false or unknown.
Working with NULL values • Question : • Given predicate (S.rating = 8). Is it TRUE or FALSE ?? • What if rating has null value in tuple ? • Comparison operators on NULL return UNKNOWN • Question : • Given arithmetic expression (S.rating + 8). What is its value? • What if that tuple’s rating has a null value? • Arithmetic operations on NULL return NULL.
Truth table with UNKNOWN In general, a WHERE clause is satisfied only when it evaluates to TRUE. UNKNOWN AND TRUE = UNKNOWN UNKNOWN OR TRUE = TRUE UNKNOWN AND FALSE = FALSEUNKNOWN OR FALSE = UNKNOWN UNKNOWN AND UNKNOWN = UNKNOWN UNKNOWN OR UNKNOWN = UNKNOWN NOT UNKNOWN = UNKNOWN
Summary • SQL was important factor in early acceptance of relational model : easy-to-understand ! • Relationally complete: even more expressive power than relational algebra. • Many alternative ways to write a query. So optimizer must look for most efficient evaluation plan. • In practice, users may (still) want to be aware of how queries are optimized and evaluated for best results.