1 / 36

Maximizing SQL Aggregate Operators for Complex Queries

Learn about application of aggregate operators and GROUP BY, HAVING clauses in SQL to handle complex queries efficiently and effectively.

Download Presentation

Maximizing SQL Aggregate Operators for Complex Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL: Structured Query Language(‘Sequel’) Chapter 5 Part 2.

  2. Running Example R1 • Instances of the Sailors and Reserves relations in our examples. S1 S2

  3. Aggregation and Having Clauses

  4. Aggregate Operators • Significant extension of relational algebra. COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) Why no Distinct?

  5. COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) Aggregate Operators SELECT COUNT (*) FROM Sailors S SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10 SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’

  6. Find name and age of the oldest sailor(s) SELECT S.sname, MAX (S.age) FROM Sailors S • What does this query do ? Is this query legal? • No! Why not ? SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2) • What does this query do ? Is this query legal?

  7. Find name and age of the oldest sailor(s) SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2) SELECT S.sname, S.age FROM Sailors S WHERE (SELECT MAX (S2.age) FROM Sailors S2) = S.age • example queries equivalent in SQL/92 • but third NOT supported in some systems

  8. Motivation for Grouping • So far, aggregate operators to all (qualifying) tuples. • Question: • What if want to apply aggregate to each group of tuples ? • Example : • Find the age of the youngest sailor for each rating level. • Example Procedure : • Suppose rating values  {1,2,…,10}, 10 queries: SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10:

  9. Motivation for Grouping • What are the problems with above ? • We may not know how many rating levels exist. • Nor what the rating values for these levels are. • Performance issue (why ?) SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10:

  10. Add Group By to SQL

  11. Queries With GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list HAVING group-qualification • A group is a set of tuples that each have the same value for all attributes in grouping-list. • HAVING clause is a restriction on each group.

  12. Are Group By queries valid or not ? SELECT avg ( S.salary) FROM Sailors S GROUP BY S.rating SELECT S.name FROM Sailors S GROUP BY S.rating

  13. Queries With GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list HAVING group-qualification • A group is a set of tuples that each have the same value for all attributes in grouping-list. • target-listcontains :(i) attribute names (ii) aggregate-op (column-name) (e.g., MIN (S.age)). • REQUIREMENT: - target-list (i)  grouping-list. - Why? - Each answer tuple of a group must have single value.

  14. Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors that are over 18. Is Query Valid ? SELECTS.rating, MIN (S.age) AS min-age FROM Sailors S WHERES.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1 What does query below mean ?

  15. GroupBy --- Conceptual Evaluation • Compute the cross-product of relation-list (From) • Discard tuples that fail qualification (Where) • Delete `unnecessary’ fields • Partition the remaining tuples into groups by the value of attributes in grouping-list. (GroupBy) • Eliminate groups using the group-qualification (Having) • Apply selection to each group to produce output tuple (Select) We want to have a single value per group, that is, one answer tuple is generated per qualifying group.

  16. Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors Sailors instance: SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1

  17. Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors.

  18. Again: Find age of youngest sailor with age 18, for each rating with at least 2 such sailors SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVINGCOUNT (*) > 1 Now: Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors and with every sailor under 60. • Options: • Put 60 age condition into WHERE clause ? • Put 60 age condition into HAVING clause ?

  19. Find age of the youngest sailor with age 18, for each rating with at least 2 such sailors and with every sailor under 60. HAVING COUNT (*) > 1 AND EVERY (S.age <=60) EVERY : Must hold for all tuples in the group.

  20. Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60. Sailors instance: SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 ??? GROUP BY S.rating HAVINGCOUNT (*) > 1 ??? Now check age<=60 before making groups.

  21. Answer relation: Check age<=60 before making groups. Find age of the youngest sailor with age 18, for each rating with at least 2 sailors between 18 and 60. Sailors instance: SELECT S.rating, MIN (S.age) AS min-age FROM Sailors S WHERE S.age >= 18 AND S.age <= 60 GROUP BY S.rating HAVINGCOUNT (*) > 1

  22. Join, GroupBy and Nesting?

  23. For each red boat, find the number of reservations for this boat Sailors : sid, name, … Boats : bid, color, … Reserves: sid, bid, day

  24. For each red boat, find the number of reservations for this boat SELECT B.bid, COUNT (*) AS s-count FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid • Grouping over Join of three relations.

  25. For each red boat, find the number of reservations for this boat Q: What if we move B.color=‘red’ from WHERE to HAVING? • Illegal. • Only column in GroupBy can appear in Having clause, unless in aggregate operator of Having over group; • E.g., HAVING count (B.color = ‘red’ ) > 1; SELECT B.bid, COUNT (*) AS s-count FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid GROUP BY B.bid HAVING (B.color=‘red’)

  26. Find age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age) • Hint : HAVING clause can also contain a subquery. SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating)

  27. Find age of the youngest sailor, for each rating with at least 2 sailors of age 18; SELECT S.rating, MIN (S.age) FROM Sailors S GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S2.age > 18 and S.rating=S2.rating)

  28. Find those ratings for which the average age is the minimum over all ratings • Above query has a problem ! What ? • Aggregate operations cannot be nested! SELECT S.rating FROM Sailors S WHERE S.age = (SELECT MIN (AVG (S2.age)) FROM Sailors S2)

  29. Find those ratings for which the average age is the minimum over all ratings • Correct solution (in SQL/92): SELECT Temp.rating, Temp.avg-age FROM (SELECT S.rating, AVG (S.age) AS avg-age FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avg-age = (SELECT MIN (Temp.avg-age) FROM Temp)

  30. Outer Joins : Special Operators • Left Outer Join • Right Outer Join • Full Outer Join SELECT S.sid, R.bid FROM Sailors S LEFT OUTER JOIN Reserves R WHERE S.sid = R.sid Sailors rows (left) without a matching Reserves row (right) appear in result, but not vice versa. SELECT S.sid, R.bid FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R

  31. Null Values

  32. Null Values • Field values in a tuple are sometimes : • Unknown (e.g., a rating has not been assigned) or • Inapplicable(e.g., no maiden-name when a male person) • SQL provides a special value null for such situations. • The presence of null complicates many issues.

  33. Allowing Null Values • SQL special operators IS NULL or IS NOT NULL to check if value is/is not null. • Disallow NULL value : rating INTEGER NOT NULL • We need a 3-valued logic : condition can be true, false or unknown.

  34. Working with NULL values • Question : • Given predicate (S.rating = 8). Is it TRUE or FALSE ?? • What if rating has null value in tuple ? • Comparison operators on NULL return UNKNOWN • Question : • Given arithmetic expression (S.rating + 8). What is its value? • What if that tuple’s rating has a null value? • Arithmetic operations on NULL return NULL.

  35. Truth table with UNKNOWN In general, a WHERE clause is satisfied only when it evaluates to TRUE. UNKNOWN AND TRUE = UNKNOWN UNKNOWN OR TRUE = TRUE UNKNOWN AND FALSE = FALSEUNKNOWN OR FALSE = UNKNOWN UNKNOWN AND UNKNOWN = UNKNOWN UNKNOWN OR UNKNOWN = UNKNOWN NOT UNKNOWN = UNKNOWN

  36. Summary • SQL was important factor in early acceptance of relational model : easy-to-understand ! • Relationally complete: even more expressive power than relational algebra. • Many alternative ways to write a query. So optimizer must look for most efficient evaluation plan. • In practice, users may (still) want to be aware of how queries are optimized and evaluated for best results.

More Related