260 likes | 384 Views
Math in SQL. Aggregation Operators. Operators on sets of tuples. Significant extension of relational algebra. SUM ( [DISTINCT] A) : the sum of all (unique) values in attribute A. AVG ( [DISTINCT] A): the average of all (unique) values in attribute A . SELECT AVG ( S.age )
E N D
Aggregation Operators • Operators on sets of tuples. • Significant extension of relational algebra. • SUM ( [DISTINCT] A): the sum of all (unique) values in attribute A. • AVG ( [DISTINCT] A): the average of all (unique) values in attribute A. SELECT AVG (S.age) FROM Sailors S; SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERES.rating=10;
Aggregation Operators • Operators on sets of tuples. • Significant extension of relational algebra. • MAX (A): the maximum value in attribute A. • MIN (A): the minimum value in attribute A. SELECT MAX(rating) FROM Sailors; SELECTS.sname FROM Sailors S WHERES.rating= (SELECT MAX(S2.rating) FROM Sailors S2);
Aggregation Operators • Operators on sets of tuples. • Significant extension of relational algebra. • COUNT (*): the number of tuples. SELECT COUNT (*) FROM Sailors S
Aggregation Operators • Operators on sets of tuples. • Significant extension of relational algebra. • COUNT ( [DISTINCT] A): the number of (unique) values in attribute A. SELECT COUNT (DISTINCTS.rating) FROM Sailors S WHERE S.sname=‘Bob’;
Aggregation Operators • Find name and age of the oldest sailor(s). • The first query looks correct, but is illegal. • Thoughts as to why? • The second query is a correct and legal solution. SELECTS.sname, MAX (S.age) FROM Sailors S; SELECTS.sname, S.age FROM Sailors S WHERES.age = (SELECT MAX (S2.age) FROM Sailors S2);
GROUP BY and HAVING • So far, we’ve applied aggregation operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples. • Find the age of the youngest sailor for each rating value. • Suppose we know that rating values go from 1 to 10; we can write ten (!) queries that look like this: • But in general, we don’t know how many rating values exist, and what these rating values are. • Plus, it’s a waste of time to write so many queries For i = 1, 2, ... , 10: SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i;
GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list HAVING group-qualification • A group is a set of tuples that have the same value for all attributes grouping-list. • The target-list contains • attribute names • terms with aggregation operations. • Attribute list must be a subset of grouping-list. • Each answer tuple corresponds to a group, and output attributes must have a single value per group. Notice the notation
Conceptual Evaluation • Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1 • Step 1 • The cross-product of relation-list is computed • In this instance, it’s only Sailors
Conceptual Evaluation • Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1 • Step 2 • Tuples that fail qualification are discarded • ‘unnecessary’ attributes are deleted
Conceptual Evaluation • Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1 • Step 3 • Remaining tuples are partitioned intogroups by the value of attributes ingrouping-list
Conceptual Evaluation • Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1 • Step 4 • The group-qualification is then applied toeliminate groups that do not satisfy thiscondition.
Conceptual Evaluation • Given: SELECT S.rating, MIN(S.age) as minage FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT(*) > 1 • Step 5 • One answer tuple is generated per qualifying group by applying the aggregation operator.
GROUP BY and HAVING • Find the age of the youngest sailor with age 18, for each rating with at least 2 such sailors. SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1; • Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes `unnecessary’. • 2nd column of result is unnamed • What to do? Answer relation
GROUP BY and HAVING • For each red boat, find the number of reservations for this boat. SELECT B.bid, COUNT (*) AS scount FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid; • Grouping over a join of three relations. • What do we get if we remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition? • What if we drop Sailors and the condition involving S.sid?
GROUP BY and HAVING • Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age). SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating); • Shows HAVING clause can also contain a subquery. • What if HAVING clause is replaced by: • HAVING COUNT(*) >1
GROUP BY and HAVING • Find those ratings for which the average age is the minimum over all ratings. • Aggregation operations cannot be nested! • WRONG: SELECT S.rating FROM Sailors S WHERE S.age = (SELECT MIN (AVG (S2.age)) FROM Sailors S2); • Correct solution: SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp);
ORDER BY • The ORDER BY keyword is used to sort the result-set by a specified column. • The ORDER BY keyword sort the records in ascending order by default. • If you want to sort the records in a descending order, you can use the DESC keyword.
TOP/BOTTOM • The TOP clause is used to specify the number of records to return. • The TOP clause can be very useful on large tables with thousands of records • Returning a large number of records can impact on performance • Can ‘sample’ the table using TOP • Not all database systems support the TOP clause or implement it in different fashion
TOP/BOTTOM SQL Server SELECT TOP number|percentcolumn_name(s)FROM table_name Ex: SELECT TOP 5 * FROM Persons MySQL SELECT column_name(s) FROM table_name LIMIT number Ex: SELECT * FROM Persons LIMIT 5
TOP/BOTTOM Oracle SELECT column_name(s) FROM table_name WHERE ROWNUM <= number Ex: SELECT * FROM Persons WHERE ROWNUM <=5 DB2 SELECT column_name(s) FROM table_name FETCH FIRST number ROWS ONLY Ex: SELECT * FROM Persons FETCH FIRST 5 ROWS ONLY
TOP/BOTTOM • Can specify • Fixed number • SELECT TOP 10 * … • A percent • SELECT TOP 10 PERCENT * …
TOP/BOTTOM • How to return the oldest 5 rentals? • How to return the newest 5 rentals?
TOP/BOTTOM • How to return the 3rd newest rental?
Summary • SQL was an important factor in the early acceptance of the relational model; more natural than earlier, procedural query languages. • All queries that can be expressed in relational algebra can also be formulated in SQL. • In addition, SQL has significantly more expressive power than relational algebra, in particular aggregation operations and grouping. • Many alternative ways to write a query; query optimizer looks for most efficient evaluation plan. • In practice, users need to be aware of how queries are optimized and evaluated for most efficient results.