1 / 22

SQL Part II: Advanced Queries

SQL Part II: Advanced Queries. Aggregation. Significant extension of relational algebra “ Count the number of tuples in Sailors” SELECT count(*) FROM Sailors “ What is the average age of sailors with rating 10?” SELECT avg(age) FROM Sailors WHERE rating = 10

oatesm
Download Presentation

SQL Part II: Advanced Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQLPart II: Advanced Queries

  2. Aggregation • Significant extension of relational algebra • “Count the number of tuples in Sailors” SELECT count(*) FROM Sailors • “What is the average age of sailors with rating 10?” SELECT avg(age) FROM Sailors WHERE rating = 10 • “How many different boats have been reserved?” • SELECT count(DISTINCT bid) FROM Reserves • Syntax: COUNT, SUM, AVG, MAX, MIN apply to single attribute/column. Additionally, COUNT(*) • Result is a relation with only one tuple Count(*) 3 avg(age) 35

  3. Aggregation (contd). • “Give the names of the sailors with the highest rankings” SELECT sname FROM Sailors S1 WHERE S1.rating = (SELECT MAX(S2.rating) FROM Sailors S2) (Note also the = in the where clause. We can use = when it is assured that the relation resulting from the subquery has only one tuple.) • “Give the name of the sailor that is the first in the alphabet” SELECT min(sname) FROM Sailors S1 sid min(sname) age sname rating 22 debby 7 45 31 debby 8 55 58 lilly 10 35 debby

  4. So far, we have applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples. Example: “Find the average age of the sailors in each rating level” In general, we don’t know how many rating levels exist, and what the rating values for these levels are. Suppose, we know that the rating levels go from 1 to 10; then we can write 10 queries that look like this: SELECT avg(age) for i=1,2,…10 FROM Sailors WHERE rating = i Using Grouping SELECT avg(age) FROM Sailors GROUP BY rating Grouping sid age sname rating avg(age) 22 debby 7 45 31 debby 7 55 58 lilly 10 35 50 35

  5. Queries with GROUP BY SELECT target-list FROM relation list WHERE qualification GROUP BY grouping list • A group is defined as a set of tuples that have the same value for all attributes in the grouping list • One answer tuple is generated per group. • The target-list contains attributes and/or aggregation terms • The attributes of the target-list must be a subset of the grouping list: Since each answer tuple corresponds to one group, we can only depict attributes, for which all tuples in the group have the same value • Example: SELECT rating, avg(age) FROM Sailors GROUP BY rating sid age sname rating rating, avg(age) 22 debby 7 45 31 debby 7 55 58 lilly 10 35 7 50 10 35

  6. Evaluation SELECT target-list FROM relation list WHERE qualification GROUP BY grouping list • Conversion to Relational Algebra • Compute the cross-product of relations in FROM clause, consider only tuples that fulfill the qualification in WHERE clause, project on fields that are needed (in SELECT or GROUP BY) • Partition the remaining tuples into groups by the value of attributes in grouping-list • Return all attributes in the SELECT clause (must also be in the group list) plus the calculated aggregation terms per group.

  7. SELECT lists with aggregation • If any aggregation is used, then each element in the attribute list of the SELECT clause must either be aggregated or appear in a group-by clause SELECT rating, avg(age) FROM Sailors GROUP BY rating • Look at a wrong (left) and correct (right) way to find the name of the oldest sailor SELECT sname, MAX(age) SELECT S1.sname, S1.age FROM Sailors FROM Sailors S1 WHERE S1.age = (SELECT MAX(S2.age) FROM Sailors S2) sid age sname rating rating avg(age) 22 debby 7 45 31 debby 7 55 58 lilly 10 35 7 50 10 35

  8. HAVING clauses are selections on groups, just as WHERE clauses are selections on tuples Example: “For each rating level, find the average age of all sailors over 18; only consider rating levels for which there exist at least two sailors over 18” SELECT rating, avg(age) FROM Sailors WHERE age > 18 GROUP BY rating HAVING COUNT(*) >= 2 HAVING CLAUSE

  9. Example age sid sname rating SELECT rating, avg(age) FROM Sailors WHERE age > 18 GROUP BY rating HAVING COUNT(*) >= 2 • Select upon WHERE and project to necessary attributes • Partition by GROUP and check whether they fulfill HAVING • Second column of result is unnamed 1 A 9 20 B 1 30 C 6 22 D 9 15 E 1 10 F 6 26 G 8 15 age rating rating 9 20 1 30 6 22 6 26 6 24 Answer Relation

  10. Evaluation SELECT rating, avg(age) SELECT target-list FROM Sailors FROM relation list WHERE age > 18 WHERE qualification GROUP BY rating GROUP BY grouping list HAVING COUNT(*) >= 2 HAVING group-qualification • Conversion to Relational Algebra • Compute the cross-product of relations in FROM clause, consider only tuples that fulfill the qualification in WHERE clause, project on fields that are needed (in SELECT or GROUP BY) • Partition the remaining tuples into groups by the value of attributes in grouping-list • For each group, the group qualification is then applied selecting only those groups that fulfill the qualification. Expressions in group-qualification must have a single value per group. Hence, for each attribute in the group qualification, either • the attribute also appears in the grouping list • or it is argument of an aggregation

  11. Example II • For each red boat, find the number of reservations for this boat SELECT B.bid, COUNT (*) AS scount FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid • Grouping over a join of two relations.

  12. Example III • Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age) SELECT S1.rating, MIN (S1.age) FROM Sailors S1 WHERE S1.age > 18 GROUP BY S1.rating HAVING 1 < (SELECT COUNT(*) FROM Sailors S2 WHERE S1.rating=S2.rating) • Shows HAVING clause can also contain a subquery. • Compare this with the query where we considered only ratings with 2 sailors over 18! • What if HAVING clause is replaced by: • HAVING COUNT(*) >1

  13. Example IV • Find those ratings for which the average age is the minimum over all ratings • Aggregate operations cannot be nested! WRONG SELECT S.rating FROM Sailors S WHERE S.age =(SELECT MIN (AVG (S2.age)) FROM Sailors S2) • Correct solution (in SQL2): SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp) • Or use view as intermediate relation

  14. NULL Values • Meaning of a NULL value • Unknown/missing • Inapplicable (e.g., no spouse’s name) • Comparing NULLs to values • E.g., how to evaluate condition rating>7 if tuple has a NULL in rating? • When we compare a NULL value and any other value (including NULL) using a comparison operator like > or =, the result is “unknown”. • If we want to check whether a value is NULL, SQL provides the special comparison operator IS NULL • Arithmetic Operations (*, +, etc): • When at least one operand has a NULL value (the other operands can have any value including NULL) then the result is NULL (consequence 0*NULL=NULL !) • We cannot use NULL as an operand (e.g., rating < NULL).

  15. NULL Values (contd.) • 3-valued logic necessary: true, false, unknown • NOT unknown = unknown • A OR B = true if either A=true or B=true • A OR B = false if A=false and B=false • A OR B = unknown if (A=false and B=unknown) or (A=unknown and B=false) or (A=unknown and B=unknown) • A AND B = true if A=true and B=true • A AND B = false if either A=false or B=false • A AND B = unknown if (A=true and B=unknown) or (A=unknown and B=true) or (A=unknown and B=unknown)

  16. Query evaluation considering NULL values • Evaluation in SQL • The qualification in the WHERE clause eliminates rows for which the qualification does not evaluate true (i.e., rows that evaluate to false or unknown are eliminated) • SQL defines that rows are duplicates if corresponding columns are either equal or both contain NULL (in contrast to the usual on previous slide where the comparison of the NULLs results in unknown) • COUNT(*) handles NULLs like other values, I.e., they are counted • All other aggregate operations simply discard NULL values

  17. Outer Join • R1 C R2 = R1 C R2 with dangling tuples padded with nulls and included in the result • Example: • The result of Sailors Reserves does only contain tuples refering to sailors who have reserved at least one boat (at least one tuple of this sailor in Reserves). • The result of Sailors Reserves contains for each sailor without matching Reserves tuple exactly one row, with the result columns inherited from Reserves assigned NULL values SELECT S.sid, S.sname, R.bid FROM Sailors S NATURAL LEFT OUTER JOIN Reserves S sid sname bid 22 debby 103 58 lilly NULL

  18. Views • A view is just a unmaterialized relation: we store a definition rather than a set of tuples. CREATE VIEW ActiveSailors (sid,name) AS SELECT DISTINCT S.sid, S.name FROM Sailors S, Reserved R WHERE S.sid = R.sid • Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s). • Given ActiveSailors, we know the names of the sailors who have reserved boats (good for accounting), but not the age of the sailors (uninteresting for accounting).

  19. Views (contd) • Views can be treated as if they were materialized relations • The system translates a SELECT on a view into SELECTS on the materialized relations • Modifications are problematic • Views can be dropped using the DROP VIEWcommand • How to handle DROP TABLE if there’s a view on the table? • DROP TABLE command has options to let the user specifiy this.

  20. Levels of Abstraction • Single conceptual (logical) schema defines logical structure • Conceptual database design • Physical schema describes the files and indexes used • Physical database design • Different views describe how users see the data (also referred to as external schema) • generated on demand from the real data • Physical data independence: the conceptual schema protects from changes in the physical structure of data • Logical data independence: external schema protects from changes in conceptual schema of data View 1 View 2 Conceptual Schema Physical Schema

  21. Limits of Queries in SQL • SQL is not TURING complete • Examples • Calculate the variance of the ages of the sailors • Build the following categories within the Sailors: all sailors younger than 20, all sailors between 20 and 29, etc. For each category, give the total number of sailors and their average rating. • Assume a relation with all direct flights. Determine all cities that are reachable from Montreal • Flights(fid, dep-city, arr-city) SELECT Fn.arr-city FROM Flights F1, Flights F2, … Flights Fn WHERE F1.dep-city = ‘Montreal’ AND F1.arr-city = F2.dep-city AND … Fn-1.arr-city = Fn.dep-city fid Dep-city Arr-city Montreal Zurich 25 Zurich Munich Munich Bombay …

  22. DB Modifications: insert/delete/update • Insert values for all attributes in the order attributes were declared or values for only some attributes • INSERT INTO Sailors VALUES (68,’Jacky’,10, 40) • INSERT INTO Sailors (sid,name) VALUES (68, ‘Jacky’) • Insert the result of a query • ActiveSailors(sid,name) • INSERT INTO ActiveSailors ( SELECT Sailors.sid Sailors.name FROM Sailors, Reserves WHERE Sailors.sid = Reserves.sid) • Delete some or all tuples of a relation • DELETE FROM Boats WHERE bid = 103 / DELETE FROM Boats • Update some of the attributes of some of the tuples • UPDATE Sailors SET ranking = 10, age = age + 1 WHERE name = ‘debby’ OR name = ‘lilly’ • SQL2 semantics: all conditions in a modification statement must be evaluated by the system BEFORE any modifications occur.

More Related