380 likes | 492 Views
The Query Language SQL. What we cannot compute with RA. Aggregate operations . E.g. ``The number of climbers who have climbed `Last Tango' '' or ``The average age of climbers''. These are possible in SQL which has numerous extensions to relational algebra.
E N D
What we cannot compute with RA • Aggregate operations. E.g. ``The number of climbers who have climbed `Last Tango' '' or ``The average age of climbers''. These are possible in SQL which has numerous extensions to relational algebra. • Recursive queries. Given a relation Parent(Parent, Child) compute the Ancestor relation. (Can do this in Datalog.) • Computing with non 1NF relationse.g. lists, arrays, multisets, nested relations.
SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification Basic Query • relation-list A list of relation names (possibly with a range-variable after each name). • target-list A list of attributes of relations in relation-list. * can be used to denote all atts. • qualification Comparisons (Attr op const or Attr1 op Attr2, where op is one of combined using AND, OR and NOT. • DISTINCT(optional) keyword indicates that the answer should not contain duplicates. Default is that duplicates are not eliminated!
Conceptual Evaluation Strategy • Compute the product of relation-list • Discard tuples that fail qualification • Project over attributes in target-list • If DISTINCT then eliminate duplicates This is probably a very bad way of executing the query, and a good query optimizer will use all sorts of tricks to find efficient strategies to compute the same answer.
Sample tables Routes: RId RName Grade Rating Height 1 Last Tango II 12 100 2 Garden Path I 2 60 3 The Sluice I 8 60 4 Picnic III 3 400 Climbers: Climbs: Cid CName Skill Age CId RId Date Duration 123 Edmund EXP 80 123 1 10/10/88 5 214 Arnold BEG 25 123 3 11/08/87 1 313 Bridget EXP 33 313 1 12/08/89 5 212 James MED 27 214 2 08/07/92 2 313 1 06/07/94 3
SELECT * FROM Routes WHERE Height < 200; RID RNAME GRADE RATING HEIGHT 1 Last Tango II 12 100 2 Garden Path I 2 60 3 The Sluice I 8 60 GRADE HEIGHT I 100 I 60 I 60 III 400 Select/project queries SELECT Grade, Height FROM Routes;
GRAD HEIGHT I 60 II 100 III 400 Distinct • Note that SQL did not eliminate duplicates. We need to request this explicitly. SELECT DISTINCT Grade, Height FROM Routes;
RId RName Grade Rating Height 1 Last Tango II 12 100 Pattern Matching • Can be used in where clause. “_” denotes any character, “%” 0 or more characters. SELECT * FROM Routes WHERE RName LIKE 'L_%o'
Arithmetic • “as” can be used to label columns in the output; arithmetic can be used to compute results SELECT DISTINCT Grade, Height/10 as H FROM Routes; Grade H II 10 I 6 III 40
CID 123 212 214 313 Set operations -- union SELECT CId FROM Climbers WHERE Age < 40 UNION SELECT CId FROM Climbs WHERE RID = 1 ; • Duplicates do not occur in the union.
CID 214 313 212 123 313 313 The UNION ALL operator preserves duplicates SELECT Cid FROM Climbers WHERE Age < 40 UNION ALL SELECT Cid FROM Climbs WHERE RID = 1 ;
What does “union compatible” mean? SELECT CId FROM Climbers UNION SELECT RId FROM Routes; Ok SELECT CName FROM Climbers UNION SELECT RId FROM Routes; Error
CID 123 CID 212 214 Intersection and difference SELECT CId FROM Climbers WHERE Age > 40 INTERSECT SELECT CId FROM Climbs WHERE RId = 1 ; SELECT CId FROM Climbers WHERE Age < 40 MINUS SELECT CId FROM Climbs WHERE RId = 1 ;
Nested queries • We could also have written the previous queries as follows: SELECT CId FROM Climbers WHERE Age > 40 AND CId IN (SELECT CId FROM Climbs WHERE RId = 1) ; SELECT CId FROM Climbers WHERE Age < 40 AND CId NOT IN (SELECT CId FROM Climbs WHERE RId = 1) ;
Nested queries with correlation SELECT CId FROM Climbers c WHERE EXISTS (SELECT * FROM Climbs b WHERE c.CId=b.CId AND b.RID = 1); SELECT CId FROM Climbers c WHERE NOT EXISTS (SELECT * FROM Climbs b WHERE c.CId=b.CId); SELECT CId FROM Climbers c WHERE EXISTS UNIQUE (SELECT * FROM Climbs b WHERE c.CId=b.CId AND RID = 1);
CName Age Edmund 80 More on set comparison ops • Besides IN, NOT IN, EXISTS, NOT EXISTS, UNIQUE and NOT UNIQUE we can also say: <op> ANY, <op> ALL, where <op> is any of • What does the following mean in English? SELECT CName, Age FROM Climbers WHERE Age >= ALL (SELECT Age FROM Climbers)
Cid CName Skill Age 123 Edmund EXP 80 313 Bridget EXP 33 212 James MED 27 Set comparison ops, cont. • What does the following mean in English? SELECT * FROM Climbers WHERE Age > ANY (SELECT Age FROM Climbers WHERE CName='Arnold')
CNAME James Using expressions for relation names • Consider the following query: “Find the names of climbers who have not climbed any route.” SELECT CName FROM (SELECT CId FROM Climbers MINUS SELECT CId FROM Climbs) Temp, Climbers WHERE Temp.CId = Climbers.CId;
CID CNAME SKILL AGE CID RID DAY DURATION 123 Edmund EXP 80 123 1 10-OCT-88 5 214 Arnold BEG 25 123 1 10-OCT-88 5 313 Bridget EXP 33 123 1 10-OCT-88 5 212 James MED 27 123 1 10-OCT-88 5 123 Edmund EXP 80 123 3 08-NOV-87 1 214 Arnold BEG 25 123 3 08-NOV-87 1 ... Products SELECT * FROM Climbers,Climbs; • Note that the CID column name is duplicated in the output.
Conditional join SELECT * FROM Climbers,Climbs WHERE Climbers.CId = Climbs.CId; CID CNAME SKIL AGE CID RID DAY DURATION 123 Edmund EXP 80 123 1 10-OCT-88 5 123 Edmund EXP 80 123 3 08-NOV-87 1 313 Bridget EXP 33 313 1 08-DEC-89 5 214 Arnold BEG 25 214 2 07-AUG-92 2 313 Bridget EXP 33 313 1 07-JUN-94 3
CNAME Edmund Bridget Bridget Example 1 • The names of climbers who have climbed route 1. SELECT CName FROM Climbers, Climbs WHERE Climbers.CId = Climbs.CId AND RId= 1;
CNAME Edmund Bridget Bridget Example 2 • The names of climbers who have climbed the route named “Last Tango”. SELECT CName FROM Climbers, Climbs, Routes WHERE Climbers.CId = Climbs.CId AND Routes.RId = Climbs.RID AND RName = 'Last Tango';
CID 313 313 Example 3 • The IDs of climbers who have climbed the same route twice. • Note the use of aliases for relations. SELECT C1.CId FROM Climbs C1, Climbs C2 WHERE C1.CId = C2.CId AND C1.RId = C2.RId AND (C1.Day <> C2.Day OR C1.DURATION <> C2.DURATION));
CNAME James Example 4 • Recall: The names of climbers who have not climbed any route SELECT CName FROM (SELECT CId FROM Climbers MINUS SELECT CId FROM Climbs) Temp, Climbers WHERE Temp.CId = Climbers.CId;
CNAME James Example 4, cont. • A simpler alternative: SELECT CName FROM Climbers WHERE CId NOT IN (SELECT CId FROM Climbs);
Universal Quantification • The IDs of climbers who have climbed all routes. SELECT CId FROM Climbs c1 WHERE NOT EXISTS (SELECT RId Routes not climbed FROM Routes r by c1. WHERE NOT EXISTS (SELECT * FROM Climbs c2 WHERE c1.CId=c2.CId and c2.RId=r.RId)
RNAME DIFFICULTY Last Tango 1200 Garden Path 120 The Sluice 480 Picnic 1200 Non-algebraic operations • SQL has a number of operations that cannot be expressed in relational algebra. The first is to express arithmetic in queries. SELECT RName, Rating * Height AS Difficulty FROM Routes;
Arithmetic, cont • Arithmetic (and other expressions) cannot be used at the top level. E.g. 2+2 isn't an SQL query. • Question -- how would you get SQL to compute 2+2?
COUNT(GRADE) 4 Counting SELECT COUNT(RId) FROM Routes; • Surprisingly, the answer to both of these is the following: SELECT COUNT(Grade) FROM Routes;
COUNT(GRADE) 3 Counting, cont. • To fix this, we use the keyword “DISTINCT”: • Can also use SUM, AVG, MIN and MAX. SELECT COUNT(DISTINCT Grade) FROM Routes;
Group by • So far, these aggregate operators have been applied to all qualifying tuples. Sometimes we want to apply them to each of several groups of tuples. • For example: “Print the number of routes in each grade.”
GRADE COUNT(*) I 2 II 1 III 1 Group by SELECT Grade, COUNT(*) FROM Routes GROUP BY Grade; • Note that only the columns that appear in the GROUP BY statement and “aggregated” columns can appear in the output. So the following would generate an error. SELECT Grade, RName, COUNT(*) FROM Routes GROUP BY Grade;
HEIGHT AVG(RATING) 60 5 100 12 Group by … having • HAVING is to GROUP BY as WHERE is to FROM • “HAVING” is used to restrict the groups that appear in the result. SELECT Height, AVG(Rating) FROM Routes GROUP BY Height HAVING Height < 300;
HEIGHT AVG(RATING) 60 5 400 3 Another example SELECT Height, AVG(Rating) FROM Routes GROUP BY Height HAVING MAX(Rating) < 10;
Null Values • The value of an attribute can be unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse). • SQL provides a special valuenullfor such situations. • The presence of nullcomplicates many issues. E.g.: • Special operators needed to check if value is/is not null. • Is rating>8 true or false when rating is equal to null? What about AND, OR and NOT connectives? 3-valued logic (true, false and unknown). • Meaning of constructs must be defined carefully. (e.g., WHERE clause eliminates rows that don’t evaluate to true.)
Outer Join • A variant of the join that relies on null values: • Tuples of Climbers that do not match some tuple in Climbs would normally be excluded from the result; the “left” outer join preserves them with null values for the missing Climbs attributes. SELECT * FROM Climbers NATURAL LEFT OUTER JOIN Climbs
CId CName Skill Age RId Date Duration 123 Edmund EXP 80 1 10/10/88 5 123 Edmund EXP 80 3 11/08/87 1 214 Arnold BEG 25 2 08/07/92 2 313 Bridget EXP 33 1 12/08/89 5 313 Bridget EXP 33 1 06/07/94 3 212 James MED 27 Result of left outer join • Null values can be disallowed in a query result • by specifying NOT NULL.
Summary • SQL is “relationally complete”: all of the operators of the relational algebra can be simulated. • Additional features: string comparisons, set membership, arithmetic and grouping.