400 likes | 584 Views
More on SQL. Lecture Week 9 INFS 614, Fall 2008. R1. Example Instances. S1. S2. SQL: Seen so far…. Syntax of Basic SQL query: Relation-list ( FROM ) Target-list ( SELECT ) Qualification ( WHERE ) Semantic : Conceptual Evaluation Strategy; String operators: LIKE , NOT LIKE ;
E N D
More on SQL Lecture Week 9 INFS 614, Fall 2008
R1 Example Instances S1 S2
SQL: Seen so far… • Syntax of Basic SQL query: • Relation-list (FROM) • Target-list (SELECT) • Qualification (WHERE) • Semantic: Conceptual Evaluation Strategy; • String operators: LIKE, NOT LIKE; • Set operators: Union (all), Intersect (all), Except (all); • Qualification involving sets: IN, EXISTS, UNIQUE, ANY, ALL (and their negated version); • Nested queries.
Post Processing • Processing on the result of an SQL query: • Sorting: can sort the tuples in the output by any column (even the ones not appearing in the SELECT clause) • Duplicate removal • Example: • Aggregation operators SELECT Distinct S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid and R.bid=103 Order by S.sid asc;
S.age ASC, S.sname DESC Order By: Example SELECT S.sname, S.age FROM Sailors S WHERE S.sname LIKE ‘%U%’ ORDERBY S.age,S.sname • For each attribute in the sort list, we can specify its order: • ASC : Ascending Order • DESC : Descending Order • Default Order is Ascending
Aggregate Operators • Significant extension of relational algebra. COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) single column
COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) Aggregate Operators SELECT COUNT (*) FROM Sailors S SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10 single column SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’ SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10 SELECT S.sname FROM Sailors S WHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2)
Find name and age of the oldest sailor(s) SELECT S.sname, MAX (S.age) FROM Sailors S • The first query is illegal! (We’ll look into the reason a bit later, when we discuss GROUP BY.) • The third query is equivalent to the second query, and is allowed in the SQL/92 standard, but is not supported in some systems. SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2) SELECT S.sname, S.age FROM Sailors S WHERE (SELECT MAX (S2.age) FROM Sailors S2) = S.age
SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10: GROUP BY and HAVING • So far, we’ve applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples. • Consider: Find the age of the youngest sailor for each rating level. • In general, we don’t know how many rating levels exist, and what the rating values for these levels are! • Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (!):
Queries With GROUP BY and HAVING SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification GROUP BYgrouping-list [HAVING group-qualification] • The target-list contains (i) attribute names(ii) terms with aggregate operations (e.g., MIN (S.age)). • The attribute list (i)must be a subset of grouping-list. • Intuitively, each answer tuple corresponds to a group, andthese attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)
Conceptual Evaluation • The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list. • The group-qualification is then applied to eliminate some groups. Expressions in group-qualification must have a single value per group! • In effect, an attribute in group-qualificationthat is not an argument of an aggregate op also appears in grouping-list. (SQL does not exploit primary key semantics here!) • One answer tuple is generated per qualifying group.
Query with GROUP BY and HAVING • Query: Find the age of the youngest sailor for each rating level. SELECT S.rating, MIN (S.age) FROM Sailors S GROUP BY S.rating
Find the age of the youngest sailor older than 18, for each rating with at least 2 such sailors older than 18 Answer relation SELECTS.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BYS.rating HAVINGCOUNT (*) > 1 • Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes `unnecessary’. • 2nd column of result is unnamed. (Use AS to name it.)
For each red boat, find the number of reservations for this boat SELECT B.bid, COUNT (*) AS rescount FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid • Grouping over a join of two relations. • What do we get if we remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition?
Computes total number of sailors who have rating equal to S.rating Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age) SELECTS.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BYS.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERES.rating=S2.rating) • HAVING clause can also contain a correlated subquery. • Compare this with the query where we considered only ratings with 2 sailors over 18! • What if HAVING clause is replaced by: • HAVING COUNT(*) >1
For each sailor with more than three reservations, find the number of his reservations SELECTR.sid, COUNT (*) AS rescount FROM Reserves R GROUP BYR.sid HAVING COUNT(*) > 3 Find the age of the youngest sailor, for each rating level > 5 with an average age > 25 SELECTS.rating, MIN(S.age) AS minAge FROM Sailors S GROUP BYS.rating HAVINGS.rating > 5 AND AVG(S.age) > 25 SELECTS.rating, MIN(S.age) AS minAge FROM Sailors S WHERES.rating > 5 GROUP BYS.rating HAVING AVG(S.age) > 25
Find those ratings for which the average age of sailors is the minimum over all ratings • Aggregate operations cannot be nested! WRONG: SELECT S.rating FROM Sailors S WHERE AVG(S.age) = (SELECT MIN (AVG (S2.age)) FROM Sailors S2 GROUP BY S2.rating)
Continue from previous slide • Correct solution (in SQL/92): SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp)
Note on Oracle SQL in the labs Do not use the keyword “AS” to name a temporary table within the FROM clause.
Note on Oracle SQL in the labs SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp)
Conclusions (so far…) • Post processing on the result of queries is supported. • Aggregation is the most complex “post processing” • “Group by” clause partitions the results into groups • “Having” clause puts condition on groups (just like Where clause on tuples).
Null Values • Field values in a tuple are sometimes unknown(e.g., a rating has not been assigned) or inapplicable(e.g., no spouse’s name). • SQL provides a special value null for such situations.
Dealing with the null value • Special operators are needed to check if value is/is not null. • Attribute_name IS NULL : TRUE if the attribute’s value is null. FALSE otherwise • Attribute_name IS NOT NULL : TRUE if the attribute’s value is not null. FALSE otherwise • Is rating>8 true or false when rating is equal to null? • Actually, it’s unknown. • How about: rating > 8 AND age < 40 ? • Three-valued logic.
Three Valued Logic • Logical operators AND, OR and NOT using a 3-valued logic(TRUE, FALSE and unknown).
Null Values: Some issues The presence of null complicates many issues: • WHERE clause eliminates tuples for which the qualification does not evaluate to TRUE. Important in nested queries involving EXISTS or UNIQUE • Two tuples in SQL are duplicates in a relation if corresponding attributes have the same value or both contains null. SELECT S1.sname FROM SAILORSS1 WHERE EXISTS ( SELECT * FROM SAILORS S2 WHERE S1.sname = S2.sname AND AND S1.SID < S2.SID AND S1.RATING <> S2.RATING)
Issues with the null value: Summary • WHERE and HAVINGclause eliminates rows (groups) for which the qualification does not evaluate to true (i.e., evaluate to false or unknown). • Aggregate functions ignore null values (except count(*)). • Distinct treats all null values as the same. • The arithmetic operations +, -, *, / return null if one of the arguments is null.
= Outer joins (left outer-join)
In Oracle Select * From Sailor S, Reserves R Where S.sid = R.sid (+); How about: Select S.sid, count(R.bid) From Sailor S, Reserves R Where S.sid = R.sid (+) Group by S.sid; OR Select S.sid, count(*) From Sailor S, Reserves R Where S.sid = R.sid (+) Group by S.sid;
More outer joins • Left outer join + sign on the right in Oracle: Select * from R, S where R.id=S.id(+) • Right outer join + sign on the left in Oracle: Select * from R, S where R.id(+)=S.id • Full outer join • not implemented in Oracle
More on value functions • Values can be transformed before aggregated: e.g.: SELECT sum(S.A/2) FROM S; • An interesting decode function (Oracle specific): decode(value, if1, then1, if2, then2, …, else): SELECT sum(decode(Major, ‘INFS’, 1, 0)) as No_IS_Stu, sum(decode(Major, ‘INFS’, 0, 1)) as No_NonIS_Stu FROM Student ; • Calculating GPA from letter grades?
Examples Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade)
Query 1 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) List the classes (Class-no) currently taken by students whose names start with 'T'. SELECT distinct e.Class-no FROM Enrollment e, Student s WHERE e.Student-Ssn = s.Ssn AND s.S-Name LIKE 'T%'
Query 2 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) List the students (SSN) who are currently taking exactly one class. SELECT e.Student-Ssn FROM Enrollment e GROUP BY e.Student-Ssn HAVING 1=count(*)
Query 3 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) Give the percentage of the students (among all students) who are currently taking courses offered by ISE (D-code='ISE'). SELECT count(distinct e.Student-Ssn)/count(distinct s.Ssn) as Percent FROM Enrollment e, Class c, Student s WHERE e.Class-no=c.Class-no and c.D-code='ISE';
Query 4 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) List the faculty members (F-Name) who teach 2 or more classes. List these faculty members by the number of classes they teach (ascending order). SELECT f.F-Name FROM Faculty f, Class c WHERE f.Ssn=c.Instructor-SSn GROUP BY f.Ssn, f.F-Name HAVING count(distinct c.Class-no)>=2 ORDER BY count(distinct c.Class-no), F-Name;
Query 5 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) List the students (SSN and Name) along with the number of classes they are taking. If a student is not taking any class, the student should also be listed (with 0 as the number of classes he/she is taking). The list should be ordered by the number of classes (in an ascending order), and in case of a tie, by the SSN of the students. SELECT s.Ssn, s.S-Name, count(distinct Class-no) FROM Student s, Enrollment e WHERE s.Ssn = e.Student-Ssn (+) GROUP BY s.Ssn, s.S-Name ORDER BY count(distinct Class-no), Ssn
Query 6 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) List the faculty members (F-Name) who teach more than twice as many classes as Professor Smith (F-Name='Smith') is teaching. (Note that if Professor Smith is not teaching anything, then any professor who teaches at least one class will satisfy the above query.) SELECT f.F-Name FROM Faculty f, Class c WHERE f.Ssn=c.Instructor-SSn GROUP BY f.Ssn, f.F-Name HAVING count(distinct c.Class-no) > (SELECT 2*count(distinct Class-no) FROM Faculty f, Class c WHERE f.F-Name='Smith' and f.Ssn=c.Instructor-SSn)
Query 7 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) Find the number of departments which do not have a chairman (Chair_ssn is 'NULL'). SELECT count(d.D-code) FROM Department d WHERE d.Chair-SSn is NULL
Query 8 Department (D-code, D-Name, Chair-SSn)Course (D-code, C-no, Title, Units)Prereq (D-code, C-no, P-code, P-no)Class (Class-no, D-code, C-no, Instructor-SSn)Faculty (Ssn, F-Name, D-Code, Rank)Student (Ssn, S-Name, Major, Status)Enrollment (Class-no, Student-Ssn)Transcript (Student-Ssn, D-Code, C-no, Grade) For each department (i.e., student’s Major), give the number of graduate students (status='Grad') and the number of other students (status <> 'Grad'). The two numbers must be shown in the same row as the department code. Hint: use decode. SELECT Major, sum(decode(Status, 'Grad', 1,0)) as Grad, sum(decode(Status, 'Grad', 0,1)) as NoGrad FROM Student GROUP BY Major
Query 9 For each department (D-code), give the highest rank of the professors in the department along with the number of the faculty with that highest rank. The output contains one row for each department. Assume Full>Associate>Assistant, i.e., lexicographic order is fine. Note that some department may not have professors in some ranks (e.g., a department may not have full, or associate or assistant professors). SELECT f.D-Code as dept, f.maxrank, count(e.Ssn) as num FROM (SELECT D-code, max(Rank) as maxrank FROM Faculty GROUP BY D-code) AS f, Faculty e WHERE f.D-code=e.D-code AND e.Rank=f.maxrank GROUP BY f.D-code, f.maxrank ORDER BY f.D-code