560 likes | 677 Views
Database Systems SQL: Advanced Queries. Union, Intersection, and Except (1). SQL supports three set operations: union, intersect, minus Union compatibility is required. Consider the following two relations: Students(SSN, Name, GPA, GRE, DeptName) Instructors(SSN, Name, Office, DeptName).
E N D
Union, Intersection, and Except (1) • SQL supports three set operations: • union, intersect, minus • Union compatibility is required. • Consider the following two relations: • Students(SSN, Name, GPA, GRE, DeptName) • Instructors(SSN, Name, Office, DeptName)
Union, Intersection, and Except (2) Find the names of those people who are either a graduate student or an instructor or both in CS department. select Name from Student where DeptName='CS' union select Name from Instructor where DeptName='CS' union removes duplicate rows. union all keeps duplicate rows.
Union, Intersection, and Except (3) • Find the names of those who are both a graduate student and an instructor in CS department. select Name from Student where DeptName='CS‘ intersect select Name from Instructor where DeptName='CS‘ • Find the names of those who are an instructor but not a graduate student in CS department. select Name from Instructor where DeptName='CS' except select Name from Student where DeptName='CS'
Union, Intersection, and Except (5) • make a list of all project numbers for projects that involve an employee whose last name is “Smith”, either as a worker or as a manager of the department that controls the project. (SELECT distinct Pnumber FROM PROJECT, Works-on, Employee WHERE Pnumber = Pno AND ESSN = SSN AND Lname = “Smith”) UNION (SELECT PNumber FROM Project, Department, Employee WHERE Dnum=Dnumber AND MGRSSN=SSN AND Lname=“Smith”)
Comparisons Involving NULL • SQL allows queries that check if a value is NULL (missing or unknown or not applicable) • SQL uses IS or IS NOT to compare NULLs because it considers each NULL value distinct from other NULL values, so equality comparison is not appropriate . • Find all employees who have no managers. select * from Employees where Super_SSN is null; • Find all employees who have a manager. select * from Employees where Super_SSN is not null;
Null Values and Three Valued Logic • The result of any arithmetic expression involving null is null • E.g. 5 + null returns null • Any comparison with null returns unknown (null) • E.g. 5 < null or null <> null or null = null • Three-valued logic using the truth value unknown: • OR (unknownortrue) = true (unknownorfalse) = unknown (unknown or unknown) = unknown • AND (true and unknown) = unknown (false and unknown) = false (unknown and unknown) = unknown • NOT (not unknown) = unknown • Result of where clause predicate is treated as false if it evaluates to unknown (null)
Null Values and Three Valued Logic • Eaxmple: Consider table Employees with the following data: EmpId Name Salary Bonus 123 Smith 25000 5000 234 John 28000 null select Name, Salary+Bonus Total from Employee • Result is: Name Total Smith 30000 John null
NVL function • Use NVL function to convert null values • Format: nvl(exp1, exp2) • exp1 is a source value that may be null • exp2 is the target value for converting null • Ex: select name, salary+nvl(bonus,0) Total from Employee • Result is: Name Total Smith 30000 John 28000
Using IN • Retrieve the social security numbers of all employees who work on project number 1, 2, or 3; select distinct essn from works-on where pno in (1, 2, 3); • Find all students who are 20, 22, or 24 years old. select * from Students where age in (20, 22, 24); • not in is the opposite of in.
Nested Queries • A complete SELECT query, called a nested query , can be specified within the WHERE-clause of another query, called the outer query • Many of the previous queries can be specified in an alternative form using nesting • Retrieve the name and address of all employees who work for the 'Research' department.SELECT FNAME, LNAME, ADDRESS FROM EMPLOYEE WHERE DNOIN(SELECT DNUMBER FROM DEPARTMENT WHERE DNAME='Research');
Nested Queries • Make a list of all project numbers for projects that involve an employee whose last name is “Smith”, either as a worker or as a manager of the department that controls the project. select distinct Pnumber from Project where PnumberIN (select pnumber from project, department, employee where dnum=dnumber and MGRSSN=SSN and Lname=‘Smith’) OR PnumberIN (select pno from works-on, employee where ESSN=SSN and Lname = ‘Smith’);
Nested Queries • Find the names of all students who take at least one course offered by the CS department. select name from student s, enrollment e where s.ssn=e.ssn and e.cnoin (select cno from course where dname= 'CS') • The previous query is equivalent to: (1) select name from student s, enrollment e, course c where s.ssn = e.ssn and e.cno=c.cno and c.dname = 'CS’; (2) select name from student where ssnin (select ssn from enrollment where cnoin (select cno from course where dname='CS'));
Nested Queries • The previous query is a nested query. The query outside is an outer query and the query nested under the outer query is an inner query. Multiple nestings are allowed. • The semantics of the above query is to evaluate the inner query first and evaluate the outer query last. • Correlated nested query • Evaluated once for each tuple in the outer query
Nested Queries • Example: Consider the relations: Employee (SSN, Name, Age), Dependent (Name, Sex, ESSN) • Find the names of all employees who has a dependent with the same name. select name from employee where namein (select name from dependent where essn = ssn)
Nested Queries • in accepts multi-column list. • Find all enrollments that have 25 years old students taking CS courses. select * from enrollment where (ssn, cno) in (select s.ssn, c.cno from student s, course c where s.age=25 and c.dname='CS'); • Another way to write the query: select * from enrollment where ssnin(select ssn from student where age = 25) and cnoin (select cno from course where dname='CS');
Using ANY • There’s an alternate way of seeing if a value is IN some table • The evaluation of ANY expressions is not always intuitive • Examples: • 4 = ANY (1,2,3,4) • True, because it’s equal to the 4. • 4=1 or 4=2 or 4=3 or 4=4 • 4 IN (1,2,3,4) • 4 <> ANY (1,2,3,4) • True, because it’s not equal to the 1. • 4<>1 or 4<>2 or 4<>3 or 4<>4
Using ANY • Find the names of those students who are 18 or younger and whose GPA is higher than the GPA of any students who are 25 or older. select name from student where age <= 18 and gpa> any (select gpa from student where age >= 25); • Other set comparison operators: >any, <any, <=any, >=any, =any, <>any
Using ALL • Similarly, we can use ALL to see how a value compares to all values in a table • Other set comparison operators: >all, <all, <=all, >=all, =all, <>all • Again, the evaluation of ALL expressions is not always intuitive • Examples: • 4=ALL(1,2,3,4) • False, because it’s not equal to one of the values, which is 1. • 4=1 and 4=2 and 4=3 and 4=4 • 4<>ALL(1,2,3,4) • False, because it’s equal to one of the values, which is 4. • 4<>1 and 4<>2 and 4<>3 and 4=4 • <>all is equivalent to not in. • 4=ALL(4,4,4,4) • True, because all of the values are 4. • 4=4 and 4=4 and 4=4 and 4=4
Using ALL • Retrieve the names of employees whose salary is greater than the salary of all the employees in Department 5 SELECT lname, fname from employee where salary> ALL (select salary from employee where DNO=5);
Using ANY and ALL • =any is equivalent to in • <>any is not equivalent to not in. • <>all is equivalent to not in. Let x = a and S = {a, b}. Then x<> any S is true (x<> b) but x not in S is false. x <> all S is also false (x <> b but x=a).
Correlated Nested Queries • If a condition in the WHERE-clause of a nested query references an attribute of a relation declared in the outer query , the two queries are said to be correlated. • The result of a correlated nested query is different for each tuple (or combination of tuples) of the relation(s) the outer query • Retrieve the name of each employee who has a dependent with the same first name and same sex as the employee.SELECT FNAME, LNAME FROM EMPLOYEE E WHERE SSNIN (SELECT ESSN FROM DEPENDENT WHERE ESSN = SSN AND FNAME = DNAME AND E.Sex = Sex);
Correlated Nested Queries • In the previous query, the nested query has a different result for each tuple in the outer query • A query written with nested SELECT... FROM... WHERE... blocks and using the = or IN comparison operators can alwaysbe expressed as a single block query. SELECT FNAME, LNAMEFROM EMPLOYEE E, DEPENDENT DWHERE SSN=ESSN AND FNAME=DNAME AND E.Sex = D.Sex;
The EXISTS Function • EXISTS is used to check whether the result of a correlated nested query is empty (contains no tuples) or not. • We can formulate the previous query in an alternative form that uses EXISTS as below: SELECT FNAME, LNAME FROM EMPLOYEE E WHERE EXISTS (SELECT * FROM DEPENDENT WHERE SSN=ESSN AND FNAME=DNAME AND E.Sex = Sex);
The EXISTS Function • Retrieve the names of employees who have no dependents.SELECT FNAME, LNAME FROM EMPLOYEE WHERE NOT EXISTS (SELECT * FROM DEPENDENT WHERE SSN=ESSN); • The correlated nested query retrieves all DEPENDENT tuples related to an EMPLOYEE tuple. If none exist , the EMPLOYEE tuple is selected • EXISTS is necessary for the expressive power of SQL
The EXISTS Function • Find all students who take at least one course. select * from student s where exists (select * from enrollment where ssn = s.ssn); • The previous query is equivalent to: select distinct s.* from student s, enrollment e where s.ssn = e.ssn; select * from student where ssnin (select ssn from enrollment);
The EXISTS Function • exists () is true if the set () is not empty. • exists () is false if the set () is empty. • not exists () is true if the set () is empty. • not exists () is false if the set () is not empty.
The EXISTS Function • Find all students who do not take CS228. select * from student s where not exists (select * from enrollment where ssn=s.ssn and cno= 'CS228'); • This query is equivalent to: select * from student where ssnnot in (select ssn from enrollment where cno = 'CS228');
Joining Tables • Find the titles of all courses offered by departments located in building EB. Department(Name, Location) Dept (DName, Location) Course(CNo, Title, DName) select Title from Course joinDepartment onDname=Name where Location = 'EB’; or select Title from Course natural join Department As D(DName, Location) where Location = 'EB' or select Title from Course natural join Dept where Location = 'EB'
Joining Tables SELECT FNAME, LNAME, ADDRESSFROM EMPLOYEE, DEPARTMENTWHERE DNAME='Research' AND DNUMBER=DNO; could be written as:SELECT FNAME, LNAME, ADDRESSFROM (EMPLOYEE JOIN DEPARTMENTON DNUMBER=DNO)WHERE DNAME='Research’; or as:SELECT FNAME, LNAME, ADDRESSFROM (EMPLOYEE NATURAL JOIN DEPARTMENT AS DEPT(DNAME, DNO, MSSN, MSDATE)WHERE DNAME='Research’
Joining Tables • Retrieve pnumber, dnum, lname, bdate for employees working on projects located at “Stafford”. SELECT PNUMBER, DNUM, LNAME, BDATEFROM ((PROJECT JOIN DEPARTMENT ONDNUM=DNUMBER) JOIN EMPLOYEE ONMGRSSN=SSN )WHERE PLOCATION='Stafford’;
Joining Tables • Employee(SSN, Name, Position, Pno) Project(Pno, Title, Budget) • Find who works for which project. select SSN, Name, Title from Employee e join Project p one.Pno = p.Pno; • Identify also those who do not work for any project. select SSN, Name, Title from Employee e left outer join Project p on e.Pno = p.Pno;
Joining Tables • Identify also those projects no one works for. select ssn, name, title from employee e right outer join project p on e.Pno = p.Pno; • Identify even those who do not work for any project and those projects no one works for: select ssn, name, title from employee natural full join project;
Aggregate Functions • SQL supports five aggregate functions: Name Argum. type Result type Description avg numeric numeric average count any numeric count min char or num. same as arg minimum max char or num. same as arg maximum sum numeric numeric sum
Aggregate Functions • Find the maximum salary, the minimum salary, and the average salary among all employees.SELECT MAX(salary), MIN(salary), AVG(salary)FROM EMPLOYEE; • Find the maximum salary, the minimum salary, and the average salary among employees who work for the 'Research' department. SELECT MAX(salary), MIN(salary), AVG(salary) FROM EMPLOYEE, DEPARTMENTWHERE DNO=DNUMBER AND DNAME='Research‘;
Aggregate Functions • Retrieve the total number of employees in the company SELECT COUNT (*) FROM EMPLOYEE; • Retrieve the number of employees in the 'Research' departmentSELECT COUNT (*) FROM EMPLOYEE, DEPARTMENTWHERE DNO=DNUMBER AND DNAME='Research’; • Find the number of courses offered. select count(distinct Course_no) from Enrollment • Note that the above is different from select distinct count(Course_no) from Enrollment • Find the average, the minimum and the maximum GPAs among all students' GPAs. select avg(GPA), min(GPA), max(GPA) from Student
Aggregate Functions • Find the SSNs and names of all students who take 5 or more courses. select ssn, name from student s where 5 <= (select count(*) from enrollment where ssn = s.ssn); • Count the number of distinct salary values in the database. select count(distinct salary) from employee; • Retrieve the names of all employees who have two or more dependents. select lname, fname from employee where (select count(*) from dependent where ssn= essn) >= 2;
Aggregate Functions • Find the number of students who are 17 years old. select count(*) from student where age = 17 • The above last query is equivalent to select count(ssn) from student where age = 17 • but may not be equivalent to select count(name) from student where age=17 • Aggregate functions, except count(*), ignore null values.
Group By & Having Clause • In many cases, we want to apply the aggregate functions to subgroups of tuples in a relation • Each subgroup of tuples consists of the set of tuples that have the same value for the grouping attribute(s) • The function is applied to each subgroup independently • SQL has a GROUP BY-clause for specifying the grouping attributes, which must also appear in the SELECT-clause
Grouping • For each department, retrieve the department number, the number of employees in the department, and their average salary. SELECT DNO, COUNT (*), AVG (SALARY)FROM EMPLOYEEGROUP BY DNO; • the EMPLOYEE tuples are divided into groups--each group having the same value for the grouping attribute DNO • The COUNT and AVG functions are applied to each such group of tuples separately • The SELECT-clause includes only the grouping attribute and the functions to be applied on each group of tuples • A join condition can be used in conjunction with grouping
Grouping • For each project, retrieve the project number, project name, and the number of employees who work on that project.SELECT PNUMBER, PNAME, COUNT (*)FROM PROJECT, WORKS_ONWHERE PNUMBER=PNOGROUP BYPNUMBER, PNAME; • In this case, the grouping and functions are applied after the joining of the two relations
Grouping • Find the average GPA for students of different age groups. selectage, avg(gpa) from student group byAge; • One tuple will be generated for each distinct value of age. STUDENTS RESULT SSN Name Age GPA Age avg(GPA) 123456789 John 19 3.6 19 3.7 234567891 Tom 20 3.2 20 3.5 345678912 Tom 19 3.8 21 2.8 345678912 Bill 21 2.8 345678912 Mary 20 3.8
Grouping • When group by is used, each attribute in the select clause must have a single atomic value for each group of common group by values. • Each attribute in the select clause should be either a grouping attribute or an attribute on which a set function is applied. • Every grouping attribute should be listed in the select clause. • The following is an illegal query: select age, ssn, avg(gpa) from student group by age
Grouping • Find the SSN, name and the number of credit hours each student still needs to graduate. • Course(CNo, Title, Dept_Name, CreditHour) • Enrollment(SSN, CNo, Grade, Semester) • Assume that each student needs 120 credit hours to graduate. selectssn, name, 120-sum(credithour) CreditNeeded from student s, enrollment e, course c where s.ssn=e.ssn and e.cno=c.cno group bys.ssn, name
The Having Clause • Sometimes we want to retrieve the values of these functions for only those groups that satisfy certain conditions • Having is used in conjunction with group by in the cases where the groups must satisfy a condition to appear in the result. • The HAVING-clause is used for specifying a selection condition on groups (rather than on individual tuples) • Conditions on aggregate functions are specified in the having clause.
The Having Clause • For each project on which more than two employees work , retrieve the project number, project name, and the number of employees who work on that project.SELECT PNUMBER, PNAME, COUNT (*)FROM PROJECT, WORKS_ONWHERE PNUMBER=PNOGROUP BY PNUMBER, PNAMEHAVINGCOUNT (*) > 2; • Find the number of students of each age group and output only those groups having more than 50 members. select Age, count(*) from Students group by Age having count(*) > 50;
The Having Clause • For each department having more than five employees, retrieve the department name and the number of employees making more than $40000. SELECT Dname, COUNT(*) FROM Department, Employee WHERE Dnumber = DNO and Salary > 40000 AND DNO IN (Select DNO FROM Employee Group by DNO Having COUNT(*)>5) GROUP BY Dname;
Summary of SQL Queries • A query in SQL can consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory. • The clauses are specified in the following order:SELECT <attribute list>FROM <table list>[WHERE <condition>][GROUP BY <grouping attribute(s)>][HAVING <group condition>][ORDER BY <attribute list>]
Evaluation of a six-clause query 1. Form all combinations of tuples from the relations in the from clause (Cartesian product). 2. Among all the tuples generated in step 1, find those that satisfy the conditions in the where clause. 3. Group the remaining tuples based on the grouping attributes. 4. Among all the tuples generated in step 3, find those that satisfy the conditions in the having clause. 5. Using the order by clause to order the tuples produced in step 4. 6. Project on the desired attribute values as specified in the select clause.
SQL: Example 1 • Find the average GPAs of students of different age groups. Only students younger than 35 are considered and only groups with the average GPAs higher than 3.2 are listed. The listing should be in ascending age values. selectage, avg(gpa) fromstudent whereage < 35 group by age having avg(gpa) > 3.2 order by age