340 likes | 347 Views
CS 430 Database Theory. Winter 2005 Lecture 12: SQL DML - SELECT. SELECT. One statement for retrieving data: SELECT <attributes> FROM <table> WHERE <condition>; Approximately: <attributes> ( <condition> (<table>)) {t.<attributes> | <table>(t) AND <condition>}
E N D
CS 430Database Theory Winter 2005 Lecture 12: SQL DML - SELECT
SELECT • One statement for retrieving data: SELECT <attributes> FROM <table> WHERE <condition>; • Approximately: • <attributes>(<condition>(<table>)) • {t.<attributes> | <table>(t) AND <condition>} • Difference: SELECT does not eliminate duplicate rows
SELECTMultiple Tables • Can have multiple tables in the FROM clause • Retrieve names and address of Research department employees • SELECT FNAME, LNAME, ADDRESS FROM EMPLOYEE, DEPARMENT WHERE DNAME = ‘Research’ AND DEPARTMENT.DNUMBER = EMPLOYEE.DNUMBER;
Notes: • SQL is officially case insensitive • However, there are situations where it makes a difference • E.g. MySQL stores MyISAM tables in individual files making table names case sensitive • Attribute names don’t require qualification if they are unique (among all the tables that are part of the query)
SELECTQualifying Attribute Names • We can qualify attribute names • Tablename.attribute • Assume EMPLOYEE and DEPARTMENT both have NAME and DNUMBER attributes: • SELECT FNAME, EMPLOYEE.NAME, ADDRESS FROM EMPLOYEE, DEPARMENT WHERE DEPARTMENT.NAME = ‘Research’ AND DEPARTMENT.DNUMBER = EMPLOYEE.DNUMBER;
SELECTQualifying Attribute Names (2) • <table> [AS] <synonym>: • SELECT FNAME, E.NAME, ADDRESS FROM EMPLOYEE AS E, DEPARMENT AS D WHERE D.NAME = ‘Research’ AND D.NUMBER = E.NUMBER; • Also can refer to same table more than once: • SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPERSSN = S.SSN;
SELECTWhere Clause is Optional • SELECT FNAME, LNAME, SSN FROM EMPLOYEE; • Including multiple tables: • SELECT FNAME, LNAME, DNAME FROM EMPLOYEE, DEPARTMENT • Result is a Cartesian Product (all combinations of one row from each table)
SELECT * • Can use * in place of attributes: SELECT * FROM EMPLOYEE; • Attributes are listed in order of definition • Can also qualify * with a table name: SELECT EMPLOYEE.*, DNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER; • Style: Never use * in a program • Always ask for attributes in order expected by program
SELECT DISTINCT • What is we want duplicates eliminated • That is, when we want a set of results • Usually costs some time to eliminate duplicates • In many cases there are no duplicates, e.g. when we select the key of a table • Usually don’t want to eliminate duplicates when using aggregate functions (e.g. SUM) • Do SELECT DISTINCT: • SELECT DISTINCT SALARY FROM EMPLOYEE; • Note: ALL is opposite of DISTINCT
SELECTOrdering the result • Use an ORDER BY clause • E.g. SELECT DNAME, LNAME, FNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER ORDER BY DNAME, LNAME, FNAME;
SELECTOrdering the result (2) • By default ORDER BY sorts in ascending order, if you want the opposite use ASC and DESC: • SELECT DNAME, LNAME, FNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER ORDER BY DNAME DESC, LNAME ASC, FNAME ASC;
Operations • There is a plethora of operations available. Some examples: • Arithmetic: +, -, *, / • Comparison: <, <=, =, <> or !=, >=, > • Pattern Matching: LIKE • Operations can appear as part of a condition and as part of attributes to be selected
Operations Examples • SELECT FNAME, LNAME, 1.1 * SALARY FROM EMPLOYEE; • SELECT FNAME, LNAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.MGRSSN = S.SSN AND E.SALARY > 1.5 * S.SALARY; • SELECT FNAME, LNAME, 1.1 * SALARY AS NEW_SALARY FROM EMPLOYEE; • Renames the result column (when it makes a difference)
Pattern Matching • Field LIKE Pattern, Field NOT LIKE Pattern • Patterns • % matches zero or more characters • like Regular Expression * • _ matches one character • like Regular Expression . • Use \ to escape _ and % • Example: • ADDRESS LIKE ‘%Houston%’ • Houston is anywhere in the address string (would include a street named Houston)
Note on Pattern Matching • Consider performance when using pattern matching. • SELECT FNAME, LNAME FROM EMPLOYEE WHERE ADDRESS LIKE ‘%Houston%’; • This would require a full table scan of the EMPLOYEE table
Three Valued Logic • NULLs have multiple meanings • All comparison involving a NULL yield NULL or UNKNOWN • Even NULL = NULL, all NULLs are considered distinct • AND, OR, NOT • FALSE AND UNKNOWN is FALSE • TRUE OR UNKNOWN is TRUE • NOT UNKNOWN is UNKNOWN
More Three Valued Logic • Can explicitly test for NULL with IS NULL, IS NOT NULL • Example: SELECT FNAME, LNAME FROM EMPLOYEE WHERE SUPERSSN IS NULL;
JOINing Tables • SQL incorporate notion of JOINed tables • Can specify a join in the FROM clause • Example: • SELECT FNAME, LNAME, ADDRESS FROM (EMPLOYEE JOIN DEPARMENT ON DNO = DNUMBER) WHERE DNAME = ‘Research’;
More JOINing Tables • NATURAL JOIN • SELECT DNAME, DLOCATION FROM (DEPARTMENT NATURAL JOIN DEPT_LOCATIONS); • OUTER JOINS • SELECT E.LNAME AS ENAME, S.LNAME AS SNAME FROM (EMPLOYEE E LEFT OUTER JOIN EMPLOYEE S ON E.SUPERSSN = S.SSN);
More JOINing Tables • Kinds of JOINs • [INNER] JOIN • Normal Join • CROSS JOIN • Cross Join is a cartesian product (no join condition) • LEFT [OUTER] JOIN • RIGHT [OUTER] JOIN • Left and Right Outer Joins • FULL [OUTER] JOIN • Two sided outer join • Not supported in MySQL
Aggregate Functions • Aggregate Functions: • COUNT, SUM, MAX, MIN, AVG, and some more • SUM, AVG work on numeric values • MAX, MIN work on values with a “total order” • COUNT returns number of tuples or values • Simple example: SELECT AVG(SALARY), MAX(SALARY) FROM EMPLOYEE; • Note: Can include WHERE Clause and Joins SELECT AVG(SALARY), MAX(SALARY) FROM (EMPLOYEE JOIN DEPARTMENT ON DNO=DNUMBER) WHERE DNAME = ‘Research’;
COUNT() • COUNT counts the number of tuples or values • COUNT(*) will count the number of tuples in the query • COUNT(SALARY) will count the number of tuples with a non-null SALARY • COUNT(DISTINCT SALARY) will count the number of DISTINCT values for SALARY
GROUP BY • Problem with simple aggregates: Can’t return more than one row • Relational Algebra: Group rows together and compute aggregate function over the group (“script F” operator) • SQL: GROUP BY CLAUSE • Simple example: SELECT DNO, COUNT(*), AVG(SALARY) FROM EMPLOYEE GROUP BY DNO;
MORE GROUP BY • Restriction (Queries with GROUP BY): All attributes used in the SELECT clause (includes used in expressions) must be either: • Named in the GROUP BY clause • Or, Used in an Aggregate function • Illegal Example (should be): • SELECT DNO, SALARY FROM EMPLOYEE GROUP BY DNO; • Legal Example: • SELECT DNO, MAX(SALARY) FROM EMPLOYEE GROUP BY DNO;
GROUP BY and NULLs • If some of the rows being grouped have null attributes for a GROUPing attribute a separate group is created for those rows with that attribute being NULL • Example: • Get number of employees supervised by each supervisor: SELECT SUPERSSN, COUNT(*) FROM EMPLOYEE GROUP BY SUPERSSN;
HAVING • HAVING provides a condition that is tested after the rows are grouped: • Example: HAVING COUNT(*) > 2 (more than two tuples in the group). • EXAMPLE: SELECT SUPERSSN, MAX(SALARY) FROM EMPLOYEE GROUP BY SUPERSSN HAVING COUNT(*) >= 2;
MORE GROUP BY and HAVING • Semantics • The rest of the query (without GROUP BY and HAVING clauses) is processed first, resulting in a table of tuples • That table is then grouped according to the GROUP BY and Aggregate functions are computed as needed for the SELECT and HAVING clauses • Any HAVING clause is applied • The appropriate attributes from the resulting tuples are returned as the result
Nested Queries • (Need MySQL 4.1 or later) • Mini-example: • SELECT DISTINCT PNUMBER FROM PROJECT WHERE PNUMBER IN (SELECT PNUMBER FROM … WHERE … ); • In general: Anything that can be done with a nested query can be done without
More Nested Queries • IN can work with tuples: • Example: • Find all employees who work the same (project, hours) combination on some project that SSN ‘123456789’ works on. SELECT DISTINCT ESSN FROM WORKS_ON WHERE (PNO, HOURS) IN (SELECT PNO, HOURS FROM WORKS_ON WHERE SSN = ‘123456789’);
More Nested Queries • IN is the same as = ANY • You can use ANY and ALL with any comparison (>, =, etc.) operator • Example: • Find all employees whose salary is greater than all the employees in department 5 • SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ALL (SELECT SALARY FROM EMPLOYEE WHERE DNO = 5);
Correlated Nested Queries • Refers to queries where a nested query refers to attributes in an outer query • Example: • Find the name of all employees who have a dependent with the same first name and sex • SELECT E.FNAME, E.LNAME FROM EMPLOYEE E WHERE E.SSN IN (SELECT ESSN FROM DEPENDENT E.FNAME = DEPENDENT_NAME AND E.SEX = SEX);
EXISTS and NOT EXISTS • EXISTS and NOT EXISTS are predicates that can be applied to a subquery • Example: • Find all employees having dependents • SELECT FNAME, LNAME FROM EMPLOYEE WHERE NOT EXISTS (SELECT * FROM DEPENDENT WHERE SSN = ESSN);
UNION • You can get the result of more than one query combined into a single result: • SELECT … UNION SELECT … • Results in a table whose rows are the union of the results from the two queries • SELECT DISTINCT is implied with UNIONS • This is a set operation
SELECT Statement Summary • Lots of options • You can mix and match pretty much arbitrarily • Build up queries much as you would relational algebra expressions