480 likes | 754 Views
SQL (DML) II. Review: Formal Semantics of SQL: RA*. Semantics: p * A1, A2, ..., An ( s * P (r1 x* r2 x* ... rm)). Query:. SELECT A1, A2, …, An FROM r1, r2, …, rm WHERE P. Semantics: (Remove * on Pi) p A1, A2, ..., An ( s * P (r1 x* r2 x* ... rm)).
E N D
Review: Formal Semantics of SQL: RA* Semantics: p* A1, A2, ..., An( s*P (r1 x* r2 x* ... rm)) Query: SELECT A1, A2, …, An FROM r1, r2, …, rm WHERE P Semantics: (Remove * on Pi) pA1, A2, ..., An( s*P (r1 x* r2 x* ... rm)) Query: SELECT DISTINCT A1, A2, ..., An FROM r1, r2, ..., rm WHERE P Semantics: Assign result to a new relation S S pA1, A2, ..., An( s*P (r1 x* r2 x* ... rm)) Query: CREATE TABLE S AS SELECT DISTINCT A1, A2, ..., An FROM r1, r2, ..., rm WHERE P
Review of SQL Semantics (RA/RA*) ---------- p* (or p) X* s* Can’t express s* Can’t express Clause ---------- SELECT[DISTINCT] FROM WHERE CREATE TABLE AS SELECT GROUP BY HAVING ORDER BY
Aggregates and Group By • Usually, aggregates used with “Group By” • E.g. SELECT bname, COUNT (DISTINCT cname) FROM depositor d, account a WHERE d.acct_no = a.acct_no GROUP BY bname depositor (customer-name, account-number) account (account-number, branch-name, balance) Result: bname COUNT -------------------------- Downtown 2 Mianus 5 Perry 1 Brighton 5 Kenmore 7
Aggregates and Group By Intuition behind “Group By” SELECT bname, COUNT (DISTINCT cname) FROM depositor d, account a WHERE d.acct_no = a.acct_no GROUP BY bname Step 1: “Group “ results of join Step 2: aggregate on groups and project on result bname COUNT -------------------------- Downtown 1 Mianus 1 Perry 1 Brighton 2 Kenmore 2
Group By • Another example: SELECT bname, SUM(assets) as total FROM branch GROUP BY bcity branch(bname, bcity, assets) Above query is NOT allowed Non-aggregated values in SELECT clause (e.g., bname) must also appear in GROUP BY clause Result ?? bname total -------------------------- Redwood 2.1M Pownal 0.3M N. Town 3.7M ???? 2M ????? 10.1M SELECT A1, A2, ..., Ak, Agg1(), ...., Aggi() FROM .......... WHERE ............ GROUP BY A1, A2, ..., Ak, Ak+1, ..., An
HAVING WHERE :: FROM as HAVING :: GROUP BY • HAVING P: selects rows from result of GROUP BY • Optional (missing HAVING means TRUE) Example: Find names of branches and the average account balance for those branches having average account balance > $1200 SELECT bname, AVG(balance) AS avgFROM accountGROUP BY bnameHAVING avg > 1200 Result same as: CREATE TABLE Temp AS SELECT bname, AVG(balance) AS avg FROM accountGROUP BY bname SELECT *FROM Temp WHERE avg > 1200
Review - Summary Thus Far account (acct_no, bname, balance) branch(bname, bcity, assets) Kitchen sink query: SELECT bcity, sum(balance) AS totalbalance FROM branch b, account a WHERE b.bname=a.bname AND assets >= 1M GROUP BY bcity HAVING totalbalance > 700 ORDER BY bcity DESC Steps 1,2 : FROM, WHERE
Summary thus far Steps 3,4 : GROUP BY, SELECT Steps 5 : HAVING Steps 6 : ORDER BY
Summary thus far Semantics (RA/RA*) ---------- p* (or p) X* s* Can’t express s* Can’t express Clause ---------- SELECT[DISTINCT] FROM WHERE GROUP BY HAVING ORDER BY Evaluation Order ---------- 4 1 2 3 5 6
branch (bname, bcity, assets) • customer (cname, cstreet, customer-only) • account (acct_no, bname, balance) • loan (lno, bname, amt) • depositor (cname, acct_no) • borrower (cname, lno)
More SQL • NULL values and SQL • Nested Queries • Views • Updates • Joins • PseudoColumns • Connect-By Hierarchical Queries
NULLs The “dirty litle secret” of SQL (major headache for query optimization) can be a value for any attribute e.g. : • What does this mean? • We don’t know Kenmore’s assets? • Kenmore has no assets? • .................... branch2= Effect on Queries: SELECT * FROM branch2 WHERE assets = NULL SELECT * FROM branch2 WHERE assets IS NULL
NULLs • Arithmetic with nulls: • n op null = null op : + , - , *, /, mod, ... • Booleans with nulls: One can write: 3-valued logic (true, false, unknown) SELECT ........... FROM ............. WHERE boolexpr IS UNKNOWN • What expressions evaluate to UNKNOWN? • Comparisons with NULL (e.g. assets = NULL) • FALSE OR UNKNOWN (but: TRUE OR UNKNOWN = TRUE) • TRUE AND UNKNOWN • UNKNOWN AND/OR UNKNOWN
NULLs Given: branch2= Aggregate operations: SELECT SUM(assets) FROM branch2 SUM -------- 11.1M returns NULL is ignored Same for AVG, MIN, MAX But.... COUNT(assets) returns 4! Let branch3 an empty relation Then: SELECT SUM(assets) FROM branch3 returns NULL but COUNT(<empty rel>) = 0
Nested Queries in SQL • Queries containing other queries • Inner query: • Can appear in FROM or WHERE clause “inner query” “outer query” Example: SELECT cname FROM borrower WHERE cname IN (SELECT cname FROM depositor) Think of this as a function that returns the result of the inner query
Nested Queries in SQL Another example: SELECT DISTINCT cname FROM borrower as b, loan as l WHERE b.lno = l.lno AND bname = “Perry” AND (bname, cname) IN ( SELECT bname, cname FROM depositor as d, account as a WHERE d.acct_no = a.acct_no) Q: Describe what this query returns Ans: Names of borrowers at the Perry branch who also have savings account at the same branch
Nested Queries in SQL • What can we do with queries in the WHERE clause? • Let A = SELECT ... FROM ... WHERE .... • WHERE <expr> IN A WHERE <expr> NOT IN A • WHERE <expr> < SOME (A) ( ) WHERE <expr> < ALL (A) ( ) Also (‘>’, ‘<=‘, ‘>=‘, ‘=‘, ...) • WHERE EXISTS(A) ( )
0 5 6 Nested Queries in SQL SOME ( ): e < SOME(A) (5< some ) = true (read: 5 < some tuple in the relation) 0 ) = false (5< some 5 0 ) = true (5 = some 5 0 (5 some ) = true (since 0 5) 5 (= some) in However, ( some) not in example: (3 some ) 3 5
Nested Queries in SQL Example query with SOME: SELECT bname FROM branch WHERE assets > SOME ( SELECT assets FROM branch WHERE bcity=‘Bkln’) Q: Describe what this query returns Ans: Find all branches that have greater assets than some branch in Brooklyn.
Nested Queries in SQL • Optimization of queries! SELECT bname FROM branch WHERE assets > SOME ( SELECT assets FROM branch WHERE bcity=‘Bkln’) Q: Can you rewrite this query to something that is equivalent, but more efficient to execute? 1) CREATE TABLE TEMP 2) SELECT bname AS SELECT assets FROM branch FROM branch WHERE assets > SOME (Temp) WHERE bname=‘Bkln’ Q: Why this is better ?
0 5 6 Nested Queries in SQL ALL ( ): e < ALL (A) (5< all ) = false 6 ) = true (5< all 10 4 ) = false (5 = all 5 4 (5 all ) = true (since 5 4 and 5 6) 6 (all) not in However, (= all) in
Nested Queries in SQL Example query with ALL: SELECT bname FROM branch WHERE assets > ALL ( SELECT assets FROM branch WHERE bcity=‘Bkln’) Returns: all branches that have greater assets than all branches in Brooklyn
Nested Queries in SQL EXISTS: EXISTS(A) (i.e., true if A is not empty) Example: Find all customers who have accounts at all branches in Brooklyn SELECT DISTINCT cname FROM depositor as S WHERE NOT EXISTS ( ( SELECT bname FROM branch WHERE bcity = ‘Bkln’) EXCEPT ( SELECT bname FROM depositor as T, account as R WHERE T.acct_no = R.acct_no AND S.cname = T.cname)) Inner Query: (branches in Brooklyn) - (branches where S has an account)
Nested Queries in SQL Correlated Queries: when the inner query contains tuple variables of the outer query Example: Find all accounts located at branches that also hold a loan for a smaller amount SELECT DISTINCT A.cct_no FROM account as A WHERE EXISTS ( SELECT * FROM loan as L WHERE A.bname = L.bname AND A.balance > L.amt) “correlation”: inner query contains reference to table in outer query
Nested Queries in SQL What about query optimization? SELECT DISTINCT A.cct_no FROM account as A WHERE EXISTS ( SELECT * FROM loan as L WHERE A.bname = L.bname AND A.balance > L.amt) SELECT DISTINCT A.acct_no FROM account as A, loan as L WHERE A.bname = L.bname AND A.balance > L.amt Above an example of Query Unnesting. advantageous because joins evaluate in more ways than nested queries
Nested Queries in SQL SELECT bname FROM account as A WHERE bal > (SELECT AVG(amt) FROM loan as L WHERE A.bname = L.bname) Correlation with Aggregates: Returns the branch names where accounts are held whose balances are more than the average loan taken at the same branch Kim’s technique: (1) CREATE TABLE Temp AS SELECT bname, AVG(amt) as avgloan FROM Loan GROUP BY bname (2) SELECT A. bname FROM account as A, temp as L WHERE A.bname = L.bname AND A.bal > L.avgloan
Kim’s Unnesting Technique • Why is the rewrite better than the original query? Ans: the rewrite computes the avg loans JUST ONCE per branch Is the rewrite always better than the nested query? Ans: NO: if Loan has branch names not in account
Nested Queries in SQL Nesting also possible in FROM clause Example: Another way to express HAVING SELECT bname, avg(balance) as avgbal FROM account GROUP BY bname HAVING AVG(balance) > 1200 SELECT * FROM (SELECT bname, AVG(balance) as avgbal FROM account GOUP BY bname) TempRes WHERE avgbal > 1200
Nested Queries in SQL Nested query in FROM clause, another example: Find the maximum total balance across all branches SELECT MAX(tot-balance) FROM (SELECT bname, SUM(balance) AS tot_balance FROM account GROUP BY bname) branch-total
SQL: Joins Suppose: loan borrower • Select bname, loan.lno, amt, cname, borrower.lno • From loan INNER JOIN borrower ON loan.lno = borrower.lno (2) Alternately, in Oracle SQL you can specify as: Select bname, loan.lno, amt, cname, borrower.lno from loan l, borrower b where l.lno = b.lno;
SQL: Joins Suppose: loan borrower • Select bname, loan.lno, amt, cname, borrower.lno • From loan LEFT OUTER JOIN borrower ON loan.lno = borrower.lno (2) Alternately, in Oracle SQL you can specify as: Select bname, loan.lno, amt, cname, borrower.lno from loan l, borrower b where l.lno = b.lno (+);
SQL: Joins SQL Expression R INNER JOIN S ON P R LEFT OUTER JOIN S ON P R RIGHT OUTER JOIN S ON P R FULL OUTER JOIN S ON P Also: R NATURAL INNER JOIN S R NATURAL LEFT OUTER JOIN S RA expressions R P* S R P* S R P* S R P* S R * S R * S Etc..
SQL: Joins • In general: NATURAL <Join Type> or <Join Type> ON <Predicate> or <Join Type> USING (<attr1> , <attr2> ,...., <attrn>) Where: <join type> = ( INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN)
SQL: Joins Using: loan borrower loan FULL OUTER JOIN borrower USING lno USING similar to NATURAL join but.. Columns listed must be those in common (used for equi-join), not necessarily all of them
SQL: Joins Find all customers who have an account or a loan at the bank (but not both) SELECT cname FROM depositor NATURAL FULL OUTER JOIN borrower WHERE (acct_no IS NULL) OR (lno IS NULL)
Views Can involve complex queries: example: create a view consisting of branches and customers. CREATE VIEW all-customers AS (SELECT bname, cname FROM depositor as d, account as a WHERE d.acct_no = a.acct_no) UNION (SELECT bname, cname FROM borrower as b, loan as l WHERE b.lno=l.lno) SELECT cname FROM all-customers WHERE bname = ‘Perry’ Can treat that view like a table: Find all customers of Perry:
Views Mechanism for hiding data from view of certain users CREATE VIEW V AS <Query Expression> Example: show all branches and cities, but hide assets CREATE VIEW branch2 AS SELECT bname, bcity FROM branch
Views create table vs create view (2) CREATE VIEW branch2 AS SELECT bname, bcity FROM branch (1) CREATE TABLE branch2 AS SELECT bname, bcity FROM branch vs (1) creates new table that gets stored on disk (2) creates “virtual table” (materialized when needed) Therefore: changes in branch are seen in the view version of branch2 (2) but not for the (1) case. Any indexes on branch are automatically used.
Pseudocolumns • Rownum (numeric) • A number for each row in a select query • Example: • Branch(bname, bcity) • SELECT rownum ordered_branch_seq, bname, bcity FROM branch ORDER BY bname; • Rowid (char string) • A character string that represents the physical address for a row of a table. • SELECT rowid, bname, bcity FROM branch. • Access by rowid fastest. • However, rowids may change during some table maintenance operations • Split of table partitions. • If you delete and insert the same branch name, the rowids for that row are likely to be different
CONNECT-BY (not in book) • Person (father, name) • Find the children of ‘Mike Johnson’ • SELECT a.name FROM Person a where a.father = ‘Mike Johnson’; • What-if you want to find great-great-grandchildren? • Easier way in Oracle: CONNECT-BY • Useful for answering hierarchical queries
CONNECT-BY • Find all descendants from Mike Johnson • SELECT a.name FROM Person a START WITH a.name = ‘Mike Johnson’ CONNECT BY PRIOR name = father; • First Level (of recursion) • Mike Johnson • Second Level • All names for which Mike is a father (i.e., children of Mike) • Third Level • All names for which Mike’s children are fathers • ….
CONNECT-BY • Find all descendants from Mike Johnson • SELECT a.name FROM Person a START WITH a.name = ‘Mike Johnson’ CONNECT BY PRIOR name = father; • Pseudocolumn: LEVEL SELECT a.name, LEVEL FROM Person a START WITH a.name = ‘Mike Johnson’ CONNECT BY PRIOR name = father; • Ordering among siblings: SELECT a.name, LEVEL FROM Person a START WITH a.name = ‘Mike Johnson’ CONNECT BY PRIOR name = father ORDER SIBLINGS BY name;
CONNECT-BY • Find upto grand-children only for Mike Johnson • Restrict the LEVEL to be <=3 (Mike, his children and grand-children) • What if you have “loops” in the data? • Oracle raises an error • To ignore the loops, specify NOCYCLE SELECT a.name, LEVEL FROM Person a START WITH a.name = ‘Mike Johnson’ CONNECT BY NOCYCLE PRIOR name = father ORDER SIBLINGS BY name; Lot more Information: Oracle Help links, or http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/gennick_connectby.html http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/queries003.htm
SQL Summary Query Language (DML): SELECT, FROM, WHERE GROUP BY, HAVING ORDER BY Nested Queries (IN, EXISTS, UNIQUE, ALL, SOME, correlation) Aggregates Joins Updates Views PseudoColumns Connect-by hierarchical queries More to come: Inserts, deletes ,…. DDL: CREATE TABLE DROP TABLE ALTER TABLE • Embedded SQL/JDBC • Integrity Constraints • Security, authorization