1.08k likes | 1.09k Views
Learn the standard DML/DDL for relational databases, SQL query basics, formal semantics, SQL operators, aggregation, and grouping. Discover how to use SQL effectively for data manipulation and definition.
E N D
CMSC424: Database Design Lecture 6 CMSC424, Spring 2005
SQL - Introduction • Standard DML/DDL for relational DB’s • DML = Data Manipulation Language (queries, updates) • DDL = Data Definition Language (create tables, indexes, …) • Also includes • View definition • Security (Authorization) • Integrity constraints • Transactions CMSC424, Spring 2005
SQL: Basic Structure • SELECT A1, ….., An • FROM r1, ….., rm • WHERE P Equivalent to: • A1,A2,…,An (σP (r1 … rn)) CMSC424, Spring 2005
A Simple SELECT-FROM-WHERE Query • Similar to SELECT bname FROM loan WHERE amt > 1000 bname( amt > 1000(loan) ) • But not quite In general, SQL will not remove duplicates unless asked to Duplicates are retained (i.e., result not a set) CMSC424, Spring 2005
A Simple SELECT-FROM-WHERE Query • Similar to SELECT DISTINCT bname FROM loan WHERE amt > 1000 bname( amt > 1000(loan) ) • Result: CMSC424, Spring 2005
Next • Formal Semantics of SQL • Bag or multiset semantics CMSC424, Spring 2005
Formal Semantics of SQL • RA can only express SELECT DISTINCT queries • To express SQL, must extend RA to a bag algebra • Bags (aka: multisets) like sets, but can have duplicates e.g: {5, 3, 3} e.g: homes = • Next: will define RA*: a bag version of RA CMSC424, Spring 2005
Formal Semantics of SQL: RA* • *p (r): preserves copies in r e.g: *city = Brighton (homes) = • *A1, …, An (r): no duplicate elimination e.g: *cname (homes) = CMSC424, Spring 2005
Formal Semantics of SQL: RA* • r* s : additive union • * = r s • r-* s: bag difference e.g: r -* s = s -* r = CMSC424, Spring 2005
Formal Semantics of SQL: RA* • r* s: cartesian product = * CMSC424, Spring 2005
Formal Semantics of SQL • Query: SELECT a1, ….., an FROM r1, ….., rm WHERE p • Semantics: *A1, …, An (*p (r1* … * rm) ) (1) • Query: SELECT DISTINCT a1, ….., an FROM r1, ….., rm WHERE p What is the only operator to change in (1)? • Semantics: A1, …, An (*p (r1* … * rm) ) (2) CMSC424, Spring 2005
Next: More SQL CMSC424, Spring 2005
More SQL: AS • Using AS in FROM clause • Introduces tuple variables e.g: SELECT DISTINCT T.bname FROM branch AS T, branch AS S WHERE T.assets > S.assets returns branch names of branches with non-minimal assets • Using AS in SELECT clause • Renames columns in result (p) e.g: SELECT bname, acct_no, balance * 1.05 AS newbal FROM account • returns: CMSC424, Spring 2005
More SQL: INTO • Used to name query results () e.g: SELECT DISTINCT bname INTO branchnames FROM branch • Intuitively: BranchNames SELECT DISTINCT bname FROM branch CMSC424, Spring 2005
More SQL: Order by • Example: List in alphabetical order the names of all customers with loans at the Perry branch SELECT DISTINCT cname FROM borrower AS b, loan AS l WHERE b.lno = l.lno AND bname = “Perry” ORDER BY cname Result = default: ascending order (asc) Can also write: ORDER BY cname DESC, or ORDER BY cname ASC Like SELECT DISTINCT, very expensive • requires external sort • cannot (usually) fit entire relation in memory. instead must sort in chunks. CMSC424, Spring 2005
More SQL: Aggregate Operators • Aggregate operators: AVG (col): average of values in col SUM (col): sum of values in col COUNT (col): number of values in col MIN (col): minimun value in col MAX (col): maximun value in col • Examples: • Find the average acct balance @ Perry SELECT AVG (bal) FROM account WHERE bname = “Perry” • Find the number of tuples in customer SELECT COUNT (*) FROM customer • Find the number of unique depositors SELECT COUNT (DISTINCT cname) FROM customer CMSC424, Spring 2005
More SQL: Aggregates & Group By • Usually, aggregates used with “GROUP BY” • Example: SELECT bname, COUNT (DISTINCT cname) FROM depositor AS d, account AS a WHERE d.acct_no = a.acct_no GROUP BY bname Extended relational grouping operator: G1, G2, …, GngF1( A1), F2( A2),…, Fn( An)(E) • Result: CMSC424, Spring 2005
More SQL: Aggregates & Group By • Intuition behind “Group By” SELECT bname, COUNT (DISTINCT cname) FROM depositor AS d, account AS a WHERE d.acct_no = a.acct_no GROUP BY bname Step 1: “Group “ result of join Step 2: Aggregate on groups and project on result CMSC424, Spring 2005
More SQL: GROUP BY (cont.) • Another Example: SELECT bname, SUM(assets) as total FROM branch GROUP BY bcity • Result? Multiple names to choose from • Above Query Not Allowed! CMSC424, Spring 2005
More SQL: GROUP BY (cont.) • Another Example: SELECT bname, SUM(assets) as total FROM branch GROUP BY bcity • Above Query Not Allowed! Non-aggregated attributes in SELECT clause (e.g., bname) must also appear in GROUP BY clause SELECT A1, ..., Ak, Agg1(), ...., Aggi() FROM .......... WHERE ............ GROUP BY A1, ..., Ak CMSC424, Spring 2005
More SQL: Having • WHERE :: FROM as HAVING :: GROUP BY • HAVING P: selects rows from result of GROUP BY • Optional (missing HAVING clause = HAVING TRUE) Example: Find names of branches and the average account balance for those branches having an account balance > $1200 SELECT bname, AVG(balance) AS avgFROM accountGROUP BY bnameHAVING avg > 1200 same result as SELECT bname, AVG(balance) AS avg INTO temp FROM account GROUP BY bname SELECT *FROM temp WHERE avg > 1200 + CMSC424, Spring 2005
More SQL: Set/Bag Operations • Set Operations • UNION ≡ U • INTERSECT ≡ ∩ • EXCEPT ≡ - • Bag Operations • UNION ALL ≡ U* • INTERSECT ALL ≡ ∩* • EXCEPT ALL ≡ -* • Duplicate Counting: • Given m copies of t in r, n copies of t in s, how many copies of t in: r UNION ALL s? A: m + n r INTERSECT ALL s? A: min (m, n) r EXCEPT ALL s? A: max (0, m-n) CMSC424, Spring 2005
More SQL: Set/Bag Operations • Example Queries: (SELECT cname FROM depositor) ? (SELECT cname FROM borrower) • ? ≡ UNION returns names of customers with savings accts, loans or both • ? ≡ INTERSECT returns names of customers with savings accts and loans • ? ≡ EXCEPT returns names of customers with savings accts but not loans CMSC424, Spring 2005
SQL: Summary Thus Far CMSC424, Spring 2005
SQL: Summary Thus Far • A kitchen sink query: SELECT bcity, sum(balance) AS totalbalance INTO BranchAcctSummary FROM branch AS b, account AS a WHERE b.bname = a.bname AND assets >= 1M GROUP BY bcity HAVING totalbalance > 700 ORDER BY bcity DESC • Steps 1,2 : FROM, WHERE CMSC424, Spring 2005
SQL: Summary Thus Far • Steps 3,4: GROUP BY, SELECT • Step 6: ORDER BY • Step 5: HAVING • Step 7: INTO • ... CMSC424, Spring 2005
Next: NULLs CMSC424, Spring 2005
More SQL: Nulls • The “dirty little secret” of SQL (major headache for query optimization) • Can be a value of any attribute e.g: branch = • What does this mean? We don’t know Waltham’s assets? Waltham has no assets? …. (Many possible interpretations) CMSC424, Spring 2005
More SQL: Nulls • Arithmetic Operations with Null n + NULL = NULL (similarly for all arithmetic ops: +, -, *, /, mod, …) e.g: branch = SELECT bname, assets * 2 as a2 FROM branch = CMSC424, Spring 2005
More SQL: Nulls • Boolean Operations with Null n < NULL = UNKNOWN (similarly for all boolean ops: >, <=, >=, <>, =, …) e.g: branch = = SELECT * FROM branch WHERE assets = NULL CMSC424, Spring 2005
More SQL: Nulls • Boolean Operations with Null n < NULL = UNKNOWN (similarly for all boolean ops: >, <=, >=, <>, =, …) e.g: branch = = SELECT * FROM branch WHERE assets IS NULL CMSC424, Spring 2005
More SQL: Unknown • Boolean Operations with Unknown n < NULL = UNKNOWN (similarly for all boolean ops: >, <=, >=, <>, =, …) FALSE OR UNKNOWN = UNKNOWN TRUE AND UNKNOWN = UNKNOWN Intuition: substitute each of TRUE, FALSE for unknown. If different answer results, results is unknown Can write: SELECT … FROM … WHERE booleanexp IS UNKNOWN UNKNOWN OR UNKNOWN = UNKNOWN UNKNOWN AND UNKNOWN = UNKNOWN NOT (UNKNOWN) = UNKNOWN CMSC424, Spring 2005
More SQL: Nulls • Given branch = • Aggregate Operations SELECT SUM (assets) = FROM branch NULL is ignored Same for AVG (3.7M), MIN (0.4M), MAX (9M) But COUNT (assets) returns CMSC424, Spring 2005
More SQL: Nulls • Given branch = SELECT SUM (assets) = FROM branch • Same as AVG, MIN, MAX • But COUNT (assets) returns CMSC424, Spring 2005
More and More… • Nested Subqueries • Derived Relations • Views • Modification of the Database • Joined Relations • Data Definition Language • Embedded SQL, ODBC and JDBC • We will discuss some of these things in class… rest through the SQL assignment CMSC424, Spring 2005
Nested Subqueries: Example • Find all customers who have both an account and a loan at the bank. select distinct customer-namefrom borrowerwhere customer-name in (select customer-namefromdepositor) Find all customers who have a loan at the bank but do not have an account at the bank select distinct customer-namefrom borrowerwhere customer-name not in (select customer-namefrom depositor) CMSC424, Spring 2005
Example Query • Find all customers who have both an account and a loan at the Perryridge branch select distinctcustomer-namefrom borrower, loanwhere borrower.loan-number = loan.loan-number andbranch-name = “Perryridge” and(branch-name, customer-name) in (select branch-name, customer-namefrom depositor, accountwhere depositor.account-number = account.account-number) Note: Above query can be written in a much simpler manner. The formulation above is simply to illustrate SQL features. CMSC424, Spring 2005
Set Comparison • Find all branches that have greater assets than some branch located in Brooklyn. select distinct T.branch-namefrom branch as T, branch as Swhere T.assets > S.assets andS.branch-city = ‘Brooklyn’ Same query using > some clause select branch-namefrom branchwhere assets > some (select assetsfrom branchwhere branch-city = ‘Brooklyn’) CMSC424, Spring 2005
0 5 6 Definition of Some Clause • F <comp> some r t r s.t. (F <comp> t)Where <comp> can be: (5< some ) = true (read: 5 < some tuple in the relation) 0 ) = false (5< some 5 0 ) = true (5 = some 5 0 (5 some ) = true (since 0 5) 5 (= some) in However, ( some) not in CMSC424, Spring 2005
0 5 6 Definition of all Clause • F <comp> all r t r (F <comp> t) (5< all ) = false 6 ) = true (5< all 10 4 ) = false (5 = all 5 4 (5 all ) = true (since 5 4 and 5 6) 6 (all) not in However, (= all) in CMSC424, Spring 2005
Example Query • Find the names of all branches that have greater assets than all branches located in Brooklyn. select branch-namefrom branchwhere assets > all (select assetsfrom branchwhere branch-city = ‘Brooklyn’) CMSC424, Spring 2005
Test for Empty Relations • The exists construct returns the value true if the argument subquery is nonempty. • exists r r Ø • not exists r r = Ø CMSC424, Spring 2005
Example Query • Find all customers who have an account at all branches located in Brooklyn. select distinct S.customer-namefrom depositor as Swhere not exists ( (select branch-namefrom branchwhere branch-city = ‘Brooklyn’)except (select R.branch-namefrom depositor as T, account as Rwhere T.account-number = R.account-number andS.customer-name = T.customer-name)) Note that X – Y = Ø X Y Note: Cannot write this query using= alland its variants CMSC424, Spring 2005
Test for Absence of Duplicate Tuples • The unique construct tests whether a subquery has any duplicate tuples in its result. • Find all customers who have at most one account at the Perryridge branch. • select T.customer-name • from depositor as T • where unique ( • select R.customer-namefrom account, depositor as Rwhere T.customer-name = R.customer-name andR.account-number = account.account-number andaccount.branch-name = ‘Perryridge’) CMSC424, Spring 2005
Example Query • Find all customers who have at least two accounts at the Perryridge branch. select distinct T.customer-name from depositor T where not unique ( select R.customer-name from account, depositor as R where T.customer-name = R.customer-name and R.account-number = account.account-number and account.branch-name = ‘Perryridge’) CMSC424, Spring 2005
Views • Provide a mechanism to hide certain data from the view of certain users. To create a view we use the command: create view v as<query expression> where: <query expression> is any legal expression The view name is represented by v CMSC424, Spring 2005
Example Queries • A view consisting of branches and their customers create view all-customer as(select branch-name, customer-namefrom depositor, accountwhere depositor.account-number = account.account-number) union(select branch-name, customer-namefrom borrower, loanwhere borrower.loan-number = loan.loan-number) • Find all customers of the Perryridge branch select customer-namefrom all-customerwhere branch-name = ‘Perryridge’ CMSC424, Spring 2005
Derived Relations • Find the average account balance of those branches where the average account balance is greater than $1200. • select branch-name, avg-balancefrom (select branch-name, avg (balance)from accountgroup by branch-name)as result (branch-name, avg-balance)where avg-balance > 1200 • Note that we do not need to use the having clause, since we compute the temporary (view) relation result in the from clause, and the attributes of result can be used directly in the where clause. CMSC424, Spring 2005
Modification of the Database – Deletion • Delete all account records at the Perryridge branch • delete from accountwhere branch-name = ‘Perryridge’ • Delete all accounts at every branch located in Needham city. • delete from accountwhere branch-name in (select branch-namefrom branchwhere branch-city = ‘Needham’)delete fromdepositorwhere account-number in (select account-numberfrom branch, accountwhere branch-city = ‘Needham’and branch.branch-name = account.branch-name) CMSC424, Spring 2005
Example Query • Delete the record of all accounts with balances below the average at the bank. delete from accountwhere balance < (select avg (balance)from account) Problem: as we delete tuples from deposit, the average balance changes Solution used in SQL: • First, compute avg balance and find all tuples to delete • Next, delete all tuples found above (without recomputing avg or retesting the tuples) CMSC424, Spring 2005