700 likes | 938 Views
OCL2 Oracle 10 g : SQL & PL/SQL Session #5. Matthew P. Johnson CISDD, CUNY January, 2005. Live examples. Examples from sqlzoo.net Q: produce a list of employees and their bosses What if no boss? Or no subordinate? Joins on emp , emp man : Comma-based Inner Natural Cross
E N D
OCL2 Oracle 10g:SQL & PL/SQLSession #5 Matthew P. Johnson CISDD, CUNY January, 2005 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Live examples • Examples from sqlzoo.net • Q: produce a list of employees and their bosses • What if no boss? Or no subordinate? • Joins on emp, emp man: • Comma-based • Inner • Natural • Cross • Outer – left, right, full Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
More live examples • Inner joins require an ON clause • Like a where clause • Arbitrary boolean expression • If always true (1=1), reduces to cross join • New compar op: BETWEEN • a between 5 and 10 a >= 5 and a <= 10 • Q: produce a list of employees with their salary grades • emp, salgrade Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Null/logic review • TRUE AND UNKNOWN = ? • TRUE OR UNKNOWN = ? • UNKNOWN OR UNKNOWN = ? • X = NULL = ? Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
INTERSECT and EXCEPT If R, S have noduplicates, then canwrite withoutsubqueries(HOW?) (SELECT R.A, R.B FROM R)INTERSECT (SELECT S.A, S.B FROM S) SELECT R.A, R.B FROM RWHEREEXISTS(SELECT *FROM SWHERE R.A=S.A and R.B=S.B) (SELECT R.A, R.B FROM R)EXCEPT (SELECT S.A, S.B FROM S) SELECT R.A, R.B FROM RWHERENOTEXISTS(SELECT *FROM SWHERE R.A=S.A and R.B=S.B) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Agenda • More SQL • Grouping & aggregation • Modifications • Defining schemata • Views Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Grouping & Aggregation • In SQL: • aggregation operators in SELECT, • Grouping in GROUP BY clause • Recall aggregation operators: • sum, avg, min, max, count • strings, numbers, dates • Each applies to scalars • Count also applies to row: count(*) • Can DISTINCT inside aggregation op: count(DISTINCT x) • Grouping: group rows that agree on single value • Each group becomes one row in result Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Straight aggregation • In R.A. Psum(x)total(R) • In SQL: • Just put the aggregation op in SELECT • NB: aggreg. ops applied to each non-null val • count(x) counts the number of nun-null vals in field x • Use count(*) to count the number of rows SELECT SUM(x) AS total FROM R Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Straight Aggregation example • COUNT applies to duplicates, unless otherwise stated: • Better: • Can we say: SELECT Count(category)FROM Product WHERE year > 1995 same as Count(*), except excludes nulls SELECT Count(DISTINCT category) FROM Product WHERE year > 1995 SELECT category, Count(category) FROM Product WHERE year > 1995 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Straight Aggregation example • Purchase(product, date, price, quantity) • Q: Find total sales for the entire database: • Q: Find total sales of bagels: SELECT SUM(price * quantity) FROM Purchase SELECT SUM(price * quantity) FROM Purchase WHERE product = ‘bagel’ Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Straight grouping • Group rows together by field values • Produces one row for each group • I.e., by each (combin. of) grouped val(s) • Don’t select non-grouped fields • Reduces to DISTINCT selections: SELECT product FROM Purchase GROUP BY product SELECT DISTINCT product FROM Purchase Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Grouping & aggregation • Sometimes want to group and compute aggregations by group • Aggreg. op applied to rows in group, not all rows in table • Q: How many sales of each product? SELECT product, count(*) AS numSales FROM cia GROUP BY product Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Evaluation of G&A • Evaluation steps: • Compute the FROM-WHERE part as usual to obtain a table with all attributes in R1,…,Rn • Group by the attributes a1,…,ak • Compute the aggregates in C2 and keep only groups satisfying C2 • Compute aggregates in S and return the result SELECT S FROM R1,…,Rn WHERE C1 GROUP BY a1,…,ak HAVING C2 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
G & A for constructed relations • Can do the same thing for larger, non-atomic relations • Scenario: • Movie(title,year,producerSsn,length) • MovieExec(name,ssn,netWorth) • Q: How many mins. of film did each producer make? • What happens to non-producer movie-execs? SELECT name, sum(length) AS total FROM Movie, MovieExec WHERE producerSsn = ssn GROUP BY name Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Illustrated G&A example • Usually want aggregations on certain parts of the relation • Purchase(product, date, price, quantity) • Find total sales after 10/18 per product that sold > $20: SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > DATE ’10-18-2003’ GROUP BY product Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Illustrated G&A example Purchase Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Illustrated G&A example • First compute the FROM-WHERE clauses (date > DATE ’10-18-2003’) then GROUP BY product: Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Illustrated G&A example • Finally, aggregate: SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE d date > DATE ’10-18-2003’ GROUP BY product Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Illustrated G&A example • GROUP BY may be reduced to (maybe more complicated) subquery SELECT product, Sum(price*quantity) AS TotalSales FROM Purchase WHERE date > DATE ’10-18-2003’ GROUP BY product SELECT DISTINCT x.product, (SELECT Sum(y.price*y.quantity)FROM Purchase yWHERE x.product = y.product AND y.date > DATE ’10-18-2003’)AS TotalSales FROM Purchase x WHERE x.date > DATE ’10-18-2003’ Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Another example For every product, what is the total sales and max quantity sold? SELECT product, Sum(price * quantity) AS SumSales Max(quantity) AS MaxQuantity FROM Purchase GROUP BY product Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Live group by e.g. • Q: How many people (immediate subordinates) does each manager manage? • Join emp with self… • If want non-managers, do outer join… Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
HAVING clauses • Sometimes we want to limit which tuples may be grouped • Q: How many mins. of film did each rich producer (i.e., netWorth > 10000000) make? • Q: Is HAVING necessary here? • A: No, could just add rich req. to SELECT SELECT name, sum(length) AS total FROM Movie, MovieExec WHERE producerSsn = ssn GROUP BY name HAVING netWorth > 10000000 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
HAVING clauses • Sometimes we want to limit which tuples may be grouped, based on properties of the group • Q: How many mins. of film did each old producer (i.e., started before 1930) make? SELECT name, sum(length) AS total FROM Movie, MovieExec WHERE producerSsn = ssn GROUP BY name HAVING min(year) < 1930 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
General form of G&A • NB: “Any attribute of relations in the FROM clause may be aggregated in the HAVING clause, but only those attributes that are in the GROUP BY list may appear unaggregated in the HAVING clause (the same rule as for the SELECT clause)” (Ullman, p283). S = may contain attributes As and/or any aggregates but no other attributes C1 = condition on the attributes in R1,…,Rn C2 = condition on aggregations or attributes from As SELECT S FROM R1,…,Rn WHERE C1 GROUP BY As HAVING C2 Why? Why? Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Live example • Q: produce list of bosses and underling-counts, for bosses with >1 underling • Just add HAVING clause… Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
More A&G Examples Web pages, and their authors: Author(login,name) Document(url, title) Wrote(login,url) Mentions(url,word) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Web page examples • Find all authors who wrote at least 10 documents Author(login,name), Wrote(login,url) • Attempt 1: with nested queries Bad! SELECTDISTINCT Author.name FROM Author WHERE count(SELECT Wrote.urlFROM WroteWHERE Author.login=Wrote.login) > 10 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Web page examples • Find all authors who wrote at least 10 documents: • Attempt 2: Simplify with GROUP BY SELECT Author.name FROM Author, Wrote WHERE Author.login=Wrote.login GROUP BY Author.name HAVING count(wrote.url) > 10 Good! No need for DISTINCT: get for free from GROUP BY Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Web page examples • Find all authors who have a vocabulary over 10000 words: Author(login,name), Wrote(login,url), Mentions(url, world) SELECT Author.name FROM Author, Wrote, Mentions WHERE Author.login=Wrote.login AND Wrote.url=Mentions.url GROUP BY Author.name HAVING count(distinct Mentions.word) > 10000 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Summary: SQL queries • Only SELECT, FROM required • Can’t have HAVING without GROUP BY • Can have GROUP BY without HAVING • Any clauses must appear in this order: SELECT LFROM Rs WHERE s GROUP BY L2 HAVING s2 ORDER BY L3 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
New topic: Modifications • Three kinds of modifications • Insertions • Deletions • Updates • Sometimes “update” used as a synonym for “modification” Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Insertions General form: INSERT INTO R(A1,…., An) VALUES (v1,…., vn) Example: Insert a new purchase to the database: INSERT INTO Purchase(buyer, seller, product, store) VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’, ‘The Sharper Image’) Missing attribute NULL (or other default value) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Insertions • If we’re sure we have all values in the right order, can just say: • Only do this if you’re sure of order in which the table fields were defined INSERT INTO R VALUES (v1,…., vn) INSERT INTO Purchase VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’, ‘The Sharper Image’) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Insertions • Can insert the result of a query; Scenario: • Product(name, etc.) • Purchase(buyerssn, prodName, etc.) • Maybe some purchases name missing products • add those to the Product table • Subquery replaces VALUES INSERT INTO R(As) (query) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Insertion example • Premise: data corruption lose some Product data • every product referred to in Purchase should exist in Product, but some are missing Product(name, listPrice, category) Purchase(prodName, buyerName, price) Product Purchase Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Insertion example INSERT INTO Product(name) SELECT prodName FROM Purchase WHERE prodName NOT IN (SELECT name FROM Product) Purchase Product Product’ Q: Or do we get: A: Depends on implementation! Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Deletions • General form: • Example: • Q: How do you delete just one row with SQL simpliciter? • A: You can’t! • Although Oracle has the ROWID pseudo-field… • As usual, WHERE can contain subqueries DELETE FROM Table WHERE condition DELETE FROM PURCHASE WHERE seller = ‘Joe’ AND product = ‘Brooklyn Bridge’ Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Updates • General form: • Example: • As usual, WHERE can contain subqueries UPDATE Product SET field1 = value1, field2 = value2 WHERE condition UPDATE Product SET price = price/2 WHERE Product.name IN (SELECT product FROM Purchase WHERE Date = DATE‘Oct, 25, 1999’); Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
New topic: Defining schemata • So far, have done queries and data manipulation • Now doing data definition • Recall data types: • INT or INTEGER (variant: SHORTINT) • FLOAT or REAL: floating-point numbers • DOUBLE PRECISION: • DECIMAL(n,d): • E.g. decimal(5,2): five decimal digits, with the decimal point two positions from the right: e.g. 123.45 • DATE and TIME • Character strings • Fixed length: CHAR(n) • Variable length: VARCHAR(n) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Creating tables • Form: • Example: CREATE TABLE Table-name ( field field-type, field field-type, … field field-type ) No comma! CREATE TABLE People ( name VARCHAR(30), ssn CHAR(9), age SHORTINT, city VARCHAR(30), gender BIT(1), Birthdate DATE ) Not supported in Oracle Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Default Values • Specify defaults when creating table: • The default default: NULL CREATE TABLE People ( name VARCHAR(30), ssn CHAR(9), age SHORTINT DEFAULT 100, city VARCHAR(30) DEFAULT ‘New York’, gender BIT(1), Birthdate DATE DEFAULT DATE ‘0000-00’00’, ) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Deleting and modifying schemata • Delete data, indices, schema: • Delete data and indices: • Either way, use EXTREME CAUTION! • Add or delete attributes: DROP TABLE Person TRUNCATE TABLE Person Q: What’s put in the new fields? ALTER TABLE Person ADD phone CHAR(12); ALTER TABLE Person DROP age; Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
New topic: Indices • Veryimportant speeding up query processing • Index on field(s) = data structure that makes searches/comparisons on those fields fast • Suppose we have a relation • Person (name, age, city) • Sequential scan of the whole Person file may take a very long time SELECT * FROM Person WHERE name = ‘Waksal’ Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Creating Indices • Syntax: • Here: • No searching by name is much faster • How much faster? • Log-time, say • Base-what? Doesn’t matter, but say 2 • If all New Yorkers, #comparisons: 8000000 log2(8000000) ~= 23 • (i.e., 223 ~= 8000000) CREATE INDEX index-name ON R(field(s)) CREATE INDEX nameIndex ON Person(name) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
How do indices work? • What the data structure? • Different possibilities • 1st intuition: index on field f is an ordered list of all values in the table’s f field • each item has address (“rowid”) of its row • Where do we get the ordered list? • 2nd intuition: put all f values in a BST • searching BST take log time (why?) • DBMSs actually use a variant: B+Tree • See Ullman’s book or data structures texts… Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Creating Indices • Indexes can be useful in range queries too: CREATE INDEX ageIndex ON Person (age) SELECT * FROM Person WHERE age > 25 Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Using indices • Indices can be created on multiple attributes: • Helps in: • And in: • But not in: CREATE INDEX doubleindex ON Person (age, city) SELECT * FROM Person WHERE age = 55 AND city = ‘Seattle’ Idea: our sorted list is sorted on age;city, not city;age SELECT * FROM Person WHERE age = 55 Q: In Movie tbl, should index be on year;title or title;year? SELECT * FROM Person WHERE city = ‘Seattle’ Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
The Index Selection Problem • Big Q: Why not just indexes all the fields? • how does the list/B+Tree stay up to date? • We are given a workload: a set of SQL queries and their frequencies • Q is: What indices should we build to speed up the workload? • A: • Attributes in FROM/WHERE clauses favor an index • Attributes in INSERT/UPDATE clauses discourage an index • In Oracle: your primary key fields get indexed automatically (why?) Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
New topic: Views • Stored relations physically exist and persist • Views are relations that don’t • in some texts, “table” = stored relation = “base table” • Basically names/references given to queries • maybe a relevant subset of a table • Employee(ssn, name, department, project, salary) • Payroll has access to Employee, others only to Developers CREATE VIEW Developers AS SELECT name, project FROM Employee WHERE department = “Development” Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
A Different View • Person(name, city) • Purchase(buyer, seller, product, store) • Product(name, maker, category) • We have a new virtual table: Seattle-view(buyer, seller, product, store) CREATE VIEW Seattle-view AS SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.name = Purchase.buyer Matthew P. Johnson, OCL2, CISDD CUNY, January 2005