390 likes | 521 Views
C20.0046: Database Management Systems Lecture #10. Matthew P. Johnson Stern School of Business, NYU Spring, 2004. Agenda. Last time: R.A., Bags This time: Finish R.A. Begin SQL Project Part 2 due now Something else assigned soon. Relational Algebra Review. Five basic operators:
E N D
C20.0046: Database Management SystemsLecture #10 Matthew P. Johnson Stern School of Business, NYU Spring, 2004 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Agenda • Last time: R.A., Bags • This time: • Finish R.A. • Begin SQL • Project Part 2 due now • Something else assigned soon M.P. Johnson, DBMS, Stern/NYU, Sp2004
Relational Algebra Review • Five basic operators: • Union: Intersection: Difference: - • Selection: s • Projection: P • Cartesian Product: • Extended operators: • Joins (equijoin, theta join, semijoin, outerjoin) • Renaming: r • Extended projection P • Sorting t • Grouping-and-aggregation op g M.P. Johnson, DBMS, Stern/NYU, Sp2004
Sorting • So far, everything’s an unordered bag • But sometimes order is nice • Sort op tL(R) produces a list, not a bag • No operators operate on lists • if sort called, generally last op • Subscript L = a1,a2,… in op is the list of attributes to sort on • Rows sorted by attributes • Rows with same a1 value sorted by a2, etc. M.P. Johnson, DBMS, Stern/NYU, Sp2004
Outerjoin • Like L ⋈ R except that dangling tuples are included, padded with nulls • Left outerjoin: dangling tuples from L are include • Nulls appear “on the right” • Right outerjoin: dangling tuples from R are included • Nulls appear “on the left” M.P. Johnson, DBMS, Stern/NYU, Sp2004
Constraints on Relations (5.5) • Ref. integ., FDs, other constraints are expressible in RA • Two basic tools: • R = Æ • Assert R is empty • R Í S • Assert R is a subset of S • NB: They’re equivalent • R Í S iff R – S = Æ • R = Æ iff R Í S-S (for arbitrary S) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Expressing referential integrity • Relations: • Reps(ssn, name, etc.) • Clients(ssn, name, rssn) • Suppose we require: each client gets a sales rep • a client’s row contains an rssn must have a rep with that ssn • How to require this in RA? • Every Clients.rssn must be in the set of Reps.ssns • Prssn(Clients) ÍPssn(Reps) • Or: Prssn(Clients) – Pssn(Reps) = Æ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Expressing referential integrity • Also works for multiple attributes • Relations: • StarsIn(SName,Title,Year) • Movies(Title, Year, Length, Studio) • Require: every movie referenced by StarsIn to exist • Write: PTitle,Year(StarsIn) ÍPTitle,Year(Movies) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Expressing FDs • Relation: • Employees(name,ssn,address,gender, etc.) • Has FD: ssnaddress • What does the FD mean? • No matter how we choose two rows, if they agree on ssn, then they agree on address • So, strategy: choose pairs all possible ways; • Select pairs that agree on ssn but not address; • Check how many we get • First, rename one copy to E1 and one to E2 • rE1(Employees), rE2(Employees) • Then: sE1.ssn=E2.ssn AND E1.address != E2.address(E1 x E2) = Æ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Expressing domain constraints • Constraint on legal values for attributes • Employees(name,ssn,address,gender etc.) • Gender should be M/F • Select bad ones and check count • sgender!=‘F’ AND gender!=‘M’(Employees) = Æ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Expressing other constraints • Relations: • MovieExecs(name, address, ssn, netWorth) • Studios(name, address, presSsn) • Constraint: Studio presidents must be worth at leat $10,000,000 • First, theta-join presSsn to ssn, then select ones w/ < $10M, then check count: • snetWorth<10000000(Studio ⋈presSsn=ssn MEs) = Æ • Or: Select MEs w/ >= $10M, then check that they contain all studio presidents: • PpresSsn(Studios) Í Pssn(snetWorth<10000000(MEs)) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Recap: You are here • First part of course is done: conceptual foundations • You now know: • E/R Model • Relational Model • Relational Algebra • You now know how to: • Capture part of world as an E/R model • Convert E/R models to relational models • Convert relational models to good (normal) forms • Express queries in relational algebra • Next: • Create, update, query SQL tables • Write SQL/DB-connected applications M.P. Johnson, DBMS, Stern/NYU, Sp2004
Next topic: SQL (6.1) • Standard language for querying and manipulating data Structured Query Language • Many standards: ANSI SQL, SQL92/SQL2, SQL3/SQL99 • Vendors support various subsets/extensions • We’ll do SQL99/Oracle • Basic form (many more bells and whistles in addition): SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Data Types in SQL • Characters: • CHAR(20) -- fixed length • VARCHAR(40) -- variable length • Numbers: • BIGINT, INT, SMALLINT, TINYINT • REAL, FLOAT -- differ in precision • MONEY • Times and dates: • DATE • DATETIME -- SQL Server M.P. Johnson, DBMS, Stern/NYU, Sp2004
Table name “Tables” Attribute names Product Tuples or rows M.P. Johnson, DBMS, Stern/NYU, Sp2004
Simple SQL Query Product SELECT *FROM ProductWHERE category=‘Gadgets’ “selection” M.P. Johnson, DBMS, Stern/NYU, Sp2004
Simple SQL Query Product SELECT PName, Price, ManufacturerFROM ProductWHERE Price > 100 “selection” and “projection” M.P. Johnson, DBMS, Stern/NYU, Sp2004
A Notation for SQL Queries Input Schema Product(PName, Price, Category, Manfacturer) SELECT Name, Price, ManufacturerFROM ProductWHERE Price > 100 Answer(PName, Price, Manfacturer) Output Schema M.P. Johnson, DBMS, Stern/NYU, Sp2004
R.A. SQL • R.A. Projection P SQL SELECT • R.A. Selection s SQL WHERE • R.A. Join SQL FROM • Comma-separated list… • What goes in the WHERE clause: • x = y, x < y, x <= y, etc. • For number, they have the usual meanings • For CHAR and VARCHAR: lexicographic ordering • Expected conversion between CHAR and VARCHAR • For dates and times, what you expect M.P. Johnson, DBMS, Stern/NYU, Sp2004
R.A. SQL • Movies(Title,Year,Length,inColor,Studio,Prdcr#) • Q: How long was Star Wars (1977), in R.A.? • Q: In SQL? • Q: Which Fox movies are are at least 100 minutes long, in R.A.? • Q: In SQL? M.P. Johnson, DBMS, Stern/NYU, Sp2004
R.A. SQL • Reps(ssn, name, etc.) • Clients(ssn, name, rssn) • Q: Who are George’s clients, in R.A.? • Second answer from last time: • PClients.name(sReps.name=“George” and Reps.ssn=rssn(Reps x Clients)) • In SQL? M.P. Johnson, DBMS, Stern/NYU, Sp2004
The LIKE operator • s LIKE p: pattern matching on strings • p may contain two special symbols: • _ = any single character • % = zero or more chars • Product(Name, Price, Category, Manufacturer) • Find all products whose name contains ‘gizmo’: SELECT *FROM ProductsWHERE PName LIKE ‘%gizmo%’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
The LIKE operator • Q: What it want to search for values containing a ‘%’? • PName LIKE ‘%%%’ won’t work • Instead, must use escape chars • In C/C++/J, prepend ‘\’ • In SQL, prepend an arbitrary escape char: • PName LIKE ‘x%%x%’ ESCAPE ‘x’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Eliminating Duplicates SELECT category FROM Product Compare to: SELECTDISTINCT category FROM Product M.P. Johnson, DBMS, Stern/NYU, Sp2004
Ordering the Results • Ordering is ascending, unless you specify the DESC keyword per attribute. • Ordering is ascending, unless you specify the DESC keyword per attribute. SELECT pname, price, manufacturer FROM Product WHERE category=‘gizmo’ AND price > 50 ORDER BY price, pname SELECT pname, price, manufacturer FROM Product WHERE category=‘gizmo’ AND price > 50 ORDER BY price DESC, pname ASC M.P. Johnson, DBMS, Stern/NYU, Sp2004
Ordering the Results SELECT Category FROM Product ORDER BY PName ? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Ordering the Results SELECT DISTINCT category FROM Product ORDER BY category Compare to: ? SELECT DISTINCT category FROM Product ORDER BY PName M.P. Johnson, DBMS, Stern/NYU, Sp2004
Joins in SQL (6.2) • Connect two or more tables: Product Company What is the connection between them? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Joinbetween Productand Company Joins in SQL Product (pname, price, category, manufacturer) Company (cname, stockPrice, country) Find all products under $200 manufactured in Japan;return their names and prices. SELECT PName, PriceFROM Product, CompanyWHEREManufacturer=CName AND Country=‘Japan’ AND Price <= 200 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Joins in SQL Product Company SELECT PName, PriceFROM Product, CompanyWHEREManufacturer=CName AND Country=‘Japan’ AND Price <= 200 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Joins in SQL Product (pname, price, category, manufacturer) Company (cname, stockPrice, country) Find all countries that manufacture some product in the ‘Gadgets’ category. SELECT CountryFROM Product, CompanyWHEREManufacturer=CName AND Category=‘Gadgets’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Joins in SQL Product Company SELECT CountryFROM Product, CompanyWHEREManufacturer=CName AND Category=‘Gadgets’ What is the problem? What’s thesolution? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Joins Product (pname, price, category, manufacturer) Purchase (buyer, seller, store, product) Person(name, phone, city) Find names of Seattleites who bought Gadgets, and the names of the stores they bought such product from. SELECT DISTINCT name, storeFROM Person, Purchase, ProductWHEREpersname=buyer AND product = pname AND city=‘Seattle’ AND category=‘Gadgets’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Disambiguating Attributes • Sometimes two relations have the same attr:Person(pname, address, worksfor)Company(cname, address) Whichaddress ? SELECT DISTINCT pname, addressFROM Person, CompanyWHEREworksfor = cname SELECT DISTINCT Person.pname, Company.addressFROM Person, CompanyWHEREPerson.worksfor = Company.cname M.P. Johnson, DBMS, Stern/NYU, Sp2004
Tuple Variables Product (pname, price, category, manufacturer) Purchase (buyer, seller, store, product) Person(persname, phoneNumber, city) Find all stores that sold at least one product that the store‘BestBuy’ also sold: SELECT DISTINCTx.store FROM Purchase AS x, Purchase AS y WHERE x.product = y.product AND y.store = ‘BestBuy’ Answer (store) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Tuple Variables • Tuple variables introduced automatically: • Product ( name, price, category, manufacturer) • Becomes: • Doesn’t work when Product occurs more than once • In that case the user needs to define variables explicitly SELECTname FROMProduct WHEREprice > 100 SELECTProduct.name FROMProduct AS Product WHEREProduct.price > 100 M.P. Johnson, DBMS, Stern/NYU, Sp2004
SQL Query Semantics SELECTa1, a2, …, ak FROMR1 AS x1, R2 AS x2, …, Rn AS xn WHEREConditions 1. Nested loops: Answer = {} for x1 in R1 do for x2 in R2 do ….. for xn in Rn do if Conditions then Answer = Answer {(a1,…,ak)} return Answer M.P. Johnson, DBMS, Stern/NYU, Sp2004
SQL Query Semantics SELECTa1, a2, …, ak FROMR1 AS x1, R2 AS x2, …, Rn AS xn WHEREConditions 2. Parallel assignment Doesn’t impose any order! Answer = {} for all assignments x1 in R1, …, xn in Rn do if Conditions then Answer = Answer {(a1,…,ak)} return Answer M.P. Johnson, DBMS, Stern/NYU, Sp2004
First Unintuitive SQLism • SELECT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A • Looking for R (S T) • But what happens if T is empty? • See transcript of this in Oracle on sales M.P. Johnson, DBMS, Stern/NYU, Sp2004