1.42k likes | 1.61k Views
CS411 Database Systems. 06: SQL π ≤ ≠ = ρ ≥ ⋈ ⋉⋊. Officially, SQL = S tructured Q uery L anguage (but that’s not what it really stands for). Many standards: SQL92, SQL2, SQL3, SQL99 Vendors support various subsets of these, but all of what we’ll be talking about.
E N D
CS411Database Systems 06: SQL π ≤ ≠ = ρ≥⋈ ⋉⋊
Officially, SQL = Structured Query Language (but that’s not what it really stands for) Many standards: SQL92, SQL2, SQL3, SQL99 Vendors support various subsets of these, but all of what we’ll be talking about.
What is special about SQL? You describe what you want, and the job of the DBMS is to figure out how to compute what you want efficiently. (at least in theory)
The basic form of a SQL query is select-from-where SELECT desired attributes FROM one or more tables WHERE condition on the rows of the tables Project out everything not in the final answer Every table you want to join, union, or intersect together All the join and selection conditions
Schema for our running example Beers(name, manf) Bars(name, addr, license) Drinkers(name, addr, phone) Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar)
What beers are made by Anheuser-Busch? Beers(name, manf) SELECTnameFROM BeersWHEREmanf = ‘Anheuser-Busch’; In relational algebra with bags:Beers[manf = “Anheuser-Busch”]
These simple queries can be translated to relational algebra (with bags) SELECTA1, …, An FROMR WHEREcondition R[condition][A1, …, An] π[A1, …, An][condition] R • Begin with the relation in the FROM clause. • Apply the selection indicated by the WHERE clause. • Apply the projection indicated by the SELECT clause.
Here is a way to think about how the query might be implemented • Imagine a tuple variable ranging over each tuple of the relation mentioned in FROM. • Check if the “current” tuple satisfies the WHERE clause. • If so, output the attributes/expressions of the SELECT clause using the components of this tuple.
Put * in the SELECT clause if you don’t want to project out any attributes Beers(name, manf) SELECT*FROM BeersWHEREmanf = ‘Anheuser-Busch’;
Find all US companies whose stock is > 50 Company(sticker, name, country, stockPrice) SELECT *FROM CompanyWHERE country=“USA” ANDstockPrice > 50
You can rename the attributes in the result, using “as <new name>” Beers(name, manf) SELECTname AS beer, manfFROM BeersWHEREmanf = ‘Anheuser-Busch’;
You can use math in the SELECT clause Sells(bar, beer, price) Case-insensitive, except inside quoted strings SELECTbar, bEeR, price*120 AS priceInYenFROM Sells;
You can create a new column and give it a constant value, in the SELECT clause SELECTdrinker, ‘Likes Bud’ AS WhoLikesBudFROM LikesWHEREbeer = ‘Bud’;
Find the price Joe’s Bar charges for Bud. Sells(bar, beer, price) SELECT priceFROM SellsWHERE bar = ‘Joe’’s Bar’ AND beer = ‘Bud’; Two single quotes inside a string = one apostrophe
What you can use in the WHERE clause conditions: constants of any supported type attribute names of the relation(s) used in the FROM arithmetic operations: stockprice*2 operations on strings (e.g., “||” for concatenation) comparisonoperators: =, <>, <, >, <=, >= lexicographic order on strings (<) string pattern matching: s LIKE p special functions for comparing dates and times and combinations of the above using AND, OR, NOT, and parentheses
attr LIKEpattern does pattern matching on strings pattern is a quoted string that may contain two special symbols: phone LIKE ‘%555-_ _ _ _’ address LIKE “%Mountain%”
Find the drinkers with phone prefix 555 Drinkers(name, addr, phone) SELECTnameFROM DrinkersWHEREphone LIKE ‘%555-____’;
Find all US companies whose address contains “Mountain” Company(sticker, name, address, country, stockPrice) SELECT *FROM CompanyWHERE country=“USA” ANDaddress LIKE “%Mountain%”;
What if an attribute value is unknown, or the attribute is inapplicable (e.g., my daughter’s spouse)? SELECTbarFROMSellsWHEREprice < 2.00 OR price >= 2.00; Why???
Conditions involving NULL evaluate to unknown, rather than true or false A tuple only goes in the answer if its truth value for the WHERE clause is true.
The “law of the excluded middle” doesn’t hold in this 3-valued logic unknown SELECTbarFROMSellsWHEREprice < 2.00 OR price >= 2.00; unknown unknown
SQL code writers spend a lot of space dealing with NULL values Can test for NULL explicitly: xIS NULL xIS NOT NULL SELECT * FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL The answer includes all Persons!
If you need to join several relations, you can list them all in the FROM clause List the bars that serve a beer that Marianne likes. Likes(drinker, beer) Sells(bar, beer, price) SELECTbarFROM Sells, LikesWHEREdrinker = ‘Marianne’ AND Likes.beer = Drinker.beer; This is how we disambiguate attribute names. (Sells ⋈ Likes[drinker =“Marianne”])[bar]
Find the beers liked by at least one person who frequents Murphy’s Pub Likes(drinker, beer) Frequents(drinker, bar) SELECTbeer AS beersWorthKeepingFROM Likes, FrequentsWHEREbar = ‘Murphy’’s Pub’ AND Frequents.drinker = Likes.drinker; (Likes ⋈ Frequents[bar = “Murphy’s Pub”])[beer]
Find names of people living in Champaign who bought snow shovels, and the names of the stores where they bought them Purchase (buyer, seller, store, product) Person(pname, phoneNumber, city) SELECTpname, storeFROM Person, PurchaseWHERE pname = buyer AND city = ‘Champaign’ AND product = ‘snow shovel’; (Person [city = “Champaign”] ⋈Pname = buyer Purchase[product=“snow shovel”])[pname, store]
You can also join three or more relations, just like in relational algebra Find names of people buying telephony products. Product (name, price, category, maker) Purchase (buyer, seller, store, product) Person (name, phoneNumber, city) SELECT Person.name FROM Person, Purchase, Product WHERE Person.name=Purchase.buyer AND Purchase.product=Product.name AND Product.category=“telephony”
What should be in the answer when the query involves a join? • Create the cartesian product of all the relations in the FROM clause. • Then remove all the tuples that don’t satisfy the selection condition in the WHERE clause. • Project the remaining tuples onto the list of attributes/expressions in the SELECT clause.
An algorithm for computing the answer • Imagine one tuple variable for each relation mentioned in FROM.These tuple-variables visit each combination of tuples, one from each relation. • Whenever the tuple-variables are pointing to tuples that satisfy the WHERE clause, send these tuples to the SELECT clause. 30
What if a query needs to use two copies of the same relation? Beers(name, manf) Find all pairs of beers by the same manufacturer. • Do not produce pairs like (Bud, Bud). • Produce pairs in alphabetic order, e.g. (Bud, Miller), not (Miller, Bud). SELECTb1.name, b2.name FROMBeers b1, Beers b2 WHERE b1.manf = b2.manf AND b1.name < b2.name Beers1(name1, manf1) = Beers. Then (almost): (Beers ⋈manf = manf1 & name ≠name1 Beers2])[name, name1]
Almost identical: find pairs of companies making products in the same category Product (name, price, category, maker) SELECT product1.maker, product2.maker FROM Product AS product1, Product AS product2 WHERE product1.category=product2.category AND product1.maker > product2.maker
Meaning (Semantics) of SQL Queries SELECT A1, …, Ak FROM R1 AS x1, …, Rn AS xn WHERE conditions 1. Nested loops: Answer = {} forx1 in R1 do ….. forxninRndo ifconditions then Answer = Answer {(A1,…,Ak)} return Answer
Meaning (Semantics) of SQL Queries SELECT A1, …, Ak FROM R1 AS x1, …, Rn AS xn WHERE conditions 2. Parallel assignment Doesn’t impose any order! Answer = {} for all assignments x1 in R1, …, xninRndo ifconditionsthen Answer = Answer {(A1,…,Ak)} return Answer
Meaning (Semantics) of SQL Queries SELECT A1, …, Ak FROM R1 AS x1, …, Rn AS xn WHERE conditions 3. Translation to relational algebra: • A1,…,Ak ( sconditions (R1 … Rn)) Select-From-Where queries are relational algebra’s Select-Project-Join queries
Union, Intersection, and Difference ( query) UNION ( query ) ( query ) INTERSECT ( query ) ( query ) EXCEPT ( query ) List all bars and restaurants. (SELECT name FROMBars) UNION (SELECT name FROM Restaurants);
Because the result of a query is always a relation, you can query the result of a query List all restaurants that opened in 2000. SELECT name FROM (SELECT name, yearOpened FROM Restaurants) R WHERE R.yearOpened = 2000; subquery
From Sells(bar, beer, price), find the bars that serve Miller for the same price Joe charges for Bud. Two queries would surely work (if we save the intermediate results): • Find the price Joe charges for Bud. • Find the bars that serve Miller at that price.
If a subquery is guaranteed to produce one tuple, then the subquery can be used like an ordinary value SELECT bar FROM Sells WHERE beer = ‘Miller’ AND price = (SELECT price FROM Sells WHERE bar = ‘Joe’’s Bar’ AND beer = ‘Bud’); The price at which Joe sells Bud
But be careful when you use subqueries as values! • Usually the tuple has one attribute. • You’d better be sure only one tuple will be returned (e.g., keys guarantee it). • A run-time error occurs if there is no tuple or more than one tuple.
The IN and NOT IN operators allow you to test whether a value or tuple is in a relation Beers(name, manf) Likes(drinker, beer) Find the name and manufacturer of each beer that Fred likes. SELECT * FROM Beers WHERE name IN (SELECT beer FROM Likes WHERE drinker = ‘Fred’); The set of beers Fred likes
Notice the SQL “not equals” operator You can test whether a relation is empty, using EXISTS and NOT EXISTS SELECT name FROM Beers b1 WHERE NOT EXISTS( SELECT * FROM Beers WHERE manf = b1.manf AND name <> b1.name); Beers(name, manf)Find the beers that are the only one made by their manufacturer Notice scope rule: manf refers to closest nested FROM with a relation having that attribute. Set of beers with the same manf as b1, but not the same beer
Use ANY/ALL to check whether a condition is true at least once/always x = ANY( subquery) x >= ANY( subquery) x = ALL( subquery) x >= ALL( subquery) true if x is in the (single column) answer to subquery true if x is as big as one or more answers to subquery true if x is the only answer to subquery true if x is as big as all the answers to subquery
How many of these can you do so far? Who has been diagnosed with a disease: Diag (Pat, Dis) Who has had which tests: Outcome (Pat, Test, Result) Which tests diagnose which diseases: ToDiag (Dis, Test) • Who has which disease? • Who has a disease they have been tested for? • Who has a disease they tested positively for? • Who has a disease that they tested both positively & negatively for? • Who tested both positively and negatively for a disease, whether or not they have it? • Who tested both positively and negatively for the same test? • What testable disease does no one have? • What disease does more than one person have? • What diseases does everyone have?
Find the beer(s) sold for the highest price Sells(bar, beer, price SELECT beer FROM Sells WHERE price >= ALL( SELECT price FROM Sells); price from the outer Sells must not be less than any price.
Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar) Find the drinkers and beers such that: • The drinker likes the beer, and • The drinker frequents at least one bar that sells the beer.
The drinker frequents a bar that sells the beer. (SELECT * FROM Likes) INTERSECT (SELECT drinker, beer FROM Sells, Frequents WHERE Frequents.bar = Sells.bar );
Relational Algebra: Operations on Bags (and why we care) • Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f} • add the number of occurrences • Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • subtract the number of occurrences • Intersection: {a,b,b,b,c,c} {b,b,c,c,c,c,d} = {b,b,c,c} • minimum of the two numbers of occurrences • Selection: preserve the number of occurrences • Projection: preserve the number of occurrences (no duplicate elimination) • Cartesian product, join: no duplicate elimination Read Section 5.3 of the book for more detail
The SELECT-FROM-WHERE statement uses bag (not set) semantics That means that while processing the query, the DBMS never gets rid of duplicates… except…