950 likes | 1.12k Views
Lecture 3 : Introduction to SQL. September 29, 2014. Announcements. There are good videos. Watch the SQL video. There is an activity next class. SQL Challenge is Due 10/ 6. SQL Motivation. Dark times 5 years ago . Are databases dead? E veryone sells SQL Pig, Hive, Impala
E N D
Lecture 3:Introduction to SQL September 29, 2014
Announcements • There are good videos. • Watch the SQL video. • There is an activity next class. • SQL Challenge is Due 10/6
SQL Motivation • Dark times 5 years ago. • Are databases dead? • Everyone sells SQL • Pig, Hive, Impala • “Not-Yet-SQL?”
SQL Introduction Standard language for querying and manipulating data Structured Query Language • Many standards out there: • ANSI SQL, SQL92 (a.k.a. SQL2), SQL99 (a.k.a. SQL3), …. • Vendors support various subsets A very high-level programming language. This works because it is optimized well! NB: Probably the world’s most successful parallel programming language (multicore?)
SQL • Data Definition Language (DDL) • Create/alter/delete tables and their attributes • Data Manipulation Language (DML) • Insert/delete/modify tuples in tables (last time) • Query one or more tables – discussed next !
Table name Tables in SQL Attribute names Product Tuples or rows
Tables Explained • The schema of a table is the table name and its attributes: Product(PName, Price, Category, Manfacturer) • A key is an attribute whose values are unique;we underline a key Product(PName, Price, Category, Manfacturer)
Data Types in SQL • Atomic types: • Characters: CHAR(20), VARCHAR(50) • Numbers: INT, BIGINT, SMALLINT, FLOAT • Others: MONEY, DATETIME, … • Every attribute must have an atomic type • Hence tables are flat • Why ?
Tables Explained • A tuple = a record • Restriction: all attributes are of atomic type • A table = a (multi)-set of tuples • Like a list… • …but it is unordered: no first(), no next(), no last().
Outline • Datain SQL (Review) • Simple Queries in SQL • Queries with more than one relation
SQL Query Basic form: (there are many many more bells and whistles) SELECT <attributes> FROM <one or more relations> WHERE <conditions> Call this: An SFW query.
Simple SQL Query Product SELECT *FROM ProductWHERE category=‘Gadgets’ “selection”
Simple SQL Query Product SELECTPname, Price,ManufacturerFROM ProductWHERE category=‘Gadgets’ “selection” “projection”
Notation Input Schema Product(PName, Price, Category, Manfacturer) SELECTPName, Price, ManufacturerFROM ProductWHERE Price > 100 Answer(PName, Price, Manfacturer) Output Schema
Details • Case insensitive: • Same: SELECT Select select • Same: Product product • Different: ‘Seattle’ ‘seattle’ • Constants: • ‘abc’ - yes • “abc” - no
The LIKE operator • s LIKE p: pattern matching on strings • p may contain two special symbols: • % = any sequence of characters • _ = any single character SELECT *FROM ProductsWHERE PName LIKE ‘%gizmo%’
Eliminating Duplicates SELECTDISTINCT category FROM Product Compare to: SELECT category FROM Product
Ordering the Results SELECTpname, price, manufacturer FROM Product WHERE category=‘gizmo’ AND price > 50 ORDER BY price, pname Ties are broken by the second attribute on the ORDER BY list, etc. Ordering is ascending, unless you specify the DESC keyword.
? SELECTDISTINCT category FROM Product ORDER BY category ? SELECT Category FROM Product ORDER BYPName ? SELECTDISTINCT category FROM Product ORDER BYPName
Keys and Foreign Keys Company Key Foreignkey Product
Joinbetween Productand Company Joins Product (pname, price, category, manufacturer) Company (cname, stockPrice, country) Find all products under $200 manufactured in Japan;return their names and prices. SELECTPName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= 200
Joins Product Company SELECTPName, PriceFROM Product, CompanyWHERE Manuf=CName AND Country=‘Japan’ AND Price <= 200
More Joins Product (pname, price, category, manufacturer) Company (cname, stockPrice, country) Find all Chinese companies that manufacture products both in the ‘electronic’ and ‘toy’ categories SELECTcnameFROMWHERE
A Subtlety about Joins Product (pname, price, category, manufacturer) Company (cname, stockPrice, country) Find all countries that manufacture some product in the ‘Gadgets’ category. SELECT CountryFROM Product, CompanyWHERE Manufacturer=CName AND Category=‘Gadgets’
A subtlety about Joins Product Company SELECT CountryFROM Product, CompanyWHERE Manufacturer=CName AND Category=‘Gadgets’ What is the problem ? What’s the solution ?
SELECTDISTINCTPerson.pname, Company.addressFROM Person, CompanyWHERE Person.worksfor = Company.cname SELECTDISTINCTx.pname, y.addressFROM Person AS x, Company AS yWHERE x.worksfor = y.cname Tuple Variables Person(pname, address, worksfor)Company(cname, address) Whichaddress ? SELECTDISTINCTpname, addressFROM Person, CompanyWHERE worksfor = cname
SELECT a1, a2, …, ak FROM R1 AS x1, R2 AS x2, …, Rn AS xn WHERE Conditions Meaning (Semantics) of SQL Queries Almost never the fastest way to do it! Answer = {} for x1in R1do for x2in R2do ….. forxninRndo if Conditions then Answer = Answer {(a1,…,ak)} return Answer WARNING: This is a multisetunion
An example of the semantics SELECTR.A FROM R, S WHERE R.A = S.B Cross Product Apply selections Where would DISTINCT be applied?
Computes R Ç (S È T) But what if S = f ? An Unintuitive Query SELECTDISTINCT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A What does it compute ?
Outline • Reprise: ORDER BY & Semantics • Set Operators and Nested Queries • Fancy SQL • Aggregation • NULL and Outer Joins • Host Languages & Cursors
Ordering SELET Name FROM Product ORDER BY Price SQL-89 says“This makes no sense!” Formally, the ordering should only be applied on the values returned (Remember the big for loop?) Intuitively, clear what it means: “Give me the products in descending order…” … still a little weird, products listed multiple times… Some RDBMSs allow you to do this.
Ordering again… SELET DISTINCT Name FROM Product ORDER BY Price SQL-89 said “This definitely makes no sense!” Intuitively, clear what it means: ??? Some products allow you to do this. How?
Abstract Union SELECTDISTINCT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A SELECTR.A FROM R, S WHERER.A=S.A UNION ALL SELECT R.A FROM R, T WHERE R.A=T.A SELECTR.A FROM R, S WHERER.A=S.A UNION SELECT R.A FROM R, T WHERE R.A=T.A Why aren’t there duplicates? What if we want duplicates?
Union Company(name, hq_city) Product(pname, maker, factory_loc) Write “Company names who make gizmos in US or China”
Intersect The SQL key word is now intersect… “The headquarter location of companies who make gizmos in US AND China”
Intersect: subtle problem… Company(name, hq_city) Product(pname, maker, factory_loc) “Company names who make gizmos in US AND China” SELECThq_city FROM Company, Product WHERE maker = name and factor_loc = ‘US’ INTERSECT SELECThq_city FROM Company, Product WHERE maker = name and factor_loc = ‘China’ What if two companies HQ in US: BUT one has loc in China (but not US) and vice versa? What’s wrong?
Other Set operators • EXCEPT (set difference) • Also ALL variants • INTERSECT “min # of duplicates” • EXCEPT ALL “multiset subtract”
Subqueries Returning Relations Company(name, city)Product(pname, maker)Purchase(id, product, buyer) Return cities where one can find companies that manufacture products bought by Joe Blow SELECTCompany.city FROM Company WHERECompany.nameIN (SELECTProduct.maker FROM Purchase , Product WHEREProduct.pname=Purchase.product AND Purchase .buyer = ‘Joe Blow‘);
Subqueries Returning Relations Is it equivalent to this ? SELECTCompany.city FROM Company, Product, Purchase WHERECompany.name= Product.maker AND Product.pname = Purchase.product AND Purchase.buyer = ‘Joe Blow’ Beware of duplicates!
Removing Duplicates SELECTDISTINCTCompany.city FROM Company WHERECompany.nameIN (SELECTProduct.maker FROM Purchase , Product WHEREProduct.pname=Purchase.product AND Purchase .buyer = ‘Joe Blow‘); SELECTDISTINCTCompany.city FROM Company, Product, Purchase WHERECompany.name= Product.maker AND Product.pname = Purchase.product AND Purchase.buyer = ‘Joe Blow’ Now they are equivalent
Subqueries Returning Relations You can also use: s > ALL R s > ANY R EXISTS R Product ( pname, price, category, maker) Find products that are more expensive than all those produced By “Gizmo-Works” SELECT name FROM Product WHERE price > ALL (SELECT price FROM Purchase WHERE maker=‘Gizmo-Works’)
Question for Database Fansand their Friends • Can we express this query as a single SELECT-FROM-WHERE query, without subqueries ? • Hint: show that all SFW queries are monotone (figure out what this means). A query with ALL is not monotone
Correlated Queries Movie (title, year, director, length) Find movies whose title appears more than once. correlation SELECTDISTINCT title FROM Movie AS x WHERE year <> ANY (SELECT year FROM Movie WHERE title = x.title); Note (1) scope of variables (2) this can still be expressed as single SFW
Complex Correlated Query Product ( pname, price, category, maker, year) • Find products (and their manufacturers) that are more expensive than all products made by the same manufacturer before 1972 Very powerful ! Also much harder to optimize. SELECT DISTINCT pname, maker FROM Product AS x WHERE price > ALL (SELECT price FROM Product AS y WHERE x.maker = y.maker AND y.year < 1972);
Exercises • If you have difficultly with this material here is the way to fix it: • postgreSQL (free on the web/apt-get). Others on the website. • Play with examples on EdX. • Run and play with these examples
Basic SQL Summary • SQL provides a high-level declarative language for manipulating data (DML) • The workhorse is the SFW block • Powerful, nested queries also allowed.