790 likes | 947 Views
Database Management Systems. Chapter 5 Advanced Queries. Tables. Organization. Harder Questions Subqueries Not In, LEFT JOIN UNION, Multiple JOIN columns, Recursive JOIN Other SQL Commands DDL: Data Definition Language DML: Data Manipulation Language OLAP Microsoft SQL Server Oracle
E N D
Database Management Systems Chapter 5 Advanced Queries
Organization • Harder Questions • Subqueries • Not In, LEFT JOIN • UNION, Multiple JOIN columns, Recursive JOIN • Other SQL Commands • DDL: Data Definition Language • DML: Data Manipulation Language • OLAP • Microsoft SQL Server • Oracle • Microsoft Access Crosstab
How many cats are “in-stock” on 10/1/01? Which cats sold for more than the average price? Which animals sold for more than the average price of animals in their category? Which animals have not been sold? Which customers (who bought something at least once) did not buy anything between 11/1/01 and 12/31/01? Which customers who bought Dogs also bought products for Cats (at any time)? Harder Questions
Which cats sold for more than the average sale price of cats? Assume we know the average price is $170. Usually we need to compute it first. Sub-query for Calculation SELECT SaleAnimal.AnimalID, Animal.Category, SaleAnimal.SalePrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE ((Animal.Category="Cat") AND (SaleAnimal.SalePrice>170)); SELECT SaleAnimal.AnimalID, Animal.Category, SaleAnimal.SalePrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE ((Animal.Category="Cat") AND (SaleAnimal.SalePrice> ( SELECT AVG(SalePrice) FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (Animal.Category=“Cat”) ) ) );
List all customers (Name) who purchased one of the following items: 1, 2, 30, 32, 33. Could use Or, but harder to write. In ( . . .) matches any in the list. Query04_13 Query Sets (IN) SELECT Customer.LastName, Customer.FirstName, SaleItem.ItemID FROM (Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID) INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID WHERE (SaleItem.ItemID In (1,2,30,32,33)) ORDER BY Customer.LastName, Customer.FirstName;
List all customers who bought items for cats. Using IN with a Sub-query SELECT Customer.LastName, Customer.FirstName, SaleItem.ItemID FROM (Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID) INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID WHERE (SaleItem.ItemID In (SELECT ItemID FROM Merchandise WHERE Category="Cat") );
List all of the customers who bought something in March and who bought something in May. (Two tests on the same data!) LastName First Adkins Inga McCain Sam Grimes Earl Query04_14 SubQuery (IN: Look up a Set) SELECT Customer.LastName, Customer.FirstName FROM Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID WHERE ((Month([SaleDate])=3)) And Customer.CustomerID In (SELECT CustomerID FROM Sale WHERE (Month([SaleDate])=5) );
Any: value is compared to each item in the list. If it is True for any of the items, the statement is evaluated to True. All: value is compared to each item in the list. If it is True for every item in the list, the statement is evaluated to True (much more restrictive than any. Query04_15 SubQuery (ANY, ALL) SELECT DISTINCTROW Animal.AnimalID, Name, SalePrice, ListPrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (((SalePrice) > Any (SELECT 0.80*ListPrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE Category = "Cat")) AND ((Category)="Cat"));
Which animals have not been sold? Start with list of all animals. Subtract out list of those who were sold. AnimalID Name Category 12 Leisha Dog 19 Gene Dog 25 Vivian Dog 34 Rhonda Dog 88 Brandy Dog 181 Fish Query04_16 SubQuery: NOT IN (Subtract) SELECT Animal.AnimalID, Animal.Name, Animal.Category FROM Animal WHERE (Animal.AnimalID Not In (SELECT AnimalID From SaleAnimal));
SubQuery: NOT IN (Data) Animal SaleAnimal ID Name Category Breed 2 Fish Angel 4 Gary Dog Dalmation 5 Fish Shark 6 Rosie Cat Oriental Shorthair 7 Eugene Cat Bombay 8 Miranda Dog Norfolk Terrier 9 Fish Guppy 10 Sherri Dog Siberian Huskie 11 Susan Dog Dalmation 12 Leisha Dog Rottweiler ID SaleID SalePrice 2 35 $10.80 4 80 $156.66 6 27 $173.99 7 25 $251.59 8 4 $183.38 10 18 $150.11 11 17 $148.47 Which animals have not been sold?
Which animals have not been sold? LEFT JOIN includes all rows from left table (Animal) But only those from right table (SaleAnimal) that match a row in Animal. Rows in Animal without matching data in Sale Animal will have Null. AnimalID Name Category 12 Leisha Dog 19 Gene Dog 25 Vivian Dog 34 Rhonda Dog 88 Brandy Dog 181 Fish Query04_17 Left Outer Join SELECT Animal.AnimalID, Animal.Name, Animal.Category FROM Animal LEFT JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (SaleAnimal.SaleID Is Null);
Left Outer Join (Example) ID SaleID SalePrice 2 35 $10.80 4 80 $156.66 Null Null Null 6 27 $173.99 7 25 $251.59 8 4 $183.38 Null Null Null 10 18 $150.11 11 17 $148.47 Null Null Null ID Name Category Breed 2 Fish Angel 4 Gary Dog Dalmation 5 Fish Shark 6 Rosie Cat Oriental Shorthair 7 Eugene Cat Bombay 8 Miranda Dog Norfolk Terrier 9 Fish Guppy 10 Sherri Dog Siberian Huskie 11 Susan Dog Dalmation 12 Leisha Dog Rottweiler
Which animals have not been sold? Older Syntax for Left Join SELECT ALL FROM Animal, SaleAnimal WHERE Animal.AnimalID *= SaleAnimal.AnimalID And SaleAnimal.SaleID Is Null; Oracle syntax--note that the (+) symbol is on the reversed side. SELECT ALL FROM Animal, SaleAnimal WHERE Animal.AnimalID = SaleAnimal.AnimalID (+) And SaleAnimal.SaleID Is Null;
Don’t know the average, so use a subquery to look it up. Watch parentheses. Query04_18 SubQuery for Computation SELECT SaleAnimal.AnimalID, Animal.Category, SaleAnimal.SalePrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE ((Animal.Category="Cat") AND (SaleAnimal.SalePrice> ( SELECT AVG(SalePrice) FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (Animal.Category=“Cat”) ) ) );
List the Animals that have sold for a price higher than the average for animals in that Category. The subquery needs to compute the average for a given category. Problem: Which category? Answer: the category that matches the category from the main part of the query. Problem: How do we refer to it? Both tables are called Animal. This query will not work yet. Correlated Subquery SELECT AnimalID, Name, Category, SalePrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (SaleAnimal.SalePrice> (SELECT Avg(SaleAnimal.SalePrice) FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (Animal.Category = Animal.Category) ) ) ORDER BY SaleAnimal.SalePrice DESC;
List the Animals that have sold for a price higher than the average for animals in that Category. Match category in subquery with top level Rename tables (As) Correlated Subquery Recompute subquery for every row in top level--slow! Better to compute and save Subquery, then use in join. Query04_19 Correlated SubQuery (Avoid) SELECT A1.AnimalID, A1.Name, A1.Category, SaleAnimal.SalePrice FROM Animal As A1 INNER JOIN SaleAnimal ON A1.AnimalID = SaleAnimal.AnimalID WHERE (SaleAnimal.SalePrice> (SELECT Avg(SaleAnimal.SalePrice) FROM Animal As A2 INNER JOIN SaleAnimal ON A2.AnimalID = SaleAnimal.AnimalID WHERE (A2.Category = A1.Category) ) ) ORDER BY SaleAnimal.SalePrice DESC;
Assume small query 100,000 rows 5 categories of 20,000 rows 100,000 * 20,000 = 1 billion rows to read! Correlated Subquery Problem Animal + SaleAnimal Category SalePrice Compute Avg: $37.78 Fish $10.80 Dog $156.66 Fish $19.80 Cat $173.99 Cat $251.59 Dog $183.38 Fish $1.80 Dog $150.11 Dog $148.47 Compute Avg: $174.20 Compute Avg: $37.78 Compute Avg: $169.73 Compute Avg: $169.73 Recompute average for every row in the main query!
Compute the averages once and save query JOIN saved query to main query Two passes through table: 1 billion / 200,000 => 10,000 More Efficient Solution: 2 queries Animal + SaleAnimal Saved Query Category SalePrice Category AvgOfSalePrice Fish $10.80 Dog $156.66 Fish $19.80 Cat $173.99 Cat $251.59 Dog $183.38 Fish $1.80 Dog $150.11 Dog $148.47 Bird $176.57 Cat $169.73 Dog $174.20 Fish $37.78 Mammal $80.72 Reptile $181.83 Spider $118.16 JOIN Animal.Category = Query1.Category
Offices in Los Angeles and New York. Each has an Employee table (East and West). Need to search data from both tables. Columns in the two SELECT lines must match. UNION Operator SELECT EID, Name, Phone, Salary, ‘East’ AS Office FROM EmployeeEast UNION SELECT EID, Name, Phone, Salary, ‘West’ AS Office FROM EmployeeWest EID Name Phone Salary Office 352 Jones 3352 45,000 East 876 Inez 8736 47,000 East 372 Stoiko 7632 38,000 East 890 Smythe 9803 62,000 West 361 Kim 7736 73,000 West
UNION, INTERSECT, EXCEPT List the name of any employee who has worked for both the East and West regions. A B C T1 T2 SELECT EID, Name FROM EmployeeEast INTERSECT SELECT EID, Name FROM EmployeeWest
Sometimes need to JOIN tables on more than one column. PetStore: Category and Breed. Multiple JOIN Columns Animal AnimalID Name Category Breed DateBorn Gender . . . Breed Category Breed SELECT * FROM Breed INNER JOIN Animal ON Breed.Category = Animal.Category AND Breed.Breed = Animal.Breed
Need to connect a table to itself. Common example: Employee(EID, Name, . . ., Manager) A manager is also an employee. Use a second copy of the table and an alias. Reflexive Join Employee EID Name . . . Manager 115 Sanchez 765 462 Miller 115 523 Hawk 115 765 Munoz 886 SQL SELECT Employee.EID, Employee.Name, Employee.Manager, E2.Name FROM Employee INNER JOIN Employee AS E2 ON Employee.Manager = E2.EID Result EID Name Manager Name 115 Sanchez 765 Munoz 462 Miller 115 Sanchez 523 Hawk 115 Sanchez
Used to change data to a different context. Example: Define age categories for the animals. Less than 3 months Between 3 months and 9 months Between 9 months and 1 year Over 1 year Not available in Microsoft Access. It is in SQL Server and Oracle. CASE Function Select AnimalID, CASE WHEN Date()-DateBorn < 90 Then “Baby” WHEN Date()-DateBorn >= 90 AND Date()-DateBorn < 270 Then “Young” WHEN Date()-DateBorn >= 270 AND Date()-DateBorn < 365 Then “Grown” ELSE “Experienced” END FROM Animal;
AccountsReceivable Categorize by Days Late 30, 90, 120+ Three queries? New table for business rules Inequality Join AR(TransactionID, CustomerID, Amount, DateDue) LateCategory(Category, MinDays, MaxDays, Charge, …) Month 30 90 3% Quarter 90 120 5% Overdue 120 9999 10% SELECT * FROM AR INNER JOIN LateCategory ON ((Date() - AR.DateDue) >= LateCategory.MinDays) AND ((Date() - AR.DateDue) < LateCategory.MaxDays)
SQL SELECT • SELECT DISTINCT Table.Column {AS alias} , . . . • FROM Table/Query • INNER JOIN Table/Query ON T1.ColA = T2.ColB • WHERE (condition) • GROUP BY Column • HAVING (group condition) • ORDER BY Table.Column • { Union second select } • TRANSFORM aggfunction {Crosstab values} • SELECT . . . FROM . . . GROUP BY {Crosstab rows} • PIVOT pivotfield {Crosstab columns}
SQL Mnemonic Someone From Ireland Will Grow Horseradish and Onions SELECT FROM INNER JOIN WHERE GROUP BY HAVING ORDER BY SQL is picky about putting the commands in the proper sequence. If you have to memorize the sequence, this mnemonic may be helpful.
SQL Data Definition • Create Schema AuthorizationdbNamepassword • Create TableTableName (Column Type, . . .) • Alter TableTable {Add, Column, Constraint, Drop} • Drop {Table Table | Index Index On table} • Create IndexIndexName ON Table (Column {ASC|DESC})
Syntax Examples CREATE TABLE Customer (CustomerID INTEGER NOT NULL, LastName CHAR (10), more columns ); ALTER TABLE Customer DROP COLUMN ZipCode; ALTER TABLE Customer ADD COLUMN CellPhone CHAR(15);
Queries with “Every” Need EXISTS List the employees who have sold animals from every category. By hand: List the employees and the categories. Go through the SaleAnimal list and check off the animals they have sold.
Query With EXISTS List the Animal categories that have not been sold by an employee (#5). SELECT Category FROM Category WHERE (Category <> "Other") And Category NOT IN (SELECT Animal.Category FROM Animal INNER JOIN (Sale INNER JOIN SaleAnimal ON Sale.SaleID = SaleAnimal.SaleID) ON Animal.AnimalID = SaleAnimal.AnimalID WHERE Sale.EmployeeID = 5) If this query returns any rows, then the employee has not sold every animal. So list all the employees for whom the above query returns no rows: SELECT EmployeeID, LastName FROM Employee WHERE NOT EXISTS (above query slightly modified.)
Query for Every SELECT Employee.EmployeeID, Employee.LastName FROM Employee WHERE Not Exists (SELECT Category FROM Category WHERE (Category <> "Other") And Category NOT IN (SELECT Animal.Category FROM Animal INNER JOIN (Sale INNER JOIN SaleAnimal ON Sale.SaleID = SaleAnimal.SaleID) ON Animal.AnimalID = SaleAnimal.AnimalID WHERE Sale.EmployeeID = Employee.EmployeeID) ); Result: 3 Reasoner
Simpler Query for Every Sometimes it is easier to use Crosstab and the Count function. But some systems do not have Crosstab, and sometimes the lists would be too long. So you need to know both techniques.
SQL: Foreign Key CREATE TABLE Order (OrderID INTEGER NOT NULL, OrderDate DATE, CustomerID INTEGER CONSTRAINT pkorder PRIMARY KEY (OrderID), CONSTRAINT fkorder FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID) ); Order Customer OrderID OrderDate CustomerID CustomerID LastName FirstName Address … *
SQL Data Manipulation Commands Insert Intotarget (column1 . . .) VALUES (value1 . . .) Insert Intotarget (column1 . . .) SELECT . . . FROM. . . Delete Fromtable WHERE condition Updatetable SET Column=Value,. . . Where condition Note the use of the Select and Where conditions. Synatx is the same--only learn it once. You can also use subqueries.
Copy Old Animal Data INSERT INTO OldAnimals SELECT * FROM Animals WHERE AnimalID IN (SELECT AnimalOrderItem.AnimalID FROM AnimalOrder INNER JOIN AnimalOrderItem ON AnimalOrder.OrderID = AnimalOrderItem.OrderId WHERE (AnimalOrder.OrderDate<#1/1/01#) );
Delete Old Animal Data DELETE FROM Animals WHERE AnimalID IN (SELECT AnimalOrderItem.AnimalID FROM AnimalOrder INNER JOIN AnimalOrderItem ON AnimalOrder.OrderID = AnimalOrderItem.OrderId WHERE (AnimalOrder.OrderDate<#1/1/01#) );
Change the ListPrice of Animals at the PetStore. For cats, increase the ListPrice by 10%. For dogs, increase the ListPrice by 20%. Typically use two similar UPDATE statements. With the CASE function, the statements can be combined. Update Example UPDATE Animal SET ListPrice = ListPrice*1.10 WHERE Category = “Cat” ; UPDATE Animal SET ListPrice = ListPrice*1.20 WHERE Category = “Dog” ;
Break questions into smaller pieces. Test each query. Check the SQL. Look at the data. Check computations Combine into subqueries. Use cut-and-paste to avoid errors. Check for correlated subqueries. Test sample data. Identify different cases. Check final query and subqueries. Verify calculations. Quality: Building Queries Which customers who bought Dogs also bought products for Cats (at any time)? Who bought dogs? Who bought cat products? Dogs and cat products on the same sale. Dogs and cat products at different times. Dogs and never any cat products. Cat products and never any Dogs. Test SELECT queries before executing UPDATE queries.
Quality Queries: Example Which customers who bought Dogs also bought products for Cats? A. Which customers bought dogs? B. Which customers bought cat products? SELECT DISTINCT Animal.Category, Sale.CustomerID FROM Sale INNER JOIN (Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID) ON Sale.SaleID = SaleAnimal.SaleID WHERE (((Animal.Category)="Dog")) AND Sale.CustomerID IN ( SELECT DISTINCT Sale.CustomerID FROM Sale INNER JOIN (Merchandise INNER JOIN SaleItem ON Merchandise.ItemID = SaleItem.ItemID) ON Sale.SaleID = SaleItem.SaleID WHERE (((Merchandise.Category)="Cat")) );
Data Warehouse Predefined reports Interactive data analysis Operations data Daily data transfer OLTP Database 3NF tables Data warehouse Star configuration Flat files
Existing databases optimized for Online Transaction Processing (OLTP) Online Analytical Processing (OLAP) requires fast retrievals, and only bulk writes. Different goals require different storage, so build separate dta warehouse to use for queries. Extraction, Transformation, Transportation (ETT) Data analysis Ad hoc queries Statistical analysis Data mining (specialized automated tools) Data Warehouse Goals
Multidimensional Cube Pet Store Item Sales Amount = Quantity*Sale Price Category Customer Location Time Sale Date
Sales Date: Time Hierarchy Year Roll-up To get higher-level totals Levels Quarter Month Drill-down To get lower-level details Week Day
Star Design Dimension Tables Products Sales Date Fact Table Sales Quantity Amount=SalePrice*Quantity Customer Location
OLAP Computation Issues Compute Quantity*Price in base query, then add to get $23.00 If you use Calculated Measure in the Cube, it will add first and multiply second to get $45.00, which is wrong.
SQL Server Analysis Cube Design • Define the Data Source (choose the database) • Create a new cube—avoid the wizard • Create the OLAPItems query, Amount=Quantity*SalePrice • Pick the fact table: OLAPItems query • Create a new dimension: PetItemSaleDate • Star design • Sale table, SaleDate column, Time dimension • Create a second dimension: PetItemCategory • Star • Merchandise table, Category column • Create a third dimension: PetCustomerLocation • Snowflake • Sale, Customer, City tables • Country, State, City columns • Select the data to measure in the fact table • Measures, New: Quantity, Amount • Save the cube as: PetItemSales • Process the cube (icon) • MOLAP (stores data in a cube layout) • Select the defaults on the other options.
Oracle OLAP Design Create snapshots (Materialized Views) to preprocess queries. Mostly an issue of performance—use Star and bitmap indexes. Need Oracle Express ($$$$$) or have to build tables yourself.