910 likes | 1.18k Views
Joins And Subqueries. Multi-table queries (JOIN) 2 tables. Retrieve the part name for all parts shipped in quantities equal to 200. SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200;. Conjunct #1 is a PK FK comparison.
E N D
Multi-table queries(JOIN) 2 tables Retrieve the part name for all parts shipped in quantities equal to 200 SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200;
Conjunct #1 is a PK FK comparison Conjunct #2 qualifies retrieval Multi-table queries(JOIN) 2 tables So how does this work? SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200;
Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200; Conceptually, we can understand the execution of the query in terms of row scans. A Scan is a sequential inspection of many rows for the purpose of returning rows that meet a criteria. The following demonstrations are a simplified version of the Nested Loop technique (as opposed to “merge sort” and “hashed join”)
PK/FK comparison requires one scan of the FK set for each PK in the parent table. Row is included in the view where PK=FK and Qty = 200 Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200;
Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200; (Scan 1) (No Matches: TT) (P1=P1 300=200) (P1=P2 200=200) . . . (FK Scan) (P1=P5 100=200)
This removes duplicates in results Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200; (Scan 2) (Two Matches: TT) (P2=P1 300=200) (P2=P2 200=200) . . . (FK Scan) (P2=P5 100=200)
Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200; (Scan 3) ( No Matches: TT) (P3=P1 300=200) (P3=P2 200=200) . . . (FK Scan) (P3=P5 100=200)
Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200; (Scan 4) (And So On...) ( Two Matches: TT) (P4=P1 300=200) (P4=P2 200=200) . . . (FK Scan) (P4=P5 100=200)
Multi-table queries(JOIN) 2 tables SELECT DISTINCT PART.PartName FROM PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SHIPMENT.quantity = 200; (Scan 6) How many row comparisons? rows(PK-table) X rows(FK-table) 6 X 12 = 72
Multi-table queriesJOIN Evaluation Techniques • Three common methods used by DBMS optimizers to evaluate Joins • Nested Loop • Merge Scan • Hash Join
Multi-table queriesJOIN Evaluation Techniques • Nested Loop • Essentially the preceding demo • One table defined as external and one table defined as internal • (External:Internal) • (1:M) • (Parent:Child) • If there isn’t an index on the FK, the internal table has to be opened for a scan for every row of the external table
Multi-table queriesJOIN Evaluation Techniques • Merge Scan • Both tables have to be ordered by PK/FK • Parallel scans are executed on both tables • Qualifying rows are found by merging the order lists into groups or partitions
Multi-table queriesJOIN Evaluation Techniques • Hash Join • Both tables are stored using hash function on join attributes (PK/FK) • Execute join on each partition
Multi-table queries(JOIN) 3 tables Retrieve supplier name and part name for parts shipped in quantities less than 400. SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400;
Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; (Scan 1) (One Match: TT T) (P1=P1 S1=S1 400>300) (P1=P2 S1=S1 400>200) . . . (FK Scan) (P1=P5 S1=S4 400>100)
Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; (Scan 2) (No Matches: TT T) (P1=P1 S2=S1 400>300) (P1=P2 S2=S1 400>200) . . . (FK Scan) (P1=P5 S2=S4 400>100)
So we finish out the “P1” Supplier scan with no matches Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; (Scan 2-5) (No Matches: TT T) . . .
Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; (Scan 6) (One Match: TT T) (P2=P1 S1=S1 400>300) (P2=P2 S1=S1 400>200) . . . (FK Scan) (P2=P5 S1=S4 400>100)
Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; (Scan 7) (One Match: TT T) (P2=P1 S2=S1 400>300) (P2=P2 S2=S1 400>200) . . . (FK Scan) (P1=P5 S2=S4 400>100)
Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; (Scan 8) (And so on...) (One Match: TT T) (P2=P1 S3=S1 400>300) (P2=P2 S3=S1 400>200) . . . (FK Scan) (P1=P5 S3=S4 400>100)
Multi-table queries(JOIN) 3 tables SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.Quantity < 400; How many row comparisons? 6x5x12 = 360
Multi-table queries(JOIN) 3 tables Retrieve the supplier name and part name for all shipments with quantity greater than or equal to 200 for which the warehouse is located in the same city as the supplier SELECT DISTINCT SUPPLIER.SupplierName, PART.PartName FROM SUPPLIER, PART, SHIPMENT WHERE (PART.PartNumber = SHIPMENT.PartNumber AND SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber) AND (SHIPMENT.Quantity >= 200 AND SUPPLIER.SupplierCity = PART.PartCity);
(Inner Table) (Outer Table) Data FK Value PK Value Data Data FK Value PK Value Data (PK=FK Scan) Data FK Value PK Value Data Data FK Value Data FK Value Outer Table Inner Table Multi-table queries(JOIN) Let’s recap... For each row in the outer table (Parent), every row in the inner table (Child) is scanned for a PK/FK match and any qualifying predicate. In your WHERE clause, you must have a PK=FK statement for every Parent/Child relationship involved in the query.
Scan inner table for PartNumber = ‘P2’ (Inner Table) (Outer Table) Remove non-qualifying rows Data FK Value PK Value Data Data FK Value PK Value Data Data FK Value PK Value Data (Intermediate Table) Data FK Value Data FK Value Data FK Value Then Join Outer with Intermediate Data FK Value Data FK value Sub-Queries (nested)Uncorrelated So... Joins are really just the combination of the PRODUCT and SELECT relational operators. But in the case where you have a qualifying predicate such as PartNumber=‘P2’, why do a PRODUCT of the outer table with the entire inner table? Why not eliminate some of the rows from the inner table first with a simple query and then do a PRODUCT of the resulting (and smaller) table? Recall, that the result of a relational operation is a table. So we can Join the outer table with the intermediate table resulting from the inner or sub-query.
Sub-Queries (nested)Introduction • Nested queries can be either correlated or uncorrelated • correlated:inner query depends on row that is currently being examined in the outer query • uncorrelated: inner query performed independently of outer query • The nested or sub-query usually appears in the WHERE clause of a query • Sub-queries can also appear in the FROM and HAVING clause
Sub-Queries (nested)Introduction • General structure of uncorrelated sub-query SELECT Item FROM table1 WHERE Item IN [NOT IN] (SELECT Item FROM table2 WHERE predicate); (Outer Query) (Inner Query)
Sub-Queries (nested)Uncorrelated Retrieve supplier names for suppliers who supply part P2 SELECT DISTINCTROW SUPPLIER.SupplierName FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber = 'P2');
Outer Result Now join outer table to intermediate result Intermediate Sub-Queries (nested)Uncorrelated SELECT DISTINCTROW SUPPLIER.SupplierName FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber = 'P2'); Execute inner query: Scan SHIPMENT for ‘P2’
Outer Result Now join outer table to intermediate result Intermediate Sub-Queries (nested)Uncorrelated SELECT DISTINCTROW SUPPLIER.SupplierName FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber = 'P2');
Intermediate Sub-Queries (nested)Uncorrelated SELECT DISTINCTROW SUPPLIER.SupplierName FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber = 'P2'); How many row comparisons? If formulated as a join? 12 + (5 x 4) = 32 (5 x 12) = 60
Sub-Queries (nested)Uncorrelated SELECT DISTINCTROW SUPPLIER.SupplierName FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber = 'P2'); Formulated as a Join SELECT DISTINCT SUPPLIER.SupplierName FROM SUPPLIER, SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber=‘P2’; Use a join if you need columns from multiple tables. If, as in the example above, you need columns from only one table, use either a join or a subquery. A subquery may include the GROUP BY and HAVING clauses, but not the ORDER BY and UNION.
Sub-Queries (nested)Uncorrelated (3 tables) Retrieve supplier name and city for all suppliers who supply at least one galvanized part. SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV'));
Inner-most query first PART is scanned for Metal=‘GALV’ Inner-most Intermediate result Sub-Queries (nested)Uncorrelated (3 tables) SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV'));
Join SHIPMENT with Intermediate result of inner query Inner-most Intermediate result Intermediate SHIPMENT Result Join Sub-Queries (nested)Uncorrelated (3 tables) SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV'));
Intermediate SHIPMENT Result Join Sub-Queries (nested)Uncorrelated (3 tables) SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV')); Join SUPPLIER with Intermediate result of inner query
(Join) (Join) Sub-Queries (nested)Uncorrelated (3 tables) SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV')); In Summary...
SUPPLIER PART SHIPMENT Sub-Queries (nested)Uncorrelated (3 tables) SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV')); Written as a Join SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER, SHIPMENT, PART WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = PART.PartNumber AND PART.METAL = 'GALV';
Sub-Queries (nested)Uncorrelated (3 tables) SELECT DISTINCTROW SUPPLIER.SupplierName, SUPPLIER.SupplierCity FROM SUPPLIER WHERE SUPPLIER.SupplierNumber IN (SELECT SHIPMENT.SupplierNumber FROM SHIPMENT WHERE SHIPMENT.PartNumber IN (SELECT PART.PartNumber FROM PART WHERE METAL = 'GALV')); How many row comparisons for subquery? 6 + (2*12) + (3*5) = 45 (Join) As a Join? 6*5*12 = 360 (Join)
Sub-Queries (nested)An alternative formulation SELECT T1.Item …, T2.Item… FROM table1 T1, (SELECT Item FROM table2 WHERE predicate) As T2 WHERE T1.PK = T2.FK; In this case, the inner table is filtered by the predicate and then joined to the outer table using the alias T2. This overcomes the shortcoming of the IN method by allowing attributes from the inner table to be included in the outer SELECT. See handout for examples from class.
Sub-Queries (nested)Introduction • General structure of correlated sub-query • Using Exists SELECT Item FROM table1 WHERE EXISTS [NOT EXISTS] (SELECT Item FROM table2 WHERE PK = FK AND predicate); (Outer Query) (Inner Query)
Sub-Queries (nested)Introduction • General structure of correlated sub-query • Using IN SELECT Item FROM table1 WHERE predicate IN (SELECT Item FROM table2 WHERE PK = FK); (Outer Query) (Inner Query)
Sub-Queries (nested)Correlated Retrieve supplier names for suppliers who supply part P1 SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1');
This is what makes it correlated. Sub-Queries (nested)Correlated Retrieve supplier names for suppliers who supply part P1 SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1'); The inner query is executed once for every row in the SUPPLIER table. That is, the value for SupplierNumber is passed by value into the sub-query.
Tests if intermediate result is nonempty (empty set) Sub-Queries (nested)Correlated Retrieve supplier names for suppliers who supply part P1 SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1'); For each nonempty intermediate result, the SupplierName is selected. The EXISTS test will be performed for each row in the SUPPLIER table.
Intermediate Sub-Queries (nested)Correlated SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1'); Execute nested query EXISTS? Yes
Intermediate Sub-Queries (nested)Correlated SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1'); Execute nested query EXISTS? Yes
Intermediate Sub-Queries (nested)Correlated SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1'); Execute nested query EXISTS? No
Intermediate Sub-Queries (nested)Correlated SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1'); EXISTS is False for S4 and S5
Sub-Queries (nested)Correlated Retrieve supplier names for suppliers who DO NOT supply part P1 SELECT SUPPLIER.SupplierName FROM SUPPLIER WHERE NOT EXISTS (SELECT SHIPMENT.PartNumber FROM SHIPMENT WHERE SUPPLIER.SupplierNumber = SHIPMENT.SupplierNumber AND SHIPMENT.PartNumber = 'P1');