240 likes | 260 Views
This review discusses various join strategies, including nested loop join, index join, blocked nested loop join, hash join, sort-merge join, and SQL transformation techniques.
E N D
Review Jun 5th, 2002
HW#5.2 R R.a = S.b S (52buffers)
Join Result R & S Input buffer for R . . . . . . Output buffer Input buffer for S Page-Oriented Nested Loop Join • For each page in the outer relation R, we scan the entireinner relation S. • Cost: M + M * N __________________
Nested Join v.s. Index Join • Simple Nested Join • Cost: M + ( PR*M ) * N _______________________ • Page-oriented Simple Nested Join • Cost: M + M * N _______________________ • Index Join (Unclustered) • Cost: M + ( PR*M ) * (1.2 + 1) _______________________ • When is Nested Join better than Index Join? _______________________
Join Result R & S Input buffer for R ( k < B-1 pages) . . . . . . . . . Output buffer Input buffer for S Blocked Nested Loop Join • For each matching tuple r in R-block, s in S-page, add<r, s> to result. Then read next R-block, scan S, etc. • Cost: M + ( M / (B-2) ) * N ____________________
Original Relation Partitions OUTPUT 1 1 2 INPUT 2 hash function h1 . . . B-1 B-1 B main memory buffers Disk Disk Partitions of R & S Join Result Hash table for partition Ri (k < B-1 pages) hash fn h2 h2 Output buffer Input buffer for Si B main memory buffers Disk Disk Hash-Join • Partition both relations using hash fn h1: R tuples in partition i will only match S tuples in partition i. __________ • Read in a partition of R, hash it using h2 (<> h1). Scan matching partition of S, search for matches. ________
Blocked Join v.s. Hash-Join • Blocked Join • Cost: M + ( M / (B-2) ) * N ___________________ • Hash Join • Cost: 3 * ( M + N ) ___________________ • When is Blocked Join better than Hash Join? ___________________
Original Relation Partitions(B-1) pages Sort-Join INPUT 1 1 2 OUTPUT 2 Multi sorting . . . • Sorting both relations using Multi-way sort:________ B-1 M_ B-1 B main memory buffers Disk Disk Partitions of R & S Join Result • Read in each intermediate result of R and S, search for matches. ________ INPUT 1 _ M_ B-1 OUTPUT Merge Join 1 _ N_ B-1 Disk Disk B main memory buffers
Sort-Merge Join v.s. Hash-Join • Sort-Merge needs more buffer space • Sort-Merge Join • Cost: 3 * ( M + N ) • Buffer Size: ________________ • Hash Join • Cost: 3 * ( M + N ) • Buffer Size: ________________ • Sort-Merge join is less sensitive to data skew • Result of Sort-Merge join is sorted
HW#5.4 SQL Transformation • SELECT DISTINCT F.FirstName, F.LastName • FROM GradStudents AS G, Faculty AS F, Advise AS A • WHERE G.LoginID = A.Student AND F.LoginID = A.Advisor • AND G.Office = '224'; • GradStudents: 157 tuples (20 distinct values for Office, uniform distribution) • Faculty: 53 tuples Advise: 87 tuples
HW#5.4 SQL Transformation SELECT EntryYear, count(*) FROM GradStudentsWHERE FirstName = 'David'GROUP BY EntryYearHAVING EntryYear >= 1995ORDER BY EntryYear DESC SELECT EntryYear, COUNT(*) FROM GradStudents WHERE FirstName = 'David' GROUP BY EntryYear HAVING COUNT(*) >= 10 ORDER BY EntryYear DESC
HW#5.4 SQL Transformation SELECT FirstName FROM Faculty WHERE FirstName IN ( SELECT FirstName FROM GradStudents ) SELECT FirstName FROM Faculty WHERE FirstName NOT IN ( SELECT FirstName FROM gradStudents )
HW#5.4 SQL Transformation SELECT LoginID FROM UndergradStudents WHERE EntryYear >= ANY ( SELECT EntryYear FROM GradStudents) SELECT LoginID FROM UndergradStudents WHERE EntryYear >= ALL ( SELECT EntryYear FROM GradStudents)
HW#4.1 XML & XQuery <!ELEMENT products (product*)> <!ELEMENT product (name, price, description, store*)> <!ELEMENT store (name, phone, markup)> • Which products are sold at least in one store? • What are the product-store pairs whose markup is no lower than 15%? • Which stores sell some products with a price higher than 50? • Which products (except “gizmo”) are sold in some store that also sells the product “gizmo”?
HW#4.1 XML & XQuery <!ELEMENT products (product*)> <!ELEMENT product (name, price, description, store*)> <!ELEMENT store (name, phone, markup)> • Returns the names and prices of all products that are sold in all stores with a markup of 25%. • Returns the names and prices of all products that are sold at least at one store with a markup of 25%
HW#4.1 XML & XQuery <!ELEMENT products (product*)> <!ELEMENT product (name, price, description, store*)> <!ELEMENT store (name, phone, markup)> • <products>{ • FOR $p IN documents(“database.xml”)//products/row • RETURN • <product pid = “{$p/pid/text()}”> • <name>{$p/name/text()}</name> • <price>{$p/price/text()}</price> • <description>{$p/description/text()}</description> • {FOR $x IN documents(“database.xml”)//sells/row[pid = $p/pid] • FOR $s IN documents(“database.xml”)//stores/row[sid = $x/sid] • RETURN • <store sid = “{$s/sid/text()}”> • <name>{$s/name/text()}</name> • <phone>{$s/phone/text()}</phone> • <markup>{$x/markup/text)()}</markup> • </store> • }</product> • }</products>
Midterm 1 Company (DeptID, Name, Budget, CEOEmployID, CEOContratID, Since) Work-in (EmployID, Lot, DeptID, Name, Budget, CEOEmployID, Since)