110 likes | 130 Views
University of California – Riverside Department of Computer Science & Engineering cs179G – Database Project Phase #4. Sort-Merge Join Implementation Details for Minibase. by Demetris Zeinalipour http://www.cs.ucr.edu/~cs179g-t/. Sort-Merge Join Review. Sailors S. Reserves R.
E N D
University of California – Riverside Department of Computer Science & Engineering cs179G – Database Project Phase #4 Sort-Merge Join Implementation Details for Minibase by Demetris Zeinalipour http://www.cs.ucr.edu/~cs179g-t/ 1
Sort-Merge Join Review Sailors S Reserves R Query with JOIN SELECT S.name FROM Sailors S, Reserves R WHERE S.sid = R.id • How to implement the JOIN operator? • Using Nested Loop Joins. tuple-at-a-time: For each tuple in S check with every tuple in R variation: page-at-a-time (every page contains several records) • Using Block Nested Loop Join. Idea: Load in memory smaller relation e.g. S and then scan relation S on a page-at-a-time basis. Notice: The idea can be generalized even if the smaller relation doesn’t fit in memory • Using Sort-Merge Join Idea: Sort both relations using an external sort algorithm and then merge the relations. 2
External-Sorting Review 1/3 • When? If the data to be sorted is too big to fit in main memory then we need an external sort algorithm. • Simple approach: 2-way Merge-Sort } Inpage1:[ 11,7, 2,1] Inpage2:[19, 7,5,4 ] Use quicksort internally Memory 4 • Steps • Fetch a page to memory • Sort Page in memory • Write page back to disk • Merge pages levelwise • (see next page) HeapFile 1 inpage1 1-3 outpage 4 inpage2 3
External-Sorting Review 2/3 2-way mergesort => • Passes: log2N+1=4 • Cost: 2N(log2N+1)=64 I/O • Expensive In project we use External Sort (sort.C) which utilizes all available buffer pages (10) and reduces the number of Passes and the I/O cost 4
External-Sorting Review 3/3 Idea similar with 2-way Mergesort with the difference that we utilize B-1 buffer pages (B>3) Don’t worry about this implementation as it is already implemented in sort.C Use quicksort internally Memory Heapf 1-3 0 1 2 In this project you should call: Sort() from within the sortMerge constructor before proceeding to the merge phase outpage 3 4 5 4 6 7 5
The Merging Phase of SMJoin • Now that the two relations R and S are sorted we must merge them. • Merge using 2 iterators to move from page to page and from record to record • Works fine ONLY if both R and S have NO duplicates. (e.g if sid is a foreign key in a 1:1 relation Sailor, Address) Page Tr R.sid Heapfile S.sid Gs 6 Forward the iterator (Tr or Gs) that has the smallest value until Tr=Gs then output value
The Merging Phase of SMJoin • What if relations have duplicates? (check example) (either both of them or just one of them) • Therefore we need to use 3 iterators (1 for Marking) Tr R.sid S.sid Ts Gs • The one extra Ts iterator will be used as soon Tr=Gs at which point we will move Ts to Gs position and use Ts to iterate S • The full version of the algorithm is shown in 2 slides 7
HeapFiles (heapfile.h, scan.C) • Database File: Organization of various pages into a logical. • In a heapfile pages are unordered within the file • In order to Scan the pages (records) of a heapfile we will use scan.C Example: // Sorting a heapfile (after : heapR stored in Catalog) Sort(“unsortheapR”, “heapR”, ..)rest of params from SortMerge() // Creating a scan on a heapfile HeapFile heapR(“heapR”, status); Scan * Rscan = heapR.openScan(Rstatus); // Scan until DONE Rstatus = Rscan->getNext(RID rid,char* RecR, int lenR); // Inserting results in Out Heapfile HeapFile heapOut(“heapOut”, status); memmove(char *RecO, char* RecR, int lenR);//do same for S heapOut.insertRecord(char *recptr, lenR+lenS, RID&outRID); All the work of Pinning/Unpinning pages is done from within Scan since it locates the directoryheader Page from the catalog and proceeds from there on with getNext() 8
The Merging Phase of SMJoin } Sort R & S The full algorithm (this is all you need to implement) } Init Iterators } Fast forward R } Fast forward S Consider Scan::Position() 9
Implementation of the SMJoin Algorithm The Big Picture main.C (or smjoin_main.C same thing) => SMJTester.C (runTests()) => test1() (this runs all 6 tests actually) => createFiles(); // creates 5 Heapfiles using // the data of same constant integer arrays (data0,…data4) => test(i) { // inside sortMerge Constructor } //sortMerge heapfileR No_of_cols joinColumn sortMerge sm(“file0”, 2, attrType, attrsize, 0, “file1”, 2, attrType, attrsize, 0, “test1”, 10, Ascending, s); { // inside sortMerge Constructor heapfileS Out_heapfile No_of_pages available minirel.h => Sort( (infile) “file0”, (outfile) “BRfile1”, (no_of_cols)2, attrType, attrsize, (joincolumn)0, (no_of_pages_available)10, (status)s ) 10
Where to start from? • Start out by reading Sort-Merge Join from book (Chap 12.5.2 2nd edition, 14.4.2 3rd edition) • Study SMJTester.C which contains the tests and understand the test scenarios. • Start Implementing sortMerge.C by invoking the Sort() constructor etc. • Have a closer look at the new classes that you have: Sort.C, Heapfile.C and Scan.C 11