1 / 16

Merge Join and Sorting

24: Merge Join and Sorting. CS347. 2. Three Primary Classes of Join Algorithm . Block-Nested Loops (vs. Simple Nested Loops. ) Merge-Join Hash-Joins. 24: Merge Join and Sorting. CS347. 3. Simple Nested Loops. R join SFor each r ? RFor each s ? Sif pred(r,s) then output result// ig

nash
Download Presentation

Merge Join and Sorting

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. 24: Merge Join and Sorting CS347 1 Merge Join and Sorting Objectives: Detailed relationship between disk and memory/buffer allocation. External two phase sort.

    2. 24: Merge Join and Sorting CS347 2 Three Primary Classes of Join Algorithm Block-Nested Loops (vs. Simple Nested Loops. ) Merge-Join Hash-Joins

    3. 24: Merge Join and Sorting CS347 3 Simple Nested Loops R join S For each r ? R For each s ? S if pred(r,s) then output result // ignores that data is on disk, read in blocks

    4. 24: Merge Join and Sorting CS347 4 Block-Nest Loops - Assumptions Dedicate Most Memory to Outer-Loop Consider that there are many tuples per page.

    5. 24: Merge Join and Sorting CS347 5 Block Nested-Loops Detailed Join(relation R, relation S, buffer_space M){ // M in units of disk pages Until done( table_scan(S,M-1)){ // Read M-1 pages of R if equijoin then build index on portion of S in memory Open(R); Until done(table_scan(R, 1){ for each tuple r of R in memory{ “find joining tuples of S” // if equijoin use index “output them” // if thetajoin another loop }}}}

    6. 24: Merge Join and Sorting CS347 6 What to Notice: Outer-loop iterates over “inner” relation If equi-join, loop structure nested 3 deep If theta-join, loop structure nested 4 deep Call structure not parameterized by theta/equi theta predicate not passed in

    7. 24: Merge Join and Sorting CS347 7 Number of I/O’s (cost) of Block-Nested Loops Run through the complete inner relation, S, for each chunk of R If S fits in M-1 buffers ? B(R) + B(S) If S does not fit ? B(S)/M-1 iteration I/O Cost = (B(S)/M-1) * (M-1 + B(R))

    8. 24: Merge Join and Sorting CS347 8 What About Writing the Result Book convention – no cost in the operator for the result in some cases, results are immediate input to another operator. no I/O cost for output introduce an explicit new operator “Save intermediate result to disk”

    9. 24: Merge Join and Sorting CS347 9 Merge-Join First we will consider equijoins on primary index keys. create table customers (cid … primary key … ) create table orders (oid …, cid … primary key, …)

    10. 24: Merge Join and Sorting CS347 10 Merge-Join Pseudo Code Assume block structure is buried in iterator abstraction Merge-Join(relation R, relation S, attr rjk, attr sjk ){ r = open(R, 1) // read a block of R s = open(S, 1) // read a block of S until( r = EOF or s = EOF) { if (r.rjk = s.rjk) {output(r,s), r = next(r)} if (r.rjk < s.rjk) {r = next(r)} if (r.rjk > s.rjk) {s = next(s)} }}

    11. 24: Merge Join and Sorting CS347 11 Cost of Merge-Join B(R) + B(S)

    12. 24: Merge Join and Sorting CS347 12 What if an equijoin, but not on index key? Is there a secondary index on the join argument? Do I want to use the secondary index?

    13. 24: Merge Join and Sorting CS347 13 Two Phase External Sorting For all intents and purposes, sorting a file, f, bigger than available memory requires 2 * (| f | / B) I/O reads + 2 * (| f | / B) I/O writes where | f | is the size of the file in bytes, B is the number of bytes in a block common notation: | f | / B = || f ||

    14. 24: Merge Join and Sorting CS347 14 Merge-Sort (divide and conquer) that considers external storage Phase 1 == Divide until done read file until memory full sort that portion write result in its own file Phase 2 == Merge read 1 block of each file until done produce sorted block write to disk if a file’s block is exhaused, read another block How large a file can we sort?

    15. 24: Merge Join and Sorting CS347 15 Merge-Sort Cost – No Index 5(B(R) + B(S)) 3x to sort 1x (did have to write the sorted results out) 1x for merge sort May feel high – but it is very predictable

    16. 24: Merge Join and Sorting CS347 16 Many Opportunities to be Clever what if 1 relation fits in memory? sort is 1*B(R) integrate merge-join with second phase of sort

More Related