180 likes | 300 Views
QUERY EXECUTION. 15.4 Two – Pass Algorithm Based on Sorting. Ameya Sabnis CS – 257 Section II ID 204. Introduction. Basic Idea Since we have a large relation R where B(R) is larger than M, where M is block of memory then Read M blocks of R in main memory Sort the M blocks in main memory
E N D
QUERY EXECUTION 15.4 Two – Pass Algorithm Based on Sorting Ameya Sabnis CS – 257 Section II ID 204
Introduction Basic Idea • Since we have a large relation R where B(R) is larger than M, where M is block of memory then • Read M blocks of R in main memory • Sort the M blocks in main memory • Write the sorted list into M blocks of disk, and refer this as sorted list of R • 1st Pass: Used to Sort the relation in some way. • 2nd Pass: Used to Merge the sorted sublists.
Duplicate Elimination Using Sorting • This is used to remove the repeated tuples • First the tuples are sorted in order • Then the tuples are merged in a way that the repeated tuples are removed from the memory. • Consider the example given on the next slide.
Example Consider tuples of relation R as R = 2,5,2,1,2,2,4,5,4,3,4,2,1,5,2,1,3 (17 tuples) Also there are three blocks in main memory M = 3, and each block takes 2 tuples. First Pass Now read the first 6 tuples in main memory (3 blocks) and sort them, store them in sub-list R1. Similarly, read the next 6 tuples to form sub-list R2 and R3 At the end of Pass 1: we get sub-lists as R1 = 1,2,2,2,2,5 R2 = 2,3,4,4,5 R3 = 1,1,2,2,3
Contd. 2nd Pass Now in this pass we bring in two tuples from each Sub-List The situation can be shown as Sub-list Main Mem. Waiting on disk R1 1 2 2 2 2 5 R2 2 3 4 4 4 5 R3 1 1 2 3 5 Now since ‘1’ is the least tuple in the main memory, make the copy of ‘1’ and remove all ‘1’ from memory blocks
Contd. After removing ‘1’ from the memory blocks the block of Sub-list 3 gets exhausted and we bring in the next block, the situation can be shown as: Sub-list Main Mem. Waiting on disk R1 2 2 2 2 5 R2 2 3 4 4 4 5 R3 2 3 5 Now the ‘2’ is removed from the memory block and this is continued and the tuples are sorted
Grouping and Aggregation Using Sorting The algorithm is similar to the Duplicate Elimination Using Sorting. • Read tuples R in to Memory M blocks at a time, Sort each M block and write them into the Sub-List. • Load each block of Sub-List into Memory Buffer • Repeatedly find the least value “v” of sorted key present among the first available tuples in buffer
Contd. • Compute all the aggregates for list L on this group • Examine each of the tuples with sort key v, and accumulate the needed aggregates • If a buffer becomes empty, replace it with the next block from the same sub-list.
A Sort-Based Union Algorithm Since it is a Union based algorithm, we have to perform the Union of the two large relations of R and S, the steps are as follows: • Bring M blocks of R into main memory, sort their tuples, and write the resulting sorted sub-list back to disk. • Do the same for S, to create sorted sub-lists for relation S. • Use one main memory buffer for each sub-list of R and S. Initialize each with the first block from the corresponding sub-list. • Repeatedly find the first remaining tuple ‘t’ among all the buffers. Copy ‘t’ to the output, and remove from the buffers all copies of ‘t’. If a buffer becomes empty, reload it with the next block from its sub-list.
Sort-Based Intersection and Difference • Here also we use the same algorithms as used in previous sections • In general we create the sorted sub-lists of Mblocks each for both argument relations R and S. • In this we consider the least tuple ‘t’ among the tuples in buffer • Consider the following example
Example Let there be 3 memory blocks, each block takes 2 tuples. There are 2 arguments R and S, R consists of 12 tuples and S consists of 5 tuples. As done previously during Pass 1, two sub-lists of R and 1 sub-list of S are created. Thus we get the Sub-list table as Sub-list Main Mem. Waiting on disk R1 1 2 2 2 2 5 R2 2 3 4 4 4 5 S1 1 1 2 3 5
Example Contd. Suppose we want to take the bag difference R – S. Here the least tuple in the memory buffer is 1. Count the Number of 1’s in R and S sub-list. After counting since 1 appears more in S than R we do not output any copies of tuple, the situation we get as Sub-list Main Mem. Waiting on disk R1 2 2 2 2 5 R2 2 3 4 4 4 5 S1 2 3 5
Contd. Since 2 is the least tuple, we count the number of appearances of ‘2’ in R and S. As ’2’ appears 5 times in R and once in S, we output the result as 4 times ‘2’. Similarly proceeding further we go on computing till the end, until all tuples are processed The output we get as: 2,2,2,2,4,4,4,5
A Simple Sort-Based Join Algorithm There are several ways that sorting can be used to join the two relations. In this concept the care should be taken that, the tuples from both the relations which share the same value should be in main memory. To avoid this situation, we try to reduce the main memory use for the other aspects of the algorithm. Thus large number of buffers are made available for the tuples with the joint attribute values.
Contd. Given relations R and S to join, and given Mblocks of main memory for buffers, we do the following: • Sort R and S using two phase sorting algorithm • Merge the sorted R and S, and following functions are performed repeatedly • Find the least value y of the join attributes Y that is currently at the front of the blocks for R and S. • If y does not appear at the front of the other relation, then remove the tuple(s) with sort key y. • Output all the tuples that can be formed by joining tuples from R and S with a common Y-value y.
A More Efficient Sort Based Join If we do not worry about the large number of tuples with common values then we can combine second phase of sort with the join itself. The process is as follows: • Create the sorted sub-list of size M for both R and S • Bring the first block of each sub-list into buffer • Repeatedly find the least Y-value y among the first available tuples of all the sub-lists. Identify all the tuples of both relations that have Y-value y, perhaps using some of the M available buffers to hold them, Output the join of all tuples from R with all tuples from S that share this common Y-value