350 likes | 529 Views
Supporting top-k join queries in relational databases. Ihab Ilyas, Walid Aref, Ahmed Elmagarmid Presented by Jing Chen. 1. Motivation. Select A.1, B.2 From A, B, C Where A.1=B.1 and B.2 = C.2 Order By (0.3*A.1 + 0.7*B.2) Stop After 5. Traditional Joins. Nested Loop Merge Join
E N D
Supporting top-k join queries in relational databases Ihab Ilyas, Walid Aref, Ahmed Elmagarmid Presented by Jing Chen
1. Motivation • Select A.1, B.2 From A, B, C Where A.1=B.1 and B.2 = C.2 Order By (0.3*A.1 + 0.7*B.2) Stop After 5
Traditional Joins • Nested Loop • Merge Join • Hash Join
2. Goals/Contribution • Propose a pipe lined Rank-Join algorithm • Analyze the I/O cost of the algorithm • Implement the algorithm • Propose a optimal join strategy • Evaluate performance
3. Ripple Join JOIN : L.A = R.A L and R are descending ordered by B (L1(1,1,5) R1(1,3,5)) L R
3. Ripple Join --contd JOIN: L.A = R.A (L2,R2) {(2,2,4),(2,1,4)} L R
Ripple Join --contd JOIN: L.A = R.A (L2,R2) {(2,2,4),(2,1,4)} (L2,R1) {2,2,4), (1,3,5)} (L1,R2) {(1,1,5), (2,1,4)} L R
3. Variation Of Ripple Join Rectangle Block Hash
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (1). Get a valid combination using any certain algorithm Ripple Select (L1, R1) => No Result
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (1). Get a valid combination using any certain algorithm Select (L2, R2) (L2, R2), (L2, R1), (L1, R2) => (L1, R2)
4. Proposed Algorithm -- Contd Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (2). Compute the score (J) for the result J1(L1, R2) => L.B + R.B = 5 + 4 = 9
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (3). Compute a water mark score (T) by Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (3). Compute a water mark (T) score by Max {(Last L).B + (First R.B), (First L).B + (Last R).B} Selection (L1, R1) , (L2, R2) => T = Max (L2.B + R1.B, L1.B + R2.B) =Max (4+5, 5+4) = 9
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 J1= 9 T = 9 J1 >= T Report J1 Since we need top 2 (k=2), continue until k=2 and Min(J1, J2, …Jk) > T
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Select (L3, R3) (L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) => (L3, R3), (L2, R3) J2(L2, R3) = 4 + 3 = 7 J3(L3, R3) = 3 + 3= 6
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Calculate T T = Max { (Last L).B + (First R).B, (First L).B+ (Last R).B} = Max {L3.B + R1.B , L1.B + R3.B} = Max(3 + 5, 5 + 3) = 8
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 T = 8 J1(L1,R2) = 9 reported J2( L2, R3) = 7 J3(L3, R3) = 6 Note, J’s are in descending order Min (J) = 6 < T Continue Comment: Calculate T before J is more efficient. Can stop after find first Jk >= T
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4) J(L4, R4) = 7, J(L2, R4) = 6, J(L3, R4) = 5 T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7
4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 T= 7 J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J3(L3, R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5 Min(J1, J2) = 7 >= T (k = 2) Comment: When reach all records, T does not need to be calculated, unless, calculate T first, and compare each J(i) with T immediately
5. Implementation • Hash rank join operator (HRJN) - Use Hash Ripple Join - Two hash table contain the two inputs - A queue holds ordered join results - Ltop, Rtop, Lbottom, Rbottom are used to calculate T
5. Implementation contd • Issues with HRJN Buffer problem Local Ranking Problem
5. Implementation contd • Use Block Ripple Join to Solve Local Ranking Problem. (block size = 2)
5. Implementation contd • HRJN* score-guided join strategy - How to select next (block) tuple T1 = Ltop + Rbottom , T2 = Lbottom + Rtop T = Max(T1, T2) If T1 > T2, need to reduce T1. How?
5. Implementation contd • HRJN* score-guided join strategy - How to select next (block) tuple T1 = Ltop + Rbottom , T2 = Lbottom + Rtop T = Max(T1, T2) If T1 > T2, need to reduce T1. How? Reduce Rbottom and not reduce Lbottom (descending ordered), thus more tuples should be retrieved from R to reduce T1
6. Join Order • When more than two tables join, the join order matters. (A and C have high similarity)
6. Join Order -- contd • Rank-Join order heuristic - Get a ranked sample, top S ranked list from L and R - Calculate the similarity using footrule Where L(i) and R(i) are the rank of object i in L and R
7. Generalizing the rank-join • Using indexes • Eliminate duplications • Faster termination
8. Performance Evaluation Selectivity = 0.2 and m= 4
8. Performance Evaluation Selectivity = 0.2 and m= 4
8. Performance Evaluation Selectivity = 0.2 and m= 4