1 / 13

Supporting Top- k join Queries in Relational Databases

Supporting Top- k join Queries in Relational Databases. Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Presented by: Z. Joseph, CSE-UT Arlington. Introduction. Often searches are done on multiple features Each feature produces a different ranking for the query

tara
Download Presentation

Supporting Top- k join Queries in Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington

  2. Introduction • Often searches are done on multiple features • Each feature produces a different ranking for the query • Must thus join and aggregate rankings on different features

  3. Example • Find location for a house such that the combination of the cost of the house and 5 years tuition at a nearby school is minimal. • Exact location is not predefined in query, per location the house and school features would have to be analyzed.

  4. Motivation • Current techniques decouple join and sorting (ranking) of results. • Sorting is expensive and is a blocking operation. • More apparent if ranking and the joining features are different.

  5. Rank-Join Algorithm • Generate new valid join combinations • Compute score for each combination • For each incoming input, calculate the total score of: • The last seen feature value and the top ranked feature value for all other features in the query. • Store the maximum of these as T (threshold) • Store top k in priority queue. • Halt when lowest value of queue ≥ T

  6. Optimality • Is Instance Optimal over all correct top-K join algorithms. • Guarantees that cost of Rank-Join is O (cost of any other algorithm). • Mathematically: • Cost(Rank-Join) ≤ c*Cost(Any Other Algorithm) + c’ • c is the optimality ratio • c, c’ > 0

  7. Rank-Join Continued … • Join strategy crucial • Recommended: Ripple Join • Alternates between tuples • Flexible in the way it sweeps out (rectangular, etc) • Retains ordering in considering samples • Variant of Rank-Join • Hash Rank Join (HRJN) • Block Ripple Join

  8. Hash Rank Join (HRJN) Operator • Built on idea of hash ripple join • Inputs are as two hash tables • Maintains highest (first) and lowest (last selected) objects from each relation. • Results are added to a priority queue • Advantages: • Smaller space requirement • Can be pipelined

  9. Hash Rank Join (HRJN) Operator: Problems • Local Ranking Problem • Results from three or more input streams • Larger queue sizes • More database accesses • Buffer Problem • Cannot predict how many partial joins will result

  10. HRJN Solutions? • Block Ripple Joins • Do comparisons as blocks • Score-Guided Strategy • If thresholds are very different, then this may be because of the way one of the rankings is larger and descends at a slower rate • Can then take more inputs from the slower growing ranking so that the threshold goes closer to the other thresholds

  11. Optimal Join-Order • Try to have the least number of input records in order to get a correct ranking • No clear way of estimating the order of joins • Have a heuristic – Footrule Distance • Simple measure of similarity among two rankings. • First join the most similar rankings • This would quickly yield a join by accessing fewer records

  12. Rank-Join Algorithm: Benefits • What can it do? • Integrates well with query plans • Produces results as fast as possible • Provides performance guarantees • Minimizes space requirements • Offers a mechanism to determine the best order of joining to execute query optimally. • Can be improved further if random access is available • Can eliminate on-the-fly duplicate elimination

  13. References • “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004) • Jing Chen : DBIR Spring 2005, CSE-UT Arlington http://ranger.uta.edu/~gdas/Courses/ Spring2005/DBIR/slides/top-k_join.ppt

More Related