320 likes | 336 Views
This study proposes a lattice-based approach for efficient query relaxation in selection queries, providing algorithms that adjust conditions and compute skyline results.
E N D
Relaxing Join and Selection Queries Rares Vernica UC Irvine, USA Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung
Query Example SELECT * FROM Jobs J, Candidates C WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5; Rares Vernica, UC Irvine
What if the query answer is empty? SELECT * FROM Jobs J, Candidates C WHERE J.Salary <= 95 AND J.Zipcode = C.Zipcode AND C.WorkExp >= 5; Adjust the conditions • What conditions to adjust? • How to adjust them? Rares Vernica, UC Irvine
Example Percentages of Empty Result Queries • In a Customer Relationship Management (CRM) application developed by IBM • 18.07% (3,396 empty result queries in 18,793 queries) • In a real estate application developed by IBM • 5.75% • In a digital library application [JCM+00] • 10.53% • In a bioinformatics application [RCP+98] • 38% Efficient Detection of Empty-Result Queries (p.1015)Gang Luo (IBM T.J. Watson Research Center, USA) VLDB 2006 Rares Vernica, UC Irvine
Observations Different ways to adjust the conditions: Select vs. Join How much to adjust each condition?Salary <= 100 vs. Salary <= 120 Adjust join vs. Adjust both selections WorkExp >= 5 Salary <= 95 Rares Vernica, UC Irvine
Contributions Query relaxationframework for selections and joins Lattice-based approach for query relaxation Efficient relaxation algorithms Rares Vernica, UC Irvine
Overview • Motivation • Query Relaxation • Lattice-based Relaxation • Relaxation Algorithms • Variations • Experiments Rares Vernica, UC Irvine
Query Relaxation Top-k / Nearest neighbor • Weight for each condition Skyline • No weights are needed • Conditions are not considered equal • Return non dominated points Rares Vernica, UC Irvine
Skyline Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001 Query Relaxation Rares Vernica, UC Irvine
Overview • Motivation • Query Relaxation • Lattice-based Relaxation • Relaxation Algorithms • Variations • Experiments Rares Vernica, UC Irvine
Lattice-based Relaxation R – select on Jobs J – join condition S – select on Candidates WorkExp >= 5 Salary <= 95 Rares Vernica, UC Irvine
Overview • Motivation • Query Relaxation • Lattice-based Relaxation • Relaxation Algorithms • Variations • Experiments Rares Vernica, UC Irvine
Relaxing Selection Conditions INCORRECT Algorithm: • Compute Skyline on Jobs • Compute Skyline on Candidates • Join the Skylines WorkExp >= 5 Salary <= 95 Empty Join Skyline Skyline Skyline Rares Vernica, UC Irvine
Relaxing Selection Conditions Join First Algorithm: • Compute the join(disregarding the selections) • Compute Skyline on join results WorkExp >= 5 Salary <= 95 Join Skyline Rares Vernica, UC Irvine
Relaxing Selection Condition Variations Pruning Join • Build the Skyline during the join Pruning Join+ • Pruning Join • Build the local Skyline before the join Sorted Access Join • Fagin’s Top-k: sort the columns on relaxation • Compute the join Skyline Rares Vernica, UC Irvine
Relaxing all conditions Queue Skyline Multi-Dim.-Index-based-Relaxation Algorithm: • Traverse the index structure top-down • Form pairs of nodes or records • Build the Skyline Rares Vernica, UC Irvine
Overview • Motivation • Query Relaxation • Lattice-based Relaxation • Relaxation Algorithms • Variations • Experiments Rares Vernica, UC Irvine
Variations Computing Top-k over Skyline • Weight to each condition Queries with multiple joins Conditions on nonnumeric attributes • Dominance checking function Rares Vernica, UC Irvine
Overview • Motivation • Query Relaxation • Lattice-based Relaxation • Relaxation Algorithms • Variations • Experiments Rares Vernica, UC Irvine
Experimental Setting Datasets • Real • Internet Movie Database (IMDB) Movies (120k) & ActorInMovies (1.2m) • Census-Income – UCI KDD Repository Census (200k) • Synthetic Independent, Correlated, and Anticorrelated Implementation • GNU C++ • Spatial Index Library (R-tree) • Linux, AMD Opteron 240, 1GB RAM Rares Vernica, UC Irvine
Different algorithms, different behaviors IMDB Dataset Rares Vernica, UC Irvine
Different datasets, different behaviors Anticorrelated Dataset Correlated Dataset Independent Dataset Rares Vernica, UC Irvine
How big is the Skyline? Rares Vernica, UC Irvine
Relaxing join takes time Self-join on Census Dataset Rares Vernica, UC Irvine
Top-k over Skyline IMDB Dataset Rares Vernica, UC Irvine
Related Work Muslea et al. • Alternate forms of conjunctive expressions Efficient Skyline algorithms • Selection queries Efficient Top-k algorithms • Require weights for conditions Rares Vernica, UC Irvine
Conclusions Query relaxationframework for selections and joins Lattice-based approach for query relaxation Efficient relaxation algorithms Rares Vernica, UC Irvine
Future Work Optimum use of the lattice structure Relax conditions on string attributes Algorithms applicable outside the databases Rares Vernica, UC Irvine
Skyline vs. Top-k Rares Vernica, UC Irvine
Skyline vs. Top-k over Skyline Rares Vernica, UC Irvine