Skyline Query Processing for Incomplete Data. Mohamed E. Khalefa, Mohamed F. Mokbel, Justin J. Levandoski. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA. ICDE 2008.
Outline • Introduction • Problem Formulation • Methods and Algorithms • Experimental Results • Conclusion
Introduction • Existing skyline algorithms assume: 1. Data are complete (all dimensions are available for all data points). 2. The dominance relation is transitive: if p1 dominates p2 and p2 dominates p3, then p1 dominates p3.
(Cont.) • If data is incomplete: 1. Some dimensions have no value. 2. The dominance relation is not transitive: p1 dominates p2 and p2 dominates p3, but p1 does not dominate p3. Instead, p3 dominates p1, so dominance is cyclic and not transitive.
Problem Formulation • Dominance relation for incomplete data: P dominates Q if 1. there is at least one dimension ui where both P.ui and Q.ui are known and P.ui > Q.ui, and 2. for all other dimensions uj, j ≠ i, either P.uj is unknown, Q.uj is unknown, or P.uj ≥ Q.uj. • Example: p1 dominates p2. p2 does not dominate p3, and p3 does not dominate p2.
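The two conditions above can be sketched as a small predicate. This is a minimal illustration, assuming unknown dimensions are encoded as `None` and that larger values are better (as in P.ui > Q.ui); the function name `dominates` and the three points are hypothetical and were chosen to show the cycle described in the Introduction.

```python
from typing import Optional, Sequence

Point = Sequence[Optional[float]]  # None marks an unknown dimension (assumed encoding)


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q under the incomplete-data definition."""
    strictly_better = False
    for pv, qv in zip(p, q):
        if pv is None or qv is None:
            continue  # dimension unknown in p or q: it is ignored
        if pv < qv:
            return False  # p is worse on a commonly known dimension
        if pv > qv:
            strictly_better = True  # condition 1 is satisfied on this dimension
    return strictly_better


# Hypothetical points (not taken from the slides) that form a dominance cycle.
p1 = (2, None, 1)
p2 = (1, 2, None)
p3 = (None, 1, 2)

print(dominates(p1, p2), dominates(p2, p3), dominates(p3, p1))
```

Each pair shares exactly one known dimension, so p1 dominates p2, p2 dominates p3, and p3 dominates p1, while p1 does not dominate p3.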
(Cont.) • Bitmap representation: 0 = unknown dimension, 1 = known dimension. • Example: p1.B AND p2.B = 100 → comparable; p1.B AND p3.B = 000 → incomparable.
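The comparability test can be sketched with a bitwise AND. The helper names `bitmap` and `comparable` and the three points are assumptions for illustration; the points were picked only so that the AND results match the 100 and 000 of the example.

```python
from typing import Optional, Sequence


def bitmap(point: Sequence[Optional[float]]) -> int:
    """Encode known dimensions as bits, with the first dimension as the leftmost bit."""
    bits = 0
    for value in point:
        bits = (bits << 1) | (0 if value is None else 1)
    return bits


def comparable(p, q) -> bool:
    """Two points are comparable if they share at least one known dimension."""
    return (bitmap(p) & bitmap(q)) != 0


# Hypothetical points with bitmaps 100, 110 and 011.
p1 = (5, None, None)
p2 = (3, 4, None)
p3 = (None, 2, 7)

print(format(bitmap(p1) & bitmap(p2), "03b"), comparable(p1, p2))
print(format(bitmap(p1) & bitmap(p3), "03b"), comparable(p1, p3))
```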
Methods and Algorithms • The Replacement Algorithm. • The Bucket Algorithm. • The ISkyline Algorithm.
The Replacement Algorithm • Replace each unknown dimension with a placeholder value. • Use a traditional skyline algorithm to get Ssky. [Figure: Incomplete Data → (Replace) → Complete Data → Ssky; (Replace “–”) → Ssky]
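The two steps can be sketched as follows. The fill value of negative infinity is an assumption, since the slide does not show which value is used, and the naive quadratic skyline merely stands in for any traditional algorithm. All function names and the sample points are illustrative.

```python
import math


def replace_unknown(point, fill=-math.inf):
    """Substitute `fill` (an assumed value) for every unknown (None) dimension."""
    return tuple(fill if v is None else v for v in point)


def dominates_complete(p, q):
    """Traditional dominance on complete data, larger values being better."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def traditional_skyline(points):
    """Naive O(n^2) skyline used here as a stand-in for a traditional algorithm."""
    return [p for p in points if not any(dominates_complete(q, p) for q in points)]


# Illustrative incomplete points; None marks an unknown dimension.
incomplete = [(6, 4, None), (9, None, 1), (None, 3, 1), (9, 3, None)]
complete = [replace_unknown(p) for p in incomplete]
s_sky = traditional_skyline(complete)
print(len(s_sky))
```

With this fill value none of the four sample points dominates another after replacement, so all of them remain in Ssky, which hints at how large Ssky can become.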
The Bucket Algorithm • Divide all incoming points into distinct buckets, where all points in a bucket have the same bitmap representation. • Compute the skyline of each bucket (the local skyline). • Collect all local skylines into one list, termed the candidate skyline. • Perform an exhaustive pairwise comparison among all points in the candidate skyline to get the query answer.
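A minimal sketch of these four steps, assuming `None` marks an unknown dimension and larger values are better. The names `bucket_skyline` and `dominates` and the sample points are hypothetical. The final step tests every candidate against all other candidates without deleting any of them early, which is one way to stay correct when dominance is cyclic.

```python
from collections import defaultdict


def dominates(p, q):
    """Incomplete-data dominance: compare only commonly known dimensions."""
    better = False
    for pv, qv in zip(p, q):
        if pv is None or qv is None:
            continue
        if pv < qv:
            return False
        if pv > qv:
            better = True
    return better


def bucket_skyline(points):
    # Step 1: group points by bitmap (which dimensions are known).
    buckets = defaultdict(list)
    for p in points:
        buckets[tuple(v is not None for v in p)].append(p)
    # Steps 2 and 3: local skyline per bucket, collected as the candidate skyline.
    candidates = []
    for bucket in buckets.values():
        candidates.extend(p for p in bucket if not any(dominates(q, p) for q in bucket))
    # Step 4: exhaustive pairwise comparison among the candidates.
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


# Hypothetical points spread over three bitmaps (110, 101, 011).
points = [(5, 2, None), (3, 1, None), (4, None, 6), (None, 1, 7)]
print(bucket_skyline(points))
```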
(Cont.) [Figure: the local skylines are collected into the candidate skyline, from which the global skyline is derived.]
(Cont.) • In general, performance is better than that of the Replacement Algorithm because the candidate list is likely to be smaller than the set Ssky. • Drawback: the candidate skyline may be excessively large. • Drawback: it misses the chance to use the bucket data to reduce the number of comparisons.
The ISkyline Algorithm • Virtual Points • Shadow Points • The ISkyline Algorithm
Virtual Points • P1, P2, Q1, P3, and P4 arrive in that order.
Shadow Points • Q1 dominates P3 ⇒ add virtual point Q1v to P's local skyline. • Q4 is dominated by P3, but only the local skyline is checked, so Q4 is not found to be dominated.
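The walk-through later in the deck shows Q1(9,-,1) producing the virtual point Q1v(9,-,-) in node P (bitmap 110). One reading consistent with that example is that a virtual point keeps the dominating point's values only on the dimensions known in the target node. The sketch below encodes that reading; the function name and the tuple-of-booleans bitmap are assumptions.

```python
def virtual_point(q, node_known):
    """Project q onto the dimensions known in the target node (assumed construction)."""
    return tuple(v if known else None for v, known in zip(q, node_known))


q1 = (9, None, 1)               # Q1(9,-,1), bitmap 101
node_p = (True, True, False)    # node P, bitmap 110
print(virtual_point(q1, node_p))
```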
(Cont.) • Shadow Points: points that are only dominated by virtual points. Q1 is dominated by S4v. Q3 is dominated by S4v.
The ISkyline Algorithm • Phase I: insert P. 1. If P is dominated by a real point in the local skyline ⇒ remove P. 2. If P is dominated by a virtual point in the local skyline ⇒ insert P into the shadow skyline. 3. If P is a local skyline point ⇒ insert P into the candidate skyline (go to Phase II). • Phase II: if the size of the candidate skyline exceeds t ⇒ insert the candidates into the global skyline.
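The three Phase I cases and the Phase II threshold can be sketched as below. This is a simplified illustration, not the authors' implementation: the `Node` class, the function names, and the pruning of local points that the new point dominates are assumptions, and the later comparison of the global skyline against shadow points is left out.

```python
from dataclasses import dataclass, field


def dominates(p, q):
    """Incomplete-data dominance: compare only commonly known dimensions."""
    better = False
    for pv, qv in zip(p, q):
        if pv is None or qv is None:
            continue
        if pv < qv:
            return False
        if pv > qv:
            better = True
    return better


@dataclass
class Node:
    real: list = field(default_factory=list)     # real local skyline points
    virtual: list = field(default_factory=list)  # virtual points from other nodes
    shadow: list = field(default_factory=list)   # shadow skyline


def phase_one_insert(node, p, candidates):
    if any(dominates(r, p) for r in node.real):
        return "removed"                          # case 1: dominated by a real point
    if any(dominates(v, p) for v in node.virtual):
        node.shadow.append(p)                     # case 2: dominated only by a virtual point
        return "shadow"
    # Assumption: local points dominated by p leave the local skyline.
    node.real = [r for r in node.real if not dominates(p, r)]
    node.real.append(p)
    candidates.append(p)                          # case 3: p is a local skyline point
    return "candidate"


def phase_two(candidates, global_skyline, t):
    """Move the candidates to the global skyline once their number exceeds t."""
    if len(candidates) > t:
        global_skyline.extend(candidates)
        candidates.clear()


node_p = Node(virtual=[(9, None, None)])          # node P holding Q1v(9,-,-)
candidates, global_skyline = [], []
print(phase_one_insert(node_p, (9, 3, None), candidates))
print(phase_one_insert(node_p, (6, 4, None), candidates))
print(phase_one_insert(node_p, (8, 2, None), candidates))
```

In this run (9,3,-) ties with the virtual point on the first dimension and becomes a candidate, (6,4,-) is dominated only by the virtual point and goes to the shadow skyline, and (8,2,-) is dominated by the real point (9,3,-) and is removed.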
(Cont.) t = 2. Input so far: P1(6,4,-). Global skyline: empty. Candidate skyline: empty. Node P (110): P1(6,4,-). Node Q (101): empty. Node R (011): empty.
(Cont.) t = 2. Input so far: P1(6,4,-). Global skyline: empty. Candidate skyline: P1. Node P (110): P1(6,4,-). Node Q (101): empty. Node R (011): empty.
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1). Global skyline: empty. Candidate skyline: P1, Q1. Node P (110): P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): empty.
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1). Global skyline: empty. Candidate skyline: P1, Q1. Node P (110): Q1v(9,-,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): empty. Shadow skyline: P1(6,4,-).
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1). Global skyline: empty. Candidate skyline: Q1, R1. Node P (110): Q1v(9,-,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-). Global skyline: empty. Candidate skyline: Q1, R1. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) |Candidate skyline| > 2 ⇒ insert the candidates into the global skyline. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-). Global skyline: empty. Candidate skyline: Q1, R1, P2. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) Compare against the shadow skyline. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-). Global skyline: Q1, R1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) R1 is dominated by P1. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-). Global skyline: Q1, R1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-). Global skyline: Q1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1). Global skyline: Q1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1). Global skyline: Q1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1), R2(-,6,5). Global skyline: Q1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R1(-,3,1), R2(-,6,5). Shadow skyline: P1(6,4,-).
(Cont.) R2 dominates R1. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1), R2(-,6,5). Global skyline: Q1, P2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R2(-,6,5), R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) Check the candidate skyline against the global skyline. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1), R2(-,6,5). Global skyline: Q1, P2. Candidate skyline: R2. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R2(-,6,5), R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) Q1 and P2 are dominated by R2. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1), R2(-,6,5). Global skyline: Q1, P2. Candidate skyline: R2. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R2(-,6,5), R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) Global skyline: global skyline combined with candidate skyline. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1), R2(-,6,5). Global skyline: empty. Candidate skyline: R2. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R2(-,6,5), R1(-,3,1). Shadow skyline: P1(6,4,-).
(Cont.) The result is the global skyline: R2. t = 2. Input so far: P1(6,4,-), Q1(9,-,1), R1(-,3,1), P2(9,3,-), Q2(6,-,1), R2(-,6,5). Global skyline: R2. Candidate skyline: empty. Node P (110): Q1v(9,-,-), P2(9,3,-), P1(6,4,-). Node Q (101): Q1(9,-,1), Q2(6,-,1). Node R (011): R2(-,6,5), R1(-,3,1). Shadow skyline: P1(6,4,-).
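As a cross-check of the walk-through, the six inserted points can be compared by brute force under the dominance definition from the Problem Formulation slide. This is not the ISkyline Algorithm itself, only an exhaustive pairwise test; the dictionary layout and function names are illustrative, and `None` encodes "-".

```python
def dominates(p, q):
    """Incomplete-data dominance: compare only commonly known dimensions."""
    better = False
    for pv, qv in zip(p, q):
        if pv is None or qv is None:
            continue
        if pv < qv:
            return False
        if pv > qv:
            better = True
    return better


# The six points of the walk-through.
points = {
    "P1": (6, 4, None),
    "Q1": (9, None, 1),
    "R1": (None, 3, 1),
    "P2": (9, 3, None),
    "Q2": (6, None, 1),
    "R2": (None, 6, 5),
}

# A point belongs to the skyline if no other point dominates it.
skyline = [
    name
    for name, p in points.items()
    if not any(dominates(q, p) for other, q in points.items() if other != name)
]
print(skyline)
```

Under this check only R2 remains undominated: Q1 dominates P1 and Q2, P1 dominates R1, and R2 dominates Q1, P2 and R1.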
Conclusion • Based on traditional skyline queries: the Replacement Algorithm and the Bucket Algorithm. • New method: the ISkyline Algorithm. • The ISkyline Algorithm performs best of the three.