Creating Competitive Products Qian Wan[1], Raymond Chi-Wing Wong[1], Ihab F. Ilyas[2], M. Tamer Ozsu[2], Yu Peng[1] [1] Hong Kong University of Science and Technology [2] University of Waterloo Presented by Qian Wan Prepared by Qian Wan
Outline • Background • Skyline, Related Work • Motivation • Examples, Problem Definition • Algorithm • Framework, Grouping, Pruning • Experiments • Synthetic, Real data • 6 factors • Conclusions
Skyline • Definition • The skyline contains the points that are not dominated by any other point • Hotel searching problem • Distance to beach vs. price • Dominance • Skyline [Figures: two scatter plots of hotels H1 to H9, distance to beach (Dist) against Price, illustrating dominance and the skyline]
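The definition above can be sketched in a few lines. This is a minimal illustration, assuming that smaller values are better on every attribute; the hotel coordinates are hypothetical and are not the H1 to H9 values from the slide.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse on every attribute, strictly better on at least one.

    Assumption: smaller is better for every attribute (distance, price).
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical hotels as (distance to beach, price).
hotels = [(1.0, 200.0), (2.0, 150.0), (3.0, 160.0), (4.0, 90.0), (5.0, 120.0)]
sky = skyline(hotels)
print(sky)
```

Here (3.0, 160.0) is dominated by (2.0, 150.0) and (5.0, 120.0) is dominated by (4.0, 90.0), so the remaining three hotels form the skyline.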
Related Work • Skyline Queries in DBMS [S. Borzsonyi, 2001] • Single-Table Skyline Queries • Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005] • Multi-Table Skyline Queries • Natural Join [W. Jin, 2007] [D. Sun, 2008] • Our Work • Join different source tables via a “Cartesian product”-like procedure.
A Travel Agency’s Database • Source Tables • Existing Vacation Packages • Newly Created Vacation Packages • Skyline tuples • Direct attributes • Indirect attributes • Characteristic: one indirect attribute, e.g. Travel Agency (Price), PC Manufacturer (Price) and Logistic Transportation Service (Price)
Finding Competitive Products • Given a set of source tables • Market packages TE • New packages TQ • Then, a tuple q in TQ is said to be a competitive product if q is in the skyline with respect to TE ∪ TQ
Naïve Solution • Intra-dominance checking • Inter-dominance checking [Figure labels: Source Tables, Newly Created Vacation Packages, Existing Vacation Packages, Competitive Products]
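The naïve solution can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the schemas (flights as (stops, price), hotels as (distance, price)), the data, and the choice of summing prices to obtain the indirect attribute are all hypothetical, and smaller is assumed better on every attribute.

```python
from itertools import product


def dominates(p, q):
    """p dominates q (assumption: smaller is better on every attribute)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_competitive(flights, hotels, existing):
    """Enumerate every combination, then run both dominance checks on each one."""
    # Cartesian-product-like join: direct attributes are carried over unchanged,
    # the indirect attribute (price) is ASSUMED to be the sum of the parts.
    new_packages = [(f[0], h[0], f[1] + h[1]) for f, h in product(flights, hotels)]
    # Checking against new_packages is the intra-dominance check,
    # checking against existing is the inter-dominance check.
    everything = existing + new_packages
    return [q for q in new_packages if not any(dominates(o, q) for o in everything)]


# Hypothetical source tables and market packages (stops, distance, total price).
flights = [(0, 300), (1, 200)]
hotels = [(1, 150), (3, 80)]
existing = [(0, 1, 400), (1, 3, 300)]

competitive = naive_competitive(flights, hotels, existing)
print(competitive)
```

The combination (0, 1, 450) is dominated by the existing package (0, 1, 400), so only the other three new packages are competitive. The cost grows with the product of the source table sizes, which is what the later slides try to avoid.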
Algorithm Overview • Intra-dominance checking (Framework) • Find the skyline in the source tables • Inter-dominance checking • Skyline of the existing market packages • R*-tree index on the existing market packages • Full Pruning • Partial Pruning • Post-processing
Intra-dominance Checking • No intra-dominance checking is needed (one indirect attribute) • No competitive products are missed [Figure labels: Skyline Tuples of Source Tables, Newly Created Vacation Packages, Competitive Products]
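A small check of the claim on this slide, under assumptions that are not in the original: tuples are (direct attribute, price), the package price is the sum of the part prices, smaller is better, and the data is hypothetical. Combining only the skyline tuples of each source table yields candidates that do not dominate one another, and the result matches the skyline of the full product.

```python
from itertools import product


def dominates(p, q):
    """p dominates q (assumption: smaller is better on every attribute)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    return [p for p in points if not any(dominates(o, p) for o in points)]


def combine(flights, hotels):
    # Direct attributes carried over, prices summed (an assumed aggregation).
    return [(f[0], h[0], f[1] + h[1]) for f, h in product(flights, hotels)]


# Hypothetical source tables: (stops, price) and (distance, price).
flights = [(0, 300), (1, 200), (1, 320)]  # (1, 320) is dominated by (0, 300)
hotels = [(1, 150), (3, 80), (3, 160)]    # (3, 160) is dominated by (1, 150)

# Combine only the skyline tuples of each source table.
candidates = combine(skyline(flights), skyline(hotels))

# With a single indirect attribute, no candidate dominates another candidate.
intra_free = not any(dominates(a, b) for a in candidates for b in candidates)

# Reference result: skyline over the full product of the source tables.
reference = skyline(combine(flights, hotels))
print(intra_free, sorted(candidates) == sorted(reference))
```

The intuition: if one skyline tuple is no worse than another on the direct attributes, it must be strictly more expensive, so the summed price of a combination can never be both cheaper and better on the direct attributes.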
Inter-dominance Checking • Range query over the existing vacation packages • Checking against the skyline of the existing vacation packages is enough: no competitive products are missed • An R*-tree speeds up the inter-dominance checking
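The reduction to the skyline of the existing packages can be sketched as below. Note the substitution: a plain linear scan stands in for the R*-tree range query, because the standard library has no spatial index. The data is hypothetical and smaller is assumed better on every attribute.

```python
def dominates(p, q):
    """p dominates q (assumption: smaller is better on every attribute)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    return [p for p in points if not any(dominates(o, p) for o in points)]


def inter_dominance_filter(candidates, existing):
    """Keep the candidates that no existing package dominates.

    Dominance is transitive, so a candidate dominated by any existing package
    is also dominated by some skyline package; scanning the skyline suffices.
    A spatial index would instead answer a range query over the region
    between the origin and the candidate; the linear scan is a stand-in.
    """
    sky_existing = skyline(existing)
    return [q for q in candidates if not any(dominates(e, q) for e in sky_existing)]


# Hypothetical packages as (stops, distance, total price).
existing = [(0, 1, 400), (1, 3, 300), (1, 3, 350)]
candidates = [(0, 1, 450), (0, 3, 380), (1, 1, 350), (1, 3, 280)]

survivors = inter_dominance_filter(candidates, existing)
full_scan = [q for q in candidates if not any(dominates(e, q) for e in existing)]
print(survivors)
```

The last two lines compare the skyline-only check with a scan over all existing packages; on this data both keep the same three candidates.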
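Grouping • Full Pruning [Figure: the skyline tuples of the source tables are grouped (A1, A2 and B1, B2); group combinations such as C1={A1, B1} and C4={A2, B2} are checked against the Existing Vacation Packages before Newly Created Vacation Packages and Competitive Products are produced]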
Full Pruning • Best representative • Quality of the best representative: tightness of each group (clustering, e.g. k-means)
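One possible reading of full pruning, sketched under assumptions that the slide does not spell out: the best representative of a group is taken to be the component-wise minimum of its members (a virtual tuple), prices are summed, and smaller is better. If an existing package dominates the combination of the representatives, it dominates every member combination, so the whole group combination can be skipped. The function names and data are hypothetical.

```python
def dominates(p, q):
    """p dominates q (assumption: smaller is better on every attribute)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def best_representative(group):
    """Virtual tuple that is at least as good as every member on every attribute."""
    return tuple(min(column) for column in zip(*group))


def can_fully_prune(flight_group, hotel_group, existing):
    """True if every combination drawn from the two groups is dominated."""
    f = best_representative(flight_group)
    h = best_representative(hotel_group)
    # Direct attributes carried over, prices summed (an assumed aggregation).
    optimistic = (f[0], h[0], f[1] + h[1])
    return any(dominates(e, optimistic) for e in existing)


# Hypothetical groups of (stops, price) flights and (distance, price) hotels.
existing = [(1, 3, 300)]
weak = can_fully_prune([(1, 320), (2, 300)], [(3, 160), (4, 150)], existing)
strong = can_fully_prune([(0, 300), (1, 200)], [(1, 150), (3, 80)], existing)
print(weak, strong)
```

Under this reading, tightness matters because a loose group has a representative far better than any real member, which makes the representative harder to dominate and the pruning less likely to fire.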
Partial Pruning • Partial Pruning • Full pruning prunes all members of a group • Partial pruning prunes some members of a group • Partial pruning is used when full pruning cannot be applied • Idea • Direct attributes do not change • Estimate the best possible value for the indirect attributes • Eliminate a combination if • It is dominated on all direct attributes • It is dominated on all indirect attributes according to their best estimate
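The idea above might be sketched as follows. This is an interpretation, not the authors' algorithm: each flight in a group is paired with an optimistic estimate of the hotel group (its component-wise minimum), prices are assumed to be summed, and smaller is assumed better. A flight whose optimistic package is already dominated cannot produce a competitive product with any hotel in that group. Names and data are hypothetical.

```python
def dominates(p, q):
    """p dominates q (assumption: smaller is better on every attribute)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def partial_prune(flight_group, hotel_group, existing):
    """Return the flights that may still form a competitive product with the group."""
    # Best possible values any hotel in the group could contribute.
    best_hotel = tuple(min(column) for column in zip(*hotel_group))
    kept = []
    for f in flight_group:
        # The flight's own attributes are exact; only the hotel part is estimated.
        optimistic = (f[0], best_hotel[0], f[1] + best_hotel[1])
        if not any(dominates(e, optimistic) for e in existing):
            kept.append(f)
    return kept


# Hypothetical data: flights (stops, price), hotels (distance, price),
# existing packages (stops, distance, total price).
flight_group = [(0, 300), (1, 450)]
hotel_group = [(1, 150), (3, 80)]
existing = [(1, 1, 500)]

kept = partial_prune(flight_group, hotel_group, existing)
print(kept)
```

In this example full pruning does not apply, because the non-stop flight keeps the group's best case undominated, yet the one-stop flight at 450 is still eliminated on its own.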
Algorithm Overview • Framework • Intra-dominance checking • Find the skyline in the source tables • Inter-dominance checking • Skyline of the existing market packages • R*-tree index on the existing market packages • Full Pruning • Partial Pruning • Post-processing
Post-processing • More than one indirect attribute • Calculation • Previous algorithm • Intra-dominance checking • Any existing skyline algorithm • Post-processing cost depends on the number of competitive products
Experiments • Pentium IV 2.4 GHz PC with 4 GB memory, Linux platform, C++ • Synthetic anti-correlated datasets • Real datasets: Travel Agency A and Travel Agency B • A: 296 packages, 1014 hotels and 4394 flights • B: 149 packages, 995 hotels and 866 flights • Implementation • Algorithm for Creating Competitive Products (ACCP) • Baseline algorithm • Naïve algorithm
Synthetic Datasets • Schema is the same as in the example • Anti-correlated • 6 factors • Measurements • Execution time • Pruning power • Ratio of competitive products to all combinations • Memory usage
Experiments • TQ, TQ’, and TR • From 100k to 500k • Full pruning & partial pruning • Pruning power slightly increases
Outline • Background • Skyline • Motivation • Examples & Problem Definition • Algorithm • Framework, Partition, Pruning • Experiments • On both synthetic and real data • Over 6 factors • Conclusions
Conclusions • Creating Competitive Products • Example • Problem Definition • Algorithms • Framework • Intra-dominance checking • Inter-dominance checking • Post-processing • Experiments • Synthetic anti-correlated datasets • Real datasets
Q&A • Thank you!
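Partial Pruning • Full Pruning [Figure: groups A1 and B1 of the skyline tuples of the source tables form the combination C1={A1, B1}, which is checked against the Existing Vacation Packages before Newly Created Vacation Packages and Competitive Products are produced]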
Meta Transformation • Groups A1 and B1 become Meta-Flight and Meta-Hotel tuples • No inter-dominance checking for {F2} × {H2}
Experiments • From 2.5M to 10M • More competitive • Slightly decreases
Experiments • A: 296 packages, 1014 hotels and 4394 flights • B: 149 packages, 995 hotels and 866 flights • Source tables from B, and packages from A • Vary discount from 0 to 0.50 • Efficiency • ACCP (44.74 s) and Baseline (84.47 s) • |SKY|/|TQ| • |DOM|/|TE| [Figure labels: DOM, SKY, Travel Agency A, Package Generation Set]