Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries Xiaobin Ma Advisor: Shashi Shekhar Dec, 2005
Outline • Motivation • Problem statement • Related work and our contributions • Proposed algorithm and cost model • Experiment design and results • Conclusion and future work
Motivation • GIS applications • Find the shortest path from a query point • The path must pass through one point from each of several different feature types
A Running Example • Three feature types: • red (r), green (g), black (b) • q is the query point • The route drawn as a solid red line is the shortest route • Routes drawn as dashed lines are other possible routes
Basic Concepts • <P1,P2,…,Pk> • ordered point sequence and P1,P2,…,Pk are from k different (feature) types of data sets • R(q, P1,P2,…,Pk) • a route from q through points P1,P2,…, and Pk • d(R(q, P1,P2,…,Pk)) • distance of route R(q, P1,P2,…,Pk) • Multi-Type Nearest Neighbor (MTNN) • ordered point sequence <P1’,P2’,…,Pk’> such that d(R(q,P1’,P2’,…,Pk’)) is minimum among all possible routes • d(R(q, P1’,P2’,…,Pk’)) is MTNN distance • MTNN query • A query finding MTNN
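The definitions above can be made concrete with a brute-force reference. This is a minimal sketch, not the PLUB algorithm: it assumes Euclidean distance and points stored as (x, y) tuples, and the names `route_length` and `brute_force_mtnn` are hypothetical.

```python
from itertools import permutations, product
from math import dist, inf


def route_length(q, points):
    """d(R(q, P1, ..., Pk)): length of the route q -> P1 -> ... -> Pk."""
    total, prev = 0.0, q
    for p in points:
        total += dist(prev, p)
        prev = p
    return total


def brute_force_mtnn(q, datasets):
    """Try every visiting order of the feature types and every choice of
    one point per type; return (MTNN distance, MTNN point sequence)."""
    best_len, best_route = inf, None
    for order in permutations(range(len(datasets))):
        for combo in product(*(datasets[t] for t in order)):
            length = route_length(q, combo)
            if length < best_len:
                best_len, best_route = length, combo
    return best_len, best_route
```

The exhaustive search grows with k! times the product of the data set sizes, which is why pruning is needed in practice.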
Problem Statement for MTNN Query • Given: • A query point • A distance metric • k different (feature) types of spatial objects with N1, N2, N3, …, Nk data points respectively • An R-tree for each data set • Find: Multi-type nearest neighbor (MTNN) • Objective: Minimize the length of the route from the query point covering an instance of each feature type • Constraints: • Correctness: The tour should be the shortest path for the query point and the given collection of spatial query feature types • Completeness: Only the shortest path is returned as the query result
Related Work • Optimal sequenced route (OSR) query [Kolahdouzan et al., Tech. Report 05-840, USC] • Optimal algorithms (RLORD) • Focus on optimal algorithms for a specified permutation of feature types • Point-based algorithms • Trip planning query (TPQ) [Li et al., SSTD 05] • Heuristic algorithms • Give approximate results
RLORD Example • q is the query point • Search order is <r, b, g> • R(q,r2,b2,g2) is the greedy route • Radius of the circle is d(R(q,r2,b2,g2)) • [Figure: query point q, the red (r), black (b), and green (g) data points, and the circle]
RLORD Running Iterations • Use backward search strategy: O = <g,b,r> • First iteration: examine feature type g • Put <g2>, <g3>, <g4>, <g5>, <g7>, <g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16> in a set R • Second iteration: examine the next feature type in O • For every point bi in the black set, • iterate over every partial route <gj> in R: • IF d(R(q,bi)) + d(R(bi,gj)) < d(R(q,r2,b2,g2)) • THEN put <bi,gj> into a set R1 • For each bi, keep only the ordered sequence <bi,gj> in R1 whose length (d(bi,gj) plus the length of the partial route starting at gj) is minimum; the kept sequences form a set R2 • R2 = {<b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13>} • R <- R2 • Examine the next feature type and repeat the above procedure until all feature types are examined
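The iterations above can be sketched as a backward search over point lists. This is a simplified illustration of the procedure as described on this slide, not the published RLORD implementation: it scans plain Python lists instead of R-trees, assumes the first-iteration set R holds the last-type points closer to q than the bound, and the name `backward_search` is hypothetical.

```python
from math import dist, inf


def backward_search(q, ordered_sets, upper_bound):
    """ordered_sets: one point list per feature type, in visiting order.
    Returns (length, route) of the shortest route that beats upper_bound,
    or (inf, None) if no route is shorter than the bound."""
    # Partial routes are (length, points) pairs; start from the last type.
    partial = [(0.0, (p,)) for p in ordered_sets[-1] if dist(q, p) < upper_bound]
    for points in reversed(ordered_sets[:-1]):
        extended = []
        for p in points:
            best = None
            for length, route in partial:
                new_len = dist(p, route[0]) + length
                # Prune: even the direct leg q -> p makes the route too long.
                if dist(q, p) + new_len < upper_bound:
                    if best is None or new_len < best[0]:
                        best = (new_len, (p,) + route)
            if best is not None:
                extended.append(best)  # keep one best partial route per point
        partial = extended
    best_len, best_route = inf, None
    for length, route in partial:
        total = dist(q, route[0]) + length  # add the first leg from q
        if total < best_len:
            best_len, best_route = total, route
    return best_len, best_route
```

Keeping a single best partial route per point is what bounds each iteration to roughly (points of this type) x (partial routes kept), matching the per-iteration counts shown in the running results.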
Our Contributions • Formalized a new nearest neighbor search problem – Multi-Type Nearest Neighbor (MTNN) query problem • Proposed a new algorithm, i.e., Page Level Upper Bound (PLUB) based algorithm • Evaluated the proposed algorithm via cost model and experiment
Key Ideas of PLUB • Prune search space at page level • Create candidate leaf page sequences • Search candidate MTNN in these candidate leaf page sequences
Page Level Upper Bound (PLUB) Algorithm • Step 1: First upper bound search • Use the basic R-tree based nearest neighbor search algorithm with a greedy strategy to find an initial upper bound, which becomes the current upper bound • Step 2: R-tree search • Prune the search space with the current upper bound and form a set of leaf node candidate sequences, using the page level pruning approach • Step 3: Subset search • Search for the candidate MTNN in the leaf node candidate sequences • Repeat from Step 2 for every permutation of feature types, using the candidate MTNN distance as the current upper bound
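Step 1 can be sketched as a greedy chain of nearest-neighbor lookups. This is a minimal sketch under stated assumptions: a linear scan stands in for the R-tree nearest neighbor search, distances are Euclidean, and the name `greedy_upper_bound` is hypothetical.

```python
from math import dist


def greedy_upper_bound(q, ordered_sets):
    """From q, repeatedly jump to the nearest point of the next feature type.
    Returns (route length, route); the length is a valid upper bound on the
    MTNN distance because the greedy route is itself a feasible route."""
    route, total, current = [], 0.0, q
    for points in ordered_sets:
        # Linear scan; an R-tree nearest neighbor query would replace this.
        nearest = min(points, key=lambda p: dist(current, p))
        total += dist(current, nearest)
        route.append(nearest)
        current = nearest
    return total, route
```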
PLUB – An Example • Inputs • q: query point • Euclidean distance • R-tree for each feature type • R(q,r2,b2,g2) is the greedy route • Radius of the circle is d(R(q,r2,b2,g2)) = 3.37 • Rectangles are leaf pages in the R-trees (R1–R4, B1–B4, G1–G4) • [Figure: query point q, the data points, the leaf page rectangles, and the circle]
PLUB – An Example • Leaf page upper bound calculation (current search bound 3.37) • Only the leaf node sequence <R1,B1,G1> is left • [Figure: data points, R-tree leaf pages, and the circle of radius 3.37]
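The slides do not spell out the exact page-level bound, so the following is only one plausible way to filter leaf page sequences: it sums minimum point-to-rectangle and rectangle-to-rectangle distances to get a lower bound on any route through a page sequence, and drops sequences whose lower bound exceeds the current search bound. The rectangle layout (xmin, ymin, xmax, ymax) and all function names are assumptions for illustration.

```python
from itertools import product
from math import hypot


def point_rect_mindist(q, rect):
    """Minimum distance from point q to an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return hypot(dx, dy)


def rect_rect_mindist(a, b):
    """Minimum distance between two axis-aligned rectangles."""
    dx = max(b[0] - a[2], 0.0, a[0] - b[2])
    dy = max(b[1] - a[3], 0.0, a[1] - b[3])
    return hypot(dx, dy)


def candidate_page_sequences(q, pages_per_type, search_bound):
    """Keep the leaf page sequences that could still contain a route no
    longer than search_bound (pages_per_type is in visiting order)."""
    kept = []
    for seq in product(*pages_per_type):
        bound = point_rect_mindist(q, seq[0])
        for a, b in zip(seq, seq[1:]):
            bound += rect_rect_mindist(a, b)
        if bound <= search_bound:
            kept.append(seq)
    return kept
```

Enumerating the full product is only for clarity; a practical version would extend sequences one type at a time and abandon a prefix as soon as its bound exceeds the search bound.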
PLUB – An Example • Search for the candidate MTNN in <R1,B1,G1> (time unit: one P-P distance calculation) • 1st iteration • <g2>, <g10>, <g12>, <g13> • Time 4 • 2nd iteration • <b12,g13>, <b1,g13>, <b2,g2>, <b15,g13> • Time 4x4+4=20 • 3rd iteration • <r10,b15,g13>, <r9,b15,g13>, <r2,b2,g2>, <r11,b1,g13> • Time 4x4+4=20 • Output • Shortest route R(q,r11,b1,g13) with distance 3.16 • [Figure: data points, R-tree leaf pages, and the circle of radius 3.37]
Running Results of RLORD • First iteration (time unit p-p) • <g2>, <g3>, <g4>, <g5>,<g7>,<g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16> • Time 11 • Second iteration • <b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13> • Time 11x12+12=144 • Third iteration • <r1,b11,g3>, <r2,b2,g2>, <r3,b11,g3>, <r8,b1,g13>, <r9,b15,g13>, <r10,b15,g13>, <r11,b1,g13>, <r12,b11,g3>, <r13,b1,g13>, <r14,b1,g13>, <r15,b1,g13> • Time 12x11+11=143 • R(q,r11,b1,g13) is shortest among all routes • Shortest distance value 3.16
Running Time Comparison Table • R-R: rectangle to rectangle distance calculation • P-P: point to point distance calculation • RLORD performs no R-R distance calculations, but many more P-P calculations • Cost of one R-R calculation < 2 x cost of one P-P calculation
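As a quick check, the per-iteration P-P counts quoted on the two example slides can be totalled. Only the P-P counts are used here; the R-R count for PLUB is not given in the surviving text, so it is left out.

```python
# P-P distance calculations per iteration, as quoted in the running examples.
plub_pp = [4, 4 * 4 + 4, 4 * 4 + 4]
rlord_pp = [11, 11 * 12 + 12, 12 * 11 + 11]

print(sum(plub_pp), sum(rlord_pp))  # 44 298
```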
Cost Model for PLUB (For One Permutation) • CR-T + CLF + CPN • CR-T : cost of R-tree traversal to find all R-tree leaf nodes intersected by the circle with radius of current upper bound, centered at query point q • CLF : cost of page level leaf node search for R-tree candidate leaf node sequences • CPN : cost of point level search for candidate MTNN in candidate leaf node sequences
CR-T Model of PLUB • CR-T: R-tree traversal cost • CPR: cost of point to rectangle distance calculation • Nt,i: number of tree nodes visited in the tree traversal for feature type i • CR-T = CPR x Σ Nt,i (i = 1, …, k)
CLF Model of PLUB • CLF: search of R-tree candidate leaf node sequences • NR-R : Number of leaf nodes visited in candidate leaf node sequences search • CR-R : cost of rectangle to rectangle distance calculation • CLF = NR-R x CR-R
CPN Model of PLUB • CPN: cost of searching for the MTNN in the candidate leaf node sequences • FLS: leaf node candidate sequence filtering ability ratio • nl: average number of points in a leaf node over all feature types • pi: number of pages of feature type i • CP-P: cost of point to point distance calculation • Cls: cost of searching for the MTNN in a single leaf node sequence • Cls = CP-P x ((nl + nl x nl) + (nl + nl x nl) + … + (nl + nl x nl)) (k-1 terms) = (k-1) x nl x (nl + 1) x CP-P • CPN = Cls x Π pi x (1 - FLS), i = 1, …, k
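The two formulas above translate directly into code. This is a small sketch with hypothetical function names; `c_pp` defaults to 1 so that costs are counted in P-P distance calculations.

```python
from math import prod


def cost_single_sequence(k, nl, c_pp=1.0):
    """Cls = (k-1) x nl x (nl + 1) x CP-P."""
    return (k - 1) * nl * (nl + 1) * c_pp


def cost_point_level(k, nl, pages, f_ls, c_pp=1.0):
    """CPN = Cls x (p1 x p2 x ... x pk) x (1 - FLS)."""
    return cost_single_sequence(k, nl, c_pp) * prod(pages) * (1.0 - f_ls)
```

With k = 3 and nl = 4, `cost_single_sequence` gives 40, which equals the 20 + 20 calculations of the 2nd and 3rd iterations in the PLUB example.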
Cost Model for RLORD (For One Permutation) • CR-T' + CPS • CR-T': cost of R-tree based coarse pruning, i.e., finding all data points inside the initial upper bound • CR-T' = CR-T + CP-P x nl x (p1 + p2 + p3 + … + pk-1 + pk) • CPS: cost of candidate MTNN search in the remaining subsets • CP-P: cost of point to point distance calculation • CPS = CP-P x nl x ((p1 + nl x p1 x p2) + (p2 + nl x p2 x p3) + … + (pk-1 + nl x pk-1 x pk))
Cost Model Summary of PLUB and RLORD (One Permutation) • In random or approximately random datasets, FLS is not large enough, so PLUB takes more time • In clustered datasets, FLS tends to be very large. PLUB runs faster than RLORD when 1 - FLS < (nl x ((p1 + nl x p1 x p2) + (p2 + nl x p2 x p3) + … + (pk-1 + nl x pk-1 x pk))) / ((k-1) x nl x (nl + 1) x Π pi) • For clustered datasets, this condition holds when the clusters become more compact • Left side: remaining ratio (r-ratio) • Right side: comparison ratio (c-ratio)
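The inequality above compares the two point-level costs with CP-P cancelled out, so it can be evaluated from k, nl, the page counts, and FLS alone. The sketch below uses hypothetical function names and, like the inequality itself, ignores the R-tree traversal and page-level costs.

```python
from math import prod


def comparison_ratio(k, nl, pages):
    """c-ratio: RLORD subset-search cost over PLUB's unfiltered point-level cost."""
    numerator = nl * sum(
        pages[i] + nl * pages[i] * pages[i + 1] for i in range(k - 1)
    )
    denominator = (k - 1) * nl * (nl + 1) * prod(pages)
    return numerator / denominator


def plub_expected_faster(f_ls, k, nl, pages):
    """True when the r-ratio (1 - FLS) is below the c-ratio."""
    return (1.0 - f_ls) < comparison_ratio(k, nl, pages)
```

For example, with k = 3, nl = 4, and four pages per feature type, the c-ratio is 544 / 2560 = 0.2125, so under this model PLUB is expected to win only if more than about 79% of the leaf node sequences are filtered out.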
Synthetic Data Sets Generation • Randomly generate cluster center in rectangle with bottom-left (0,0) and top-right point (10000,10000) • Constraint: the minimum distance between two cluster centers is minCCDist • Around every cluster center, generate cluster member points • Maximum distance from member point to cluster center is ClusterSize • Simplified maximum cluster center distance is determined by: • maxCCDist = 10000.0/(int)(sqrt(CN)+1) • Thus minimum cluster center distance when generating cluster center is as follows: • minCCDist = BCF x maxCCDist • Then the cluster size is: • ClusterSize = ICF x minCCDist
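The generation procedure above can be sketched as follows. Several details are assumptions not stated on the slide: the number of points per cluster (`points_per_cluster`), uniform sampling of members inside a disc of radius ClusterSize, rejection sampling for the centers, and the retry limit. Members near the border may fall slightly outside the 10000 x 10000 rectangle in this sketch.

```python
import random
from math import cos, dist, pi, sin, sqrt


def generate_clusters(cn, bcf, icf, points_per_cluster, seed=0,
                      extent=10000.0, max_tries=100000):
    """Return (centers, points, min_cc_dist, cluster_size) for one data set."""
    rng = random.Random(seed)
    max_cc_dist = extent / int(sqrt(cn) + 1)   # simplified maximum center distance
    min_cc_dist = bcf * max_cc_dist            # BCF scales the center spacing
    cluster_size = icf * min_cc_dist           # ICF scales the cluster radius
    centers, tries = [], 0
    while len(centers) < cn:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all cluster centers")
        c = (rng.uniform(0, extent), rng.uniform(0, extent))
        # Reject a center that is too close to an already accepted one.
        if all(dist(c, other) >= min_cc_dist for other in centers):
            centers.append(c)
    points = []
    for cx, cy in centers:
        for _ in range(points_per_cluster):
            r = cluster_size * sqrt(rng.random())  # uniform over the disc
            theta = rng.uniform(0, 2 * pi)
            points.append((cx + r * cos(theta), cy + r * sin(theta)))
    return centers, points, min_cc_dist, cluster_size
```

For CN = 20 the formula gives maxCCDist = 10000 / 5 = 2000, so BCF = 0.5 and ICF = 0.3 yield minCCDist = 1000 and ClusterSize = 300.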
Experiment Parameters • Feature Types:2-7 • Between-cluster Compactness Factor (BCF): 0.1-1.0 • In-cluster Compactness Factor (ICF):0.1-0.5 • Cluster Number(CN):20,50,100,200
Synthetic Datasets Example • BCF=0.5,ICF=0.5,CN=20,Feature Type=2 • BCF=0.5,ICF=0.3,CN=20,Feature Type=2
Experiment Setup & Data Sets • Setup • C / Pentium-IV 3.2GHz / Linux / 1GB Memory / Synthetic data • Experiments on synthetic data • Scalability test in terms of the number of feature types • Effect of data set density • Effect of between-cluster compactness factor • Effect of in-cluster compactness factor
Scalability Test • Parameters • Fixed: BCF=0.1, ICF = 0.1, CN=20 • Variable: feature types (2-7) • Trend • PLUB is much faster when number of features is high
Effect of Data Sets Density • Parameters • Fixed: FT = 7, BCF=0.1, ICF=0.5 • Variable: cluster number (20,50,100,200) • Trend • PLUB is always faster than RLORD for all densities of data sets
Effect of Between-cluster Compactness Factor • Parameters • Fixed: FT = 7, ICF = 0.3, CN = 50 • Variable: BCF (0.1-1.0)
Effect of Between-cluster Compactness Factor • Top: execution time vs. BCF • Trend • PLUB is faster than RLORD when BCF is less than 0.7 • PLUB is slower than RLORD when BCF is greater than 0.7
Effect of Between-cluster Compactness Factor • Bottom: remaining ratio (r-ratio) and comparison ratio (c-ratio) vs. BCF • Trend • Both ratios increase as BCF increases • Remaining ratio is less than comparison ratio when BCF is less than 0.8
Effect of Between-cluster Compactness Factor • Contradiction? • The remaining ratio increases, which means the pruning ratio decreases, yet the execution time decreases • Explanation: when BCF increases, fewer leaf nodes intersect the current search bound, so the total number of possible candidate leaf node sequences decreases dramatically
Effect of Between-cluster Compactness Factor • Key information • When the remaining ratio is less than the comparison ratio, PLUB runs faster than RLORD • When the remaining ratio is greater than the comparison ratio, PLUB takes more time than RLORD
Effect of In-cluster Compactness Factor • Parameters • Fixed: FT = 7, BCF = 0.1, CN = 50 • Variable: ICF (0.1-0.5) • Trend • PLUB is always faster than RLORD for ICF from 0.1 to 0.5
Conclusion and Future Work • Formalized the MTNN query problem • Proposed a PLUB based algorithm for MTNN queries • Compared PLUB and RLORD • Future work: design heuristic algorithms to tackle the MTNN query problem for a large number of feature types
References • [1] M. Kolahdouzan, M. Sharifzadeh and C. Shahabi. The Optimal Sequenced Route Query. USC, CS Dept., Tech. Report 05-840, 2005. • [2] Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios and Shang-Hua Teng. On Trip Planning Queries in Spatial Databases. SSTD 2005.