Indexing Multidimensional Data: A Mapping Based Approach
Rui Zhang
Outline
• Background
  • Multidimensional data and queries
• Mapping based multidimensional indexing and query processing
  • General strategy
  • Window queries
  • K nearest neighbor (KNN) queries
• Summary and future work
Multidimensional Data
• Spatial data (low dimensionality)
  • Geographic information: Melbourne (37, 145)
    • Which city is at (30, 140)?
  • Computer-aided design: width and height (40, 50)
    • Is there any part that has a width of 40 and a height of 50?
• Records with multiple attributes (medium dimensionality)
  • Employee (ID, age, score, salary, …)
    • Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000?
• Multimedia data (high dimensionality)
  • Multimedia features: color, shape, texture
  • Color histograms of images
    • Give me the image most similar to a given query image.
Multidimensional Queries
• Point query
  • Return the objects located at Q(x1, x2, …, xd).
  • E.g. Q = (3.4, 6.6)
• Window query
  • Return all the objects enclosed or intersected by the hyper-rectangle W{[L1, U1], [L2, U2], …, [Ld, Ud]}.
  • E.g. W = {[0,4], [2,5]}
• K-nearest neighbor query (KNN query)
  • Return k objects whose distances to Q are no larger than any other object's distance to Q.
  • E.g. 3NN of Q = (4, 1)
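The three query types can be pinned down with brute-force scans over point data. The following is only a reference sketch of the definitions, not an index; the function names and the sample points are made up for illustration, and Euclidean distance is assumed for the KNN query.

```python
from math import dist  # Euclidean distance, Python 3.8+

def point_query(points, q):
    """Return the objects located exactly at q."""
    return [p for p in points if p == q]

def window_query_scan(points, window):
    """Return the points inside the hyper-rectangle given as [(L1, U1), ..., (Ld, Ud)]."""
    return [p for p in points
            if all(lo <= x <= hi for x, (lo, hi) in zip(p, window))]

def knn_query(points, q, k):
    """Return k points whose distance to q is no larger than that of any other point."""
    return sorted(points, key=lambda p: dist(p, q))[:k]

# Hypothetical sample data, reusing the query examples from the slide.
points = [(3.4, 6.6), (1, 3), (2, 1), (5, 1), (4, 3), (9, 9)]
at_q = point_query(points, (3.4, 6.6))
in_window = window_query_scan(points, [(0, 4), (2, 5)])
nearest3 = knn_query(points, (4, 1), 3)
```

Each scan touches every point, which is the cost that the mapping based indexes in the rest of the talk aim to avoid.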
Mapping Based Multidimensional Indexing
• Story
  • The CBD: {[0,4], [2,5]}
  • Blocks in the CBD are [8,15], [32,33] and [36,37]
• General strategy: three steps
  • Data mapping and indexing
  • Query mapping and data retrieval
  • Filtering out false positives
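The three steps can be sketched in a few lines of Python. The slide does not say which mapping numbers the blocks; the block ranges [8,15], [32,33] and [36,37] are consistent with a Z-order (bit-interleaving) curve over unit cells with the y bit in the more significant position, so that is assumed here. A sorted list stands in for the one-dimensional index, the window is treated as half-open, coordinates are assumed non-negative, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from math import ceil, floor

def z_value(cx, cy, bits=3):
    """Interleave the bits of a cell (cx, cy); y bits take the odd positions."""
    z = 0
    for i in range(bits):
        z |= ((cx >> i) & 1) << (2 * i)
        z |= ((cy >> i) & 1) << (2 * i + 1)
    return z

def build_index(points, bits=3):
    """Step 1: map each point to the number of its unit cell and sort by it."""
    return sorted((z_value(floor(x), floor(y), bits), (x, y)) for x, y in points)

def window_to_intervals(window, bits=3):
    """Step 2a: map the window to ranges of consecutive cell numbers."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    cells = sorted(z_value(cx, cy, bits)
                   for cx in range(floor(x_lo), ceil(x_hi))
                   for cy in range(floor(y_lo), ceil(y_hi)))
    intervals = []
    for z in cells:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [tuple(iv) for iv in intervals]

def window_query(index, window, bits=3):
    """Steps 2b and 3: fetch each key range, then filter out false positives."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    keys = [k for k, _ in index]
    result = []
    for lo, hi in window_to_intervals(window, bits):
        for _, (x, y) in index[bisect_left(keys, lo):bisect_right(keys, hi)]:
            if x_lo <= x < x_hi and y_lo <= y < y_hi:
                result.append((x, y))
    return result

cbd = ((0, 4), (2, 5))
cbd_blocks = window_to_intervals(cbd)
```

Under these assumptions `cbd_blocks` reproduces the three block ranges listed for the CBD. The filtering step matters when the window does not line up with cell boundaries, because a partly covered cell can contribute points that lie outside the window.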
Another Mapping Example
• Story continued
  • The CBD: {[0,4], [2,5]}
  • Streets intersected by the CBD are [11,14], [21,22] and 41
• The Pyramid-tree [SIGMOD’98]
  • Data space divided into 2d pyramids (d is the number of dimensions)
  • Streets are parallel to the base of the pyramid
  • Data mapping
    • Objects mapped to the street numbers
  • Query mapping
    • Query window mapped to all the intersected streets
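For the data mapping step, the sketch below follows the pyramid value as I understand it from the Pyramid-Technique paper cited as [SIGMOD’98]: a point in the unit cube is assigned the pyramid it falls in plus its height above that pyramid's top at the center of the space. The street numbers in the story (11, 14, 21, …) evidently use a different encoding for illustration, and the function name here is hypothetical.

```python
def pyramid_value(v):
    """Map a point v in [0,1]^d to a single number: pyramid index + height.

    The pyramid is chosen by the dimension in which v is farthest from the
    center 0.5; pyramids 0..d-1 lie below the center, d..2d-1 above it.
    """
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])      # distance from the top of the pyramid
    return pyramid + height

# Points at the same height in the same pyramid share a key, like buildings
# on the same street.
left = pyramid_value((0.1, 0.5))      # pyramid 0, height 0.4
top = pyramid_value((0.5, 0.9))       # pyramid 3 (d = 2), height 0.4
```

Because the height never exceeds 0.5, keys from different pyramids cannot collide, so one sorted index over these values can hold the whole data set.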
Deficiency of the Pyramid-tree
• Sensitivity to the location of the query window
Magic of mapping
• A set of d functions t1, t2, …, td, where each ti is:
  • a bijection from [0,1] to [0,1]
  • monotonic
  • such that ti(ci) = 0.5
• Apply ti to the query, so that:
  • The answers of the transformed queries over the transformed data are the answers of the original query over the original data.
[Figure: example transformation functions for ci = 0.25 and ci = 0.707]
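One function that meets all three conditions is a power function t(x) = x^r with r = -1 / log2(c). The slide does not say which function is used, so this choice is an assumption made for illustration; the helper names are hypothetical as well. The sketch also checks the stated property that a point lies in the original window exactly when the transformed point lies in the transformed window.

```python
import math

def make_transform(c):
    """Return t(x) = x ** r with r chosen so that t(c) = 0.5 (needs 0 < c < 1)."""
    r = -1.0 / math.log2(c)
    return lambda x: x ** r

def transform_point(p, ts):
    return tuple(t(x) for t, x in zip(ts, p))

def transform_window(window, ts):
    # A monotonically increasing t maps [lo, hi] onto [t(lo), t(hi)].
    return [(t(lo), t(hi)) for t, (lo, hi) in zip(ts, window)]

def inside(p, window):
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, window))

# One function per dimension, using the two c values shown on the slide.
ts = [make_transform(0.25), make_transform(0.707)]
window = [(0.1, 0.4), (0.2, 0.5)]
points = [(0.3, 0.3), (0.5, 0.3), (0.15, 0.9)]
same_answers = all(
    inside(p, window) == inside(transform_point(p, ts), transform_window(window, ts))
    for p in points
)
```

With c = 0.25 the exponent is 0.5, and with c = 0.707 it is close to 2, so the first function stretches the low end of the axis and the second compresses it.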
The P+-tree [ICDE’04]
• Two measures
  • Space division
  • Mapping the data
• Performance
Mapping for KNN Queries
• Story continued
  • New factory at Q = (4, 1)
  • Find the 3 nearest buildings to Q
• Termination condition
  • K candidates have been found
  • All of them lie in the current search circle
[Figure: search circle around Q growing through R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10. Distances to Q: ||AQ|| = 3.31, ||BQ|| = 1.81, ||CQ|| = 1.84, ||DQ|| = 2.05, ||EQ|| = 3.00, ||FQ|| = 3.62]
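The termination condition can be traced with the distances from the figure. The sketch below assumes the radius grows in equal steps of 0.35, as the listed R values suggest, and it scans all distances in every round; an actual index would only examine objects on the streets the circle intersects. The function name is hypothetical.

```python
def expanding_search(distances, k, step):
    """Grow the search radius until k candidates lie inside the circle.

    distances maps an object name to its distance from the query point.
    Returns the k nearest names and the radius at which the search stopped.
    """
    rounds = 0
    while True:
        rounds += 1
        radius = rounds * step
        inside = sorted((d, name) for name, d in distances.items() if d <= radius)
        # Stop when k candidates are inside, or when nothing is left to find.
        if len(inside) >= k or len(inside) == len(distances):
            return [name for _, name in inside[:k]], radius

slide_distances = {"A": 3.31, "B": 1.81, "C": 1.84, "D": 2.05, "E": 3.00, "F": 3.62}
neighbors, radius = expanding_search(slide_distances, k=3, step=0.35)
```

With these numbers no object is inside the circle up to R = 1.75, and the sixth round (R = 2.10) contains B, C and D, which ends the search.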
The iDistance [TODS’05a]
• Data partitioned into a number of clusters
• Streets are concentric circles
• Data mapping
  • Objects mapped to street numbers
• Query mapping
  • Search circle mapped to the streets it intersects
• Performance
Summary
• P+-tree for window queries [ICDE’04]
• iDistance for kNN queries [TODS’05a]
• Both rely on a function for mapping data and queries; the efficiency lies in the design of the mapping function
• Generalized multidimensional data mapping and query processing [TODS’05b]
Recent Work and Trends
• Queries on moving objects, continuous queries
  • Predictive range and kNN queries [InfSys’10]
  • Continuous retrieval of 3D objects [ICDE’08b, VLDBJ’10b]
  • Continuous intersection join [ICDE’08a, VLDBJ’12]
  • Continuous kNN join [GeoInformatica’10]
  • (Continuous) moving kNN queries [VLDB’08b, VLDBJ’10a, InfSys’13]
  • Other types of incremental queries [TKDE’10]
• Temporal queries
  • Version index with compression [VLDB’08a]
  • Memory hierarchy friendly index, HV-tree [VLDB’10]
References
• [SIGMOD’98] Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. ACM SIGMOD International Conference on Management of Data (SIGMOD), 1998.
• [ICDE’04] Rui Zhang, Beng Chin Ooi, Kian-Lee Tan. Making the Pyramid Technique Robust to Query Types and Workloads. IEEE International Conference on Data Engineering (ICDE), 2004.
• [TODS’05a] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005.
• [TODS’05b] Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005.
• [VLDB’00] Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer. Integrating the UB-Tree into a Database System Kernel. International Conference on Very Large Data Bases (VLDB), 2000.
• [PODS’00] Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Stéphane Bressan. Indexing the Edges - A Simple and Yet Efficient Approach to High-Dimensional Indexing. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2000.
• [ICDE’08a] Rui Zhang, Dan Lin, Kotagiri Ramamohanarao, Elisa Bertino. Continuous Intersection Joins over Moving Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 863-872, April 7-12, 2008.
• [ICDE’08b] Mohammed Eunus Ali, Rui Zhang, Egemen Tanin, Lars Kulik. A Motion-Aware Approach to Continuous Retrieval of 3D Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 843-852, April 7-12, 2008.
• [VLDB’08a] David Lomet, Mingsheng Hong, Rimma Nehme, Rui Zhang. Transaction Time Indexing with Version Compression. Proceedings of the VLDB Endowment (PVLDB), 1(1): 870-881, 2008.
• [VLDB’08b] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. The V*-Diagram: A Query Dependent Approach to Moving KNN Queries. Proceedings of the VLDB Endowment (PVLDB), 1(1): 1095-1106, 2008.
• [GeoInformatica’10] Cui Yu, Rui Zhang, Yaochun Huang, Hui Xiong. High-dimensional kNN joins with incremental updates. GeoInformatica, 14(1): 55-82, 2010.
• [VLDB’10] Rui Zhang, Martin Stradling. The HV-tree: a Memory Hierarchy Aware Version Index. Proceedings of the VLDB Endowment (PVLDB), 3(1): 397-408, 2010.
• [VLDBJ’10a] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. Analysis and Evaluation of V*-kNN: An Efficient Algorithm for Moving kNN Queries. VLDB Journal, 19(3): 307-332, 2010.
• [VLDBJ’10b] Mohammed Eunus Ali, Egemen Tanin, Rui Zhang, Lars Kulik. A Motion-Aware Approach for Efficient Evaluation of Continuous Queries on 3D Object Databases. VLDB Journal, 19(5): 603-632, 2010.
• [TKDE’10] Sarana Nutanong, Egemen Tanin, Rui Zhang. Incremental Evaluation of Visible Nearest Neighbor Queries. IEEE Transactions on Knowledge & Data Engineering (TKDE), 22(5): 665-681, 2010.
• [InfSys’10] Rui Zhang, H. V. Jagadish, Bing Tian Dai, Kotagiri Ramamohanarao. Optimized Algorithms for Predictive Range and KNN Queries on Moving Objects. Information Systems, 35(8): 911-932, 2010.
• [VLDBJ’12] Rui Zhang, Jianzhong Qi, Dan Lin, Wei Wang, Raymond Chi-Wing Wong. A Highly Optimized Algorithm for Continuous Intersection Join Queries over Moving Objects. VLDB Journal, 21(4): 561-586, 2012.
• [InfSys’13] Tanzima Hashem, Lars Kulik, Rui Zhang. Countering Overlapping Rectangle Privacy Attack for Moving kNN Queries. Information Systems, 38(3): 430-453, 2013.