This book covers multidimensional data indexing, mapping techniques, query processing strategies, and future trends. It delves into spatial data, multimedia features, and multidimensional queries, including point, window, and k-nearest neighbor queries. The methodology involves mapping data for efficient retrieval, query transformation, and false positive filtering to enhance performance. The content also discusses advanced techniques like the Pyramid-tree, P+-tree, and iDistance for improved query processing and data mapping. Recent advancements in multidimensional data processing, such as continuous queries and predictive range queries, are also highlighted. References to influential works in the field are provided for further exploration.
Indexing Multidimensional Data: A Mapping Based Approach • Rui Zhang
Outline • Background • Multidimensional data and queries • Mapping based multidimensional indexing and query processing • General strategy • Window queries • K nearest neighbor (KNN) queries • Summary and future work
Multidimensional Data • Spatial data (low-dimensionality) • Geographic information: Melbourne (37, 145) • Which city is at (30, 140)? • Computer-aided design: width and height (40, 50) • Is there any part with a width of 40 and a height of 50? • Records with multiple attributes (medium-dimensionality) • Employee (ID, age, score, salary, …) • Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000? • Multimedia data (high-dimensionality) • Color histograms of images • Give me the most similar image to a given image • Multimedia features: color, shape, texture
Multidimensional Queries • Point query • Return the objects located at Q(x1, x2, …, xd). • E.g. Q=(3.4, 6.6). • Window query • Return all the objects enclosed or intersected by the hyper-rectangle W={[L1, U1], [L2, U2], …, [Ld, Ud]}. • E.g. W={[0,4],[2,5]} • K-Nearest Neighbor Query (KNN Query) • Return k objects whose distances to Q are no larger than any other object’s distance to Q. • E.g. 3NN of Q=(4,1)
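The three query types can be stated as brute-force scans, which is useful as a reference when checking an index. The following is a minimal sketch, not taken from the slides: the function names and the sample `points` are hypothetical, points are assumed to be tuples of floats, and Euclidean distance is assumed for the KNN query.

```python
import heapq
import math

def point_query(data, q):
    """Return the objects located exactly at q."""
    return [p for p in data if p == q]

def window_query(data, window):
    """Return the points enclosed by the hyper-rectangle.

    `window` is a list of (L_i, U_i) pairs, one per dimension.
    """
    return [p for p in data
            if all(lo <= x <= hi for x, (lo, hi) in zip(p, window))]

def knn_query(data, q, k):
    """Return the k points closest to q, nearest first (Euclidean)."""
    return heapq.nsmallest(k, data, key=lambda p: math.dist(p, q))

# Hypothetical sample data; the query values are the ones on the slide.
points = [(3.4, 6.6), (1.0, 3.0), (3.5, 4.5), (5.0, 1.0), (4.5, 2.0), (6.0, 6.0)]

at_q = point_query(points, (3.4, 6.6))
in_w = window_query(points, [(0, 4), (2, 5)])
nn3 = knn_query(points, (4, 1), 3)
```

Each scan touches every object, which is the cost the mapping-based indexes aim to avoid.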
Mapping Based Multidimensional Indexing • Story • The CBD: {[0,4],[2,5]} • Blocks in the CBD are: [8,15], [32,33] and [36,37] • General strategy: three steps • Data mapping and indexing • Query mapping and data retrieval • Filtering out false positives
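The block numbers in the story are consistent with a Z-order (bit-interleaving) numbering of unit grid cells in which the y bit is the more significant bit of each pair. That numbering is an assumption here, since the slide does not name the curve. Under that assumption, the three steps can be sketched as follows; a sorted list with `bisect` stands in for a B+-tree, the window is treated as half-open for simplicity, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from math import ceil, floor

def z_value(x, y, bits=3):
    """Interleave the bits of integer cell coordinates (y bit above x bit)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def window_to_intervals(window, bits=3):
    """Query mapping: 1-D key intervals covering every cell the window overlaps."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    zs = sorted(z_value(x, y, bits)
                for x in range(floor(x_lo), ceil(x_hi))
                for y in range(floor(y_lo), ceil(y_hi)))
    intervals = []
    for z in zs:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [tuple(iv) for iv in intervals]

def build_index(points, bits=3):
    """Step 1, data mapping and indexing: sort points by the key of their cell."""
    return sorted((z_value(int(px), int(py), bits), (px, py)) for px, py in points)

def window_query(index, window, bits=3):
    (x_lo, x_hi), (y_lo, y_hi) = window
    keys = [key for key, _ in index]
    answers = []
    # Step 2, query mapping and data retrieval: one range scan per interval.
    for lo, hi in window_to_intervals(window, bits):
        for _, (px, py) in index[bisect_left(keys, lo):bisect_right(keys, hi)]:
            # Step 3, filtering: drop points whose cell overlaps the window
            # but which lie outside it.
            if x_lo <= px < x_hi and y_lo <= py < y_hi:
                answers.append((px, py))
    return answers

cbd_blocks = window_to_intervals(((0, 4), (2, 5)))
index = build_index([(0.2, 2.1), (1.5, 3.5), (3.2, 4.4), (6.0, 6.0), (3.8, 2.7)])
hits = window_query(index, ((0.5, 3.5), (2.5, 4.5)))
```

For the CBD window the sketch yields the same three intervals as the story. The second query is not aligned to cell boundaries, so the range scans also fetch (0.2, 2.1) and (3.8, 2.7), which the filtering step then discards.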
Another mapping example • Story continued • The CBD: {[0,4],[2,5]} • Streets intersected by the CBD are: [11,14], [21,22] and 41 • The Pyramid-tree [SIGMOD’98] • Data space divided into 2d pyramids • Streets are parallel to the base of the pyramid • Data mapping • Objects mapped to the street numbers • Query mapping • Query window mapped to all the intersected streets
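The data mapping of the Pyramid technique can be sketched as below, following the pyramid value defined in [SIGMOD’98]: the pyramid number plus the height of the point above the pyramid's tip. The sketch assumes points normalised to the unit hypercube [0,1]^d with the tip at the centre; the function name is hypothetical, and the query mapping (finding the height range per intersected pyramid) is deliberately left out.

```python
def pyramid_value(v):
    """Map a point in [0,1]^d to a 1-D key: pyramid number + height.

    The dimension with the largest deviation from the centre 0.5 decides
    which of the 2d pyramids contains the point; the deviation itself is
    the height, i.e. the "street" within that pyramid.
    """
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])
    return pyramid + height

# Heights lie in [0, 0.5], so keys of different pyramids never overlap
# and a single B+-tree over the keys can hold all pyramids.
keys = sorted(pyramid_value(p) for p in [(0.1, 0.5), (0.5, 0.9), (0.6, 0.2)])
```

Because a key records only one coordinate's deviation, several points share the same street, which is why a filtering step is still needed after retrieval.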
Deficiency of the Pyramid-tree • Sensitivity to location of query window • Magic of mapping • A set of d functions t1, t2, …, td, where each ti is: • a bijection from [0,1] to [0,1] • monotonic • such that ti(ci) = 0.5 • Apply ti to the data and the query, so that: • The answers of the transformed queries over the transformed data are the answers of the original query over the original data. • [Figure: examples with ci = 0.25 and ci = 0.707]
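One function with the three listed properties is t(x) = x^(-1/log2 c), since t(c) = 2^(-1) = 0.5 for any c strictly between 0 and 1. The slide does not say which function is used, so this choice is an assumption made for illustration. The sketch below applies it per dimension to both data and window and checks that the answer set is unchanged; the sample data and names are hypothetical, while the centres 0.25 and 0.707 are the two values shown on the slide.

```python
import math

def make_transform(c):
    """Return t with t(0) = 0, t(1) = 1, t(c) = 0.5, strictly increasing on [0,1]."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    exponent = -1.0 / math.log2(c)   # positive because log2(c) < 0
    return lambda x: x ** exponent

def in_window(p, window):
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, window))

centres = (0.25, 0.707)
ts = [make_transform(c) for c in centres]

data = [(0.10, 0.60), (0.30, 0.75), (0.20, 0.90), (0.80, 0.70)]
window = [(0.15, 0.35), (0.65, 0.95)]

# Transform data and query with the same per-dimension functions.
t_data = [tuple(t(x) for t, x in zip(ts, p)) for p in data]
t_window = [(t(lo), t(hi)) for t, (lo, hi) in zip(ts, window)]

original = [i for i, p in enumerate(data) if in_window(p, window)]
transformed = [i for i, p in enumerate(t_data) if in_window(p, t_window)]
```

Monotonicity is what preserves the answers: lo <= x <= hi holds exactly when t(lo) <= t(x) <= t(hi), so the window query can be run entirely in the transformed space.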
The P+-tree [ICDE’04] • Two measures • Space division • Mapping the data • Performance
Mapping for KNN Queries • Story continued • New factory at Q=(4,1) • Find the 3 nearest buildings to Q • Termination condition • K candidates • All in the current search circle • [Figure: search circle around Q enlarged step by step, R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10; distances to Q: ||AQ|| = 3.31, ||BQ|| = 1.81, ||CQ|| = 1.84, ||DQ|| = 2.05, ||EQ|| = 3.00, ||FQ|| = 3.62]
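The termination condition can be traced with the distances given on the slide. The sketch below is a simplification: it filters on the known distances directly instead of going through an index, and it assumes the radius grows in steps of 0.35, as the sequence of R values suggests. The function name is hypothetical.

```python
def incremental_knn(distances, k, step):
    """Enlarge the search circle until k objects lie inside it.

    `distances` maps an object name to its distance from the query point.
    Returns the final radius and the k nearest names, nearest first.
    """
    rounds = 0
    while True:
        rounds += 1
        radius = rounds * step
        inside = sorted((d, name) for name, d in distances.items() if d <= radius)
        # Stop with k candidates in the circle, or when nothing is left to find.
        if len(inside) >= k or len(inside) == len(distances):
            return radius, [name for _, name in inside[:k]]

# Distances ||AQ|| ... ||FQ|| as listed on the slide.
dist_to_q = {"A": 3.31, "B": 1.81, "C": 1.84, "D": 2.05, "E": 3.00, "F": 3.62}
radius, nearest = incremental_knn(dist_to_q, k=3, step=0.35)
```

With these values the circle stays empty through R = 1.75 and first contains three buildings (B, C and D) at R = 2.10, which matches the last radius shown on the slide.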
The iDistance [TODS’05a] • Data partitioned into a number of clusters • Streets are concentric circles • Data mapping • Objects mapped to street numbers • Query mapping • Search circle mapped to streets intersected • Performance
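A simplified sketch of the idea follows. Each point is assigned to its nearest reference point O_i and mapped to the key i·c + dist(p, O_i); a search circle of radius r around q is mapped, per partition, to the key range of the concentric "streets" it can intersect. The class name, the constant `c`, the choice of reference points, and the use of a sorted list in place of a B+-tree are all assumptions made for illustration, and the search rescans from scratch each round rather than expanding incrementally.

```python
import math
from bisect import bisect_left, bisect_right

class IDistanceSketch:
    def __init__(self, points, refs, c=100.0):
        # c must exceed the largest distance inside any partition so that
        # the key ranges of different partitions do not overlap.
        self.refs, self.c = refs, c
        self.radius = [0.0] * len(refs)   # farthest point per partition
        entries = []
        for p in points:
            i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
            d = math.dist(p, refs[i])
            self.radius[i] = max(self.radius[i], d)
            entries.append((i * c + d, p))
        entries.sort()
        self.keys = [key for key, _ in entries]
        self.points = [p for _, p in entries]

    def candidates(self, q, r):
        """Points on the streets intersected by the circle (q, r)."""
        found = []
        for i, ref in enumerate(self.refs):
            dq = math.dist(q, ref)
            if dq - r > self.radius[i]:
                continue                  # circle misses this partition
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.radius[i], dq + r)
            found.extend(self.points[bisect_left(self.keys, lo):
                                     bisect_right(self.keys, hi)])
        return found

    def knn(self, q, k, step=0.35):
        limit = max(math.dist(q, ref) + rad
                    for ref, rad in zip(self.refs, self.radius))
        r = step
        while True:
            top = sorted(self.candidates(q, r),
                         key=lambda p: math.dist(p, q))[:k]
            # Terminate when k candidates all lie in the search circle,
            # or when the circle already covers every partition.
            if (len(top) == k and math.dist(top[-1], q) <= r) or r >= limit:
                return top
            r += step

pts = [(1.0, 1.0), (2.0, 5.0), (5.0, 2.0), (6.0, 1.0),
       (4.0, 4.0), (7.0, 6.0), (3.0, 0.5)]
idx = IDistanceSketch(pts, refs=[(2.0, 2.0), (6.0, 4.0)])
result = idx.knn((4.0, 1.0), 3)
```

By the triangle inequality, any point within r of q has a distance to its reference point between dist(q, O_i) − r and dist(q, O_i) + r, so the scanned streets cannot miss a true neighbour. They can contain points outside the circle, which is why the termination test checks the k-th candidate against r.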
Summary • P+-tree for window queries [ICDE’04] • iDistance for kNN queries [TODS’05a] • A function for mapping data and queries; efficiency lies in the design of the mapping function • Generalized Multidimensional Data Mapping and Query Processing [TODS’05b]
Recent Work and Trends • Queries on moving objects, continuous queries • Predictive range and kNN queries [InfSys’10] • Continuous retrieval of 3D objects [ICDE’08b, VLDBJ’10b] • Continuous intersection join [ICDE’08a, VLDBJ’12] • Continuous kNN join [GeoInformatica’10] • (Continuous) Moving kNN queries [VLDB’08b, VLDBJ’10a, InfSys’13] • Other types of incremental queries [TKDE’10] • Temporal queries • Version index with compression [VLDB’08a] • Memory hierarchy friendly index, HV-tree [VLDB’10]
References • [SIGMOD’98] Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. ACM SIGMOD International Conference on Management of Data (SIGMOD) 1998. • [ICDE’04] Rui Zhang, Beng Chin Ooi, Kian-Lee Tan. Making the Pyramid Technique Robust to Query Types and Workloads. IEEE International Conference on Data Engineering (ICDE) 2004. • [TODS’05a] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005. • [TODS’05b] Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005. • [VLDB’00] Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer. Integrating the UB-Tree into a Database System Kernel. International Conference on Very Large Data Bases (VLDB) 2000. • [PODS’00] Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Stéphane Bressan. Indexing the Edges - A Simple and Yet Efficient Approach to High-Dimensional Indexing. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) 2000. • [ICDE’08a] Rui Zhang, Dan Lin, Kotagiri Ramamohanarao, Elisa Bertino. Continuous Intersection Joins over Moving Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 863-872, April 7-12, 2008. • [ICDE’08b] Mohammed Eunus Ali, Rui Zhang, Egemen Tanin, Lars Kulik. A Motion-Aware Approach to Continuous Retrieval of 3D Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 843-852, April 7-12, 2008. • [VLDB’08a] David Lomet, Mingsheng Hong, Rimma Nehme, Rui Zhang. Transaction Time Indexing with Version Compression. Proceedings of the VLDB Endowment (PVLDB), 1(1): 870-881, 2008.
• [VLDB’08b] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. The V*-Diagram: A Query Dependent Approach to Moving KNN Queries. Proceedings of the VLDB Endowment (PVLDB), 1(1): 1095-1106, 2008. • [GeoInformatica’10] Cui Yu, Rui Zhang, Yaochun Huang, Hui Xiong. High-dimensional kNN joins with incremental updates. GeoInformatica, 14(1): 55-82, 2010. • [VLDB’10] Rui Zhang, Martin Stradling. The HV-tree: a Memory Hierarchy Aware Version Index. Proceedings of the VLDB Endowment (PVLDB), 3(1): 397-408, 2010. • [VLDBJ’10a] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. Analysis and Evaluation of V*-kNN: An Efficient Algorithm for Moving kNN Queries. VLDB Journal, 19(3): 307-332, 2010. • [VLDBJ’10b] Mohammed Eunus Ali, Egemen Tanin, Rui Zhang, Lars Kulik. A Motion-Aware Approach for Efficient Evaluation of Continuous Queries on 3D Object Databases. VLDB Journal, 19(5): 603-632, 2010. • [TKDE’10] Sarana Nutanong, Egemen Tanin, Rui Zhang. Incremental Evaluation of Visible Nearest Neighbor Queries. IEEE Transactions on Knowledge & Data Engineering (TKDE), 22(5): 665-681, 2010. • [InfSys’10] Rui Zhang, H. V. Jagadish, Bing Tian Dai, Kotagiri Ramamohanarao. Optimized Algorithms for Predictive Range and KNN Queries on Moving Objects. Information Systems, 35(8): 911-932, 2010. • [VLDBJ’12] Rui Zhang, Jianzhong Qi, Dan Lin, Wei Wang, Raymond Chi-Wing Wong. A Highly Optimized Algorithm for Continuous Intersection Join Queries over Moving Objects. VLDB Journal, 21(4): 561-586, 2012. • [InfSys’13] Tanzima Hashem, Lars Kulik, Rui Zhang. Countering Overlapping Rectangle Privacy Attack for Moving kNN Queries. Information Systems, 38(3): 430-453, 2013.