
Indexing Multidimensional Data: A Mapping Based Approach

Rui Zhang. Outline: background (multidimensional data and queries); mapping based multidimensional indexing and query processing (general strategy, window queries, K nearest neighbor (KNN) queries); summary and future work.


Presentation Transcript


  1. Indexing Multidimensional Data: A Mapping Based Approach • Rui Zhang

  2. Outline • Background • Multidimensional data and queries • Mapping based multidimensional indexing and query processing • General strategy • Window queries • K nearest neighbor (KNN) queries • Summary and future work

  3. Multidimensional Data • Spatial data (low-dimensionality) • Geographic information: Melbourne (37, 145) • Which city is at (30, 140)? • Computer aided design: width and height (40, 50) • Is there any part that has a width of 40 and a height of 50? • Records with multiple attributes (medium-dimensionality) • Employee (ID, age, score, salary, …) • Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000? • Multimedia data (high-dimensionality) • Color histograms of images • Give me the most similar image to [query image] • Multimedia features: color, shape, texture

  4. Multidimensional Queries • Point query • Return the objects located at Q(x1, x2, …, xd). • E.g. Q = (3.4, 6.6). • Window query • Return all the objects enclosed or intersected by the hyper-rectangle W = {[L1, U1], [L2, U2], …, [Ld, Ud]}. • E.g. W = {[0,4], [2,5]}. • K-Nearest Neighbor query (KNN query) • Return k objects whose distances to Q are no larger than any other object's distance to Q. • E.g. 3NN of Q = (4, 1).
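
The three query types above can be pinned down with brute-force reference versions, which are useful as a correctness baseline for any index. This is a minimal sketch: the function names and the sample points are hypothetical, objects are assumed to be points (so "enclosed or intersected" reduces to containment), and distance is assumed to be Euclidean.

```python
from math import dist  # Euclidean distance, Python 3.8+

def point_query(points, q):
    """Return the objects located exactly at q."""
    return [p for p in points if p == q]

def window_query(points, window):
    """Return the points inside the hyper-rectangle [(L1, U1), ..., (Ld, Ud)]."""
    return [p for p in points
            if all(lo <= x <= hi for x, (lo, hi) in zip(p, window))]

def knn_query(points, q, k):
    """Return the k points closest to q, nearest first."""
    return sorted(points, key=lambda p: dist(p, q))[:k]

# Hypothetical sample data, not taken from the slides.
points = [(1.0, 3.0), (3.4, 6.6), (5.0, 0.5), (4.0, 2.0), (2.0, 4.5), (6.0, 6.0)]

print(point_query(points, (3.4, 6.6)))
print(window_query(points, [(0, 4), (2, 5)]))
print(knn_query(points, (4, 1), 3))
```

Each function scans every object, which is exactly the cost that the mapping based indexes in the following slides try to avoid.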

  5. Mapping Based Multidimensional Indexing • Story • The CBD: [0,4] × [2,5] • Blocks in the CBD are: [8,15], [32,33] and [36,37] • General strategy: three steps • Data mapping and indexing • Query mapping and data retrieval • Filtering out false positives • [Figure: Sort]
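
The three steps can be sketched in a few lines. The slide does not name the mapping used for the block numbers; the sketch below assumes a Z-order (bit-interleaving) curve over unit grid cells, with the y bit placed above the x bit in each pair, because that assumption reproduces the block ranges quoted on the slide. All function names are hypothetical, a sorted Python list stands in for the one-dimensional index, and windows are treated as half-open.

```python
from bisect import bisect_left, bisect_right
from math import floor, ceil

def z_value(cx, cy, bits=3):
    """Interleave the bits of cell (cx, cy); y takes the higher bit of each pair."""
    z = 0
    for i in range(bits):
        z |= ((cx >> i) & 1) << (2 * i)
        z |= ((cy >> i) & 1) << (2 * i + 1)
    return z

def cells_to_intervals(x_cells, y_cells, bits=3):
    """Map a block of cells to maximal runs of consecutive z-values."""
    intervals = []
    for z in sorted(z_value(x, y, bits) for x in x_cells for y in y_cells):
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z          # extend the current run
        else:
            intervals.append([z, z])      # start a new run
    return [tuple(iv) for iv in intervals]

def build_index(points, bits=3):
    """Step 1: map each point to the z-value of its cell and sort."""
    return sorted((z_value(floor(x), floor(y), bits), (x, y)) for x, y in points)

def window_query(index, window, bits=3):
    (x_lo, x_hi), (y_lo, y_hi) = window
    keys = [z for z, _ in index]
    result = []
    # Step 2: map the window to 1-D intervals and fetch the candidates.
    x_cells = range(floor(x_lo), ceil(x_hi))
    y_cells = range(floor(y_lo), ceil(y_hi))
    for lo, hi in cells_to_intervals(x_cells, y_cells, bits):
        for _, (x, y) in index[bisect_left(keys, lo):bisect_right(keys, hi)]:
            # Step 3: filter out false positives against the exact window.
            if x_lo <= x < x_hi and y_lo <= y < y_hi:
                result.append((x, y))
    return result

print(cells_to_intervals(range(0, 4), range(2, 5)))  # cells covered by the CBD
```

Under these assumptions the cells covered by the CBD map to (8, 15), (32, 33) and (36, 37), matching the slide. A window that is not aligned to cell boundaries, such as [0.5, 3.5] × [2.2, 4.5], maps to the same intervals, which is why the filtering step is needed.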

  6. Another Mapping Example • Story continued • The CBD: [0,4] × [2,5] • Streets intersected by the CBD are: [11,14], [21,22] and 41 • The Pyramid-tree [SIGMOD’98] • Data space divided into 2d pyramids (two per dimension) • Streets are parallel to the base of the pyramid • Data mapping • Objects mapped to the street numbers • Query mapping • Query window mapped to all the intersected streets • [Figure: Sort]
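
The data mapping can be illustrated with the pyramid value as defined in the Pyramid-Technique paper [SIGMOD’98]: a point in the unit cube belongs to the pyramid selected by the dimension in which it lies farthest from the center, and its "street" is its height within that pyramid. This is a sketch under that definition; the function name is hypothetical, and the street numbers on the slide come from its own figure, so they are not reproduced here.

```python
def pyramid_value(v):
    """Map a point v in [0, 1]^d to the one-dimensional value i + h.

    i is the pyramid number (0 .. 2d - 1) and h in [0, 0.5] is the
    height of the point above the pyramid's top at the center (0.5, ..., 0.5).
    """
    d = len(v)
    # Dimension in which the point deviates most from the center.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    height = abs(0.5 - v[j_max])
    # Pyramids 0 .. d-1 lie on the low side, d .. 2d-1 on the high side.
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + height

for p in [(0.1, 0.5), (0.5, 0.9), (0.8, 0.6)]:
    print(p, pyramid_value(p))
```

The integer part of the result identifies the pyramid and the fractional part the street inside it, so all values can be stored in a single B+-tree. Query mapping, which turns a window into one height range per intersected pyramid, is omitted from this sketch.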

  7. Deficiency of the Pyramid-tree • Sensitivity to the location of the query window • Magic of mapping • A set of d functions t1, t2, …, td, where each ti: • is a bijection from [0,1] to [0,1] • is monotonic • satisfies ti(ci) = 0.5 • Apply ti to the query, so that: • The answers of the transformed queries over the transformed data are the answers of the original query over the original data. • [Figure: example transformations for ci = 0.25 and ci = 0.707]
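
One family of functions that meets all three conditions is the power function t(x) = x^r with r = -1 / log2(c), the form used by the extended Pyramid-Technique in [SIGMOD’98]. The slide does not give a formula, so treating this as the transformation is an assumption; the function names below are hypothetical.

```python
from math import log2

def make_transform(c):
    """Return t with t(0) = 0, t(1) = 1, t(c) = 0.5, increasing on [0, 1]."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    r = -1.0 / log2(c)  # r > 0, so x ** r is monotonically increasing
    return lambda x: x ** r

def transform_window(window, transforms):
    """Apply t_i to both bounds of dimension i; monotonicity keeps lo <= hi."""
    return [(t(lo), t(hi)) for (lo, hi), t in zip(window, transforms)]

t_a = make_transform(0.25)   # r = 0.5, a square root
t_b = make_transform(0.707)  # r is close to 2, roughly a square
print(t_a(0.25), t_b(0.707))
print(transform_window([(0.1, 0.4), (0.5, 0.9)], [t_a, t_b]))
```

Because each t is increasing, a coordinate lies between two window bounds exactly when its image lies between the images of those bounds, which is the property the last bullet on the slide relies on.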

  8. The P+-tree [ICDE’04] • Two measures • Space division • Mapping the data • Performance

  9. Mapping for KNN Queries • Story continued • New factory at Q = (4, 1) • Find the 3 nearest buildings to Q • Termination condition • K candidates • All in the current search circle • [Figure: Sort; numbered streets with a search circle around Q enlarged in steps R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10; distances to Q: ||BQ|| = 1.81, ||CQ|| = 1.84, ||DQ|| = 2.05, ||EQ|| = 3.00, ||AQ|| = 3.31, ||FQ|| = 3.62]
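
The termination condition can be traced with the numbers from the figure. The sketch below is a simplification: it assumes the distance of every building to Q is already known and only simulates how the search circle grows, whereas a real index would discover candidates by scanning the streets the circle intersects. The function name is hypothetical; the distances and the radius step of 0.35 are read off the slide.

```python
def expanding_knn(distances, k, step):
    """Grow the search radius by `step` until k objects lie inside the circle."""
    if k > len(distances):
        raise ValueError("fewer than k objects available")
    rounds = 0
    while True:
        rounds += 1
        radius = rounds * step
        # Candidates that fall inside the current search circle, nearest first.
        inside = sorted((d, name) for name, d in distances.items() if d <= radius)
        if len(inside) >= k:  # K candidates, all in the current search circle
            return radius, [name for _, name in inside[:k]]

# Distances ||XQ|| as listed in the figure.
slide_distances = {"A": 3.31, "B": 1.81, "C": 1.84, "D": 2.05, "E": 3.00, "F": 3.62}

radius, neighbours = expanding_knn(slide_distances, k=3, step=0.35)
print(radius, neighbours)
```

With these values the circle stays empty up to R = 1.75 and the search stops at the sixth enlargement, when B, C and D all lie within the circle.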

  10. The iDistance [TODS’05a] • Data partitioned into a number of clusters • Streets are concentric circles • Data mapping • Objects mapped to street numbers • Query mapping • Search circle mapped to the streets it intersects • Performance
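
Both mappings can be sketched using the iDistance key from [TODS’05a]: a point in partition i with reference point O_i gets the key i · c + dist(p, O_i), where c is a constant large enough to keep partitions apart. The function names, the nearest-reference-point partitioning rule, and the sample reference points are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def idistance_key(p, refs, c):
    """Data mapping: partition number times c, plus distance to the reference point."""
    i = min(range(len(refs)), key=lambda j: dist(p, refs[j]))
    return i * c + dist(p, refs[i])

def query_intervals(q, r, refs, max_radius, c):
    """Query mapping: key range per partition whose sphere meets the search circle.

    max_radius[i] is the largest distance from refs[i] to any point in partition i.
    """
    intervals = []
    for i, ref in enumerate(refs):
        d = dist(q, ref)
        if d - r > max_radius[i]:
            continue  # the search circle does not reach this partition
        lo = max(0.0, d - r)
        hi = min(max_radius[i], d + r)
        intervals.append((i * c + lo, i * c + hi))
    return intervals

# Hypothetical reference points and partition radii.
refs = [(0.0, 0.0), (10.0, 0.0)]
max_radius = [5.0, 2.0]
c = 100.0

print(idistance_key((3.0, 4.0), refs, c))
print(idistance_key((10.0, 2.0), refs, c))
print(query_intervals((4.0, 0.0), 2.0, refs, max_radius, c))
```

Each returned interval is a range scan on a one-dimensional index over the keys. The points it returns still have to be checked against the actual search circle, and the radius r is enlarged step by step as described on the previous slide.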

  11. Summary • P+-tree for window queries [ICDE’04] • iDistance for kNN queries [TODS’05a] • A function for mapping data and queries; efficiency lies in the design of the mapping function • Generalized multidimensional data mapping and query processing [TODS’05b]

  12. Recent Work and Trends • Queries on moving objects, continuous queries • Predictive range and kNN queries [InfSys’10] • Continuous retrieval of 3D objects [ICDE’08b, VLDBJ’10b] • Continuous intersection join [ICDE’08a, VLDBJ’12] • Continuous kNN join [GeoInformatica’10] • (Continuous) moving kNN queries [VLDB’08b, VLDBJ’10a, InfSys’13] • Other types of incremental queries [TKDE’10] • Temporal queries • Version index with compression [VLDB’08a] • Memory hierarchy friendly index, HV-tree [VLDB’10]

  13. References • [SIGMOD’98] Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. ACM SIGMOD International Conference on Management of Data (SIGMOD), 1998. • [ICDE’04] Rui Zhang, Beng Chin Ooi, Kian-Lee Tan. Making the Pyramid Technique Robust to Query Types and Workloads. IEEE International Conference on Data Engineering (ICDE), 2004. • [TODS’05a] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005. • [TODS’05b] Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005. • [VLDB’00] Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer. Integrating the UB-Tree into a Database System Kernel. International Conference on Very Large Data Bases (VLDB), 2000. • [PODS’00] Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Stéphane Bressan. Indexing the Edges - A Simple and Yet Efficient Approach to High-Dimensional Indexing. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2000. • [ICDE’08a] Rui Zhang, Dan Lin, Kotagiri Ramamohanarao, Elisa Bertino. Continuous Intersection Joins over Moving Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 863-872, April 7-12, 2008. • [ICDE’08b] Mohammed Eunus Ali, Rui Zhang, Egemen Tanin, Lars Kulik. A Motion-Aware Approach to Continuous Retrieval of 3D Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 843-852, April 7-12, 2008. • [VLDB’08a] David Lomet, Mingsheng Hong, Rimma Nehme, Rui Zhang. Transaction Time Indexing with Version Compression. Proceedings of the VLDB Endowment (PVLDB), 1(1): 870-881, 2008.
• [VLDB’08b] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. The V*-Diagram: A Query Dependent Approach to Moving KNN Queries. Proceedings of the VLDB Endowment (PVLDB), 1(1): 1095-1106, 2008. • [GeoInformatica’10] Cui Yu, Rui Zhang, Yaochun Huang, Hui Xiong. High-dimensional kNN joins with incremental updates. GeoInformatica, 14(1): 55-82, 2010. • [VLDB’10] Rui Zhang, Martin Stradling. The HV-tree: a Memory Hierarchy Aware Version Index. Proceedings of the VLDB Endowment (PVLDB), 3(1): 397-408, 2010. • [VLDBJ’10a] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. Analysis and Evaluation of V*-kNN: An Efficient Algorithm for Moving kNN Queries. VLDB Journal, 19(3): 307-332, 2010. • [VLDBJ’10b] Mohammed Eunus Ali, Egemen Tanin, Rui Zhang, Lars Kulik. A Motion-Aware Approach for Efficient Evaluation of Continuous Queries on 3D Object Databases. VLDB Journal, 19(5): 603-632, 2010. • [TKDE’10] Sarana Nutanong, Egemen Tanin, Rui Zhang. Incremental Evaluation of Visible Nearest Neighbor Queries. IEEE Transactions on Knowledge & Data Engineering (TKDE), 22(5): 665-681, 2010. • [InfSys’10] Rui Zhang, H. V. Jagadish, Bing Tian Dai, Kotagiri Ramamohanarao. Optimized Algorithms for Predictive Range and KNN Queries on Moving Objects. Information Systems, 35(8): 911-932, 2010. • [VLDBJ’12] Rui Zhang, Jianzhong Qi, Dan Lin, Wei Wang, Raymond Chi-Wing Wong. A Highly Optimized Algorithm for Continuous Intersection Join Queries over Moving Objects. VLDB Journal, 21(4): 561-586, 2012. • [InfSys’13] Tanzima Hashem, Lars Kulik, Rui Zhang. Countering Overlapping Rectangle Privacy Attack for Moving kNN Queries. Information Systems, 38(3): 430-453, 2013.
