Indexing Multidimensional Data

Rui Zhang, http://www.csse.unimelb.edu.au/~rui, The University of Melbourne, Aug 2006. Outline: background (multidimensional data and queries) and approaches, namely mapping based indexing (Z-curve, iDistance) and hierarchical-tree based indexing (R-tree, k-d-tree, quad-tree).

Presentation Transcript


  1. Indexing Multidimensional Data Rui Zhang http://www.csse.unimelb.edu.au/~rui The University of Melbourne Aug 2006

  2. Outline • Background • Multidimensional data and queries • Approaches • Mapping based indexing: Z-curve, iDistance • Hierarchical-tree based indexing: R-tree, k-d-tree, quad-tree • Compression based indexing: VA-file

  3. Multidimensional Data • Spatial data (low dimensionality) • Geographic information: Melbourne is at (37, 145); which city is at (30, 140)? • Computer-aided design: a part's width and height, e.g. (40, 50); is there any part with a width of 40 and a height of 50? • Records with multiple attributes (medium dimensionality) • Employee (ID, age, score, salary, …) • Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000? • Multimedia data (high dimensionality) • Color histograms of images • Find the image most similar to a given query image • Multimedia features: color, shape, texture

  4. Multidimensional Queries • Point query • Return the objects located at Q = (x1, x2, …, xd). • E.g. Q = (3.4, 6.6). • Window query • Return all the objects enclosed by or intersecting the hyper-rectangle W = {[L1, U1], [L2, U2], …, [Ld, Ud]}. • E.g. W = {[0, 4], [2, 5]}. • K-Nearest Neighbor Query (KNN Query) • Return the k objects whose distances to Q are no larger than any other object's distance to Q. • E.g. the 3NN of Q = (4, 1).
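The three query types can be stated precisely as brute-force scans over a list of points. This is a minimal Python sketch, assuming objects are points stored as (name, coordinates) pairs; the function names and sample points are hypothetical, and an index exists precisely so that a real system does not have to scan every object like this.

```python
import math

def point_query(objects, q):
    """Return the names of objects located exactly at q."""
    return [name for name, p in objects if tuple(p) == tuple(q)]

def window_query(objects, window):
    """Return objects inside the hyper-rectangle window = [(L1, U1), ..., (Ld, Ud)]."""
    return [name for name, p in objects
            if all(lo <= x <= hi for x, (lo, hi) in zip(p, window))]

def knn_query(objects, q, k):
    """Return the k objects closest to q by Euclidean distance."""
    ranked = sorted(objects, key=lambda obj: math.dist(obj[1], q))
    return [name for name, _ in ranked[:k]]

# Hypothetical sample data (not taken from the slides).
objects = [("A", (1.0, 1.0)), ("B", (3.4, 6.6)), ("C", (2.0, 3.0)),
           ("D", (5.0, 2.0)), ("E", (4.5, 0.5)), ("F", (7.0, 7.0))]

print(point_query(objects, (3.4, 6.6)))          # ['B']
print(window_query(objects, [(0, 4), (2, 5)]))   # ['C']
print(knn_query(objects, (4, 1), 3))             # ['E', 'D', 'C']
```

Each query touches every object, so its cost grows linearly with the data size; the indexing approaches in the following slides aim to avoid that.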

  5. Mapping Based Multidimensional Indexing • Story • The CBD: [0, 4] × [2, 5] • Blocks in the CBD are [8, 15], [32, 33] and [36, 37] • General strategy: three steps • Data mapping and indexing • Query mapping and data retrieval • Filtering out false positives
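The three-step strategy can be sketched with the Z-curve as the mapping. This is a minimal Python sketch, not the author's implementation. It assumes an 8 × 8 grid of unit cells, treats a window as half-open (so the CBD [0, 4] × [2, 5] covers cells with x in 0..3 and y in 2..4), and places each y bit above the matching x bit when interleaving; under those assumptions it reproduces the slide's blocks [8, 15], [32, 33] and [36, 37]. A sorted Python list stands in for the B+-tree, and the function names and building coordinates are hypothetical.

```python
import bisect
import math

BITS = 3  # assumption: 8 x 8 grid of unit cells, coordinates in [0, 8)

def z_value(cx, cy):
    """Z-value of cell (cx, cy) by bit-interleaving; each y bit sits above its x bit."""
    z = 0
    for b in range(BITS):
        z |= ((cx >> b) & 1) << (2 * b)
        z |= ((cy >> b) & 1) << (2 * b + 1)
    return z

def window_to_ranges(x_lo, x_hi, y_lo, y_hi):
    """Query mapping: cells overlapping [x_lo, x_hi) x [y_lo, y_hi) as runs of consecutive Z-values."""
    zs = sorted(z_value(cx, cy)
                for cx in range(math.floor(x_lo), math.ceil(x_hi))
                for cy in range(math.floor(y_lo), math.ceil(y_hi)))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z        # extend the current run
        else:
            ranges.append([z, z])    # start a new run
    return [tuple(r) for r in ranges]

def build_index(points):
    """Step 1, data mapping and indexing: key each point by its cell's Z-value, keep keys sorted."""
    entries = sorted((z_value(math.floor(x), math.floor(y)), (x, y)) for x, y in points)
    return [z for z, _ in entries], [p for _, p in entries]

def z_window_query(index, x_lo, x_hi, y_lo, y_hi):
    """Steps 2 and 3: range-scan every Z-range, then filter out false positives."""
    keys, pts = index
    result = []
    for z_lo, z_hi in window_to_ranges(x_lo, x_hi, y_lo, y_hi):
        start = bisect.bisect_left(keys, z_lo)
        end = bisect.bisect_right(keys, z_hi)
        result.extend(p for p in pts[start:end]
                      if x_lo <= p[0] < x_hi and y_lo <= p[1] < y_hi)
    return result

print(window_to_ranges(0, 4, 2, 5))   # [(8, 15), (32, 33), (36, 37)]

# Hypothetical buildings; (0.2, 3.0) lies in a retrieved cell but outside the window.
buildings = [(1.5, 2.5), (3.2, 4.9), (0.2, 3.0), (6.0, 6.0), (2.0, 1.0)]
index = build_index(buildings)
print(z_window_query(index, 0.5, 4, 2, 5))   # [(1.5, 2.5), (3.2, 4.9)]
```

The window [0.5, 4) × [2, 5) does not align with cell boundaries, so cell (0, 3) is retrieved even though the building at (0.2, 3.0) falls outside it; step 3 removes that false positive.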

  6. The Z-curve and Other Space-Filling Curves • The Z-curve • Z-value calculation: bit-interleaving • Supports efficient window queries • Disadvantage • Jumps: cells that are consecutive in Z-order can be far apart in space • Other space-filling curves • Hilbert curve • Gray-code curve • Column-wise scan
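Bit-interleaving generalizes directly to d dimensions, and decoding the curve makes the "jumps" visible. The sketch below is a minimal Python illustration, assuming integer cell coordinates; it puts bit b of dimension k at position b·d + k, which in 2-D places each y bit above its x bit. The helper names are hypothetical, and the Hilbert and Gray-code curves, which use different orderings, are not shown.

```python
import math

def interleave(coords, bits):
    """Z-value of a cell: bit b of dimension k goes to position b * d + k."""
    d = len(coords)
    z = 0
    for b in range(bits):
        for k, c in enumerate(coords):
            z |= ((c >> b) & 1) << (b * d + k)
    return z

def deinterleave(z, d, bits):
    """Inverse of interleave: recover the cell coordinates from a Z-value."""
    coords = [0] * d
    for b in range(bits):
        for k in range(d):
            coords[k] |= ((z >> (b * d + k)) & 1) << b
    return tuple(coords)

# Walk a 4 x 4 grid in Z-order and measure the spatial step between consecutive cells.
cells = [deinterleave(z, 2, 2) for z in range(16)]
steps = [math.dist(a, b) for a, b in zip(cells, cells[1:])]
print(cells[7], cells[8], max(steps))   # (3, 1) (0, 2) 3.1622776601683795
```

Z-values 7 and 8 are neighbors on the curve, yet cells (3, 1) and (0, 2) are about 3.16 cell widths apart; this is the jump the slide lists as the Z-curve's disadvantage.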

  7. Mapping for KNN Queries • Story continued • New factory at Q = (4, 1) • Find the 3 nearest buildings to Q • Termination condition • k candidates found • All of them within the current search circle [Figure: search circles around Q with radius R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10; distances to Q: ||BQ|| = 1.81, ||CQ|| = 1.84, ||DQ|| = 2.05, ||EQ|| = 3.00, ||AQ|| = 3.31, ||FQ|| = 3.62]
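The expanding-circle search can be sketched as repeated window queries over the circle's bounding square. This is a minimal Python sketch under stated assumptions: `box_query` is a hypothetical stand-in (a linear scan) for the mapping-based window query, which in a real index would become Z-value range scans; the radius step 0.35 is taken from the figure; and the building coordinates are invented so that their distances to Q match the slide's ||·Q|| values.

```python
import math

def box_query(objects, x_lo, x_hi, y_lo, y_hi):
    """Hypothetical stand-in for a mapping-based window query."""
    return [(name, p) for name, p in objects
            if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]

def knn_expanding(objects, q, k, step):
    """Grow the search circle by `step` until k objects lie inside it."""
    i = 0
    while True:
        i += 1
        r = step * i
        # The circle's bounding square is what gets mapped to a window query.
        box = box_query(objects, q[0] - r, q[0] + r, q[1] - r, q[1] + r)
        inside = sorted((math.dist(p, q), name) for name, p in box
                        if math.dist(p, q) <= r)
        # Termination: k candidates, all inside the current circle (or no data left).
        if len(inside) >= k or len(inside) == len(objects):
            return [name for _, name in inside[:k]], r

# Hypothetical coordinates chosen so the distances to Q = (4, 1) match the slide.
buildings = [("A", (0.69, 1.0)), ("B", (4.0, 2.81)), ("C", (5.84, 1.0)),
             ("D", (4.0, 3.05)), ("E", (7.0, 1.0)), ("F", (4.0, 4.62))]
names, r = knn_expanding(buildings, (4, 1), 3, 0.35)
print(names, round(r, 2))   # ['B', 'C', 'D'] 2.1
```

At R = 1.75 no building is inside the circle; at R = 2.10, B, C and D are, so the search stops. Any object not yet found is farther than R and cannot displace them.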

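  8. The iDistance • Data partitioned into a number of clusters, each with a reference point • Streets are concentric circles around each reference point • Data mapping • Each object mapped to a street number, determined by its cluster and its distance to the cluster's reference point • Query mapping • Search circle mapped to the range of streets it intersects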
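The street analogy corresponds to a one-dimensional key of the form i · c + dist(p, O_i), where O_i is the reference point of the partition p belongs to and c is a constant large enough to keep partitions apart; this is the key described in the iDistance paper cited in the references. The Python below is a minimal sketch, not the paper's implementation: it assumes fixed reference points instead of cluster centers, assigns each point to its nearest reference point, uses a sorted list in place of the B+-tree, and grows the search radius by a hypothetical fixed `step`.

```python
import bisect
import math
import random

class IDistance:
    """Sketch: key(p) = i * c + dist(p, O_i), with O_i the reference point closest to p."""

    def __init__(self, points, refs, c):
        self.points, self.refs, self.c = points, refs, c
        self.radius = [0.0] * len(refs)   # distance of the farthest object in each partition
        entries = []
        for idx, p in enumerate(points):
            i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
            d = math.dist(p, refs[i])
            assert d < c, "c must exceed every distance so partitions do not overlap"
            self.radius[i] = max(self.radius[i], d)
            entries.append((i * c + d, idx))
        entries.sort()
        self.keys = [key for key, _ in entries]   # stands in for the B+-tree
        self.ids = [idx for _, idx in entries]

    def knn(self, q, k, step):
        dq = [math.dist(q, o) for o in self.refs]
        r_max = max(d + rad for d, rad in zip(dq, self.radius))  # circle covering all data
        r = 0.0
        while True:
            r += step
            found = set()
            for i in range(len(self.refs)):
                if dq[i] - r > self.radius[i]:
                    continue   # the search circle misses partition i
                # Triangle inequality: only streets in [dq - r, dq + r] can hold answers.
                lo = i * self.c + max(0.0, dq[i] - r)
                hi = i * self.c + min(self.radius[i], dq[i] + r)
                a = bisect.bisect_left(self.keys, lo)
                b = bisect.bisect_right(self.keys, hi)
                found.update(self.ids[a:b])
            inside = sorted((math.dist(q, self.points[j]), j) for j in found
                            if math.dist(q, self.points[j]) <= r)
            if len(inside) >= k or r >= r_max:
                return [j for _, j in inside[:k]]

# Usage with hypothetical data: 200 random points, four fixed reference points.
rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(200)]
index = IDistance(pts, [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)], c=10.0)
q = (0.4, 0.6)
got = index.knn(q, 5, step=0.05)
print(got)
```

Each search round turns the circle into one key range per partition, so the index only needs one-dimensional range scans.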

  9. Hierarchical Tree Structures • k-d-tree • Space divided recursively • Complete and disjoint partitioning • In-memory; unbalanced • There are algorithms to page and balance the tree, but with more complex manipulations • R-tree • Minimum bounding rectangle (MBR) • Incomplete and overlapping partitioning • Disk-based; balanced • Problem: overlap • Problem: empty space [Figure: an example k-d-tree and R-tree over objects A to G, with R-tree nodes N1 to N5]
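A k-d-tree's recursive, disjoint space division can be sketched in a few lines. This minimal Python sketch builds the tree on a static point set by splitting at the median and cycling through the dimensions, then answers a window query by descending only into the sides the window can reach. The names and sample points are hypothetical; the R-tree is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    point: Tuple[float, ...]
    axis: int
    left: Optional["KDNode"]    # points with coordinate <= point[axis] on this axis
    right: Optional["KDNode"]   # points with coordinate >= point[axis] on this axis

def build_kdtree(points, depth=0):
    """Recursively split at the median, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1))

def range_search(node, lo, hi, out):
    """Collect points p with lo[k] <= p[k] <= hi[k] in every dimension k."""
    if node is None:
        return out
    p, axis = node.point, node.axis
    if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
        out.append(p)
    if lo[axis] <= p[axis]:   # the window reaches the lower side of the split
        range_search(node.left, lo, hi, out)
    if p[axis] <= hi[axis]:   # the window reaches the upper side of the split
        range_search(node.right, lo, hi, out)
    return out

# Hypothetical points; the window is W = {[0, 4], [2, 5]} from slide 4.
pts = [(1, 1), (3.4, 6.6), (2, 3), (5, 2), (4.5, 0.5), (7, 7), (0.5, 4.0)]
tree = build_kdtree(pts)
found = range_search(tree, (0, 2), (4, 5), [])
print(sorted(found))   # [(0.5, 4.0), (2, 3)]
```

Building from the median on a static set gives a balanced tree; the slide's point that the k-d-tree is unbalanced applies once points are inserted one at a time.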

  10. Hierarchical Tree Structures (continued) • Quad-tree • Space divided into 4 rectangles recursively • Complete and disjoint partitioning • In-memory; unbalanced • There are algorithms to page and balance the tree, but with more complex manipulations • The point quad-tree [Figure: a point quad-tree over objects A to G, each node splitting its region into NW, NE, SW and SE quadrants]
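In a point quad-tree each stored point splits its region into NW, NE, SW and SE quadrants, and a window query visits only the quadrants that can overlap the window. This minimal Python sketch assumes 2-D points and a tie rule (points on a split line go north or east) that the slide does not specify; names and data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[float, float]

@dataclass
class QuadNode:
    point: Point
    children: Dict[str, "QuadNode"] = field(default_factory=dict)  # keys: NW, NE, SW, SE

def quadrant(center, p):
    """Quadrant of p relative to center; ties go north/east (assumption)."""
    ns = "N" if p[1] >= center[1] else "S"
    ew = "E" if p[0] >= center[0] else "W"
    return ns + ew

def insert(root, p):
    """Descend to the empty quadrant where p belongs and attach it there."""
    if root is None:
        return QuadNode(p)
    node = root
    while True:
        q = quadrant(node.point, p)
        if q not in node.children:
            node.children[q] = QuadNode(p)
            return root
        node = node.children[q]

def may_overlap(quad, center, x_lo, x_hi, y_lo, y_hi):
    """Can the window intersect this quadrant of center?"""
    x, y = center
    vertical = y_hi >= y if quad[0] == "N" else y_lo < y
    horizontal = x_hi >= x if quad[1] == "E" else x_lo < x
    return vertical and horizontal

def quad_window_query(node, x_lo, x_hi, y_lo, y_hi, out):
    if node is None:
        return out
    x, y = node.point
    if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
        out.append(node.point)
    for quad, child in node.children.items():
        if may_overlap(quad, node.point, x_lo, x_hi, y_lo, y_hi):
            quad_window_query(child, x_lo, x_hi, y_lo, y_hi, out)
    return out

pts = [(1, 1), (3.4, 6.6), (2, 3), (5, 2), (4.5, 0.5), (7, 7), (0.5, 4.0)]
root = None
for p in pts:
    root = insert(root, p)
print(sorted(quad_window_query(root, 0, 4, 2, 5, [])))   # [(0.5, 4.0), (2, 3)]
```

The tree's shape depends on insertion order, which is why the slide calls the quad-tree unbalanced.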

  11. Compression Based Indexing • The curse of dimensionality • The Vector Approximation File (VA-File) [Figure: VA-File cell approximations, including a panel for skewed data]
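The VA-File replaces each vector with a compact cell approximation, scans those approximations, and computes exact distances only for vectors that might still qualify. This is a minimal Python sketch of a kNN search in that style, not the authors' algorithm verbatim. It assumes vectors in [0, 1)^d and a uniform grid of 2^b cells per dimension; the slide's skewed-data panel suggests non-uniform cell boundaries can be used instead. Class and parameter names are hypothetical.

```python
import heapq
import math
import random

class VAFile:
    """Sketch: each coordinate in [0, 1) is approximated by a b-bit cell number."""

    def __init__(self, vectors, bits):
        self.vectors = vectors
        self.cells = 2 ** bits
        # The approximation file: one small integer per dimension per vector.
        self.approx = [tuple(min(int(x * self.cells), self.cells - 1) for x in v)
                       for v in vectors]

    def lower_bound(self, q, a):
        """Smallest possible distance from q to any vector whose approximation is a."""
        total = 0.0
        for x, c in zip(q, a):
            lo, hi = c / self.cells, (c + 1) / self.cells
            gap = lo - x if x < lo else (x - hi if x > hi else 0.0)
            total += gap * gap
        return math.sqrt(total)

    def knn(self, q, k):
        if k <= 0:
            return []
        # Filtering phase: rank all approximations by their lower bound.
        order = sorted((self.lower_bound(q, a), i) for i, a in enumerate(self.approx))
        best = []  # max-heap of (-distance, id) holding the k best exact distances so far
        # Refinement phase: stop once a lower bound exceeds the current k-th best distance.
        for lb, i in order:
            if len(best) == k and lb > -best[0][0]:
                break
            d = math.dist(q, self.vectors[i])
            if len(best) < k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
        return [i for _, i in sorted(best, key=lambda t: -t[0])]

# Hypothetical data: 500 random 16-dimensional vectors, 4 bits per dimension.
rng = random.Random(3)
data = [tuple(rng.random() for _ in range(16)) for _ in range(500)]
va = VAFile(data, bits=4)
q = tuple(rng.random() for _ in range(16))
print(va.knn(q, 3))
```

Because candidates are visited in lower-bound order, every vector skipped after the break is provably no closer than the current k-th answer, so the result matches an exact scan.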

  12. Summary of the Indexing Techniques

  13. Index Implementations in major DBMS • SQL Server • B+-Tree data structure • Clustered indexes are sparse • Indexes maintained as updates/insertions/deletes are performed • Oracle • B+-tree, hash, bitmap, spatial extender for R-Tree • Clustered index • Index organized table (unique/clustered) • Clusters used when creating tables • DB2 • B+-Tree data structure, spatial extender for R-tree • Clustered indexes are dense • Explicit command for index reorganization

  14. Recommended Readings and References • Surveys on multidimensional indexing techniques • Christian Böhm, Stefan Berchtold, Daniel A. Keim. Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 2001. • Volker Gaede, Oliver Günther. Multidimensional Access Methods. ACM Computing Surveys, 1998. • Mapping based indexing • Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005. • Space-filling curves • H. V. Jagadish. Linear Clustering of Objects with Multiple Attributes. ACM SIGMOD Conference (SIGMOD), 1990. • iDistance • H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005. • R-tree • Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. ACM SIGMOD Conference (SIGMOD), 1984. • Quad-tree • Hanan Samet. The Quadtree and Related Hierarchical Data Structures. ACM Computing Surveys, 1984. • VA-File • Roger Weber, Hans-Jörg Schek, Stephen Blott. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. International Conference on Very Large Data Bases (VLDB), 1998.
