
Making the Pyramid Technique Robust to Query Types and Workloads

Rui Zhang, Beng Chin Ooi, Kian-Lee Tan. Department of Computer Science, National University of Singapore, Singapore. Outline: Background; Existing work and limitations; Our proposal: the P+-tree; Experimental results.


Presentation Transcript


  1. Making the Pyramid Technique Robust to Query Types and Workloads • Rui Zhang, Beng Chin Ooi, Kian-Lee Tan • Department of Computer Science, National University of Singapore, Singapore

  2. Outline • Background • Existing work and limitations • Our proposal: the P+-tree • Experimental results • Conclusion

  3. Problem & Motivation • Problem: indexing multidimensional point data • Applications: • Low dimension: GIS, CAD, medical images (X-rays, MRI brain scans) • High dimension: image databases, video databases, data warehouses

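  4. Typical Query Types • Point query • Window query: $[q_0^{min}, q_0^{max}], [q_1^{min}, q_1^{max}], \dots, [q_{d-1}^{min}, q_{d-1}^{max}]$ • Range query: center $X(x_0, x_1, \dots, x_{d-1})$, radius $r$ • k-nearest-neighbor query (kNN query): query point $X(x_0, x_1, \dots, x_{d-1})$, number of neighbors $k$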

  5. Existing work: Four Strategies • Data partitioning: R-tree family • Space partitioning: k-d-tree family • Dimensionality Reduction: mapping • Data Compression: VA-file, IQ-tree

  6. Existing work: Comparison • For low-dimensional space • The R-tree family of structures • For high-dimensional space • Window query: the Pyramid technique, iMinMax • kNN query: the IQ-tree, iDistance

  7. Existing work: Limitations • Limited to specific query types • The Pyramid technique, iMinMax: window query • iDistance, the IQ-tree: kNN query • Limited to certain workloads • The Pyramid technique: hypercube-shaped window queries located around the center of the data space

  8. Our proposal: the P+-tree • Based on the Pyramid technique • Supports both window and kNN queries • Robust under different workloads

  9. Review of the Pyramid Technique • $i$: pyramid number • $h_v$: height of point $v$, measured in the $i$-th dimension (if $i < d$) or the $(i-d)$-th dimension (if $i \ge d$) • Pyramid value: $pv_v = i + h_v$
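The pyramid value reviewed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the usual Pyramid technique convention that the point lies in the unit hypercube, that the dimension farthest from the center 0.5 selects the pyramid, and that the height is the distance from 0.5 in that dimension. The function name `pyramid_value` is hypothetical.

```python
def pyramid_value(v):
    """Return (i, h, pv) for a point v in the unit hypercube [0, 1]^d.

    Assumed convention: pyramids 0..d-1 lie on the low side of the
    center, pyramids d..2d-1 on the high side.
    """
    d = len(v)
    # The dimension in which v deviates most from the center picks the pyramid.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the center in dimension i (or i - d).
    h = abs(0.5 - v[i % d])
    return i, h, i + h
```

For example, `pyramid_value([0.6, 0.9])` picks dimension 1 on the high side, so the pyramid number is 3 in a two-dimensional space.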

  10. Sensitivity to location of query window / data distribution

  11. Sensitivity to shape of query

  12. The P+-tree • Divide the data space into subspaces • Based on clustering • Divide in the dimension where the two cluster centers differ most • Transform the points in each subspace • Map each subspace to the unit hypercube, $[s_0^{min}, s_0^{max}] \times \dots \times [s_{d-1}^{min}, s_{d-1}^{max}] \to [0, 1]^d$, so that the Pyramid technique can be applied • Move the cluster center to the center of the transformed space, $(0.5, 0.5, \dots, 0.5)$, which is the case in which the Pyramid technique is efficient

  13. Space division and data transformation

  14. Transformation function • A set of $d$ functions, $t_0, t_1, \dots, t_{d-1}$ • Requirements: • $t_i$ is a bijection from $[s_i^{min}, s_i^{max}]$ to $[0, 1]$ • $t_i$ is monotonic • $t_i$ maps the cluster center coordinate $c_i$ to 0.5 • In equations: • $t_i(s_i^{min}) = 0$ • $t_i(s_i^{max}) = 1$ • $t_i(c_i) = 0.5$

  15. Transformation function • $t_i(x) = (a_i x - b_i)^{e_i}$, $i = 0, 1, \dots, d-1$ • For the subspace $[s_0^{min}, s_0^{max}], [s_1^{min}, s_1^{max}], \dots, [s_{d-1}^{min}, s_{d-1}^{max}]$: $a_i = 1/(s_i^{max} - s_i^{min})$, $b_i = s_i^{min}/(s_i^{max} - s_i^{min})$, $e_i = -1/\log_2(a_i c_i - b_i)$
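A small sketch can check that the transformation meets the three requirements from the previous slide. It assumes $a_i = 1/(s_i^{max} - s_i^{min})$ and $b_i = s_i^{min}/(s_i^{max} - s_i^{min})$, which is the sign convention that makes $t_i(s_i^{max}) = 1$, and it assumes the cluster center lies strictly inside the subspace. The function names and the clamping step are illustrative additions, not part of the slides.

```python
import math

def transform_params(s_min, s_max, c):
    """Compute (a, b, e) for one dimension; requires s_min < c < s_max."""
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    # Chosen so that (a*c - b) ** e == 0.5.
    e = -1.0 / math.log2(a * c - b)
    return a, b, e

def transform(x, a, b, e):
    """Apply t(x) = (a*x - b) ** e for x inside [s_min, s_max]."""
    # Clamp to [0, 1] to guard against tiny floating-point overshoot
    # at the subspace borders (an added safety step).
    u = min(1.0, max(0.0, a * x - b))
    return u ** e
```

With a subspace side [2, 10] and cluster center 4, this gives a = 0.125, b = 0.25, e = 0.5, so the borders map to 0 and 1 and the center maps to 0.5.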

  16. The space-tree • The subspace number SNo and the transformation parameters $a_i$, $b_i$, $e_i$ are stored in the leaf nodes

  17. Space division algorithm • Cluster the data • Divide the space into two subspaces in the dimension where the two cluster centers differ most (recursively) • Build the space-tree
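One division step might look like the sketch below. The slides do not say which clustering algorithm is used or where exactly the cut is placed, so this example takes the two cluster centers as given and places the cut at their midpoint in the chosen dimension. Both choices, and the name `split_subspace`, are assumptions for illustration only.

```python
def split_subspace(bounds, c1, c2):
    """Split a subspace once, given two cluster centers.

    bounds: list of (lo, hi) pairs, one per dimension.
    Returns (dim, cut, left_bounds, right_bounds).
    """
    # Pick the dimension in which the two centers differ most.
    dim = max(range(len(bounds)), key=lambda j: abs(c1[j] - c2[j]))
    # Assumption: cut halfway between the two centers.
    cut = (c1[dim] + c2[dim]) / 2.0
    lo, hi = bounds[dim]
    left = list(bounds)
    left[dim] = (lo, cut)
    right = list(bounds)
    right[dim] = (cut, hi)
    return dim, cut, left, right
```

Applying the function recursively to each half, with fresh cluster centers for the points that fall in it, would produce the binary structure that the space-tree records.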

  18. Build the P+-tree • The P+-tree is in effect a B+-tree that stores the data points in its leaf nodes, with the P+-value as the key • P+-value: $SNo \cdot 2d + pv(v')$ • For a newly inserted point $v$, traverse the space-tree to determine the subspace it belongs to • Transform the point $v$ to $v'$ and calculate its P+-value • Insert the point $v$ with its P+-value as the key
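The key computation can be sketched as follows. Because a pyramid value in $d$ dimensions is always below $2d$, multiplying the subspace number by $2d$ keeps the key ranges of different subspaces disjoint. In this sketch a sorted Python list stands in for the B+-tree, the transformed point `v_prime` is assumed to be already computed, and all names are hypothetical.

```python
import bisect

def pyramid_value(v):
    """Pyramid value of a point v in [0, 1]^d (assumed convention)."""
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + abs(0.5 - v[j_max])

def p_plus_value(sno, v_prime):
    """P+-value = SNo * 2d + pv(v'), with v' the transformed point."""
    d = len(v_prime)
    return sno * 2 * d + pyramid_value(v_prime)

# A sorted list of (key, point) pairs stands in for the B+-tree.
index = []

def insert(sno, v, v_prime):
    """Store the original point v under the key of its transformed image."""
    bisect.insort(index, (p_plus_value(sno, v_prime), tuple(v)))
```

A real implementation would obtain `sno` and `v_prime` by traversing the space-tree and applying the stored transformation parameters.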

  19. Window search algorithm • Traverse the space-tree to find the subspaces that the query window intersects • For each intersected subspace, transform the query using that subspace's transformation function • Search the subspace with the transformed query
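The query transformation step can be illustrated for a single subspace. Since each $t_i$ is monotonic, an interval maps to an interval, so it is enough to clip the query to the subspace and transform the two endpoints in every dimension. The sketch assumes $t_i(x) = ((x - s_i^{min})/(s_i^{max} - s_i^{min}))^{e_i}$ with $e_i = -1/\log_2$ of the normalized center coordinate, which is the same function as on the transformation slide written in normalized form. The function name is hypothetical, and the subsequent pyramid search of the transformed window is not shown.

```python
import math

def transform_window(query, bounds, center):
    """Clip a query window to one subspace and map it into [0, 1]^d.

    query, bounds: lists of (lo, hi) per dimension; center: cluster
    center of the subspace. Returns None if they do not intersect.
    """
    out = []
    for (q_lo, q_hi), (s_min, s_max), c in zip(query, bounds, center):
        lo, hi = max(q_lo, s_min), min(q_hi, s_max)
        if lo > hi:
            return None  # the query misses this subspace
        width = s_max - s_min
        e = -1.0 / math.log2((c - s_min) / width)
        # Monotonic map: transforming the endpoints transforms the interval.
        out.append((((lo - s_min) / width) ** e, ((hi - s_min) / width) ** e))
    return out
```

The returned window lies in the unit hypercube of that subspace, where an ordinary Pyramid technique window search could then be run.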

  20. kNN search algorithm • Start from a small window query centered on the query point • Gradually increase the side length of the query window until the k nearest neighbors are found
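The enlarging-window idea can be sketched with a brute-force scan standing in for the index's window search. The slides do not give the stopping rule, so this example assumes one common choice: stop once the k-th closest candidate lies within half the window side of the query point, since no point outside the window can then be closer. The starting size, growth factor, and function names are likewise assumptions.

```python
import math

def window_search(points, lo, hi):
    """Brute-force stand-in for the index's window query."""
    return [p for p in points
            if all(l <= x <= h for x, l, h in zip(p, lo, hi))]

def knn(points, q, k, start_half=0.05, growth=2.0):
    """Find the k nearest neighbors of q by enlarging a window around q."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    half = start_half  # half of the window side length
    while True:
        lo = [x - half for x in q]
        hi = [x + half for x in q]
        cands = sorted(window_search(points, lo, hi),
                       key=lambda p: math.dist(p, q))
        # Every point outside the window is farther than `half` from q.
        if len(cands) >= k and math.dist(cands[k - 1], q) <= half:
            return cands[:k]
        half *= growth
```

In the example data below the first window already contains two points, but the second one is farther away than half the side length, so the window grows once before the result is returned.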

  21. Experiments: Window Queries

  22. Experiments: Partial Window Queries

  23. Experiments: kNN Queries
