Making the Pyramid Technique Robust to Query Types and Workloads Rui Zhang, Beng Chin Ooi, Kian-Lee Tan Department of Computer Science National University of Singapore Singapore
Outline
• Background
• Existing work and limitations
• Our proposal: the P+-tree
• Experimental results
• Conclusion
Problem & Motivation
• Problem: indexing multidimensional point data
• Applications:
  • Low dimension: GIS, CAD, medical images (X-rays, MRI brain scans)
  • High dimension: image databases, video databases, data warehouses
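Typical Query Types
• Point query
• Window query: $[q_{0,min}, q_{0,max}], [q_{1,min}, q_{1,max}], \dots, [q_{d-1,min}, q_{d-1,max}]$
• Range query: center $X(x_0, x_1, \dots, x_{d-1})$, radius $r$
• K-nearest-neighbor query (kNN query): center $X(x_0, x_1, \dots, x_{d-1})$, number of neighbors $k$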
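The four query types can be pinned down with a brute-force reference sketch over a plain list of points. This is only meant to show what each query returns, not how an index answers it. The function names are illustrative, and Euclidean distance is an assumption for the range and kNN queries.

```python
import math

def point_query(points, q):
    # Exact-match lookup of a single point.
    return [p for p in points if p == q]

def window_query(points, q_min, q_max):
    # Points inside the axis-aligned box [q_min[i], q_max[i]] in every dimension.
    return [p for p in points
            if all(lo <= x <= hi for x, lo, hi in zip(p, q_min, q_max))]

def range_query(points, center, r):
    # Points within distance r of the center (Euclidean distance assumed).
    return [p for p in points if math.dist(p, center) <= r]

def knn_query(points, center, k):
    # The k points closest to the center (Euclidean distance assumed).
    return sorted(points, key=lambda p: math.dist(p, center))[:k]
```

A scan like this costs time linear in the number of points for every query, which is the cost the index structures on the following slides try to avoid.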
Existing work: Four Strategies
• Data partitioning: R-tree family
• Space partitioning: k-d-tree family
• Dimensionality reduction: mapping
• Data compression: VA-file, IQ-tree
Existing work: Comparison
• Low-dimensional space
  • The R-tree family structures
• High-dimensional space
  • Window query: the Pyramid technique, the iMinMax
  • kNN query: the IQ-tree, the iDistance
Existing work: Limitations
• Limited to certain query types
  • The Pyramid technique, the iMinMax: window query
  • The iDistance, the IQ-tree: kNN query
• Limited to certain workloads
  • The Pyramid technique: hyper-cube shaped window queries located around the center of the data space
Our proposal: the P+-tree
• Based on the Pyramid technique
• Supports both window and kNN queries
• Robust under different workloads
Review of the Pyramid Technique
• $i$: pyramid number
• $h_v$: height of point $v$, measured in the $i$-th dimension (if $i < d$) or the $(i-d)$-th dimension (if $i \ge d$)
• Pyramid value: $pv_v = i + h_v$
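A minimal sketch of the pyramid value, assuming the usual formulation of the Pyramid technique on the unit hyper-cube: the pyramid is chosen by the dimension in which the point deviates most from the center 0.5, and the height is that deviation. The function name `pyramid_value` is illustrative.

```python
def pyramid_value(v):
    """Pyramid value pv = i + h of a point v in the unit hyper-cube [0, 1]^d."""
    d = len(v)
    # Dimension in which v lies farthest from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 cover the "low" side, pyramids d..2d-1 the "high" side.
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the center along the pyramid's dimension.
    h = abs(0.5 - v[i % d])
    return i + h
```

For example, in two dimensions the point (0.25, 0.5) falls into pyramid 0 with height 0.25, and (0.5, 0.875) falls into pyramid 3 with height 0.375. Because the height never exceeds 0.5, all pyramid values lie in [0, 2d).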
The P+-tree
• Divide the data space into subspaces
  • Based on clustering
  • Divide in the dimension where the two cluster centers differ most
• Transform the points in each subspace
  • Map the subspace to the unit hyper-cube, $[s_{0,min}, s_{0,max}] \times \dots \times [s_{d-1,min}, s_{d-1,max}] \to [0, 1]^d$, so that the Pyramid technique can be applied
  • Move the cluster center to the center of the transformed space, $(0.5, 0.5, \dots, 0.5)$, the case in which the Pyramid technique is efficient
Transformation function
• A set of $d$ functions, $t_0, t_1, \dots, t_{d-1}$
• Requirements:
  • $t_i$ is a bijection from $[s_{i,min}, s_{i,max}]$ to $[0, 1]$
  • $t_i$ is monotonic
  • $t_i(c_i) = 0.5$, where $c_i$ is the $i$-th coordinate of the cluster center
• In equations:
  • $t_i(s_{i,min}) = 0$
  • $t_i(s_{i,max}) = 1$
  • $t_i(c_i) = 0.5$
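Transformation function
• $t_i(x) = (a_i x - b_i)^{e_i}$, $i = 0, 1, \dots, d-1$
• For the subspace $[s_{0,min}, s_{0,max}], [s_{1,min}, s_{1,max}], \dots, [s_{d-1,min}, s_{d-1,max}]$:
  • $a_i = 1 / (s_{i,max} - s_{i,min})$
  • $b_i = s_{i,min} / (s_{i,max} - s_{i,min})$
  • $e_i = -1 / \log_2(a_i c_i - b_i)$
• Check: $a_i s_{i,min} - b_i = 0$ and $a_i s_{i,max} - b_i = 1$, so $t_i(s_{i,min}) = 0$ and $t_i(s_{i,max}) = 1$; $(a_i c_i - b_i)^{e_i} = 2^{-1} = 0.5$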
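The transformation for one dimension can be sketched as follows, with $a_i = 1/(s_{i,max} - s_{i,min})$ and $b_i = s_{i,min}/(s_{i,max} - s_{i,min})$ so that the subspace bounds map to 0 and 1 and the cluster center maps to 0.5. The name `make_transform` and the clamping step are assumptions added for illustration; the sketch also assumes the center lies strictly inside the interval.

```python
import math

def make_transform(s_min, s_max, c):
    """Build t(x) = (a*x - b)**e for one dimension; requires s_min < c < s_max."""
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    # Chosen so that (a*c - b)**e == 0.5.
    e = -1.0 / math.log2(a * c - b)

    def t(x):
        # Clamp to [0, 1] so rounding never produces a negative base.
        base = min(1.0, max(0.0, a * x - b))
        return base ** e

    return t
```

With `t = make_transform(2.0, 10.0, 4.0)` the parameters come out as a = 0.125, b = 0.25 and e = 0.5, so `t(2.0)` is 0.0, `t(4.0)` is 0.5 and `t(10.0)` is 1.0. When the center sits at the midpoint of the interval, e is 1 and the transformation reduces to linear scaling.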
The space-tree
• SNo (the subspace number), $a_i$, $b_i$ and $e_i$ are stored in the leaf nodes
Space division algorithm
• Cluster the data
• Recursively divide the space into two subspaces in the dimension where the two cluster centers differ most
• Build the space-tree
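The division steps can be sketched roughly as below. The slides do not say which clustering method is used or where exactly the split is placed, so this sketch makes two assumptions: a simple 2-means clustering with a deterministic start, and a split at the midpoint between the two cluster centers. The names `two_means` and `divide` and the dictionary layout of the tree are hypothetical.

```python
import math

def two_means(points, iters=10):
    # Assumed clustering: 2-means, seeded with the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    for _ in range(iters):
        g0 = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        g1 = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not g0 or not g1:
            break
        c0 = tuple(sum(x) / len(g0) for x in zip(*g0))
        c1 = tuple(sum(x) / len(g1) for x in zip(*g1))
    return c0, c1

def divide(points, lo, hi, depth):
    """Recursively split the box [lo, hi]; leaves of the returned tree are subspaces."""
    leaf = {"lo": lo, "hi": hi, "n": len(points)}
    if depth == 0 or len(points) < 2:
        return leaf
    c0, c1 = two_means(points)
    # Dimension in which the two cluster centers differ most.
    dim = max(range(len(lo)), key=lambda j: abs(c0[j] - c1[j]))
    split = (c0[dim] + c1[dim]) / 2  # assumed split position
    left = [p for p in points if p[dim] <= split]
    right = [p for p in points if p[dim] > split]
    if not left or not right:
        return leaf
    hi_left = hi[:dim] + (split,) + hi[dim + 1:]
    lo_right = lo[:dim] + (split,) + lo[dim + 1:]
    return {"dim": dim, "split": split,
            "left": divide(left, lo, hi_left, depth - 1),
            "right": divide(right, lo_right, hi, depth - 1)}
```

Here `lo` and `hi` are tuples giving the bounds of the current box. A fuller version would also store the subspace number and the transformation parameters in each leaf.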
Build the P+-tree
• The P+-tree is in effect a B+-tree that stores the data points in the leaf nodes with the P+-value as the key
• P+-value: $SNo \cdot 2d + pv(v')$ (pyramid values lie in $[0, 2d)$, so the keys of different subspaces do not overlap)
• For a newly inserted point $v$, traverse the space-tree to determine the subspace it belongs to
• Transform the point $v$ to $v'$ and calculate its P+-value
• Insert the point $v$ with its P+-value as the key
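The insertion steps can be illustrated with a small sketch. Two simplifications are assumptions made here for brevity: a sorted Python list maintained with `bisect` stands in for the B+-tree, and a linear scan over a list of subspaces stands in for the space-tree traversal. Each subspace is a hypothetical `(lo, hi, center)` triple.

```python
import bisect
import math

def transform(v, lo, hi, c):
    # Apply t_i(x) = (a_i x - b_i)^(e_i) in every dimension.
    out = []
    for x, l, h, ci in zip(v, lo, hi, c):
        a = 1.0 / (h - l)
        b = l / (h - l)
        e = -1.0 / math.log2(a * ci - b)
        out.append(min(1.0, max(0.0, a * x - b)) ** e)
    return out

def pyramid_value(v):
    d = len(v)
    j = max(range(d), key=lambda k: abs(0.5 - v[k]))
    i = j if v[j] < 0.5 else j + d
    return i + abs(0.5 - v[j])

def p_plus_value(v, sno, sub):
    # P+-value = SNo * 2d + pv(v'), where v' is the transformed point.
    return sno * 2 * len(v) + pyramid_value(transform(v, *sub))

def insert(index, subspaces, v):
    # Stand-in for the space-tree traversal: first subspace that contains v.
    sno = next(n for n, (lo, hi, _) in enumerate(subspaces)
               if all(l <= x <= h for x, l, h in zip(v, lo, hi)))
    # Stand-in for the B+-tree insert: keep (key, point) pairs sorted by key.
    bisect.insort(index, (p_plus_value(v, sno, subspaces[sno]), v))
```

With the unit square split into a left half (center (0.25, 0.5)) and a right half (center (0.75, 0.5)), the point (0.125, 0.5) gets the key 0.25 in subspace 0, while (0.875, 0.5) gets 1 · 4 + 2.25 = 6.25 in subspace 1.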
Window search algorithm
• Traverse the space-tree to find the subspaces intersected by the query
• For each intersected subspace, transform the query according to the transformation function of that subspace
• Search the subspace with the transformed query
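The sketch below follows the three steps above under several simplifying assumptions: a sorted list replaces the B+-tree, a loop over all subspaces replaces the space-tree traversal, and each pyramid is scanned over a conservative height interval derived only from the query extent in that pyramid's dimension. The Pyramid technique uses a tighter interval; the looser one here only costs extra candidates, which the final filter removes. All names are illustrative.

```python
import bisect
import math

def make_subspace(lo, hi, c):
    # Per-dimension parameters a_i, b_i, e_i of t_i(x) = (a_i x - b_i)^(e_i).
    a = [1.0 / (h - l) for l, h in zip(lo, hi)]
    b = [l / (h - l) for l, h in zip(lo, hi)]
    e = [-1.0 / math.log2(ai * ci - bi) for ai, bi, ci in zip(a, b, c)]
    return {"lo": lo, "hi": hi, "a": a, "b": b, "e": e}

def transform(v, sub):
    return [min(1.0, max(0.0, a * x - b)) ** e
            for x, a, b, e in zip(v, sub["a"], sub["b"], sub["e"])]

def pyramid_value(v):
    d = len(v)
    j = max(range(d), key=lambda k: abs(0.5 - v[k]))
    i = j if v[j] < 0.5 else j + d
    return i + abs(0.5 - v[j])

def inside(p, lo, hi):
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))

def build_index(points, subs):
    d = len(subs[0]["lo"])
    index = []
    for p in points:
        sno = next(n for n, s in enumerate(subs) if inside(p, s["lo"], s["hi"]))
        index.append((sno * 2 * d + pyramid_value(transform(p, subs[sno])), p))
    index.sort()
    return index

def window_search(index, subs, q_lo, q_hi):
    d = len(q_lo)
    keys = [k for k, _ in index]
    found = []
    for sno, sub in enumerate(subs):
        # Clip the query to this subspace; skip subspaces it does not intersect.
        lo = [max(q, s) for q, s in zip(q_lo, sub["lo"])]
        hi = [min(q, s) for q, s in zip(q_hi, sub["hi"])]
        if any(l > h for l, h in zip(lo, hi)):
            continue
        # Transformed query corners, shifted so the center is the origin.
        t_lo = [x - 0.5 for x in transform(lo, sub)]
        t_hi = [x - 0.5 for x in transform(hi, sub)]
        for i in range(2 * d):
            j = i % d
            if i < d:  # pyramid on the low side of dimension j
                if t_lo[j] > 0:
                    continue
                h_lo, h_hi = max(0.0, -t_hi[j]), -t_lo[j]
            else:      # pyramid on the high side of dimension j
                if t_hi[j] < 0:
                    continue
                h_lo, h_hi = max(0.0, t_lo[j]), t_hi[j]
            base = sno * 2 * d + i
            start = bisect.bisect_left(keys, base + h_lo)
            stop = bisect.bisect_right(keys, base + h_hi)
            # Candidates from the key range are checked against the original window.
            found += [p for _, p in index[start:stop] if inside(p, q_lo, q_hi)]
    return found
```

Because each transformation is monotonic, transforming the two corners of the clipped window is enough to obtain the transformed window.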
kNN search algorithm
• Start from a small window query
• Gradually increase the side length of the query window until the k nearest neighbors are found
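A minimal sketch of the growing-window idea, with a brute-force window query standing in for the P+-tree window search. The starting side length, the growth factor, and the stopping rule are assumptions: the loop stops once the k-th closest candidate lies within half the window side, since no point outside the window can then be closer. Euclidean distance is also assumed.

```python
import math

def window_query(points, lo, hi):
    # Stand-in for the index-based window search.
    return [p for p in points
            if all(l <= x <= h for x, l, h in zip(p, lo, hi))]

def knn_by_growing_window(points, q, k, start_side=0.05, growth=2.0):
    k = min(k, len(points))
    if k == 0:
        return []
    side = start_side
    while True:
        r = side / 2
        lo = [x - r for x in q]
        hi = [x + r for x in q]
        cand = sorted(window_query(points, lo, hi),
                      key=lambda p: math.dist(p, q))
        # Safe to stop: the k-th candidate is inside the ball inscribed in the window.
        if len(cand) >= k and math.dist(cand[k - 1], q) <= r:
            return cand[:k]
        side *= growth  # enlarge the window and try again
```

Stopping as soon as the window merely contains k points would be cheaper but could miss a closer point lying just outside a face of the window, which is why the sketch compares against the inscribed radius.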