Nearest-Neighbor Searching Under Uncertainty. Wuzhou Zhang, Department of Computer Science, Duke University. Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. PODS, May 23, 2012.
Nearest-Neighbor Searching • $P$: a set of $n$ points in $\mathbb{R}^2$ • $q$: any query point in $\mathbb{R}^2$ • Find the closest point in $P$ to $q$
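Voronoi Diagram • Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 : d(x, p_i) \le d(x, p_j) \ \forall j\}$ • Voronoi diagram $\mathrm{VD}(P)$: decomposition of $\mathbb{R}^2$ induced by the cells $\mathrm{Vor}(p_i)$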
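To make the link between the two slides above concrete, here is a minimal brute-force sketch: a point lies in the Voronoi cell of a site exactly when that site is one of its nearest neighbors. The function names and the sample coordinates are illustrative assumptions, not part of the talk, and a real index would use point location in the diagram rather than a linear scan.

```python
import math

def nearest_neighbor(points, q):
    """Index of the point in `points` closest to q (Euclidean, linear scan)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], q))

def in_voronoi_cell(points, i, x):
    """True if x lies in the closed Voronoi cell of points[i]."""
    d_i = math.dist(points[i], x)
    return all(d_i <= math.dist(p, x) for p in points)

# Hypothetical sample data.
P = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
q = (3.0, 1.0)
print(nearest_neighbor(P, q), in_voronoi_cell(P, 1, q))
```

A point on a cell boundary, such as (2, 0) here, belongs to more than one closed cell.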
Data Uncertainty • Location of data is imprecise: sensor databases, face recognition, mobile data, etc. • What is the “nearest neighbor” of $q$ now?
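Our Model and Problem Statement • Uncertain point $P$: represented as a probability density function (pdf) $f_P$ • Expected distance: $\mathrm{Ed}(P, q) = \int f_P(x)\, d(x, q)\, dx$ • Find the expected nearest neighbor (ENN) of $q$ among the uncertain points $\mathcal{P} = \{P_1, \dots, P_n\}$: $\arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$ • Or an $\varepsilon$-ENN: a point $P \in \mathcal{P}$ with $\mathrm{Ed}(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} \mathrm{Ed}(P', q)$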
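For discrete distributions the integral in the expected distance becomes a weighted sum, which the following sketch evaluates by brute force. It assumes each uncertain point is a list of (location, probability) pairs; the function names and sample data are hypothetical, and the linear scan is only a baseline, not the indexed method of the talk.

```python
import math

def expected_distance(upoint, q, dist=math.dist):
    """Expected distance from q to an uncertain point [(location, prob), ...]."""
    return sum(w * dist(x, q) for x, w in upoint)

def enn(upoints, q, dist=math.dist):
    """Index of the expected nearest neighbor of q (linear scan)."""
    return min(range(len(upoints)),
               key=lambda i: expected_distance(upoints[i], q, dist))

def is_eps_enn(upoints, i, q, eps, dist=math.dist):
    """True if upoints[i] is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(P, q, dist) for P in upoints)
    return expected_distance(upoints[i], q, dist) <= (1 + eps) * best

# Hypothetical sample data: three uncertain points with discrete pdfs.
A = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
B = [((1.0, 3.0), 1.0)]
C = [((5.0, 0.0), 0.5), ((5.0, 2.0), 0.5)]
uncertain_points = [A, B, C]
q = (1.0, 0.0)
print(enn(uncertain_points, q))
```

Passing a different `dist` function (for example an $L_1$ distance) switches the metric without changing the rest of the sketch.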
Previous work • Uncertain data, ENN: under the $L_1$ metric, ε-approximation [Ljosa2007]; no bounds on the running time • Uncertain data, most likely NN: heuristics [Cheng2008, Kriegel2007, Cheng2004, etc.] • Uncertain query, ENN: discrete uniform distribution, both exact and O(1)-factor approximation [Li2011, Sharifzadeh2010, etc.]; no bounds on the running time
Our contribution • First nontrivial methods for ENN queries with provable performance guarantees! • Results are stated in $\mathbb{R}^2$ and extend to higher dimensions
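Expected Voronoi Diagram • Expected Voronoi cell: the points whose expected distance to $P_i$ is no larger than to any other uncertain point, $\mathrm{EVor}(P_i) = \{x \in \mathbb{R}^2 : \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x) \ \forall j\}$ • Expected Voronoi diagram (EVD): decomposition induced by the cells $\mathrm{EVor}(P_i)$ • An example in the $L_1$ metric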
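Squared Euclidean distance: Uncertain data • $\bar{p}$: the centroid of $P$ • Lemma: $\mathrm{Ed}(P, q) = \|q - \bar{p}\|^2 + \sigma_P^2$, where $\sigma_P^2 = \int f_P(x)\, \|x - \bar{p}\|^2\, dx$ • The EVD is the same as the weighted Voronoi diagram (WVD) • Remarks: Works for any distribution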
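The lemma on this slide is presumably the standard decomposition of the expected squared distance into the squared distance to the centroid plus a variance term, $\mathbb{E}\|q - X\|^2 = \|q - \bar{p}\|^2 + \sigma_P^2$. The sketch below checks that identity numerically for a discrete distribution; the representation as (location, probability) pairs, the function names, and the sample data are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(upoint):
    """Probability-weighted mean location of an uncertain point."""
    cx = sum(w * x for (x, _), w in upoint)
    cy = sum(w * y for (_, y), w in upoint)
    return (cx, cy)

def variance(upoint):
    """Expected squared distance from the centroid."""
    c = centroid(upoint)
    return sum(w * sq_dist(x, c) for x, w in upoint)

def expected_sq_distance(upoint, q):
    """Direct evaluation of the expected squared distance to q."""
    return sum(w * sq_dist(x, q) for x, w in upoint)

def expected_sq_distance_via_lemma(upoint, q):
    """Same quantity via centroid plus variance."""
    return sq_dist(q, centroid(upoint)) + variance(upoint)

# Hypothetical sample data.
P = [((0.0, 0.0), 0.25), ((4.0, 0.0), 0.25), ((0.0, 4.0), 0.5)]
q = (1.0, 1.0)
print(expected_sq_distance(P, q), expected_sq_distance_via_lemma(P, q))
```

Because the variance term does not depend on $q$, each uncertain point reduces to its centroid with an additive weight, which is why a weighted Voronoi diagram of the centroids can answer the query.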
Rectilinear ($L_1$) metric: Uncertain data • Size of the EVD: bound involving $\alpha(n)$, the inverse Ackermann function • Lower bound construction • Remarks: Extends to the $L_\infty$ metric
Rectilinear ($L_1$) metric: Uncertain data (cont.) • A near-linear-size index exists despite the size of the EVD • Remarks: Extends to higher dimensions
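Euclidean metric ($\varepsilon$-ENN): Uncertain data • Approximate the expected distance using a grid, handled separately outside the grid and inside the grid • The total number of cells and the cell size depend on $\varepsilon$ • Remarks: Extends to any metric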
Euclidean metric ($\varepsilon$-ENN): Uncertain data (cont.) • A linear-size approximate EVD!
Conclusion and future work • Conclusion: first nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees; ENN is not a good indicator when the variance is large • Future work: linear-size index for most likely NN queries in sublinear time; index for returning the probability distribution of NNs • Thanks
Squared Euclidean distance: Uncertain query • $\bar{q}$: the centroid of the uncertain query $Q$ • Preprocessing: compute the Voronoi diagram VD of $P$ • Query: given $Q$, compute $\bar{q}$, then query VD with $\bar{q}$ • Remarks: Extends to higher dimensions and works for any distribution
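The query procedure on this slide can be sketched as follows, assuming a discrete uncertain query given as (location, probability) pairs. For any data point $p$, the expected squared distance to $Q$ equals $\|p - \bar{q}\|^2$ plus a variance term that does not depend on $p$, so the ENN of $Q$ is the ordinary nearest neighbor of $\bar{q}$. A linear scan stands in for point location in the Voronoi diagram here; the names and sample data are hypothetical.

```python
import math

def centroid(uquery):
    """Probability-weighted mean location of an uncertain query."""
    cx = sum(w * x for (x, _), w in uquery)
    cy = sum(w * y for (_, y), w in uquery)
    return (cx, cy)

def enn_via_centroid(points, uquery):
    """ENN under squared Euclidean distance: nearest neighbor of the centroid."""
    qbar = centroid(uquery)
    # Linear scan used as a stand-in for a Voronoi diagram point-location query.
    return min(range(len(points)), key=lambda i: math.dist(points[i], qbar))

def enn_brute_force(points, uquery):
    """ENN by evaluating the expected squared distance for every data point."""
    def exp_sq(p):
        return sum(w * ((p[0] - x) ** 2 + (p[1] - y) ** 2) for (x, y), w in uquery)
    return min(range(len(points)), key=lambda i: exp_sq(points[i]))

# Hypothetical sample data.
P = [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
Q = [((2.0, 0.0), 0.5), ((4.0, 2.0), 0.5)]
print(enn_via_centroid(P, Q), enn_brute_force(P, Q))
```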
Rectilinear ($L_1$) metric: Uncertain query • Similarly, the expected distance consists of linear pieces
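One way to see where linear pieces come from under the rectilinear metric: for a discrete distribution, the expected $L_1$ distance separates into one term per coordinate, and each term $\mathbb{E}|t - X|$ is piecewise linear in $t$ with breakpoints only at the support values. The sketch below illustrates this under the assumption of (location, probability) pairs; the function names and data are hypothetical, and this is not the index construction from the talk.

```python
def expected_l1_1d(coords, t):
    """E|t - X| for a one-dimensional discrete X given as [(value, prob), ...]."""
    return sum(w * abs(t - v) for v, w in coords)

def expected_l1(upoint, p):
    """Expected L1 distance from p to an uncertain point, one coordinate at a time."""
    xs = [(x, w) for (x, _), w in upoint]
    ys = [(y, w) for (_, y), w in upoint]
    return expected_l1_1d(xs, p[0]) + expected_l1_1d(ys, p[1])

def breakpoints(coords):
    """Values where the slope of E|t - X| can change."""
    return sorted({v for v, _ in coords})

# Hypothetical sample data.
Q = [((0.0, 0.0), 0.5), ((2.0, 4.0), 0.5)]
xs = [(x, w) for (x, _), w in Q]
print(expected_l1(Q, (1.0, 1.0)), breakpoints(xs))
```

With $k$ support values per coordinate, each one-dimensional term has at most $k + 1$ linear pieces.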
Euclidean metric ($\varepsilon$-ENN): Uncertain query • Remarks: Extends to higher dimensions
Rectilinear ($L_1$) metric: Uncertain data (cont.) • A near-linear-size index exists despite the size of the EVD • Linear pieces! Linear!