
EFFICIENT RANK BASED K-NN QUERY PROCESSING OVER UNCERTAIN DATA

Presented by: Duong, Huu Kinh Luan, March 14th, 2011. Outline: Authors of paper • What is the problem? • Why is there a problem? • Introduction • Background information • Problem Definition • Handling Technique • Algorithm


Presentation Transcript


  1. EFFICIENT RANK BASED K-NN QUERY PROCESSING OVER UNCERTAIN DATA Presented by: Duong, Huu Kinh Luan March 14th, 2011

  2. Outline • Authors of paper • What is the problem? • Why is there a problem? • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results • Rank Based K-NN • Related Papers on the same topic • Top-k Properties • Problem Definition • Notations used • Exact Algorithm • Randomized Algorithm

  3. Introduction • Authors of the paper • Xuemin Lin: Professor, The University of New South Wales; PhD in C.S. from the U. Queensland (Australia) in 1992 • Ying Zhang: Research Fellow; PhD, 01.2008 • Wenjie Zhang: Post-doc research fellow; PhD, 2010 • Gaoping Zhu: PhD Candidate • Qianlu Lin: PhD Candidate

  4. Introduction • What is the problem? Uncertain data (examples: sensor network, GPS tracking device)

  5. Introduction • What is the problem? k-Nearest Neighbor query

  6. Background information Rank based k-NN

  7. Background information • G. Cormode, F. Li, and K. Yi, “Semantics of ranking queries for probabilistic data and expected ranks” • R. Cheng, L. Chen, J. Chen, and X. Xie, “Evaluating probability threshold k-nn queries over uncertain data” • V. Ljosa and A. K. Singh, “APLA: Indexing arbitrary probability distributions” • Rank based k-NN is not a new problem

  8. Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results

  9. Problem definition • Set of objects: U = {U1, U2, …, Un}, e.g. U = {U1, U2, U3, U4} • Possible world: W = {u1, u2, u3, …, un}, e.g. W1 = {U1, U2} • Definition 1: Rank (rank of an object U in one possible world W) • [Figure: query point q with objects U1, U2, U3, U4]

  10. Problem definition Definition 2: Expected Rank Definition 3: Median Rank

  11. Problem definition • Example (shown on board): • Possible worlds? • Rank for A, i.e. r(a1), r(a2), r(a3)? • Expected rank for A, i.e. er(A)? • Median rank for A, i.e. mr(A)?

  12. Problem definition • Top–K Query: Find k nearest neighbors for a given query q based on the expected (median) ranks of n objects.
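
The definitions and the top-k query above can be illustrated with a brute-force sketch. It is not the paper's algorithm; it only enumerates every possible world. The assumptions are mine: each object is a discrete set of (distance to q, probability) instances, objects are independent, a possible world picks exactly one instance per object, the rank of an object in a world is the number of other objects strictly closer to q, and the median rank is the smallest rank whose cumulative probability reaches 0.5. All names and the sample data are hypothetical.

```python
from itertools import product


def rank_distributions(objects):
    """objects: dict name -> list of (distance_to_q, probability).

    Returns dict name -> {rank: probability} by enumerating all possible
    worlds (assumption: one instance per object, objects independent).
    """
    names = list(objects)
    dist = {name: {} for name in names}
    for world in product(*(objects[n] for n in names)):
        # Probability of this world is the product of the chosen instances.
        p_world = 1.0
        for _, p in world:
            p_world *= p
        for i, name in enumerate(names):
            d_i = world[i][0]
            # Rank = number of other objects strictly closer to q.
            r = sum(1 for j, (d, _) in enumerate(world) if j != i and d < d_i)
            dist[name][r] = dist[name].get(r, 0.0) + p_world
    return dist


def expected_rank(rank_dist):
    return sum(r * p for r, p in rank_dist.items())


def median_rank(rank_dist):
    # Smallest rank whose cumulative probability reaches 0.5
    # (small tolerance for floating-point accumulation).
    acc = 0.0
    for r in sorted(rank_dist):
        acc += rank_dist[r]
        if acc >= 0.5 - 1e-12:
            return r
    return max(rank_dist)


def top_k_by_expected_rank(objects, k):
    dist = rank_distributions(objects)
    return sorted(objects, key=lambda n: expected_rank(dist[n]))[:k]


# Hypothetical data: distances of each instance to the query point q.
objects = {
    "A": [(1.0, 0.5), (4.0, 0.5)],
    "B": [(2.0, 1.0)],
    "C": [(3.0, 0.5), (5.0, 0.5)],
}
dist = rank_distributions(objects)
for name in objects:
    print(name, expected_rank(dist[name]), median_rank(dist[name]))
print(top_k_by_expected_rank(objects, 1))
```

The enumeration is exponential in the number of objects, which is why it is only useful for checking small examples by hand.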

  13. Problem definition • Top–K Properties: • Exact-k: the k-NN query answer should return exactly k objects • Containment: the (k+1)-NN answer should contain all objects in the k-NN answer • Unique Ranking: the same object should not be listed multiple times in the k-NN answer • Value invariance: the distance only determines the relative behavior of the object • Stability: making an item in the top-k list more likely or more important should not remove it from the list

  14. Problem definition • Top–K Properties: The proof that expected rank satisfies all 5 top-k properties is not this paper's major concern. It is given in the paper “Semantics of ranking queries for probabilistic data and expected ranks” by G. Cormode, F. Li, and K. Yi.

  15. Problem challenge • Overcome the previous paper’s difficulties: • Reduce the number of objects accessed • Pre-computed expected scores of objects • Expected score might change upon different queries • Approximation of KNN query answers

  16. Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results

  17. Handling technique • Lemma 2: Let ui and uj be the instances which determine the median rank and the median distance of U, respectively. Then r(ui) = r(uj).

  18. Handling technique • Finding Minimal Set for Selection Problem (Using Bound Based Approach) • Motivation for the Algorithm

  19. Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results

  20. Notations in the Algo.

  21. Algorithm • Uncertain objects: R-Tree • Query q: also represented in an R-Tree • e(I): from d-(I) to d+(I)

  22. Algorithm • Example of calculating r-(I) and r+(I): sum up the “smaller than” cases for r-(I)

  23. Algorithm • Example of calculating r-(I) and r+(I): sum up the “smaller than” cases for r+(I)
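
The two examples above can be sketched as follows. This is a guess at what the bounds mean, not the paper's definition: I assume each interval I carries the probability mass of one object's instances whose distance to q lies in [d-(I), d+(I)], that r-(I) sums the mass of other objects' intervals that are certainly closer (d+ below d-(I)), and that r+(I) sums the mass of intervals that are possibly closer (d- below d+(I)). The `Interval` class and `rank_bounds` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    owner: str    # object the interval belongs to
    d_lo: float   # d-(I): minimal distance to q
    d_hi: float   # d+(I): maximal distance to q
    prob: float   # probability mass of the instances inside I


def rank_bounds(target, intervals):
    """Return (r_lo, r_hi) for `target` under the assumptions stated above."""
    others = [i for i in intervals if i.owner != target.owner]
    # Certainly closer: the whole interval lies before the target begins.
    r_lo = sum(i.prob for i in others if i.d_hi < target.d_lo)
    # Possibly closer: the interval starts before the target ends.
    r_hi = sum(i.prob for i in others if i.d_lo < target.d_hi)
    return r_lo, r_hi


# Hypothetical intervals around a query point q.
target = Interval("A", 2.0, 3.0, 1.0)
intervals = [
    target,
    Interval("B", 0.5, 1.5, 0.5),  # certainly closer than target
    Interval("B", 2.5, 4.0, 0.5),  # overlaps target: possibly closer
    Interval("C", 5.0, 6.0, 1.0),  # certainly farther
]
print(rank_bounds(target, intervals))
```

Intervals that overlap the target contribute only to the upper bound, so refining intervals (descending the local R-Tree) tightens the gap between the two bounds.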

  24. Algorithm • Exact Algorithm: • accrmin: accumulation of the probability values of the intervals {I ∈ I} with d+(I) <= d • Uarmin(d): accumulation of the probability values of the intervals {I ∈ IU} with d+(I) <= d

  25. Algorithm • Exact Algorithm, cost: • Initial procedure: O(n log n + np0 x cio) • One round: O(n x m log(n x m)) • Total time cost: T = O(h x n x m log(n x m)) + Σ (i = 0..h) npi x cio • n: number of objects • m: number of intervals in one object • h: max height of local R-Tree • np0: number of IOs • npi: number of IOs in the ith round • cio: cost of each IO

  26. Algorithm • Randomized Algorithm: Sample the possible worlds such that the expected rank and median rank can be approximately computed in an efficient way.

  27. Algorithm • Randomized Algorithm: Estimate the expected rank of an object U, where ri(U) is the rank of U in sample Si
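
A minimal sketch of this sampling idea, under my own assumptions: each object is a discrete set of (distance to q, probability) instances, objects are independent, a sample Si is one possible world drawn by picking one instance per object, and the estimate is the plain average of ri(U) over the samples. It omits the pruning and R-Tree steps of the paper's randomized algorithm. Function names and data are hypothetical.

```python
import random


def sample_world(objects, rng):
    """Draw one possible world: one instance distance per object."""
    world = {}
    for name, instances in objects.items():
        dists = [d for d, _ in instances]
        probs = [p for _, p in instances]
        world[name] = rng.choices(dists, weights=probs, k=1)[0]
    return world


def estimate_expected_ranks(objects, num_samples, seed=0):
    """Average the rank of each object over `num_samples` sampled worlds."""
    rng = random.Random(seed)
    totals = {name: 0 for name in objects}
    for _ in range(num_samples):
        world = sample_world(objects, rng)
        for name, d in world.items():
            # Rank in this sample = number of other objects strictly closer.
            totals[name] += sum(
                1 for other, d2 in world.items() if other != name and d2 < d
            )
    return {name: total / num_samples for name, total in totals.items()}


# Hypothetical data: distances of each instance to the query point q.
objects = {
    "A": [(1.0, 0.5), (4.0, 0.5)],
    "B": [(2.0, 1.0)],
    "C": [(3.0, 0.5), (5.0, 0.5)],
}
estimates = estimate_expected_ranks(objects, num_samples=20000)
print(sorted(estimates, key=estimates.get))
```

With more samples the averages concentrate around the exact expected ranks, at a cost linear in the number of samples rather than exponential in the number of objects.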

  28. Algorithm • Randomized Algorithm: • Find candidate objects C for the KNN query based on the global R-Tree • Compute the minimal/maximal expected rank for each object using a sweepline algorithm • l and r: values used to prune or validate objects for the KNN query

  29. Algorithm • Randomized Algorithm, cost: T = O(n log n + n’ log n + n1 x cio) • Component costs: O(n log n), O(log n), O(n’ log n + n1 x cio)

  30. Algorithm • What is n’?

  31. Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results

  32. Experimental Results

  33. Experimental Results

  34. Experimental Results

  35. Experimental Results

  36. Experimental Results • Comparison with the other paper (this paper vs. the other paper)

  37. Q&A
