EFFICIENT RANK BASED K-NN QUERY PROCESSING OVER UNCERTAIN DATA Presented by: Duong, Huu Kinh Luan March 14th, 2011
Outline • Authors of the paper • What is the problem? • Why is there a problem? • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results • Rank Based K-NN • Related papers on the same topic • Top-k Properties • Problem Definition • Notations used • Exact Algorithm • Randomized Algorithm
Introduction • Authors of the paper • Xuemin Lin: Professor, The University of New South Wales; PhD in C.S. from the University of Queensland (Australia), 1992 • Ying Zhang: Research Fellow; PhD, 01.2008 • Wenjie Zhang: Post-doc Research Fellow; PhD, 2010 • Gaoping Zhu: PhD Candidate • Qianlu Lin: PhD Candidate
Introduction • What is the problem? Uncertain data, e.g. from sensor networks and GPS tracking devices
Introduction • What is the problem? k-Nearest Neighbor query
Background information Rank based k-NN
Background information • G. Cormode, F. Li, and K. Yi, "Semantics of ranking queries for probabilistic data and expected ranks" • R. Cheng, L. Chen, J. Chen, and X. Xie, "Evaluating probability threshold k-NN queries over uncertain data" • V. Ljosa and A. K. Singh, "APLA: Indexing arbitrary probability distributions" • Rank based k-NN is not a new problem
Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results
Problem definition • Set of objects: U = {U1, U2, …, Un}, e.g. U = {U1, U2, U3, U4} • Possible world: W = {u1, u2, u3, …, un}, e.g. W1 = {U1, U2} • Definition 1: Rank (rank of an object U in one possible world W) • [Figure: query point q with uncertain objects U1, U2, U3, U4]
Problem definition • Definition 2: Expected Rank • Definition 3: Median Rank
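The formulas for these definitions did not survive on the slide. As a rough sketch only, assuming the possible-world semantics used for expected ranks by Cormode, Li, and Yi, the three definitions can be written as below. The symbols \delta (distance to q), r_W(U), and \Pr(W) are notation chosen here for illustration and may differ from the paper's.

```latex
% Assumed notation, not taken from the slide:
%   \delta(U, q) : distance between (the instance of) U in world W and query q
%   \Pr(W)       : probability of possible world W
\begin{align*}
  r_W(U) &= \bigl|\{\, V \in W : \delta(V, q) < \delta(U, q) \,\}\bigr|
         && \text{(rank of $U$ in world $W$)} \\
  er(U)  &= \sum_{W} \Pr(W)\, r_W(U)
         && \text{(expected rank)} \\
  mr(U)  &= \min \Bigl\{\, r : \sum_{W :\, r_W(U) \le r} \Pr(W) \ge \tfrac{1}{2} \,\Bigr\}
         && \text{(median rank)}
\end{align*}
```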
Problem definition • Example (shown on the board): What are the possible worlds? What is the rank of each instance of A, i.e. r(a1), r(a2), r(a3)? What is the expected rank of A, i.e. er(A)? What is the median rank of A, i.e. mr(A)?
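The board example itself is not recorded in the slides. As a stand-in, here is a minimal brute-force sketch that enumerates every possible world for a made-up 1-D data set and computes the expected and median rank of an object. The data, the function names, and the choice of "rank = number of objects strictly closer to q" are assumptions for illustration, not the paper's example.

```python
from itertools import product

# Hypothetical toy data: each object is a list of (position, probability)
# instances on a line; the probabilities of one object sum to 1.
toy_objects = {
    "A": [(1.0, 0.5), (3.0, 0.5)],
    "B": [(2.0, 0.5), (4.0, 0.5)],
}

def ranks_by_world(objects, q):
    """Yield (probability, {name: rank}) for every possible world."""
    names = list(objects)
    for combo in product(*(objects[n] for n in names)):
        prob = 1.0
        dist = {}
        for name, (pos, p) in zip(names, combo):
            prob *= p                      # objects are assumed independent
            dist[name] = abs(pos - q)
        # Assumed rank definition: number of objects strictly closer to q.
        ranks = {n: sum(dist[m] < dist[n] for m in names if m != n)
                 for n in names}
        yield prob, ranks

def expected_rank(objects, q, name):
    """Probability-weighted average of the rank over all possible worlds."""
    return sum(p * r[name] for p, r in ranks_by_world(objects, q))

def median_rank(objects, q, name):
    """Smallest rank r such that Pr(rank <= r) >= 0.5."""
    mass = {}
    for p, r in ranks_by_world(objects, q):
        mass[r[name]] = mass.get(r[name], 0.0) + p
    acc = 0.0
    for r in sorted(mass):
        acc += mass[r]
        if acc >= 0.5 - 1e-12:             # small tolerance for float sums
            return r
    return max(mass)
```

Enumeration is exponential in the number of objects, which is the cost the exact and randomized algorithms later in the talk are meant to avoid.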
Problem definition • Top-k query: For a given query q, find the k nearest neighbors among n objects, ordered by their expected (or median) ranks.
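Once a rank value is known for every object, the query answer is simply the k objects with the smallest values. A minimal sketch, assuming the ranks are already available in a dictionary; the function name and the tie-break by object name are illustrative choices, not from the paper:

```python
import heapq

def top_k_by_rank(rank_of, k):
    """Return the k object names with the smallest (expected or median) rank.

    rank_of: dict mapping object name -> rank value.
    Ties are broken by name (an assumption) so the answer is deterministic.
    """
    return heapq.nsmallest(k, rank_of, key=lambda name: (rank_of[name], name))
```

For example, `top_k_by_rank({"A": 0.25, "B": 0.75, "C": 1.6}, 2)` selects A and B, and asking for k = 3 extends that answer rather than replacing it.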
Problem definition • Top-k Properties: • Exact-k: a k-NN query answer should contain exactly k objects • Containment: the (k+1)-NN answer should contain all objects in the k-NN answer • Unique ranking: the same object should not be listed more than once in the k-NN answer • Value invariance: only the relative order of the distances matters, not their actual values • Stability: making an item in the top-k list more likely or more important should not remove it from the list
Problem definition • Top-k Properties: Proving that expected rank satisfies all five top-k properties is not the main concern of this paper. The proof is given in "Semantics of ranking queries for probabilistic data and expected ranks" by G. Cormode, F. Li, and K. Yi.
Problem challenge • Overcome the difficulties of the previous paper: • Reduce the number of objects accessed • Pre-computed expected scores of objects • Expected scores might change from one query to another • Approximation of k-NN query answers
Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results
Handling technique • Lemma 2: Let ui and uj be the instances that determine the median rank and the median distance of U, respectively. Then r(ui) = r(uj).
Handling technique • Finding the Minimal Set for the Selection Problem (Using a Bound Based Approach) • Motivation for the Algorithm
Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results
Algorithm • Uncertain objects are indexed in an R-Tree • Query q is also represented in an R-Tree • e(I): from d-(I) to d+(I)
Algorithm • Example of calculating r-(I) and r+(I) • "Smaller than": sum up for r-(I)
Algorithm • Example of calculating r-(I) and r+(I) • "Smaller than": sum up for r+(I)
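The figures for this example are missing, so the following is only a plausible reading of the "smaller than, sum up" idea: r-(I) adds the probability of intervals of other objects that are certainly closer than I, and r+(I) adds the probability of those that could be closer. The `Interval` type, its field names, and the strict comparisons are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    owner: str    # name of the uncertain object this interval belongs to
    d_min: float  # d-(I): smallest distance to q inside the interval
    d_max: float  # d+(I): largest distance to q inside the interval
    prob: float   # probability mass of the owner that falls in this interval

def rank_bounds(target, intervals):
    """Return (r_lower, r_upper) for `target`, skipping its own object.

    r_lower sums intervals that are certainly closer: d+(I') < d-(I).
    r_upper sums intervals that may be closer:        d-(I') < d+(I).
    """
    r_lower = 0.0
    r_upper = 0.0
    for other in intervals:
        if other.owner == target.owner:
            continue
        if other.d_max < target.d_min:
            r_lower += other.prob
        if other.d_min < target.d_max:
            r_upper += other.prob
    return r_lower, r_upper
```

With one interval of A covering distances [2, 3], two intervals of B covering [0, 1] and [2.5, 4] with probability 0.5 each, and one interval of C covering [5, 6], the bounds for A's interval come out as 0.5 and 1.0: half of B is certainly closer, all of B may be closer, and C never is.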
Algorithm • Exact Algorithm: • accrmin: accumulation of the probability values of the intervals {I of I} with d+(I) <= d • Uarmin(d): accumulation of the probability values of the intervals {I of IU} with d+(I) <= d
Algorithm • Exact Algorithm, cost: • Initial procedure: O(nlogn + np0 x cio) • One round: O(n x m log(n x m)) • Total time cost: T = O(h x n x m log(n x m)) + sum over i = 0..h of (npi x cio) • n: number of objects • m: number of intervals per object • h: maximum height of a local R-Tree • np0: number of I/Os in the initial procedure • npi: number of I/Os in the i-th round • cio: cost of each I/O
Algorithm • Randomized Algorithm: Sample possible worlds so that the expected rank and the median rank can be approximated efficiently.
Algorithm • Randomized Algorithm: Estimate the expected rank of an object U, where ri(U) is the rank of U in sample Si. Recall:
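The estimator formula is missing from the slide. A natural reading is that the expected rank is approximated by the average of ri(U) over the sampled worlds Si. The sketch below assumes exactly that, on 1-D toy data with each object given as a list of (position, probability) instances; the function names and the data layout are hypothetical, and the pruning with the R-Tree described in the talk is not modeled.

```python
import random

def sample_world(objects, rng):
    """Draw one possible world: pick one instance per object by probability."""
    world = {}
    for name, instances in objects.items():
        positions = [pos for pos, _ in instances]
        weights = [p for _, p in instances]
        world[name] = rng.choices(positions, weights=weights, k=1)[0]
    return world

def estimate_expected_rank(objects, q, name, n_samples, rng):
    """Average rank of `name` over n_samples sampled worlds.

    Assumed rank definition: number of objects strictly closer to q.
    """
    total = 0
    for _ in range(n_samples):
        world = sample_world(objects, rng)
        d = abs(world[name] - q)
        total += sum(abs(pos - q) < d
                     for other, pos in world.items() if other != name)
    return total / n_samples
```

Passing an explicit `random.Random(seed)` as `rng` keeps runs reproducible; the estimate tightens as `n_samples` grows, at a cost linear in the number of samples.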
Algorithm • Randomized Algorithm: • Find the candidate objects C for the k-NN query based on the global R-Tree • Compute the minimal/maximal expected rank of each object using a sweep-line algorithm • l and r: values used to prune or validate objects for the k-NN query
Algorithm • Randomized Algorithm, cost: T = O(nlogn + n’logn + n1 x cio) • Component costs: O(nlogn), O(logn), O(n’logn + n1 x cio)
Algorithm • What is n’?
Outline • Introduction • Background information • Problem Definition • Handling Technique • Algorithm • Experimental Results
Experimental Results • Comparison with the other paper: [Figure: this paper vs. the other paper]