
k-Nearest Neighbors in Uncertain Graphs

VLDB10. Lin Yincheng. 2011-02-28.


Presentation Transcript


  1. k-Nearest Neighbors in Uncertain Graphs • VLDB10 • Lin Yincheng • 2011-02-28

  2. Outline • Background • Motivation • Problem Definition • Query Answering Approach • Experimental Results

  3. Background • k-Nearest Neighbors • Uncertain Graphs • Example (figure: weighted graph with edge weights 15, 5, 15, 5, 5): find the 2 nearest neighbors of vertex B

  4. Motivation • Define meaningful distance functions that are more useful for identifying true neighbors • Introduce a novel pruning algorithm to process kNN queries in uncertain graphs • (Figure: uncertain graph with edges labeled weight(probability): 15(0.2), 15(0.6), 5(0.4), 5(0.3), 5(0.7); label: most-probable-path distance)

  5. Problem Definition • Assumption: edges are independent of one another • Probabilistic graph model G(V, E, P, W) • V and E denote the sets of nodes and edges, respectively • P assigns each edge a probability • W assigns each edge a weight • k-NN query
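The model G(V, E, P, W) and the independence assumption can be illustrated with a minimal Python sketch. The `UncertainEdge` class, the `world_probability` helper, and the example probabilities are hypothetical names and values introduced here for illustration; they do not come from the slides. Under edge independence, the probability of one possible world is the product of p(e) over the edges that are present and (1 - p(e)) over the edges that are absent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainEdge:
    """One edge of G(V, E, P, W); the class name is an illustrative assumption."""
    u: str
    v: str
    prob: float    # P: probability that the edge exists
    weight: float  # W: weight (length) of the edge


def world_probability(edges, present):
    """Probability of the possible world whose edge set is `present`.

    `present` is a set of indices into `edges`. Because edges are assumed
    independent, the probability is a product over all edges.
    """
    prob = 1.0
    for i, edge in enumerate(edges):
        prob *= edge.prob if i in present else (1.0 - edge.prob)
    return prob


# Made-up two-edge graph, not the one drawn on the slides.
edges = [
    UncertainEdge("A", "B", 0.5, 5.0),
    UncertainEdge("B", "C", 0.25, 15.0),
]
p_both = world_probability(edges, {0, 1})  # both edges present
p_none = world_probability(edges, set())   # no edge present
```

A graph with m uncertain edges has 2^m possible worlds, so enumerating them this way is only practical for tiny examples.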

  6. Distances • Median-Distance(s, t) • Majority-Distance(s, t) • Expected-Reliable-Distance(s, t)
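The transcript lists the three distance functions without their formulas. As a hedged sketch, the code below computes one common reading of each from a discrete distribution over s-t distances, given as a dict mapping distance to probability with `math.inf` standing for "t is unreachable". The function names, the tie-breaking rule, and the boundary convention for the median are assumptions and may differ from the definitions on the original slides.

```python
import math


def median_distance(dist):
    """Smallest distance D whose cumulative probability reaches 1/2."""
    cumulative = 0.0
    for d in sorted(dist):
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return math.inf


def majority_distance(dist):
    """Most probable distance; ties go to the smaller distance (an assumption)."""
    return min(dist, key=lambda d: (-dist[d], d))


def expected_reliable_distance(dist):
    """Expected distance conditioned on t being reachable from s."""
    reachable = 1.0 - dist.get(math.inf, 0.0)
    if reachable <= 0.0:
        return math.inf
    finite_mass = sum(d * p for d, p in dist.items() if d != math.inf)
    return finite_mass / reachable


# Made-up distribution: distance 5 w.p. 0.25, 10 w.p. 0.5, unreachable w.p. 0.25.
example = {5.0: 0.25, 10.0: 0.5, math.inf: 0.25}
median_d = median_distance(example)
majority_d = majority_distance(example)
erd = expected_reliable_distance(example)
```

Conditioning on reachability keeps the expected-reliable-distance finite even when some possible worlds disconnect s from t.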

  7. Challenges • Computing the median-distance and the majority-distance requires the distribution of the distance over all possible worlds. • Computing the expected-reliable-distance has been proved to be #P-hard.

  8. Sampling
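The transcript keeps only this slide's title. A minimal Monte Carlo sketch of the general idea follows: sample possible worlds, run a shortest-path search in each, and use the empirical frequencies as an estimate of the distance distribution. It assumes undirected edges given as hypothetical `(u, v, prob, weight)` tuples; the function names are illustrative, and the slides' actual sampling procedure and sample-size bounds are not recorded here.

```python
import heapq
import math
import random
from collections import Counter


def sample_world(edges, rng):
    """Keep each edge independently with its probability; return an adjacency map."""
    adj = {}
    for u, v, prob, weight in edges:
        if rng.random() < prob:
            adj.setdefault(u, []).append((v, weight))
            adj.setdefault(v, []).append((u, weight))
    return adj


def dijkstra(adj, source):
    """Shortest-path distances from `source` within one sampled world."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def estimate_distribution(edges, s, t, num_samples, seed=0):
    """Empirical distribution of the s-t distance over sampled worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        world = sample_world(edges, rng)
        counts[dijkstra(world, s).get(t, math.inf)] += 1
    return {d: c / num_samples for d, c in counts.items()}


# Made-up graph: the short A-B-C route is usable only when both of its edges exist.
edges = [("A", "B", 0.5, 5.0), ("B", "C", 0.5, 5.0), ("A", "C", 0.9, 15.0)]
estimate = estimate_distribution(edges, "A", "C", num_samples=1000)
```

The resulting dict has the same shape as a distance distribution, so median or majority distances can be read off it directly.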

  9. Sample Size for Median-D

  10. Sample Size for E-R-D

  11. Qualitative Analysis • Classification experiment • Testing data: two classes, one a set of triplets of the form <A, B0, B1> and the other a set of triplets of the form <A, B1, B0> • Classifier: tries to identify the true neighbors • Measure: <false positive rate, true positive rate> • Data sets: protein-protein interaction network and DBLP co-authorship network
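The <false positive rate, true positive rate> measure can be computed from binary labels and predictions as in the sketch below. The `roc_point` helper and the example labels are hypothetical; the slides do not say how the classifier's outputs were encoded.

```python
def roc_point(labels, predictions):
    """Return (false positive rate, true positive rate) for boolean sequences."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


# Made-up example: True marks a triplet whose true neighbor is listed first.
labels = [True, True, False, False]
predictions = [True, False, True, False]
point = roc_point(labels, predictions)
```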

  12. Results

  13. Observation: Median-D • Consider a new probability distribution, where D is a distance value • The lemma below can then be derived

  14. Core Pruning Scheme • Query transformation • $d_{D,M}(s, t_1) < d_{D,M}(s, t_2) \Rightarrow d_M(s, t_1) < d_M(s, t_2)$ • $d_M(s, t_1) \ge d_M(s, t_2) \Rightarrow d_{D,M}(s, t_1) \ge d_{D,M}(s, t_2)$
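The transcript does not preserve how the D-truncated distribution behind d_{D,M} is defined. The sketch below makes an explicit assumption: all probability mass at distances of D or more is collapsed onto D. Under that assumption the truncated median equals min(d_M, D), which is consistent with the first implication above. The names `truncate` and `truncated_median` are illustrative, not taken from the slides.

```python
import math


def median_distance(dist):
    """Smallest distance whose cumulative probability reaches 1/2."""
    cumulative = 0.0
    for d in sorted(dist):
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return math.inf


def truncate(dist, D):
    """Assumed truncation: move all mass at distances >= D onto D."""
    out = {}
    for d, p in dist.items():
        key = d if d < D else D
        out[key] = out.get(key, 0.0) + p
    return out


def truncated_median(dist, D):
    """Median of the truncated distribution, standing in for d_{D,M}(s, t)."""
    return median_distance(truncate(dist, D))


# Made-up distance distributions from s to two candidate targets.
dist_t1 = {5.0: 0.75, math.inf: 0.25}
dist_t2 = {20.0: 0.75, math.inf: 0.25}
D = 10.0
m1 = truncated_median(dist_t1, D)  # below D, so it equals the true median
m2 = truncated_median(dist_t2, D)  # capped at D
```

The practical consequence is that a search bounded at distance D is enough to rank any target whose truncated median falls strictly below D, which is what makes pruning possible.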

  15. Median-D kNN Query Answering

  16. Majority-D kNN Query Answering • For d to be the exact majority distance, the condition Pr(d) >= 1 - P should hold, where P denotes the sum of the visited nodes' probabilities. • A node that has entered the kNN set may still be replaced at a later step by another node with a smaller majority distance.

  17. Experimental Results • Dataset overview • Convergence of D-F, using the distances from a sample of 500 possible worlds (pws) as the ground truth

  18. Efficiency of k-NN Pruning • Fraction of visited nodes (pruning efficiency) as a function of k • Pruning efficiency as a function of sample size

  19. Quality of Results (Median-D) • Pruning efficiency as a function of edge probability • Stability as a function of the number of possible worlds
