Probabilistic Similarity Queries in Uncertain Databases

Dagstuhl Seminar 2008: Uncertainty Management in Information Systems. Probabilistic Similarity Queries in Uncertain Databases. Matthias Renz, Ludwig-Maximilians-Universität München, Munich, Germany. www.dbs.ifi.lmu.de


Presentation Transcript


  1. Dagstuhl Seminar 2008: Uncertainty Management in Information Systems. Probabilistic Similarity Queries in Uncertain Databases. Matthias Renz, Ludwig-Maximilians-Universität München, Munich, Germany. www.dbs.ifi.lmu.de

  2. Outline
     • Introduction
     • Probabilistic Similarity Queries
       • multi-step query processing
       • probabilistic ε-range/k-NN queries
     • Probabilistic Similarity Ranking
       • probabilistic ranking models
       • efficient computation of probabilistic ranking queries
     • Summary
     M. Renz: Probabilistic Similarity Queries in Uncertain Databases, Seminar 08421, Schloss Dagstuhl 2008

  3. Introduction
     • modern database applications involve spatial, temporal and multimedia data
     • attributes are often vague and imprecise
       • sensor data, e.g. traffic monitoring
       • feature extraction, e.g. person identification
     → probabilistic databases

  4. Introduction
     • types of probabilistic databases
       • relational uncertainty representation
         • tuples with confidence
         • e.g. x-relation model (Trio system)
       • uncertainty in feature spaces
         • uncertain vectors
         • representations: continuous, discrete (point objects)
       • spatial uncertainty representation
         • uncertain spatially extended objects

  6. Introduction
     • Probabilistic Similarity Queries
     • given:
       • database with uncertain vectors
       • (uncertain) query object Q
     • queries:
       • ε-range query
       • k-NN query
       • ranking query

  7. Introduction
     • Probabilistic Similarity Queries
     • given:
       • two databases DB_A and DB_B with uncertain vectors
     • queries:
       • join query
     • challenges:
       • uncertain similarity distances, uncertain query results

  8. Outline
     • Introduction
     • Probabilistic Similarity Queries
       • multi-step query processing
       • probabilistic ε-range/k-NN queries
     • Probabilistic Similarity Ranking
       • probabilistic ranking models
       • efficient computation of probabilistic ranking queries
     • Summary

  9. Modelling Uncertainty in Feature Spaces
     • Uncertain Vector Data
       • vector data in d-dimensional space ℝ^d
       • objects are represented by multiple d-dimensional vectors that are mutually exclusive
       • a confidence value is assigned to each vector
     • types of uncertain object representations
       • pdf (continuous)
       • vector samples (discrete)

  10. Probabilistic Similarity Queries
     • Example: Probabilistic ε-Range Query
     • query object and set of uncertain objects (discrete): q = {q1,…,qM} and oi = {oi,1,…,oi,N}
     • distance between q and oi: (formula not preserved in the transcript)
     • probability that the distance between q and oi is less than ε ∈ ℝ₀⁺: (formula not preserved in the transcript)
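The formulas of slide 10 did not survive extraction, so the following is only a sketch of one common way to evaluate such a probability for discrete samples, not necessarily the definition used in the talk. It assumes Euclidean distance, that q and o are independent, and that every sample carries a confidence value summing to 1 per object; the function name `range_probability` and the sample data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def range_probability(q_samples, o_samples, eps):
    """Probability that d(q, o) <= eps for two discrete uncertain objects.

    Each argument is a list of (vector, confidence) pairs. Assumption:
    the samples of one object are mutually exclusive and q and o are
    independent, so the probability of a sample pair is the product of
    the two confidences.
    """
    total = 0.0
    for q_vec, q_conf in q_samples:
        for o_vec, o_conf in o_samples:
            if dist(q_vec, o_vec) <= eps:
                total += q_conf * o_conf
    return total


# Hypothetical data: two query samples and two object samples.
q = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
o = [((2.0, 0.0), 0.5), ((5.0, 0.0), 0.5)]
p = range_probability(q, o, eps=1.5)  # only the pair (1,0)-(2,0) qualifies
```

The double loop costs M·N distance computations per object, which is the refinement cost that the filter step on the later slides tries to avoid.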

  11. Probabilistic Similarity Queries
     • Clustered Object Representation
       • object o = {o1,..,os}
       • simple object approximation: MBR(o)
       • clustered object approximation: MBR(C1(o)), .., MBR(Ck(o))
       • build approximations by grouping vector points of an object into clusters

  12. Probabilistic Similarity Queries
     • advantages of clustered object approximation
       • efficiently managed by spatial access methods, e.g. R-tree, X-tree
       • supports multi-step query processing
         • true hits can be reported very early
         • reduced refinement cost
       • efficient computation of approximate answers
       • PTSQ and PTopkSQ efficiently supported

  13. Probabilistic Similarity Queries
     • multi-step query processing:
       • probabilistic filter
       • estimation of probability p = P(d(o,q) ≤ ε):
         0.3 ≤ P(d(o,q) ≤ ε) ≤ 0.6
         (0.3: lower bounding prob. estimation; 0.6: upper bounding prob. estimation)
     [Figure: uncertain object o in clustered object representation, clusters labelled 0.1, 0.2, 0.2, 0.3, 0.1, 0.1; query point q with its ε-range]
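The bounds on slide 13 can be sketched as follows, under the assumption that each cluster MBR carries the summed confidence of its vectors and that the query is a certain point. A cluster lying completely inside the ε-range contributes to the lower bound; a cluster that at least intersects the range contributes to the upper bound. The helper names and all coordinates are hypothetical; only the six weights are taken from the figure, and the boxes are arranged so that the bounds come out near the 0.3 and 0.6 quoted on the slide.

```python
from math import sqrt


def min_dist(q, lo, hi):
    """Smallest Euclidean distance from point q to the box [lo, hi]."""
    return sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def max_dist(q, lo, hi):
    """Largest Euclidean distance from point q to any point of the box."""
    return sqrt(sum(max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, lo, hi)))


def probability_bounds(q, clusters, eps):
    """Return (p_low, p_up) for P(d(o, q) <= eps).

    clusters: list of (lo, hi, weight) with lo/hi the MBR corners and
    weight the summed confidence of the vectors in that cluster.
    """
    p_low = sum(w for lo, hi, w in clusters if max_dist(q, lo, hi) <= eps)
    p_up = sum(w for lo, hi, w in clusters if min_dist(q, lo, hi) <= eps)
    return p_low, p_up


# Hypothetical geometry; weights as labelled in the slide's figure.
clusters = [
    ((0.5, 0.5), (1.0, 1.0), 0.2),      # fully inside the range
    ((-1.0, -1.0), (-0.5, -0.5), 0.1),  # fully inside the range
    ((1.0, 1.0), (3.0, 3.0), 0.3),      # intersects the range
    ((3.0, 0.0), (4.0, 1.0), 0.1),      # outside
    ((0.0, 3.0), (1.0, 4.0), 0.2),      # outside
    ((-4.0, -4.0), (-3.0, -3.0), 0.1),  # outside
]
p_low, p_up = probability_bounds((0.0, 0.0), clusters, eps=2.0)
```

Only the intersecting cluster (weight 0.3) would need its individual vectors inspected to tighten the interval, which is what makes a partial refinement possible.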

  14. Probabilistic Similarity Queries
     • Filter Step for PSQs:
     • Probabilistic ε-Range Queries (PTSQ type):
       • for each uncertain object o:
         • compute lower and upper bounding probabilities based on cluster representations
         • if the lower bounding probability Plow exceeds the probability threshold, then report o
         • if the upper bounding probability Pupper is below the probability threshold, then prune o
         • otherwise refine o (partial refinement)
     [Figure: clustered object, clusters labelled 0.1, 0.2, 0.2, 0.3, 0.1, 0.1; query point q with its ε-range]
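A minimal sketch of the three-way decision in the filter step of slide 14. The threshold symbol is lost in the transcript; it is called `tau` here, which is an assumption, as are the function name `filter_step` and the candidate data.

```python
def filter_step(candidates, tau):
    """Split candidates into true hits and objects that need refinement.

    candidates: dict mapping an object id to its (p_low, p_up) bounds.
    tau: probability threshold of the query (name assumed).
    Objects whose upper bound stays below tau are pruned and do not
    appear in either result list.
    """
    hits, to_refine = [], []
    for oid, (p_low, p_up) in candidates.items():
        if p_low > tau:
            hits.append(oid)        # certainly qualifies: report early
        elif p_up < tau:
            continue                # certainly fails: prune
        else:
            to_refine.append(oid)   # bounds straddle tau: refine
    return hits, to_refine


# Hypothetical bounds for three objects.
candidates = {"a": (0.7, 0.9), "b": (0.1, 0.3), "c": (0.3, 0.6)}
hits, to_refine = filter_step(candidates, tau=0.5)
```

With these made-up bounds, object "a" is reported without refinement, "b" is pruned, and only "c" reaches the refinement step.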

  15. Probabilistic Similarity Queries
     • Filter Step for PSQs:
     • Probabilistic k-NN-Queries (PTSQ type)
       • Example: upper bounding probability that p is NN is Pupper = 0.7
     [Figure: object o in clustered representation, clusters labelled 0.1, 0.2, 0.2, 0.3, 0.1, 0.1; object p; query point q]

  16. Outline
     • Introduction
     • Probabilistic Similarity Queries
       • multi-step query processing
       • probabilistic ε-range/k-NN queries
     • Probabilistic Similarity Ranking
       • probabilistic ranking models
       • efficient computation of probabilistic ranking queries
     • Summary

  17. Probabilistic Similarity Ranking
     • Ranking Queries
       • very important for similarity search applications
       • give the most relevant answers first
       • are more flexible than ε-range and NN queries
     • probabilistic ranking queries
       • results are associated with confidence values
       • in contrast to ε-range / NN queries: no unique query predicate

  18. Probabilistic Similarity Ranking
     • output of probabilistic ranking:
       • for each object: discrete pdf over ranking positions
         prob_ranked_q: D × {1,..,N} → [0..1]
       • prob_ranked_q(o,k) reports the probability that object o is exactly the k-th nearest neighbor of the query object q
     [Figure: probability over ranking position k = 1..10]

  19. Probabilistic Similarity Ranking
     • Example: Probabilistic Ranking Output
     [Figure: vector space with query q and objects A to T; probabilistic ranking output as a probability table of objects A to T over ranking coefficient k]

  20. Probabilistic Similarity Ranking
     • probabilistic ranking output is inconvenient for most users
     • coping with probabilistic ranking:
       • ranking with unique order and confidences

       Rank  OID  Conf.
       1.    A    0.6
       2.    B    0.7
       3.    E    0.2
       4.    C    0.3
       …     …    …

  21. Probabilistic Similarity Ranking
     • probabilistic ranking output is inconvenient for most users
     • coping with probabilistic ranking:
       • ranking with unique order and confidences
       • aggregate conf. values to deterministic results

       Rank  OID  Conf.
       1.    A    0.6
       2.    B    0.7
       3.    E    0.2
       4.    C    0.3
       …     …    …

     • Which ranking order? How should we extract the conf. from the prob. ranking output?

  22. Probabilistic Similarity Ranking
     • Approaches for Ranking Objects:
     • Approach 1: highest confidence [Soliman ICDE’07, Yi ICDE’08]
       • problems: duplicates, neglected objects
     • Example (input table not preserved in the transcript):
       • Result: 1. (A,0.45) | 2. (C,0.40) | 3. (C,0.45)
       • or with duplicate elimination: 1. (A,0.45) | 2. (C,0.40) | 3. (B,0.35)
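Approach 1 can be sketched as picking, for every ranking position, the object with the highest probability of being at exactly that position. The table below is made up for illustration and is not the example from the slide; the function name and the tie-breaking rule (alphabetical object order) are assumptions as well.

```python
def rank_by_highest_confidence(prob_ranked, eliminate_duplicates=False):
    """Approach 1: per position k, take the object maximising P(o at rank k).

    prob_ranked: dict mapping an object id to a list of probabilities,
    one per ranking position (index 0 is rank 1). Assumes at least as
    many objects as positions when duplicates are eliminated.
    """
    objects = sorted(prob_ranked)  # fixed order for deterministic ties
    n_ranks = len(next(iter(prob_ranked.values())))
    result, used = [], set()
    for k in range(n_ranks):
        pool = [o for o in objects if not (eliminate_duplicates and o in used)]
        best = max(pool, key=lambda o: prob_ranked[o][k])
        used.add(best)
        result.append((best, prob_ranked[best][k]))
    return result


# Hypothetical table: rows are objects, columns are ranks 1..3.
table = {
    "A": [0.45, 0.25, 0.30],
    "B": [0.40, 0.35, 0.25],
    "C": [0.15, 0.40, 0.45],
}
plain = rank_by_highest_confidence(table)
deduplicated = rank_by_highest_confidence(table, eliminate_duplicates=True)
```

In this toy table, object C wins both rank 2 and rank 3 while B never appears, which illustrates the two problems named on the slide: duplicates and neglected objects.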

  23. Probabilistic Similarity Ranking
     • Approaches for Ranking Objects:
     • Approach 2: highest aggregated confidence
       • the object with the highest probability of being one of the first k objects is assigned to ranking position k
       • sensible with duplicate elimination
     • Example (input table not preserved in the transcript):
       • Result: 1. (A,0.45) | 2. (B,0.35) | 3. (C,0.45)
       • or: 1. (A,0.45) | 2. (B,0.75) | 3. (C,1.00)
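A sketch of Approach 2 with duplicate elimination: for position k, the cumulative probability of appearing among the first k positions is compared across all objects not yet placed. The table is hypothetical (the slide's input is not preserved), and the function name and alphabetical tie-breaking are assumptions.

```python
def rank_by_aggregated_confidence(prob_ranked):
    """Approach 2: per position k, take the unplaced object that
    maximises P(o is among the first k positions).

    prob_ranked: dict mapping an object id to a list of probabilities,
    one per ranking position (index 0 is rank 1). Assumes at least as
    many objects as positions.
    """
    objects = sorted(prob_ranked)  # fixed order for deterministic ties
    n_ranks = len(next(iter(prob_ranked.values())))
    result, used = [], set()
    for k in range(n_ranks):
        pool = [o for o in objects if o not in used]
        # Aggregated confidence: sum of the probabilities for ranks 1..k+1.
        aggregated = {o: sum(prob_ranked[o][: k + 1]) for o in pool}
        best = max(pool, key=aggregated.get)
        used.add(best)
        result.append((best, aggregated[best]))
    return result


# Hypothetical table: rows are objects, columns are ranks 1..3.
table = {
    "A": [0.45, 0.25, 0.30],
    "B": [0.40, 0.35, 0.25],
    "C": [0.15, 0.40, 0.45],
}
ranking = rank_by_aggregated_confidence(table)
```

For this table the order is A, B, C with aggregated confidences of roughly 0.45, 0.75 and 1.00. Unlike Approach 1, object B is not neglected, because its probability mass over the first two positions is taken into account.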

  24. Probabilistic Similarity Ranking
     • further approaches to determine the ranking order, e.g. expected ranking position, etc.
     • most intuitive and robust: Approach 2
     • problem:
       • full probabilistic ranking information is required
     • required:
       • efficient computation of prob. ranking output

  25. Probabilistic Similarity Ranking
     • Iterative Probability Computation
       • ranking applied on object vectors (samples)
       • during the radial sweep: maintain for each object o the probability (expression not preserved in the transcript)
       • for each accessed sample oi,j, compute the probability P(oi,j,k) that exactly (k-1) objects o ≠ oi are within the sweep range ε, for k = 1..N
     [Figure: radial sweep with increasing range ε]

  26. Probabilistic Similarity Ranking
     • computation of P(oi,j,k):
       • problem: computation very expensive
       • a lot of possibilities for i must be reconsidered
     • 1. Approach:
       • pruning objects that are beyond ε
       • reduce DB → DB′ (|DB′| ≪ |DB|)

  27. Probabilistic Similarity Ranking
     • applying only relevant objects:
       A (1.0), B (1.0), F (0.8), D (0.6), H (0.2), C (0.1), E (0.0), G (0.0)
     [Figure: query q, sample oi,j with ε-range and objects A to I; the object list is marked with the set sizes N, N′ and N′′]

  28. Probabilistic Similarity Ranking
     • problem: computation still exponential
     • 2. Approach:
       • problem can be solved in polynomial time by means of a dynamic programming technique:
         P(2 of 4 in ε-range) is composed of
         P(1 of 3 in ε-range), assuming C in ε-range, and
         P(2 of 3 in ε-range), assuming C not in ε-range
     [Figure: three panels showing q, oi,j and the objects C, H, D, F with the ε-range]

  29. Probabilistic Similarity Ranking
     • problem: computation still exponential
     • 2. Approach:
       • problem can be solved in polynomial time by means of a dynamic programming technique
       • recursive function: (formula not preserved in the transcript)
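The recursive function itself is lost in the transcript. The decomposition on slide 28 suggests a recurrence of the form P(j of n) = p_n · P(j−1 of n−1) + (1 − p_n) · P(j of n−1), where p_n is the probability that the n-th object lies within the ε-range. The sketch below implements that recurrence bottom-up; it assumes mutually independent objects (as stated in the summary) and reads the values F 0.8, D 0.6, H 0.2, C 0.1 from slide 27 as such in-range probabilities, which is an interpretation. The function name is hypothetical.

```python
def count_distribution(in_range_probs):
    """Distribution of the number of objects inside the eps-range.

    in_range_probs: iterable with one probability per object that the
    object lies within the range. Returns a list dist where dist[j] is
    the probability that exactly j objects are inside, assuming the
    objects are mutually independent.
    """
    dist = [1.0]  # with zero objects, exactly zero are inside
    for p in in_range_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)  # this object is outside the range
            nxt[j + 1] += mass * p      # this object is inside the range
        dist = nxt
    return dist


# In-range probabilities of the four uncertain objects (read from slide 27).
probs = {"F": 0.8, "D": 0.6, "H": 0.2, "C": 0.1}
dist = count_distribution(probs.values())
p_two_of_four = dist[2]
```

Each object is processed once against a table of at most n+1 entries, so the cost is quadratic in the number of remaining objects instead of enumerating all 2^n in/out combinations. Objects with probability 1.0 (A and B on slide 27) only shift the count by a constant, and objects with probability 0.0 can be dropped beforehand.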

  30. Outline
     • Introduction
     • Probabilistic Similarity Queries
       • multi-step query processing
       • probabilistic ε-range/k-NN queries
     • Probabilistic Similarity Ranking
       • probabilistic ranking models
       • efficient computation of probabilistic ranking queries
     • Summary

  31. Summary
     • approaches to accelerate probabilistic similarity queries in vector spaces
     • assumptions:
       • objects are mutually independent
       • discrete uncertainty representations
     • support by
       • traditional access methods
       • multi-step query processing techniques
     • very high speed-up factor using dynamic programming

  32. Discussion
     Thank you for your attention. Any questions?
