Dagstuhl Seminar 2008 Uncertainty Management in Information Systems Probabilistic Similarity Queries in Uncertain Databases Matthias Renz Ludwig-Maximilians-Universität München Munich, Germany www.dbs.ifi.lmu.de
Outline
• Introduction
• Probabilistic Similarity Queries
  • multi-step query processing
  • probabilistic ε-range/k-NN queries
• Probabilistic Similarity Ranking
  • probabilistic ranking models
  • efficient computation of probabilistic ranking queries
• Summary
M. Renz: Probabilistic Similarity Queries in Uncertain Databases, Seminar 08421, Schloss Dagstuhl 2008
Introduction
• modern database applications involve spatial, temporal and multimedia data
• attributes are often vague and imprecise
  • sensor data, e.g. traffic monitoring
  • feature extraction, e.g. person identification
• ⇒ probabilistic databases
Introduction
• types of probabilistic databases
  • relational uncertainty representation
    • tuples with confidence
    • e.g. x-relation model (Trio system)
  • uncertainty in feature spaces
    • uncertain vectors
    • representations: continuous, discrete (point objects)
  • spatial uncertainty representation
    • uncertain spatially extended objects
Introduction
• Probabilistic Similarity Queries
  • given:
    • database with uncertain vectors
    • (uncertain) query object Q
  • queries:
    • ε-range query
    • ranking query
    • k-NN query
Introduction
• Probabilistic Similarity Queries
  • given:
    • two databases DB_A and DB_B with uncertain vectors
  • queries:
    • join query
  • challenges:
    • uncertain similarity distances, uncertain query results
Modelling Uncertainty in Feature Spaces
• Uncertain Vector Data
  • vector data in d-dimensional space ℝ^d
  • objects are represented by multiple d-dimensional vectors that are mutually exclusive
  • a confidence value is assigned to each vector
• types of uncertain object representations
  • pdf (continuous)
  • vector samples (discrete)
Probabilistic Similarity Queries
• Example: Probabilistic ε-Range Query
  • query object and set of uncertain objects (discrete): q = {q_1, …, q_M} and o_i = {o_i,1, …, o_i,N}
  • the distance d(q, o_i) between q and o_i is itself uncertain
  • quantity of interest: the probability that the distance between q and o_i is within ε ∈ ℝ₀⁺, i.e. P(d(q, o_i) ≤ ε)
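
The formulas on this slide did not survive extraction, so the following is only a minimal sketch of one plausible reading: query and object are lists of weighted samples, they are assumed to be independent, and the probability is the total weight of all sample pairs within ε. The function name `prob_within_range` and the `(vector, confidence)` layout are assumptions made for illustration, not the author's notation.

```python
import math

def prob_within_range(query_samples, object_samples, eps):
    """Estimate P(d(q, o) <= eps) for discrete uncertain objects.

    Each argument is a list of (vector, confidence) pairs whose confidences
    sum to 1. Assumption: query and object are independent, so a pair of
    samples occurs with probability q_conf * o_conf.
    """
    total = 0.0
    for q_vec, q_conf in query_samples:
        for o_vec, o_conf in object_samples:
            # Count the pair only if the two samples are within eps.
            if math.dist(q_vec, o_vec) <= eps:
                total += q_conf * o_conf
    return total

# Hypothetical data: a certain query point and an object with two samples.
query = [((0.0, 0.0), 1.0)]
obj = [((1.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]
p = prob_within_range(query, obj, eps=2.0)
```

The double loop costs M·N distance computations per object, which is the refinement cost that the clustered approximation on the following slides tries to avoid.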
Probabilistic Similarity Queries
• Clustered Object Representation
  • object o = {o_1, …, o_s}
  • simple object approximation: MBR(o)
  • clustered object approximation: MBR(C_1(o)), …, MBR(C_k(o))
  • build approximations by grouping the vector points of an object into clusters
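
A minimal sketch of building such an approximation. The slide does not say which clustering method is used, so the cluster labels are simply taken as input here; the helper names `mbr` and `clustered_approximation` are hypothetical, and each cluster's weight is assumed to be its share of equally weighted samples.

```python
def mbr(points):
    """Minimum bounding rectangle of a non-empty list of d-dimensional points."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper

def clustered_approximation(points, labels):
    """Group the sample points by cluster label and return one
    (MBR, weight) pair per cluster.

    Assumption: all samples carry equal confidence, so a cluster's weight
    is the fraction of samples it contains.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n = len(points)
    return [(mbr(members), len(members) / n) for members in clusters.values()]

# Hypothetical object with four samples in two clusters.
samples = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 7.0)]
approx = clustered_approximation(samples, labels=[0, 0, 1, 1])
```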
Probabilistic Similarity Queries
• advantages of clustered object approximation
  • efficiently managed by spatial access methods, e.g. R-tree, X-tree
  • supports multi-step query processing
    • true hits can be reported very early
    • reduced refinement cost
  • efficient computation of approximate answers
  • PTSQ and PTopkSQ efficiently supported
Probabilistic Similarity Queries
• multi-step query processing: probabilistic filter
  • estimation of the probability p = P(d(o,q) ≤ ε)
  • uncertain object o in clustered object representation
  • lower bounding and upper bounding probability estimation
  • example (figure): query point q and object o with cluster weights 0.1, 0.2, 0.2, 0.3, 0.1, 0.1, giving 0.3 ≤ P(d(o,q) ≤ ε) ≤ 0.6
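
One common way to obtain such bounds from cluster MBRs is sketched below. It assumes a point query and Euclidean distance: a cluster counts towards the lower bound when its whole MBR lies within ε, and towards the upper bound when any part of its MBR can lie within ε. The function names and the example geometry are assumptions; the slide only shows the resulting bounds.

```python
import math

def min_dist(q, box):
    """Smallest possible distance from point q to any point inside the box."""
    lower, upper = box
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, lower, upper)))

def max_dist(q, box):
    """Largest possible distance from point q to any point inside the box."""
    lower, upper = box
    return math.sqrt(sum(max(abs(x - lo), abs(x - hi)) ** 2
                         for x, lo, hi in zip(q, lower, upper)))

def probability_bounds(q, clusters, eps):
    """Return (p_low, p_up) with p_low <= P(d(o, q) <= eps) <= p_up.

    clusters is a list of (box, weight) pairs, box = (lower, upper).
    """
    # Clusters entirely inside the eps-range certainly contribute.
    p_low = sum(w for box, w in clusters if max_dist(q, box) <= eps)
    # Clusters that intersect the eps-range might contribute.
    p_up = sum(w for box, w in clusters if min_dist(q, box) <= eps)
    return p_low, p_up

# Hypothetical geometry: one cluster inside, one intersecting, one outside.
clusters = [
    (((0.0, 0.0), (1.0, 1.0)), 0.3),
    (((1.5, 0.0), (3.0, 1.0)), 0.3),
    (((5.0, 5.0), (6.0, 6.0)), 0.4),
]
p_low, p_up = probability_bounds((0.0, 0.0), clusters, eps=2.0)
```

Only the intersecting cluster would need its individual samples inspected to tighten the interval.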
Probabilistic Similarity Queries
• Filter Step for PSQs:
  • Probabilistic ε-Range Queries (PTSQ type):
    • for each uncertain object o:
      • compute lower and upper bounding probabilities based on the cluster representations
      • if the lower bounding probability P_low exceeds the probability threshold, then report o
      • if the upper bounding probability P_upper is below the probability threshold, then prune o
      • otherwise refine o (partial refinement)
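
The three-way decision above can be sketched as follows. The bounds are assumed to be precomputed per object, and the name `filter_step` as well as the dictionary layout are hypothetical.

```python
def filter_step(bounds_by_object, threshold):
    """Split objects into true hits and refinement candidates.

    bounds_by_object maps an object id to (p_low, p_up), the lower and
    upper bounding probabilities of being within the query range.
    Objects that can be pruned appear in neither returned list.
    """
    reported, candidates = [], []
    for oid, (p_low, p_up) in bounds_by_object.items():
        if p_low > threshold:
            reported.append(oid)      # true hit, no refinement needed
        elif p_up < threshold:
            continue                  # true drop, pruned
        else:
            candidates.append(oid)    # undecided, must be refined
    return reported, candidates

# Hypothetical bounds for three objects and a threshold of 0.5.
hits, to_refine = filter_step(
    {"a": (0.7, 0.9), "b": (0.0, 0.2), "c": (0.3, 0.6)}, threshold=0.5)
```

Here "a" is reported immediately, "b" is pruned, and only "c" is passed on to refinement.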
Probabilistic Similarity Queries
• Filter Step for PSQs:
  • Probabilistic k-NN Queries (PTSQ type)
  • example (figure): query point q, object p, and object o with cluster weights 0.1, 0.2, 0.2, 0.3, 0.1, 0.1
    • the upper bounding probability that p is the NN is P_upper = 0.7
Probabilistic Similarity Ranking
• Ranking Queries
  • very important for similarity search applications
  • give the most relevant answers first
  • are more flexible than ε-range and NN queries
• probabilistic ranking queries
  • results are associated with confidence values
  • in contrast to ε-range / NN queries: no unique query predicate
Probabilistic Similarity Ranking
• output of probabilistic ranking:
  • for each object: discrete pdf over ranking positions
    prob_ranked_q: D × {1, …, N} → [0, 1]
  • prob_ranked_q(o, k) reports the probability that object o is exactly the k-th nearest neighbor of the query object q
  • figure: probability plotted over ranking position k = 1, …, 10
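
To make the definition concrete, here is a brute-force sketch that enumerates every combination of samples ("possible world"), ranks the objects in that world, and accumulates the world's probability. It assumes a certain query point and mutually independent objects, and it is exponential in the number of objects, so it only serves as a reference for tiny inputs. The function name and data layout are assumptions.

```python
import itertools
import math

def prob_ranked(q, objects):
    """Return {oid: [P(rank = 1), ..., P(rank = N)]} by enumeration.

    objects maps an object id to a list of (vector, confidence) pairs.
    Assumption: objects are independent, so a world's probability is the
    product of the chosen samples' confidences. Distance ties are broken
    by dictionary order of the objects.
    """
    ids = list(objects)
    n = len(ids)
    table = {oid: [0.0] * n for oid in ids}
    for world in itertools.product(*(objects[oid] for oid in ids)):
        weight = math.prod(conf for _, conf in world)
        order = sorted(range(n), key=lambda i: math.dist(q, world[i][0]))
        for rank, i in enumerate(order):
            table[ids[i]][rank] += weight
    return table

# Hypothetical one-dimensional example with two objects.
ranking = prob_ranked((0.0,), {
    "A": [((1.0,), 0.5), ((4.0,), 0.5)],
    "B": [((2.0,), 1.0)],
})
```

Each row of the result is the discrete pdf over ranking positions for one object, and each column sums to 1 as well.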
Probabilistic Similarity Ranking
• Example: Probabilistic Ranking Output
  • figure (vector space): query object q and uncertain objects A to T
  • figure (probabilistic ranking output): probability table with axes "Objects" (A to T), "ranking coefficient k" and "Probability"
Probabilistic Similarity Ranking
• probabilistic ranking output is inconvenient for most users
• coping with probabilistic ranking:
  • ranking with unique order and confidences

Rank  OID  Conf.
1.    A    0.6
2.    B    0.7
3.    E    0.2
4.    C    0.3
…     …    …
Probabilistic Similarity Ranking
• probabilistic ranking output is inconvenient for most users
• coping with probabilistic ranking:
  • ranking with unique order and confidences
  • aggregate conf. values to deterministic results
• open questions:
  • Which ranking order?
  • How should we extract the conf. from the prob. ranking output?
Probabilistic Similarity Ranking
• Approaches for Ranking Objects:
  • Approach 1: highest confidence [Soliman ICDE’07, Yi ICDE’08]
    • each ranking position is assigned the object with the highest probability for that position
    • problem:
      • duplicates
      • neglected objects
  • Example:
    • Result: 1. (A, 0.45) | 2. (C, 0.40) | 3. (C, 0.45)
    • or with duplicate elimination: 1. (A, 0.45) | 2. (C, 0.40) | 3. (B, 0.35)
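
A minimal sketch of Approach 1, with and without duplicate elimination. The probability table behind the slide's example is not recoverable, so the table below (objects X, Y, Z) is hypothetical and chosen only to show the duplicate and neglected-object effects; the function name is an assumption as well.

```python
def rank_by_highest_confidence(table, eliminate_duplicates=False):
    """Assign to each position k the object with the highest P(rank = k).

    table maps an object id to [P(rank = 1), ..., P(rank = N)].
    Returns a list of (object id, confidence), one entry per position.
    """
    n = len(next(iter(table.values())))
    result, used = [], set()
    for k in range(n):
        candidates = [o for o in table
                      if not (eliminate_duplicates and o in used)]
        if not candidates:
            break
        best = max(candidates, key=lambda o: table[o][k])
        result.append((best, table[best][k]))
        used.add(best)
    return result

# Hypothetical ranking probabilities (rows and columns each sum to 1).
table = {
    "X": [0.5, 0.3, 0.2],
    "Y": [0.4, 0.3, 0.3],
    "Z": [0.1, 0.4, 0.5],
}
plain = rank_by_highest_confidence(table)
deduplicated = rank_by_highest_confidence(table, eliminate_duplicates=True)
```

In the plain variant Z occupies positions 2 and 3 while Y never appears, which is the duplicate and neglected-object problem named on the slide.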
Probabilistic Similarity Ranking
• Approaches for Ranking Objects:
  • Approach 2: highest aggregated confidence
    • the object with the highest probability of being one of the first k objects is assigned to ranking position k
    • sensible with duplicate elimination
  • Example:
    • Result: 1. (A, 0.45) | 2. (B, 0.35) | 3. (C, 0.45)
    • or: 1. (A, 0.45) | 2. (B, 0.75) | 3. (C, 1.00)
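
Approach 2 can be sketched in the same style. The aggregated confidence for position k is taken to be the sum of an object's probabilities for positions 1 to k, and duplicate elimination is always applied, as the slide recommends. The table (objects X, Y, Z) is hypothetical and is not the one behind the slide's example.

```python
def rank_by_aggregated_confidence(table):
    """Assign to each position k the not yet ranked object with the
    highest probability of being among the first k objects.

    table maps an object id to [P(rank = 1), ..., P(rank = N)].
    Returns (object id, P(rank = k), P(rank <= k)) per position, so both
    ways of reporting the confidence are available.
    """
    n = len(next(iter(table.values())))
    result, used = [], set()
    for k in range(n):
        remaining = [o for o in table if o not in used]
        if not remaining:
            break
        best = max(remaining, key=lambda o: sum(table[o][:k + 1]))
        result.append((best, table[best][k], sum(table[best][:k + 1])))
        used.add(best)
    return result

# Hypothetical ranking probabilities (rows and columns each sum to 1).
table = {
    "X": [0.5, 0.3, 0.2],
    "Y": [0.4, 0.3, 0.3],
    "Z": [0.1, 0.4, 0.5],
}
aggregated = rank_by_aggregated_confidence(table)
```

Unlike the highest-confidence rule, Y now takes position 2 because it is more likely than Z to be among the first two objects.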
Probabilistic Similarity Ranking
• further approaches to determine the ranking order, e.g.
  • expected ranking position
  • etc.
• most intuitive and robust: Approach 2
• problem:
  • full probabilistic ranking information is required
• required:
  • efficient computation of the probabilistic ranking output
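
The slide only names the expected ranking position, so the following is a sketch of the usual reading: the expectation of the rank under each object's distribution over positions, with objects ordered by ascending expectation. Function names and the table are hypothetical.

```python
def expected_rank(rank_probs):
    """Expected ranking position, given [P(rank = 1), ..., P(rank = N)]."""
    return sum(k * p for k, p in enumerate(rank_probs, start=1))

def rank_by_expected_position(table):
    """Order object ids by ascending expected ranking position."""
    return sorted(table, key=lambda o: expected_rank(table[o]))

# Hypothetical ranking probabilities (rows and columns each sum to 1).
table = {
    "X": [0.5, 0.3, 0.2],
    "Y": [0.4, 0.3, 0.3],
    "Z": [0.1, 0.4, 0.5],
}
order = rank_by_expected_position(table)
```

Like Approach 2, this needs the full probabilistic ranking output for every object.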
Probabilistic Similarity Ranking
• Iterative Probability Computation
  • ranking applied on object vectors (samples)
  • radial sweep around the query with increasing range ε
  • during the radial sweep: maintain for each object o the probability that o is within the current sweep range ε
  • for each accessed sample o_i,j, compute the probability P(o_i,j, k) that exactly (k−1) objects o ≠ o_i are within the sweep range ε, for k = 1..N
Probabilistic Similarity Ranking
• computation of P(o_i,j, k):
  • problem: computation is very expensive
    • a lot of possibilities for i must be reconsidered
  • Approach 1:
    • pruning objects that are beyond ε
    • reduce DB → DB′ (|DB′| << |DB|)
Probabilistic Similarity Ranking
• applying only relevant objects:
  A (1.0), B (1.0), F (0.8), D (0.6), H (0.2), C (0.1), E (0.0), G (0.0)
• figure: query q, accessed sample o_i,j and objects A to I; object sets labeled N, N′, N′′
Probabilistic Similarity Ranking
• problem: computation still exponential
• Approach 2:
  • problem can be solved in polynomial time by means of a dynamic programming technique
  • example (figure: query q, sample o_i,j, objects C, D, F, H):
    P(2 of 4 in ε-range) is composed of P(1 of 3 in ε-range), assuming C in ε-range, and P(2 of 3 in ε-range), assuming C not in ε-range
  • recursive function (generalizing the example, with p_n the probability that the n-th object is in ε-range):
    P(k of n in ε-range) = p_n · P(k−1 of n−1 in ε-range) + (1 − p_n) · P(k of n−1 in ε-range)
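
A minimal sketch of the dynamic programming idea, assuming mutually independent objects. Adding one object at a time, the probability that exactly k of the first n objects are in range follows from the two cases "n-th object in range" and "n-th object not in range". The function name is hypothetical, and the example values are read off the earlier listing of relevant objects under the assumption that the parenthesized numbers are in-range probabilities.

```python
def count_distribution(probs):
    """Return [P(exactly 0 in range), ..., P(exactly n in range)].

    probs holds, per object, the probability of lying within the current
    sweep range. Runs in O(n^2) instead of enumerating 2^n subsets.
    """
    dist = [1.0]  # with zero objects, "exactly 0 in range" is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # new object not in range
            nxt[k + 1] += mass * p       # new object in range
        dist = nxt
    return dist

# Assumed in-range probabilities for F, D, H, C.
dist = count_distribution([0.8, 0.6, 0.2, 0.1])
p_two_of_four = dist[2]
```

For an accessed sample o_i,j, the entry at index k−1 of this distribution, computed over all objects other than o_i, is the probability that the sample is ranked at position k.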
Summary
• approaches to accelerate probabilistic similarity queries in vector spaces
• assumption:
  • objects are mutually independent
  • discrete uncertainty representations
• support by
  • traditional access methods
  • multi-step query processing techniques
• very high speed-up factor using dynamic programming
Discussion
Thank you for your attention. Any questions?