Multimedia DBs



  1. Multimedia DBs

  2. Time Series Data • A time series is a collection of observations made sequentially in time. • Example values: 25.1750, 25.1750, 25.2250, 25.2500, 25.2500, 25.2750, 25.3250, 25.3500, 25.3500, 25.4000, 25.4000, 25.3250, 25.2250, 25.2000, 25.1750, ..., 24.6250, 24.6750, 24.6750, 24.6250, 24.6250, 24.6250, 24.6750, 24.7500 [Figure: the series plotted on a value axis (23 to 29) against a time axis (0 to 500).]

  3. PAA and APCA • Feature extraction for GEMINI: • Fourier • Wavelets • Another approach: segment the time series into equal parts, store the average value for each part. • Use an index to store the averages and the segment end points

  4. Feature Spaces [Figure: a time series X and its approximation X' in three feature spaces, each plotted over time 0 to 140: DFT (Agrawal, Faloutsos, Swami 1993); DWT with basis functions Haar 0 to Haar 7 (Chan & Fu 1999); SVD with basis functions eigenwave 0 to eigenwave 7 (Korn, Jagadish, Faloutsos 1997).]

  5. Piecewise Aggregate Approximation (PAA) • Original time series (n-dimensional vector): S = {s1, s2, …, sn} • n'-segment PAA representation (n'-d vector): S = {sv1, sv2, …, svn'} • The PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos 2000) [Figure: an eight-segment PAA with segment values sv1 to sv8, value axis against time axis.]
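The PAA definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `paa` is hypothetical, and it assumes the series length is an exact multiple of the number of segments.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment.

    Assumption (not from the slides): len(series) is a multiple of n_segments.
    """
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(n_segments)]


# The first eight observations from the time-series slide, reduced to four values.
values = [25.1750, 25.1750, 25.2250, 25.2500, 25.2500, 25.2750, 25.3250, 25.3500]
reduced = paa(values, 4)
```

Each entry of `reduced` is one svi; an index would store these n' values instead of the n original observations.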

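  6. Adaptive Piecewise Constant Approximation (APCA) • Can we improve upon PAA? • n'-segment PAA representation (n'-d vector): S = {sv1, sv2, …, svn'} • n'/2-segment APCA representation (n'-d vector): S = {sv1, sr1, sv2, sr2, …, svM, srM}, where M = n'/2 is the number of segments, svi is the average value of segment i and sri is its end point [Figure: the same series approximated by eight equal-length PAA segments (sv1 to sv8) and by four variable-length APCA segments (sv1 to sv4, ending at sr1 to sr4).]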

  7. APCA approximates the original signal better than PAA [Figure: reconstruction error of PAA against reconstruction error of APCA; improvement factors shown: 3.77, 1.69, 1.21, 1.03, 3.02, 1.75.]

  8. APCA representation can be computed efficiently • Near-optimal representation can be computed in O(n log n) time • Optimal representation can be computed in O(n^2 M) time (Koudas et al.)

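  9. Distance Measure • Exact (Euclidean) distance D(Q,S) • Lower bounding distance DLB(Q',S) [Figure: query Q against series S with the exact distance D(Q,S), and the query approximation Q' against the APCA representation of S with the lower bounding distance DLB(Q',S).]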
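The slide does not spell out the formula for DLB, so the sketch below makes an assumption: Q' is taken to be the mean of Q over each segment of S, and each squared difference is weighted by the segment length, which is the usual form in the APCA literature. All function names are hypothetical, and the segment end points are supplied by hand rather than computed by an APCA algorithm.

```python
import math


def euclidean(q, s):
    """Exact distance D(Q, S) between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def apca_from_endpoints(series, right_endpoints):
    """Build [(sv_i, sr_i), ...] for given 1-based right end points.

    Assumption: the end points are chosen by the caller; no optimisation is done.
    """
    rep, start = [], 0
    for end in right_endpoints:
        segment = series[start:end]
        rep.append((sum(segment) / len(segment), end))
        start = end
    return rep


def lower_bound(query, apca):
    """Assumed form of DLB(Q', S): segment-length-weighted distance of means."""
    total, start = 0.0, 0
    for sv, sr in apca:
        length = sr - start
        qv = sum(query[start:sr]) / length  # Q projected onto this segment of S
        total += length * (qv - sv) ** 2
        start = sr
    return math.sqrt(total)


S = [1, 1, 1, 5, 5, 9]
Q = [2, 0, 1, 4, 7, 8]
rep = apca_from_endpoints(S, [3, 5, 6])  # three segments: (1, 3), (5, 5), (9, 6)
exact = euclidean(Q, S)
bound = lower_bound(Q, rep)
```

For this pair `bound` is the square root of 1.5 and `exact` is the square root of 8, so the bound holds. Within each segment the squared difference of the means, times the segment length, can never exceed the sum of squared pointwise differences, which is why the bound holds in general.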

  10. Index on 2M-dimensional APCA space • Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree) [Figure: APCA points S1 to S9 in the 2M-dimensional APCA space, grouped into MBRs R1 to R4, next to the corresponding index tree.]

  11. k-nearest neighbor Algorithm • For any node U of the index structure with MBR R, MINDIST(Q,R) ≤ D(Q,S) for any data item S under U [Figure: query Q among points S1 to S9 and MBRs R1 to R4, with MINDIST(Q,R2), MINDIST(Q,R3) and MINDIST(Q,R4) marked.]
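To show why a lower bound is enough for exact k-nearest-neighbor search, here is a simplified sketch. It is not the index traversal from the slide: it scans a flat list instead of a tree, and it uses a PAA-style lower bound in place of MINDIST. The names `paa_lower_bound` and `knn` are hypothetical.

```python
import heapq
import math


def euclidean(q, s):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def paa_lower_bound(q, s, n_segments):
    """Lower bound on euclidean(q, s) from per-segment means.

    Assumption: both series have the same length, a multiple of n_segments.
    """
    width = len(q) // n_segments
    total = 0.0
    for i in range(n_segments):
        qm = sum(q[i * width:(i + 1) * width]) / width
        sm = sum(s[i * width:(i + 1) * width]) / width
        total += width * (qm - sm) ** 2
    return math.sqrt(total)


def knn(query, database, k, n_segments):
    """Exact k-NN: visit candidates by increasing lower bound, stop early."""
    order = sorted((paa_lower_bound(query, s, n_segments), idx)
                   for idx, s in enumerate(database))
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    exact_computations = 0
    for lb, idx in order:
        if len(best) == k and lb >= -best[0][0]:
            break  # every remaining candidate is at least this far away
        d = euclidean(query, database[idx])
        exact_computations += 1
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-nd, idx) for nd, idx in best), exact_computations


database = [[0, 0, 0, 0], [1, 1, 1, 1], [5, 5, 5, 5], [9, 9, 9, 9]]
neighbors, computed = knn([1, 1, 2, 2], database, k=1, n_segments=2)
```

In this example only one exact distance is computed: after the first candidate, the next lower bound already exceeds the best distance found, so the remaining series are pruned without being examined.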

  12. Index Modification for MINDIST Computation • APCA point S = {sv1, sr1, sv2, sr2, …, svM, srM} • APCA rectangle S = (L,H), where L = {smin1, sr1, smin2, sr2, …, sminM, srM} and H = {smax1, sr1, smax2, sr2, …, smaxM, srM} [Figure: a four-segment series with segment values sv1 to sv4, end points sr1 to sr4, and per-segment minima smin1 to smin4 and maxima smax1 to smax4, next to the index with points S1 to S9 and MBRs R1 to R4.]

  13. MBR Representation in time-value space • We can view the MBR R = (L,H) of any node U as two APCA representations L = {l1, l2, …, l(N-1), lN} and H = {h1, h2, …, h(N-1), hN} [Figure: example with L = {l1, l2, l3, l4, l5, l6} and H = {h1, h2, h3, h4, h5, h6}, which define REGION 1, REGION 2 and REGION 3 on a value axis against a time axis.]

  14. Regions • M regions are associated with each MBR • Boundaries of the ith region: value bounds l(2i-1) (low) and h(2i-1) (high); time bounds l(2i-2)+1 (left) and h(2i) (right) [Figure: REGION 1, REGION 2 and REGION 3 drawn from l1 to l6 and h1 to h6, value axis against time axis.]

  15. Regions • The ith region is active at time instant t if it spans across t • The value st of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2) [Figure: REGION 1, REGION 2 and REGION 3 with two time instants t1 and t2 marked, value axis against time axis.]

  16. MINDIST Computation • For time instant t, MINDIST(Q,R,t) = min over regions G active at t of MINDIST(Q,G,t) • Example at t1: MINDIST(Q,R,t1) = min(MINDIST(Q,Region1,t1), MINDIST(Q,Region2,t1)) = min((q_t1 - h1)^2, (q_t1 - h3)^2) = (q_t1 - h1)^2 • MINDIST(Q,R) = [formula not captured in the transcript] • Lemma 3: MINDIST(Q,R) ≤ D(Q,C) for any time series C under node U [Figure: REGION 1, REGION 2 and REGION 3 with time instant t1 marked.]
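The per-instant rule above can be sketched as follows. Two things here are assumptions rather than content of the slide: regions are passed as hypothetical tuples (low, high, t_start, t_end) with 1-based inclusive time bounds, and the per-instant values are summed and square-rooted to obtain MINDIST(Q,R), as is usual for a Euclidean bound.

```python
import math


def mindist_region(q_t, low, high):
    """Squared distance from the value q_t to the value interval [low, high]."""
    if q_t > high:
        return (q_t - high) ** 2
    if q_t < low:
        return (low - q_t) ** 2
    return 0.0  # q_t lies inside the region's value range


def mindist(query, regions):
    """Assumed overall bound: sqrt of the sum over t of MINDIST(Q, R, t).

    regions: list of (low, high, t_start, t_end); together they are assumed
    to cover every time instant of the query.
    """
    total = 0.0
    for t, q_t in enumerate(query, start=1):
        active = [(lo, hi) for lo, hi, t0, t1 in regions if t0 <= t <= t1]
        total += min(mindist_region(q_t, lo, hi) for lo, hi in active)
    return math.sqrt(total)


# Three overlapping regions over six time instants (illustrative numbers).
regions = [(1, 2, 1, 3), (4, 6, 2, 5), (0, 1, 4, 6)]
query = [3, 3, 5, 5, 2, 0]
bound = mindist(query, regions)
```

At t = 2 both Region 1 and Region 2 are active and each contributes 1, so the minimum is 1, mirroring the min over active regions in the slide's example at t1.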

  17. Approximate Search • A simpler definition of the distance in the feature space is the following: • But there is one problem… what? DLB(Q’,S)

  18. Multimedia DBs • A multimedia database also stores images • Again, similarity queries (content-based retrieval) • Extract features, index in feature space, answer similarity queries using GEMINI • Again, average values help!

  19. Images - color • Q: what is an image? • A: 2-d array

  20. Images - color Color histograms, and distance function

  21. Images - color Mathematically, the distance function is:

  22. Images - color • Problem: 'cross-talk': features are not orthogonal -> SAMs (spatial access methods) will not work properly • Q: what to do? • A: feature-extraction question

  23. Images - color • Possible answers: avg red, avg green, avg blue • It turns out that this lower-bounds the histogram distance -> no cross-talk • SAMs are applicable

  24. Images - color • Time performance [Figure: time against selectivity, comparing sequential scan with filtering on avg RGB.]

  25. Images - shapes • Distance function: Euclidean, on the area, perimeter, and 20 'moments' • Q: how to normalize them?

  26. Images - shapes • Distance function: Euclidean, on the area, perimeter, and 20 'moments' • Q: how to normalize them? • A: divide by standard deviation
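A minimal sketch of that normalization, assuming the shape features are stored one image per row. The function name is hypothetical, the population standard deviation is an arbitrary choice, and mapping a constant feature to 0 is an added safeguard rather than something the slide specifies.

```python
import statistics


def normalize_features(rows):
    """Divide every feature (column) by its standard deviation.

    Assumptions: population standard deviation; a feature with zero spread
    is mapped to 0.0, since it cannot contribute to any distance anyway.
    """
    columns = list(zip(*rows))
    stdevs = [statistics.pstdev(col) for col in columns]
    return [[value / sd if sd > 0 else 0.0 for value, sd in zip(row, stdevs)]
            for row in rows]


# Two images, two features on very different scales (e.g. area and perimeter).
features = [[100.0, 2.0], [300.0, 4.0]]
scaled = normalize_features(features)
```

Before scaling, the first feature would dominate a Euclidean distance; afterwards both features differ by the same number of standard deviations between the two images.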

  27. Images - shapes • Distance function: Euclidean, on the area, perimeter, and 20 'moments' • Q: other 'features' / distance functions?

  28. Images - shapes • Distance function: Euclidean, on the area, perimeter, and 20 'moments' • Q: other 'features' / distance functions? • A1: turning angle • A2: dilations/erosions • A3: ...

  29. Images - shapes • Distance function: Euclidean, on the area, perimeter, and 20 'moments' • Q: how to do dim. reduction?

  30. Images - shapes • Distance function: Euclidean, on the area, perimeter, and 20 'moments' • Q: how to do dim. reduction? • A: Karhunen-Loeve (= centered PCA/SVD)

  31. Images - shapes • Performance: ~10x faster [Figure: log(# of I/Os) against # of features kept, with the "all kept" case for comparison.]

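  32. Is d(u,v) = sqrt((u-v)^T A (u-v)) a metric? • x^T A x = Σ_i Σ_j x_i x_j A_ij = Σ_i λ_i x_i^2, where in the last sum x_i is the projection of x along the ith eigenvector • λ_i is the ith eigenvalue of A; the steps below take √λ_i, so they need λ_i >= 0 (A symmetric positive semi-definite) • d(u,v) = sqrt((u-v)^T A (u-v)) = sqrt(Σ_i λ_i (u_i - v_i)^2) • d(u,v) >= 0, d(u,u) = 0, d(u,v) = d(v,u) • d(u,w) <= d(u,v) + d(v,w), provided • sqrt(Σ_i λ_i (u_i - w_i)^2) <= sqrt(Σ_i λ_i (u_i - v_i)^2) + sqrt(Σ_i λ_i (v_i - w_i)^2), that is • sqrt(Σ_i (√λ_i u_i - √λ_i w_i)^2) <= sqrt(Σ_i (√λ_i u_i - √λ_i v_i)^2) + sqrt(Σ_i (√λ_i v_i - √λ_i w_i)^2) • This is the metric (triangle inequality) condition for the Lp norm with p = 2, applied to the scaled coordinates √λ_i u_i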
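The metric properties listed above can be spot-checked numerically. The sketch below uses a small symmetric matrix chosen for illustration (its eigenvalues are 1 and 3, so it is positive definite); the function name and all values are assumptions, not data from the slides.

```python
import math


def quadratic_form_distance(u, v, A):
    """d(u, v) = sqrt((u - v)^T A (u - v)) for a symmetric PSD matrix A."""
    z = [a - b for a, b in zip(u, v)]
    n = len(z)
    total = sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding


A = [[2.0, 1.0],
     [1.0, 2.0]]  # off-diagonal entries model "cross-talk" between features
u, v, w = [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]

d_uv = quadratic_form_distance(u, v, A)
d_vw = quadratic_form_distance(v, w, A)
d_uw = quadratic_form_distance(u, w, A)
```

Here all three distances equal the square root of 2, so d(u,w) <= d(u,v) + d(v,w) holds with room to spare. A check on a few points is only an illustration; the general argument is the eigenvalue rewriting above.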

  33. Filtering in QBIC • Histogram column vectors x, y of length n • Σ_i x_i = 1, Σ_i y_i = 1 • Difference z = (x - y) • Σ_i z_i = 0 • Contribution of each color bin to a smaller set of colors: • V^T = (c1, c2, …, cn), each ci is a column vector of length 3 • x_avg = V^T x, y_avg = V^T y, column vectors of length 3

  34. Filtering in QBIC • Distances • d_avg^2 = (x_avg - y_avg)^T (x_avg - y_avg) = (V^T z)^T (V^T z) = z^T V V^T z = z^T W z, with W = V V^T • d_hist^2 = z^T A z • d_hist^2 >= λ1 d_avg^2, where λ1 is the smallest eigenvalue of • A'z' = λW'z'
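The identity d_avg^2 = z^T W z can be checked on a toy example. The four bin colors and the two histograms below are made up for illustration, and the function names are hypothetical; the sketch computes the average-color distance once through V^T x and once through W = V V^T.

```python
def avg_color(hist, colors):
    """x_avg = V^T x: the histogram-weighted sum of the bin colors c_i."""
    return [sum(h * c[k] for h, c in zip(hist, colors)) for k in range(3)]


def d_avg_squared(x, y, colors):
    """Squared Euclidean distance between the two average colors."""
    xa, ya = avg_color(x, colors), avg_color(y, colors)
    return sum((a - b) ** 2 for a, b in zip(xa, ya))


def d_avg_squared_via_w(x, y, colors):
    """The same quantity as z^T W z, with W_ij = c_i . c_j (W = V V^T)."""
    z = [a - b for a, b in zip(x, y)]
    n = len(z)
    W = [[sum(colors[i][k] * colors[j][k] for k in range(3)) for j in range(n)]
         for i in range(n)]
    return sum(z[i] * W[i][j] * z[j] for i in range(n) for j in range(n))


# Assumed bin colors c1..c4 (red, green, blue, yellow) and two histograms.
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
x = [0.5, 0.25, 0.25, 0.0]
y = [0.25, 0.25, 0.0, 0.5]

direct = d_avg_squared(x, y, colors)
via_w = d_avg_squared_via_w(x, y, colors)
```

Both routes give 0.375 for these inputs. The direct route works on 3-dimensional vectors, which is what makes the average color a cheap filter before the full n-bin histogram distance is evaluated.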

  35. Filtering in QBIC • Rewrite z to remove the extra condition that Σ_i z_i = 0 • z' becomes an (n-1)-dimensional column vector • z^T A z = z'^T A' z' and z^T W z = z'^T W' z' • A' and W' are (n-1)x(n-1) matrices • Show that z'^T A' z' >= λ1 z'^T W' z'

  36. Proof of z'^T A' z' >= λ1 z'^T W' z' • Minimize z'^T A' z' wrt z', subject to the constraint z'^T W' z' = C • Same as minimizing, wrt z', the Lagrangian • z'^T A' z' - λ(z'^T W' z' - C) • Differentiate wrt z' and set to 0 • A'z' = λW'z' • λ and z' must be an eigenvalue and an eigenvector, respectively, of the generalized problem • A'z' = λW'z'

  37. Proof of z'^T A' z' >= λ1 z'^T W' z' • At such a stationary point, z'^T A' z' = λ z'^T W' z' = λC • To minimize z'^T A' z', we must choose the smallest eigenvalue λ1 • The minimum of z'^T A' z' over z', subject to the constraint z'^T W' z' = C, equals λ1 C • If z'^T W' z' = C > 0 then • z'^T A' z' >= λ1 C • If z'^T W' z' = 0 then • z'^T A' z' >= 0, since A' is positive semi-definite, so the inequality holds in this case too
