A Dynamic Probabilistic Multimedia Retrieval Model

Presentation Transcript


  1. A Dynamic Probabilistic Multimedia Retrieval Model • Tzvetanka I. Ianeva, Arjen P. de Vries, Thijs Westerveld • ICME 2004

  2. Introduction • Video representation schemes used for retrieval: • Static • Spatio-temporal • Video is a temporal medium, so a ‘good’ model should overcome the limitations of keyframe-based shot representation

  3. Spatio-temporal grouping • Spatial priority and tracking of regions from frame to frame • Joint spatial and temporal segmentation • Human vision finds salient structures jointly in space and time (Gepshtein and Kubovy, 2000)

  4. Motivation • Pursue video retrieval instead of image (keyframe) retrieval • Extension of the Static Probabilistic Multimedia Retrieval model (2003) • GMM in DCT-space-time domain • Diagonal covariance

  5. Static Model • Indexing • Estimate Gaussian Mixture Models from images using EM • Based on feature vectors with colour, texture and position information from pixel blocks • Fixed number of components • (diagram labels: Docs, Models)
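
The block-based feature vectors described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the 8x8 block size, the single grey-level channel, the choice of three DCT coefficients, and the names `dct2` and `block_features` are all assumptions made for the example.

```python
import math

def dct2(block):
    """Unnormalised 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * j + 1) * v / (2 * n)))
            out[u][v] = s
    return out

def block_features(image, block=8):
    """Return one [x, y, DCT...] vector per non-overlapping pixel block.

    `image` is a list of rows of grey levels (hypothetical input format).
    """
    height, width = len(image), len(image[0])
    feats = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [row[left:left + block] for row in image[top:top + block]]
            coeffs = dct2(pixels)
            # Keep the DC term plus the first horizontal and vertical AC terms.
            picked = [coeffs[0][0], coeffs[0][1], coeffs[1][0]]
            # Block centre, normalised to [0, 1] in each direction.
            x = (left + block / 2) / width
            y = (top + block / 2) / height
            feats.append([x, y] + picked)
    return feats
```

A real system would compute DCT coefficients per colour channel; the single channel here only keeps the sketch short.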

  6. Static Model • Indexing • Estimate a Gaussian Mixture Model from each keyframe (using EM) • Fixed number of components (C=8) • Feature vectors contain colour, texture, and position information from pixel blocks: <x,y,DCT>
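
As a rough illustration of the indexing step, the sketch below fits a diagonal-covariance Gaussian mixture with EM in plain Python. It is a generic textbook EM loop under stated assumptions, not the authors' implementation: the random initialisation from samples, the iteration count, and the variance floor `var_floor` are choices made for this example.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def fit_gmm_diag(samples, n_components=8, n_iter=25, var_floor=1e-4, seed=0):
    """Fit a diagonal GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Assumption: initialise the means from randomly chosen samples.
    means = [list(s) for s in rng.sample(samples, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in samples:
            logs = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and per-dimension variances.
        for c in range(n_components):
            n_c = sum(r[c] for r in resp) + 1e-12  # guard against empty components
            weights[c] = n_c / len(samples)
            for d in range(dim):
                mu = sum(r[c] * x[d] for r, x in zip(resp, samples)) / n_c
                var = sum(r[c] * (x[d] - mu) ** 2
                          for r, x in zip(resp, samples)) / n_c
                means[c][d] = mu
                variances[c][d] = max(var, var_floor)
    return weights, means, variances
```

With `n_components=8` this matches the fixed C=8 on the slide; the result depends on the random initialisation, which is the sensitivity the later slides refer to.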

  7. Static Model • Retrieval • Calculate the conditional probability of the query samples given each model in the collection: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4) • (diagram labels: Query, Models)
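
The retrieval step can be sketched as scoring the query samples under every model and sorting. The example assumes the query samples are treated as independent, so log P(Q|M) is a sum of per-sample log-likelihoods, and that each model is a `(weights, means, variances)` tuple; both the data layout and the function names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def query_log_likelihood(query_samples, model):
    """log P(Q|M), assuming independent query samples."""
    weights, means, variances = model
    total = 0.0
    for x in query_samples:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        # Log-sum-exp over the mixture components, for numerical stability.
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def rank_models(query_samples, models):
    """Return model names ordered from most to least likely."""
    scores = {name: query_log_likelihood(query_samples, m)
              for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```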

  8. Dynamic Model • Selecting frames • 1-second sequence around the keyframe • Entire video shot as a sequence of frames sampled at regular intervals • Features <x,y,t,DCT>

  9. Dynamic Model • Indexing: • GMM of multiple frames around the keyframe • Feature vectors extended with a time-stamp normalized in [0,1]: <x,y,t,DCT> • (figure: time axis marked 0, 0.5, 1)
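
Extending per-frame features with a normalised time-stamp might look like the sketch below. It assumes each frame's features are already available as `[x, y, DCT...]` lists in temporal order; the function name and the handling of a single-frame input are illustrative choices, not taken from the slides.

```python
def add_time_feature(frames_features):
    """Turn per-frame [x, y, DCT...] vectors into [x, y, t, DCT...] vectors.

    `frames_features` holds one list of feature vectors per frame, in
    temporal order. The time-stamp t runs linearly from 0 (first frame)
    to 1 (last frame).
    """
    n = len(frames_features)
    out = []
    for i, feats in enumerate(frames_features):
        # Assumption: a lone frame is placed at the middle of the time axis.
        t = 0.5 if n == 1 else i / (n - 1)
        for f in feats:
            out.append([f[0], f[1], t] + list(f[2:]))
    return out
```

All vectors from all selected frames go into one pool, from which a single GMM per shot is estimated.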

  10. Dynamic Model

  11. Query example: a single image • Either build an artificial sequence of 29 images from the single query example, with time normalized between 0 and 1 • Or extend the query image’s features with a fixed temporal feature value of 0.5 (better results and lower computational cost)
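
The two ways of turning a single query image into a space-time query can be compared in a short sketch. The function names are hypothetical, and the input is assumed to be a list of `[x, y, DCT...]` vectors for the query image.

```python
def query_fixed_time(image_features, t=0.5):
    """Insert one fixed temporal value into every feature vector."""
    return [[f[0], f[1], t] + list(f[2:]) for f in image_features]

def query_artificial_sequence(image_features, n_frames=29):
    """Repeat the image n_frames times with t spread evenly over [0, 1]."""
    out = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        out.extend([f[0], f[1], t] + list(f[2:]) for f in image_features)
    return out
```

The fixed-time variant produces 29 times fewer query samples to score than the artificial sequence, which is consistent with the lower computational cost the slide reports.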

  12. Dynamic Model Advantages • More training data for models • Less sensitive to random initialization • Reduced dependency upon selecting appropriate keyframe • Some spatio-temporal aspects of shot are captured • (Dis-)appearance of objects

  13. Dynamic Model

  14. Dynamic Model

  15. Dynamic Model

  16. Retrieval Framework • Smoothing • Building dynamic GMMs: likelihood goes to infinity?
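
The diverging likelihood can be illustrated in one dimension: when a component collapses onto near-identical samples its variance shrinks toward zero and the density at the mean grows without bound. The variance floor shown here is a common generic safeguard and an assumption of this sketch; the slides do not say how, or whether, the authors handle the problem.

```python
import math

def log_gauss_1d(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def floored(var, floor=1e-4):
    """Clamp a variance estimate from below (hypothetical floor value)."""
    return max(var, floor)

# Log density at the mean, without and with the floor, as the variance shrinks.
for var in (1.0, 1e-3, 1e-6, 1e-12):
    raw = log_gauss_1d(0.0, 0.0, var)
    capped = log_gauss_1d(0.0, 0.0, floored(var))
    print(var, raw, capped)
```

Without the floor the log density keeps growing as the variance shrinks; with it, the value stops increasing once the variance falls below the floor.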

  17. Experimental Set-up • Build models for each shot • Static, Dynamic, Language • Build Queries from topics • Construct simple keyword text query • Select visual example • Rescale and compress example images to match video size and quality

  18. Combining Modalities • Independence assumption textual/visual • P(Qt,Qv|Shot) = P(Qt|LM) * P(Qv|GMM) • Combination works if both runs are useful [CWI:TREC:2002] • Dynamic run more useful than static run
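
Under the independence assumption the product on this slide becomes a sum in log space. The sketch below assumes per-shot log-probabilities from the text run and the visual run are available as dictionaries keyed by shot identifier; that layout and the function name are hypothetical.

```python
def combine_modalities(text_logp, visual_logp):
    """Rank shots by log P(Qt|LM) + log P(Qv|GMM).

    Only shots scored by both runs are ranked (an assumption of this sketch).
    """
    shots = text_logp.keys() & visual_logp.keys()
    combined = {s: text_logp[s] + visual_logp[s] for s in shots}
    return sorted(combined, key=combined.get, reverse=True)
```

For example, `combine_modalities({"a": -2.0, "b": -1.0}, {"a": -1.0, "b": -4.0})` ranks shot `a` (total -3.0) ahead of shot `b` (total -5.0), even though `b` wins on text alone.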

  19. Combining Modalities • Dynamic: higher initial precision

  20. Dynamic: higher initial precision • (figure panels: Static run, Dynamic run)

  21. Dow Jones Topic (120)

  22. Dow Jones Topic (120) • “Dow Jones Industrial Average rise day points” • (figure: two result sets joined by "+" and "=")

  23. Conclusions • Dynamic model captures visual similarity better • Spatio-temporal aspects • More training data • Appropriate keyframe less critical • Less sensitive to the random initialization • ASR + dynamic better than either alone

  24. Future work • More data needs more computational effort: optimizations? • Avoid the singular solutions • Dynamic number of components? • Full covariance in space-time <x,y,t> • Integration of audio

  25. Thanks!

  26. Merging Run Results • Combining (conflicting) examples is difficult [CWI:TREC:2002] • Single example → misses relevant shots • Round-robin merging • (figure: two ranked runs, 1 to 10 each, interleaved into a combined list 1, 1, 2, 2, 3, 3, 4, 4, ...)
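
Round-robin merging of ranked runs can be sketched as below. Each run is assumed to be a list of shot identifiers in rank order; skipping shots that an earlier run already contributed is an assumption of this example, since the slide does not say how duplicates are handled.

```python
def round_robin_merge(runs, limit=None):
    """Interleave ranked runs: rank 1 of every run, then rank 2, and so on."""
    merged, seen = [], set()
    longest = max((len(run) for run in runs), default=0)
    for rank in range(longest):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                merged.append(run[rank])
    return merged if limit is None else merged[:limit]
```

For instance, merging `["s1", "s2", "s3"]` with `["s4", "s2", "s5"]` gives `["s1", "s4", "s2", "s3", "s5"]`: the second occurrence of `s2` is dropped.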

  27. Merging Run Results

  28. Merging Run Results • Combining (conflicting) examples is difficult [CWI:TREC:2002] • Single example → misses relevant shots • Round-robin merging • (figure: two ranked runs, 1 to 10 each, interleaved into a combined list 1, 1, 2, 2, 3, 3, 4, 4, ...)

  29. Conclusions • Visual aspects of an information need are best captured by using multiple examples • Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near-best performance for almost all topics
