
Video Search Engines and Content-Based Retrieval

Video Search Engines and Content-Based Retrieval. Steven C.H. Hoi CUHK, CSE 18-Sept, 2006. Outline. Video Search Engines Content-Based Video Retrieval. Video Search Engines A survey of state-of-the-arts. Introduction. Who are doing video search engines?. Top text search engines


Presentation Transcript


  1. Video Search Engines and Content-Based Retrieval Steven C.H. Hoi CUHK, CSE 18-Sept, 2006

  2. Outline • Video Search Engines • Content-Based Video Retrieval

  3. Video Search Engines: A survey of the state of the art

  4. Introduction • Who is building video search engines? • Top text search engines (5.6 billion searches, 07/2006)

  5. Introduction • Google

  6. Introduction • Yahoo

  7. Introduction • MSN/Live Search

  8. Introduction • YouTube

  9. Business Models • Web Advertising • Site Volume, or keyword customized • Video Ads • Disable controls (MSN) • Subscription • MLB, Real • Download to own • iTunes, Movie • Rental • Limited time, number of plays • Other • Desktop Media Search • Media player (jukebox) • Media Monitoring • Media Asset Management

  10. Types of Video Sites • Content Originators • Major Broadcasters • Affiliates, Local News • Major League Baseball • Syndication, Aggregation, “Internet Broadcasters” • Rental, purchase, advertising, subscription • MSN, Google, iTunes • ROO Media, FeedRoom • Movie and Video Download • Share portals • Consumer content, blogs • YouTube, Putfile, Vsocial, Google, Akimbo • Traditional Search Engines (Crawl) / “RSS” • Yahoo, Blinkx • Other • Public (Internet Archive) • Media Monitoring, asset management systems

  11. Video Search Challenges

  12. Current Video Search Engines Metadata • File type and context • Media file attributes • Size, length • Structured global metadata • RSS content description Content • Content Indexing • Search within a video • Full text of dialog • Image or video content • Automated Content Indexing

  13. Current Video Search Engines • Content Search Engines: keyword search with transcripts from speech recognition

  14. Content-Based Video Search Engine • Architecture

  15. Content-Based Video Search Engine • Video Processing

  16. Content-Based Video Search Engine • Research Challenges • Speech Recognition • Shot Boundary Detection • Video Story Segmentation • Concept Detection • Multi-modal Fusion for Ranking • Text/ASR, Audio/Speech, Visual, etc.

  17. Content-Based Retrieval • Our Research Problem • Learning to rank video shots for automatic content-based search tasks • Challenges • Multi-Modal Information Fusion • Small Sample Learning (a few positive examples and no negative examples) • Learning on large-scale datasets

  18. Multi-modal and Multi-scale Ranking Framework • Main Ideas • Representing video structures by graphs • Using semi-supervised learning to address small labeled sample learning problem • Fusing Multi-modal information by Harmonic learning over graphs • Multi-scale ranking for achieving efficient performance on large-scale datasets

  19. Multi-modal and Multi-scale Ranking Framework • Graph-based Modeling • Figure labels: Shot, Story, Text • Sample story text: "chinese state president hu jintao arrived at the located sao paulo suburbs..." / "place of innai natural china in this country policies and regulations..." / "hu jintao during his speech said in argentina are latin america in the region ..."

  20. Multi-modal and Multi-scale Ranking Framework • Semi-Supervised Learning on Graph • To find an optimal real-valued function g: V → R on the graph G • To minimize a quadratic energy function: • Using the Gaussian field and harmonic property from spectral graph theory (Zhu et al., ICML'03), a harmonic function g can be found:
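The formulas on this slide are not preserved in the transcript. For orientation, the standard formulation from the cited ICML'03 paper is sketched here. The symbols w_ij (edge weight between nodes i and j), L and U (labeled and unlabeled node sets), y_i (given label), and d_j (node degree) follow that paper and are assumptions with respect to the slide itself.

```latex
% Quadratic energy over the graph, with g clamped to the given labels
E(g) = \frac{1}{2} \sum_{i,j} w_{ij} \bigl( g(i) - g(j) \bigr)^2,
\qquad g(i) = y_i \quad \text{for } i \in L

% Harmonic property: each unlabeled value is the weighted average of its neighbors
g(j) = \frac{1}{d_j} \sum_{i \sim j} w_{ij} \, g(i),
\qquad d_j = \sum_{i} w_{ij}, \quad j \in U
```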

  21. Multi-modal and Multi-scale Ranking Framework • Semi-Supervised Learning on Graph • Let • The solution of the harmonic function g can be expressed in matrix operations:
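The matrix expression on this slide is not preserved in the transcript. A minimal sketch of the usual closed form for a harmonic function on a graph, g_u = (D_uu - W_uu)^(-1) W_ul g_l, is given here under the assumption that the slide follows the cited ICML'03 formulation. The function name and argument names are hypothetical.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, labels):
    """Solve for a harmonic function on a weighted graph (illustrative sketch).

    W           : (n, n) symmetric non-negative weight matrix
    labeled_idx : indices of labeled nodes
    labels      : real-valued labels for those nodes (same order)
    Returns an array g of length n with g[labeled_idx] == labels.
    Assumes every unlabeled node is connected to some labeled node.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Combinatorial graph Laplacian L = D - W
    laplacian = np.diag(W.sum(axis=1)) - W
    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]

    g_l = np.asarray(labels, dtype=float)
    # g_u = (D_uu - W_uu)^{-1} W_ul g_l, solved without forming the inverse
    g_u = np.linalg.solve(L_uu, W_ul @ g_l)

    g = np.empty(n)
    g[labeled_idx] = g_l
    g[unlabeled_idx] = g_u
    return g
```

On a three-node chain whose end nodes are labeled 1 and 0, the middle node receives the average of its two neighbors. Using `np.linalg.solve` avoids an explicit matrix inverse, which matters once the unlabeled block becomes large.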

  22. Multi-modal and Multi-scale Ranking Framework • Multi-Modal Fusion over Graph • To combine text information into SSL on visual modality, we consider the text inputs as the attached nodes on the visual graph: Visual - g Text - f
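The fusion formula itself is not preserved in the transcript. One plausible reading of "attached nodes" is the external-classifier construction from the cited ICML'03 paper, in which each unlabeled node gets an extra node carrying a prior score. The sketch below follows that construction as an assumption, not as the authors' confirmed method. The mixing weight `eta` and all names are hypothetical.

```python
import numpy as np

def harmonic_with_attached_nodes(W, labeled_idx, labels, text_scores, eta=0.2):
    """Harmonic solution on the visual graph with text scores attached (sketch).

    W           : (n, n) visual similarity matrix; every node needs at least one edge
    labeled_idx : indices of labeled nodes
    labels      : labels of those nodes
    text_scores : length-n text relevance scores f; only unlabeled entries are used
    eta         : assumed probability mass given to the attached text node
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Row-normalized transition matrix P = D^{-1} W
    P = W / W.sum(axis=1, keepdims=True)
    P_uu = P[np.ix_(unlabeled_idx, unlabeled_idx)]
    P_ul = P[np.ix_(unlabeled_idx, labeled_idx)]

    g_l = np.asarray(labels, dtype=float)
    f_u = np.asarray(text_scores, dtype=float)[unlabeled_idx]

    # g_u = (I - (1 - eta) P_uu)^{-1} ((1 - eta) P_ul g_l + eta f_u)
    A = np.eye(len(unlabeled_idx)) - (1.0 - eta) * P_uu
    b = (1.0 - eta) * (P_ul @ g_l) + eta * f_u
    g_u = np.linalg.solve(A, b)

    g = np.empty(n)
    g[labeled_idx] = g_l
    g[unlabeled_idx] = g_u
    return g
```

With `eta=0` the text scores are ignored and the result reduces to the plain harmonic solution; larger values pull each unlabeled node toward its text score.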

  23. Multi-modal and Multi-scale Ranking Framework • Challenges • Number of examples in database: N is large • For example: • TRECVID 2005: Rep. Key-Frames N = 45,765 • TRECVID 2006: Rep. Key-Frames N = 79,487 • How can Semi-Supervised Learning be done at this scale?

  24. Multi-modal and Multi-scale Ranking Framework • Multi-Scale Ranking • Learning ranking through multi-scale reranking • Each stage is associated with different computational costs • In our solution, four ranking stages include: • Ranking by Text Retrieval using Language Models • Re-ranking by NN fusing Text and Visual • Re-ranking by SVM fusing Text and Visual • Re-ranking by multi-modal Semi-supervised Learning
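The staged re-ranking described on this slide can be sketched as a simple cascade: each stage scores the surviving candidates and keeps only the top few for the next, costlier stage. This is an illustrative sketch only; the function name, the `(score_all, keep_n)` stage format, and the toy scorers in the usage note are assumptions, not the authors' implementation.

```python
def multi_scale_rank(candidates, stages):
    """Re-rank candidates through a cascade of stages (illustrative sketch).

    candidates : iterable of items (e.g. shot ids)
    stages     : list of (score_all, keep_n) pairs ordered from cheapest to
                 most expensive; score_all takes the current list of items
                 and returns one score per item (higher is better)
    Returns the items kept by the last stage, best first.
    """
    ranked = list(candidates)
    for score_all, keep_n in stages:
        scores = score_all(ranked)
        # Sort item positions by descending score, then keep the top keep_n
        order = sorted(range(len(ranked)), key=lambda i: scores[i], reverse=True)
        ranked = [ranked[i] for i in order[:keep_n]]
    return ranked
```

A stage receives the whole surviving list rather than one item at a time, so a transductive method such as graph-based semi-supervised ranking can build its graph over just those items. For example, `multi_scale_rank(range(10), [(lambda xs: list(xs), 5), (lambda xs: [-x for x in xs], 2)])` keeps the five largest values in the first stage and then the two smallest of those in the second.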

  25. (Figure: multi-scale ranking pipeline.) Video Processing splits raw video clips / streams into video stories and video shots; Text Processing and Image Processing feed Multi-modal Fusion. Ranking flow for the user's query: Text → top M related stories → top N1 related shots → Text + Visual NN → top N2 related shots → SVM/KLR supervised ranking → top N3 related shots → SSR (semi-supervised ranking) → top N4 related shots → return top K shots

  26. Benchmark Evaluations • Dataset • TRECVID 2005 • Test: 140 video clips, 45,765 rep. key frames • 24 queries • A query example: <videoTopic num="0152"> <textDescription text="Find shots of Hu Jintao, president of the People's Republic of China" /> </videoTopic>

  27. Benchmark Evaluations • Text-only Retrieval • No Pseudo-Relevance Feedback (No-PRF) • With Pseudo-Relevance Feedback (PRF) Language Models • TF-IDF • Okapi • KL-JM • KL-DIR • KL-ABS
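As an illustration of the language-model family listed on this slide, here is a minimal sketch of query-likelihood scoring with Jelinek-Mercer smoothing. Reading "KL-JM" as KL-divergence retrieval with Jelinek-Mercer smoothing is an assumption based on common toolkit naming, and the function name, arguments, and the default `lam=0.5` are hypothetical.

```python
import math
from collections import Counter

def jm_query_log_likelihood(query_terms, doc_terms, collection_counts,
                            collection_len, lam=0.5):
    """Log-likelihood of a query under a smoothed document model (sketch).

    p(w | d) = (1 - lam) * c(w, d) / |d| + lam * c(w, C) / |C|
    collection_counts : mapping term -> count over the whole collection
    collection_len    : total number of term occurrences in the collection
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_doc = doc_counts[w] / doc_len if doc_len else 0.0
        p_col = collection_counts.get(w, 0) / collection_len
        p = (1.0 - lam) * p_doc + lam * p_col
        if p == 0.0:
            # Term never occurs in the collection: it cannot separate documents
            continue
        score += math.log(p)
    return score
```

Documents are ranked by descending score. With a maximum-likelihood query model, ranking by this log-likelihood gives the same order as ranking by negative KL divergence between the query model and the smoothed document model.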

  28. Benchmark Evaluations • Visual Features • Color • Grid Color Moment • 3×3 grid, 81 dimensions • Edge • Edge Direction Histogram • 36 bins + 1, 37 dimensions • Texture • Gabor Moments • 5×8 = 40, 3 moments, 120 dimensions • 238 dimensions in total COREL Benchmark Photos
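The 81-dimensional grid color moment feature is consistent with 3×3 cells × 3 color channels × 3 moments. A minimal sketch under that assumption follows; the choice of mean, standard deviation, and cube-rooted third central moment, and the absence of any color-space conversion, are illustrative guesses rather than details given on the slide.

```python
import numpy as np

def grid_color_moments(image, grid=3):
    """Grid color moments of an (H, W, C) image array (illustrative sketch).

    Splits the image into grid x grid cells and, per cell and channel,
    computes the mean, the standard deviation, and the cube root of the
    third central moment. For grid=3 and C=3 this yields 81 values.
    """
    image = np.asarray(image, dtype=float)
    h, w, c = image.shape
    ys = np.linspace(0, h, grid + 1).astype(int)  # row boundaries of cells
    xs = np.linspace(0, w, grid + 1).astype(int)  # column boundaries of cells

    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].reshape(-1, c)
            mean = cell.mean(axis=0)
            centered = cell - mean
            std = np.sqrt((centered ** 2).mean(axis=0))
            skew = np.cbrt((centered ** 3).mean(axis=0))
            feats.extend([mean, std, skew])
    return np.concatenate(feats)
```

Concatenating this vector with a 37-dimensional edge direction histogram and a 120-dimensional Gabor texture vector would give the 238 dimensions quoted on the slide.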

  29. Benchmark Evaluations • Multi-modal Retrieval (Text + Visual) • Text-only retrieval • Text + NN (Text + Visual) • Text + SVM (Text + Visual) • MMMS (Text + Visual)

  30. Benchmark Evaluations • Evaluation Results Average Performance on TRECVID 2005 Dataset

  31. Benchmark Evaluations • Comparison with other approaches Average performance of 24 queries

  32. Related Work • IBM Solution • SVM + NN + Multiple Instance Learning • Columbia solution • Information-Theoretical Clustering Approach • CMU Solution • Query-Class Dependent Weighting Ranking

  33. Conclusion • A tutorial on video search engines • Research contributions • A unified framework of Multi-Modal and Multi-Scale Ranking for video retrieval • Graph-based modeling of video structures • Semi-Supervised Learning for multi-modal ranking • Making SSL practical for large-scale problems • Promising empirical results…

  34. Future Work • Research is in progress, with tough challenges ahead… • Any suggestions or comments?
