
Understanding and Predicting Interestingness of Videos

Yu-Gang Jiang, Yanran Wang, Rui Feng, Hanfang Yang, Yingbin Zheng, Xiangyang Xue. School of Computer Science, Fudan University, Shanghai, China. AAAI 2013, Bellevue, USA.


Presentation Transcript


  1. Understanding and Predicting Interestingness of Videos
Yu-Gang Jiang, Yanran Wang, Rui Feng, Hanfang Yang, Yingbin Zheng, Xiangyang Xue
School of Computer Science, Fudan University, Shanghai, China
AAAI 2013, Bellevue, USA

The problem
Can a computational model automatically analyze video content and predict the interestingness of videos? We conduct a pilot study on this problem and demonstrate a simple method to identify the more interesting videos.

Key Idea
• Applications:
  • Web Video Search
  • Video Recommendation System
• Related Work:
  • There are a few studies on predicting the aesthetics and interestingness of images
• Key idea: build a computational model that, given two videos, predicts which one is more interesting.
• Contributions:
  • Conducted a pilot study on video interestingness
  • Built two new datasets to support this study
  • Evaluated a large number of features and obtained interesting observations

Two New Datasets
• Flickr Dataset:
  • Source: Flickr.com
  • Video Type: Consumer Videos
  • Video Number: 1200
  • Categories: 15 (basketball, beach…)
  • Duration: 20 hrs in total
  • Label: Top 10% as interesting videos; Bottom 10% as uninteresting
• YouTube Dataset:
  • Source: YouTube.com
  • Video Type: Advertisements
  • Video Number: 420
  • Categories: 14 (food, drink…)
  • Duration: 4.2 hrs in total
  • Label: 10 human assessors compare video pairs

Prediction & Evaluation
• Computational Framework:
  • Aim: train a model to compare the interestingness of two videos
  • Feature: multi-modal feature extraction (visual features, audio features, high-level attribute features)
• Prediction:
  • Adopt Joachims’ Ranking SVM (Joachims 2003) to train prediction models
  • For both datasets, we use 2/3 of the videos for training and 1/3 for testing
  • Use kernel-level fusion with equal weights to fuse multiple features
• Evaluation:
  • Accuracy (the percentage of correctly ranked test video pairs)
[Figure: framework pipeline. A pair of videos (one vs. the other) goes through multi-modal feature extraction (visual, audio, and high-level attribute features), then multi-modal fusion, then Ranking SVM, then results.]

Results
• Visual Feature Results:
  • Overall, the visual features achieve very strong performance on both datasets
  • Among the five features, SIFT and HOG are very effective, and their combination performs best
• Audio Feature Results:
  • The three audio features are effective and complementary. Combining them gives the best performance
• Attribute Feature Results:
  • Attribute features do not work as well as we expected; style in particular performs poorly. This is an interesting observation, since style is claimed to be effective for predicting image interestingness
• Visual+Audio+Attribute Fusion Results:
  • Fusing visual and audio features leads to substantial performance gains: a 2.6% increase on Flickr and a 5.4% increase on YouTube. Adding attribute features on top is not as effective
[Figure: bar charts of prediction accuracy (%) on Flickr and YouTube. Values shown, in extraction order: 76.6, 74.5, 68.0, 67.1, 67.0, 74.7, 64.8, 65.7, 64.5, 56.8.]
[Figure: fusion results, accuracy (%). Values shown: 78.6, 76.6, 71.7, 68.0, with gains marked 2.6% and 5.4%.]

Conclusion
• We conducted a study on predicting video interestingness and built two new datasets. A large number of features have been evaluated, leading to interesting observations:
  • Visual and audio features are effective in predicting video interestingness
  • A few features useful for image interestingness do not extend to the video domain (Style…)

Datasets are available at: www.yugangjiang.info/research/interestingness
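
The transcript above names two mechanics without spelling them out: equal-weight kernel-level fusion and accuracy measured as the share of correctly ranked test pairs. The following is a minimal Python sketch of both, not the authors' implementation. The function names `fuse_kernels` and `pairwise_accuracy`, the toy kernel matrices, and the use of a scalar ground-truth score per video are all assumptions made for illustration (the YouTube labels in the poster come from pairwise human comparisons, not scalar scores).

```python
from itertools import combinations


def fuse_kernels(kernels):
    """Equal-weight kernel-level fusion (assumed form): the element-wise
    mean of several square kernel matrices of the same size."""
    n = len(kernels[0])
    return [
        [sum(k[i][j] for k in kernels) / len(kernels) for j in range(n)]
        for i in range(n)
    ]


def pairwise_accuracy(scores, labels):
    """Fraction of test video pairs whose predicted order matches the
    ground-truth order. Pairs with equal ground-truth labels are skipped."""
    correct = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # no preference between these two videos
        total += 1
        # Same sign means the model ranks the pair the same way as the labels.
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical 2x2 kernels for two feature types (toy values, not real data).
visual_kernel = [[1.0, 0.25], [0.25, 1.0]]
audio_kernel = [[1.0, 0.75], [0.75, 1.0]]
fused = fuse_kernels([visual_kernel, audio_kernel])

# Hypothetical model scores and ground-truth interestingness for 4 test videos.
scores = [0.9, 0.1, 0.5, 0.4]
labels = [3, 0, 1, 2]
acc = pairwise_accuracy(scores, labels)
print(fused[0][1], round(acc, 3))
```

In this toy case five of the six comparable pairs are ordered correctly. In a full pipeline the fused kernel would be passed to a ranking model such as Ranking SVM, which this sketch does not implement.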
