Understanding and Predicting Interestingness of Videos
Yu-Gang Jiang, Yanran Wang, Rui Feng, Hanfang Yang, Yingbin Zheng, Xiangyang Xue
School of Computer Science, Fudan University, Shanghai, China
AAAI 2013, Bellevue, USA
Two New Datasets
• Flickr Dataset:
  • Source: Flickr.com
  • Video Type: Consumer Videos
  • Video Number: 1200
  • Categories: 15 (basketball, beach…)
  • Duration: 20 hrs in total
  • Label: Top 10% as interesting videos; bottom 10% as uninteresting
• YouTube Dataset:
  • Source: YouTube.com
  • Video Type: Advertisements
  • Video Number: 420
  • Categories: 14 (food, drink…)
  • Duration: 4.2 hrs in total
  • Label: 10 human assessors compare video pairs

The Problem
Can a computational model automatically analyze video content and predict the interestingness of videos? We conduct a pilot study on this problem and demonstrate a simple method to identify the more interesting videos.

Results
• Visual Feature Results:
  • Overall, the visual features perform strongly on both datasets
  • Among the five features, SIFT and HOG are very effective, and their combination performs best
• Audio Feature Results:
  • The three audio features are effective and complementary. Combining them gives the best performance
• Attribute Feature Results:
  • Attribute features do not work as well as we expected, and style in particular performs poorly. This is an interesting observation, since style is claimed to be effective for predicting image interestingness
• Visual+Audio+Attribute Fusion Results:
  • Fusing visual and audio features leads to substantial performance gains: a 2.6% increase on Flickr and a 5.4% increase on YouTube.
  • Adding attribute features to the visual and audio fusion is not as effective
[Figure: bar charts of prediction accuracy (%) per feature on Flickr and YouTube. Values shown: 76.6, 74.5, 68.0, 67.1, 67.0, 74.7, 64.8, 65.7, 64.5, 56.8.]
[Figure: prediction accuracy (%) before and after fusion. Values shown: 78.6, 76.6, 71.7, 68.0, with gains labeled 2.6% and 5.4%.]

Key Idea
• Applications:
  • Web Video Search
  • Video Recommendation System
• Related Work:
  • There are a few studies on predicting the aesthetics and interestingness of images
• Key idea: build a computational model that, given two videos, predicts which one is more interesting.
• Contributions:
  • Conducted a pilot study on video interestingness
  • Built two new datasets to support this study
  • Evaluated a large number of features and obtained interesting observations

Prediction & Evaluation
• Computational Framework:
  • Aim: train a model to compare the interestingness of two videos
  • Feature: multi-modal feature extraction (visual features, audio features, and high-level attribute features)
  • Prediction:
    • Adopt Joachims’ Ranking SVM (Joachims 2003) to train prediction models
    • For both datasets, use 2/3 of the videos for training and 1/3 for testing
    • Use kernel-level fusion with equal weights to fuse multiple features
  • Evaluation:
    • Accuracy (the percentage of correctly ranked test video pairs)
[Figure: framework pipeline. A pair of videos (video vs. video) goes through multi-modal feature extraction (visual, audio, and high-level attribute features), multi-modal fusion, and Ranking SVM to produce the results.]

Conclusion
• We conducted a study on predicting video interestingness and built two new datasets. A large number of features were evaluated, leading to interesting observations:
  • Visual and audio features are effective in predicting video interestingness
  • A few features that are useful for image interestingness do not extend to the video domain (style…)

Datasets are available at: www.yugangjiang.info/research/interestingness
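
The fusion and evaluation steps listed under Prediction & Evaluation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear kernel is a stand-in (the poster does not say which kernels were used), the function names and toy data are hypothetical, and the ground truth is assumed to be a per-video interestingness score, whereas the YouTube labels come from pairwise human comparisons. No Ranking SVM is trained here; the `scores` list stands in for the output of any trained ranker.

```python
from itertools import combinations


def linear_kernel(features):
    """Gram matrix of dot products for one feature type (stand-in kernel, an assumption)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def fuse_kernels(kernels):
    """Kernel-level fusion with equal weights: element-wise mean of the Gram matrices."""
    n = len(kernels[0])
    weight = 1.0 / len(kernels)
    return [[weight * sum(k[i][j] for k in kernels) for j in range(n)] for i in range(n)]


def pairwise_accuracy(scores, truth):
    """Percentage of video pairs whose predicted order matches the ground-truth order.

    Pairs that are tied in the ground truth are skipped.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(truth)), 2):
        if truth[i] == truth[j]:
            continue
        total += 1
        if (scores[i] - scores[j]) * (truth[i] - truth[j]) > 0:
            correct += 1
    if total == 0:
        raise ValueError("no comparable pairs in the ground truth")
    return 100.0 * correct / total


# Toy data for three videos (hypothetical values).
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # e.g. a visual descriptor per video
audio = [[2.0], [0.0], [1.0]]                  # e.g. an audio descriptor per video

# One Gram matrix per modality, fused with equal weights.
fused = fuse_kernels([linear_kernel(visual), linear_kernel(audio)])

truth = [3, 1, 2]         # assumed ground-truth interestingness (higher = more interesting)
scores = [0.9, 0.2, 0.1]  # stand-in for the scores of a trained ranking model
acc = pairwise_accuracy(scores, truth)
```

In a full pipeline the fused Gram matrix would be passed to a kernel ranking model as a precomputed kernel. Averaging the matrices keeps every modality on an equal footing, which matches the equal-weights choice named on the poster.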