Discovering Important People and Objects for Egocentric Video Summarization

Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman
University of Texas at Austin

Outline
• Introduction
• Approach
• Results
• Conclusion

Introduction

Introduction
• Focus on the most important objects and people with which the camera wearer interacts.
• Develop region cues indicative of high-level saliency in egocentric video.
• Learn a regressor to predict the relative importance of any new region based on these cues.

Approach
Important things are those with which the camera wearer has significant interaction. Four main steps:
• Train a regression model to predict region importance.
• Segment the video into temporal events.
• Score each region's importance using the regressor.
• Generate a storyboard summary of important people and objects.

Egocentric video data collection
• We use the Looxcie wearable camera, which captures video at 15 fps at 320 x 480 resolution.
• We collected 10 videos, each 3-5 hours in length.

Annotating important regions in training video

Learning region importance in egocentric video
• Egocentric features
  • Interaction
  • Gaze
  • Frequency

Learning region importance in egocentric video
• Frequency feature
  • Matching regions
  • Matching points (DoG+SIFT) [19]
[19] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2), 2004.

Learning region importance in egocentric video
• Object features
  • Object-like appearance [3]
  • Object-like motion [16]
  • Likelihood of a person's face [27]
• Region features
  • Size, centroid
  • Bounding box centroid, width, height
[3] Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR, 2010.
[16] Key-Segments for Video Object Segmentation. In ICCV, 2011.
[27] Rapid Object Detection using a Boosted Cascade of Simple Features. In CVPR, 2001.
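The region features listed on this slide (size, centroid, and bounding box centroid, width, and height) can be read directly off a binary segmentation mask. The following is a minimal sketch, not the authors' implementation: the function name `region_features`, the dictionary keys, and the normalization by frame size are all assumptions made for illustration.

```python
import numpy as np


def region_features(mask):
    """Geometric features of one region, given as a binary H x W mask.

    Assumption: every value is normalized by the frame size so that it
    lies in [0, 1]; the slides do not state which normalization is used.
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)  # row and column indices of region pixels
    if ys.size == 0:
        raise ValueError("mask contains no region pixels")

    h, w = mask.shape
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()

    return {
        # fraction of the frame covered by the region
        "size": ys.size / (h * w),
        # mean pixel position of the region, as (x, y)
        "centroid": (xs.mean() / w, ys.mean() / h),
        # center of the tight axis-aligned bounding box, as (x, y)
        "bbox_centroid": ((x_min + x_max) / 2 / w, (y_min + y_max) / 2 / h),
        "bbox_width": (x_max - x_min + 1) / w,
        "bbox_height": (y_max - y_min + 1) / h,
    }


if __name__ == "__main__":
    frame = np.zeros((4, 6), dtype=bool)
    frame[1:3, 2:5] = True  # a 2 x 3 rectangular region
    print(region_features(frame))
```

Normalizing by frame size keeps the features comparable across resolutions; if absolute pixel units are preferred, the divisions by `w` and `h` can simply be dropped.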

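Regressor to predict region importance
• Train a linear regression model with pairwise interaction terms to predict a region r's importance score.
• The model combines learned parameters with the i'th feature value of r and with pairwise products of feature values.
• For training, the target importance scores are given; for testing, the score is predicted given the region's feature values.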
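A linear regression model with pairwise interaction terms has an intercept, one weight per feature, and one weight per product of two distinct features. The sketch below fits such a model by ordinary least squares with NumPy. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the fitting procedure (plain least squares, no regularization) is an assumption, and the toy data is synthetic.

```python
import itertools

import numpy as np


def design_matrix(X):
    """Expand an N x D feature matrix into columns
    [1, x_1..x_D, x_i * x_j for every pair i < j]."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    linear = [X[:, i] for i in range(d)]
    pairwise = [X[:, i] * X[:, j]
                for i, j in itertools.combinations(range(d), 2)]
    return np.column_stack([np.ones(n)] + linear + pairwise)


def fit_importance_regressor(X, y):
    """Least-squares fit (an assumed fitting procedure).
    Returns one weight per column of design_matrix(X)."""
    beta, *_ = np.linalg.lstsq(design_matrix(X),
                               np.asarray(y, dtype=float), rcond=None)
    return beta


def predict_importance(beta, X):
    """Predicted importance score for each row (region) of X."""
    return design_matrix(X) @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.random((50, 2))  # synthetic: 50 regions, 2 features
    # synthetic targets generated from known weights
    y_train = (0.5 + 2.0 * X_train[:, 0] - 1.0 * X_train[:, 1]
               + 3.0 * X_train[:, 0] * X_train[:, 1])
    beta = fit_importance_regressor(X_train, y_train)
    print(beta)  # weights ordered as: intercept, x_1, x_2, x_1 * x_2
    print(predict_importance(beta, [[0.2, 0.4]]))
```

The interaction columns let the model express that two cues together matter more (or less) than their separate contributions, which a purely linear model cannot capture.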

Segmenting the video into temporal events

Discovering an event’s key people and objects

Results
• Evaluate on videos from all 4 users, 17 hours in total.
• Train using data from 3 users and test on 1 video from the remaining user.
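The train/test protocol described here is a leave-one-user-out split: each user is held out in turn, and the model is trained only on the other users' data. A minimal sketch of how such splits could be enumerated follows. The function name and the user and video identifiers are hypothetical, and the sketch returns all of the held-out user's videos, from which one test video would then be chosen.

```python
def leave_one_user_out(videos_by_user):
    """Yield (held_out_user, train_videos, test_videos) for each user.

    videos_by_user maps a user id to a list of that user's videos.
    No video from the held-out user ever appears in the training set.
    """
    users = sorted(videos_by_user)
    for held_out in users:
        train = [video
                 for user in users if user != held_out
                 for video in videos_by_user[user]]
        test = list(videos_by_user[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # hypothetical identifiers, for illustration only
    data = {
        "user1": ["u1_video"],
        "user2": ["u2_video"],
        "user3": ["u3_video"],
        "user4": ["u4_video"],
    }
    for held_out, train, test in leave_one_user_out(data):
        print(held_out, train, test)
```

Splitting by user rather than by video keeps the test user's environment and habits entirely out of training, so the reported accuracy reflects generalization to a new camera wearer.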

Important region prediction accuracy
[3] Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR, 2010.
[6] Category Independent Object Proposals. In ECCV, 2010.
[28] Modeling Attention to Salient Proto-Objects. Neural Networks, 19:1395–1407, 2006.

The highest learned weights

User study results

Application example

Conclusion
• A novel approach to summarization of egocentric video.
• Focus on the most important objects and people that generate the "story" of the video.
• Novel egocentric features to train a regressor that predicts important regions.
