
Summarization of Ego-centric Video: Object Driven vs. Story Driven


Presentation Transcript


  1. Summarization of Ego-centric Video: Object Driven vs. Story Driven. Presented by: Elad Osherov, Jan 2013

  2. Today’s talk • Motivation • Related Work • Object driven summarization • Story driven summarization • Results • Future Development

  3. What is Egocentric Video Anyway? http://xkcd.com/1235/

  4. What is Egocentric Video Anyway?

  5. Motivation • Goal: generate a visual summary of an unedited egocentric video • Input: egocentric video of the camera wearer’s day • Output: storyboard (or skim video) summary

  6. Potential Applications of Egocentric Video Summarization • Mobile robot discovery • Law enforcement • Memory aid

  7. Egocentric Video Properties • Long unedited video • Constant head motion – blur • Moving camera – unstable background • Frequent changes in people and objects • Hand occlusion

  8. Today’s talk • Motivation • Related Work • Object driven summarization • Story driven summarization • Results • Future Development

  9. Related Work • Object recognition in egocentric video [X. Ren and M. Philipose, Egocentric Recognition of Handled Objects: Benchmark and Analysis, CVPR 2009] • Detection and recognition of first-person actions [H. Pirsiavash and D. Ramanan, Detecting Activities of Daily Living in First-Person Camera Views, CVPR 2012] • Data summarization – today’s topic! [A. Rav-Acha, Y. Pritch, and S. Peleg, Making a Long Video Short: Dynamic Video Synopsis, CVPR 2006]

  10. Related Work [A. Rav-Acha, Y. Pritch, and S. Peleg, Making a Long Video Short: Dynamic Video Synopsis, CVPR 2006] http://www.vision.huji.ac.il/video-synopsis/

  11. A Few Words About the Authors • Discovering Important People and Objects for Egocentric Video Summarization. Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012 • Story-Driven Summarization for Egocentric Video. Zheng Lu and Kristen Grauman, CVPR 2013 • Dr. Yong Jae Lee, UC Berkeley (Departments of EE & CS) • Prof. Joydeep Ghosh, University of Texas at Austin, Director of IDEAL (Intelligent Data Exploration and Analysis Lab) • Prof. Zheng Lu, City University of Hong Kong (Department of CS) • Prof. Kristen Grauman, University of Texas at Austin (Department of CS)

  12. Today’s talk • Motivation • Related Work • Object driven summarization • Story driven summarization • Results • Future Development

  13. Object Driven Video Summarization • Goal: create a storyboard summary of a person’s day, driven by the important people and objects • Important things = things the wearer interacts with significantly • Several problems arise: • "Important" is a subjective notion! • What does significant interaction really mean? • No priors on people and objects

  14. Algorithm Overview • Train a category-independent important person/object detector [pipeline figure: training video vs. test videos] [Discovering Important People and Objects for Egocentric Video Summarization. Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012]

  15. Annotating Important Regions in Training Video • Data collection: • 10 videos, each 3-5 hours long (37 hours total) • 4 subjects • Crowd-sourced annotations using MTurk • An object’s degree of importance depends heavily on what the camera wearer is doing before, while, and after the object/person appears • The object must be seen in the context of the camera wearer’s activity to properly gauge its importance www.looxcie.com www.mturk.com/mturk/

  16. Annotating Important Regions in Training Video • Example annotations: man wearing a blue shirt in a café; yellow notepad on a table; coffee mug the camera wearer drinks from; smartphone the camera wearer holds • For about 3-5 hours of video they obtain ~700 object segmentations

  17. Training a Regression Model • A general-purpose, category-independent model predicts important regions in any egocentric video: • Segment each frame into regions • For each region, compute a set of candidate features that could describe its importance • Egocentric, object & region features • Train a regressor to predict region importance

  18. Egocentric Features • Interaction feature: • Euclidean distance of the region’s centroid to the closest detected hand • Classify a region as a hand according to color likelihoods and a naïve Bayes classifier trained on ground-truth hand annotations (see the sketch below)
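To make this concrete, here is a minimal Python sketch of the interaction feature as described on the slide; the function name and input format (region centroid, list of detected hand centroids) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def interaction_feature(region_centroid, hand_centroids):
    """Euclidean distance from a region's centroid to the closest detected hand.

    region_centroid: (x, y); hand_centroids: list of (x, y) for detected hands.
    """
    if not hand_centroids:
        return np.inf  # no hand detected in this frame
    region = np.asarray(region_centroid, dtype=float)
    hands = np.asarray(hand_centroids, dtype=float)
    dists = np.linalg.norm(hands - region, axis=1)
    return dists.min()
```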

  19. Egocentric Features • Gaze feature: • A coarse estimate of how likely the region is being focused upon • Euclidean distance of the region’s centroid to the frame center

  20. Egocentric Features • Frequency feature: • Region matching: color dissimilarity between the region and each region in surrounding frames • Point matching: match SIFT features between the region and each surrounding frame (see the sketch below)
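A hedged sketch of the region-matching half of this frequency cue, assuming regions are described by normalized BGR color histograms; the similarity threshold and the way surrounding frames are windowed are assumptions, not the paper's exact matching procedure.

```python
import cv2
import numpy as np

def color_hist(region_bgr_pixels, bins=8):
    """region_bgr_pixels: Nx3 uint8 array of BGR pixels belonging to one region."""
    hist = cv2.calcHist([region_bgr_pixels.reshape(-1, 1, 3)], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def frequency_feature(region_hist, surrounding_region_hists, sim_thresh=0.7):
    """Fraction of surrounding frames containing a region with a similar color histogram."""
    matches = 0
    for frame_hists in surrounding_region_hists:   # one list of region histograms per nearby frame
        sims = [cv2.compareHist(region_hist.astype(np.float32),
                                h.astype(np.float32), cv2.HISTCMP_CORREL)
                for h in frame_hists]
        if sims and max(sims) > sim_thresh:
            matches += 1
    return matches / max(len(surrounding_region_hists), 1)
```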

  21. Object Features • Object-like appearance • Uses a region-ranking function that ranks each region according to Gestalt cues: [J. Carreira and C. Sminchisescu. Constrained Parametric Min-Cuts for Automatic Object Segmentation. In CVPR, 2010.]

  22. Object Features • Object-like motion • Rank each region according to how its motion pattern differs from that of nearby regions • High scores for regions that "stand out" from their surroundings during motion [Key-Segments for Video Object Segmentation. Yong Jae Lee, Jaechul Kim, and Kristen Grauman, ICCV 2011]

  23. Object Features • Likelihood of a person’s face • Compute the maximum overlap score between the region r and any detected face q in the frame (see the sketch below)
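An illustrative sketch of this face-overlap cue: the maximum overlap between a region's bounding box and any detected face. The OpenCV Haar-cascade detector and the overlap normalization (intersection area over region area) are assumptions, not necessarily the paper's choices.

```python
import cv2

def box_overlap(region_box, face_box):
    """Boxes are (x, y, w, h). Returns intersection area / region area."""
    rx, ry, rw, rh = region_box
    fx, fy, fw, fh = face_box
    ix = max(0, min(rx + rw, fx + fw) - max(rx, fx))
    iy = max(0, min(ry + rh, fy + fh) - max(ry, fy))
    return (ix * iy) / float(rw * rh)

def face_overlap_feature(region_box, gray_frame):
    """Maximum overlap between the region and any detected face in the frame."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    return max((box_overlap(region_box, f) for f in faces), default=0.0)
```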

  24. Train a Regressor to Predict Region Importance • Region features: size, centroid, bounding-box centroid, bounding-box width, bounding-box height • Solve using least squares (see the sketch below)
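A minimal sketch of the least-squares training step, assuming the per-region features are stacked in a matrix X and the annotated importance scores in a vector y; the small ridge term is an added assumption for numerical stability, not part of the slide.

```python
import numpy as np

def train_importance_regressor(X, y, ridge=1e-3):
    """X: (n_regions, n_features); y: (n_regions,). Returns the weight vector w."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])        # append a bias term
    A = X1.T @ X1 + ridge * np.eye(X1.shape[1])          # regularized normal equations
    return np.linalg.solve(A, X1.T @ y)

def predict_importance(X, w):
    """Predicted importance score for each region."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ w
```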

  25. Algorithm Overview • Train a category-independent important person/object detector [pipeline figure: training video vs. test videos] [Discovering Important People and Objects for Egocentric Video Summarization. Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012]

  26. Segmenting the Video into Temporal Events • Compute a pairwise frame-distance matrix • Events allow the summary to include multiple instances of a person or object that is central in multiple contexts in the video • Group frames until the smallest maximum inter-frame distance is larger than two standard deviations beyond the mean (see the sketch below)
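A rough sketch of the grouping rule on this slide: greedily merge temporally adjacent frame groups and stop once the smallest complete-link merge cost exceeds the mean plus two standard deviations of the pairwise distances. The exact clustering details in the paper may differ from this simplification.

```python
import numpy as np

def segment_events(D):
    """D: (n_frames, n_frames) pairwise frame-distance matrix. Returns (start, end) events."""
    n = D.shape[0]
    groups = [(i, i) for i in range(n)]                 # each frame starts as its own event
    threshold = D.mean() + 2 * D.std()
    while len(groups) > 1:
        # complete-link cost of merging each pair of temporally adjacent groups
        costs = [D[a0:a1 + 1, b0:b1 + 1].max()
                 for (a0, a1), (b0, b1) in zip(groups[:-1], groups[1:])]
        best = int(np.argmin(costs))
        if costs[best] > threshold:
            break
        (a0, _), (_, b1) = groups[best], groups[best + 1]
        groups[best:best + 2] = [(a0, b1)]              # merge the two adjacent groups
    return groups
```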

  27. Algorithm Overview • Train a category-independent important person/object detector [pipeline figure: training video vs. test videos] [Discovering Important People and Objects for Egocentric Video Summarization. Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012]

  28. Discovering an Event’s Key People and Objects • Score each frame region using the regressor • Group instances of the same object/person together • Select a pool of high-scoring clusters • Remove clusters with affinity to a higher-I(r) cluster • For each remaining cluster, select the region with the highest importance as its representative (see the sketch below)
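A simplified sketch of the last two steps, assuming the regressor scores and cluster labels are already computed; the affinity-based cluster pruning is reduced here to simply keeping the top-scoring clusters.

```python
import numpy as np

def event_key_regions(scores, cluster_ids, top_k=3):
    """scores: (n,) predicted importance per region; cluster_ids: (n,) cluster label per region."""
    representatives = []
    for c in np.unique(cluster_ids):
        idx = np.where(cluster_ids == c)[0]
        cluster_score = scores[idx].max()                # importance I(r) of the cluster
        best_region = int(idx[scores[idx].argmax()])     # highest-importance instance
        representatives.append((cluster_score, best_region))
    representatives.sort(reverse=True)                   # keep only the top-scoring clusters
    return [region for _, region in representatives[:top_k]]
```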

  29. Generating a Storyboard Summary • Each event can display a different number of frames, depending on how many unique important things the method discovers

  30. Results Important region prediction accuracy

  31. Results Important region prediction accuracy

  32. Results • Which cues matter most for predicting importance? • Top 28 features with highest learned weights • Low scores for: • the interaction and frequency pair • object-like regions that are frequent

  33. Results Egocentric video summarization accuracy

  34. Results User studies to evaluate summaries • Let the camera wearer answer 2 quality questions: • Important objects/people captured • Overall summary quality • Better results in ~69% of the summaries

  35. Today’s talk • Motivation • Related Work • Object driven summarization • Story driven summarization • Results • Future Development

  36. Story Driven Video Summarization • A good summary captures the progress of the story! • Segment the video temporally into subshots • Select a chain of k subshots that maximizes both the weakest link’s influence and object importance • Each subshot "leads to" the next through some subset of influential objects [Story-Driven Summarization for Egocentric Video. Zheng Lu and Kristen Grauman, CVPR 2013]

  37. Document-Document Influence [Shahaf & Guestrin, KDD 2010] • Connecting the Dots Between News Articles. D. Shahaf and C. Guestrin. In KDD, 2010.

  38. Egocentric Subshot Detection • Define 3 generic ego-activities: • Static • In transit • Head moving • Train classifiers to predict these activity types • Features based on blur and optical flow • Classify using an SVM classifier (see the sketch below)
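A loose sketch of such a classifier: per-frame blur (Laplacian variance) and dense optical-flow statistics, pooled per subshot and fed to an SVM. The specific descriptors and pooling used here are assumptions, not the paper's exact features.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def frame_features(prev_gray, gray):
    """Blur and optical-flow magnitude statistics for one pair of consecutive gray frames."""
    blur = cv2.Laplacian(gray, cv2.CV_64F).var()        # low variance suggests motion blur
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    return np.array([blur, mag.mean(), mag.std()])

def train_ego_activity_classifier(X, y):
    """X: one pooled feature vector per subshot; y: labels such as static / transit / head-moving."""
    clf = SVC(kernel="rbf")
    clf.fit(X, y)
    return clf
```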

  39. Temporal Subshot Segmentation • Tailored to egocentric video: detects ego-activities • Provides an over-segmentation: a typical subshot lasts ~15 sec

  40. Subshot Selection Objective • Given a series of subshots segmented from the input video, our goal is to select the optimal K-node chain of subshots

  41. Story Progress Between Subshots • A good story – a coherent chain of subshots, where each strongly influences the next one
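A simplified sketch of the "weakest link" part of this chain objective: pick K temporally ordered subshots so that the minimum influence between consecutive picks is as large as possible, via dynamic programming. The paper's full objective also weighs object importance and diversity, which this sketch omits.

```python
import numpy as np

def best_chain(influence, K):
    """influence: (n, n) matrix with influence[i, j] defined for i < j.
    Returns the indices of a K-node chain maximizing the weakest consecutive link."""
    n = influence.shape[0]
    # score[k, j] = best achievable weakest link for a chain of length k+1 ending at j
    score = np.full((K, n), -np.inf)
    parent = np.full((K, n), -1, dtype=int)
    score[0, :] = np.inf                                # a single node has no link yet
    for k in range(1, K):
        for j in range(n):
            for i in range(j):                          # enforce temporal order i < j
                cand = min(score[k - 1, i], influence[i, j])
                if cand > score[k, j]:
                    score[k, j], parent[k, j] = cand, i
    end = int(np.argmax(score[K - 1]))
    chain = [end]
    for k in range(K - 1, 0, -1):                       # backtrack the chosen chain
        end = int(parent[k, end])
        chain.append(end)
    return chain[::-1]
```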

  42. Predicting Influence Between Subshots [figure: example influence weights between subshots]

  43. Predicting Influence Between Subshots • Sink node: captures how reachable subshot j is from subshot i, via object o (see the sketch below)
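A very loose sketch of this reachability idea (adapted from the Shahaf & Guestrin influence measure): on a graph over subshot and object nodes, compare how much random-walk probability starting at subshot i is absorbed at the sink subshot j with and without the object node o. The graph construction, walk length, and renormalization here are assumptions, not the paper's exact formulation.

```python
import numpy as np

def reach_probability(P, i, j, steps=20):
    """P: row-stochastic transition matrix over subshot+object nodes; j is made absorbing."""
    P = P.copy()
    P[j, :] = 0.0
    P[j, j] = 1.0                                       # sink node: walks stop once they hit j
    p = np.zeros(P.shape[0])
    p[i] = 1.0
    for _ in range(steps):
        p = p @ P
    return p[j]

def influence(P, i, j, obj_node, steps=20):
    """Drop in reachability of j from i when the object node obj_node is removed."""
    P_wo = P.copy()
    P_wo[:, obj_node] = 0.0                             # remove edges into the object node
    row_sums = P_wo.sum(axis=1, keepdims=True)
    P_wo = np.divide(P_wo, row_sums, out=np.zeros_like(P_wo), where=row_sums > 0)
    return reach_probability(P, i, j, steps) - reach_probability(P_wo, i, j, steps)
```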

  44. Subshot Selection Objective • Given a series of subshots segmented from the input video, our goal is to select the optimal K-node chain of subshots

  45. Predicting Diversity Among Transitions • Compute GIST and color histograms for each frame in each subshot, and quantize them into 55 scene types • Compute a diversity score for each pair of adjacent subshots in the chain (see the sketch below)
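A sketch of this diversity term, assuming per-frame GIST + color descriptors are already extracted: quantize frames into scene types with k-means (55 clusters, following the slide) and score adjacent subshots by one minus the histogram intersection of their scene-type distributions. The distance choice is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def scene_types(frame_descriptors, n_types=55, seed=0):
    """frame_descriptors: (n_frames, d) concatenated GIST + color histogram features."""
    km = KMeans(n_clusters=n_types, n_init=10, random_state=seed)
    return km.fit_predict(frame_descriptors)

def subshot_scene_hist(types, n_types=55):
    """Normalized scene-type histogram for the frames of one subshot."""
    hist = np.bincount(types, minlength=n_types).astype(float)
    return hist / hist.sum()

def diversity(types_a, types_b, n_types=55):
    """Diversity between two adjacent subshots: 1 - histogram intersection of scene types."""
    ha = subshot_scene_hist(types_a, n_types)
    hb = subshot_scene_hist(types_b, n_types)
    return 1.0 - np.minimum(ha, hb).sum()
```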

  46. Coherent Object Activation Patterns • Prefer activating few objects at once, and coherent (smooth) entrance/exit patterns • Solve with linear programming and a priority queue [figure: story-driven vs. uniform-sampling activation patterns]

  47. Today’s talk • Motivation • Related Work • Object driven summarization • Story driven summarization • Results • Future Development

  48. Results • ADL dataset: 20 videos, each 20-60 minutes, daily activities in a house • UTE dataset: 4 videos, each 3-5 hours long, uncontrolled setting

  49. Results • Baselines: • Uniform sampling of K subshots • Shortest path: K subshots with minimal bag-of-objects distance between each other • Object driven (only for the UTE set) • Parameters: • K = 4...8 • Simultaneously active objects: 80 (UTE), 15 (ADL)

  50. Results • Test methodology • 34 human subjects, ages 18-60 • 12 hours of original video • Each comparison done by 5 subjects • Total 535 tasks, 45 hours of subject time • Probably the most comprehensive egocentric summarization test ever established!
