
Nonchronological Video Synopsis and Indexing


Presentation Transcript


  1. Nonchronological Video Synopsis and Indexing TPAMI 2008 Yael Pritch, Alex Rav-Acha, and Shmuel Peleg, Member, IEEE

  2. Outline • Introduction • Related Work on Video Abstraction • Synopsis by Energy Minimization • Object-based Synopsis • Synopsis of Endless Video • Limitations and Failures • Demo

  3. Introduction • Video browsing and retrieval are time consuming; most captured video is never watched or examined. • Video synopsis provides a short video representation while preserving the essential activities of the original video. • The activity in the video is condensed into a shorter period by showing multiple activities simultaneously, even when they originally occurred at different times. • The synopsis video also serves as an index into the original video by pointing to the original time of each activity.

  4. Introduction • The properties of video synopsis: • The video synopsis should be shorter than the original video. • The video synopsis is also a video, expressing the dynamics of the scene. • Reduce as much spatiotemporal redundancy as possible; the relative timing between activities may change. • Visible seams and fragmented objects should be avoided.

  5. Introduction • Video synopsis can make surveillance cameras and webcams more useful by giving the viewer summaries. • The synopsis server analyzes the live video feed for interesting events and records an object-based description of the video. • The description includes duration, location, and appearance. • For example: • Event: airplane takeoff • Object: the airplane • Description: takeoff time, airport, appearance of the airplane • In a 3D space-time description of the video, an object is represented by a “tube”.
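As a concrete illustration of such a record, here is a minimal sketch of an object-based description as a Python data class; the field names and example values are assumptions made for this sketch, not the paper's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectDescription:
    """Object-based description recorded by the synopsis server (sketch)."""
    start_time: float                 # when the activity begins (seconds)
    end_time: float                   # when it ends (duration = end - start)
    # Per-frame bounding boxes (x, y, w, h): the object's location over time.
    locations: List[Tuple[int, int, int, int]] = field(default_factory=list)
    # Pointer to the stored appearance data (e.g. cropped object pixels).
    appearance_ref: str = ""

# Example from the slide: an airplane taking off (values are made up).
takeoff = ObjectDescription(start_time=120.0, end_time=150.0,
                            locations=[(40, 300, 200, 80)],
                            appearance_ref="airport_cam/obj_0042")
```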

  6. Introduction (panels: Input video / Video synopsis) Fig. 2. Basic temporal rearrangement of objects. Objects of interest are defined and viewed as tubes in the space-time volume. Two objects recorded at different times are shifted to the same time interval in the shorter video synopsis. A single object moving during a long time is broken into segments having a shorter duration, and those segments are shifted in time and played simultaneously. Intersection of objects does not disturb the synopsis when object tubes are broken into segments. Fig. 1. The input video shows a walking person and, after a period of inactivity, displays a flying bird. A compact video synopsis can be produced by playing the bird and the person simultaneously.

  7. Related Work on Video Abstraction • Fast forwarding • Individual frames or groups of frames are skipped in fixed or adaptive intervals. • Simple but only complete frames can be removed. • Video condensation ratio is relatively low. • Video summarization • Key frames are extracted and usually presented simultaneously as a storyline. • Loses the dynamics of the original video

  8. Related Work on Video Abstraction • Video montage • Spatial and temporal shifts are applied to objects to create a video summary. • This paper uses only temporal transformations, keeping spatial locations intact. Fig. 3. Comparison between “video montage” and our approach. (a) A frame from a “video montage.” Two space-time regions were shifted in both time and space and then stitched together. Visual seams between the different regions are unavoidable. (b) A frame from a “video synopsis.” Only temporal shifts were applied, enabling seamless stitching.

  9. Synopsis by Energy Minimization • Any synopsis pixel S(x,y,t) can come from an input pixel I(x,y,M(x,y,t)). The time shift M is obtained by minimizing the following cost function: • Ea(M) : activity cost, indicating the loss in activity. • Activity measure: difference from the background. • Ed(M) : discontinuity cost, the sum of color differences across seams between spatiotemporal neighbors. • ei : the six unit vectors representing the six spatiotemporal neighbors: four spatial and two temporal.
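The cost function itself did not survive the transcript. Below is a hedged reconstruction from the definitions above, using the paper's general notation: χ(x,y,t) is the per-pixel activity (difference from the background), α is a user weight balancing activity loss against discontinuity, and ei are the six spatiotemporal unit vectors; the exact normalization may differ from the paper.

```latex
E(M)   = E_a(M) + \alpha \, E_d(M)
E_a(M) = \sum_{(x,y,t)\in I} \chi(x,y,t) \;-\; \sum_{(x,y,t)\in S} \chi\bigl(x,y,M(x,y,t)\bigr)
E_d(M) = \sum_{(x,y,t)\in S} \sum_{i} \bigl\| S\bigl((x,y,t)+e_i\bigr) - I\bigl((x,y,M(x,y,t))+e_i\bigr) \bigr\|^{2}
```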

  10. Synopsis by Energy Minimization Fig. 4 (a) The shorter video synopsis S is generated from the input video I by including most active pixels together with their spatiotemporal neighborhood. To assure smoothness, when pixel A in S corresponds to pixel B in I, their “cross border” neighbors in space as well as in time should be similar.

  11. Synopsis by Energy Minimization • A seam exists between two neighboring locations (x1,y1) and (x2,y2) in S if M (x1,y1) != M (x2,y2) . • ei : four unit vectors describing the four spatial neighbors. • K : # of frames in the output. • N : # of frames in the input. Fig. 4. (b) An approximate solution can be obtained by restricting consecutive synopsis pixels to come from consecutive input pixels.
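Under this restriction, the mapping collapses to a single time shift per spatial location. A hedged sketch of the reduced form, with M(x,y) the per-location shift (assumed notation):

```latex
S(x,y,t) = I\bigl(x,\, y,\; t + M(x,y)\bigr), \qquad 1 \le t \le K, \quad 1 \le t + M(x,y) \le N
```

A seam cost is then paid wherever spatially neighboring locations take different shifts, as stated above.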

  12. Object-based Synopsis • The low-level approach for video synopsis described earlier is limited to satisfying local properties such as avoiding visible seams. • Higher level object-based properties can be incorporated when objects can be detected and tracked. • To enable segmentation of moving foreground objects, we start with background construction. • For short video clips, a temporal median over the entire clip is used. • For surveillance cameras, a temporal median over a few minutes before and after each frame is used.
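The following is a minimal sketch of temporal-median background construction with NumPy; the array shapes, window length, and function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def median_background(frames: np.ndarray) -> np.ndarray:
    """Estimate a static background as the per-pixel temporal median.

    frames: array of shape (T, H, W, 3), e.g. all frames of a short clip.
    """
    return np.median(frames, axis=0).astype(frames.dtype)

def sliding_background(frames: np.ndarray, t: int, half_window: int = 1500):
    """Background for frame t of a long stream: median over a window of
    frames before and after t (a few minutes of video)."""
    lo, hi = max(0, t - half_window), min(len(frames), t + half_window)
    return median_background(frames[lo:hi])
```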

  13. Object-based Synopsis Fig. 6. Background images from a surveillance camera at Stuttgart airport. The bottom images are at night, while the top images are in daylight. Parked cars and parked airplanes become part of the background.

  14. Object-based Synopsis [32] J. Sun, W. Zhang, X. Tang, and H. Shum, “Background Cut,” Proc. Ninth European Conf. Computer Vision, pp. 628-641, 2006. • A simplification of [32] is used to compute the space-time tubes representing dynamic objects. • High quality and real time. • [32] assumes a single video sequence with a moving foreground object and a stationary background, and uses background subtraction together with color and contrast cues to extract the foreground accurately and efficiently. • Each tube b is represented by its characteristic function (see the reconstruction below). • tb : the time interval in which this object exists.
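The characteristic function referenced above is missing from the transcript. Assuming the paper's notation, it is the per-pixel difference from the background B inside the tube's support:

```latex
\chi_b(x,y,t) =
\begin{cases}
\bigl\| I(x,y,t) - B(x,y,t) \bigr\| & (x,y,t) \in t_b \\
0 & \text{otherwise}
\end{cases}
```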

  15. Object-based Synopsis [32] J. Sun, W. Zhang, X. Tang, and H. Shum, “Background Cut,” Proc. Ninth European Conf. Computer Vision, pp. 628-641, 2006. Fig. 7. Four extracted tubes shown “flattened” over the corresponding backgrounds from Fig. 6. The left tubes correspond to ground vehicles, while the right tubes correspond to airplanes on the runway at the back.

  16. Object-based Synopsis • Create a synopsis having maximum activity while avoiding collisions between objects. • The optimal synopsis video is defined as the one that minimizes the following energy function:
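The formula itself did not make it into the transcript. A hedged reconstruction, built from the cost terms defined on the next two slides (M maps each tube b to a time-shifted tube b̂; α and β are user weights):

```latex
E(M) = \sum_{b \in B} E_a(\hat b)
     + \sum_{b, b' \in B} \Bigl( \alpha \, E_t(\hat b, \hat b') + \beta \, E_c(\hat b, \hat b') \Bigr)
```

Here Ea is the activity cost, Ec the collision cost, and Et the temporal consistency cost.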

  17. Object-based Synopsis • Activity cost : penalizes objects that are not mapped to a valid time in the synopsis. • Collision cost : for every two shifted tubes, defined as the volume of their space-time overlap weighted by their activity measures. • Reducing the weight of the collision cost results in a denser video where objects may overlap. • Increasing this weight results in a sparser video where objects do not overlap and less activity is presented.
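A hedged sketch of the collision cost described above, i.e., the space-time overlap of two shifted tubes weighted by their activity measures:

```latex
E_c(\hat b, \hat b') = \sum_{(x,y,t) \,\in\, \hat t_b \cap \hat t_{b'}} \chi_{\hat b}(x,y,t) \; \chi_{\hat b'}(x,y,t)
```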

  18. Object-based Synopsis • Temporal consistency cost : preserves the chronological order of events. • The amount of interaction d(b, b’) between each pair of tubes is estimated from their relative spatiotemporal distance. • d(b,b’,t) : the Euclidean distance between the two tubes at time t. • σspace : the extent of the space interaction between tubes. • If tubes b and b’ do not share a common time in the synopsis video, their interaction depends on their temporal distance instead. • σtime : the extent of time in which events still have temporal interaction.
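The expressions for the interaction d(b, b') and for the consistency cost were lost in the transcript. The sketch below only restates the structure described above (exponential fall-off governed by σspace for tubes that overlap in time, and by σtime otherwise); it should not be read as the paper's exact formula.

```latex
d(b,b') =
\begin{cases}
\exp\Bigl( -\min_{t \in \hat t_b \cap \hat t_{b'}} d(b,b',t) \,/\, \sigma_{\text{space}} \Bigr) & \hat t_b \cap \hat t_{b'} \neq \emptyset \\
\exp\bigl( -\,\Delta t(b,b') \,/\, \sigma_{\text{time}} \bigr) & \text{otherwise}
\end{cases}
```

The temporal consistency cost Et then penalizes, in proportion to d(b,b'), pairs of tubes whose chronological order is reversed in the synopsis.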

  19. Object-based Synopsis • Energy minimization • A simple greedy optimization is used. • The optimization is applied in the space of all possible temporal mappings M. • As the initial state, all tubes are shifted to the beginning of the synopsis video. • To accelerate computation, the temporal shifts of tubes are restricted to jumps of 10 frames.
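A minimal sketch of such a greedy optimization; the tube representation, the energy callable, and the iteration budget are simplifications assumed for this sketch.

```python
import random

def greedy_synopsis(tubes, synopsis_len, energy, step=10, iters=1000):
    """Greedy minimization over temporal mappings M (a sketch).

    tubes:        list of tube identifiers.
    synopsis_len: number of frames in the output synopsis.
    energy:       callable mapping {tube: start_frame} -> float, combining
                  the activity, collision, and temporal-consistency costs.
    step:         temporal shifts are restricted to jumps of `step` frames.
    """
    # Initial state: all tubes are shifted to the beginning of the synopsis.
    mapping = {b: 0 for b in tubes}
    best = energy(mapping)
    for _ in range(iters):
        b = random.choice(tubes)
        old = mapping[b]
        candidate = old + random.choice((-step, step))
        if not 0 <= candidate < synopsis_len:
            continue                      # shift would leave the synopsis
        mapping[b] = candidate
        new = energy(mapping)
        if new < best:
            best = new                    # keep the improving move
        else:
            mapping[b] = old              # revert the move
    return mapping, best
```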

  20. Object-based Synopsis • Stroboscopic panoramic synopsis • When long tubes exist in the input video and the duration of the synopsis video is bounded, a long tube is broken into shorter segments that are shifted in time and played simultaneously, giving a stroboscopic effect.

  21. Object-based Synopsis • Surveillance application Fig. 12. Video synopsis from street surveillance. (a) A typical frame from the original video (22 seconds). (b) A frame from a video synopsis movie (2 seconds) showing condensed activity. (c) A frame from a shorter video synopsis (0.7 seconds) showing even more condensed activity.

  22. Synopsis of Endless Video • To make webcams more useful, a system based on object-based synopsis is built that can deal with endless videos. • Query to the system • For example: “I would like to watch in one minute a synopsis of the video from this camera captured during the last hour.” • Response to the query • The most interesting events (tubes) are collected from the desired period and are assembled into a synopsis video of the desired length. • The synopsis video is an index into the original video, as each object includes a pointer to its original time.

  23. Synopsis of Endless Video

  24. Synopsis of Endless Video • Removing stationary frames • Frames with no activity are filtered out during the online phase. • Frames are recorded according to two criteria: • A global change in the scene, measured by the SSD between the incoming frame and the last kept frame. This also captures gradual lighting changes. • The existence of a moving object, measured by the maximal SSD in small windows. • Assuming that moving objects with a very short duration (e.g., less than a second) are not important, video activity is measured only once every 10 frames. A sketch of this filter appears below.
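A minimal sketch of this online frame filter; the thresholds and the window size are illustrative assumptions.

```python
import numpy as np

def should_keep(frame, last_kept, global_thresh=15.0, local_thresh=40.0, win=16):
    """Decide whether an incoming frame is recorded.

    Criterion 1: a global change in the scene (mean squared difference
    against the last kept frame), which also catches lighting changes.
    Criterion 2: a moving object, detected as the maximal local difference
    over small windows of the difference image.
    """
    diff = (frame.astype(np.float32) - last_kept.astype(np.float32)) ** 2
    if diff.mean() > global_thresh:                      # global scene change
        return True
    h, w = diff.shape[:2]
    for y in range(0, h - win + 1, win):                 # local moving object
        for x in range(0, w - win + 1, win):
            if diff[y:y + win, x:x + win].mean() > local_thresh:
                return True
    return False
```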

  25. Synopsis of Endless Video • The object queue • The main challenge is handling endless videos. • The naive scheme of throwing out the oldest activity is not good enough. • Instead, the importance of each object to possible future queries is estimated, and objects are discarded accordingly (a sketch of such an eviction policy follows the figure). • Importance (activity) • Collision potential (spatial activity distribution) • Age • Other options, such as letting the user specify which activity is of interest. Fig. 14. The spatial distribution of activity in the airport scene (intensity is the log of the activity value). The activity distribution of a single tube is on the left and the average over all tubes is on the right. As expected, the highest activity is on the car lanes and on the runway. The potential for collision of tubes is higher in regions having higher activity.
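A minimal sketch of the importance-based eviction mentioned above; the tube fields and the weighting are assumptions of this sketch, not the paper's exact criterion.

```python
def evict_least_important(queue, activity_w=1.0, collision_w=1.0, age_w=0.1):
    """Remove the tube least useful for future queries from the object queue.

    Each tube is a dict with (illustrative, assumed) fields:
      'activity'  - importance measure of the tube,
      'collision' - potential to collide with other tubes (from the spatial
                    activity distribution of Fig. 14),
      'age'       - how long ago the tube was observed.
    Tubes with low activity, high collision potential, and old age score
    lowest and are discarded first; the weights are free parameters.
    """
    def score(tube):
        return (activity_w * tube['activity']
                - collision_w * tube['collision']
                - age_w * tube['age'])
    victim = min(queue, key=score)
    queue.remove(victim)
    return victim
```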

  26. Synopsis of Endless Video • Synopsis generation • Generate a background video. • A consistency cost is computed for each object and for each possible time in the synopsis. • An energy minimization determines which tubes appear in the synopsis and at what time. • The selected tubes are combined with the background.

  27. Synopsis of Endless Video • Time-lapse background • Represents the background changes over time. • Represents the background of the activity tubes. • Constructed from two temporal histograms: • A temporal activity histogram Ha of the video stream. • A uniform temporal histogram Ht of the video stream. • A third histogram is computed by interpolating the two histograms (see below).
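A hedged sketch of the interpolation: the sampling histogram is a blend of the activity histogram Ha and the uniform histogram Ht, with the interpolation weight λ assumed here to be a user parameter.

```latex
H(t) = \lambda \, H_a(t) + (1 - \lambda) \, H_t(t)
```

Background frames for the time-lapse background are then sampled so that their temporal density follows H.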

  28. Synopsis of Endless Video • Consistency with background • Prefer to stitch tubes to background images having a similar appearance. • Final energy function:
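The formula itself is missing from the transcript. A hedged reconstruction that adds a background-consistency term Es (the appearance difference between a tube and the time-lapse background it is stitched onto) to the object-based energy given earlier, with γ an additional user weight; the exact form in the paper may differ.

```latex
E(M) = \sum_{b \in B} \Bigl( E_a(\hat b) + \gamma \, E_s(\hat b) \Bigr)
     + \sum_{b, b' \in B} \Bigl( \alpha \, E_t(\hat b, \hat b') + \beta \, E_c(\hat b, \hat b') \Bigr)
```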

  29. Synopsis of Endless Video • Stitching the synopsis video • A modification of Poisson editing [20] is used to deal with objects coming from different lighting conditions. • Overlapping tubes are blended together by letting each pixel be a weighted average of the corresponding pixels from the stitched activity tubes, with weights proportional to the activity measures. [20] M. Gangnet, P. Perez, and A. Blake, “Poisson Image Editing,” Proc. ACM SIGGRAPH ’03, pp. 313-318, July 2003.
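A minimal sketch of the activity-weighted blending of overlapping tubes at a single pixel; the array shapes are assumptions, and the Poisson-editing modification [20] is not reproduced here.

```python
import numpy as np

def blend_tubes(tube_pixels, tube_activities, background_pixel):
    """Blend overlapping tube pixels at one location.

    tube_pixels:     list of RGB values, one per tube covering this pixel.
    tube_activities: matching activity measures, used as blend weights.
    Falls back to the background when no tube covers the pixel.
    """
    if not tube_pixels:
        return background_pixel
    weights = np.asarray(tube_activities, dtype=np.float32)
    weights = weights / max(weights.sum(), 1e-6)
    colors = np.asarray(tube_pixels, dtype=np.float32)
    # Weighted average, weights proportional to the activity measures.
    return (weights[:, None] * colors).sum(axis=0)
```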

  30. Limitations and failures • Video synopsis is less applicable in several cases, some of which are listed below: • Video with already dense activity. All locations are active all the time. An example is a camera in a busy train station. • Edited video, like a feature movie. The intentions of the movie creator may be destroyed by changing the chronological order of events.

  31. Demo • http://www.vision.huji.ac.il/video-synopsis/
