Video Scene Segmentation Via Continuous Video Coherence

Presentation Transcript

    Shufang Wu http://www.sfu.ca/~vswu vswu@cs.sfu.ca Wednesday, July 3, 2002

    3. Overview (2-1)

    4. Overview (2-2): A method for measuring scene boundaries, by calculating a short-term-memory-based model of shot-to-shot “coherence”. A one-pass, on-the-fly shot clustering algorithm. Application of the above to the “theme”, the next higher level of video structure.

    6. Prior Work (3-1): Frame dissimilarity: the normalized color histogram difference is adopted as the measure of dissimilarity, or distance, between frames. Shot dissimilarity: the minimum dissimilarity between any two frames of the two shots.
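
The two dissimilarity definitions on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not say which histogram difference is used, so the half-L1 distance between sum-normalized histograms is an assumption, and the function names are hypothetical. A frame is represented here simply as a list of color-bin counts.

```python
from itertools import product


def normalize(hist):
    """Scale a histogram so its bins sum to 1 (all-zero input is returned as-is)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def frame_dissimilarity(hist_a, hist_b):
    """Normalized color histogram difference in [0, 1] (assumed half-L1 form)."""
    a, b = normalize(hist_a), normalize(hist_b)
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def shot_dissimilarity(shot_a, shot_b):
    """Minimum frame dissimilarity over all pairs of frames from the two shots."""
    return min(frame_dissimilarity(f, g) for f, g in product(shot_a, shot_b))
```

Taking the minimum means two shots count as similar as soon as any one frame of the first resembles any one frame of the second, at the cost of comparing every frame pair.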

    7. Prior Work (3-2): Scene detection (a discrete, graph-based algorithm): shots are clustered into sets (probable single camera positions) via a cluster similarity threshold d. The clusters form a Scene Transition Graph whose edges are forward temporal transitions. A scene boundary is defined to be where the graph is particularly thin, that is, at a cut edge. Problem: scenes are often reused, so the graph becomes cyclic. Improvement (imposing a constraint on clustering): if shots are temporally too far apart, never merge them; the temporal threshold is T.
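
The temporal constraint described on this slide can be illustrated with a small sketch. It is a simplified greedy, single-link variant written for illustration only, not the procedure of the prior work itself. The assumptions are that d is applied as an upper bound on dissimilarity, that T is measured in shot indices, and that the names `may_merge` and `time_constrained_clusters` are hypothetical.

```python
def may_merge(index_a, index_b, dissimilarity, d, T):
    """Two shots may share a cluster only if similar enough AND close enough in time."""
    return dissimilarity <= d and abs(index_a - index_b) <= T


def time_constrained_clusters(n_shots, dissim, d, T):
    """Greedy clustering of shots 0..n_shots-1; dissim(i, j) compares two shots."""
    clusters = []  # each cluster is a list of shot indices in temporal order
    for i in range(n_shots):
        for cluster in clusters:
            if any(may_merge(i, j, dissim(i, j), d, T) for j in cluster):
                cluster.append(i)
                break
        else:
            # No existing cluster accepted the shot: start a new one.
            clusters.append([i])
    return clusters
```

With a small T, a shot that looks like a much earlier one starts a new cluster instead of re-joining the old one, which is what keeps reused settings from merging distant parts of the video.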

    8. Prior Work (3-3): Limitations: The approach is discrete and even binary. The algorithm is sensitive to both the cluster definition (via d) and the scene definition (via T). The algorithm is expensive. The discrete definition of a scene is sensitive to small changes in shot similarity, and cannot represent or accommodate any ambiguity of parse.

    10. Continuous Measure of Coherence (4-1): Addressing the problem: retain the definitions of frame and shot dissimilarity, and replace the discrete algorithm by a continuous measure. Modeling the recall of a single shot: a short-term visual memory buffer of frame perception, which has a limited capacity (buffer size B), preserves the order of the visual stimuli, and loses older frames throughout the buffer at the same rate as new frames are perceived. The model gives the likelihood of a frame remaining in the buffer at time t, and the number of frames of a shot of length T remaining in the buffer when its final frame enters the buffer. Recall between two shots is defined to be proportional to their shot similarity.
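
The formulas for this slide did not survive in the transcript, so the following is only a sketch of one model consistent with the stated buffer properties, not the authors' equations. It assumes that each arriving frame displaces one of the B buffered frames uniformly at random, which gives a survival probability of (1 - 1/B)^t after t further frames, approximately exp(-t/B). All function names are hypothetical.

```python
import math


def survival_probability(t, B):
    """Chance a frame is still buffered t frames after it entered (assumed model)."""
    return (1.0 - 1.0 / B) ** t


def frames_remaining(T, B):
    """Expected frames of a length-T shot still buffered as its final frame enters."""
    # Counting back from the final frame (k = 0), frame k has aged k steps.
    return sum(survival_probability(k, B) for k in range(T))


def frames_remaining_continuous(T, B):
    """Continuous approximation: the integral of exp(-t/B) for t from 0 to T."""
    return B * (1.0 - math.exp(-T / B))
```

Under this assumption a very long shot saturates the buffer: the expected number of remaining frames approaches B no matter how large T becomes, which matches the limited-capacity property on the slide.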

    15. Diameter of a cluster: the maximum dissimilarity between any two shots within it. Shot clustering algorithm: from each existing cluster, Cj, select the shot that is maximally dissimilar from the incoming shot, Si. The cluster whose diameter would enlarge least is chosen (Cadd). If the increase in diameter is less than a threshold, Si is added to Cadd; otherwise a new cluster is begun. Threshold: the mean shot dissimilarity, which works reasonably well and is self-adjusting.
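
The clustering steps on this slide can be sketched as below. This is a minimal reading of the slide, under stated assumptions: `dissim` is any shot dissimilarity function supplied by the caller, the threshold is passed in as a fixed number because the transcript does not say how the mean shot dissimilarity is maintained on the fly, and ties between clusters go to the earliest cluster. The function name `cluster_shots` is hypothetical.

```python
def cluster_shots(shots, dissim, threshold):
    """One-pass clustering; returns clusters as lists of shot indices."""
    clusters, diameters = [], []
    for i in range(len(shots)):
        best, best_growth, best_diam = None, None, None
        for c, members in enumerate(clusters):
            # The member farthest from the incoming shot bounds the new diameter.
            farthest = max(dissim(shots[i], shots[j]) for j in members)
            new_diam = max(diameters[c], farthest)
            growth = new_diam - diameters[c]
            if best is None or growth < best_growth:
                best, best_growth, best_diam = c, growth, new_diam
        if best is not None and best_growth < threshold:
            # Add the shot to the cluster that enlarges least (Cadd).
            clusters[best].append(i)
            diameters[best] = best_diam
        else:
            # Every cluster would grow too much: begin a new cluster.
            clusters.append([i])
            diameters.append(0.0)
    return clusters
```

Each shot is examined once and never reassigned, so the result depends on the order in which shots arrive; that is the price of working on the fly.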

    16. Segmentation Plus Clustering: The coherence and clustering algorithms can be combined to produce output that looks something like a musical score. Example. Typical behavior: a small number of clusters per scene (the median value is 4), and the relative density of their use.

    18. Scene Dissimilarity: A scene is represented by a cluster histogram, as if the shots in a cluster were all the same “color”. Dissimilarity: the sum of absolute differences of the bins after the two cluster histograms are truncated and normalized.
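
A rough sketch of this scene dissimilarity follows. The slide does not spell out what "truncated" means, so reading it as "keep only the first `n_bins` cluster bins" is an assumption, as is normalizing each histogram to sum to 1. A scene is given here as the list of cluster labels of its shots, and the function names are hypothetical.

```python
from collections import Counter


def cluster_histogram(cluster_labels, n_bins):
    """Count a scene's shots per cluster, keep n_bins bins, and normalize."""
    counts = Counter(cluster_labels)
    # Assumed truncation: labels >= n_bins are dropped.
    hist = [counts.get(c, 0) for c in range(n_bins)]
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def scene_dissimilarity(labels_a, labels_b, n_bins):
    """Sum of absolute bin differences between two scenes' cluster histograms."""
    a = cluster_histogram(labels_a, n_bins)
    b = cluster_histogram(labels_b, n_bins)
    return sum(abs(x - y) for x, y in zip(a, b))
```

With normalized histograms the value ranges from 0, for scenes that use the same clusters in the same proportions, to 2, for scenes that share no cluster.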

    19. Thematic Results: A new theme is started if the incoming scene is sufficiently different from all existing ones. Threshold: the mean scene-to-scene dissimilarity plus three standard deviations. Example
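
The thresholding rule on this slide can be sketched as follows. Several details are assumptions rather than anything stated in the transcript: the population standard deviation is used, the distance from a scene to a theme is taken as the distance to the theme's nearest member scene, and a scene joins the nearest theme when it is within the threshold. The function names are hypothetical.

```python
from statistics import mean, pstdev


def theme_threshold(dissimilarities):
    """Mean scene-to-scene dissimilarity plus three standard deviations."""
    return mean(dissimilarities) + 3 * pstdev(dissimilarities)


def assign_themes(scenes, dissim, threshold):
    """Group scenes into themes; returns themes as lists of scene indices."""
    themes = []
    for i, scene in enumerate(scenes):
        best, best_d = None, None
        for t, members in enumerate(themes):
            # Assumed scene-to-theme distance: nearest member scene.
            d = min(dissim(scene, scenes[j]) for j in members)
            if best is None or d < best_d:
                best, best_d = t, d
        if best is None or best_d > threshold:
            # Sufficiently different from every existing theme: start a new one.
            themes.append([i])
        else:
            themes[best].append(i)
    return themes
```

Because the threshold sits three standard deviations above the mean, only scenes that are unusually different from everything seen so far open a new theme.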

    20. References: J. R. Kender and B. L. Yeo, “Video Scene Segmentation Via Continuous Video Coherence,” CVPR '98, pp. 367-373, June 1998. A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound, MIT Press, 1990. M. Yeung and B. L. Yeo, “Time-constrained clustering for segmentation of video into story units,” International Conference on Pattern Recognition (ICPR '96), Vol. C, pp. 375-380, 1996.

