Temporal Video Boundaries-Part One- SNUEE Kim KyungMin
• Why do we need temporal segmentation of videos? • How do we set up boundaries between video frames? • How do we merge two separate but uniform segments?
Abstract • Much work has been done in automatic video analysis. But while techniques like local video segmentation, object detection, and genre classification have been developed, little work has been done on retrieving the overall structural properties of video content.
Abstract(2) • Retrieving the overall structure of video content means splitting the video into meaningful tokens by setting boundaries within the video. => Temporal Video Boundary Segmentation • We classify these boundaries into three categories : micro-, macro-, and mega-boundaries.
Abstract(3) • Our goal is a system for automatic video analysis that should eventually work for applications where complete metadata is unavailable.
Introduction • What’s going on? • Great increase in the quantity of video content. • More demand for content-aware apps. • Still, the majority of video content has insufficient metadata. => More demand for information on temporal video boundaries.
BOUNDARIES : definitions • Micro-boundaries : the shortest observable temporal segments. Usually bounded within a sequence of contiguously shot video frames. (frames under the same micro-boundaries.)
Micro-boundaries are associated with the smallest video units, for which a given attribute is constant or slowly varying. The attribute can be visual, sound, or text. • Depending on the attribute, micro-boundaries can differ.
BOUNDARIES : definitions(2) • Macro-boundaries : boundaries between different parts of the narrative or the segments of a video content. (frames under the same macro-boundaries.)
Macro-boundaries are boundaries between micro-boundaries that are clearly identifiable organic parts of an event defining a structural or thematic unit.
BOUNDARIES : definitions(3) • Mega-Boundaries : a boundary between a program and any non-program material. (frames under different mega-boundaries.)
Mega-Boundaries are boundaries between macro-boundaries which typically exhibit a structural and feature consistency.
BOUNDARIES : FORMAL Definition • A video content contains three types of modalities : visual, audio, and textual, and each modality has three levels : low-, mid-, and high-. • These levels describe the “amount of details” in each modality in terms of granularity and abstraction.
BOUNDARIES : FORMAL Definition(2) • Each modality and level has attributes. An attribute is defined as an attribute vector with the following parameters. • m : denotes the modality (e.g., m = 1, 2, and 3 mean visual, audio, and text, respectively). • Attribute index : denotes the index for the attributes (e.g., m = 1 with attribute index 1 indexes color). • Number of components : denotes the total number of vector components. • Time : time constant (can be expressed in integers or milliseconds).
BOUNDARIES : FORMAL Definition(3) • Once a time interval is defined, the average and the deviation of an attribute throughout the video can be computed over that interval.
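A minimal sketch of these two quantities, under stated assumptions: the average is taken as the component-wise arithmetic mean of the attribute vectors sampled in the interval, and the deviation as the component-wise population standard deviation. The exact formulas on the slide may differ, and the function names and sample data are hypothetical.

```python
from math import sqrt


def attribute_average(samples):
    """Component-wise mean of equal-length attribute vectors (assumed definition)."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(vec[k] for vec in samples) / n for k in range(dim)]


def attribute_deviation(samples):
    """Component-wise population standard deviation (assumed definition)."""
    avg = attribute_average(samples)
    n = len(samples)
    return [
        sqrt(sum((vec[k] - avg[k]) ** 2 for vec in samples) / n)
        for k in range(len(avg))
    ]


# Hypothetical 3-component color attribute sampled at four time steps.
color_samples = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
]
avg = attribute_average(color_samples)
dev = attribute_deviation(color_samples)
```

A component whose deviation stays near zero over the interval is a candidate "constant or slowly varying" attribute in the sense of the boundary definitions.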
BOUNDARIES : FORMAL Definition(4) • By using the vectors defined previously, we now have two different methods to estimate temporal boundaries :
Micro-boundaries • In multimedia, the term “shot” or “take” is widely used. • A similar concept can be used to define the segment between micro-boundaries, which is often called a “family of frames.” • Each segment has a representative frame called a “keyframe.” The keyframe of a family has audio/video data that represents the segment well. But the method for picking the keyframe may vary.
Micro-boundaries(2) • Each family has a “family histogram,” and family histograms eventually form a “superhistogram.” • A family histogram is a data structure that represents the color information of a family of frames. • A superhistogram is a data structure that contains the information about non-contiguous family histograms within the larger video segment.
Micro-boundaries(3) • Generation of family histograms and superhistograms may vary depending on the pre-defined dimensions below. • 1) The amount of memory - No memory means comparing only with the previous frame. • 2) Contiguity of compared families - Determining the time step. • 3) Representation for a family - How we choose the keyframe.
Micro-boundaries : Family of frames • An image histogram is a vector representing the color values and the frequency of their occurrence in the image. • Finding the difference between consecutive histograms and merging similar histograms makes it possible to generate families of frames. • For each frame, we compute its histogram and then search the previously computed family histograms to find the closest match.
Micro-boundaries : Family of frames(2) • There are several ways to compute the histogram difference : L1, L2, bin-wise intersection, and histogram intersection. • Among them, the L1 and bin-wise histogram intersection gave the best results.
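The slide's formulas are not available here, so the following is only a sketch using common textbook definitions of three of these measures. The function names are hypothetical, and the intersection measure is written as a distance (0 means identical) normalized by the first histogram's total count, which is an assumption.

```python
from math import sqrt


def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Euclidean distance between the two bin vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def intersection_distance(h1, h2):
    """1 - (sum of bin-wise minima / total count of h1); assumed normalization."""
    total = sum(h1)
    if total == 0:
        return 0.0 if sum(h2) == 0 else 1.0
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2)) / total


# Two hypothetical 4-bin histograms with the same total count.
hist_a = [4, 2, 0, 2]
hist_b = [2, 2, 2, 2]

d_l1 = l1_distance(hist_a, hist_b)
d_l2 = l2_distance(hist_a, hist_b)
d_int = intersection_distance(hist_a, hist_b)
```

All three return 0 for identical histograms, so any of them can be compared against a merge threshold in the same way.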
Micro-boundaries : boundary detection • If the difference between two family histograms is less than a given threshold, the current histogram is merged into the family histogram. • Each family histogram consists of : • 1) pointers to each of the constituent histograms and frame numbers. • 2) a merged family histogram.
Micro-boundaries : boundary detection(2) • Merging of family histograms basically takes the mean of all histograms in the given video.
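A minimal sketch of such a family histogram, assuming the merged histogram is the unweighted bin-wise mean of its constituent histograms (the original formula may weight frames differently). The class and attribute names are hypothetical.

```python
class FamilyHistogram:
    """Holds constituent histograms, their frame numbers, and the merged mean."""

    def __init__(self, frame_number, histogram):
        self.frame_numbers = [frame_number]
        self.histograms = [list(histogram)]
        self.merged = [float(v) for v in histogram]

    def merge(self, frame_number, histogram):
        """Add a frame and update the merged histogram as a running mean."""
        self.frame_numbers.append(frame_number)
        self.histograms.append(list(histogram))
        n = len(self.histograms)
        # Assumption: every frame contributes with equal weight.
        self.merged = [
            (m * (n - 1) + h) / n for m, h in zip(self.merged, histogram)
        ]


family = FamilyHistogram(0, [4, 0])
family.merge(1, [2, 2])
family.merge(2, [0, 4])
```

Keeping the constituent histograms alongside the merged one mirrors the two parts of a family histogram listed earlier: pointers to the constituents and frame numbers, plus the merged histogram.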
Micro-boundaries : boundary detection(3) • There are multiple ways to compare and merge families, depending on the choice of contiguity and memory : • Contiguous with zero memory • Contiguous with limited memory • Non-contiguous with unlimited memory • Hybrid : first a new frame histogram is compared using the contiguous frames, and then the generated family histograms are merged using the non-contiguous case.
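The first two variants can be sketched as follows. This is one reading of the slides, not the authors' implementation: "zero memory" compares each frame only with the previous frame, while "limited memory" is assumed to compare each frame with the running mean of the current contiguous family. The function names, the L1 comparison, and the sample data are illustrative assumptions.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_boundaries(frame_histograms, threshold, use_family_memory):
    """Return the frame indices at which a new family starts (excluding frame 0)."""
    boundaries = []
    reference = None  # histogram the next frame is compared against
    count = 0         # number of frames merged into the current family
    for index, hist in enumerate(frame_histograms):
        if reference is not None and l1_distance(hist, reference) < threshold:
            if use_family_memory:
                # Assumed "limited memory": running mean of the current family.
                reference = [
                    (r * count + h) / (count + 1)
                    for r, h in zip(reference, hist)
                ]
            else:
                # "Zero memory": remember only the previous frame.
                reference = list(hist)
            count += 1
        else:
            if reference is not None:
                boundaries.append(index)
            reference = list(hist)
            count = 1
    return boundaries


# Hypothetical 2-bin histograms that drift slowly from one color to another.
frames = [[10, 0], [8, 2], [6, 4], [4, 6], [2, 8], [0, 10]]

zero_memory = find_boundaries(frames, threshold=5, use_family_memory=False)
family_memory = find_boundaries(frames, threshold=5, use_family_memory=True)
```

On this slow drift, the zero-memory variant finds no boundary because each frame is close to its predecessor, whereas the family-memory variant splits the sequence once the accumulated family histogram has fallen far enough behind.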
Micro-boundaries : experiments • CNN News Sample. • 27,000 frames • Tested with 9, 30, 90, 300 bins in HSB, 512 bins in RGB • Multiple histogram comparisons: L1, L2, bin-wise intersection and histogram intersection. • Tried on 100 threshold values.
Micro-boundaries : experiments(2) • On the test video clip, the best results were obtained with a threshold of 10, the L1 comparison, the contiguous-with-limited-memory boundary method, and HSB space quantized to 9 bins.
Macro-boundaries • A story is a complete narrative structure, conveying a continuous thought or event. We want micro-segments belonging to the same story to be in the same macro-segment. • Usually we need textual cues (transcripts) for setting such boundaries, but this paper suggests methods that do the job solely with audio and visual cues. • We focus on the observation that stories are characterized by multiple constant or slowly varying multimedia attributes.
Macro-boundaries(2) • Two types of uniform segment detection : unimodal and multimodal. • Unimodal (under the same modality) : when a video segment exhibits the “same” characteristic over a period of time using a single type of modality. • Multimodal : when a video segment exhibits the “same” characteristic over a period of time using more than one type of modality.
Macro-boundaries : single modality segmentation • In case of audio-based segmentation : • 1) Partition a continuous audio stream into non-overlapping segments. • 2) Classify the segments using low-level audio features like bandwidth. • 3) Divide the audio signal into portions of different classes (speech, music, noise, etc.).
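The three steps above can be sketched as a small pipeline. Everything specific here is an assumption for illustration: the fixed window length, the toy classifier (which uses a zero-crossing count as a crude stand-in for real features such as bandwidth), and the class labels.

```python
def partition(samples, window):
    """Step 1: split the stream into non-overlapping fixed-length segments."""
    return [samples[i:i + window] for i in range(0, len(samples), window)]


def zero_crossings(segment):
    """Count sign changes between neighboring samples."""
    return sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))


def toy_classify(segment):
    """Step 2 (hypothetical classifier): label a segment from one crude feature."""
    if all(x == 0 for x in segment):
        return "silence"
    return "noisy" if zero_crossings(segment) > len(segment) // 2 else "tonal"


def segment_audio(samples, window, classify):
    """Step 3: merge adjacent segments of the same class into portions."""
    portions = []
    for i, seg in enumerate(partition(samples, window)):
        label = classify(seg)
        start = i * window
        end = start + len(seg)
        if portions and portions[-1][2] == label:
            portions[-1] = (portions[-1][0], end, label)
        else:
            portions.append((start, end, label))
    return portions


# Hypothetical signal: silence, then a fast alternation, then a slow one.
signal = [0] * 8 + [1, -1] * 4 + [1, 1, -1, -1] * 2
portions = segment_audio(signal, window=4, classify=toy_classify)
```

Passing the classifier as a parameter keeps the partition-and-merge logic independent of whichever audio features and classes are actually used.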
Macro-boundaries : single modality segmentation(2) • In case of text-based segmentation : • 1) If a transcript doesn’t exist, extract text data from the audio stream using speech-to-text conversion. • 2) The transcript is segmented with respect to a predefined topic list. • 3) A frequency-of-word-occurrence metric is used to compare incoming stories with the profiles of manually pre-categorized stories.
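The slides do not specify the frequency-of-word-occurrence metric, so the sketch below assumes a simple one: the fraction of an incoming story's words that also occur in a topic profile, counted with multiplicity. The topic names, profile texts, and function names are hypothetical.

```python
from collections import Counter


def word_frequencies(text):
    """Word-occurrence counts for a lowercase, whitespace-split text."""
    return Counter(text.lower().split())


def overlap_score(story, profile):
    """Fraction of the story's word occurrences shared with the profile (assumed metric)."""
    shared = sum(min(count, profile[word]) for word, count in story.items())
    total = sum(story.values())
    return shared / total if total else 0.0


def categorize(story_text, profiles):
    """Return the topic whose profile best matches the incoming story."""
    story = word_frequencies(story_text)
    return max(profiles, key=lambda topic: overlap_score(story, profiles[topic]))


# Hypothetical profiles built from manually pre-categorized stories.
profiles = {
    "weather": word_frequencies("rain storm wind rain forecast"),
    "finance": word_frequencies("stocks market shares market trade"),
}

topic = categorize("storm brings heavy rain and wind", profiles)
weather_score = overlap_score(
    word_frequencies("storm brings heavy rain and wind"), profiles["weather"]
)
```

A real system would likely drop stop words and normalize word forms before counting; that is left out to keep the sketch short.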
Macro-boundaries : multimodal segments • What we want to do : retrieve better segmentation results by using the results from various unimodal segmentations. • What we need to do : first the pre-merging steps, and then the descent steps.
Macro-boundaries : multimodal segments(2) • Pre-merging Steps : detect micro-segments that exhibit uniform properties, and determine attribute templates for further segmentation. • Uniform segment detection • Intra-modal segment clustering • Attribute template determination (attribute template : a combination of numbers that characterize the attribute) • Dominant attribute determination • Template application
Macro-boundaries : multimodal segments(3) • Descent Methods : combine multimedia segments across multiple modalities. Each attribute, with its segments of uniform values, is associated with a line.
Macro-boundaries : multimodal segments(4) • The single descent method describes the process of generating story segments by combining these segments. • Single descent with intersecting union • Single descent with intersection • Single descent with secondary attribute • Single descent with conditional union
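One possible reading of "single descent with intersection" is sketched below: each attribute line is a sorted list of (start, end) uniform segments, and descending through the lines keeps only the time spans that are uniform in every attribute. This interpretation, the function names, and the sample segments are assumptions, and the other three variants are not covered.

```python
def intersect(segments_a, segments_b):
    """Intersect two sorted lists of non-overlapping (start, end) segments."""
    result = []
    i = j = 0
    while i < len(segments_a) and j < len(segments_b):
        start = max(segments_a[i][0], segments_b[j][0])
        end = min(segments_a[i][1], segments_b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever segment finishes first.
        if segments_a[i][1] < segments_b[j][1]:
            i += 1
        else:
            j += 1
    return result


def descend_with_intersection(attribute_lines):
    """Walk down the attribute lines, intersecting segments at each step."""
    current = attribute_lines[0]
    for line in attribute_lines[1:]:
        current = intersect(current, line)
    return current


# Hypothetical uniform segments (in frames) for three attribute lines.
text_line = [(0, 50), (60, 100)]
audio_line = [(10, 70), (80, 120)]
visual_line = [(0, 40), (45, 95)]

story_segments = descend_with_intersection([text_line, audio_line, visual_line])
```

Placing the dominant attribute first in the list makes it the starting line of the descent; with pure intersection the order does not change the result, but it would matter for the union-based variants.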
Macro-boundaries : experiments • Single descent process with conditional union. • Used the text transcript as the dominant attribute. • - uniform visual/audio segments • - uniform audio segments • There is a visible lag between the beginning of a story and the production of the transcript.