7. Video databases

Video data representations
• Video = time-ordered sequence of correlated images (frames).
• Video signal representations originate from TV technology; different standards in the USA (NTSC) and Europe (PAL, SECAM).
• 25-30 frames/sec.
• Interlaced presentation of even/odd rows to avoid flickering.
• Frame size levels: 352 x 240, 768 x 576 (PAL), 720 x 576 (CCIR 601), 720 x 480 (NTSC), 1440 x 1152, 1920 x 1080 (HDTV).
• Aspect ratios: 4:3, 16:9 (widescreen).
• Color videos: decomposition into luminance and chrominance.
• Typical sampling rates for SD video: 720 samples per line for luminance, 360 samples per line for each chrominance signal (a quick bitrate calculation follows below).
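To put the sampling figures above into perspective, here is a small back-of-the-envelope calculation (a sketch, assuming 8 bits per sample, 576 active lines and 25 frames/sec for the PAL/CCIR 601 case) showing why compression is indispensable:

```python
# Rough bitrate of uncompressed CCIR 601 (PAL, 4:2:2) video.
# Assumptions: 8 bits/sample, 576 active lines, 25 frames/sec,
# 720 luminance + 2 x 360 chrominance samples per line (as quoted above).
luma_per_frame   = 720 * 576
chroma_per_frame = 2 * 360 * 576
bits_per_frame   = 8 * (luma_per_frame + chroma_per_frame)
print(f"{25 * bits_per_frame / 1e6:.0f} Mbit/s uncompressed")   # ~166 Mbit/s
```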
Video compression
• Not just coding of a sequence of images (Motion-JPEG): subsequent images are correlated (temporal redundancy).
• Motion compensation: blocks (e.g. 8 x 8 pixels) in a frame are predicted by blocks in a previously reconstructed frame (a block-matching sketch follows below).
• Compression artifacts disturbing the human eye may differ from those in still images.
• Different techniques for different application areas (TV, DVD/BD, internet, videoconferencing).
• Important issues:
  • Speed of compression/decompression
  • Robustness (error sensitivity)
• Most of the standards are based on the DCT (Discrete Cosine Transform).
• Typical compression ratios range from 50:1 to 100:1; the decompressed video is almost indistinguishable from the original.
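As an illustration of the motion-compensation idea above, here is a minimal sketch of exhaustive block matching with the sum-of-absolute-differences criterion. It is a generic illustration, not the search strategy of any particular standard (real encoders use much faster, suboptimal searches):

```python
import numpy as np

def best_match(ref, block, top, left, search=8):
    """Exhaustive search in a previously reconstructed frame `ref` for the
    position that best predicts `block` (located at (top, left) in the
    current frame), using the sum of absolute differences (SAD).
    Returns the motion vector (dy, dx) and the SAD of the best match."""
    h, w = block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                sad = np.abs(ref[y:y+h, x:x+w].astype(int) - block.astype(int)).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```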
Standardization of video compression

ISO/IEC MPEG (Moving Pictures Experts Group):
• The standards include both video and audio compression.
• Work started in 1988; main steps:
  • MPEG-1: rates up to 1.5 Mbit/s (VHS quality)
  • MPEG-2: rates up to 10 Mbit/s (digital TV, DVD, HDTV)
  • MPEG-3: planned but dropped (found to be unnecessary)
  • MPEG-4: object-based (separation of objects from the scene, animation, 3D, face modelling, interactivity, etc.)

ITU-T (International Telecommunication Union):
• H.261: low bit rates (e.g. videoconferencing)
• H.262 = MPEG-2
• H.263: low bit rates (improved)
• H.264 = MPEG-4 Part 10, high compression power
Random access from compressed video
• Broadcasting or accessing video from storage: it should be possible to start decoding from (almost) any frame.
• MPEG solution: three kinds of frames:
  • I-frame: coded without temporal correlation (prediction); gives the lowest compression gain.
  • P-frame: motion-compensated prediction from the last (closest) I- or P-frame.
  • B-frame: bidirectional prediction from the previous and/or the next I- or P-frame;
    • highest compression gain
    • gets over sudden changes
    • errors do not propagate.
• GOP = Group Of Pictures = the smallest random-access unit; must be decodable independently (usually starts with an I-frame). A typical fixed pattern is sketched below.
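The frame types are commonly organized into a repeating GOP structure such as IBBPBBPBBPBB. The following sketch assigns types in display order for such a fixed pattern; the parameters are illustrative, and real encoders may choose the structure adaptively:

```python
def gop_pattern(n_frames, gop_size=12, b_between=2):
    """Frame types in display order for a fixed GOP structure:
    an I-frame starts each GOP, anchors (I or P) appear every
    b_between + 1 frames, and the frames in between are B-frames."""
    types = []
    for i in range(n_frames):
        pos = i % gop_size
        if pos == 0:
            types.append('I')
        elif pos % (b_between + 1) == 0:
            types.append('P')
        else:
            types.append('B')
    return types

print(''.join(gop_pattern(24)))   # IBBPBBPBBPBBIBBPBBPBBPBB
```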
Example of frame order in MPEG
[Figure: a sequence of I-, B- and P-frames with forward prediction arrows from the I/P anchors and bidirectional prediction arrows into the B-frames.]
• Two orders of frames:
  • Display order
  • Bitstream order
• Buffering is needed to convert from bitstream order into display order; a small delay is involved (the order difference is sketched below).
• The predictor and the predicted frame need not be adjacent.
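The relation between the two orders can be illustrated as follows: each I- or P-anchor is emitted before the B-frames that precede it in display order, so both reference frames are already available when a B-frame arrives at the decoder. This is a simplified sketch of the classic MPEG-1/2-style reordering, not a complete standard rule set:

```python
def display_to_bitstream(types):
    """Reorder frame indices from display order to bitstream order:
    each anchor (I or P) is emitted before the B-frames that precede it
    in display order, so their references are available at decode time."""
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == 'B':
            pending_b.append(i)        # wait until the next anchor is sent
        else:
            order.append(i)            # send the anchor first ...
            order.extend(pending_b)    # ... then the B-frames that depend on it
            pending_b = []
    order.extend(pending_b)            # any trailing B-frames
    return order

print(display_to_bitstream(list("IBBPBBPBB")))   # [0, 3, 1, 2, 6, 4, 5, 7, 8]
```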
Organizing and querying content of a video database

Questions to be answered:
• Which aspects of videos are likely to be of interest?
• How should these aspects be represented and stored?
• What kinds of query languages are suitable?
• Is the content extraction process manual or automatic?

Possible aspects of interest:
• Animate objects (people, etc.)
• Inanimate objects (houses, cars, etc.)
• Activities and events (walking, driving, etc.)

Properties of objects:
• Frame-dependent: valid in a subset of frames.
• Frame-independent: valid for the video as a whole.
Query types for a video database
(a) Retrieve a complete video by name.
(b) Find frame sequences ('clips', 'shots') containing certain objects or activities.
(c) Find all videos/sequences containing objects/activities with certain properties.
(d) Given a frame sequence, find all objects (of a certain type) occurring in some or all of the frames of the segment.
(e) Given a frame sequence, find all activities (of a certain type) occurring in it.

NOTE: Video is a multimedia tool: images + audio + possible text.
• The audio channel can be extremely important in detecting events.
• Textual components (e.g. subtitles) are invaluable keyword sources.
Indexing of video content
• Content descriptions are usually not built on a frame-by-frame basis, due to the high number of frames.
• Compact representations are needed.
• Concepts:
  • Frame sequence: a contiguous subset of frames (e.g. a 'shot').
  • Well-ordered set of frame sequences: temporal order, no overlaps.
  • Solid set of frame sequences: well-ordered, with non-empty gaps between sequences (a 'scene').
  • Frame sequence association map: for each object and activity, a solid set of frame sequences is attached, showing the frames in which it appears (a small sketch follows below).
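A frame sequence association map can be sketched as a dictionary from object/activity identifiers to well-ordered lists of half-open frame intervals. The helper below merges touching or overlapping occurrences so that the stored set stays solid; the function and key names are illustrative, and occurrences are assumed to arrive in temporal order:

```python
def add_occurrence(assoc_map, key, start, end):
    """Record that `key` (an object or activity) occurs in the half-open
    frame interval [start, end).  Occurrences are assumed to be added in
    temporal order; touching or overlapping intervals are merged so that
    the stored sequences stay well ordered with non-empty gaps in between."""
    seqs = assoc_map.setdefault(key, [])
    if seqs and start <= seqs[-1][1]:
        seqs[-1] = (seqs[-1][0], max(seqs[-1][1], end))   # merge with previous
    else:
        seqs.append((start, end))

assoc = {}
add_occurrence(assoc, "obj. 1", 500, 2000)
add_occurrence(assoc, "obj. 1", 2000, 2500)   # merged -> {"obj. 1": [(500, 2500)]}
```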
Frame segment tree
• Binary tree.
• A special (1-dimensional) case of the spatial clipping approach.
• Leaves represent basic intervals of the frame sequence:
  • Leaves are well ordered, and they cover the whole video.
  • Their endpoints include all endpoints of the object/activity sequences.
• An internal node represents the concatenation of its children.
• The root represents the whole video (a construction sketch follows below).
• Example of objects and activities:
[Figure: occurrence intervals of obj. 1, obj. 2 and act. 1 plotted on a frame-number axis from 0 to 5000.]
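Here is a minimal sketch of building the interval structure of such a tree bottom-up from the sequence endpoints. The endpoint values are taken from the example on the next slide; node numbering and pointer details are omitted:

```python
def build_segment_tree(endpoints):
    """Interval structure of a frame segment tree: leaves are the basic
    half-open intervals between consecutive endpoints; each internal node
    covers the concatenation of its (at most two) children; the root
    covers the whole video."""
    pts = sorted(set(endpoints))
    nodes = [(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]   # leaf level
    levels = [nodes]
    while len(nodes) > 1:                     # pair up nodes level by level
        nodes = [(nodes[i][0], nodes[min(i + 1, len(nodes) - 1)][1])
                 for i in range(0, len(nodes), 2)]
        levels.append(nodes)
    return levels                             # levels[-1][0] is the root interval

levels = build_segment_tree([0, 500, 2000, 2500, 3000, 3500, 4000, 4500, 5000])
# leaf level: [(0, 500), (500, 2000), ..., (4500, 5000)]; root: (0, 5000)
```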
Frame segment tree: example
[Figure: a frame segment tree over frames 0-5000. Node 1 (root) covers 0-5000; nodes 2 and 3 cover 0-3000 and 3000-5000; nodes 4-7 cover 0-2000, 2000-3000, 3000-4000 and 4000-5000; leaf nodes 8-15 cover 0-500, 500-2000, 2000-2500, 2500-3000, 3000-3500, 3500-4000, 4000-4500 and 4500-5000. Object and activity labels (o1, o2, a1) are attached to the nodes where they occur.]
Indexing:
• Obj. 1 → nodes 6, 9, 15
• Obj. 2 → nodes 4, 10, 13, 14
• Act. 1 → nodes 7, 9, 10, 12
Note: the intervals are actually half-open, e.g. [0, 500) = 0..499.
Indexing in the frame segment tree
• For each object and activity record, there is a list of pointers to the nodes of the frame segment tree.
• Objects and activities themselves may be indexed in traditional ways.
• Each node of the frame segment tree points to a linked list of pointers to the objects and activities that appear throughout the whole segment that this node represents (but only partially in the parent segment); this insertion rule is sketched after the list. In the previous example:
  • node 4 → obj. 2
  • node 6 → obj. 1
  • node 7 → act. 1
  • node 9 → obj. 1, act. 1
  • node 10 → obj. 2, act. 1
  • node 12 → act. 1
  • node 13 → obj. 2
  • node 14 → obj. 2
  • node 15 → obj. 1
• This can be generalized to a set of videos (a common frame segment tree, a combined object/activity set, extended pointers).
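The rule "attach an item to a node if the item covers the node's whole segment but only part of the parent's segment" is the standard segment-tree insertion. A sketch, assuming nodes are represented as dictionaries with lo, hi, children and items fields (a hypothetical representation, not from the source):

```python
def insert(node, lo, hi, item):
    """Attach `item` to the highest nodes whose segment lies completely
    inside [lo, hi): the item then appears throughout those segments but
    only partially in their parents' segments."""
    if hi <= node["lo"] or node["hi"] <= lo:
        return                                # disjoint: nothing to do here
    if lo <= node["lo"] and node["hi"] <= hi:
        node["items"].append(item)            # fully covered: record at this node
        return
    for child in node.get("children", []):    # partial overlap: descend
        insert(child, lo, hi, item)
```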
Queries using a frame segment tree
(a) Find segments where a given object/activity occurs: trivial, just follow the pointers.
(b) Find objects occurring between frames s and e (a preorder walk is sketched below): walk the tree in preorder and denote the current node's interval by I.
  • If I ∩ [s, e) = ∅, then this subtree can be skipped.
  • If I ⊆ [s, e), then walk through the whole subtree (including the current node) and report all its objects.
  • Otherwise report the objects and activities of the current node, and continue the search into both subtrees.
(c) Find objects/activities occurring together with object x: scan the segments where x occurs, and report the objects/activities occurring in these segments and in their ancestors.
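Query (b) can be written directly from the three cases above. The sketch below continues the hypothetical node representation used earlier (dictionaries with lo, hi, children and items); `result` is a set:

```python
def objects_between(node, s, e, result):
    """Query (b): collect all objects/activities occurring between frames
    s and e (interpreted as the half-open interval [s, e)),
    walking the frame segment tree in preorder."""
    if e <= node["lo"] or node["hi"] <= s:
        return                                # I ∩ [s, e) is empty: skip subtree
    if s <= node["lo"] and node["hi"] <= e:
        report_subtree(node, result)          # I ⊆ [s, e): take the whole subtree
        return
    result.update(node["items"])              # partial overlap: report and recurse
    for child in node.get("children", []):
        objects_between(child, s, e, result)

def report_subtree(node, result):
    """Report the items of `node` and of all its descendants."""
    result.update(node["items"])
    for child in node.get("children", []):
        report_subtree(child, result)
```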
R-segment tree (RS-tree)
• A special case of the R-tree.
• Two possible implementations:
  (a) 1-dimensional space (the dimension is time).
  (b) 2-dimensional space, where the other dimension is just an enumeration of objects/activities (not a true spatial dimension).
[Figure: bounding rectangles R1, R2 and R3 enclosing the occurrence intervals of obj. 1, obj. 2 and act. 1 on a frame-number axis from 0 to 5000.]
Computer-assisted video analysis

Video segmentation:
• Division of videos into homogeneous sequences.
• Typical segments are so-called shots, filmed without interruption.
• Segmentation = detection of shot boundaries.
• Sharp cuts are easier to detect than gradual transitions (e.g. crossfades).
• Features for automatic segmentation:
  • Similarity of color histograms of subsequent frames: simple and effective, but sensitive to varying illumination (a histogram-based sketch follows below).
  • Edge features: similarity of shapes.
  • Motion vectors: restricted vector lengths within a shot.
  • Corner points: similarity of landmark points between frames.
• The actual segmentation can be based on thresholds for similarity, but machine-learning techniques have also been used widely.
• Higher-level segmentation into scenes, also called story units.
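A minimal sketch of histogram-based cut detection, assuming frames are available as H x W x 3 uint8 arrays; the bin count and threshold are illustrative parameters, not values from the source, and gradual transitions would need a more elaborate test:

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.4):
    """Detect sharp cuts by comparing RGB color histograms of consecutive
    frames.  `frames` is an iterable of H x W x 3 uint8 arrays; a cut is
    reported where the normalized histogram difference exceeds `threshold`."""
    cuts, prev = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogramdd(frame.reshape(-1, 3).astype(float),
                                 bins=(bins,) * 3, range=((0, 256),) * 3)
        hist /= hist.sum()                       # normalize to a distribution
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            cuts.append(i)                       # large change: likely a cut
        prev = hist
    return cuts
```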
Computer-assisted video analysis (cont.)

Keyframes:
• Representative frames within shots, containing the essential elements for retrieval.
• Scene-level segmentation often uses keyframe features and operates e.g. in a top-down or bottom-up manner.

Choosing keyframes:
• A fuzzy task with no definite optimum.
• Can be based on the same features as segmentation.
• Various algorithmic approaches:
  • Sequential comparison (sketched below)
  • Clustering
  • Trajectory-based
  • Decision in the context of object/event detection
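The sequential-comparison approach can be sketched in a few lines: keep the first frame of a shot as a keyframe and start a new one whenever the current frame differs enough from the last chosen keyframe. The feature vectors (e.g. the normalized histograms computed for shot detection) and the threshold are illustrative assumptions:

```python
import numpy as np

def sequential_keyframes(features, threshold=0.3):
    """Sequential comparison: keep the first frame of a shot as a keyframe
    and declare a new one whenever the current frame's feature vector
    differs enough (L1 distance) from the last chosen keyframe's."""
    keys = [0]
    for i in range(1, len(features)):
        diff = np.abs(np.asarray(features[i]) - np.asarray(features[keys[-1]])).sum()
        if diff > threshold:
            keys.append(i)
    return keys
```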
Computer-assisted video analysis (cont.)

Object recognition:
• Keyframe-based recognition extracts the same features as for still images (color, texture, shape), but also objects and motion.
• Motion compensation techniques can be used to determine the frame interval over which an object occurs.

Annotations:
• Allocation of semantic concepts to video segments.
• Means roughly the same as segment classification.
• Machine-learning tools have been attempted.
• Human assistance is usually needed in the final recognition, naming and classification of segments and of the objects detected within them.

Ref: W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank: "A Survey of Visual Content-Based Video Indexing and Retrieval", IEEE Trans. on Systems, Man, and Cybernetics 41(6), Nov. 2011.