
7. Video databases



  1. 7. Video databases
     Video data representations
     • Video = time-ordered sequence of correlated images (frames).
     • Video signal representations originate from TV technology; different standards in the USA (NTSC) and Europe (PAL, SECAM).
     • 25-30 frames/sec.
     • Interlaced presentation of even/odd rows to avoid flickering.
     • Frame size levels: 352 x 240, 768 x 576 (PAL), 720 x 576 (CCIR 601), 720 x 480 (NTSC), 1440 x 1152, 1920 x 1080 (HDTV).
     • Aspect ratios: 4:3, 16:9 (widescreen).
     • Color videos: decomposition into luminance and chrominance.
     • Typical sampling rates for SD video: 720 samples per line for luminance, 360 samples per line for chrominance signals.
     MMDB-7 J. Teuhola 2012
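As a rough illustration (not part of the original slides), the sampling rates above imply the following uncompressed data rate. The sketch assumes 8 bits per sample, two chrominance signals of 360 samples each, 576 lines and 25 frames/sec; the function name and all default parameters are hypothetical.

```python
def raw_bitrate_mbps(luma_per_line=720, chroma_per_line=360, chroma_signals=2,
                     lines=576, fps=25, bits_per_sample=8):
    """Uncompressed bit rate in Mbit/s under the stated (assumed) parameters."""
    # Samples on one line: luminance plus all chrominance signals.
    samples_per_line = luma_per_line + chroma_signals * chroma_per_line
    bits_per_frame = samples_per_line * lines * bits_per_sample
    return bits_per_frame * fps / 1e6


rate = raw_bitrate_mbps()
print(f"raw SD video: {rate:.1f} Mbit/s")
```

Under these assumptions the raw rate is about 166 Mbit/s, which is why compression is treated next.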

  2. Video compression
     • Not just coding of a sequence of images (Motion-JPEG), because subsequent images are correlated (temporal redundancy).
     • Motion compensation: blocks (e.g. 8 x 8 pixels) in a frame are predicted by blocks in a previously reconstructed frame.
     • Compression artifacts disturbing the human eye may be different from those in still images.
     • Different techniques for different application areas (TV, DVD/BD, internet, videoconferencing).
     • Important issues:
       • Speed of compression/decompression
       • Robustness (error sensitivity)
     • Most of the standards are based on DCT (Discrete Cosine Transform).
     • Typical compression ratios from 50:1 to 100:1; the decompressed video is almost indistinguishable from the original.
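The motion-compensation bullet can be illustrated with a minimal block-matching sketch. It assumes grayscale frames stored as lists of rows, an exhaustive search in a small window, and the sum of absolute differences (SAD) as the matching cost; the slides do not prescribe a search strategy, and all names here are hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def best_motion_vector(ref, cur, cx, cy, n=8, search=4):
    """Exhaustively search displacements in [-search, search] and return
    ((dx, dy), cost) for the best-matching block in the reference frame."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(ref, cur, rx, ry, cx, cy, n)
                if cost < best_cost:
                    best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Synthetic 16 x 16 reference frame; the current frame is the reference
# shifted 2 pixels right and 1 pixel down (uncovered pixels set to 0).
ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x - 2] if x >= 2 and y >= 1 else 0 for x in range(16)]
       for y in range(16)]
vector, cost = best_motion_vector(ref, cur, cx=4, cy=4)
```

An encoder would then code only the motion vector and the (here zero) prediction residual instead of the block itself.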

  3. Standardization of video compression
     ISO/IEC MPEG (Moving Picture Experts Group):
     • Standard includes both video and audio compression.
     • Started 1988; steps:
       • MPEG-1: Rates up to 1.5 Mbit/s (VHS quality)
       • MPEG-2: Rates up to 10 Mbit/s (Digi-TV, DVD, HDTV)
       • MPEG-3: Planned but dropped (found to be unnecessary)
       • MPEG-4: Object-based (separation from scene, animation, 3D, face modelling, interactivity, etc.)
     ITU-T (International Telecommunication Union, Telecommunication Standardization Sector):
     • H.261: Low bit-rates (e.g. videoconferencing)
     • H.262 = MPEG-2
     • H.263: Low bit-rates (improved)
     • H.264 = MPEG-4 Part 10, high compression power

  4. Random access from compressed video
     • Broadcasting or accessing video from storage: it should be possible to start from (almost) any frame.
     • MPEG solution: three kinds of frames:
       • I-frame: coded without temporal correlation (prediction); gives the lowest compression gain.
       • P-frame: motion-compensated prediction from the last (closest) I- or P-frame.
       • B-frame: bidirectional prediction from the previous and/or the next I- or P-frame;
         • highest compression gain
         • gets over sudden changes
         • errors do not propagate.
     • GOP = Group Of Pictures = smallest random-access unit, must be decodable independently (usually starts with an I-frame).

  5. Example of frame order in MPEG
     [Figure: a sequence of I-, B- and P-frames, annotated with "Forward prediction" and "Bidirectional prediction".]
     • Two orders of frames:
       • Display order
       • Bitstream order
     • Buffering is needed to convert from bitstream order into display order; a small delay is involved.
     • The predictor and predicted frame need not be adjacent.
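The two frame orders can be sketched as follows. This is a simplified illustration, assuming that every B-frame depends on the nearest I- or P-frame on each side, so that a reference frame must be transmitted before the B-frames that precede it in display order. Trailing B-frames with no following reference are simply emitted last, which is an assumption of this sketch; function names are hypothetical.

```python
def to_bitstream_order(display_types):
    """Reorder frames from display order to bitstream order.

    `display_types` is a sequence such as "IBBPBBP"; the result is a list
    of (display_index, frame_type) pairs in transmission order."""
    bitstream, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            # A B-frame must wait until its next reference has been sent.
            pending_b.append((index, frame_type))
        else:
            bitstream.append((index, frame_type))
            bitstream.extend(pending_b)
            pending_b = []
    bitstream.extend(pending_b)
    return bitstream


def to_display_order(bitstream):
    """Decoder side: buffer the decoded frames and output them by display index."""
    return sorted(bitstream)


stream = to_bitstream_order("IBBPBBP")
print(stream)
```

For the display order I B B P B B P, the first P-frame (display index 3) is sent directly after the I-frame, ahead of the two B-frames it helps to predict.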

  6. Organizing and querying content of a video database
     Questions to be answered:
     • Which aspects of videos are likely to be of interest?
     • How should these aspects be represented and stored?
     • What kinds of query languages are suitable?
     • Is the content extraction process manual or automatic?
     Possible aspects of interest:
     • Animate objects (people, etc.)
     • Inanimate objects (houses, cars, etc.)
     • Activities and events (walking, driving, etc.)
     Properties of objects:
     • Frame-dependent: valid in a subset of frames.
     • Frame-independent: valid for the video as a whole.

  7. Query types from a video database
     (a) Retrieve a complete video by name.
     (b) Find frame sequences (‘clips’, ‘shots’) containing certain objects or activities.
     (c) Find all videos/sequences containing objects/activities with certain properties.
     (d) Given a frame sequence, find all objects (of a certain type) occurring in some or all of the frames of the segment.
     (e) Given a frame sequence, find all activities (of a certain type) occurring in it.
     NOTE: Video is a multimedia tool: images + audio + possible text. The audio channel can be extremely important in detecting events. Textual components (e.g. subtitles) are invaluable keyword sources.

  8. Indexing of video content
     • Content descriptions are not usually built on a frame-by-frame basis, due to the high number of frames.
     • Compact representations are needed.
     • Concepts:
       • Frame sequence: a contiguous subset of frames (e.g. a ‘shot’).
       • Well-ordered set of frame sequences: temporal order, no overlaps.
       • Solid set of frame sequences: well-ordered, with non-empty gaps between sequences (‘scene’).
       • Frame sequence association map: for each object and activity, a solid set of frame sequences is attached, showing the frames in which it appears.
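The concepts above can be made concrete with a small sketch. It assumes frame sequences are stored as half-open (start, end) pairs with start < end; the helper names and the sample data ("car", "walking") are hypothetical and serve only as illustration.

```python
def is_well_ordered(seqs):
    """Temporal order and no overlaps: each sequence ends before or exactly
    where the next one starts."""
    return all(seqs[i][1] <= seqs[i + 1][0] for i in range(len(seqs) - 1))


def is_solid(seqs):
    """Well-ordered with a non-empty gap between consecutive sequences."""
    return all(seqs[i][1] < seqs[i + 1][0] for i in range(len(seqs) - 1))


def to_solid(seqs):
    """Sort the sequences and merge overlapping or touching ones, so that
    the result is a solid set covering the same frames."""
    merged = []
    for start, end in sorted(seqs):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Frame sequence association map: object/activity name -> solid set.
association_map = {
    "car": to_solid([(500, 1000), (1000, 2000), (3000, 4000)]),
    "walking": to_solid([(100, 300), (200, 400)]),
}


def appears_in(name, frame):
    """True if the named object/activity appears in the given frame."""
    return any(start <= frame < end for start, end in association_map[name])
```

Storing a few (start, end) pairs per object is what keeps the description compact compared with a frame-by-frame listing.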

  9. Frame segment tree
     • Binary tree
     • Special (1-dimensional) case of the spatial clipping approach.
     • Leaves represent basic intervals of the frame sequence:
       • Leaves are well ordered, and they cover the whole video.
       • Their endpoints include all endpoints of the sequences.
     • An internal node represents the concatenation of its children.
     • The root represents the whole video.
     • Example of objects and activities:
     [Figure: timeline over frame numbers 1000 to 5000 showing the frame sequences of obj. 1, obj. 2 and act. 1.]

  10. Frame segment tree: example
      [Figure: binary tree over frames 0-5000. Nodes are numbered level by level; attached objects (o) and activities (a) are given in parentheses.]
      • Root: 1: 0-5000
      • Level 2: 2: 0-3000; 3: 3000-5000
      • Level 3: 4: 0-2000 (o2); 5: 2000-3000; 6: 3000-4000 (o1); 7: 4000-5000 (a1)
      • Leaves: 8: 0-500; 9: 500-2000 (o1, a1); 10: 2000-2500 (o2, a1); 11: 2500-3000; 12: 3000-3500 (a1); 13: 3500-4000 (o2); 14: 4000-4500 (o2); 15: 4500-5000 (o1)
      Indexing:
      • Obj. 1 → 6, 9, 15
      • Obj. 2 → 4, 10, 13, 14
      • Act. 1 → 7, 9, 10, 12
      Note: Actually the intervals are half-open, e.g. [0, 500) = 0..499

  11. Indexing in the frame segment tree
      • For each object and activity record, there is a list of pointers to the nodes of the frame segment tree.
      • Objects and activities themselves may be indexed in traditional ways.
      • Each node of the frame segment tree points to a linked list of pointers to the objects and activities that appear throughout the whole segment that this node represents (but only partially in the parent segment). In the previous example:
        • node 4 → obj. 2
        • node 6 → obj. 1
        • node 7 → act. 1
        • node 9 → obj. 1, act. 1
        • node 10 → obj. 2, act. 1
        • node 12 → act. 1
        • node 13 → obj. 2
        • node 14 → obj. 2
        • node 15 → obj. 1
      • This can be generalized to a set of videos (common frame segment tree, combined object/activity set, extended pointers).
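The indexing rule can be sketched in code. The following is a minimal illustration, assuming half-open intervals, a balanced tree whose nodes are numbered level by level (children of node i are 2i and 2i+1), and frame sequences for obj. 1, obj. 2 and act. 1 that were reconstructed from the node lists of the example; they are an assumption, not data stated on the slides. All function names are hypothetical.

```python
def build_tree(endpoints):
    """Return {node_id: (start, end)} for a balanced binary tree whose
    leaves are the basic intervals between consecutive endpoints."""
    nodes = {}

    def build(node, lo, hi):
        nodes[node] = (endpoints[lo], endpoints[hi])
        if hi - lo > 1:  # more than one basic interval: split in the middle
            mid = (lo + hi) // 2
            build(2 * node, lo, mid)
            build(2 * node + 1, mid, hi)

    build(1, 0, len(endpoints) - 1)
    return nodes


def covering_nodes(nodes, s, e, node=1):
    """Nodes whose interval lies completely inside [s, e) while the
    interval of their parent does not."""
    ns, ne = nodes[node]
    if e <= ns or ne <= s:      # disjoint: nothing to attach here
        return []
    if s <= ns and ne <= e:     # fully covered: attach to this node
        return [node]
    result = []
    for child in (2 * node, 2 * node + 1):
        if child in nodes:
            result += covering_nodes(nodes, s, e, child)
    return result


# Assumed frame sequences, consistent with the example's node lists.
ASSOCIATION_MAP = {
    "obj. 1": [(500, 2000), (3000, 4000), (4500, 5000)],
    "obj. 2": [(0, 2500), (3500, 4500)],
    "act. 1": [(500, 2500), (3000, 3500), (4000, 5000)],
}


def build_index(association_map, total_frames=5000):
    """Build the tree and the object/activity -> node-list index."""
    points = {0, total_frames}
    for seqs in association_map.values():
        for s, e in seqs:
            points.update((s, e))
    nodes = build_tree(sorted(points))
    index = {name: sorted(n for s, e in seqs
                          for n in covering_nodes(nodes, s, e))
             for name, seqs in association_map.items()}
    return nodes, index


nodes, index = build_index(ASSOCIATION_MAP)
print(index)
```

With these assumed sequences the sketch yields the node lists of the example (obj. 1: 6, 9, 15; obj. 2: 4, 10, 13, 14; act. 1: 7, 9, 10, 12). The level-by-level numbering coincides with the figure because the example has exactly eight basic intervals.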

  12. Queries using a frame segment tree
      (a) Find segments where a given object/activity occurs (trivial; just follow the pointers).
      (b) Find objects occurring between frames s and e: walk the tree in preorder, denoting the current node interval by I.
          • If I ∩ [s, e) = ∅, then this subtree can be skipped.
          • If I ⊆ [s, e), then walk through the whole subtree (including the current node) and report all its objects.
          • Otherwise report the objects and activities of the current node, and continue the search in both subtrees.
      (c) Find objects/activities occurring together with object x: scan the segments where x occurs, and report the objects/activities occurring in these segments and their ancestors.
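Queries (b) and (c) can be sketched on the example tree. The tree is hard-coded here for brevity, with nodes numbered level by level (children of node i are 2i and 2i+1) and half-open intervals; node 14 is taken to carry obj. 2, consistent with the index list Obj. 2 → 4, 10, 13, 14. Function names are hypothetical.

```python
INTERVALS = {
    1: (0, 5000), 2: (0, 3000), 3: (3000, 5000),
    4: (0, 2000), 5: (2000, 3000), 6: (3000, 4000), 7: (4000, 5000),
    8: (0, 500), 9: (500, 2000), 10: (2000, 2500), 11: (2500, 3000),
    12: (3000, 3500), 13: (3500, 4000), 14: (4000, 4500), 15: (4500, 5000),
}
LABELS = {
    4: {"obj. 2"}, 6: {"obj. 1"}, 7: {"act. 1"},
    9: {"obj. 1", "act. 1"}, 10: {"obj. 2", "act. 1"}, 12: {"act. 1"},
    13: {"obj. 2"}, 14: {"obj. 2"}, 15: {"obj. 1"},
}


def children(node):
    return [c for c in (2 * node, 2 * node + 1) if c in INTERVALS]


def subtree_labels(node):
    """All objects/activities attached anywhere in the subtree of `node`."""
    found = set(LABELS.get(node, ()))
    for child in children(node):
        found |= subtree_labels(child)
    return found


def labels_between(s, e, node=1):
    """Query (b): preorder walk reporting what occurs within [s, e)."""
    ns, ne = INTERVALS[node]
    if ne <= s or e <= ns:       # I and [s, e) are disjoint: skip subtree
        return set()
    if s <= ns and ne <= e:      # I is contained in [s, e): report subtree
        return subtree_labels(node)
    found = set(LABELS.get(node, ()))   # partial overlap: report this node
    for child in children(node):
        found |= labels_between(s, e, child)
    return found


def co_occurring(x):
    """Query (c): labels attached to the nodes of x and to their ancestors."""
    found = set()
    for node, labels in LABELS.items():
        if x in labels:
            while node >= 1:     # climb from the node up to the root
                found |= LABELS.get(node, set())
                node //= 2
    found.discard(x)
    return found


print(labels_between(2000, 3000))
print(co_occurring("obj. 1"))
```

Following the slide, `co_occurring` inspects only the nodes of x and their ancestors; labels attached below a node of x would need an additional subtree walk.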

  13. R-segment tree (RS-tree)
      • Special case of R-tree
      • Two possible implementations:
        (a) 1-dimensional space (dimension = time)
        (b) 2-dimensional space, where the other dimension is just an enumeration of objects/activities (not a true spatial dimension):
      [Figure: the frame sequences of obj. 1, obj. 2 and act. 1 on a frame axis from 1000 to 5000, with rectangles R1, R2 and R3.]

  14. Computer-assisted video analysis
      Video segmentation:
      • Division of videos into homogeneous sequences.
      • Typical segments are so-called shots, filmed without interruption.
      • Segmentation = detection of shot boundaries.
      • Sharp cuts are easier to detect than gradual transitions (e.g. crossfade).
      • Features for automatic segmentation:
        • Similarity of color histograms of subsequent frames: simple and effective, but sensitive to varying illumination.
        • Edge features: similarity of shapes.
        • Motion vectors: restricted vector lengths within a shot.
        • Corner points: similarity of landmark points in frames.
      • The actual segmentation can be based on thresholds for similarity, but machine learning techniques have also been widely used.
      • Higher-level segmentation into scenes, also called story units.
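Threshold-based detection of sharp cuts from histogram similarity can be sketched as follows. As a simplification, the sketch uses grayscale histograms instead of color histograms, frames stored as lists of rows with values 0..255, and a hypothetical threshold of 0.5; gradual transitions would need more than a comparison of adjacent frames.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[value * bins // max_value] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0 for identical, 1 for disjoint histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def shot_boundaries(frames, threshold=0.5):
    """Indices i at which frame i starts a new shot, i.e. its histogram
    differs from that of frame i-1 by more than the threshold."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Three dark frames followed by three bright ones: one sharp cut.
dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
cuts = shot_boundaries([dark, dark, dark, bright, bright, bright])
```

The sensitivity to illumination mentioned above shows up directly here: a sudden lighting change within a shot shifts the histogram just like a cut does.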

  15. Computer-assisted video analysis (cont.)
      Keyframes:
      • Representative frames within shots, containing the essential elements for retrieval.
      • Scene-level segmentation often uses keyframe features, and operates e.g. in a top-down or bottom-up manner.
      Choosing keyframes:
      • Fuzzy task: no definite optimum.
      • Can be based on the same features as segmentation.
      • Various algorithmic approaches:
        • Sequential comparison
        • Clustering
        • Trajectory-based
        • Decision in the context of object/event detection
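One possible reading of the sequential comparison approach is sketched below: each frame of a shot is compared with the most recent keyframe, and becomes the next keyframe once it differs enough. The feature vectors, the L1 distance and the threshold are assumptions made for illustration; the slides do not fix any of them.

```python
def l1_distance(a, b):
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_keyframes(features, threshold):
    """Return indices of keyframes within one shot.

    `features` holds one feature vector per frame (e.g. a histogram).
    The first frame is always a keyframe; a later frame becomes a keyframe
    when its distance to the latest keyframe exceeds the threshold."""
    if not features:
        return []
    keyframes = [0]
    for index in range(1, len(features)):
        if l1_distance(features[keyframes[-1]], features[index]) > threshold:
            keyframes.append(index)
    return keyframes


# Hypothetical 2-dimensional features for a five-frame shot.
shot = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0], [3.0, 0.0]]
keys = select_keyframes(shot, threshold=0.5)
```

A lower threshold yields more keyframes per shot, which reflects the point above that there is no definite optimum.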

  16. Computer-assisted video analysis (cont.)
      Object recognition:
      • Keyframe-based recognition extracts the same features as for still images: color, texture, shape, but also objects and motion.
      • Motion compensation techniques can be used to find out the frame interval in which the object occurs.
      Annotations:
      • Allocation of semantic concepts to video segments.
      • Means roughly the same as segment classification.
      • Machine-learning tools have been attempted.
      • Human assistance is usually needed in the final recognition, naming and classification of segments and detected objects within them.
      Ref: W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank: "A Survey on Visual Content-Based Video Indexing and Retrieval", IEEE Trans. on Systems, Man, and Cybernetics 41(6), Nov. 2011.
