Semantic Content-based Access To Hypervideo Databases. Haitao Jiang Major Professor: Ahmed K. Elmagarmid Computer Science Department Purdue University 1998. Organization Of The Talk. Introduction And Review Of Related Work Logical Hypervideo Data Model (LHVDM)
Semantic Content-based Access To Hypervideo Databases Haitao Jiang Major Professor: Ahmed K. Elmagarmid Computer Science Department Purdue University 1998
Organization Of The Talk • Introduction And Review Of Related Work • Logical Hypervideo Data Model (LHVDM) • Semantic Content-based Video Queries • A Web-based Logical Hypervideo Database (WLHVDB) • Conclusion
Introduction • Digital Video And Video Databases • Basic Research Problems • Research Motivation • Research Goal
Unique Characteristics Of Video Data • Semantics: rich and ambiguous • Relationship: ill-defined • Structure: unclear • Dimension: spatial and temporal • Volume: huge
Video Data Content • Visual Content • Audio Content • Text Content • Semantics Content
Research Problems • Video Data Modeling • Video Data Indexing • Video Data Query • Video Browsing
Video Data Model Requirements • Content-based Data Access • Video Data Abstraction • Variable Data Access Granularity • Dynamic And Incremental Video Annotation
Video Data Model Requirements (con.) • Video Data Independence • Spatial And Temporal Characteristics • Video And Meta-data Sharing And Reuse
Related Work • Video Data Modeling, Indexing, And Querying • Video Objects • Video Browsing
Video Data Modeling, Indexing, and Querying • Traditional Database Approach • Visual Content Or Segmentation-based Approach • Stratification Or Annotation Layering Approach
Traditional Database Approach • Categorize And Predefine Video Data Attributes/Values • Use Traditional Databases And SQL • Inflexible And Limited • Examples: VISION, Video Database Browser
Segmentation-based Models • Parse And Segment Video Streams • Index On Visual Features Of RFrames • Extract High Level Logical Structure And Semantics By Classifying Against Domain Models
Segmentation-based Models (con.) • Can Be Fully Automated • Lack Of Flexibility • Limited Semantics • Video Streams Need To Be Well-structured • Examples: JACOB, QBIC, Informedia
Stratification • Segment Video Semantics • Concept Of Logical Video Data • Allows For Semantic Content-based Video Access • Annotation Can Be Tedious And Biased • Examples: VideoStar, Algebraic Video
Stratification (con.) Existing Models: • Have Limited Temporal Queries • Have Limited Video Browsing Mechanisms • Lack Multi-user Views And Data Sharing • Lack Modeling Of Video Objects • Lack Spatial And Spatial-Temporal Query Capabilities
Different Forms Of Video Annotation • Multi-layer Icons - MediaStream • Keywords • Free Text Documents • Other Types Of Annotation?
Sources Of Video Annotations • Closed Caption • Text In Video Frames: highlight detection and OCR • Voice Recognition • Manual Annotation
Annotation Support In A Video Data Model • Annotation of Arbitrary Sequence • Incremental Creation, Deletion, And Modification • Multi-user Annotation Sharing • Arbitrary Overlap Of Annotations
Video Objects • Index On Spatial And Temporal Information • Minimum Bounding Rectangle (MBR) As The Spatial Representation • Narrow Focus And Lack Of Data Abstraction • Limited Video Queries • Examples: AVIS, CVOT
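As a rough illustration of the MBR bullet above, the following sketch computes the minimum bounding rectangle of a set of points and tests two rectangles for overlap. The `MBR` class and the `mbr_of` function are hypothetical names chosen for this example; they are not taken from AVIS or CVOT.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersects(self, other: "MBR") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (
            self.x_max < other.x_min or other.x_max < self.x_min
            or self.y_max < other.y_min or other.y_max < self.y_min
        )


def mbr_of(points: Iterable[Tuple[float, float]]) -> MBR:
    """Return the smallest axis-aligned rectangle enclosing all points."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return MBR(min(xs), min(ys), max(xs), max(ys))


# Outline points of three objects in one frame (made-up pixel coordinates).
car = mbr_of([(10, 20), (40, 25), (30, 60)])
person = mbr_of([(35, 50), (55, 90)])
tree = mbr_of([(100, 100), (120, 150)])
```

An MBR is cheap to store and compare, which is why it suits spatial indexing, but it only approximates the object's shape: two MBRs can intersect even when the objects themselves do not touch.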
Video Browsing • Visual Content-based Browsing • Film Strips • Salient Images • Scene Clustering Graph • Need Semantic Content-based Browsing • Need Inter-Video Navigation
Research Motivations • Visual Content-based Video Access Is Important But Lacks Semantics • Users Often Prefer Semantic Content-based Video Data Access • Many Applications: Digital Video Libraries, Distance Learning, etc. • The Web Is An Emerging Way Of Sharing Information
Research Goal • Goal: To Provide Effective And Flexible Semantic Content-based Video Data Access In A Distributed and Multi-user Sharing Environment • Both Spatial And Temporal Video Queries • Heterogeneous Applications And User Views • Semantic Content-based Browsing
JACOB Project Ardizzone and La Cascia et al., 1997 • Visual Content-based Access To Images And Videos • RFrames Are Extracted And Serve As Descriptors Of Video Segments • Index On Visual Features (Color, Motion, Texture, etc.)
Informedia Project M. A. Smith, T. Kanade, M. G. Christel, D. B. Winkler et al. CMU • Video Abstraction: Title-Poster Frame-Film strip-Skim video • Speech Recognition->Transcript->Natural Language Processing->Keywords->Align to Frames • Face And Keyword Search
VISION Digital Library K. M. Pua, S. Gauch et al., University of Kansas, 1993 - 1994 • Practical And Cost-effective Implementation But Very Limited • Video Storage System + IR System (Illustra - An ORDBMS) • Text Is Stored As One Table Entry Of Video Data • Supports Boolean Operators
OVID System Oomoto and Tanaka, 1991 • Video Object: a set of arbitrary frame sequences with attributes and values • Video Object Model Is Schemaless • Data Description Sharing Via “Interval-inclusion Based Inheritance” • Users Can Decide Which Attributes Are Shared
OVID System (con.) • Video-Object Composition: merge, interval projection and overlap • VideoSQL • SELECT: continuous/Incontinuous/anyObject • WHERE: attribute is [value] / attribute contains [value] / defineOver [frames] • Browsing: VideoChart - bar chart representation of video objects
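The "interval-inclusion based inheritance" idea from the OVID slides can be sketched roughly as follows: an object whose frame intervals lie inside those of another object picks up the attributes that the enclosing object marks as shareable. This is only an illustration under stated assumptions. The class and function names are hypothetical, and the rule that an object's own values override inherited ones is an assumption of this sketch, not something the slides state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # inclusive (first_frame, last_frame)


@dataclass
class VideoObject:
    """Schemaless video object: frame intervals plus free-form attributes."""
    intervals: List[Interval]
    attributes: Dict[str, str] = field(default_factory=dict)
    inheritable: Set[str] = field(default_factory=set)  # attributes the user chose to share


def included(inner: VideoObject, outer: VideoObject) -> bool:
    """True if every interval of `inner` lies inside some interval of `outer`."""
    return all(
        any(o_start <= i_start and i_end <= o_end for o_start, o_end in outer.intervals)
        for i_start, i_end in inner.intervals
    )


def effective_attributes(obj: VideoObject, others: List[VideoObject]) -> Dict[str, str]:
    """Own attributes plus inheritable ones from objects whose intervals include obj."""
    result: Dict[str, str] = {}
    for other in others:
        if other is not obj and included(obj, other):
            for name in other.inheritable:
                if name in other.attributes:
                    result[name] = other.attributes[name]
    result.update(obj.attributes)  # assumption: an object's own values take precedence
    return result


scene = VideoObject([(0, 500)], {"location": "Paris", "editor": "alice"}, {"location"})
shot = VideoObject([(100, 200)], {"action": "walking"})
outside = VideoObject([(450, 600)], {"action": "driving"})
database = [scene, shot, outside]
```

Here `shot` lies inside `scene` and therefore inherits `location`, but not `editor`, which was not marked as shareable. `outside` extends past the end of `scene` and inherits nothing.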
Virtual Video Browser Little et al., 1993 • Predefined Schema With Fixed Attributes • Descriptions Cannot Be Overlapped Or Nested • Targeted At MOD: not suitable for dynamic creation or modification of video • No Personalized View • No Spatio-temporal Queries
Video Database Browser System Rowe, Boreczky et al. 1994 • Classify Metadata Into: Bibliographic, Structural, And Content Data • Use Relational Database Schema (POSTGRES RDBMS) • Support Video Queries On Predefined Attributes
Video Stratification Smith and Davenport, MIT, 1991 - 1992 • Associate Description To A Sequence Of Video Frames • Simple Keyword Search • Strata May Overlap • Relation Among Strata Is Absent
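A minimal sketch of the stratification idea described above: each stratum attaches a free-text description to a frame range, strata may overlap, and retrieval is a simple keyword match. The `Stratum` type and both helper functions are hypothetical names for this example, not part of the original system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stratum:
    """A description attached to an inclusive range of video frames."""
    start: int
    end: int
    description: str


def search(strata: List[Stratum], keyword: str) -> List[Tuple[int, int]]:
    """Return the frame ranges whose description contains the keyword as a word."""
    key = keyword.lower()
    return [(s.start, s.end) for s in strata if key in s.description.lower().split()]


def descriptions_at(strata: List[Stratum], frame: int) -> List[str]:
    """Return every description that covers the given frame (strata may overlap)."""
    return [s.description for s in strata if s.start <= frame <= s.end]


strata = [
    Stratum(0, 300, "president speech"),
    Stratum(120, 180, "audience applause"),
    Stratum(150, 400, "president leaves podium"),
]
```

Because the strata are independent of one another, nothing in this structure records how two descriptions relate, which mirrors the "Relation Among Strata Is Absent" limitation noted on the slide.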
BRAHMA Dan et al., IBM T. J. Watson, 1996 • Browsing and Retrieval Architecture for Hierarchical Multimedia Annotations • Each Annotation Node Is An Attribute/Value Pair • Nodes Can Be Dynamically Created And Shared By Multiple Users
Media Streams Davis 1993 • Goal: overcome keyword annotation weaknesses • Iconic Video Content Annotation • Hierarchical: general -> specific • Represent And Match Temporal Relations • Fixed Vocabulary • Doesn’t Address Textual Data, e.g. Closed Caption
Algebraic Video System Weiss et al., MIT, 1995 • Goal: Temporal Video Composition • Basic Approach: Stratification
Algebraic Video Data Model • Video Expression: • multi-window, spatial, temporal and content combination of raw video segments • recursively constructed using video algebraic operators • Video Algebraic Operators: creation, composition, output, and description
Algebraic Video Data Model (con.) • Provides Multiple Coexisting Views (Nested Stratification) • Video Query: Boolean combination of attributes • Temporal Constraints Are Expressed As Attribute Values • Video Browsing Within The Expression
VideoSTAR (STorage And Retrieval) System Hjelsvold et al., 1995 • Goal: Multi-user Video Information Sharing • Basic Approach: Stratification
VideoSTAR: Generic Video Data Model • Continuous Media Objects (CMObjects) • MediaStream: • Virtual Video Streams (VideoStreams) • Video/Audio Recordings (StoredMediaSegments) • An Arbitrary StreamInterval can be annotated
VideoSTAR: Video Querying and Browsing • Three Kinds Of Video Context: • Basic, Secondary, And Primary • Unconditional Context Sharing • VideoSTAR Query Algebra • Boolean, Set, And Temporal Operators • Based On Attribute/Value • Users Need To Choose The Query Context
VideoSTAR: Video Querying and Browsing (con.) Two Browsing Operators • Retrieve All Annotations Over A Video Stream Or Interval • Retrieve All Structures Defined Over An Interval
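The slides do not list the temporal operators of the VideoSTAR query algebra. As a stand-in, the sketch below classifies the relation between two annotated frame intervals using a simplified subset of Allen's interval relations. The function name and the relation labels are assumptions made for this example only.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start_frame, end_frame), with start <= end


def temporal_relation(a: Interval, b: Interval) -> str:
    """Classify how interval `a` relates to interval `b` (simplified Allen relations)."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if (a_start, a_end) == (b_start, b_end):
        return "equals"
    if b_start <= a_start and a_end <= b_end:
        return "during"      # a lies inside b
    if a_start <= b_start and b_end <= a_end:
        return "contains"    # b lies inside a
    return "overlaps"        # partial overlap in either direction


interview = (0, 10)
applause = (5, 15)
```

A temporal query such as "annotations that occur during the interview" would then filter annotation intervals by the label returned for each one.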
Advanced Video Information System (AVIS) Adali, Candan, Chen, Erol, and Subrahmanian, University of Maryland, MSJ 1996 • Basic Approach: Spatial Indexes + RDB • Entities: things of interest that may or may not actually appear in the movie, including video objects, activity types, and events (roles and teams) • Raw Video Frame Sequences
Advanced Video Information System (AVIS) (con.) • Association Map: entities <--> frame sequences • Index: frame segment tree + OBJECTARRAY + EVENTARRAY + ACTIVITYARRAY • All Clips Must Be Of Equal Length With No Overlap • No Spatial And Temporal Queries • No Logical Video Abstractions
Common Video Object Tree (CVOT) J. Li and T. Ozsu et al., University of Alberta, 1998 • Focus On Salient Objects And Based On OODBMS • CVO Tree: each leaf is a video interval with salient objects (similar to AVIS) attached • Video Clips Can Overlap To Model Special Editing Effects (Fade In etc.)
Common Video Object Tree (CVOT) (con.) • Query Language: MOQL • based on OQL proposed by ODMG for ODBMSs • has both temporal and spatial operators • Symbolic Trajectory Representation And Matching • Logical vs. Physical Salient Objects • Only Addresses Salient Objects
Video Browsing • Representative Frames (RFrames) • Sports Highlights [Yow95] • Caption Detection [Smith95, Yeo96] • Keyword Spotting [Smith95] • Explicit Models (News Video) [Swanberg93, Zhang94]
Video Browsing (con.) • Shot Clustering Based On Visual Similarity And Temporal Locality [Yeung95, Rui98] • Scene Change Graph (CTG) [Yeung95] • Pipeline: Video -> Shot Segmentation -> Shot Clustering -> Scene Segmentation
Logical Hypervideo Data Model (LHVDM) • Definition • Hierarchical Video Abstractions • Hot Video Object Modeling • Video Indexing • Video Semantic Association And Hypervideo • A Generic Video Database Architecture
Logical Hypervideo Data Model (con.) (PV, PVS, LV, LVS, HO, CD, LINKS, UV, MAP) • PV: Set Of Physical Video Streams • PVS: Set Of Physical Video Segments • LV: Set Of Logical Video Streams • LVS: Set Of Logical Video Segments • HO: Set Of Hot Objects • CD: Set Of Content Descriptions • LINKS: Set Of Video Hyperlinks • UV: Set Of User Views • MAP: Set Of Mapping Relations
Logical Hypervideo Data Model (con.) MAP Includes: • PV <--> PVS: Easy Data Manipulation • PVS <--> LV: Data Independence And Data Reuse • LV <--> LVS: Multi-user View • LV, LVS <--> HO: Effective Query • LV, LVS, HO, CD <--> UV: Multi-user View Sharing • LV, LVS, HO, LINKS <--> CD: Semantic Content-based Access • Video Hyperlinks: Effective Video Browsing
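To make the PVS <--> LV mapping more concrete, here is a minimal sketch of how a logical video stream could be assembled from physical video segments and how a logical frame number could be resolved back to a physical one. All class, field, and method names are assumptions made for illustration; the slides do not specify how LHVDM represents these mappings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhysicalVideoSegment:
    """A frame range inside one physical video stream (hypothetical layout)."""
    stream_id: str  # identifies the physical video stream (PV)
    start: int      # first physical frame, inclusive
    end: int        # last physical frame, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class LogicalVideoStream:
    """An ordered list of physical segments presented as one continuous stream."""
    name: str
    segments: List[PhysicalVideoSegment]

    def resolve(self, logical_frame: int) -> Tuple[str, int]:
        """Map a 0-based logical frame to (physical stream id, physical frame)."""
        if logical_frame < 0:
            raise IndexError("logical frame out of range")
        offset = logical_frame
        for seg in self.segments:
            if offset < seg.length:
                return seg.stream_id, seg.start + offset
            offset -= seg.length
        raise IndexError("logical frame out of range")


intro = PhysicalVideoSegment("tape1", 100, 199)  # 100 frames
goal = PhysicalVideoSegment("tape2", 0, 49)      # 50 frames

# Two logical streams reuse the same physical segment without copying video data.
highlights = LogicalVideoStream("highlights", [intro, goal])
recap = LogicalVideoStream("recap", [goal])
```

Annotations and queries that refer only to logical frame numbers stay valid if the physical storage is reorganized, as long as the mapping is updated. This is the data independence and reuse that the PVS <--> LV mapping is meant to provide.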
Hierarchical Video Abstractions (figure: abstraction layers of the Logical Hypervideo Data Model (LHVDM)) • User Views (UVs) • Hot Objects (HOs) • Video Hyperlinks • Logical Video Segments (LVSs) • Logical Video Streams (LVs) • Physical Video Segments (PVSs) • Physical Video Streams (PVs)
Hot Video Objects • What Is A Hot Video Object? • A Logical Video Abstraction • A Sub-Frame Region That Is “Hot” In A Set Of Logical Frame Sequences • Why Call Them “Hot” Objects? • Target Of Interest • Hyperlink Property (Hot Video Spot)