Semantic Content-based Access To Hypervideo Databases

Semantic Content-based Access To Hypervideo Databases • Haitao Jiang • Major Professor: Ahmed K. Elmagarmid • Computer Science Department, Purdue University, 1998

Organization Of The Talk • Introduction And Review Of Related Work • Logical Hypervideo Data Model (LHVDM) • Semantic Content-based Video Queries • A Web-based Logical Hypervideo Database (WLHVDB) • Conclusion

Introduction • Digital Video And Video Databases • Basic Research Problems • Research Motivation • Research Goal

Unique Characteristics Of Video Data • Semantics: rich and ambiguous • Relationship: ill-defined • Structure: unclear • Dimension: spatial and temporal • Volume: huge

Video Data Content • Visual Content • Audio Content • Text Content • Semantics Content

Research Problems • Video Data Modeling • Video Data Indexing • Video Data Query • Video Browsing

Video Data Model Requirements • Content-based Data Access • Video Data Abstraction • Variable Data Access Granularity • Dynamic And Incremental Video Annotation

Video Data Model Requirements (con.) • Video Data Independence • Spatial And Temporal Characteristics • Video And Meta-data Sharing And Reuse

Related Work • Video Data Modeling, Indexing, And Querying • Video Objects • Video Browsing

Video Data Modeling, Indexing, and Querying • Traditional Database Approach • Visual Content Or Segmentation-based Approach • Stratification Or Annotation Layering Approach

Traditional Database Approach • Categorize And Predefine Video Data Attributes/Values • Use Traditional Databases And SQL • Inflexible And Limited • Examples: VISION, Video Database Browser

Segmentation-based Models • Parse And Segment Video Streams • Index On Visual Features Of RFrames • Extract High Level Logical Structure And Semantics By Classifying Against Domain Models

Segmentation-based Models (con.) • Can Be Fully Automated • Lack Of Flexibility • Limited Semantics • Video Streams Need To Be Well-structured • Examples: JACOB, QBIC, Informedia

Stratification • Segment Video Semantics • Concept Of Logical Video Data • Allows For Semantic Content-based Video Access • Annotation Can Be Tedious And Biased • Examples: VideoSTAR, Algebraic Video

Stratification (con.) • Existing Models: • Have Limited Temporal Queries • Have Limited Video Browsing Mechanisms • Lack Multi-user Views And Data Sharing • Lack Modeling Of Video Objects • Lack Spatial And Spatial-Temporal Query Capabilities

Different Forms Of Video Annotation • Multi-layer Icons (Media Streams) • Keywords • Free Text Documents • Other Types Of Annotation?

Sources Of Video Annotations • Closed Caption • Text In Video Frames: highlight detection and OCR • Voice Recognition • Manual Annotation

Annotation Support In A Video Data Model • Annotation Of Arbitrary Frame Sequences • Incremental Creation, Deletion, And Modification • Multi-user Annotation Sharing • Arbitrary Overlap Of Annotations

Video Objects • Index On Spatial And Temporal Information • Minimum Bounding Rectangle (MBR) As The Spatial Representation • Narrow Focus And Lack Of Data Abstraction • Limited Video Queries • Examples: AVIS, CVOT
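The MBR representation named on this slide can be illustrated with a minimal sketch. It is not code from AVIS or CVOT: the `MBR` class, the `mbr_of` and `overlapping_frames` helpers, and the sample tracks are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle in pixel coordinates (assumed layout)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def intersects(self, other: "MBR") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (self.x_max < other.x_min or other.x_max < self.x_min
                    or self.y_max < other.y_min or other.y_max < self.y_min)


def mbr_of(points):
    """Smallest axis-aligned rectangle enclosing a set of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return MBR(min(xs), min(ys), max(xs), max(ys))


def overlapping_frames(track_a, track_b):
    """Frames in which both objects appear and their MBRs overlap.

    Each track maps a frame number to the object's MBR in that frame.
    """
    shared = sorted(set(track_a) & set(track_b))
    return [f for f in shared if track_a[f].intersects(track_b[f])]


# Hypothetical tracks for two video objects.
ball = {10: MBR(0, 0, 10, 10), 11: MBR(20, 20, 30, 30), 12: MBR(40, 40, 50, 50)}
player = {11: MBR(25, 25, 60, 60), 12: MBR(100, 100, 120, 120), 13: MBR(0, 0, 5, 5)}
print(overlapping_frames(ball, player))  # [11]
```

Indexing one rectangle per object per frame keeps the spatial test cheap, but it discards the object's shape, which is one reason MBR-based models support only limited spatial queries.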

Video Browsing • Visual Content-based Browsing • Film Strips • Salient Images • Scene Clustering Graph • Need Semantic Content-based Browsing • Need Inter-Video Navigation

Research Motivations • Visual Content-based Video Access Is Important But Lacks Semantics • Users Often Prefer Semantic Content-based Video Data Access • Many Applications: Digital Video Libraries, Distance Learning, etc. • The Web Is An Emerging Way Of Sharing Information

Research Goal • Goal: To Provide Effective And Flexible Semantic Content-based Video Data Access In A Distributed and Multi-user Sharing Environment • Both Spatial And Temporal Video Queries • Heterogeneous Applications And User Views • Semantic Content-based Browsing

JACOB Project Ardizzone and La Cascia et al. 1997 • Visual Content-based Access To Images And Videos • RFrames Are Extracted And Serve As Descriptors Of Video Segments • Index On Visual Features (Color, Motion, Texture, etc.)

Informedia Project M. A. Smith, T. Kanade, M. G. Christel, D. B. Winkler et al. CMU • Video Abstraction: Title-Poster Frame-Film strip-Skim video • Speech Recognition->Transcript->Natural Language Processing->Keywords->Align to Frames • Face And Keyword Search

VISION Digital Library K. M. Pua, S. Gauch et al. University of Kansas, 1993 - 1994 • Practical And Cost-effective Implementation But Very Limited • Video Storage System + IR System (Illustra - An ORDBMS) • Text Is Stored As One Table Entry Of The Video Data • Supports Boolean Operators

OVID System Oomoto and Tanaka, 1991 • Video Object: a set of arbitrary frame sequences with attributes and values • Video Object Model Is Schemaless • Data Description Sharing Via “Interval-inclusion Based Inheritance” • Users Can Decide Which Attributes Are Shared

OVID System (con.) • Video-Object Composition: merge, interval projection and overlap • VideoSQL • SELECT: continuous/Incontinuous/anyObject • WHERE: attribute is [value] / attribute contains [value] / defineOver [frames] • Browsing: VideoChart - bar chart representation of video objects
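The “interval-inclusion based inheritance” idea from the OVID slides can be sketched as follows. This is a simplified illustration under stated assumptions, not OVID's implementation: the `VideoObject` class, the `inheritable` set (standing in for the user's choice of shared attributes), and the rule that an object's own values take precedence are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class VideoObject:
    intervals: list                     # list of (start, end) frame ranges, inclusive
    attributes: dict                    # schemaless attribute/value pairs
    inheritable: set = field(default_factory=set)  # attributes the user chose to share


def includes(outer, inner):
    """True if every interval of `inner` lies within some interval of `outer`."""
    return all(
        any(o_start <= start and end <= o_end for o_start, o_end in outer.intervals)
        for start, end in inner.intervals
    )


def effective_attributes(obj, all_objects):
    """Own attributes plus shared attributes of objects whose intervals include obj's."""
    result = {}
    for other in all_objects:
        if other is not obj and includes(other, obj):
            for name in other.inheritable:
                if name in other.attributes:
                    result[name] = other.attributes[name]
    result.update(obj.attributes)       # assumption: own values win on conflict
    return result


# Hypothetical objects: a long scene and a shot inside it.
scene = VideoObject([(0, 500)], {"location": "Kyoto", "note": "draft"}, {"location"})
shot = VideoObject([(100, 200)], {"action": "walking"})
print(effective_attributes(shot, [scene, shot]))
```

Here the shot picks up `location` from the enclosing scene but not `note`, because only `location` was marked as shared.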

Virtual Video Browser Little et al., 1993 • Predefined Schema With Fixed Attributes • Descriptions Cannot Overlap Or Be Nested • Targeted At MOD: not suitable for dynamic creation or modification of video • No Personalized View • No Spatio-temporal Queries

Video Database Browser System Rowe, Boreczky et al. 1994 • Classify Metadata Into: Bibliographic, Structural, And Content Data • Use Relational Database Schema (POSTGRES RDBMS) • Support Video Queries On Predefined Attributes

Video Stratification Smith and Davenport, MIT, 1991 - 1992 • Associate Description To A Sequence Of Video Frames • Simple Keyword Search • Strata May Overlap • Relation Among Strata Is Absent
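The stratification idea on this slide (descriptions attached to frame sequences, overlapping strata, simple keyword search) can be sketched in a few lines. This is an illustrative sketch only: the `Stratum` class, the helper names, and the sample annotations are hypothetical and are not taken from Smith and Davenport's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    start: int          # first frame of the annotated sequence
    end: int            # last frame, inclusive
    description: str    # free-text description of the sequence


def strata_at(strata, frame):
    """All strata covering a frame; several may apply because strata can overlap."""
    return [s for s in strata if s.start <= frame <= s.end]


def keyword_search(strata, keyword):
    """Case-insensitive substring match on the descriptions."""
    needle = keyword.lower()
    return [s for s in strata if needle in s.description.lower()]


# Hypothetical annotations over one video stream.
strata = [
    Stratum(0, 99, "press conference"),
    Stratum(50, 149, "mayor speaking"),
    Stratum(120, 200, "crowd outside"),
]
print([s.description for s in strata_at(strata, 60)])
print([(s.start, s.end) for s in keyword_search(strata, "MAYOR")])
```

Each stratum stands alone in this layout, which mirrors the limitation noted on the slide: nothing records how one stratum relates to another.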

BRAHMA Dan et al., IBM T. J. Watson, 1996 • Browsing and Retrieval Architecture for Hierarchical Multimedia Annotations • Each Annotation Node is an Attribute / Value Pair • Nodes Can Be Dynamically Created and Shared by Multi-users

Media Streams Davis 1993 • Goal: overcome keyword annotation weaknesses • Iconic Video Content Annotation • Hierarchical: general -> specific • Represent And Match Temporal Relations • Fixed Vocabulary • Doesn’t Address Textual Data, e.g. Closed Caption

Algebraic Video System Weiss et al., MIT, 1995 • Goal: Temporal Video Composition • Basic Approach: Stratification

Algebraic Video Data Model • Video Expression: • multi-window, spatial, temporal and content combination of raw video segments • recursively constructed using video algebraic operators • Video Algebraic Operators: creation, composition, output, and description

Algebraic Video Data Model (con.) • Provides Multiple Coexisting Views (Nested Stratification) • Video Query: Boolean combination of attributes • Temporal Constraints Are Expressed As Attribute Values • Video Browsing Within The Expression

VideoSTAR (STorage And Retrieval) System Hjelsvold et al., 1995 • Goal: Multi-user Video Information Sharing • Basic Approach: Stratification

VideoSTAR: Generic Video Data Model • Continuous Media Objects (CMObjects) • MediaStream: • Virtual Video Streams (VideoStreams) • Video/Audio Recordings (StoredMediaSegments) • An Arbitrary StreamInterval can be annotated

VideoSTAR: Video Querying and Browsing • Three Kinds of Video Context: • Basic, Secondary, and Primary • Unconditional Context Sharing • VideoSTAR Query Algebra • Boolean, Set, and Temporal Operators • Based on Attribute/Value • Users Need to Choose Query Context

VideoSTAR: Video Querying and Browsing (con.) • Two Browsing Operators: • Retrieve All Annotations Over a Video Stream or Interval • Retrieve All Structures Defined Over an Interval

Advanced Video Information System (AVIS) Adalı, Candan, Chen, Erol, and Subrahmanian, University of Maryland. MSJ 1996 • Basic Approach: Spatial Indexes + RDB • Entities: things of interest that may or may not actually appear in the movie, including video objects, activity types, and events (roles and teams) • Raw Video Frame Sequences

Advanced Video Information System (AVIS) (con.) • Association Map: entities <--> frame sequences • Index: frame segment tree + OBJECTARRAY + EVENTARRAY + ACTIVITYARRAY • All Clips Must Be Of Equal Length With No Overlap • No Spatial and Temporal Queries • No Logical Video Abstractions

Common Video Object Model (CVOT) J. Li and T. Ozsu et al. University of Alberta, 1998 • Focus On Salient Objects And Based On OODBMS • CVO Tree: each leaf is a video interval with salient objects (similar to AVIS) attached • Video Clips Can Be Overlapped To Model Special Editing Effects (Fade In etc.)

Common Video Object Model (CVOT) (con.) • Query Language: MOQL • based on OQL proposed by ODMG for ODBMSs • has both temporal and spatial operators • Symbolic Trajectory Representation And Matching • Logical vs. Physical Salient Objects • Only Addresses Salient Objects

Video Browsing • Representation Frames (RFrames) • Sport Highlight [Yow95] • Caption Detection [Smith95, Yeo96] • Keyword Spotting [Smith95] • Explicit Models (News Video) [Swanberg93, Zhang94]

Video Browsing (con.) • Shot Clustering Based On Visual Similarity and Temporal Locality [Yeung95, Rui98] • Scene Transition Graph (STG) [Yeung95]: Video -> Shot Segmentation -> Shot Clustering -> Scene Segmentation
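Clustering shots by visual similarity and temporal locality can be sketched with a simple greedy pass. This is not the algorithm of [Yeung95] or [Rui98]: the histogram-intersection measure, the `sim_threshold` and `max_gap` parameters, and the comparison against only the most recent shot of each cluster are simplifying assumptions made for illustration.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized color histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def cluster_shots(histograms, sim_threshold=0.8, max_gap=3):
    """Greedy clustering of shots, given one histogram per shot in temporal order.

    A shot joins the first cluster whose most recent shot is both visually
    similar (>= sim_threshold) and temporally close (at most max_gap shots back);
    otherwise it starts a new cluster.
    """
    clusters = []
    for i, hist in enumerate(histograms):
        for cluster in clusters:
            last = cluster[-1]
            close_in_time = i - last <= max_gap
            if close_in_time and histogram_intersection(hist, histograms[last]) >= sim_threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical three-bin histograms for an A/B/A/B dialogue pattern.
shots = [
    [0.7, 0.2, 0.1],   # shot 0: camera on speaker A
    [0.1, 0.2, 0.7],   # shot 1: camera on speaker B
    [0.6, 0.3, 0.1],   # shot 2: back to A
    [0.1, 0.3, 0.6],   # shot 3: back to B
]
print(cluster_shots(shots))             # [[0, 2], [1, 3]]
print(cluster_shots(shots, max_gap=1))  # [[0], [1], [2], [3]]
```

With a tight temporal window the alternating shots are no longer grouped, which shows how the locality constraint limits the clusters that a later scene-segmentation step has to work with.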

Logical Hypervideo Data Model (LHVDM) • Definition • Hierarchical Video Abstractions • Hot Video Object Modeling • Video Indexing • Video Semantic Association And Hypervideo • A Generic Video Database Architecture

Logical Hypervideo Data Model (con.) • LHVDM = (PV, PVS, LV, LVS, HO, CD, LINKS, UV, MAP) • PV: Set Of Physical Video Streams • PVS: Set Of Physical Video Segments • LV: Set Of Logical Video Streams • LVS: Set Of Logical Video Segments • HO: Set Of Hot Objects • CD: Set Of Content Descriptions • LINKS: Set Of Video Hyperlinks • UV: Set Of User Views • MAP: Set Of Mapping Relations

Logical Hypervideo Data Model (con.) • MAP Includes: • PV <--> PVS: Easy Data Manipulation • PVS <--> LV: Data Independence And Data Reuse • LV <--> LVS: Multi-user View • LV, LVS <--> HO: Effective Query • LV, LVS, HO, CD <--> UV: Multi-user View Sharing • LV, LVS, HO, LINKS <--> CD: Semantic Content-based Access • Video Hyperlinks: Effective Video Browsing

Hierarchical Video Abstractions (Logical Hypervideo Data Model, LHVDM) • User Views (UVs) • Hot Objects (HOs) • Video Hyperlinks • Logical Video Segments (LVSs) • Logical Video Streams (LVs) • Physical Video Segments (PVSs) • Physical Video Streams (PVs)
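The PVS <--> LV mapping and the data independence it provides can be sketched as below. This is an assumed illustration, not the LHVDM implementation: the class names, the `to_physical` method, and the stream identifiers are hypothetical, and only two of the nine model components are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalSegment:
    """PVS: a contiguous frame range of one physical video stream (PV)."""
    pv_id: str
    start: int
    end: int            # inclusive

    def length(self):
        return self.end - self.start + 1


@dataclass
class LogicalStream:
    """LV: an ordered sequence of physical segments, possibly from several PVs."""
    name: str
    segments: list

    def to_physical(self, logical_frame):
        """Map a 0-based logical frame number to (pv_id, physical frame number)."""
        if logical_frame < 0:
            raise IndexError("logical frame out of range")
        offset = logical_frame
        for seg in self.segments:
            if offset < seg.length():
                return seg.pv_id, seg.start + offset
            offset -= seg.length()
        raise IndexError("logical frame out of range")


# Hypothetical logical stream assembled from two physical streams.
highlights = LogicalStream("highlights", [
    PhysicalSegment("tape1", 100, 149),   # 50 frames
    PhysicalSegment("tape2", 0, 29),      # 30 frames
])
print(highlights.to_physical(0))    # ('tape1', 100)
print(highlights.to_physical(50))   # ('tape2', 0)
```

Because annotations would refer to logical frame numbers, the physical segments behind a logical stream can be reused by other logical streams without copying video data.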

Hot Video Objects • What Is A Hot Video Object? • A Logical Video Abstraction • A Sub-Frame Region That Is “Hot” In A Set Of Logical Frame Sequences • Why Call Them “Hot” Objects? • Target Of Interest • Hyperlink Property (Hot Video Spot)
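The hyperlink property of a hot object can be sketched as a hit test on a viewer click. This is a hedged illustration rather than the LHVDM design: the `HotObject` fields, the `follow_click` helper, and the link target string are hypothetical, and the region is assumed to stay fixed over the frame sequence, whereas a real object would usually move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotObject:
    name: str
    first_frame: int     # logical frame range in which the region is "hot"
    last_frame: int      # inclusive
    region: tuple        # (x_min, y_min, x_max, y_max), assumed fixed across frames
    link_target: str     # identifier of the linked video or description

    def contains(self, frame, x, y):
        x_min, y_min, x_max, y_max = self.region
        in_time = self.first_frame <= frame <= self.last_frame
        in_space = x_min <= x <= x_max and y_min <= y <= y_max
        return in_time and in_space


def follow_click(hot_objects, frame, x, y):
    """Return the link target of the first hot object under the click, or None."""
    for obj in hot_objects:
        if obj.contains(frame, x, y):
            return obj.link_target
    return None


# Hypothetical hot object defined over logical frames 0..299.
hot = [HotObject("speaker", 0, 299, (100, 50, 220, 200), "lv:speaker-biography")]
print(follow_click(hot, 120, 150, 100))  # lv:speaker-biography
print(follow_click(hot, 400, 150, 100))  # None
```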
