This proposal discusses the challenges of handling and representing multimedia data in a distributed environment, and proposes a work outline to develop a distributed multimedia storage and management system capable of supporting popular retrieval applications like content-based image/video retrieval.
A Distributed Multimedia Data Management over the Grid
Kasturi Chatterjee
Advisors for this project: Dr. Shu-Ching Chen & Dr. Masoud Sadjadi
Distributed Multimedia Information System Laboratory
School of Computing and Information Sciences
Florida International University, Miami, FL 33199, USA
Outline
• Motivation
  • Why multimedia data?
  • Why is handling and representing multimedia data challenging?
  • Why a distributed environment?
  • Why content-based image/video retrieval?
• Multimedia data management
  • Representation
  • Storage and Indexing
  • Popular retrieval strategies
• Proposed Work Outline
  • Issues to be addressed
  • Components and Related Work
• Conclusion
Global Cyberbridges 2008 Proposal
Motivation
Why multimedia data?
• Attractive
• Informative
• Compact
• Cheap memory makes storage easy
Why is handling and representing multimedia data challenging?
• Huge size (a typical 10-second MPEG video is ~4 MB)
• Temporal and spatial information
• High-level meaning and the semantic gap
• Multidimensional representation
• Traditional databases cannot accommodate the above characteristics
Motivation
Why a distributed environment?
• Shared storage
• Shared resources
• Shared computing power
• No single point of failure
Why content-based image/video retrieval?
• Unlike traditional data, multimedia data must be queried with its temporal, spatial, and semantic content in mind
Can queries be issued textually for image/video databases? Maybe not!
• Metadata
• Keywords
• Example: the query "sunset" in Google Images
Query by example, similarity measurement, content interpretation, user feedback, etc. need to be considered.
Multimedia data management
Representation
• Multidimensional: Unlike traditional data, which is one-dimensional, multimedia data in the form of images or video is multidimensional.
• Semantic interpretation: The same multimedia data can have varied semantic interpretations.
• Feature selection: Identifying the feature space that represents the multimedia data is a crucial step in a multimedia database management system (MDBMS). Features can be color, texture, temporal information, etc.
The atypical nature of multimedia data calls for a special representation in the form of multidimensional feature vectors.
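As an illustration of the feature-vector representation, the sketch below turns an image into a quantized color histogram, one common low-level color feature. It is a minimal example under stated assumptions: the function name `color_histogram`, the bin count, and the pixel-list image format are hypothetical and are not taken from the proposal.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized color histogram with bins_per_channel**3 entries.

    Assumption: the image is given as a flat sequence of RGB pixels.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // bins_per_channel
    top = bins_per_channel - 1
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel, clamping so that 255 lands in the last bin.
        ri = min(r // step, top)
        gi = min(g // step, top)
        bi = min(b // step, top)
        # Flatten the 3-D bin index (ri, gi, bi) into a 1-D position.
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]
```

With `bins_per_channel=4` every image, whatever its size, maps to a 64-dimensional vector, which is the kind of fixed-length multidimensional representation an index structure can work with.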
Multimedia data management
Storage and Indexing
• Indexing is an integral part of database system design: it reduces computation overhead and optimizes retrieval.
Multimedia Data Indexing Requirements
• Multimedia data is stored as multidimensional feature vectors.
• A high-dimensional feature space must be indexed.
• The index structure should map the low-level representation to high-level semantic relationships.
• The index structure should support popular multimedia retrieval strategies such as content-based image retrieval (CBIR), relevance feedback (RF), and video event retrieval.
Existing multidimensional indexing strategies fail to fulfill the above requirements efficiently!
Multimedia data management
• Popular Retrieval Strategies (Content-Based Image/Video Retrieval)
[Figure: retrieval pipeline with the labels Image Database, Feature Descriptor Extraction, Similarity Measurement, Retrieval Results]
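The stages named on this slide (feature extraction, similarity measurement, retrieval results) can be sketched as a query-by-example search. This is a minimal linear-scan illustration, assuming features have already been extracted into vectors and that Euclidean distance is the similarity measure; the name `query_by_example` and the dictionary layout are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def query_by_example(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k database images closest to the query feature vector.

    Assumption: `database` maps an image id to its precomputed feature
    vector, and smaller Euclidean distance means "more similar".
    """
    scored = ((math.dist(query, vec), image_id) for image_id, vec in database.items())
    # nsmallest keeps only the k best candidates while scanning the database.
    return [(image_id, dist) for dist, image_id in heapq.nsmallest(k, scored)]
```

A linear scan computes one distance per stored image, which is exactly the cost an index structure is meant to avoid.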
Proposed Work Outline
[Figure: A typical Grid architecture. Source: http://gridcafe.web.cern.ch/gridcafe/gridatwork/architecture.html]
Proposed Work Outline
Research Issues
• Develop a technique that enables a uniform representation of multimedia data
• Develop an efficient index structure that can handle multimedia data, support applications such as CBIR/CBVR, and span multiple storage sites over a Grid/distributed environment
• Devise a mechanism by which users' concepts of similarity across multiple network domains are taken into account when producing query results
In short, we envision a distributed multimedia storage and management system capable of supporting popular retrieval applications such as CBIR/CBVR.
Proposed Work Outline
Developing and designing multimedia data management over the Grid has two critical components:
• Proper data management, which requires a distributed multidimensional index structure and distributed retrieval algorithms (distributed k-NN or range queries) supported by that index structure
• Efficient retrieval, which requires techniques that map low-level features to high-level semantic concepts over a distributed environment, so that query results are relevant
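One simple way to picture distributed k-NN and range queries is a scatter-gather scheme: each storage node answers the query over its own data, and a coordinator merges the partial answers. The sketch below is an assumption-laden illustration, not the algorithm proposed here: nodes are simulated as in-memory dictionaries, each node does a linear scan instead of using an index, and all function names are hypothetical.

```python
import heapq
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Shard = Dict[str, Sequence[float]]  # one node's data: object id -> feature vector
Hit = Tuple[float, str]             # (distance to query, object id)


def local_knn(query: Sequence[float], shard: Shard, k: int) -> List[Hit]:
    """k nearest objects on a single node, sorted by ascending distance."""
    return heapq.nsmallest(
        k, ((math.dist(query, vec), obj_id) for obj_id, vec in shard.items())
    )


def distributed_knn(query: Sequence[float], shards: List[Shard], k: int) -> List[Hit]:
    """Merge per-node top-k lists into the global top-k.

    Any object in the global top-k is also in its own node's top-k,
    so merging the local answers loses nothing.
    """
    partials = [local_knn(query, shard, k) for shard in shards]
    return list(itertools.islice(heapq.merge(*partials), k))


def distributed_range(query: Sequence[float], shards: List[Shard], radius: float) -> List[Hit]:
    """All objects within `radius` of the query, gathered from every node."""
    hits: List[Hit] = []
    for shard in shards:
        for obj_id, vec in shard.items():
            dist = math.dist(query, vec)
            if dist <= radius:
                hits.append((dist, obj_id))
    return sorted(hits)
```

In this scheme each node ships at most k candidates to the coordinator, so network traffic grows with the number of nodes and k rather than with the size of the data set.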
Proposed Work Outline
Concepts to Be Utilized and Related Work
• We have developed an index structure called the Affinity Hybrid Tree [1] for single-node (stand-alone) applications; it can index multidimensional images/videos and supports CBIR/CBVR.
• We plan to extend it as the basic indexing and storage framework, since it has proved very efficient in stand-alone environments.
• To capture high-level similarity concepts among users in a distributed environment, we will develop a novel architecture called the Distributed Affinity Capture Model (DACM), based on the Hierarchical Markov Model Mediator [2].
Proposed Work Outline: Components
• Affinity Hybrid Tree
  • The feature-based index mechanism filters the feature space and reduces the number of distance computations to be performed, which reduces computational overhead.
  • The distance-based index mechanism incorporates the high-level image relationships as they are, without translating them into low-level equivalents, which increases the relevance of retrieved images by capturing the user concept directly.
Proposed Work Outline: Components
• Building the AH-Tree
[Figure: AH-Tree construction. Feature vectors feed the root; a space index splits them into indexed subspaces (feature space filtering); distance-based indexing within each subspace organizes the indexed data (semantic relationship introduction).]
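The two-stage idea (filter by low-level features first, then rank by high-level relationships) can be illustrated with a toy example. This is not the AH-Tree algorithm from [1]: a uniform grid stands in for the feature-based index levels, a plain dictionary of pairwise affinity scores stands in for the semantic relationships, and every name below is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def cell_of(vec: Sequence[float], cell_size: float) -> Cell:
    """Grid cell containing a feature vector (stand-in for a space index)."""
    return tuple(int(x // cell_size) for x in vec)


def build_grid_index(
    vectors: Dict[str, Sequence[float]], cell_size: float
) -> Dict[Cell, List[str]]:
    """Stage 1 structure: group object ids by the grid cell of their features."""
    index: Dict[Cell, List[str]] = {}
    for obj_id, vec in vectors.items():
        index.setdefault(cell_of(vec, cell_size), []).append(obj_id)
    return index


def two_stage_search(
    query_id: str,
    vectors: Dict[str, Sequence[float]],
    affinity: Dict[Tuple[str, str], float],
    index: Dict[Cell, List[str]],
    cell_size: float,
    k: int,
) -> List[Tuple[str, float]]:
    """Filter by feature-space cell, then rank survivors by semantic distance.

    Assumption: affinity scores lie in [0, 1] and are turned into a distance
    as 1 - affinity; pairs with no recorded affinity get distance 1.
    """
    cell = cell_of(vectors[query_id], cell_size)
    candidates = [c for c in index.get(cell, []) if c != query_id]
    # Only the filtered candidates incur a distance computation.
    scored = sorted((1.0 - affinity.get((query_id, c), 0.0), c) for c in candidates)
    return [(c, d) for d, c in scored[:k]]
```

In this toy version only objects sharing the query's cell are scored, so the number of distance computations depends on the cell population rather than on the size of the whole database.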
Proposed Work Outline: Components
Sample Results
• Computation cost: Feature-space filtering reduces the number of image objects to be examined and hence reduces the number of distance computations manifold.
• Accuracy:
  • AH-Tree: 80%
  • M-Tree: 10-20%
Proposed Work Outline: Components
Hierarchical Markov Model Mediator (HMMM) [2]
• An HMMM is represented by an 8-tuple (d, S, F, A, B, Π, O, L), where
  • d: number of levels in the HMMM
  • S: multimedia objects in the different levels
  • F: distinctive features or semantic concepts (depending on the level)
  • A: affinity relationships between multimedia objects
  • B: features/concepts at each level
  • Π: initial state probability distribution
  • O: weights of importance for the lower-level features and higher-level concepts
  • L: link conditions between higher-level and lower-level states
The model has been used successfully in several applications, such as CBIR and web document clustering.
Tentative Road Map
• Detailed literature review of the following:
  • available data management tools and techniques in Grid computing
  • peer-to-peer file sharing systems
• Development of the following algorithms and models:
  • a distributed k-NN search that supports CBIR/CBVR from within an index structure
  • the Distributed Affinity Capture Model (DACM), to capture users' concept of high-level similarity
• Implementation of the entire system
Conclusion
We propose to:
• Develop an efficient multimedia data management framework over a distributed environment such as the Grid
• Develop distributed content-based retrieval algorithms that span the Grid and provide
  • semantically close query results
  • quickly and efficiently
• Devise a way to capture users' concept of similarity across the Grid (bridging the gap between low-level features and high-level semantics is a challenge) with
  • an architecture called the Distributed Affinity Capture Model (DACM)
Questions
Selected References
[1] Kasturi Chatterjee and Shu-Ching Chen, "A Novel Indexing and Access Mechanism using Affinity Hybrid Tree for Content-Based Image Retrieval in Multimedia Databases," International Journal of Semantic Computing (IJSC), Vol. 1, Issue 2, pp. 147-170, June 2007.
[2] Mei-Ling Shyu, Shu-Ching Chen, Min Chen, Chengcui Zhang, and Chi-Min Shu, "MMM: A Stochastic Mechanism for Image Database Queries," Proceedings of the IEEE Fifth International Symposium on Multimedia Software Engineering (MSE2003), pp. 188-195, December 10-12, 2003, Taichung, Taiwan, ROC.
[3] M.-L. Shyu, S.-C. Chen, C. Haruechaiyasak, C.-M. Shu, and S.-T. Li, "Disjoint Web Document Clustering and Management in Electronic Commerce," the Seventh International Conference on Distributed Multimedia Systems (DMS'2001), pp. 494-497, 2001.
[4] Mei-Ling Shyu, Shu-Ching Chen, Min Chen, Chengcui Zhang, and Kanoksri Sarinnapakorn, "Image Database Retrieval Utilizing Affinity Relationships," accepted for publication, the First ACM International Workshop on Multimedia Databases (ACM MMDB'03), November 7, 2003, New Orleans, Louisiana, USA.