MILOS: An Architecture for Multimedia Digital Libraries and Content Management Applications Pasquale Savino I.S.T.I.
Scope of Digital Library technology [Figure: technologies positioned by structure of data (low to high) against knowledge of users/tasks (low to high): Web, Information Retrieval, Semantic Web, Semistructured data, Databases, Digital Library Technologies]
Digital Libraries today • Focus on Cultural Heritage preservation and access • Access to the OPAC (Online Public Access Catalog) of public libraries, museums, etc. from the Web • New libraries (documents, images, audio/video) with digital multimedia content • Access based on standardized metadata, generic (e.g. Dublin Core) or area-specific • Distributed Web-based architectures • New services available: • Multilingual access • Personalization • Recommendation • Annotation • Collection support
Digital Library Vision • Digital libraries should enable any citizen to access all human knowledge any time and anywhere, in a friendly, multi-modal, efficient, and effective way, by overcoming barriers of distance, language, and culture and by using multiple Internet-connected devices
Digital Library Vision • DL Functionalities • Rich information needs • Multiple sources of related information • Heterogeneous information • Rich data sources • Multimedia information • Defined user populations • Motivated users • Task-orientation • Domain-orientation • Cross-lingual access • Collaboration
Application areas • Multimedia digital archives • Publishing support • Broadcasting support • Production support • E-Learning • Corporate content management • Health and medicine • Biology • Government and Public Administration • …
Why a Content Management System • Digital libraries are used to manage documents containing many different types of data • Many different metadata models • DL software components are typically built for one specific use only • Lack of general purpose building components
The main characteristics of the MCMS • Flexibility • Management of different types of data stored in different repositories with different storage strategies • Capability of describing documents with arbitrary, and possibly heterogeneous, metadata • Support of custom/personalized views on the metadata schema used • Scalability • Management of DLs of different sizes • Dealing with DL evolution • Efficiency
The MILOS MCMS • MILOS is a general purpose Multimedia Content Management System (MCMS) • Manages and serves any multimedia documents • Manages any metadata of documents • MILOS is based on a standard platform • Developed using Web Service technology, which in many cases provides support for authentication, authorization management, distribution, etc. • Mainly developed in Java • Very easy installation (Drag and Drop) • Exploitation of advanced native XML database technology
The MILOS MCMS • Search capabilities: • Traditional fielded search capabilities • Full text search (e.g. on video transcripts) • Search on automatically associated classification categories • Visual content similarity search • The system is not tied to a specific metadata schema • Any XML encoded metadata can be managed by the system (e.g. DC, MPEG-7, ECHO, proprietary model) • Metadata mapping techniques are used to provide users with a homogeneous view • Several different and heterogeneous applications can be supported
Combined search capabilities • Example: retrieve all videos with mountains in the background that discuss the Afghanistan earthquake and are classified as foreign affairs • [Architecture figure, three layers] • Interface Logic: Search Browser and Retrieval Interface (JSP, SOAP communication); MDEdit Metadata Editor (Visual Basic, SOAP communication) • Business Logic (Web services): Multimedia Content Management Server (SOAP Web Service); Repository Metadata Integrator; XML Search Engine (SOAP Web Service); Multimedia document server (SOAP Web Service) • XML Search Engine: structure search, fielded search, full text search, multimedia search, schema independent, XQuery support • Metadata independence: the schema seen in the interface logic can differ from the one(s) used in the repository • Data Logic: Metadata Storage and Retrieval (ECHO, MPEG-7, Dublin Core, …) with Full Text Index, Topic Category Index, and Visual Content Index; Multimedia Server (MPEG-1, MPEG-2, JPEG, …)
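A combined query such as the example above can be read as intersecting the answers of several independent indexes. The following Python sketch illustrates that idea only; it is not MILOS code, and the index contents, video ids, similarity scores, and the `combined_search` function are all hypothetical.

```python
# Hypothetical in-memory stand-ins for three of the indexes named on the slide.
FULL_TEXT_INDEX = {              # transcript word -> video ids
    "earthquake": {"v1", "v2"},
    "election": {"v3"},
}
CATEGORY_INDEX = {               # classification category -> video ids
    "foreign affairs": {"v1", "v3"},
    "sport": {"v2"},
}
VISUAL_SCORES = {                # video id -> assumed similarity to a "mountains" example image
    "v1": 0.9,
    "v2": 0.8,
    "v3": 0.2,
}


def combined_search(word, category, min_similarity):
    """Return ids of videos that satisfy all three conditions at once."""
    text_hits = FULL_TEXT_INDEX.get(word, set())
    category_hits = CATEGORY_INDEX.get(category, set())
    visual_hits = {v for v, s in VISUAL_SCORES.items() if s >= min_similarity}
    # A combined query keeps only the videos found by every index.
    return sorted(text_hits & category_hits & visual_hits)


result = combined_search("earthquake", "foreign affairs", 0.5)
```

A real engine would rank by similarity rather than apply a fixed threshold; the threshold keeps the sketch short.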
Metadata Storage and Retrieval • Based on a native XML database/repository • Solutions based on the use of DB technology may be too inefficient for complex metadata models • Metadata represented in XML • Arbitrary metadata structure allowed • Export/import of metadata easily managed • No XML schema definition is needed • Arbitrary and heterogeneous metadata representations • Search based on XQuery extended with similarity search support • Optional index definition for performance improvements • The system administrator can associate an index with specific XML elements • Support for free text search • Image similarity search
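The points above (no schema required, heterogeneous records, optional indexes on chosen elements) can be illustrated with a small Python sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for a native XML database, so it shows the idea rather than the MILOS implementation; the records, URNs, and function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical metadata records with different structures and no shared schema.
RECORDS = {
    "urn:example:video:1": (
        "<video><title>Mountain report</title>"
        "<transcript>earthquake relief in the mountains</transcript></video>"
    ),
    "urn:example:doc:2": "<record><dc><title>Annual budget</title></dc></record>",
}
PARSED = {urn: ET.fromstring(text) for urn, text in RECORDS.items()}


def fielded_search(path, value):
    """Ids of records where an element matching `path` has exactly this text."""
    return [
        urn for urn, root in PARSED.items()
        if any((el.text or "") == value for el in root.findall(path))
    ]


def full_text_search(word):
    """Ids of records whose text content contains `word` (case-insensitive)."""
    word = word.lower()
    return [
        urn for urn, root in PARSED.items()
        if word in " ".join(root.itertext()).lower().split()
    ]


def build_index(path):
    """Optional index on one element path: element text -> list of record ids."""
    index = {}
    for urn, root in PARSED.items():
        for el in root.findall(path):
            index.setdefault(el.text or "", []).append(urn)
    return index


title_index = build_index(".//title")
```

With the index in place, a lookup such as `title_index["Annual budget"]` avoids scanning every record, which is the role the slide assigns to administrator-defined indexes.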
Multimedia Server • Storage of data of any media type • Support of different storage strategies, which may depend on the application (data size, access and transfer time). The required strategy may change over time. • DL application developers need not specify how and where data are stored, only the performance they want • Use of a mapping between URNs and actual location • Use of rules (based on MIME types) to enforce specific storage strategies
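The last two points, a URN-to-location mapping and MIME-type rules that pick a storage strategy, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the rule patterns, strategy names, path layout, and the `MultimediaServer` class are hypothetical and not taken from MILOS.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, checked in order: (MIME type pattern, storage strategy name).
STORAGE_RULES = [
    ("video/*", "streaming-disk"),
    ("image/*", "fast-disk"),
    ("*", "default-disk"),
]


class MultimediaServer:
    """Stores documents by URN; callers never see the physical location rules."""

    def __init__(self, rules):
        self.rules = rules
        self.locations = {}  # URN -> actual location

    def strategy_for(self, mime_type):
        """Return the strategy of the first rule whose pattern matches."""
        for pattern, strategy in self.rules:
            if fnmatchcase(mime_type, pattern):
                return strategy
        raise LookupError(f"no storage rule matches {mime_type!r}")

    def store(self, urn, mime_type, filename):
        """Choose a location from the rules and record it under the URN."""
        location = f"/{self.strategy_for(mime_type)}/{filename}"
        self.locations[urn] = location
        return location

    def relocate(self, urn, new_location):
        """Change the physical location; the URN seen by applications is unchanged."""
        self.locations[urn] = new_location

    def resolve(self, urn):
        return self.locations[urn]


server = MultimediaServer(STORAGE_RULES)
server.store("urn:example:clip:1", "video/mpeg", "clip.mpg")
```

Because applications hold only the URN, `relocate` can move a document to a different disk when the required strategy changes, without touching application code.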
Repository Metadata Integrator • Metadata independence (via metadata mapping) • Use of schema mapping rules to map application metadata into Metadata Storage • Each rule specifies how to translate a metadata field known to the application into an XPath expression used to access that field in the Metadata Storage • Mapping rules are used to generate the XQuery statements executed in the Metadata Storage and to transform the results back into application metadata
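The rule mechanism described above can be sketched in a few lines of Python. The sketch assumes one rule per application field, each holding a path into the repository schema; the element names (`AVDocument`, `Production`, `Director`), the sample record, and both functions are hypothetical, and `xml.etree.ElementTree` path lookups stand in for the XQuery statements a real repository would run.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: application (Dublin Core style) field -> path in the repository schema.
DC_TO_REPOSITORY = {
    "title": "./AVDocument/Title",
    "creator": "./AVDocument/Production/Director",
}

# A hypothetical repository holding one record in its own, different schema.
REPOSITORY = [
    ET.fromstring(
        "<Record><AVDocument><Title>Historical newsreel</Title>"
        "<Production><Director>A. Example</Director></Production>"
        "</AVDocument></Record>"
    )
]


def search(field, value, rules):
    """Translate an application field into a repository path and run the lookup."""
    path = rules[field]
    return [
        record for record in REPOSITORY
        if any((el.text or "") == value for el in record.findall(path))
    ]


def to_application_view(record, rules):
    """Transform a repository record back into the fields the application knows."""
    view = {}
    for field, path in rules.items():
        element = record.find(path)
        if element is not None:
            view[field] = element.text
    return view


hits = search("title", "Historical newsreel", DC_TO_REPOSITORY)
views = [to_application_view(r, DC_TO_REPOSITORY) for r in hits]
```

Supporting a second repository schema then means supplying another rule table, while `search` and `to_application_view` stay unchanged.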
Access to heterogeneous metadata repositories [Figure: an application providing a Dublin Core view on metadata, accessing a MILOS repository based on ECHO metadata and a MILOS repository based on MPEG-7 metadata]
Ingestion of existing data and metadata in MILOS [Figure: ingestion of data and metadata from a repository using a proprietary metadata model into a MILOS repository based on the proprietary metadata] • New metadata immediately accessible • Possibility to define indexes to speed up retrieval
Distribution and multiple disk storage [Figure: MILOS repository using multiple disk storage, with Multimedia Server components]
Examples of DL archives • Four DLs have been ingested • Reuters data set • 810,000 news agency stories (2.6 GB), text and metadata encoded in XML • ACM Sigmod Record and DBLP data sets • Sigmod Record composed of 46 XML files • DBLP: one single XML file (187 MB) • Different structure, one single interface through mapping mechanisms • The ECHO data set • About 50 hours of historical documentaries (8,000 videos), coming from 4 different countries • 43,000 XML files (36 MB), 21 GB of MPEG-1 video and JPEG images • Image similarity search based on MPEG-7 image descriptors
Indexing videos • Main components: • Entry point • Automatic processing services: speech recognition, segmentation, summarisation, … • Metadata editing station • [Figure: indexing workflow connecting New Film, Film repository, Entry point, Automatic Processing, Manual metadata editing, Metadata repository, and Video and Metadata repository]
Searching videos • Examples of queries • Metadata associated with the entire video • E.g. find b&w videos produced before World War II by Istituto Luce • Metadata associated with video shots • E.g. find a shot where the audio transcript contains the words “Attentato Banca Nazionale dell’Agricoltura” • Metadata associated with single frames • E.g. find a video that contains a frame similar to this image [the image is provided as an example] • Any combination of the previous cases • Main components: • Video Search • Access to metadata DB • Full text search on transcripts • Image similarity search • Cross-language retrieval on selected metadata fields and transcript • Query formulation • Metadata fields • Audio transcripts • Video key frames • Cross-language queries • [Figure: Transcript repository, Video key frames repository, Video and Metadata repository]
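The frame-level query ("find a video that contains a frame similar to this image") can be pictured as a nearest-neighbour search over key-frame descriptors. The Python sketch below is an illustration under simplifying assumptions: descriptors are plain three-number vectors compared by Euclidean distance, whereas real MPEG-7 descriptors and their distance measures are more elaborate, and the video names, values, and `similar_videos` function are hypothetical.

```python
import math

# Hypothetical key-frame descriptors: (video id, frame number) -> feature vector.
KEY_FRAMES = {
    ("video-a", 0): [0.9, 0.1, 0.0],
    ("video-a", 1): [0.2, 0.7, 0.1],
    ("video-b", 0): [0.1, 0.1, 0.8],
    ("video-c", 0): [0.25, 0.65, 0.1],
}


def similar_videos(query_descriptor, k):
    """Return the k videos whose closest key frame is nearest to the query image."""
    best = {}  # video id -> smallest distance over its key frames
    for (video, _frame), descriptor in KEY_FRAMES.items():
        distance = math.dist(query_descriptor, descriptor)
        if video not in best or distance < best[video]:
            best[video] = distance
    return sorted(best, key=best.get)[:k]


top = similar_videos([0.2, 0.7, 0.1], 2)
```

Each video is scored by its best-matching frame, so one similar frame is enough for a video to rank highly, which matches the wording of the example query.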
ECHO metadata model • Supports a multi-layer and hierarchical description of audio-video documents • Description of different aspects of the same document • The model can be adapted to specific application needs • Describes metadata that are automatically extracted as well as metadata manually extracted
ECHO metadata model • Extends the IFLA-FRBR model • Four entities used to describe different aspects of a resource: WORK, EXPRESSION, MANIFESTATION, ITEM • WORK: describes a distinct intellectual or artistic creation • It is the abstract idea of a creation • We do not specify if we realize a book, a film, or a cartoon; this is described by the Expression entity • Examples of Work are: the terrorist attack at Banca Nazionale dell’Agricoltura; 2001: A Space Odyssey; … • EXPRESSION: intellectual or artistic realisation of a work in the form of alphanumeric, musical, or choreographic notation, sound, image, etc. • No information on the physical embodiment is given • Examples of Expression are: TV news on the terrorist attack; a documentary on the terrorist attack; interviews on the terrorist attack; … • MANIFESTATION: physical embodiment of an expression • E.g. manuscripts, books, maps, sound, CD-ROM • ITEM: a single exemplar of a manifestation
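The four-entity hierarchy can be written down as a small data model. The Python sketch below follows only the entity definitions given above; the field names, the sample values, and the `work_of` helper are assumptions made for illustration and do not reproduce the actual ECHO schema.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Abstract idea of a creation, independent of any realisation."""
    title: str


@dataclass
class Expression:
    """Intellectual or artistic realisation of a work (no physical details)."""
    work: Work
    form: str


@dataclass
class Manifestation:
    """Physical embodiment of an expression."""
    expression: Expression
    carrier: str


@dataclass
class Item:
    """A single exemplar of a manifestation."""
    manifestation: Manifestation
    identifier: str


def work_of(item):
    """Walk up the hierarchy from a concrete exemplar to the abstract work."""
    return item.manifestation.expression.work


attack = Work("The terrorist attack at Banca Nazionale dell'Agricoltura")
news = Expression(attack, "TV news")
tape = Manifestation(news, "MPEG-1 video file")  # hypothetical carrier
copy = Item(tape, "copy-001")                    # hypothetical identifier
```

Several expressions (TV news, a documentary, interviews) can point at the same `Work`, which is how the model describes different aspects of one resource.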
MILOS Demo • Start