AVIR: Audio-Visual Information Retrieval for Non Expert Users. R. Leonardi, Univ. of Brescia Email: leon@ing.unibs.it http://www.extra.research.philips.com/euprojects/avir. AVIR PROJECT. Audio Video Indexing and Retrieval for non-IT-expert users ESPRIT project 28798
AVIR: Audio-Visual Information Retrieval for Non Expert Users
R. Leonardi, Univ. of Brescia
Email: leon@ing.unibs.it
http://www.extra.research.philips.com/euprojects/avir
AVIR PROJECT
Audio Video Indexing and Retrieval for non-IT-expert users
ESPRIT project 28798
Start date: September 1998
Duration: 2 years
Theme: Information Access & Interfaces
Context: Video metadata production and applications to digital TV programme guides
AVIR Consortium
• Philips - NL (prime contractor)
• Philips LEP - F
• RAI Radiotelevisione Italiana - I
• Tecmath - D
• TV Spielfilm Verlag - D
• University of Brescia - I
• University of Paris, Pierre et Marie Curie - F
+ BBC Archive - GB (sponsor)
AVIR objective
Audio Video Indexing and Retrieval for non-IT-expert users
Objective: create end-to-end solutions for delivering new added-value services on top of video broadcast systems
Focus - Personalised TV information access:
• Content/Service provider: indexing system generating metadata
• Delivery of service: stream of AV content descriptors
• Consumer system: advanced EPG on a personalised TV receiver + recorder, with “intelligent” filtering and search
AVIR delivery chain
[Figure: block diagram of the chain from Content Provider Systems and Service Provider System through the Service Delivery System to the Consumer System. Labels: Video, A/V Archive, Indexing, Descriptors, Metadata DB, MUXing, A/V Content + Metadata, DVB, Internet, Receiver, Storage]
AVIR broadcast services
• Two kinds of services:
  • enriched TV programme description - attractors (RAI)
  • full-fledged electronic programme guide (TV Spielfilm)
• No return channel needed.
• Use of intelligent software agents based on user profiles.
• Multimodal interaction for information filtering and advanced retrieval.
A key issue is the use of high-capacity consumer video recorders, which will result in a paradigm shift from the VCR to a personal multimedia repository (VOD).
Home Storage and Interoperability
Keywords: low costs, short term exploitation
• The cost of storage decreases quickly, the cost of bandwidth does not => fully interactive services will not arrive soon
• High-capacity home digital video recorders will soon become available (DVHS, 50 hrs, ‘99 - video discs, 10-12 GB, 2001)
• A broadband delivery channel such as DVB is suited to deliver service information commonly used by many users
• Low-cost home storage devices can satisfy the different interests of each user
Shift from a linear model of broadcast services to an interactive system for infotainment, thanks to the intermediating role of the storage device
Research issues in AVIR
• AV content analysis and indexing
• Speaker-independent continuous speech recognition in noisy environments
• Intelligent software agents for information filtering and searching
• User profiling, cooperative annotation and filtering
• Multimodal interfaces (representation and interfaces)
• AV search and retrieval based on text or visual info
• Voice control (speech recognition)
• Applications on consumer platforms
Content and Service Provider Systems
AVIR will develop new techniques for semi-automatic content extraction from AV material
• Unsupervised learning system for video sequence indexing
• Structured key-info in database (text, pics, clips) with content description interface to ensure interoperability with consumer systems
• Procedures for operators to generate metadata (annotation) for internal management and distribution to the public
• Descriptors must be streamable, partly linked with the content, partly repeated in a carousel
• Multiplexing at system level with content in the DVB stream
Consumer System
• Descriptors are extracted, analysed and stored in a database (automatic indexing) with references (locators) to AV material and documents
• Descriptors help users to easily navigate between different resources (DVB/Internet programmes and services, on-air, scheduled, or stored on the system)
• Intelligent software agents, based on user interest profiles, can take care of filtering and recording AV programmes and information on behalf of the user
Metadata will also be used for easy management of AV material and resources in the storage system (e.g. garbage collection)
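The profile-based filtering agent described on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the descriptor fields (`title`, `keywords`), the keyword-weight profile and the threshold rule are hypothetical and are not taken from the AVIR specification.

```python
# Hypothetical sketch: a user profile is a map from keyword to interest weight,
# and each broadcast descriptor carries a title and a keyword list.

def interest_score(keywords, profile):
    """Sum the profile weights of the distinct keywords of one descriptor."""
    return sum(profile.get(word, 0.0) for word in set(keywords))

def select_for_recording(descriptors, profile, threshold=1.0):
    """Return titles whose score reaches the threshold, best match first."""
    scored = [(interest_score(d["keywords"], profile), d["title"]) for d in descriptors]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: -item[0])
    return [title for _, title in kept]

# Illustrative data only.
descriptors = [
    {"title": "Evening news", "keywords": ["news", "politics"]},
    {"title": "Cup final", "keywords": ["sport", "football"]},
    {"title": "Cooking show", "keywords": ["food"]},
]
profile = {"sport": 0.8, "football": 0.9, "news": 0.4, "food": -0.5}

to_record = select_for_recording(descriptors, profile)
```

With this assumed profile, only the programme whose keywords match strong interests passes the threshold; a real agent would presumably also weigh storage space and schedule conflicts.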
Metadata in AVIR
Interest in international standardization (MPEG-7) as to:
• AV consumer applications (specific profile?)
• push and broadcast applications (streamability, scalability, etc.)
• consumer browsing and search on local AV databases (user-friendliness of procedures, etc.)
• Definition of adequate DS’s and D’s for application needs
Applications will be tested in experiments with users.
Metadata for TV broadcasting
MPEG-7 I.S. ready in year 2001: short term solutions needed for DVB?
• DVB-SI extended with TV-Anytime
• New MHP (Multimedia Home Platform) solution using DVB data carousels
Visual content extraction methods
• Temporal segmentation of video
• Shot separation
• Correlations between non-consecutive camera records (VQ)
• Shot description
• Editing effects
• “Mosaicing”, outlier detection
• Camera motion descriptors
Audio Analysis
• Speech / Music / Noise / Silence separation
• Audio model
• Characteristic features
• Classification method
• Speaker indexing and clustering
• Script alignment with speech for 3 movies (~ 270 min.)
• Specification of vocal server experimentation for speech transcription (French language)
Contributions to MPEG-7
• DDL: Description scheme Definition Language (2): P625b, m4591 (UPMC)
• DS: Description Scheme (3): 655 (PhNL), 502 (UNIBS), 624 (UPMC)
• D: Descriptors (10): 635, 636 (LEP); 384, 488, 490, 491, 492, 493, 494, 497 (UNIBS)
• Non normative tools (extraction methods) (3): 499, 500, 501 (UNIBS)
Editing effect extraction method (XM)
[Figure: examples of the editing effects handled: cut, wipe, dissolve]
University of Brescia
Statistical independence of shots
[Figure: Fin image and Fout image]
• The histograms associated with the two images are those of two independent random variables.
Statistical independence of shots
[Figure: central frame of a dissolve]
• Histogram of the central frame of a dissolve = convolution of the scaled In and Out shot histograms.
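The convolution property stated on this slide can be checked numerically with a small sketch. It assumes that the central frame of a dissolve is the equal-weight blend (Fin + Fout) / 2 and makes the two shots exactly independent by pairing every In value with every Out value; the pixel values and the number of grey levels are illustrative. To avoid rounding, the histogram is taken on the axis of the sum Fin + Fout, which is the central-frame axis scaled by 2.

```python
from itertools import product

def histogram(values, levels):
    """Normalised histogram of integer values in the range 0..levels-1."""
    counts = [0.0] * levels
    for v in values:
        counts[v] += 1.0
    return [c / len(values) for c in counts]

def convolve(p, q):
    """Discrete convolution of two histograms."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

LEVELS = 4                    # illustrative number of grey levels
in_pixels = [0, 0, 1, 3]      # samples from the In shot (made up)
out_pixels = [2, 3, 3, 3]     # samples from the Out shot (made up)

# Pairing every In value with every Out value makes the two sources independent.
sums = [a + b for a, b in product(in_pixels, out_pixels)]

observed = histogram(sums, 2 * LEVELS - 1)
predicted = convolve(histogram(in_pixels, LEVELS), histogram(out_pixels, LEVELS))
max_error = max(abs(o - p) for o, p in zip(observed, predicted))
```

In this construction the observed and predicted histograms coincide. On real footage the match would only be approximate, which is presumably what makes it usable as a dissolve test.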
Mosaic generation process
[Figure: block diagram. Labels: Current Image Fn, Perspective motion model, Warping Estimation, Warping, Warped Image WFn, Object based weighting operator, Weight Map, Error Map, Blending, Mosaic Accretion, Previous Mosaic Mn-1, Mosaic Mn]
Camera model
For any image point, the velocity induced by the camera motion is given by: [equation not recoverable from the slide text]
[Figure: camera axes with the motion labels tracking (Tx), booming (Ty), panning (Ry), tilting (Rx), rolling (Rz) and zooming (Rzoom), the focal length f, the image plane, and a scene point P(X,Y,Z)]
An external coordinate system OXYZ moving with the camera, and the corresponding retinal coordinates (x,y)
Camera motion parameters extraction
[Figure: processing diagram. Labels: frames over time t (COMPRESSED / UNCOMPRESSED), 8 by 8 Block matching, Video Shot, Instantaneous camera features estimation, Long term motion analysis, MPEG-7 Camera motion descriptors]
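One way the instantaneous estimation step could work is a least-squares fit of a global motion model to the block-matching vectors. The sketch below is an assumption, not the method from the slides: it uses a simplified four-parameter model (pan, tilt, zoom, roll) with u = pan + zoom·x − roll·y and v = tilt + roll·x + zoom·y, where (x, y) is a block centre relative to the image centre and (u, v) its motion vector. The function name and the sample values are hypothetical.

```python
def fit_camera_motion(blocks):
    """Least-squares fit of (pan, tilt, zoom, roll) to block motion vectors.

    blocks: list of (x, y, u, v) tuples, one per matched block.
    Assumed model: u = pan + zoom*x - roll*y, v = tilt + roll*x + zoom*y.
    """
    n = len(blocks)
    mx = sum(b[0] for b in blocks) / n
    my = sum(b[1] for b in blocks) / n
    mu = sum(b[2] for b in blocks) / n
    mv = sum(b[3] for b in blocks) / n
    spread = sum((x - mx) ** 2 + (y - my) ** 2 for x, y, _, _ in blocks)
    if spread == 0:
        raise ValueError("block positions must not all coincide")
    zoom = sum((x - mx) * (u - mu) + (y - my) * (v - mv) for x, y, u, v in blocks) / spread
    roll = sum((x - mx) * (v - mv) - (y - my) * (u - mu) for x, y, u, v in blocks) / spread
    pan = mu - zoom * mx + roll * my
    tilt = mv - roll * mx - zoom * my
    return pan, tilt, zoom, roll

# Synthetic vectors on a grid of 8-pixel blocks, generated from known parameters.
TRUE = (1.5, -0.5, 0.02, 0.01)  # pan, tilt, zoom, roll (illustrative values)
blocks = []
for x in range(0, 40, 8):
    for y in range(0, 40, 8):
        u = TRUE[0] + TRUE[2] * x - TRUE[3] * y
        v = TRUE[1] + TRUE[3] * x + TRUE[2] * y
        blocks.append((x, y, u, v))

estimate = fit_camera_motion(blocks)
```

On noise-free synthetic vectors the fit recovers the generating parameters. Real block-matching output contains outliers from moving objects, so a practical system would need a robust variant of this fit.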
Results on “Stefan” sequence
[Figure panels: key-frame, mosaic, camera motion parameters]
Results on “Coastguard” sequence
[Figure panels: key-frame, mosaic, camera motion parameters]
Measuring Shot Correlations
• For each shot, construct a VQ codebook (“videms”), so as to allow a given reconstruction quality.
• Two shots $S_1$ and $S_2$ are declared similar when
  $d(S_1, S_2) = \|D_{C_2}(S_1) - D_{C_1}(S_1)\| + \|D_{C_1}(S_2) - D_{C_2}(S_2)\|$
  is sufficiently small.
• Assign indices accordingly.
[Figure label: Dialogue]
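The shot distance above can be illustrated with a small sketch. It relies on an interpretation the slide does not spell out: D_C(S) is taken to be the mean squared quantisation error obtained when the feature vectors of shot S are coded with codebook C, so the norms reduce to absolute values. The codebook training (a few Lloyd iterations with a deterministic start), the 2-D feature vectors and the codebook size are all illustrative choices.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vector, codebook):
    """Index of the codeword closest to the vector."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def train_codebook(vectors, k, iterations=10):
    """Tiny Lloyd (k-means) codebook trainer with evenly spaced initial codewords."""
    k = min(k, len(vectors))
    step = len(vectors) // k
    codebook = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook

def distortion(vectors, codebook):
    """Assumed D_C(S): mean squared error of coding the vectors with the codebook."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)

def shot_distance(s1, s2, k=2):
    c1, c2 = train_codebook(s1, k), train_codebook(s2, k)
    return (abs(distortion(s1, c2) - distortion(s1, c1))
            + abs(distortion(s2, c1) - distortion(s2, c2)))

# Made-up 2-D feature vectors: shots A and B look alike, shot C differs.
shot_a = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
shot_b = [(0, 1), (1, 1), (10, 10), (11, 11), (0, 0), (10, 11)]
shot_c = [(50, 50), (51, 50), (50, 51), (60, 0), (61, 0), (60, 1)]

d_ab = shot_distance(shot_a, shot_b)
d_ac = shot_distance(shot_a, shot_c)
```

Shots whose content is well coded by each other's codebook get a small distance, which is how alternating shots of the same speakers in a dialogue could end up with the same index.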
Query Engine for MPEG-7 description
• Characteristics of Query Engine
  • Parsing DS and Descriptions: checking description validity vs DS
  • Querying Descriptions
    • TOCAI based
    • query-by-example / similarity based retrieval
    • value based query associated with a specific attribute
    • agent based querying
• Architecture issues under investigation
  • Need for standard parser interface
  • Need for persistent parsing representation
  • Need to meet consumer system specification
TOCAI description scheme
• Features
  • multiple levels of abstraction
  • multiple ordering capability: chronological / “alphabetical”
• Analogy: indexing of a book (with enhanced features)
  • Table of Content (ToC)
    • What is the book about? (chapters/sections/subsections/paragraphs)
  • Analytical index (AI)
    • Find all pages containing this topic: keyword search.
TOCAI description scheme
• Table of Content (ToC): NAVIGATION
  • Maintain the chronological order
  • Hierarchical overview (multi-layer semantics)
• Analytical index (AI): RETRIEVAL
  • Create an order of “key elements” according to a certain “ordering key”
  • “ordering key”: color, size, speed, scene type...
  • key element
    • key-image: mosaic, MPEG-4 object.
    • key-scene: dialogue, action, …
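The two views of TOCAI can be sketched as a data structure: a hierarchical table of contents kept in chronological order, plus an analytical index that re-orders key elements by a chosen ordering key. The class and field names below (`Segment`, `attributes`, the `speed` key) are hypothetical illustrations and do not reproduce the actual description scheme syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    title: str
    start: int  # start time in seconds
    end: int    # end time in seconds
    children: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # candidate ordering keys

def toc_lines(segment, depth=0):
    """Table of Content: hierarchical overview in chronological order."""
    lines = ["  " * depth + f"{segment.title} [{segment.start}-{segment.end}]"]
    for child in sorted(segment.children, key=lambda s: s.start):
        lines.extend(toc_lines(child, depth + 1))
    return lines

def leaves(segment):
    """Yield the lowest-level segments, used here as key elements."""
    if not segment.children:
        yield segment
    else:
        for child in segment.children:
            yield from leaves(child)

def analytical_index(root, ordering_key):
    """Analytical index: key elements sorted by one ordering key."""
    elements = [s for s in leaves(root) if ordering_key in s.attributes]
    return sorted(elements, key=lambda s: s.attributes[ordering_key])

# Illustrative programme with two scenes and three shots.
programme = Segment("Programme", 0, 120, children=[
    Segment("Scene 1: dialogue", 0, 60, children=[
        Segment("Shot 1", 0, 25, attributes={"speed": 0.2}),
        Segment("Shot 2", 25, 60, attributes={"speed": 0.9}),
    ]),
    Segment("Scene 2: action", 60, 120, children=[
        Segment("Shot 3", 60, 120, attributes={"speed": 0.5}),
    ]),
])

toc = toc_lines(programme)
by_speed = [s.title for s in analytical_index(programme, "speed")]
```

Navigation walks `toc` from top to bottom, while retrieval picks an ordering key and scans `by_speed`, much as a reader uses a book's contents page and its index for different purposes.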
Conclusion
• AVIR objective
• AVIR delivery chain
• Consumer provider system specification
• Automatic extraction tools
• Adequate DS (TOCAI) for navigation and retrieval
• Adequate D’s: camera motion parameters, editing effects, mosaicing, temporal video segmentations (shots/scenes)