Explore the scope, components, and objectives of the MPEG-7 audio metadata standard. Learn about the low-level and high-level features useful for multimedia applications, and about the applications and extraction methods of MPEG-7 descriptors.
MPEG-7 Audio Overview Ichiro Fujinaga MUMT 611 McGill University

Content
• MPEG-7 overview
• Objectives and scope
• Main elements and organization
• MPEG-7 audio
• Low-level features
• High-level features and tools
MUMT611 Fujinaga

Introduction
• (formally) Multimedia Content Description Interface
• MPEG-1, 2, 4: Content coding and representation
• MPEG-7: Metadata (1998-2001)
• standardized descriptions and description schemes of structures and content of multimedia
• a language to specify such descriptions and description schemes
• Interoperable interface that defines syntax and semantics
• Modalities: audio, visual, or multimedia
• Aspects: media, meta, structural, or semantic
• Applications: searching, filtering, navigation

Scope
• The goal is to provide interoperability among multimedia applications in
• Generation
• Management
• Distribution
• Consumption

Application domains
• Broadcast media selection (radio channel, TV channel)
• Digital libraries (film, video, audio and radio archives)
• E-Commerce (personalized advertising)
• Education (repositories of multimedia courses, multimedia search for support material)
• Home Entertainment (management of personal multimedia collections, including manipulation of content, e.g. karaoke)
• Journalism (searching speeches of a certain politician using his name, his voice or his face)
• Multimedia directory services (yellow pages)
• Surveillance and remote sensing

Components (XML)
• MPEG-7 Systems
• MPEG-7 Description Definition Language
• MPEG-7 Visual
• MPEG-7 Audio
• MPEG-7 Multimedia Description Schemes
• Reference Software: the eXperimentation Model (test)
• MPEG-7 Conformance (syntax checking)
• MPEG-7 Extraction and use of descriptions (technical report)

Other Standards
• SMPTE
• EBU
• TV-Anytime
• DIG-35
• Dublin Core
• OCLC/RLG

MPEG-7 Objectives
• Information about the content
• Form: e.g. the coding format used
• Conditions for accessing the material:
• Intellectual property rights / price
• Classification: e.g. parental rating
• Links to other relevant materials
• Context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men”
• Information present in the content:
• Combination of low-level and high-level descriptors

Where do the descriptions come from?
• Preservation of existing descriptive data through the production/delivery
• Generated automatically by capture devices (e.g. time or GPS location in a camera)
• Extracted automatically and semi-automatically
• Manually produced (e.g. for legacy material such as existing film archives)

Main Elements of MPEG-7
• Description Tools: (textual / binary)
• Descriptors (D): define the syntax and the semantics of each feature (metadata element)
• Description Schemes (DS): relationships between components
• Description Definition Language (DDL):
• Defines the syntax of the MPEG-7 Description Tools
• Creation, extension, and modification of DSs
• System tools:
• Storage and transmission, synchronization of descriptions with content, multiplexing of descriptions, etc.
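
The split between Descriptors (single features) and Description Schemes (structures that relate them) can be sketched in a few lines of Python. This is an illustration only: the element names `AudioSegmentDescription`, `Title`, and `Power`, and both helper functions, are hypothetical and are not taken from the normative MPEG-7 schema. Only the textual (XML) form is shown, not the binary one.

```python
import xml.etree.ElementTree as ET


def build_description(title, power_values):
    """Build a toy XML description (hypothetical element names, not MPEG-7 schema)."""
    # The root plays the role of a Description Scheme: it groups descriptors.
    root = ET.Element("AudioSegmentDescription")
    # Each child plays the role of a Descriptor: one feature with a value.
    ET.SubElement(root, "Title").text = title
    power = ET.SubElement(root, "Power")
    power.text = " ".join(f"{v:.3f}" for v in power_values)
    return root


def read_power(root):
    """Read the toy Power descriptor back as a list of floats."""
    return [float(x) for x in root.find("Power").text.split()]


# Round trip: serialize to text, then parse again as a consumer would.
root = build_description("Example clip", [0.25, 0.5, 0.125])
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)
```

Because producer and consumer agree on the same element names and value syntax, the description survives the round trip, which is the interoperability point the slides make.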

Main Elements of MPEG-7
[Figure: Salembier and Avaro (2001)]

Description Tools
• Creation and production processes: (director, title)
• Usage: (broadcast schedule)
• Storage features
• Structural information: (spatial-temporal components)
• Segmentations
• Low-level features: (sound timbres, melody description)
• Conceptual information: (objects and events, interactions)
• Navigation and access: (summaries, variations)
• Collections of objects
• User-content interactions: (user preferences, usage history)

MPEG-7 Audio
• Audio provides structures for describing audio content, building upon some basic structures from the MDS (Multimedia Description Schemes).
• Low-level features
• audio features that cut across many applications
• High-level features and tools
• more specific to a set of applications

Low-level Features
• Two low-level descriptor types (for sample and segment)
• Scalar: (e.g. power or fundamental frequency)
• Vector: (e.g. spectra)
• Hierarchical, consistent interface
• Any descriptor inheriting from these types can be instantiated, describing a segment with a single summary value or a series of sampled values, as the application requires.
• Scalable series (hierarchical re-sampling)
• Progressively down-sample the data contained in a series (application-oriented)
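
The idea of a scalar series that is progressively down-sampled can be sketched as follows. This is a minimal illustration, not the normative MPEG-7 definition: the function names, the non-overlapping framing, the use of the mean as the summary value, and the default ratio of 2 are all assumptions made for the example.

```python
from statistics import fmean


def frame_power(samples, frame_len):
    """Scalar series sketch: mean squared amplitude per non-overlapping frame."""
    return [
        fmean([x * x for x in samples[i:i + frame_len]])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def downsample(series, ratio=2):
    """One re-sampling level: replace each group of `ratio` values by its mean.

    A trailing group shorter than `ratio` is averaged over the values it has.
    """
    return [fmean(series[i:i + ratio]) for i in range(0, len(series), ratio)]


def scalable_levels(series, ratio=2):
    """Return every level from the full series down to a single summary value."""
    if ratio < 2:
        raise ValueError("ratio must be at least 2")
    levels = [list(series)]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1], ratio))
    return levels


samples = [1, -1, 2, -2, 0, 0, 3, 3]
power = frame_power(samples, frame_len=2)
levels = scalable_levels(power)
```

An application can then pick whichever level suits it: the full series for detailed matching, or the last level as a single summary value for the segment.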

Low-level Features
[Figure: Salembier and Avaro (2001)]

High-level Features
• Exchange some generality for descriptive richness:
• a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge.
• Audio Signature (DS)
• Musical Instrument Timbre
• Melody
• General Sound Recognition and Indexing
• Spoken Content
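
As one illustration of a high-level melody feature, the sketch below quantizes the interval between successive notes into a five-step contour (same, small step up or down, large step up or down). It is a rough sketch under stated assumptions: the function names are hypothetical, and the 50-cent and 250-cent thresholds are illustrative parameters chosen for the example rather than values quoted from the standard.

```python
import math


def interval_cents(f_prev, f_next):
    """Interval between two fundamental frequencies in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_next / f_prev)


def contour_step(cents, small=50.0, large=250.0):
    """Quantize an interval into one of five steps: -2, -1, 0, 1, 2.

    `small` and `large` are assumed thresholds for this example.
    """
    if abs(cents) < small:
        return 0
    step = 1 if abs(cents) < large else 2
    return step if cents > 0 else -step


def melody_contour(freqs_hz):
    """Contour of a note sequence given as fundamental frequencies in Hz."""
    return [
        contour_step(interval_cents(a, b))
        for a, b in zip(freqs_hz, freqs_hz[1:])
    ]


# A4, A4 again, a whole tone up, up to A5, then an octave back down to A4.
notes = [440.0, 440.0, 440.0 * 2 ** (2 / 12), 880.0, 440.0]
contour = melody_contour(notes)
```

A contour like this discards exact pitch, which is the trade the slide describes: less generality in exchange for a compact description suited to tasks such as query by humming.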

Recent Development
• New audio description tools specified (MPEG-7 version 2):
• Audio signal quality
• Audio tempo
• Chord pattern
• Rhythm pattern
• Multi-channel

References
• Chang, S., T. Sikora, and A. Puri. 2001. Overview of MPEG-7 Standard. IEEE Transactions on Circuits and Systems for Video Technology 11 (6): 688-95.
• Martínez, J. 2004. MPEG-7 Overview. http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
• Quackenbush, S., and A. Lindsay. 2001. Overview of MPEG-7 audio. IEEE Transactions on Circuits and Systems for Video Technology 11 (6): 725-9.
• Salembier, P., and O. Avaro. 2000. MPEG-7: Multimedia Content Description Interface. http://gps-tsc.upc.es/imatge/_Philippe/demo/MPEG21_MPEG7.pdf