MPEG-7 Audio Overview

MPEG-7 Audio Overview Beinan Li MUMT 611 Week 2 2005. 1. 20

Content • MPEG-7 overview • What is… • Why? • Objectives and scope • Main elements and organization. • MPEG-7 Audio • Low-level features • High-level tools

What is MPEG-7 • “Multimedia Content Description Interface” • ISO/IEC standard by MPEG (Moving Picture Experts Group) • Provides metadata for multimedia • MPEG-1, -2, -4: make content available; MPEG-7: makes content accessible, retrievable, filterable, manageable (via device / computer). • Supports multiple degrees of interpretation of the information’s meaning • Supports as broad a range of applications as possible. • A standard that is extensible and compatible with existing technology.

Why MPEG-7 • “The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed.” • Past: scarcity of digital multimedia sources -> simple access mechanisms sufficed • Now: growing amount of audiovisual information -> identifying and managing it efficiently is becoming more difficult, e.g. “record only news about sport.”

Why MPEG-7 • For future multimedia services, content representation and description may have to be addressed jointly. • Many services dealing with content representation will have to deal first with content description • “a non-described content may be useless” • Need for access only to the content description: • New original services (e.g. optimizing personal time) • Adaptation to networks and terminal capabilities

Application domains (incomplete) • Broadcast media selection (e.g., radio channel, TV channel). • Digital libraries (e.g., film, video, audio and radio archives). • E-Commerce (e.g., personalized advertising). • Education (e.g., repositories of multimedia courses, multimedia search for support material). • Home Entertainment (e.g., management of personal multimedia collections, including manipulation of content, e.g. karaoke). • Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face). • Multimedia directory services (e.g. yellow pages, GIS). • Surveillance and remote sensing.

MPEG-7 Objectives • Standardize content-based description for various types of audiovisual information • Independent from media support (encoding and storage) • Different granularity • Low-level features: shape, size, key, tempo changes • High-level semantic info: “scene with a barking brown dog on the left and with the sound of passing cars in the background.” • Meaningful in the context of the application • Same material -> different types of features and combinations, e.g. timbre vs. loudness

MPEG-7 Objectives • Information about the content • The form: e.g. the coding format used • Conditions for accessing the material: e.g. intellectual property rights / price • Classification: e.g. parental rating • Links to other relevant materials • The context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men” • Information present in the content: • Combination of low-level and high-level descriptors

Scope of the Standard processing chain:

An example of architecture • Pull: (Client Queries -> Descriptions repository -> Matched Ds) • Push: (Filter descriptions -> Programmed actions)
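The pull and push modes above can be sketched as follows. This is only an illustration, assuming descriptions are plain dictionaries held in memory; the repository contents, field names and function names are all hypothetical and are not part of the standard.

```python
# Hypothetical in-memory description repository; field names are illustrative only.
REPOSITORY = [
    {"id": "clip1", "genre": "news", "topic": "sport"},
    {"id": "clip2", "genre": "news", "topic": "weather"},
    {"id": "clip3", "genre": "film", "topic": "drama"},
]

def matches(description, criteria):
    """True when every criterion equals the corresponding description field."""
    return all(description.get(key) == value for key, value in criteria.items())

def pull(repository, query):
    """Pull: a client query is matched against the stored descriptions."""
    return [d for d in repository if matches(d, query)]

def push(stream, profile, action):
    """Push: incoming descriptions are filtered and trigger a programmed action."""
    return [action(d) for d in stream if matches(d, profile)]

matched = pull(REPOSITORY, {"genre": "news"})
recorded = push(REPOSITORY, {"genre": "news", "topic": "sport"},
                lambda d: "record " + d["id"])
```

The push call mirrors the earlier “record only news about sport” example: the filter runs on descriptions only, without touching the content itself.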

Workplan

Where are the descriptions from? • Preservation of existing descriptive data (e.g. scripts) through production and delivery • Generated automatically by capture devices (e.g. time or GPS location in a camera) • Extracted automatically or semi-automatically (i.e. with some human assistance) • Manually produced (e.g. for legacy material such as existing film archives)

Main Elements of MPEG-7 • Description Tools: (textual / binary) • Descriptors (D): define the syntax and the semantics of each feature (metadata element) • Description Schemes (DS): relationships between components • Description Definition Language (DDL): • Defines the syntax of the MPEG-7 Description Tools • Creation, extension and modification of DSs • System tools: • Storage and transmission, synchronization of descriptions with content, multiplexing of descriptions, etc.

Main Elements of MPEG-7 • Relationship among elements introduced above.

Description Tools • Creation and production processes: (director, title) • Usage: (broadcast schedule) • Storage features. • Structural information: (spatial-temporal components) • Segmentations • Low level features: (sound timbres, melody description) • Conceptual information: (objects and events, interactions) • Navigation and access: (summaries, variations) • Collections of objects. • User-content interactions: (user preferences, usage history)

Organization of Description Tools

Descriptions (further) • MPEG-7 approaches the description of content from several viewpoints. • A set of methods and tools for the different viewpoints of the description (not a monolithic system) • Interrelated and can be combined in many ways. • Associated with the content itself: (searching, filtering) • Location: (document vs. stream) • physically located with the material • somewhere else on the globe (maybe not) • Interoperability with other metadata standards: (XML)

Use of Description Tools • The description tools are presented on the basis of the functionality they provide. • In practice, they are combined into meaningful sets of description units. • Furthermore, each application will have to select a sub-set of descriptors and DSs. • Library of tools! • DDL can be used to handle specific needs of the application. (like scripting in many current applications)

Major Functionalities • MPEG-7 Systems • MPEG-7 Description Definition Language • MPEG-7 Visual • MPEG-7 Audio • MPEG-7 Multimedia Description Schemes (D.T.) • Reference Software: the eXperimentation Model (test) • MPEG-7 Conformance (syntax checking) • MPEG-7 Extraction and use of descriptions (technical report)

MPEG-7 Audio • Audio provides structures for describing audio content, building upon some basic structures from the MDS. • Low-level Descriptors: • audio features that cut across many applications • High-level Description Tools: • more specific to a set of applications.

Low-level Features • “MPEG-7 Audio Framework”: • Two low-level descriptor types: (for sample and segment) • Scalar: (e.g. power or fundamental frequency) • Vector: (e.g. spectra) • Hierarchical, consistent interface • Any descriptor inheriting from these types can be instantiated, describing a segment with a single summary value or a series of sampled values, as the application requires. • Scalable Series: (hierarchical re-sampling) • Progressively down-sample the data contained in a series (application-oriented)
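The idea of a scalable series, progressively down-sampling a series until a single summary value remains, can be sketched as below. The mean is used here as an assumed summary operation, and the function names are hypothetical; the standard defines its own scaling operations and syntax.

```python
def downsample_mean(series, ratio):
    """Replace each run of `ratio` samples by its mean (the last run may be shorter)."""
    groups = [series[i:i + ratio] for i in range(0, len(series), ratio)]
    return [sum(g) / len(g) for g in groups]

def scalable_series(series, ratio=2):
    """Return progressively coarser versions of `series`, ending in one summary value."""
    if ratio < 2:
        raise ValueError("ratio must be at least 2, otherwise the series never shrinks")
    levels = [list(series)]
    while len(levels[-1]) > 1:
        levels.append(downsample_mean(levels[-1], ratio))
    return levels

levels = scalable_series([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
```

An application would keep whichever level matches its needs: the full series for detailed analysis, or the last level as a single summary of the segment.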

Low-level Features (types) • Basic • Basic Spectral • Signal Parameters • Timbral Temporal • Timbral Spectral • Spectral Basis • MPEG-7 Silence Descriptor

Low-level Features (graph)

Low-level Features (details) • Basic: (temporally sampled scalar values for general use) • AudioWaveform Descriptor • waveform envelope: (for display purposes). • AudioPower Descriptor • temporally-smoothed instantaneous power: (quick summary of a signal) • Applicable to all kinds of signals
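As a rough illustration of the two Basic descriptors, the sketch below computes a per-frame (min, max) envelope and a per-frame mean-square power. The non-overlapping framing and the function names are assumptions of this sketch, not the normative extraction procedure.

```python
import math

def audio_power(samples, frame_len):
    """Mean of squared sample values over consecutive non-overlapping frames."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers

def waveform_envelope(samples, frame_len):
    """(min, max) per frame: a compact envelope suitable for display."""
    return [(min(samples[s:s + frame_len]), max(samples[s:s + frame_len]))
            for s in range(0, len(samples) - frame_len + 1, frame_len)]

# Test signal: 1 kHz sine at 8 kHz sampling, so each 80-sample frame holds 10 full periods.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]
powers = audio_power(tone, 80)
envelope = waveform_envelope(tone, 80)
```

For a unit-amplitude sine the power of every frame comes out close to 0.5, and the envelope stays close to (-1, 1).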

Low-level Features (details) • Basic Spectral: (single time-frequency analysis of signal) • AudioSpectrumEnvelope: (Base class) • the short-term power spectrum: (display, synthesize, general-purpose search) • AudioSpectrumCentroid: • is the spectrum dominated by high or low frequencies? • AudioSpectrumSpread: • is the power spectrum centered near the spectral centroid, or spread out over the spectrum? • distinguishes pure-tone and noise-like sounds • AudioSpectrumFlatness: (the presence of tonal components)
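The three summary measures can be illustrated on a toy band-power spectrum. In this sketch the centroid and spread are computed on a log2(f / 1 kHz) axis and flatness is the ratio of geometric to arithmetic mean; the axis choice, band layout and function names are assumptions of the sketch and should be checked against the standard before being relied on.

```python
import math

def centroid_and_spread(freqs, powers, ref_hz=1000.0):
    """Power-weighted mean and RMS deviation on a log2(f / ref_hz) axis (octaves)."""
    total = sum(powers)
    octaves = [math.log2(f / ref_hz) for f in freqs]
    centroid = sum(o * p for o, p in zip(octaves, powers)) / total
    variance = sum((o - centroid) ** 2 * p for o, p in zip(octaves, powers)) / total
    return centroid, math.sqrt(variance)

def flatness(powers):
    """Geometric mean over arithmetic mean; requires strictly positive powers.
    Close to 1 for noise-like spectra, close to 0 when a tonal component dominates."""
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic

freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
tone = [1e-6, 1e-6, 1.0, 1e-6, 1e-6]   # energy concentrated at 1 kHz
noise = [1.0, 1.0, 1.0, 1.0, 1.0]      # equal energy in every band

tone_centroid, tone_spread = centroid_and_spread(freqs, tone)
noise_centroid, noise_spread = centroid_and_spread(freqs, noise)
```

Both toy spectra share the same centroid, so the centroid alone cannot separate them; spread and flatness do.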

Low-level Features (details) • Signal Parameters: (periodic or quasi-periodic signals) • AudioFundamentalFrequency: • includes a “confidence measure”, since “pitch-tracking” extraction is not perfectly accurate • AudioHarmonicity: • distinction between sounds with a harmonic / inharmonic / non-harmonic spectrum
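One way to picture a fundamental-frequency value paired with a confidence measure is a plain autocorrelation estimator, sketched below. This is a generic textbook technique swapped in for illustration, not the extraction method prescribed by MPEG-7; the function name and the search-range parameters are hypothetical.

```python
import math

def estimate_f0(samples, sample_rate, min_hz, max_hz):
    """Return (f0_hz, confidence) from the strongest normalized autocorrelation peak.
    Confidence is the peak autocorrelation divided by the signal energy."""
    energy = sum(x * x for x in samples)
    if energy == 0:
        return 0.0, 0.0
    best_lag, best_score = None, 0.0
    for lag in range(int(sample_rate / max_hz), int(sample_rate / min_hz) + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        score = corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return 0.0, 0.0
    return sample_rate / best_lag, best_score

# Test signal: 200 Hz sine at 8 kHz sampling (period of exactly 40 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0, confidence = estimate_f0(tone, sample_rate, min_hz=100, max_hz=400)
```

A strongly periodic signal yields a confidence near 1, while noise-like input yields a low value, which a consumer of the description can use to discount unreliable pitch estimates.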

Low-level Features (details) • Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre) • LogAttackTime • TemporalCentroid • where in time the energy of a signal is focused. • Useful when attack times are identical
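A minimal sketch of the two temporal descriptors, working on an amplitude envelope: the attack is taken from the first point exceeding a small fraction of the peak up to the peak itself, and the temporal centroid is the envelope-weighted mean time. The 2% start threshold and the function names are assumptions of this sketch.

```python
import math

def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """log10 of the time between the envelope first reaching a small fraction
    of its peak and the peak itself (at least one sample, to avoid log10(0))."""
    peak = max(envelope)
    peak_index = envelope.index(peak)
    start_index = next(i for i, v in enumerate(envelope) if v >= start_fraction * peak)
    duration = max(peak_index - start_index, 1) / sample_rate
    return math.log10(duration)

def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time in seconds: where the energy is focused."""
    total = sum(envelope)
    return sum((i / sample_rate) * v for i, v in enumerate(envelope)) / total

sample_rate = 100  # envelope samples per second (hypothetical)
short_decay = [0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 0.5, 0.25, 0.0]
long_decay = [0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

lat_short = log_attack_time(short_decay, sample_rate)
lat_long = log_attack_time(long_decay, sample_rate)
tc_short = temporal_centroid(short_decay, sample_rate)
tc_long = temporal_centroid(long_decay, sample_rate)
```

The two envelopes have identical attacks, so their LogAttackTime values coincide, but the slower decay moves the temporal centroid later; this is the case in which the centroid is the more useful of the two.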

Low-level Features (details) • Timbral Spectral: (spectral features in a linear-frequency space) • SpectralCentroid: • power-weighted average of the frequency of the bins in the linear power spectrum. • distinguishing musical instrument timbres • 4 Ds for harmonic regularly-spaced components of signals: • HarmonicSpectralCentroid • HarmonicSpectralDeviation • HarmonicSpectralSpread • HarmonicSpectralVariation
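The SpectralCentroid definition above, a power-weighted average of bin frequencies in the linear power spectrum, can be sketched directly. The naive DFT here is for illustration only (an FFT would be used in practice), and windowing and framing details are left out as assumptions.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power spectrum for bins 0 .. N/2 (O(N^2), illustration only)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))) ** 2
            for k in range(n // 2 + 1)]

def spectral_centroid(samples, sample_rate):
    """Power-weighted average of the bin frequencies, in Hz."""
    spectrum = power_spectrum(samples)
    n = len(samples)
    total = sum(spectrum)
    return sum((k * sample_rate / n) * p for k, p in enumerate(spectrum)) / total

# Test signals chosen so that each sine falls exactly on a DFT bin (no leakage).
sample_rate, n = 8000, 64
low = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]
mix = [math.sin(2 * math.pi * 1000 * i / sample_rate)
       + math.sin(2 * math.pi * 3000 * i / sample_rate) for i in range(n)]

centroid_low = spectral_centroid(low, sample_rate)
centroid_mix = spectral_centroid(mix, sample_rate)
```

Adding an equally strong 3 kHz partial moves the centroid from about 1000 Hz to about 2000 Hz, which is the kind of shift that helps tell brighter timbres from duller ones.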

Low-level Features (details) • Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition) • AudioSpectrumBasis: • a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum. • AudioSpectrumProjection: • low-dimensional features of a spectrum after projection upon a reduced-rank basis. • independent subspaces of a spectrum correlate strongly with different sound sources. • Provide more salience using less space. • Used with the Sound Classification and Indexing Description Tools.
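The basis-and-projection idea can be sketched in its simplest form: a rank-1 basis, found here by power iteration as a stand-in for a full singular value decomposition, followed by projection of every spectral frame onto it. The normalization step and the independence analysis mentioned above are omitted, and all names are hypothetical.

```python
import math

def dominant_basis(frames, iterations=100):
    """Leading right singular vector of `frames` (rows are spectra), obtained by
    power iteration on X^T X. Assumes at least one frame is non-zero."""
    n_bins = len(frames[0])
    v = [1.0 / math.sqrt(n_bins)] * n_bins
    for _ in range(iterations):
        xv = [sum(f[j] * v[j] for j in range(n_bins)) for f in frames]      # X v
        w = [sum(frames[i][j] * xv[i] for i in range(len(frames)))          # X^T (X v)
             for j in range(n_bins)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(frames, basis):
    """One coefficient per frame: the spectrum projected onto the basis vector."""
    return [sum(f[j] * basis[j] for j in range(len(basis))) for f in frames]

# Three frames sharing one spectral shape at different levels (a rank-1 example).
frames = [[3.0, 4.0, 0.0], [6.0, 8.0, 0.0], [1.5, 2.0, 0.0]]
basis = dominant_basis(frames)
features = project(frames, basis)
```

Each three-bin frame is reduced to a single coefficient with no loss in this example, which is the sense in which the projection offers more salience in less space.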

Low-level Features (details) • Silence segment: (no significant sound) • aid further segmentation of the audio stream, or as a hint not to process a segment
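A simple way to obtain silence segments is to threshold a per-frame power series and keep only runs that last long enough. The threshold, the minimum duration and the function name below are hypothetical choices for illustration.

```python
def silence_segments(powers, threshold, min_frames):
    """Return (start, end) frame index pairs, end exclusive, for runs of at least
    `min_frames` consecutive frames whose power stays below `threshold`."""
    segments, start = [], None
    # A sentinel equal to the threshold closes a run that reaches the end of the series.
    for i, p in enumerate(list(powers) + [threshold]):
        if p < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                segments.append((start, i))
            start = None
    return segments

frame_powers = [0.5, 0.001, 0.002, 0.001, 0.4, 0.0, 0.3, 0.0, 0.0, 0.0]
segments = silence_segments(frame_powers, threshold=0.01, min_frames=3)
```

The single quiet frame at index 5 is ignored because it is too short, while the two longer runs are reported and could be skipped by later processing stages.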

High-level audio Description Tools (Ds and DSs) • Exchange some generality for descriptive richness: • a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge. • Audio Signature (DS) • Musical Instrument Timbre • Melody • General Sound Recognition and Indexing • Spoken Content

High-level audio Description Tools (details) • Audio Signature Description Scheme • SpectralFlatness Ds • a unique content identifier for the purpose of robust automatic identification • e.g. audio fingerprinting

High-level audio Description Tools (details) • Musical Instrument Timbre Description Tools • HarmonicInstrumentTimbre Ds: • LogAttackTime Descriptor • PercussiveInstrumentTimbre Ds: • SpectralCentroid Descriptor

High-level audio Description Tools (details) • Melody Description Tools: • efficient, robust, and expressive melodic similarity matching. • MelodyContour Description Scheme: • terse, efficient melody contour / rhythm • MelodySequence Description Scheme: • verbose, complete, expressive melody / rhythm. • Interval encoding
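A terse contour representation can be illustrated by quantizing each interval between successive notes to five levels. In this sketch pitches are MIDI note numbers, and the thresholds (a step of up to two semitones counts as small, anything larger as a leap) are an assumption rather than the normative definition.

```python
def melody_contour(midi_notes):
    """Quantize successive pitch intervals (in semitones) to five levels, -2 .. +2."""
    contour = []
    for previous, current in zip(midi_notes, midi_notes[1:]):
        step = current - previous
        if step == 0:
            level = 0                        # repeated note
        elif abs(step) <= 2:
            level = 1 if step > 0 else -1    # small step up or down
        else:
            level = 2 if step > 0 else -2    # larger leap up or down
        contour.append(level)
    return contour

# C4 D4 E4 C4 C4 G4 F#4
contour = melody_contour([60, 62, 64, 60, 60, 67, 66])
```

Because only the direction and rough size of each interval are kept, the same contour results when the melody is transposed, which helps similarity matching against imprecise queries such as humming.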

High-level audio Description Tools (details) • General Sound Recognition and Indexing Description Tools: • SoundModel Description Scheme • SoundClassificationModel Description Scheme • a set of SoundModel DS -> multi-way classifier • SoundModelStatePath Descriptor • indices to states generated by a SoundModel of a segment • immediately applied to sound effects • automatically index and segment sound tracks. • Low -> mid -> high level analyses
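The relationship between a set of sound models, a multi-way classifier and a state path can be sketched with toy hidden Markov models decoded by the Viterbi algorithm. The two models, their probabilities and the discrete observation symbols are invented for illustration; how real SoundModel descriptions are trained and encoded is not shown here.

```python
import math

def viterbi(obs, start, trans, emit):
    """Return (best state index path, log-probability) for a discrete-observation HMM."""
    n_states = len(start)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    best_score, path = score[state], [state]
    for pointers in reversed(back):          # follow back-pointers to the first frame
        state = pointers[state]
        path.append(state)
    return path[::-1], best_score

# Hypothetical sound models: one with two distinct phases, one with no structure.
MODELS = {
    "two_phase": {"start": [0.5, 0.5],
                  "trans": [[0.9, 0.1], [0.1, 0.9]],
                  "emit": [[0.9, 0.1], [0.1, 0.9]]},
    "uniform": {"start": [0.5, 0.5],
                "trans": [[0.5, 0.5], [0.5, 0.5]],
                "emit": [[0.5, 0.5], [0.5, 0.5]]},
}

def classify(obs, models):
    """Multi-way classification: pick the model whose best state path scores highest."""
    results = {name: viterbi(obs, m["start"], m["trans"], m["emit"])
               for name, m in models.items()}
    best = max(results, key=lambda name: results[name][1])
    return best, results[best][0]

label, state_path = classify([0, 0, 0, 1, 1, 1], MODELS)
```

The returned label plays the role of the classification result, and the list of state indices is the kind of information a state path carries: it also marks where the segment changes character, which is what makes it usable for indexing and segmentation.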

High-level audio Description Tools (details) • Spoken Content Description Tools: • detailed description of words spoken within an audio stream. • indexing into and retrieval of an audio stream • indexing of multimedia objects annotated with speech. • Recall of audio/video data by memorable spoken events. • a character or person spoke a particular word • Spoken Document Retrieval • separate spoken documents • Annotated Media Retrieval • photograph retrieved using a spoken annotation

Development • Currently under development: • MPEG-7 Audio COR.1 (currently at DCOR1) • MPEG-7 Amendment 1 (currently at FPDAM1) • New Audio Description Tools specified (MPEG-7 version 2): • Spoken Content: • Audio Signal Quality: • Audio Tempo: • Currently Proposed tools: • Low Level Descriptor for Audio Intensity • Low Level Descriptor for Audio Spectrum Envelope Evolution • Generic mechanism for data representation based on ‘modulation decomposition’ • MPEG-7 Audio-specific binary representation of descriptors

MPEG-7 version 1 Schedule • Call for Proposals October 1998 • Evaluation February 1999 • First version of Working Draft (WD) December 1999 • Committee Draft (CD) October 2000 • Final Committee Draft (FCD) February 2001 • Final Draft International Standard (FDIS) July 2001 • International Standard (IS) September 2001

MPEG-7 work plan: • See : Annex A of MPEG-7 Overview (version 9) http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm

Annotated Link Page / References • http://www.music.mcgill.ca/~damonli/611/611_w2.htm • All pictures taken from: • P. Salembier and O. Avaro, “MPEG-7: Multimedia Content Description Interface”, http://gps-tsc.upc.es/imatge/_Philippe/demo/MPEG21_MPEG7.pdf
