MPEG-4

MPEG-4. John Lazzaro and John Wawrzynek, June 18, 2001. Modified by Francois Thibault, January 20, 2003. Further modified by Ichiro Fujinaga, January 20, 2005. CS Division, University of California at Berkeley. www.cs.berkeley.edu/~johnw

MPEG 4 Standard • Standardization process finalized in 1999 (Vancouver) • Designed to integrate visual and audio content • Includes "natural" (recorded) and "synthetic" (synthesized) coding of audio and video

MPEG 4 Scope • Provides a set of technologies to satisfy the needs of • authors • network service providers • end users • Enables the production of content that has far greater reusability in • digital television • animated graphics • web pages

MPEG 4 Features MPEG-4 provides standardized ways to: • represent units of aural, visual or audiovisual content, called “media objects”, of natural origin (recorded with a camera or microphone) or synthetic origin (generated with a computer) • describe the composition of these objects to create compound media objects that form audiovisual scenes • multiplex and synchronize the data associated with media objects, so that they can be transported over networks providing a QoS (Quality of Service) • interact with the audiovisual scene generated at the receiver’s end

MPEG 4 Standard (audio) [Diagram: MPEG 4 comprises video, audio, and system. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS). ISO/IEC 14496-3 sec5]

MPEG 4 Audio: Natural (recorded) • AAC (Advanced Audio Coding): originally created as an extension to MPEG-2; provides better quality at 64 kbit/sec/channel than MP3 does at 128 kbit/sec/channel • CELP (codebook-excited linear prediction): a scheme optimized for telephone-quality transmission of speech in the range 8-32 kbps • Parametric: a novel "harmonic vector + noise" method that allows lossy but extremely low-bitrate coding of wideband sounds, down to 2 kbps/channel
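To put the bitrates above in perspective, the following minimal sketch compares them against uncompressed PCM. The baseline of 44.1 kHz, 16-bit audio is an assumption made here for illustration; the slides do not name a reference format, and the function names are hypothetical.

```python
# Assumed baseline (not from the slides): CD-style PCM, 44.1 kHz, 16 bits per sample.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16


def pcm_kbps_per_channel(sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Bitrate of one uncompressed PCM channel, in kbit/sec."""
    return sample_rate_hz * bits / 1000.0


def compression_ratio(coded_kbps, pcm_kbps=None):
    """How many times smaller the coded stream is than the PCM baseline."""
    if pcm_kbps is None:
        pcm_kbps = pcm_kbps_per_channel()
    return pcm_kbps / coded_kbps


if __name__ == "__main__":
    # Per-channel bitrates quoted on the slide.
    for name, kbps in [("AAC", 64), ("MP3", 128), ("Parametric", 2)]:
        print(f"{name}: {kbps} kbit/sec, ratio {compression_ratio(kbps):.1f}:1")
```

The ratios only describe size, not quality; the slide's point is that AAC reaches better quality than MP3 at half the bitrate.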

MPEG 4 Audio: Synthetic (synthesized) • Structured Audio: • A downloadable synthesis method that allows producers to describe new synthesis methods as part of the bitstream • the receiver implements a reconfigurable synthesis engine and synthesizes the sound on-the-fly as the instructions are received • Text-to-Speech: • An interface to standalone TTS systems is provided, so that synthetic speech can be synchronized in multimedia presentations • No "method" of creating synthetic speech is standardized by MPEG

MPEG 4 Standard - Structured Audio [Diagram: MPEG 4 comprises video, audio, and system. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).] Structured Audio: One “component” in the MPEG audio standard. ISO/IEC 14496-3 sec5

Audio Compression Basics • Traditional Technique for Music [Diagram: the encoder takes an amplitude-versus-time signal, filters it into critical bands, computes masking, allocates bits, and formats the bit-stream for the decoder.]

The Kolmogorov alternative: • Write a computer program that generates the desired audio stream. • Transmit the computer program. • To decode, execute the program. Similar to PostScript! • MPEG-4 Structured Audio (MP4-SA) uses this approach. • Eric Scheirer, Editor (MIT Media Lab). • http://sound.media.mit.edu/~eds/mpeg4/
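The idea of transmitting a program instead of samples can be illustrated with a toy sketch. This is not MP4-SA: the "bitstream" here is simply Python source text, the names `PROGRAM`, `decode`, and `pcm16_bytes` are hypothetical, and `exec` is used only because the program text is trusted and defined in the same file.

```python
import math
import struct

# The transmitted "program": source text that defines render() when executed.
PROGRAM = '''
import math
def render(sample_rate=44100, seconds=2.0, freq=440.0):
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq * i / sample_rate) for i in range(n)]
'''


def decode(program_text):
    """Decode by executing the program and collecting the samples it generates."""
    namespace = {}
    exec(program_text, namespace)  # illustration only; never exec untrusted text
    return namespace["render"]()


def pcm16_bytes(samples):
    """Pack float samples in [-1, 1] as little-endian 16-bit PCM."""
    ints = [int(round(32767 * s)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


if __name__ == "__main__":
    samples = decode(PROGRAM)
    pcm = pcm16_bytes(samples)
    print(len(PROGRAM.encode()), "bytes of program ->", len(pcm), "bytes of PCM")
```

A pure sine tone is the easiest possible case. The size advantage shrinks as the sound becomes harder to describe algorithmically.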

MP4-SA Encoding • may be a creative act: writing a program, directly (emacs) or indirectly (GUI, webpage). In this case, MP4-SA is a lossless compressor. • may be automatic: given a sound, an encoder writes a program that generates the sound. Automatic encoding is hard in the general case. MP4-SA Decoders • are interpreters or compilers.

Key Application: Music Production MP4-SA Maps to Modern Music Production • “The Program”: synthesis algorithms, effects “boxes”, mixers • “The Decoder”: sound rendering [Diagram labels: Network; Premium on low-bandwidth; Musical performance; Mix-down control information] • Modern music production is computer-based. • Musicians enter performances into computers as control information, not audio waveforms. • Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.

Key Application: Music Production MP4-SA Maps to Modern Music Production • “The Program”: synthesis algorithms, effects “boxes”, mixers • “The Decoder”: sound rendering [Diagram labels: Standard Framework; File System; Musical performance; Mix-down control information] • Modern music production is computer-based. • Musicians enter performances into computers as control information, not audio waveforms. • Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control. Ideal for collaborative productions, remixes, and ...

Key Application: Music Performance MP4-SA Enables Networked Music Performance [Diagram labels: two instances of “The Decoder” (sound rendering) joined by a Network; Premium on low-bandwidth] • Music performance requires dynamic control. • True interactivity requires parameterized sounds. • Musicians control instruments and effects with interactive controllers. • Control could be indirect and remote (ex: games).

MPEG 4 Structured Audio: • A binary file format that encodes: • The programming language SAOL (pronounced: sail). • The musical score language SASL. • Legacy support for MIDI. • Audio sample data. • Result is normative: an MP4-SA file will sound identical on all compliant decoders. • Different from MIDI files.

Why SAOL and MP4-SA? Why not Java? • Musical performances have temporal structure that changes over several timescales: • Sample-by-sample: 10’s of usec • Amplitude & timbre envelopes: 10’s of msec • Note-by-note: 100’s of msec • Writing sound generation code in a conventional language results in code dominated by time-scale management. • Hard to maintain, hard to optimize.

Time management is built into SAOL. • A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion. • Work is scheduled to happen: • at the a-rate (the audio sample rate) • at the k-rate (envelope control rate) • at the i-rate (rate for new notes) • Language variables are typed as a/k/i-rate. • A language statement is scheduled based on the rate of the variables it contains.
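The three rates above can be pictured as nested loops driven by a simulated clock. The sketch below only counts how often code at each rate would run for a single note; it is not a SAOL interpreter, and the default values of `s_rate` and `k_rate` are assumptions chosen for round numbers.

```python
def run_instrument(duration_s, s_rate=8000, k_rate=100):
    """Count the i-, k-, and a-rate passes executed for one note.

    Assumes s_rate is an integer multiple of k_rate, so each control
    cycle contains a whole number of audio samples.
    """
    samples_per_k = s_rate // k_rate
    counts = {"i": 0, "k": 0, "a": 0}

    counts["i"] += 1                      # i-rate: once, when the note starts
    k_cycles = int(round(duration_s * k_rate))
    for _ in range(k_cycles):
        counts["k"] += 1                  # k-rate: once per control cycle
        for _ in range(samples_per_k):
            counts["a"] += 1              # a-rate: once per audio sample
    return counts


if __name__ == "__main__":
    print(run_instrument(0.75))
```

Because the scheduling lives in the loop structure rather than in the instrument code, a statement only needs to declare its rate through the variables it uses.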

SAOL, SASL, and Scheduling: • Sound creation in MP4-SA can be compared to a musician playing notes on an instrument. • A SAOL subprogram (called an instr or instrument) serves as the instrument. • SASL commands (called score lines) act to play notes on SAOL instruments. • Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

An example: • This SASL file plays a melody on tone:
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
• Fields of each score line, in order: when the instance is launched, the instrument name, how long the instrument runs, and the instance parameters (note number, loudness). • The SAOL instrument tone plays a gated sine wave. (SAOL code in next slide.)
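The score-line layout can be made concrete with a small parser. This is a sketch that handles only the two kinds of line shown in the example (instrument lines and `end`); real SASL has more commands, and the `Note` type and `parse_score` function are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple

SCORE = """\
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""


class Note(NamedTuple):
    start: float                 # when the instance is launched
    instr: str                   # which instrument to instantiate
    dur: float                   # how long the instrument runs
    params: Tuple[float, ...]    # instance parameters (note number, loudness)


def parse_score(text: str) -> Tuple[List[Note], Optional[float]]:
    """Parse instrument lines and the 'end' line of a score."""
    notes: List[Note] = []
    end_time: Optional[float] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[1] == "end":
            end_time = float(fields[0])
            continue
        params = tuple(float(f) for f in fields[3:])
        notes.append(Note(float(fields[0]), fields[1], float(fields[2]), params))
    return notes, end_time


if __name__ == "__main__":
    notes, end_time = parse_score(SCORE)
    for n in notes:
        print(n)
    print("end at", end_time)
```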

SAOL code for tone
instr tone (note, loudness)
{
  ivar a;         // sets osc f
  ksig env;       // env output
  asig x, y;      // osc state
  asig init;

  a = 2*sin(3.141593*cpsmidi(note)/s_rate);
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);
  if (init == 0)  // first a-pass only
  {
    x = loudness;
    init = 1;
  }
  x = x - a*y;    // the FLOPS happen in
  y = y + a*x;    // these 3 statements
  output(y*env);  // creates audio output
}                 // end of instr tone
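For readers without a SAOL decoder, the same recurrence can be sketched in Python. This is a rough port under stated assumptions, not a rendering of the instrument as a compliant decoder would run it: the envelope is evaluated per sample rather than at the k-rate, `cpsmidi` is re-implemented here with the usual equal-tempered mapping (note 69 = 440 Hz), and `dur` is assumed to be at least 0.2.

```python
import math


def cpsmidi(note):
    """MIDI note number to frequency in Hz (assumes A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def envelope(t, dur):
    """Piecewise-linear gate: 0 -> 0.5 in 0.1, hold, 0.5 -> 0 in 0.1."""
    if t < 0.1:
        return 0.5 * t / 0.1
    if t < dur - 0.1:
        return 0.5
    if t < dur:
        return 0.5 * (dur - t) / 0.1
    return 0.0


def tone(note, loudness, dur, s_rate=44100):
    """Gated sine wave from the two-multiply oscillator used in the instr."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)  # sets osc frequency
    x, y = loudness, 0.0                                  # osc state
    out = []
    for i in range(int(round(dur * s_rate))):
        x = x - a * y          # same update order as the SAOL code:
        y = y + a * x          # y uses the freshly updated x
        out.append(y * envelope(i / s_rate, dur))
    return out


if __name__ == "__main__":
    samples = tone(69, 0.25, 0.5)
    print(len(samples), "samples, peak", max(abs(s) for s in samples))
```

The update order matters: using the new `x` when computing `y` keeps the recurrence a stable sinusoid whose frequency is exactly the one encoded in `a`.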

SAOL Features • Rate semantics: i/k/a-rate execution • Vector arithmetic: ex: A = B + C means, for i = 1..n, A[i] = B[i] + C[i] • All floating-point arithmetic. • Extensive built-in audio function library: signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...

Sfront - a SAOL-to-C translator [Diagram: foo.mp4 (SAOL, SASL, MIDI, uncompressed samples) → sfront → sa.c] • Handles SAOL, SASL, MIDI, uncompressed samples. • Converts MP4-SA files to an ANSI C program that, when executed, produces audio. • Runs on UNIX, Windows, MacOS. • Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP (Real-time Transport Protocol). • www.cs.berkeley.edu/~lazzaro/sa

Generator Techniques • Much of the SA standard describes a library • 104 core opcodes (ex: pow(), allpass(), reverb() ) • 16 wave table generators (ex: harm, spline, random) • Sfront optimizes the code produced for each library element instance based on the invocation attributes • rate, width, size, constancy, integral nature of the parameters, number of parameters

Conclusions • MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space. • Physical Modeling good • Sampling Natural Instruments bad • If models are chosen carefully, compression ratios of 100 to 10,000 are possible. • MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
