MPEG-4 Multimedia Standard Olivier Dechazal
Agenda • Overview • Audio coding • Video coding • System
Definition of MPEG-4 applications “A coded, streamable representation of audio-visual objects and their associated time-variant data along with a description of how they are combined”
History • MPEG-4 is an ISO/IEC standard developed by the Moving Picture Experts Group (MPEG). • MPEG-4 was finalized in October 1998 and became an International Standard in the first months of 1999. • Fully backward-compatible extensions, titled MPEG-4 Version 2, were frozen at the end of 1999 and acquired formal International Standard status in early 2000.
Applications for MPEG-4 • Create content once, play on any network • 3 levels of quality • Multimedia database, video games • Video phone, Multimedia authoring • Mobile videophone, wireless LAN
Examples of possible scenes [Figure: 2D content placed inside a 3D plane, drawn with X, Y, Z axes and the point (0,1,0)]
Examples of possible scenes [Figure: animated text + video + still images]
Main characteristics • Allows more interaction by the user • Scalable • Object based: scenes can be composed of natural and synthetic objects • Very different from MPEG-1 and MPEG-2
MPEG-4 audio coders [Diagram: MPEG-4 comprises video, audio, and system; audio splits into natural coding (GA, CELP, Parametric) and synthetic coding (SA, TTS)]
GA • Stands for General Audio • The input signal is first decomposed into a time/frequency spectral representation by means of an analysis filter bank • The spectral values are then quantized and coded with the AAC or TwinVQ coder
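The two GA steps above (time/frequency decomposition, then quantization) can be illustrated with a minimal sketch. This is not the AAC or TwinVQ algorithm: a plain orthonormal DCT stands in for the analysis filter bank, a uniform quantizer with a hypothetical step size stands in for the actual coders, and all function names are illustrative assumptions.

```python
import math

def dct(frame):
    """Orthonormal DCT-II: time-domain frame -> spectral coefficients."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of dct(): spectral coefficients -> time-domain frame."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(c * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out

def quantize(coeffs, step):
    """Uniform quantizer: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Map integer indices back to approximate coefficient values."""
    return [q * step for q in indices]

# Encode and decode one 16-sample frame (step size is an arbitrary choice).
frame = [math.sin(2 * math.pi * 3 * i / 16) for i in range(16)]
indices = quantize(dct(frame), step=0.05)
decoded = idct(dequantize(indices, step=0.05))
```

A larger step gives fewer distinct indices (fewer bits) and a larger reconstruction error, which is the basic rate/quality trade-off behind the bit rates quoted for the coders.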
AAC • High quality • Higher bit rate (above 32 kbit/s) • About twice the compression of MP3 for the same quality
TwinVQ • Transform-domain Weighted Interleave Vector Quantization • Used at lower bit rates
CELP • Code Excited Linear Prediction • Voice coding technique • Used at bit rates between 6 and 24 kbit/s • 2 sampling rates: 8 and 16 kHz
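The "linear prediction" part of CELP can be sketched as follows. This shows only a textbook LP analysis (autocorrelation method with the Levinson-Durbin recursion) and the resulting residual; the codebook search that gives CELP its name is omitted, and the function names are assumptions, not part of the standard.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] such that
    x[n] ~ sum(a[k-1] * x[n-k] for k = 1..order), plus the prediction error."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def residual(signal, coeffs):
    """Prediction residual: what is left after removing the predictable part."""
    out = []
    for n, x in enumerate(signal):
        pred = sum(c * signal[n - k - 1]
                   for k, c in enumerate(coeffs) if n - k - 1 >= 0)
        out.append(x - pred)
    return out

# A decaying exponential is perfectly predictable from its previous sample.
x = [0.9 ** i for i in range(200)]
coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
excitation = residual(x, coeffs)
```

In a CELP-style coder the residual (excitation) is not sent directly; the encoder picks the codebook entry that best reproduces it, which is what keeps the bit rate low.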
Parametric coders • Very low bit rate (2 to 16 kbit/s) • Decompose the input signal into components that are described by appropriate source models and represented by model parameters • Certain aspects of the coded representation can be manipulated independently • HVXC (Harmonic Vector eXcitation Coding) for speech • HILN (Harmonic and Individual Lines plus Noise) for music
Parametric-HVXC • Harmonic coding of LPC residual signals for voiced segments • Vector eXcitation Coding for unvoiced segments • Fixed bit rate mode: 2.0 and 4.0 kbit/s • Variable bit rate mode: less than 2.0 kbit/s
Parametric-HILN • Model parameters: • Harmonic lines: fundamental frequency + amplitudes of the harmonic components • Individual lines: frequency and amplitude of each individual line • Noise: spectral shape of the noise (obtained by the LPC method) • Bit rate: 6-16 kbit/s
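To make the HILN model parameters concrete, here is a minimal decoder-side sketch that turns "harmonic lines" and "individual lines" back into samples by summing sinusoids. It is an illustration under simplifying assumptions (constant parameters over the frame, zero initial phase, no noise component); the function names are hypothetical and this is not the normative HILN synthesis.

```python
import math

def synthesize_harmonic(f0, amplitudes, sample_rate, n_samples):
    """Harmonic lines: sinusoids at integer multiples of the fundamental f0.
    amplitudes[h] is the amplitude of harmonic number h + 1."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.sin(2 * math.pi * f0 * (h + 1) * t)
                       for h, a in enumerate(amplitudes)))
    return out

def synthesize_lines(lines, sample_rate, n_samples):
    """Individual lines: (frequency_hz, amplitude) pairs with no harmonic relation."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in lines))
    return out

# One 10 ms frame at 8 kHz: a 100 Hz tone with three harmonics plus two stray lines.
harmonic = synthesize_harmonic(100.0, [1.0, 0.5, 0.25], 8000, 80)
extra = synthesize_lines([(1000.0, 0.5), (1730.0, 0.1)], 8000, 80)
frame = [h + e for h, e in zip(harmonic, extra)]
```

Because only a handful of frequencies and amplitudes describe each frame, parameters such as pitch can be changed at the decoder without touching the rest, which is the independent manipulation mentioned for parametric coders.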
Natural Audio Coders [Figure: quality (Cellular, Telephone, AM, FM, CD) versus bit rate (2, 4, 8, 16, 32, 64 kbit/s) for parametric speech (HVXC), high quality speech (CELP), parametric audio (HILN), and general audio (AAC, TwinVQ)]
SA • Structured Audio (uses the structural redundancy in the creation of the sounds) • SAOL (Structured Audio Orchestra Language): software-synthesis language for any kind of synthesis (FM, sampling, physical modeling, ...) • SASL (Structured Audio Score Language): specifies the notes to be played by SAOL, their time of occurrence, and the parameters controlling the synthesis algorithm (how loud the sound is, how long it is, how it varies) • SASBF (Structured Audio Sample Bank Format): format for efficiently transmitting banks of sound samples
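The split between an orchestra (how sounds are made) and a score (which notes, when, and with what parameters) can be mimicked in a few lines. The sketch below is a Python stand-in, not SAOL or SASL syntax: the instrument, the event tuple layout, and the 8 kHz sample rate are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # assumed output rate for this sketch

def sine_instrument(freq, duration, amp):
    """'Orchestra' side: an instrument is code that produces samples."""
    n = int(duration * SAMPLE_RATE)
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(n)]

ORCHESTRA = {"sine": sine_instrument}

# 'Score' side: (start_s, instrument, duration_s, frequency_hz, amplitude)
SCORE = [
    (0.0, "sine", 0.5, 440.0, 0.5),
    (0.25, "sine", 0.5, 660.0, 0.25),
]

def render(score, orchestra):
    """Play every score event through its instrument and mix the results."""
    total = max(start + dur for start, _, dur, _, _ in score)
    out = [0.0] * int(total * SAMPLE_RATE)
    for start, name, dur, freq, amp in score:
        offset = int(start * SAMPLE_RATE)
        for i, sample in enumerate(orchestra[name](freq, dur, amp)):
            if offset + i < len(out):
                out[offset + i] += sample
    return out

audio = render(SCORE, ORCHESTRA)
```

Only the instrument code and the short event list need to be transmitted, not the samples themselves, which is the structural redundancy that Structured Audio exploits.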
TTS • Text To Speech • TTS coder bit rates range from 200 bit/s to 1.2 kbit/s • Generates intelligible synthetic speech from plain text or from text with prosodic parameters (pitch contour, phoneme duration, and so on) • Lip synchronization control with phoneme information • Trick mode functionality: pause, resume, jump forward/backward • International language and dialect support for text (the bitstream can signal which language and dialect should be used) • International symbol support for phonemes, and support for specifying the age, gender, and speech rate of the speaker
Scalable audio • SNR / NMR (Noise to Mask Ratio) scalability • Audio bandwidth scalability • Restriction of generality (ex: CELP + AAC) • Implementation complexity: core layer easier to decode
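One way to picture a scalable bitstream is as an ordered stack of layers: a core layer that is always decodable and enhancement layers that are added while the channel allows. The sketch below assumes that model; the layer names and bit rates are made up for the example and are not taken from the standard.

```python
def select_layers(layers, budget_kbps):
    """layers: ordered list of (name, rate_kbps), core layer first.
    Returns the names of the longest prefix whose total rate fits the budget."""
    chosen, used = [], 0
    for name, rate in layers:
        if used + rate > budget_kbps:
            break  # higher layers depend on this one, so stop here
        chosen.append(name)
        used += rate
    return chosen

# Hypothetical CELP core with two AAC enhancement layers.
LAYERS = [("celp_core", 6), ("aac_enhancement_1", 18), ("aac_enhancement_2", 24)]

narrow = select_layers(LAYERS, 24)  # core + first enhancement
wide = select_layers(LAYERS, 48)    # all three layers
```

The same encoded content can then serve both a narrowband and a broadband receiver; each simply stops at a different layer.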
MPEG-4 Video [Diagram: MPEG-4 comprises video, audio, and system]
Video coding: main features • Coding and animation of hybrid synthetic and natural video objects • Same approach as the MPEG-1/2 algorithms • Macroblock-based DCT and motion compensation • Uses I, P, B frames and variable length codes • Wide range of bit rates available (5 kbit/s to 5 Mbit/s) • Wide range of resolutions available (from a few pels per line to TV resolution) • Supports the coding of arbitrary (non-rectangular) object shapes • Allows face and body animation • Coding of 2D and 3D meshes with implicit structure • Supports coding of sprite objects
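Block-based motion compensation, listed above, relies on finding where a block of the current frame came from in a reference frame. A minimal sketch of that search follows, using an exhaustive search with the sum of absolute differences (SAD) as the matching cost. The block size, search range, and function names are assumptions for the example; real encoders use 16x16 macroblocks and much faster search strategies.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a reference block and a current block."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))

def best_motion_vector(ref, cur, cx, cy, size, search):
    """Exhaustive search: returns (dx, dy, cost) of the best match in ref
    for the block of cur whose top-left corner is (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

# 8x8 test frames: the current frame is the reference shifted right by one pixel.
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]
dx, dy, cost = best_motion_vector(ref, cur, cx=2, cy=2, size=4, search=2)
```

With a good match, only the motion vector and a small DCT-coded difference block need to be sent for P and B frames, instead of the full block.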
MPEG-4 system [Diagram: MPEG-4 comprises video, audio, and system; system splits into BIFS, object descriptors, and DMIF]
BIFS • Binary Format for Scenes
BIFS • Tree-structured scene description
BIFS Features • VRML concepts: a set of nodes represents the primitive scene objects to be composed, their behavior, and their interactivity • Integration of streams • Integration of 2D and 3D video and audio objects • Advanced audio features • Update protocol to modify the scene over time • Compression efficiency
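The scene tree and the update protocol can be pictured with a small in-memory model: nodes with children, plus commands that insert or delete nodes while the scene is running. This is a sketch only. The node kinds and IDs are illustrative, and nothing here reflects the actual binary encoding of BIFS.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                      # e.g. "Group", "Video", "Text" (illustrative)
    children: list = field(default_factory=list)

def find(node, node_id):
    """Depth-first lookup of a node by its ID."""
    if node.node_id == node_id:
        return node
    for child in node.children:
        hit = find(child, node_id)
        if hit is not None:
            return hit
    return None

def insert_node(root, parent_id, new_node):
    """Update command: attach new_node under the node named parent_id."""
    parent = find(root, parent_id)
    if parent is None:
        return False
    parent.children.append(new_node)
    return True

def delete_node(root, node_id):
    """Update command: remove the node named node_id (the root cannot be removed)."""
    for child in list(root.children):
        if child.node_id == node_id:
            root.children.remove(child)
            return True
        if delete_node(child, node_id):
            return True
    return False

scene = Node("root", "Group", [
    Node("background", "StillImage"),
    Node("presenter", "Video"),
])
insert_node(scene, "root", Node("caption", "Text"))   # scene changes over time
delete_node(scene, "background")
```

Sending small commands like these is much cheaper than retransmitting the whole scene each time something changes.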
Object descriptors • Contain pointers to: scalably coded content streams; alternate quality content streams; Object Content Information (locations, transparency, ...); IPR information • Contain sub-descriptors for: decoder configuration; sync layer header configuration; quality of service information; extension information
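Object descriptors • An object descriptor is identified by its Object Descriptor ID (OD_ID) and holds a list of elementary stream descriptors: ObjectDescriptor { OD_ID_1, List of { Elementary-Stream-Descriptors } } • Each elementary stream descriptor carries its own ID: ES_Descriptor { ES_ID_1 ... }, ES_Descriptor { ES_ID_2 ... }, ES_Descriptor { ES_ID_3 ... }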
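The relationship between an object descriptor and its elementary stream descriptors can be modeled with two small record types. The sketch below is an assumption-laden illustration: the field names (`decoder_config`, `avg_bitrate_kbps`) and the selection rule are hypothetical and do not reproduce the standard's syntax.

```python
from dataclasses import dataclass, field

@dataclass
class ESDescriptor:
    es_id: int
    decoder_config: str        # hypothetical label for the decoder needed
    avg_bitrate_kbps: int      # hypothetical quality-of-service hint

@dataclass
class ObjectDescriptor:
    od_id: int
    es_descriptors: list = field(default_factory=list)

def pick_stream(od, budget_kbps):
    """Among alternate-quality streams of one object, pick the best that fits."""
    fitting = [es for es in od.es_descriptors
               if es.avg_bitrate_kbps <= budget_kbps]
    if not fitting:
        return None
    return max(fitting, key=lambda es: es.avg_bitrate_kbps)

# One audio-visual object offered at three alternate qualities.
od = ObjectDescriptor(od_id=1, es_descriptors=[
    ESDescriptor(es_id=1, decoder_config="video-low", avg_bitrate_kbps=64),
    ESDescriptor(es_id=2, decoder_config="video-mid", avg_bitrate_kbps=384),
    ESDescriptor(es_id=3, decoder_config="video-high", avg_bitrate_kbps=1500),
])
chosen = pick_stream(od, budget_kbps=500)
```

The scene description refers to the object by its OD_ID, and the terminal resolves that ID to whichever elementary stream it decides to open.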
DMIF • Delivery Multimedia Integration Framework • It is the interface between the MPEG-4 application and the transport network • The application sees the same interface irrespective of whether the peer is a remote interactive peer, a broadcast, or local storage media • Opens the different channels for the elementary streams, each with its own bandwidth and QoS • Works over different networks (IP, ATM, narrowband, mobile, ...)
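The idea that the application talks to one interface regardless of the delivery technology can be sketched with an abstract base class. Everything named here (`Delivery`, `open_channel`, the QoS dictionary) is hypothetical and chosen for illustration; it is not the DMIF application interface defined by the standard.

```python
from abc import ABC, abstractmethod

class Delivery(ABC):
    """What the application sees: one way to open a channel per elementary stream."""

    @abstractmethod
    def open_channel(self, es_id, qos):
        """Return a description of the channel opened for stream es_id."""

class LocalFileDelivery(Delivery):
    def open_channel(self, es_id, qos):
        # Local storage: QoS requests are irrelevant, so they are dropped.
        return {"es_id": es_id, "transport": "file", "qos": None}

class NetworkDelivery(Delivery):
    def open_channel(self, es_id, qos):
        # Remote peer: the requested QoS is passed on to the network layer.
        return {"es_id": es_id, "transport": "network", "qos": qos}

def open_streams(delivery, requests):
    """Application code: identical whatever the delivery back end is."""
    return [delivery.open_channel(es_id, qos) for es_id, qos in requests]

requests = [(1, {"kbps": 64}), (2, {"kbps": 384})]
local = open_streams(LocalFileDelivery(), requests)
remote = open_streams(NetworkDelivery(), requests)
```

Swapping the back end changes how the channels are realized but not the application code that requests them.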
Conclusion • MPEG-4 provides many tools to code audio and video objects for a whole range of applications • In addition to this set of tools, MPEG-4 provides a framework for manipulating these objects interactively • MPEG-4 continues to evolve (more audio and video coders)
Questions? Let’s go Yellow Jackets!