Concepts of Multimedia Processing and Transmission
IT 481, Lecture #11
Dennis McCaughey, Ph.D.
20 November, 2006
Broadcast Environment IT 481, Fall 2006
The MPEG-4 Layered Model
• Compression Layer (media aware, delivery unaware): MPEG-4 Visual, MPEG-4 Audio
• Elementary Stream Interface (ESI)
• Sync Layer (media unaware, delivery unaware): MPEG-4 Systems
• DMIF Application Interface (DAI)
• Delivery Layer (media unaware, delivery aware): MPEG-4 DMIF
MPEG-4: Delivery Integration of Three Major Technologies
• The Broadcast Technology: cable, satellite, etc.
• The Interactive Network Technology: Internet, ATM, etc.
• The Disk Technology: CD, DVD, etc.
MPEG-4: DMIF Communication Architecture
MPEG-4: DMIF Communication Architecture
• DMIF (Delivery Multimedia Integration Framework) is a session protocol for the management of multimedia streaming over generic delivery technologies.
• In principle it is similar to FTP. The one (essential!) difference is that FTP returns data, while DMIF returns pointers to where the (streamed) data can be obtained.
• When FTP runs:
• The very first action it performs is the setup of a session with the remote side.
• Later, files are selected and FTP sends a request to download them; the FTP peer returns the files over a separate connection.
• When DMIF runs:
• The very first action it performs is the setup of a session with the remote side.
• Later, streams are selected and DMIF sends a request to stream them; the DMIF peer returns pointers to the connections where the streams will be delivered, and then establishes those connections itself.
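The FTP/DMIF contrast above can be sketched in a few lines of Python. All class and method names here are illustrative only; neither protocol defines such an API:

```python
# Hypothetical sketch: both protocols first set up a session with the peer,
# but a retrieval request returns different things.

class FtpLikePeer:
    """FTP-style peer: get() returns the data itself."""
    def __init__(self, files):
        self.files = files

    def get(self, name):
        # FTP delivers the file content (over a separate data connection).
        return self.files[name]

class DmifLikePeer:
    """DMIF-style peer: get() returns a pointer to a stream channel."""
    def __init__(self, streams):
        self.streams = set(streams)

    def get(self, name):
        if name not in self.streams:
            raise KeyError(name)
        # DMIF returns a pointer to where the stream will be delivered,
        # then establishes that connection itself.
        return {"channel": "transmux://" + name, "established": True}
```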
DMIF Computational Model
DMIF Service Activation
• The Originating Application requests the activation of a service from its local DMIF layer:
• a communication path between the Originating Application and its local DMIF peer is established in the control plane (1).
• The Originating DMIF peer establishes a network session with the Target DMIF peer:
• a communication path between the Originating DMIF peer and the Target DMIF peer is established in the control plane (2).
• The Target DMIF peer identifies the Target Application and forwards the service activation request:
• a communication path between the Target DMIF peer and the Target Application is established in the control plane (3).
• The peer Applications create channels (requests flowing through communication paths 1, 2 and 3).
• The resulting channels in the user plane (4) carry the actual data exchanged by the Applications.
• DMIF is involved in all four steps above.
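The four activation steps above can be modeled as a toy sequence. Every name here is hypothetical, not taken from the DMIF specification; real signaling depends on the delivery technology:

```python
# Toy model of DMIF service activation (steps 1-4 from the slide).

class Peer:
    """An application or DMIF instance holding its control-plane paths."""
    def __init__(self, name):
        self.name = name
        self.control_paths = []

    def connect(self, other, label):
        # Establish a control-plane communication path to another peer.
        self.control_paths.append((other.name, label))

def activate_service():
    """Walk through the four activation steps and return the peers."""
    app_o = Peer("originating-app")
    dmif_o = Peer("originating-dmif")
    dmif_t = Peer("target-dmif")
    app_t = Peer("target-app")
    app_o.connect(dmif_o, "control-path-1")   # (1) app -> local DMIF layer
    dmif_o.connect(dmif_t, "control-path-2")  # (2) network session between DMIF peers
    dmif_t.connect(app_t, "control-path-3")   # (3) target DMIF -> target application
    # (4) channel requests flow over paths 1-3; the resulting user-plane
    # channels carry the actual media data.
    user_plane = ["channel-1"]
    return app_o, dmif_o, dmif_t, user_plane
```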
DAI
• Compared to FTP, DMIF is both a framework and a protocol.
• The functionality provided by DMIF is expressed through an interface called the DMIF-Application Interface (DAI) and translated into protocol messages. These protocol messages may differ depending on the network on which they operate.
• The DAI is also used for accessing broadcast material and local files; this means that a single, uniform interface is defined to access multimedia content on a multitude of delivery technologies.
DNI
• The DMIF Network Interface (DNI) is introduced to emphasize what kind of information DMIF peers need to exchange.
• An additional module ("Signaling mapping" in the figure) takes care of mapping the DNI primitives into the signaling messages used on the specific network.
• Note that DNI primitives are specified for information purposes only; a DNI interface need not be present in an actual implementation.
MPEG-4 Video Bitstream Logical Structure
Motion Compensation
• Three steps:
• Motion estimation
• Motion-compensation-based prediction
• Coding of the prediction error
• MPEG-4 defines a bounding box for each VOP
• Macroblocks entirely within the VOP are referred to as interior macroblocks
• Macroblocks straddling the VOP boundary are called boundary macroblocks
• Motion compensation for interior macroblocks is the same as in MPEG-1 and MPEG-2
• Motion compensation for boundary macroblocks requires padding to
• Help match every pixel in the target VOP
• Enforce rectangularity for block DCT encoding
MPEG-4: Motion Estimation
• The block-based techniques of MPEG-1 and MPEG-2 have been adapted to the MPEG-4 VOP structure
• I-VOP: Intra-coded VOP
• P-VOP: Predicted VOP, based on the previous VOP
• B-VOP: Bidirectionally interpolated VOP, predicted from past and future VOPs
• Motion estimation (ME) is necessary only for P-VOPs and B-VOPs
• Motion vectors are differentially coded from up to three motion vectors
• Variable-length coding is used for encoding MVs
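A minimal full-search block-matching sketch illustrates the estimation step. This is illustrative only; MPEG-4 encoders use fast search strategies and sub-pel refinement, which are omitted here:

```python
# Full-search block matching with the sum of absolute differences (SAD).

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block_match(ref, cur, bx, by, bsize, search):
    """Find the motion vector (dx, dy) within +/-search that minimizes the
    SAD between the current block at (bx, by) and the reference frame."""
    h, w = len(ref), len(ref[0])
    cur_block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue  # candidate block falls outside the reference frame
            ref_block = [row[rx:rx + bsize] for row in ref[ry:ry + bsize]]
            cost = sad(cur_block, ref_block)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```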
MPEG-4 Texture Coding
• For an I-VOP, the texture information is the luminance and chrominance of the VOP
• For P-VOPs and B-VOPs, the texture information is the residual remaining after motion compensation
• The standard 8x8 block-based DCT is used
• Coefficients are quantized, predicted, scanned, and variable-length encoded
• DC and AC coefficient prediction based on neighboring blocks reduces the energy of the quantized coefficients
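The 8x8 DCT and quantization steps can be sketched directly from the transform's definition. This is a slow reference implementation with a single uniform step size, not the fast factorizations and quantization matrices a real codec uses:

```python
import math

# Reference 8x8 DCT-II plus uniform quantization (simplified).

def dct2_8x8(block):
    """2-D DCT-II of an 8x8 block, computed directly from the definition."""
    def alpha(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, q):
    """Uniform quantization with step size q (a simplification of the
    standard's quantization matrices)."""
    return [[round(c / q) for c in row] for row in coeffs]
```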
Bounding Box & Boundary Macroblocks
Padding
• For all boundary macroblocks in the reference VOP:
• Horizontal repetitive padding
• Vertical repetitive padding
• For all exterior macroblocks outside the VOP but adjacent to one or more boundary macroblocks:
• Extended padding
Horizontal Repetitive Padding Algorithm

begin
  for all rows in boundary macroblocks in the reference VOP
    if there exists a boundary pixel in the row
      for all intervals outside the VOP
        if the interval is bounded by only one boundary pixel b
          assign the value b to all pixels in the interval
        elseif the interval is bounded by two boundary pixels b1 and b2
          assign the value (b1 + b2)/2 to all pixels in the interval
end
Vertical Repetitive Padding Algorithm
• The horizontal algorithm applied to the columns
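The two padding passes can be sketched as follows. This is a simplified illustration: integer averaging is used for intervals bounded on both sides, and the standard's exact rounding rules are not reproduced:

```python
# Horizontal then vertical repetitive padding of a boundary macroblock.

def pad_line(vals, mask):
    """Fill each run of outside pixels from its bounding VOP pixel(s);
    a run bounded on both sides gets the average of the two."""
    out = list(vals)
    n = len(vals)
    i = 0
    while i < n:
        if mask[i]:
            i += 1
            continue
        j = i
        while j < n and not mask[j]:
            j += 1                       # [i, j) is an outside interval
        left = vals[i - 1] if i > 0 else None
        right = vals[j] if j < n else None
        if left is not None and right is not None:
            fill = (left + right) // 2   # bounded by two boundary pixels
        else:
            fill = left if left is not None else right
        for k in range(i, j):
            out[k] = fill
        i = j
    return out

def repetitive_pad(block, mask):
    """Pad a block; mask[y][x] is True where the pixel is inside the VOP."""
    h, w = len(block), len(block[0])
    out = [row[:] for row in block]
    filled = [row[:] for row in mask]
    for y in range(h):                   # horizontal pass: rows with VOP pixels
        if any(mask[y]):
            out[y] = pad_line(block[y], mask[y])
            filled[y] = [True] * w
    for x in range(w):                   # vertical pass on the result's columns
        col = [out[y][x] for y in range(h)]
        cmask = [filled[y][x] for y in range(h)]
        if any(cmask):
            col = pad_line(col, cmask)
            for y in range(h):
                out[y][x] = col[y]
    return out
```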
Original Pixels Within the VOP
Horizontal Repetitive Padding
Vertical Repetitive Padding
Shape-Adaptive Texture Coding for Boundary Macroblocks
Considerations
• The total number of DCT coefficients equals the number of grayed pixels, which is less than 8x8
• Fewer computations than an 8x8 DCT
• During decoding, the translations must be reversed, so a binary mask of the original shape must be provided
MPEG-4 Shape Coding
• Binary shape coding
• Grayscale shape coding
Static Texture Coding
Sprite Coding
Global Motion Compensation
MPEG-4 Scalability
• Spatial and temporal scalability implemented using video object layers (VOLs)
• Base and enhancement layers
Scalability
• There are several scalable coding schemes in MPEG-4 Visual:
• Spatial scalability: supports changing the texture quality (SNR and spatial resolution)
• Temporal scalability
• Object-based spatial scalability:
• Extends the 'conventional' types of scalability to arbitrarily shaped objects, so that it can be used in conjunction with other object-based capabilities
• This makes it possible to enhance SNR, spatial resolution, shape accuracy, etc., only for objects of interest or for a particular region, which can even be done dynamically at play time
Base and Enhancement Layer Behavior (Spatial Scalability)
Two Enhancement Types in MPEG-4 Temporal Scalability
1. Type I: the enhancement layer improves the resolution of only a portion of the base layer. Only a selected region of the VOP (e.g. just the car) is enhanced, while the rest (e.g. the landscape) is not.
2. Type II: the enhancement layer improves the resolution of the entire base layer. Enhancement is applicable only at the entire-VOP level.
Subset of MPEG-4 Video Profiles and Levels
MPEG-4 Natural & Synthetic Video Coding
• Synthetic 2D and 3D objects are represented by meshes and surface patches
• Synthetic VOs are animated by transforms and special-purpose animation techniques
• The representation of synthetic VOs is based on the Virtual Reality Modeling Language (VRML) standard
• For natural objects, a large portion of the material used in movie and TV production is blue-screened, making it easier to capture objects against a blue background
Integration of Face Animation with Natural Video
• Three types of facial data: Facial Animation Parameters (FAP), Face Definition Parameters (FDP), and the FAP Interpolation Table (FIT)
• FAPs allow the animation of a 3D facial model available at the receiver
• FDPs allow one to configure the 3D facial model to be used at the receiver
• FIT allows one to define the interpolation rules for the FAPs at the decoder
Integration of Face Animation and Text-to-Speech (TTS) Synthesis
(Block diagram components: TTS stream, proprietary speech synthesizer, phoneme/bookmark-to-FAP converter, audio decoder, face renderer, video, compositor)
• Synchronization of a FAP stream with a TTS synthesizer is possible only if the encoder sends timing information
DVB-H in a DVB-T Network (source: Nokia)
DVB-H Receiver
DVB-H System (Sharing a Mux with MPEG-2 Services)
MPE-FEC Detail
DVB-T/H Transmitter (source: Nokia)
DVB-H Standards Family (source: Nokia)
References
• T. Ebrahimi and C. Horne, "MPEG-4 Natural Video Coding - An Overview"
• J. Henriksson, "DVB-H, Standards Principles and Services", Nokia, HUT Seminar T-111.590, Helsinki, Finland, 24 February 2005