Multiplexing H.264 and HEAACv2 elementary streams, de-multiplexing and achieving lip synchronization during playback Naveen Siddaraju naveen.siddaraju@mavs.uta.edu
Contents: • Introduction: Need for multiplexing • Overview of codecs used • Transport protocols • Multiplexing • De-multiplexing and synchronization • Results • Conclusions • Future work • References
Introduction: need for multiplexing • Digital television broadcasting • ATSC-M/H [17] • DVB-H • DVB-T • Internet streaming • IPTV, YouTube, etc.
Choice of codecs • Depends on the application • Transport bandwidth - ATSC-M/H channel bandwidth: 19.6 Mbps - DVB-H channel bandwidth: 14 Mbps • Processing power of the target device
H.264/AVC • Defined in MPEG-4 Part 10 • Jointly developed by ITU-T VCEG and the MPEG group of ISO/IEC • Provides better compression than its predecessors, such as MPEG-2 Video and MPEG-4 Part 2 • Suitable for a wide variety of applications • Adopted as the video standard in ATSC-M/H, DVB, etc. • Used in Blu-ray discs, DVDs, iTunes, Flash Player, video conferencing applications, etc.
Frame types • Three basic types: • Intra-predictive (I) frame • Predictive (P) frame • Bi-predictive (B) frame • IDR frame is a special type of I frame - indicates the start of a video sequence
Bitstream syntax of H.264 • Data is organized into two layers • VCL (video coding layer) • NAL (network abstraction layer) • NAL formatting of VCL and non-VCL data [6]
NAL unit format [6] • Forbidden bit (1 bit) • NRI (2 bits) • Type (5 bits)
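Since the NAL unit header is a single byte, the three fields above can be unpacked with simple shifts. A minimal sketch (the type values noted in the comment follow the H.264 specification; the helper itself is only illustrative, not the thesis code):

```python
# Illustrative sketch: unpack the 1-byte NAL unit header described above.
# Layout: forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)

def parse_nal_header(header_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,
        "nal_ref_idc":        (header_byte >> 5) & 0x03,
        "nal_unit_type":       header_byte       & 0x1F,  # 5 = IDR slice, 7 = SPS, 8 = PPS
    }

# Example: 0x65 is a typical first byte of an IDR slice NAL unit.
print(parse_nal_header(0x65))  # {'forbidden_zero_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 5}
```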
Important NAL unit types • IDR frames - indicate the start of a new video sequence • Sequence parameter sets (SPS) - contain parameters common to the entire sequence - profile, level, size of the video, number of reference frames • Picture parameter sets (PPS) - contain parameters that apply to one frame or several frames in a sequence - entropy coding mode, quantization parameters, etc.
HEAACv2 • Also called Enhanced aacPlus • Developed by Coding Technologies for very low bitrate applications • Defined in MPEG-4 Part 3, Amendment 2 • Supports mono, stereo and multichannel coding (up to 48 channels) • Is a combination of AAC, SBR and PS • Provides the highest perceptible quality at the lowest bitrates • Adopted as the audio standard in ATSC-M/H, DVB and XM Satellite Radio • Can exist in a variety of file formats, such as mp4 and m4a • Controlled testing conducted by 3GPP [27] indicates that HEAACv2 provides good quality audio at 24 kbps
AAC (Advanced Audio Coding) • Successor to the MP3 format • Defined in both MPEG-2 [3] and MPEG-4 [2] • Achieves better sound quality than MP3 at the same bitrate • AAC is also the standard audio format for the Apple iPhone, iPod, iPad, Sony PlayStation, etc. • Supports up to 48 channels (MP3 supports up to two channels in MPEG-1 mode and up to 5.1 channels in MPEG-2 mode) • More sampling frequencies (8 to 96 kHz) than MP3 (16 to 48 kHz) • Achieves good quality stereo audio at 128 kbps
SBR (spectral band replication) [2] • SBR is a bandwidth expansion technique • Exploits the correlation between the high and low frequencies. • Using SBR, along with AAC, high quality stereo sound can be achieved at 48 kbps.
High band reconstruction through SBR • Figures [28] (not reproduced): original audio signal; high band reconstructed through SBR
PS (parametric stereo) [2] • Only used for low bitrate applications (< 32 kbps) • Parameterizes the stereo image: time/phase differences, inter-channel intensity differences, etc. • Only a monaural version of the stereo signal is encoded by the AAC encoder • At the decoder, the monaural signal is decoded first, and then the stereo signal is reconstructed using the PS parameters • Using PS along with AAC and SBR, reasonable quality stereo sound can be achieved at 24 kbps
HEAACv2 bitstream formats • ADIF (audio data interchange format) - has just one header for the whole stream - used for storage media • ADTS (audio data transport stream) - used in transport streams - has a header in every access unit
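Because ADTS carries a header in every access unit, the stream can be cut into audio frames by walking from header to header. A minimal sketch of that walk, using the standard 13-bit aac_frame_length field (illustrative, not the thesis code):

```python
# Illustrative sketch: split an ADTS byte stream into access units by reading
# the 13-bit aac_frame_length field in every ADTS header (the length includes
# the header itself).

def split_adts_frames(data: bytes):
    frames, pos = [], 0
    while pos + 7 <= len(data):
        # every ADTS header starts with the 12-bit syncword 0xFFF
        if data[pos] != 0xFF or (data[pos + 1] & 0xF0) != 0xF0:
            raise ValueError(f"lost ADTS sync at offset {pos}")
        frame_len = ((data[pos + 3] & 0x03) << 11) | (data[pos + 4] << 3) | (data[pos + 5] >> 5)
        if frame_len < 7:
            raise ValueError(f"invalid ADTS frame length at offset {pos}")
        frames.append(data[pos:pos + frame_len])   # header + raw AAC data block(s)
        pos += frame_len
    return frames
```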
Transport protocols • Most multimedia applications involve communication channels or storage • RTP (Real-time Transport Protocol) - transport over IP networks • MPEG-2 Systems - digital television broadcast - storage (asset management)
MPEG-2 Systems • Defines two types of streams - Program stream (PS) - used for storage, e.g. DVD - Transport stream (TS) - used for digital broadcast • Two layers of packetization - PES (packetized elementary stream) - TS (transport stream)
PES (packetized elementary stream) • First layer of packetization • Separates the audio and video elementary streams into access units • Variable length • Contains a header and payload (frame) data • Adds fields like time stamp, stream ID and packet length - see the sketch below
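A simplified sketch of this first packetization layer: one PES-like packet per access unit carrying a stream ID, a frame-number time stamp and the payload length ahead of the frame data. The 9-byte header layout is an assumption for illustration only, not the exact PES header defined in MPEG-2 Systems [4]:

```python
import struct

# Illustrative sketch: wrap one access unit (frame) into a PES-like packet.
# Assumed header layout: stream_id (1 byte) | frame_number (4 bytes) | length (4 bytes).

def build_pes_packet(stream_id: int, frame_number: int, frame_data: bytes) -> bytes:
    header = struct.pack(">BII", stream_id, frame_number, len(frame_data))
    return header + frame_data

# Example: 0xE0 is the conventional MPEG-2 stream ID for the first video stream.
packet = build_pes_packet(0xE0, 42, b"\x00\x00\x00\x01\x65")  # start of an IDR NAL unit
```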
Frame number as time stamp • For video, fps is constant throughout the sequence • For audio, the sampling frequency is constant throughout the sequence • The frame number therefore maps directly to presentation time - see the sketch below
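A minimal sketch of that mapping, using the test conditions from later in the deck (24 fps video, 24 kHz audio). The number of output samples per audio access unit is an assumption: 2048 is typical for HE-AAC (a 1024-sample AAC core doubled by SBR), but the exact value depends on the encoder configuration:

```python
# Illustrative sketch: with constant fps and sampling frequency, the frame
# number alone determines presentation time.

FPS = 24                       # from the test conditions
SAMPLING_FREQUENCY = 24000     # Hz, from the test conditions
SAMPLES_PER_AUDIO_FRAME = 2048 # assumption: HE-AAC (1024-sample AAC core doubled by SBR)

def video_pts(video_frame_number: int) -> float:
    """Presentation time of a video frame, in seconds."""
    return video_frame_number / FPS

def audio_pts(audio_frame_number: int) -> float:
    """Presentation time of an audio access unit, in seconds."""
    return audio_frame_number * SAMPLES_PER_AUDIO_FRAME / SAMPLING_FREQUENCY
```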
TS packets • Second layer of packetization • Fixed length (188 bytes) • Each PES packet is logically broken down into 188-byte packets • Three-byte header contains the packet ID, payload unit start flag, continuity counter, etc.
TS header description: • Payload unit start indicator (PUSI) flag - indicates the payload contains a PES header • Adaptation field control (AFC) flag - indicates the payload is less than 185 bytes • Continuity counter (CC) - 4-bit counter used to check for packet losses, out-of-sequence packets, etc. • Packet ID (PID) (10 bits) - uniquely identifies the elementary stream the packet belongs to • Optional offset byte - contains the offset value if AFC is set - see the sketch below
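The deck describes a simplified 3-byte header rather than the standard 4-byte MPEG-2 TS header, so the bit positions below are an assumption chosen only to illustrate how these fields could be packed:

```python
# Illustrative sketch of the simplified 3-byte TS header described above.
# Assumed bit layout (not taken from the thesis):
#   byte 0 : PUSI (1 bit) | AFC (1 bit) | reserved (2 bits) | CC (4 bits)
#   bytes 1-2 : PID (10 bits, right-aligned in a 16-bit big-endian field)
# An optional offset byte follows when AFC is set (payload < 185 bytes).

def build_ts_header(pusi: bool, afc: bool, cc: int, pid: int, offset: int = 0) -> bytes:
    byte0 = (int(pusi) << 7) | (int(afc) << 6) | (cc & 0x0F)
    header = bytes([byte0]) + (pid & 0x3FF).to_bytes(2, "big")
    if afc:
        header += bytes([offset])  # assumed meaning: number of valid payload bytes
    return header
```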
Multiplexing • What is multiplexing? • Multiplexing is the process of transmitting TS packets belonging to different elementary streams over a single channel • Muxing determines how effectively the TS packets are interleaved in the TS stream, so that the audio and video content is transmitted simultaneously - see the sketch below • Buffer overflow/underflow - can cause picture loss or skips during audio-video playback
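One generic way to express the interleaving decision is to always emit next the packet (audio or video) whose presentation time is earlier, so neither decoder buffer runs far ahead of the other. This is a sketch of the idea under that assumption, not the exact scheduling used in the thesis:

```python
# Illustrative sketch: interleave audio and video TS packets by presentation
# time so both streams advance together and neither buffer starves.
# video_packets / audio_packets are assumed to be lists of (pts_seconds, packet_bytes) pairs.

def interleave(video_packets, audio_packets):
    out, vi, ai = [], 0, 0
    while vi < len(video_packets) or ai < len(audio_packets):
        take_video = ai >= len(audio_packets) or (
            vi < len(video_packets) and video_packets[vi][0] <= audio_packets[ai][0])
        if take_video:
            out.append(video_packets[vi][1]); vi += 1
        else:
            out.append(audio_packets[ai][1]); ai += 1
    return out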
Calculation of presentation time of a TS packet (equations not reproduced; a reconstruction is sketched below): • For a video TS packet • For an audio TS packet
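A hedged reconstruction of the missing equations, based on the frame-number time stamps above: for a video TS packet, t_video = n_v / fps, where n_v is the number of the video frame the packet carries; for an audio TS packet, t_audio = (n_a × N) / f_s, where n_a is the audio frame number, N the samples per access unit (assumed 2048 for HE-AAC) and f_s the sampling frequency. With the test conditions (24 fps, 24 kHz), video frame 48 gives t_video = 48 / 24 = 2.0 s and audio frame 24 gives t_audio = 24 × 2048 / 24000 ≈ 2.048 s.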
De-multiplexing • The transport stream (TS) input to the receiver is separated into a video elementary stream and an audio elementary stream • These elementary streams are initially written into video and audio buffers respectively • Once one of the buffers is full, the elementary stream is reconstructed from the point of synchronization - see the sketch below
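A minimal sketch of the demultiplexing loop, routing packets by PID into the two buffers. It assumes the simplified 3-byte header layout sketched earlier; the PID values are placeholders for illustration:

```python
# Illustrative sketch: route 188-byte TS packets into audio/video buffers by PID,
# stripping the assumed 3-byte header (plus optional offset byte) sketched above.

VIDEO_PID, AUDIO_PID = 0x020, 0x021   # assumed PID values for illustration
TS_PACKET_SIZE = 188

def demultiplex(ts: bytes):
    video_buf, audio_buf = bytearray(), bytearray()
    for pos in range(0, len(ts), TS_PACKET_SIZE):
        pkt = ts[pos:pos + TS_PACKET_SIZE]
        if len(pkt) < TS_PACKET_SIZE:
            break
        afc = (pkt[0] >> 6) & 0x01
        pid = int.from_bytes(pkt[1:3], "big") & 0x3FF
        # offset byte assumed to give the number of valid payload bytes when AFC is set
        payload = pkt[4:4 + pkt[3]] if afc else pkt[3:]
        if pid == VIDEO_PID:
            video_buf += payload
        elif pid == AUDIO_PID:
            audio_buf += payload
    return bytes(video_buf), bytes(audio_buf)
```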
Audio-video synchronization • Once the video buffer is full, it is searched for the next occurring IDR frame • The corresponding audio frame is calculated from the equation (reconstructed below) • The elementary streams are reconstructed from that point, merged into a container format (using mkvmerge) and then played back
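A hedged reconstruction of the missing equation, using the same assumed N = 2048 samples per audio access unit: the audio frame coinciding with an IDR at video frame n_IDR is n_audio = round(n_IDR × f_s / (fps × N)). For example, an IDR at video frame 240 (t = 240 / 24 = 10 s) maps to audio frame round(240 × 24000 / (24 × 2048)) = 117, and playback of both streams resumes from that pair of frames.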
Test conditions: • Video • H.264 baseline profile • Resolution: 416×240 • GOP: IPPP (IDR forced) • fps: 24 • Audio • HEAACv2 • ADTS format • Sampling frequency: 24,000 Hz
Conclusions • Buffer fullness was effectively handled; the maximum buffer difference observed was around 20 ms of media content • Audio-video synchronization was achieved with a maximum skew of 13 ms
Future work • Expand the multiplexing algorithm to multiplex multiple programs • Implement the same multiplexing algorithm for other transport protocols, such as RTP/IP • Add error correction to the TS stream
References: • [1] MPEG-4: ISO/IEC JTC1/SC29 14496-10, Information technology - Coding of audio-visual objects - Part 10: Advanced video coding, ISO/IEC, 2005. • [2] MPEG-4: ISO/IEC JTC1/SC29 14496-3, Information technology - Coding of audio-visual objects - Part 3: Audio, Amendment 4: Audio Lossless Coding (ALS), new audio profiles and BSAC extensions. • [3] MPEG-2: ISO/IEC JTC1/SC29 13818-7, Advanced audio coding (AAC), International Standard, WG11, 1997. • [4] MPEG-2: ISO/IEC 13818-1, Information technology - Generic coding of moving pictures and associated audio - Part 1: Systems, ISO/IEC, 2005. • [5] Soon-kak Kwon et al., "Overview of H.264/MPEG-4 Part 10", special issue on "Emerging H.264/AVC video coding standard", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006. • [6] A. Puri et al., "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004. • [7] "MPEG-4 HE-AAC v2 - audio coding for today's digital media world", EBU Technical Review, 01/2006. Link: http://tech.ebu.ch/docs/techreview/trev_305-moser.pdf • [8] ETSI TS 101 154, "Implementation guidelines for the use of video and audio coding in broadcasting applications based on the MPEG-2 transport stream". • [9] 3GPP TS 26.401: General audio codec audio processing functions; Enhanced aacPlus general audio codec, 2009. • [10] 3GPP TS 26.403: Enhanced aacPlus general audio codec; Encoder specification, AAC part. • [11] 3GPP TS 26.404: Enhanced aacPlus general audio codec; Encoder specification, SBR part. • [12] 3GPP TS 26.405: Enhanced aacPlus general audio codec; Encoder specification, Parametric Stereo part.
[13] http://www.jeroenbreebaart.com/papers/aes/aes116_2.pdf • [14] MPEG transport stream. Link: http://www.iptvdictionary.com/iptv_dictionary_MPEG_Transport_Stream_TS_definition.html • [15] MPEG-4: ISO/IEC JTC1/SC29 14496-14, Information technology - Coding of audio-visual objects - Part 14: MP4 file format, 2003. • [16] DVB-H: Global mobile TV. Link: http://www.dvb-h.org/ • [17] ATSC-M/H. Link: http://www.atsc.org/cms/ • [18] Open Mobile Video Coalition. Link: http://www.openmobilevideo.com/about-mobile-dtv/standards/ • [19] VC-1 compressed video bitstream format and decoding process (SMPTE 421M-2006), SMPTE standard, 2006 (http://store.smpte.org/category-s/1.htm). • [20] Henning Schulzrinne's RTP page. Link: http://www.cs.columbia.edu/~hgs/rtp/ • [21] G. A. Davidson et al., "ATSC video and audio coding", Proc. IEEE, vol. 94, pp. 60-76, Jan. 2006 (www.atsc.org). • [22] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next-generation multimedia", Wiley, 2003. • [23] European Broadcasting Union, http://www.ebu.ch/ • [24] Shintaro Ueda et al., "NAL level stream authentication for H.264/AVC", IPSJ Digital Courier, vol. 3, Feb. 2007. • [25] WorldDMB. Link: http://www.worlddab.org/ • [26] ISDB website. Link: http://www.dibeg.org/
[27] 3GPP website. Link: http://www.3gpp.org/ • [28] Mihir Modi, "Audio compression gets better and more complex". Link: http://www.eetimes.com/discussion/other/4025543/Audio-compression-gets-better-and-more-complex • [29] P. A. Sarginson, "MPEG-2: Overview of the systems layer". Link: http://downloads.bbc.co.uk/rd/pubs/reports/1996-02.pdf • [30] MPEG-2: ISO/IEC 13818-1, Generic coding of moving pictures and associated audio - Part 1: Systems, Amendment 3: Transport of AVC video data over ITU-T Rec. H.222.0 | ISO/IEC 13818-1 streams, 2003. • [31] mkvmerge software. Link: http://www.matroska.org/ • [32] VLC media player. Link: http://www.videolan.org/ • [33] GOM media player. Link: http://www.gomlab.com/ • [34] H. Murugan, "Multiplexing H.264 video bit-stream with AAC audio bit-stream, de-multiplexing and achieving lip sync during playback", M.S.E.E. thesis, University of Texas at Arlington, TX, May 2007. • [35] Gerold Blakowski et al., "A media synchronization survey: reference model, specification, and case studies", IEEE Journal on Selected Areas in Communications, vol. 14, no. 1, January 1996. • [36] H.264/AVC JM software. Link: http://iphome.hhi.de/suehring/tml/download/ • [37] 3GPP Enhanced aacPlus reference software. Link: http://www.3gpp.org/ftp/ • [38] H.264 bitstreams. Link: http://sosori.com/