Multiplexing H.264 and HEAACv2 elementary streams, de-multiplexing and achieving lip synchronization during playback Naveen Siddaraju naveen.siddaraju@mavs.uta.edu
Contents: • Introduction: Need for multiplexing • Overview of codecs used • Transport protocols • Multiplexing • De-multiplexing and synchronization • Results • Conclusions • Future work • References
Introduction: need for multiplexing • Digital television broadcasting • ATSC-M/H [17] • DVB-H • DVB-T • Internet streaming • IPTV, YouTube, etc.
Choice of codecs • Depends on the application • Transport bandwidth - ATSC-M/H channel bandwidth: 19.6 Mbps - DVB-H channel bandwidth: 14 Mbps • Processing power of the target device
H.264/AVC • Defined in MPEG-4 Part 10 • Jointly developed by ITU-T VCEG and the MPEG group of ISO/IEC • Provides better compression than its predecessors, such as MPEG-2 Video and MPEG-4 Part 2 • Suitable for a wide variety of applications • Adopted as the video standard in ATSC-M/H, DVB, etc. • Used in Blu-ray discs, DVDs, iTunes, Flash Player, video conferencing applications, etc.
Frame types • Three basic types: • Intra-predictive (I) frame • Predictive (P) frame • Bi-predictive (B) frame • IDR frame is a special type of I frame - indicates the start of a video sequence
Bitstream syntax of H.264 • Data is organized into two layers • VCL (video coding layer) • NAL (network abstraction layer) • NAL formatting of VCL and non-VCL data [6]
NAL unit format [6] • Forbidden bit (1 bit) • NRI (2 bits) • Type (5 bits)
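Since the NAL unit header is a single byte, the three fields above can be unpacked with simple shifts. A minimal sketch (the type values noted in the comment follow the H.264 specification; the helper itself is only illustrative, not the thesis code):

```python
# Illustrative sketch: unpack the 1-byte NAL unit header described above.
# Layout: forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)

def parse_nal_header(header_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,
        "nal_ref_idc":        (header_byte >> 5) & 0x03,
        "nal_unit_type":       header_byte       & 0x1F,  # 5 = IDR slice, 7 = SPS, 8 = PPS
    }

# Example: 0x65 is a typical first byte of an IDR slice NAL unit.
print(parse_nal_header(0x65))  # {'forbidden_zero_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 5}
```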
Important NAL unit types • IDR frames - indicate the start of a new video sequence • Sequence parameter sets (SPS) - contain parameters common to the entire sequence - profile, level, size of the video, number of reference frames • Picture parameter sets (PPS) - contain parameters that apply to one frame or several frames in a sequence - entropy coding mode, quantization parameters, etc.
HEAACv2 • Also called Enhanced aacPlus • Developed by Coding Technologies for very low bitrate applications • Defined in MPEG-4 Part 3, Amendment 2 • Supports mono, stereo and multichannel coding (up to 48 channels) • Is a combination of AAC, SBR and PS • Provides the highest perceptible quality at the lowest bitrates • Adopted as the audio standard in ATSC-M/H, DVB and XM Satellite Radio • Can exist in a variety of file formats, such as mp4 and m4a • Controlled testing conducted by 3GPP [27] indicates that HEAACv2 provides good quality audio at 24 kbps
AAC (Advanced Audio Coding) • Successor to the MP3 format • Defined in both MPEG-2 [3] and MPEG-4 [2] • Achieves better sound quality than MP3 at the same bitrate • AAC is also the standard audio format for the Apple iPhone, iPod, iPad, Sony PlayStation, etc. • Supports up to 48 channels (MP3 supports up to two channels in MPEG-1 mode and up to 5.1 channels in MPEG-2 mode) • More sampling frequencies (8 to 96 kHz) than MP3 (16 to 48 kHz) • Achieves good quality stereo audio at 128 kbps
SBR (spectral band replication) [2] • SBR is a bandwidth expansion technique • Exploits the correlation between the high and low frequencies. • Using SBR, along with AAC, high quality stereo sound can be achieved at 48 kbps.
High band reconstruction through SBR • Figures [28] (not reproduced): original audio signal; high band reconstructed through SBR
PS (parametric stereo) [2] • Only used for low bitrate applications (< 32 kbps) • Parameterizes the stereo image: time/phase differences, inter-channel intensity differences, etc. • Only a monaural version of the stereo signal is encoded by the AAC encoder • At the decoder, the monaural signal is decoded first, and then the stereo signal is reconstructed using the PS parameters • Using PS along with AAC and SBR, reasonable quality stereo sound can be achieved at 24 kbps
HEAACv2 bitstream formats • ADIF (audio data interchange format) - has just one header for the whole stream - used for storage media • ADTS (audio data transport stream) - used in transport streams - has a header in every access unit
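Because ADTS carries a header in every access unit, the stream can be cut into audio frames by walking from header to header. A minimal sketch of that walk, using the standard 13-bit aac_frame_length field (illustrative, not the thesis code):

```python
# Illustrative sketch: split an ADTS byte stream into access units by reading
# the 13-bit aac_frame_length field in every ADTS header (the length includes
# the header itself).

def split_adts_frames(data: bytes):
    frames, pos = [], 0
    while pos + 7 <= len(data):
        # every ADTS header starts with the 12-bit syncword 0xFFF
        if data[pos] != 0xFF or (data[pos + 1] & 0xF0) != 0xF0:
            raise ValueError(f"lost ADTS sync at offset {pos}")
        frame_len = ((data[pos + 3] & 0x03) << 11) | (data[pos + 4] << 3) | (data[pos + 5] >> 5)
        if frame_len < 7:
            raise ValueError(f"invalid ADTS frame length at offset {pos}")
        frames.append(data[pos:pos + frame_len])   # header + raw AAC data block(s)
        pos += frame_len
    return frames
```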
Transport protocols • Most multimedia applications involve communication channels or storage • RTP (Real-time Transport Protocol) - transport over IP networks • MPEG-2 Systems - digital television broadcast - storage (asset management)
MPEG-2 Systems • Defines two types of streams - Program stream (PS) - used for storage, e.g. DVD - Transport stream (TS) - used for digital broadcast • Two layers of packetization - PES (packetized elementary stream) - TS (transport stream)
PES (packetized elementary stream) • First layer of packetization • Separates the audio and video elementary streams into access units • Variable length • Contains a header and payload (frame) data • Adds fields like time stamp, stream ID and packet length - see the sketch below
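A simplified sketch of this first packetization layer: one PES-like packet per access unit carrying a stream ID, a frame-number time stamp and the payload length ahead of the frame data. The 9-byte header layout is an assumption for illustration only, not the exact PES header defined in MPEG-2 Systems [4]:

```python
import struct

# Illustrative sketch: wrap one access unit (frame) into a PES-like packet.
# Assumed header layout: stream_id (1 byte) | frame_number (4 bytes) | length (4 bytes).

def build_pes_packet(stream_id: int, frame_number: int, frame_data: bytes) -> bytes:
    header = struct.pack(">BII", stream_id, frame_number, len(frame_data))
    return header + frame_data

# Example: 0xE0 is the conventional MPEG-2 stream ID for the first video stream.
packet = build_pes_packet(0xE0, 42, b"\x00\x00\x00\x01\x65")  # start of an IDR NAL unit
```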
Frame number as time stamp • For video, fps is constant throughout the sequence • For audio, the sampling frequency is constant throughout the sequence • The frame number therefore maps directly to presentation time - see the sketch below
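A minimal sketch of that mapping, using the test conditions from later in the deck (24 fps video, 24 kHz audio). The number of output samples per audio access unit is an assumption: 2048 is typical for HE-AAC (a 1024-sample AAC core doubled by SBR), but the exact value depends on the encoder configuration:

```python
# Illustrative sketch: with constant fps and sampling frequency, the frame
# number alone determines presentation time.

FPS = 24                       # from the test conditions
SAMPLING_FREQUENCY = 24000     # Hz, from the test conditions
SAMPLES_PER_AUDIO_FRAME = 2048 # assumption: HE-AAC (1024-sample AAC core doubled by SBR)

def video_pts(video_frame_number: int) -> float:
    """Presentation time of a video frame, in seconds."""
    return video_frame_number / FPS

def audio_pts(audio_frame_number: int) -> float:
    """Presentation time of an audio access unit, in seconds."""
    return audio_frame_number * SAMPLES_PER_AUDIO_FRAME / SAMPLING_FREQUENCY
```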
TS packets • Second layer of packetization • Fixed length (188 bytes) • Each PES packet is logically broken down into 188-byte packets • Three-byte header contains the packet ID, payload unit start flag, continuity counter, etc.
TS header description: • Payload unit start indicator (PUSI) flag - indicates the payload contains a PES header • Adaptation field control (AFC) flag - indicates the payload is less than 185 bytes • Continuity counter (CC) - 4-bit counter used to check for packet losses, out-of-sequence packets, etc. • Packet ID (PID) (10 bits) - uniquely identifies the elementary stream the packet belongs to • Optional offset byte - contains the offset value if AFC is set - see the sketch below
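The deck describes a simplified 3-byte header rather than the standard 4-byte MPEG-2 TS header, so the bit positions below are an assumption chosen only to illustrate how these fields could be packed:

```python
# Illustrative sketch of the simplified 3-byte TS header described above.
# Assumed bit layout (not taken from the thesis):
#   byte 0 : PUSI (1 bit) | AFC (1 bit) | reserved (2 bits) | CC (4 bits)
#   bytes 1-2 : PID (10 bits, right-aligned in a 16-bit big-endian field)
# An optional offset byte follows when AFC is set (payload < 185 bytes).

def build_ts_header(pusi: bool, afc: bool, cc: int, pid: int, offset: int = 0) -> bytes:
    byte0 = (int(pusi) << 7) | (int(afc) << 6) | (cc & 0x0F)
    header = bytes([byte0]) + (pid & 0x3FF).to_bytes(2, "big")
    if afc:
        header += bytes([offset])  # assumed meaning: number of valid payload bytes
    return header
```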
Multiplexing • What is multiplexing? • Multiplexing is the process of transmitting TS packets belonging to different elementary streams over a single channel • Muxing determines how effectively the TS packets are interleaved in the TS stream, so that the audio and video content is transmitted simultaneously - see the sketch below • Buffer overflow/underflow - can cause picture loss or skips during audio-video playback
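One generic way to express the interleaving decision is to always emit next the packet (audio or video) whose presentation time is earlier, so neither decoder buffer runs far ahead of the other. This is a sketch of the idea under that assumption, not the exact scheduling used in the thesis:

```python
# Illustrative sketch: interleave audio and video TS packets by presentation
# time so both streams advance together and neither buffer starves.
# video_packets / audio_packets are assumed to be lists of (pts_seconds, packet_bytes) pairs.

def interleave(video_packets, audio_packets):
    out, vi, ai = [], 0, 0
    while vi < len(video_packets) or ai < len(audio_packets):
        take_video = ai >= len(audio_packets) or (
            vi < len(video_packets) and video_packets[vi][0] <= audio_packets[ai][0])
        if take_video:
            out.append(video_packets[vi][1]); vi += 1
        else:
            out.append(audio_packets[ai][1]); ai += 1
    return out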
Calculation of presentation time of a TS packet (equations not reproduced; a reconstruction is sketched below): • For a video TS packet • For an audio TS packet
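A hedged reconstruction of the missing equations, based on the frame-number time stamps above: for a video TS packet, t_video = n_v / fps, where n_v is the number of the video frame the packet carries; for an audio TS packet, t_audio = (n_a × N) / f_s, where n_a is the audio frame number, N the samples per access unit (assumed 2048 for HE-AAC) and f_s the sampling frequency. With the test conditions (24 fps, 24 kHz), video frame 48 gives t_video = 48 / 24 = 2.0 s and audio frame 24 gives t_audio = 24 × 2048 / 24000 ≈ 2.048 s.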
De-multiplexing • The transport stream (TS) input to the receiver is separated into a video elementary stream and an audio elementary stream • These elementary streams are initially written into video and audio buffers respectively • Once one of the buffers is full, the elementary stream is reconstructed from the point of synchronization - see the sketch below
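A minimal sketch of the demultiplexing loop, routing packets by PID into the two buffers. It assumes the simplified 3-byte header layout sketched earlier; the PID values are placeholders for illustration:

```python
# Illustrative sketch: route 188-byte TS packets into audio/video buffers by PID,
# stripping the assumed 3-byte header (plus optional offset byte) sketched above.

VIDEO_PID, AUDIO_PID = 0x020, 0x021   # assumed PID values for illustration
TS_PACKET_SIZE = 188

def demultiplex(ts: bytes):
    video_buf, audio_buf = bytearray(), bytearray()
    for pos in range(0, len(ts), TS_PACKET_SIZE):
        pkt = ts[pos:pos + TS_PACKET_SIZE]
        if len(pkt) < TS_PACKET_SIZE:
            break
        afc = (pkt[0] >> 6) & 0x01
        pid = int.from_bytes(pkt[1:3], "big") & 0x3FF
        # offset byte assumed to give the number of valid payload bytes when AFC is set
        payload = pkt[4:4 + pkt[3]] if afc else pkt[3:]
        if pid == VIDEO_PID:
            video_buf += payload
        elif pid == AUDIO_PID:
            audio_buf += payload
    return bytes(video_buf), bytes(audio_buf)
```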
Audio-video synchronization • Once the video buffer is full, it is searched for the next occurring IDR frame • The corresponding audio frame is calculated from the equation (reconstructed below) • The elementary streams are reconstructed from that point, merged into a container format (using mkvmerge) and then played back
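A hedged reconstruction of the missing equation, using the same assumed N = 2048 samples per audio access unit: the audio frame coinciding with an IDR at video frame n_IDR is n_audio = round(n_IDR × f_s / (fps × N)). For example, an IDR at video frame 240 (t = 240 / 24 = 10 s) maps to audio frame round(240 × 24000 / (24 × 2048)) = 117, and playback of both streams resumes from that pair of frames.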
Test conditions: • Video • H.264 baseline profile • Resolution: 416×240 • GOP: IPPP (IDR forced) • fps: 24 • Audio • HEAACv2 • ADTS format • Sampling frequency: 24,000 Hz
Conclusions • Buffer fullness was effectively handled; the maximum buffer difference observed was around 20 ms of media content • Audio-video synchronization was achieved with a maximum skew of 13 ms
Future work • Expand the multiplexing algorithm to multiplex multiple programs • Implement the same multiplexing algorithm for other transport protocols, such as RTP/IP • Add error correction to the TS stream
References: • [1] MPEG-4: ISO/IEC JTC1/SC29 14496-10, Information technology - Coding of audio-visual objects - Part 10: Advanced video coding, ISO/IEC, 2005. • [2] MPEG-4: ISO/IEC JTC1/SC29 14496-3, Information technology - Coding of audio-visual objects - Part 3: Audio, Amendment 4: Audio Lossless Coding (ALS), new audio profiles and BSAC extensions. • [3] MPEG-2: ISO/IEC JTC1/SC29 13818-7, Advanced audio coding (AAC), International Standard, WG11, 1997. • [4] MPEG-2: ISO/IEC 13818-1, Information technology - Generic coding of moving pictures and associated audio - Part 1: Systems, ISO/IEC, 2005. • [5] Soon-kak Kwon et al., "Overview of H.264/MPEG-4 Part 10", special issue on "Emerging H.264/AVC video coding standard", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006. • [6] A. Puri et al., "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004. • [7] "MPEG-4 HE-AAC v2 - audio coding for today's digital media world", EBU Technical Review, 01/2006. Link: http://tech.ebu.ch/docs/techreview/trev_305-moser.pdf • [8] ETSI TS 101 154, "Implementation guidelines for the use of video and audio coding in broadcasting applications based on the MPEG-2 transport stream". • [9] 3GPP TS 26.401: General audio codec audio processing functions; Enhanced aacPlus general audio codec, 2009. • [10] 3GPP TS 26.403: Enhanced aacPlus general audio codec; Encoder specification, AAC part. • [11] 3GPP TS 26.404: Enhanced aacPlus general audio codec; Encoder specification, SBR part. • [12] 3GPP TS 26.405: Enhanced aacPlus general audio codec; Encoder specification, Parametric Stereo part.
[13] http://www.jeroenbreebaart.com/papers/aes/aes116_2.pdf • [14] MPEG transport stream. Link: http://www.iptvdictionary.com/iptv_dictionary_MPEG_Transport_Stream_TS_definition.html • [15] MPEG-4: ISO/IEC JTC1/SC29 14496-14, Information technology - Coding of audio-visual objects - Part 14: MP4 file format, 2003. • [16] DVB-H: Global mobile TV. Link: http://www.dvb-h.org/ • [17] ATSC-M/H. Link: http://www.atsc.org/cms/ • [18] Open Mobile Video Coalition. Link: http://www.openmobilevideo.com/about-mobile-dtv/standards/ • [19] VC-1 compressed video bitstream format and decoding process (SMPTE 421M-2006), SMPTE standard, 2006 (http://store.smpte.org/category-s/1.htm). • [20] Henning Schulzrinne's RTP page. Link: http://www.cs.columbia.edu/~hgs/rtp/ • [21] G. A. Davidson et al., "ATSC video and audio coding", Proc. IEEE, vol. 94, pp. 60-76, Jan. 2006 (www.atsc.org). • [22] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next-generation multimedia", Wiley, 2003. • [23] European Broadcasting Union, http://www.ebu.ch/ • [24] Shintaro Ueda et al., "NAL level stream authentication for H.264/AVC", IPSJ Digital Courier, vol. 3, Feb. 2007. • [25] WorldDMB. Link: http://www.worlddab.org/ • [26] ISDB website. Link: http://www.dibeg.org/
[27] 3GPP website. Link: http://www.3gpp.org/ • [28] Mihir Modi, "Audio compression gets better and more complex". Link: http://www.eetimes.com/discussion/other/4025543/Audio-compression-gets-better-and-more-complex • [29] P. A. Sarginson, "MPEG-2: Overview of the systems layer". Link: http://downloads.bbc.co.uk/rd/pubs/reports/1996-02.pdf • [30] MPEG-2: ISO/IEC 13818-1, Generic coding of moving pictures and associated audio - Part 1: Systems, Amendment 3: Transport of AVC video data over ITU-T Rec. H.222.0 | ISO/IEC 13818-1 streams, 2003. • [31] mkvmerge software. Link: http://www.matroska.org/ • [32] VLC media player. Link: http://www.videolan.org/ • [33] GOM media player. Link: http://www.gomlab.com/ • [34] H. Murugan, "Multiplexing H.264 video bit-stream with AAC audio bit-stream, de-multiplexing and achieving lip sync during playback", M.S.E.E. thesis, University of Texas at Arlington, TX, May 2007. • [35] Gerold Blakowski et al., "A media synchronization survey: reference model, specification, and case studies", IEEE Journal on Selected Areas in Communications, vol. 14, no. 1, January 1996. • [36] H.264/AVC JM software. Link: http://iphome.hhi.de/suehring/tml/download/ • [37] 3GPP Enhanced aacPlus reference software. Link: http://www.3gpp.org/ftp/ • [38] H.264 bitstreams. Link: http://sosori.com/