EE 5359 Multimedia Processing Project
Study and implementation of G.719 audio codec and performance analysis of G.719 with AAC (advanced audio codec) and HE-AAC (high efficiency advanced audio codec)
Name: Yashas Prakash
Student ID: 1000803680
Instructor: Dr. K. R. Rao
Date: 05-09-2012
Introduction
• Audio coding starts with the conversion of analog signals such as speech and music to digital form.
• This digital signal is processed by a digital signal processor according to the application requirements to obtain an encoded signal; the encoded signal is later decoded to reconstruct the original analog signal.
• In this project the encoding schemes G.719 (a fullband speech/audio codec) and the AAC and HE-AAC audio codecs are studied comprehensively.
• The topology used to encode a signal depends on the application of the codec. G.719 is mainly used in the telecommunications industry to encode speech signals, whereas AAC and HE-AAC are used extensively in the music and entertainment industry.
Factors to be considered for audio coding [10]
• Fidelity • Data rate • Complexity • Delay
• Fidelity: how perceptually equivalent the output of the codec is to the input signal. Higher fidelity usually requires higher data rates, greater system complexity and longer system delays.
• Data rate: linked to the throughput, storage and bandwidth capacity of the overall system. Higher data rates typically imply higher costs in the transmission and storage of digital audio signals.
• Complexity: attributed to carrying out the encoding and decoding processes in the system, which translates into hardware and software costs of the encoder and decoder.
• Delay: the encoding and decoding processing time of the system. It becomes a major factor in the telecommunications industry.
• Scalability: the ability to serve, for example, internet broadcasts to users with different connection speeds.
• Error robustness: in wireless transmission it is desirable to transmit and receive signals with minimum error.
Introduction to the G.719 codec
• In June 2006, Polycom proposed to ITU-T the characteristics for a new 20 kHz bandwidth audio codec based on the Polycom Siren 22 technology [12].
• In June 2008, ITU-T standardized G.719, which provides audio coding for the full human auditory bandwidth and can drive several new applications.
• G.719 operates at bit rates from 32 kbps to 128 kbps per channel.
• G.719 has an extremely low computational load, which makes it feasible to implement on the digital signal processors used in telephones.
Working of the G.719 encoder
• Once the analog signal is converted into a digital signal, it is fed into the G.719 encoder, which compresses it so that the audio information can be transported over the network.
• The signal is sampled at 48 kHz because, with a Nyquist limit of 24 kHz, this provides plenty of bandwidth headroom. In addition, it fits well with 20 ms frames and is an integral multiple of both the 8 kHz and 16 kHz rates commonly used in communications.
• The discrete cosine transform (DCT) converts the signal from the time domain into the frequency domain. Transforms produce high coding gain (great compression efficiency) for speech because speech concentrates its energy in relatively few frequencies.
• The coefficients generated by the DCT are sent to the spectrum normalization function, which divides the spectrum into multiple frequency bands (low, medium, high and very high frequencies) and finds the average energy level (norm) for each (see the sketch below).
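As an illustration, the following minimal C sketch computes a per-band norm as the RMS energy of the band's DCT coefficients. The uniform band widths and the function name are assumptions made for clarity; the normative G.719 routine uses bands that widen toward higher frequencies.

```c
#include <math.h>

/* Sketch: compute the norm (RMS energy) of each frequency band from
 * the DCT coefficients. Uniform band widths are assumed for clarity;
 * G.719 actually uses wider bands at higher frequencies. */
void compute_band_norms(const float *coeffs, int num_coeffs,
                        int num_bands, float *norms)
{
    int band_len = num_coeffs / num_bands;
    for (int b = 0; b < num_bands; b++) {
        double energy = 0.0;
        for (int i = 0; i < band_len; i++) {
            float c = coeffs[b * band_len + i];
            energy += (double)c * c;
        }
        norms[b] = (float)sqrt(energy / band_len); /* RMS of the band */
    }
}
```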
Working of the G.719 encoder (continued)
• The norm quantization and coding algorithm [4] assigns values to each norm. To increase efficiency, the algorithm encodes the difference between the norm of one frequency band and the norm of the next; this yields smaller values than encoding the norms themselves, so less information is transmitted over the network (a sketch follows this list).
• The output (encoded norms) is used by the spectrum normalization algorithm to normalize the spectrum coefficients coming from the transform.
• Bit allocation is an algorithm that allocates more bits to encode larger values and fewer bits to encode smaller values, optimizing the use of the available bits in G.719.
• The fast lattice vector quantization (FLVQ) algorithm is a key element in reducing the complexity and memory footprint of G.719.
• Noise level adjustment determines the level at which the coefficients that are not transmitted can be regenerated; at the decoder, lower frequency coefficients are reused to fill the higher frequency ranges.
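A minimal sketch of the differential coding idea, assuming integer norm indices; the index ranges and the entropy coding of the differences defined by G.719 are omitted, and the function names are illustrative.

```c
/* Sketch: transmit the first norm index directly, then only the
 * difference between consecutive band indices. Neighboring norms are
 * usually close, so the differences are small and cheap to code. */
void encode_norm_diffs(const int *norm_idx, int num_bands, int *out)
{
    out[0] = norm_idx[0];                       /* first band as-is */
    for (int b = 1; b < num_bands; b++)
        out[b] = norm_idx[b] - norm_idx[b - 1]; /* small differences */
}

void decode_norm_diffs(const int *in, int num_bands, int *norm_idx)
{
    norm_idx[0] = in[0];
    for (int b = 1; b < num_bands; b++)
        norm_idx[b] = norm_idx[b - 1] + in[b];  /* undo the diffs */
}
```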
Working of the G.719 decoder
• The G.719 decoding algorithm is the logical mirror of the encoder described above: lattice decoding is simply the inverse of FLVQ, followed by spectral fill.
• The coefficients are then fed into spectral shaping. Because the signal coming out of lattice decoding and the spectral fill generator has the same energy at all frequencies, the norms must be applied, i.e., the spectrum normalization described earlier is reversed, to arrive at the correct energy level in each frequency band (see the sketch below).
• As the final step, the inverse transform function converts the signal from the frequency domain back into the time domain. It is, in effect, an inverse DCT.
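A minimal sketch of the spectral shaping step, again assuming uniform bands for simplicity; the function name and signature are illustrative.

```c
/* Sketch: scale the normalized coefficients of each band by the
 * decoded norm so every band regains its correct energy level. */
void spectral_shaping(float *coeffs, int num_coeffs,
                      const float *norms, int num_bands)
{
    int band_len = num_coeffs / num_bands;
    for (int b = 0; b < num_bands; b++)
        for (int i = 0; i < band_len; i++)
            coeffs[b * band_len + i] *= norms[b];
}
```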
Transient detection and adaptive time-frequency transform [4]
ATFT and transient detection
• A "transient" is any rapid change in audio signal energy. G.719 achieves both accuracy and efficiency with a transient detection function that identifies such sounds, coupled with an adaptive time-frequency transform (ATFT) function that instantly switches the time resolution between "transient" and "non-transient" (normal) modes (a sketch of the detection idea follows this list).
• G.719's DCT normally uses a 20 ms window, which delivers great quality for normal sounds. However, transient sounds that occur over shorter times lead to quantization errors, which sound like brief puffs of noise. To prevent this, the ATFT algorithm tracks these transients by switching to a much shorter 5 ms window when instructed to do so by the transient detector.
• Post windowing is used to divide the 20 ms time-aliased frames into four smaller (5 ms) frames, followed by a DCT.
• Combining the three operations (post windowing, time aliasing and DCT) amounts to a DCT-IV on the segmented "time-aliased signal" before the switch. The four outputs (Y0-Y3 in the previous block diagram) are encoded jointly, and the spectra are interleaved before the next step (FLVQ).
• Scalar quantization represents single individual values, whereas a vector quantizer represents arrays of them, i.e. vectors. A vector quantizer uses a codebook; the larger the codebook, the more precise the quantization, although at a cost of higher complexity.
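A minimal sketch of an energy-based transient detector as described above; the actual G.719 detector and its thresholds differ, so the ratio used here is a placeholder assumption.

```c
/* Sketch: flag a transient when frame energy jumps sharply relative
 * to the previous frame. The threshold is a hypothetical value, not
 * the one specified by G.719. */
#define TRANSIENT_RATIO 8.0

int is_transient(const float *frame, int len, double *prev_energy)
{
    double energy = 0.0;
    for (int i = 0; i < len; i++)
        energy += (double)frame[i] * frame[i];

    int transient = (*prev_energy > 0.0) &&
                    (energy > TRANSIENT_RATIO * *prev_energy);
    *prev_energy = energy;
    return transient; /* 1 -> ATFT switches to short (5 ms) windows */
}
```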
Comparing G.719 with other codecs such as MP3 and AAC [3]
• G.719 inherits Polycom's Siren 22 technology with a mere one million instructions lower complexity, and is therefore a great choice for implementation in telephones and mobile devices.
• The stereo coding capability of G.719 is increased to 256 kbps, compared to Siren 22's 128 kbps.
• MPEG-1 and MPEG-4 [4] require more powerful and expensive digital signal processors.
• G.719 uses the adaptive time-frequency transform, which is of great benefit for percussive music sounds; it is therefore expected to perform better than Siren 22.
Implementation of the G.719 codec
• Use a C compiler such as Dev-C++ or Visual C++ to compile the code. Any C compiler can be used to generate the executable files.
• The encoder code is compiled to produce encoder.exe, which is used to encode the input test_vectors at 32, 48 and 64 kbps.
• The decoder code is compiled to produce decoder.exe, which is used to decode the encoded test_vectors at 32, 48 and 64 kbps respectively.
• The encoded and decoded files are compared in the console to check whether the decoded file and the original test_vector are the same (a sketch of this comparison follows below).
• Encoding: i/p files (32k, 48k, 64k; .raw format) → G.719 encoder.exe → o/p files (32k, 48k, 64k; .bs format)
• Decoding: i/p files (32k, 48k, 64k; .bs format) → G.719 decoder.exe → o/p files (32k, 48k, 64k; .raw format)
Comparing the decoded file with the default test_vector at the same bit rate.
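A minimal C sketch of this bit-exact comparison; the file names are hypothetical examples.

```c
#include <stdio.h>

/* Sketch: return 1 if the two files are byte-for-byte identical. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int ca, cb, same = (fa != NULL && fb != NULL);

    while (same) {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if (ca != cb) same = 0;          /* mismatch or length differs */
        if (ca == EOF || cb == EOF) break;
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

int main(void)
{
    if (files_identical("decoded_32k.raw", "test_vector_32k.raw"))
        printf("Decoder output is bit-exact with the test vector.\n");
    else
        printf("Files differ.\n");
    return 0;
}
```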
Introduction to AAC
• AAC [9] was developed with the cooperation and contributions of companies including AT&T Bell Laboratories, Fraunhofer IIS, Dolby Laboratories, Sony Corporation and Nokia. It was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is specified both as Part 7 of the MPEG-2 standard and as Subpart 4 in Part 3 of the MPEG-4 standard.
• Advanced audio coding (AAC) is a lossy compression scheme for encoding and decoding digital audio. It is an improved successor to the MP3 format and can achieve better sound quality than MP3 at similar bit rates.
• It is the default audio format for Apple products, the Nintendo Wii and the PlayStation 3.
Features of AAC
• More sampling frequencies (from 8 to 96 kHz) than MP3 (16 to 48 kHz)
• Up to 48 channels (MP3 supports up to two channels in MPEG-1 mode and up to 5.1 channels in MPEG-2 mode)
• Arbitrary bit rates and variable frame length; standardized constant bit rate with bit reservoir
• Higher efficiency and a simpler filterbank (rather than MP3's hybrid coding, AAC uses a pure MDCT)
• Higher coding efficiency for stationary signals (AAC uses a block size of 1024 or 960 samples, allowing more efficient coding than MP3's 576-sample blocks)
• Higher coding accuracy for transient signals (AAC uses a block size of 128 or 120 samples, allowing more accurate coding than MP3's 192-sample blocks)
• Can use a Kaiser-Bessel-derived window function to eliminate spectral leakage at the expense of widening the main lobe
• Much better handling of audio frequencies above 16 kHz
• More flexible joint stereo (different methods can be used in different frequency ranges)
Working of the AAC encoder
• Filterbank and block switching: the MDCT (modified discrete cosine transform) is the standard transform used to convert the incoming audio signal from the time domain to the frequency domain (a direct-form sketch follows this list).
• Filterbank and gain control: a gain control module and a processing block containing a uniformly spaced PQF (4-band polyphase quadrature filter) precede the MDCT.
• Temporal noise shaping (TNS): speech signals that vary with time are often a challenge to conventional transform schemes, because quantization noise is controlled over frequency but is constant within a transform block. The TNS technique was introduced into MPEG-2 AAC to overcome this limitation.
• Long term prediction (LTP): redundancy reduction of stationary signal segments can be improved by frequency domain prediction. Stationary signals are supported in long transform blocks but not in short blocks.
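A direct-form sketch of the MDCT named above; production encoders use fast FFT-based implementations rather than this O(N²) form.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Sketch: direct-form MDCT mapping 2N windowed time samples to N
 * frequency coefficients, following the standard MDCT definition. */
void mdct(const float *x, float *X, int N)
{
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < 2 * N; n++)
            sum += x[n] * cos(M_PI / N *
                              (n + 0.5 + N / 2.0) * (k + 0.5));
        X[k] = (float)sum;
    }
}
```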
Working of the AAC encoder (continued)
• Intensity stereo: intensity stereo coding is based on an analysis of high-frequency audio perception, specifically on the energy-time envelope of that region of the audio spectrum. This allows a stereo channel pair to share a single set of spectral values for the high-frequency components while preserving the sound quality.
• Prediction: the prediction module is used to represent stationary or semi-stationary parts of an audio signal; the repeated information in sequential windows can be represented by a repeat instruction.
• Mid/side (M/S) stereo coding: M/S stereo coding is another data reduction module based on channel pair coding; it increases coding efficiency by exploiting the redundancy between the two channels (see the sketch after this list).
• Quantization and coding: the majority of the data reduction generally occurs in the quantization phase, after the data has already achieved a certain level of compression in the previous modules.
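A minimal sketch of M/S stereo coding as described above; the function names are illustrative.

```c
/* Sketch: the mid channel carries the sum and the side channel the
 * difference, so highly correlated left/right signals yield a
 * near-zero side channel that quantizes very cheaply. */
void ms_encode(const float *left, const float *right,
               float *mid, float *side, int n)
{
    for (int i = 0; i < n; i++) {
        mid[i]  = 0.5f * (left[i] + right[i]);
        side[i] = 0.5f * (left[i] - right[i]);
    }
}

void ms_decode(const float *mid, const float *side,
               float *left, float *right, int n)
{
    for (int i = 0; i < n; i++) {
        left[i]  = mid[i] + side[i];   /* exact reconstruction */
        right[i] = mid[i] - side[i];
    }
}
```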
Introduction to HE-AAC [12]
• High efficiency AAC (HE-AAC) is an extension of AAC-LC optimized for low-bit-rate applications such as streaming audio and podcasts.
• HE-AAC uses spectral band replication (SBR) to enhance the compression efficiency of the upper half of the frequency band. HE-AAC version 2 (HE-AAC v2) adds parametric stereo (PS) to further enhance the compression efficiency of stereo signals at very low bit rates.
• The sound quality of HE-AAC v1 at 64 kbps is comparable to AAC-LC at 96 kbps.
AAC + SBR = HE-AAC / aacPlus [12]
• This is a combination of AAC with SBR, where AAC is the core audio codec and SBR is a bandwidth extension technique that increases the coding gain.
Spectral band replication [7]
• Spectral band replication (SBR) is a new audio coding tool that significantly improves the coding gain of perceptual coders and speech coders.
• The scheme exploits the fact that the harmonic series in the higher frequency band is closely related to that in the lower band.
• The higher frequencies are therefore reconstructed from the lower frequency components (see the sketch below).
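A minimal sketch of the SBR idea; real SBR operates on a QMF sub-band representation with adaptive noise and tonality handling, so the uniform envelope bands and the function signature here are assumptions.

```c
/* Sketch: regenerate the upper half of the spectrum by copying the
 * lower half, scaled by envelope gains sent as side information. */
void sbr_reconstruct(float *spectrum, int num_coeffs,
                     const float *envelope_gains, int num_env_bands)
{
    int half = num_coeffs / 2;
    int band_len = half / num_env_bands;

    for (int b = 0; b < num_env_bands; b++)
        for (int i = 0; i < band_len; i++) {
            int idx = b * band_len + i;
            /* replicate the low-band coefficient and shape it */
            spectrum[half + idx] = envelope_gains[b] * spectrum[idx];
        }
}
```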
Sound files (encoded file sizes)
• 32 kbps: AAC 208 kb, HE-AAC 114 kb
• 48 kbps: AAC 210 kb, HE-AAC 169 kb
• 64 kbps: AAC 225 kb, HE-AAC 223 kb
Performance analysis using the MUSHRA test [20]
• This test is done to assess the quality of an audio compression algorithm. Multiple stimuli with hidden reference and anchor (MUSHRA) [20], defined by the International Telecommunication Union (ITU), is a methodology for the subjective evaluation of audio quality.
• It is used to evaluate the perceived quality of the output of lossy audio compression algorithms. The MUSHRA [20] methodology is recommended for assessing "intermediate audio quality". The method requires fewer participants to obtain statistically significant results because all codecs are presented at the same time, on the same samples, so a paired t-test can be used for the statistical analysis (see the sketch after this list).
• In MUSHRA, the listener is presented with the reference (labeled as such), a certain number of test samples, a hidden version of the reference and one or more anchors. The recommendation specifies that one anchor must be a 3.5 kHz low-pass version of the reference. The purpose of the anchor(s) is to bring the scale closer to an "absolute scale", making sure that minor artifacts are not rated as having very bad quality.
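A minimal sketch of the paired t-test statistic mentioned above; each listener rates both codecs on the same samples, so the test runs on the per-listener score differences, and the resulting t value is then compared against a critical value for n−1 degrees of freedom.

```c
#include <math.h>

/* Sketch: paired t statistic, t = mean(d) / (s_d / sqrt(n)), where
 * d[i] is the score difference of listener i (requires n > 1). */
double paired_t_statistic(const double *scores_a,
                          const double *scores_b, int n)
{
    double mean = 0.0, var = 0.0;

    for (int i = 0; i < n; i++)
        mean += scores_a[i] - scores_b[i];
    mean /= n;

    for (int i = 0; i < n; i++) {
        double d = (scores_a[i] - scores_b[i]) - mean;
        var += d * d;
    }
    var /= (n - 1);              /* sample variance of the differences */

    return mean / sqrt(var / n);
}
```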
References
• [1] M. Xie, P. Chu, A. Taleb and M. Briand, "A new low-complexity full band (20 kHz) audio coding standard for high-quality conversational applications", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 265-268, Oct. 2009.
• [2] A. Taleb and S. Karapetkov, "The first ITU-T standard for high-quality conversational fullband audio coding", IEEE Communications Magazine, vol. 47, pp. 124-130, Oct. 2009.
• [3] J. Wang, B. Chen, H. He, S. Zhao and J. Kuang, "An adaptive window switching method for ITU-T G.719 transient coding in TDA domain", IEEE International Conference on Wireless, Mobile and Multimedia Networks, pp. 298-301, Jan. 2011.
• [4] J. Wang, N. Ning, X. Ji and J. Kuang, "Norm adjustment with segmental weighted SMR for ITU-T G.719 audio codec", IEEE International Conference on Multimedia and Signal Processing, vol. 2, pp. 282-285, May 2011.
References
• [5] K. Brandenburg and M. Bosi, "Overview of MPEG audio: current and future standards for low-bit-rate audio coding", JAES, vol. 45, pp. 4-21, Jan./Feb. 1997.
• [6] A/52B, ATSC Digital Audio Compression Standard: http://www.atsc.org/cms/standards/a_52b.pdf
• [7] F. Henn, R. Böhm and S. Meltzer, "Spectral band replication technology and its application in broadcasting", International Broadcasting Convention, 2003.
• [8] M. Dietz and S. Meltzer, "CT-aacPlus – a state of the art audio coding scheme", Coding Technologies, EBU Technical Review, July 2002.
• [9] ISO/IEC IS 13818-7, "Information technology – Generic coding of moving pictures and associated audio information – Part 7: advanced audio coding (AAC)", Jan. 2006.
References
• [10] M. Bosi and R. E. Goldberg, "Introduction to digital audio coding standards", Norwell, MA: Kluwer, 2003.
• [11] H. S. Malvar, "Signal processing with lapped transforms", Norwood, MA: Artech House, 1992.
• [12] D. Meares, K. Watanabe and E. Scheirer, "Report on the MPEG-2 AAC stereo verification tests", ISO/IEC JTC1/SC29/WG11, Feb. 1998.
• [13] SUPER (c) v.2012.build.50: a simplified universal player, encoder and renderer; a graphical user interface to FFmpeg, MEncoder, MPlayer, x264, Musepack, Shorten audio, True Audio, WavPack, the libavcodec library and the Theora/Vorbis RealProducer plugin: www.erightsoft.com
• [14] T. Ogunfunmi and M. Narasimha, "Principles of speech coding", Boca Raton, FL: CRC Press, 2010.
• [15] P. Ekstrand, "Bandwidth extension of audio signals by spectral band replication", IEEE Workshop on Model Based Processing and Coding of Audio, pp. 53-58, Nov. 2002.
• [16] T. Johnson, "Stereo coding for the ITU-T G.719 codec", M.S. thesis, Uppsala University, Sweden, May 2011.
References
• [17] T. Tsai, C. Liu and Y. Wang, "A pure-ASIC design approach for MPEG-2 AAC audio decoder", International Conference on Information, Communications and Signal Processing, vol. 3, pp. 1633-1636, Dec. 2003.
• [18] Proceedings of the IEEE, special issue on "Frontiers of audio visual communications: convergence of broadband, computing and rich media", vol. 100, no. 4, Apr. 2012.
• [19] Internet references:
• http://www.itu.int/rec/T-REC-G.719-200806-I/en
• http://www.audiocoding.com/
• http://www.polycom.com/index.html?ss=false
• http://en.wikipedia.org/wiki/MUSHRA