Waveform Speech Coding Algorithms: An Overview. June 20th, 2012. Adel Zaalouk.
Outline
• Introduction
• Concepts
  • Quantization
  • PCM
  • DPCM
  • ADPCM
• Standards & Applications
  • G711
  • G726
• Performance Comparison & Examples
• Summary & Conclusion
Technical Presentation
Introduction Motivation
What is speech coding?
It is the procedure of representing a digitized speech signal as efficiently as possible while maintaining a reasonable level of speech quality.
Why would we want to do that?
To answer this, let's have a look at the structure of the coding system.
Introduction Motivation Filtering & Sampling (1) to (3): figure slides
Introduction Motivation Filtering & Sampling (4)
• Most of the speech content lies between 300 and 3400 Hz.
• According to the Nyquist theorem, the sampling frequency must satisfy Fs >= 2 fm to avoid aliasing, where fm is the highest frequency in the signal.
• A sampling frequency of 8 kHz is selected (8 kHz >= 2 × 3.4 kHz).
• For good quality, 16 bits are used to represent each sample.
• Input bit rate = 8 kHz × 16 bits = 128 kbps.
• The input rate can be even higher. Skype, for example, uses a 16 kHz sampling frequency, for which an input rate of 192 kbit/s is quoted (this corresponds to 12 bits per sample; at 16 bits per sample the rate would be 256 kbit/s).
• Sending speech at such rates wastes bandwidth that could be used by other services and applications. This is the motivation for source coding (speech coding in this context). [1]
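The arithmetic on this slide can be checked with a short sketch. The function names `min_sampling_rate` and `pcm_bit_rate` are illustrative and not taken from the presentation.

```python
def min_sampling_rate(max_freq_hz):
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * max_freq_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Raw (uncompressed) bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Telephone-band speech tops out around 3400 Hz, so 8 kHz is sufficient.
    print(min_sampling_rate(3400))          # minimum rate in Hz
    print(pcm_bit_rate(8000, 16) / 1000)    # input rate in kbps
```

The same function gives 64 kbps for 8 bits per sample, which is the G711 rate discussed later in the deck.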
Introduction Motivation Desirable Properties of a Speech Coder
• Low bit rate: a lower bit rate needs less transmission bandwidth, leaving room for other services and applications.
• High speech quality: speech quality competes with low bit rate. The decoded speech must be of acceptable quality for the target application.
• Low coding delay: speech coding introduces extra delay, which can affect applications with real-time requirements. [1]
Introduction Speech Coding Categories
What are the different categories of speech coding?
• Speech coding is divided into three categories:
  • Waveform codecs (PCM, DM, APCM, DPCM, ADPCM)
  • Vocoders (LPC, homomorphic, etc.)
  • Hybrid codecs (CELP, SELP, RELP, APC, SBC, etc.) [2]
Concepts Quantization What Is Quantization?
Quantization is the process of transforming the sample amplitude of a message into a discrete amplitude taken from a finite set of possible amplitudes. [3]
Each sampled value is approximated with a quantized pulse. With a step size q between adjacent levels, the approximation error lies between -q/2 and +q/2.
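A minimal sketch of this idea, assuming a quantizer that rounds each sample to the nearest multiple of the step size q. The function name `quantize` and the sample values are illustrative.

```python
def quantize(sample, q):
    """Map a sample to the nearest multiple of the step size q."""
    return q * round(sample / q)


if __name__ == "__main__":
    q = 0.5
    for s in (3.117, 4.56, 2.31, 7.82, 1.0):
        v = quantize(s, q)
        # The error s - v never exceeds q/2 in magnitude.
        print(s, v, s - v)
```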
Concepts Quantization Understanding Quantization
To understand quantization a bit better, let's have a look at the following example (figure).
Concepts Quantization Classification of the Quantization Process
• The quantization process is classified as follows:
  • Uniform quantization: the representation levels are equally spaced.
    • Midtread type
    • Midrise type
  • Non-uniform quantization: the representation levels have variable spacing from one another. [4]
But why do we need such a classification?
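The two uniform types differ in where the levels sit relative to zero. The sketch below uses the usual textbook definitions (a midtread quantizer has a level at zero, a midrise quantizer has a decision threshold at zero); the function names are illustrative.

```python
import math


def midtread(x, q):
    """Uniform quantizer with an output level exactly at 0."""
    return q * math.floor(x / q + 0.5)


def midrise(x, q):
    """Uniform quantizer with a decision threshold at 0 (no zero level)."""
    return q * (math.floor(x / q) + 0.5)


if __name__ == "__main__":
    q = 1.0
    for x in (-0.1, 0.1, 0.6):
        print(x, midtread(x, q), midrise(x, q))
```

A small input such as 0.1 maps to 0 in the midtread case, while the midrise quantizer outputs +q/2 or -q/2 depending on the sign.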
Concepts Quantization Human Speech – Excursion & Recap (1)
• Speech can be broken into two categories:
  • Voiced (zzzzz)
  • Unvoiced (sssss)
• Naturally occurring speech signals are a combination of these categories. Take the word “Goat” for example: [4]
“Goat” contains two voiced signals, followed by a partial closure of the vocal tract, and then an unvoiced signal. The two voiced signals occur at 3400–3900 and 3900–5400 and the unvoiced signal at 6300–6900 on the time axis of the plotted waveform.
Concepts Quantization: Why Do We Need Such a Classification? (1) Human Speech – Excursion & Recap (2)
• It should be noted that:
  • The peak-to-peak amplitude of voiced signals is approximately ten times that of unvoiced signals.
  • Unvoiced signals contain more information, and thus have higher entropy, than voiced signals.
  • The telephone system must therefore provide higher resolution for lower-amplitude signals.
• Statistics of speech signals (figure: probability of occurrence versus amplitude of speech signals) [6] [3]
Concepts Quantization: Why Do We Need Such a Classification? (2) Quantization Noise
• The quantization process is lossy (erroneous).
• The error is defined as the difference between the input signal M and the output signal V. This error E is called the quantization noise.
• Consider a simple example:
  • M = (3.117, 4.56, 2.31, 7.82, 1)
  • V = (3, 3, 2, 7, 2)
  • E = M – V = (0.117, 1.56, 0.31, 0.82, -1)
• How do we calculate the noise power?
  • Consider an input m of continuous amplitude in the range (-M_max, M_max).
  • Assume a uniform quantizer. How do we get the quantization noise power?
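One common way to answer the question is the standard textbook derivation sketched below. It rests on assumptions not stated on the slide: the quantizer has L levels with step size q, and the error e is uniformly distributed over one step. The symbols L, q, e, P and \sigma_e^2 are introduced here for illustration.

```latex
\begin{align*}
q &= \frac{2 M_{\max}}{L}
  && \text{step size for } L \text{ levels over } (-M_{\max}, M_{\max}) \\
\sigma_e^2 &= \int_{-q/2}^{q/2} e^2 \, \frac{1}{q} \, de
  = \frac{q^2}{12}
  = \frac{M_{\max}^2}{3 L^2}
  && \text{noise power, } e \text{ uniform on } [-q/2,\, q/2] \\
\mathrm{SQNR} &= \frac{P}{\sigma_e^2} = \frac{3 L^2 P}{M_{\max}^2}
  && P = \text{average signal power}
\end{align*}
```

Under these assumptions, every additional bit doubles L and so raises the SQNR by a factor of four (about 6 dB).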
Concepts Quantization: Why Do We Need Such a Classification? (3) Comparison – Uniform vs. Non-Uniform Usage
• Speech signals do not require high quantization resolution at high amplitudes (50% vs. 15%).
• Is it wasteful to use a uniform quantizer?
• The goal is to increase the SQNR: more levels for low amplitudes, fewer levels for high ones.
• Maybe use a non-uniform quantizer? [3]
Concepts Quantization More About Non-Uniform Quantizers (Companding)
• Non-uniform quantizer = use more levels where you need them.
• The human ear follows a logarithmic process in which high-amplitude sounds do not require the same resolution as low-amplitude sounds.
• One way to achieve non-uniform quantization is to use what is called “companding”.
• Companding = “compressing + expanding”
Processing chain: compressor function → uniform quantization → expander function (the inverse of the compressor)
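The chain compressor, uniform quantizer, expander can be sketched as follows. As the compressor, this sketch assumes the continuous µ-law curve with µ = 255 and inputs normalized to [-1, 1]; the function names and the 256-level midrise quantizer are illustrative choices, not taken from the slides.

```python
import math

MU = 255.0  # assumed compression parameter


def compress(x, mu=MU):
    """Logarithmic compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Expander: the exact inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def uniform_quantize(y, levels):
    """Midrise uniform quantizer on [-1, 1] with the given number of levels."""
    q = 2.0 / levels
    index = min(int((y + 1.0) / q), levels - 1)
    return -1.0 + (index + 0.5) * q


def compand_quantize(x, levels=256):
    """Non-uniform quantization built from the three stages."""
    return expand(uniform_quantize(compress(x), levels))


if __name__ == "__main__":
    x = 0.01  # a low-amplitude sample
    print(abs(x - uniform_quantize(x, 256)))   # error of the plain uniform quantizer
    print(abs(x - compand_quantize(x, 256)))   # error with companding
```

For low-amplitude samples the companded quantizer leaves a noticeably smaller error than the plain uniform quantizer with the same number of levels, at the price of coarser steps near full scale.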
Concepts Quantization What Is the Purpose of a Compander?
• The purpose of a compander is to equalize the histogram of speech signals so that the reconstruction levels tend to be equally used. [6]
• There are two well-known companding techniques, each following its own encoding law:
  • A-law companding
  • µ-law companding
Concepts Quantization A-Law Encoding and µ-Law Encoding [3]
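For reference, the two compressor characteristics are commonly written as below for an input x normalized to |x| ≤ 1. This is the usual textbook form and is given here as an assumption about what the slide showed; F_\mu and F_A are names chosen for this sketch.

```latex
\begin{align*}
F_\mu(x) &= \operatorname{sgn}(x)\,
  \frac{\ln\!\left(1 + \mu |x|\right)}{\ln\!\left(1 + \mu\right)},
  \qquad |x| \le 1 \\[4pt]
F_A(x) &= \operatorname{sgn}(x)
  \begin{cases}
    \dfrac{A |x|}{1 + \ln A}, & |x| < \dfrac{1}{A} \\[10pt]
    \dfrac{1 + \ln\!\left(A |x|\right)}{1 + \ln A}, & \dfrac{1}{A} \le |x| \le 1
  \end{cases}
\end{align*}
```

Both curves are close to linear for small |x| and logarithmic for large |x|; the A-law curve is exactly linear below 1/A.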
Concepts Quantization Companding Approximation
• Logarithmic functions are slow to compute, so why not approximate?
• The curve is approximated with 8 linear segments (chords), selected by 3 bits.
• P is the sign bit of the output.
• The S bits form the segment code.
• The Q bits form the quantization code. [3]
Concepts Quantization Companding Approximation – Algorithm
• Encoding
  • Add a bias of 33 to the absolute value of the input sample.
  • Determine the bit position of the most significant 1 bit among bits 5 to 12 of the biased value.
  • Subtract 5 from that position. The result is the segment code.
  • Finally, the 4-bit quantization code is set to the 4 bits that follow the most significant 1 bit found above.
• Decoding
  • Multiply the quantization code by 2 and add the bias of 33 to the result.
  • Multiply the result by 2 raised to the power of the segment code.
  • Decrement the result by the bias.
  • Use the P bit to determine the sign of the result.
• Example?! [3]
Concepts Quantization µ-Law Encoding - Example
• Example input: -656. Output code word layout: P S2 S1 S0 Q3 Q2 Q1 Q0
• The sample is negative, so bit P becomes 1.
• Add the bias of 33 to the absolute value. The bias guarantees a 1 bit somewhere in positions 5 to 12.
• The result after adding is 689 = 0 0010 1011 0001 (13 bits, positions 12 down to 0).
• The most significant 1 bit in positions 5 to 12 is at position 9.
• Subtracting 5 from the position yields 4 (binary 100), the segment code.
• Finally, the 4 bits after that position (0101) are inserted as the quantization code, giving the code word 1 100 0101.
Concepts Quantization µ-Law Decoding - Example
• Example: decoding the code word for the input -656 (P S2 S1 S0 Q3 Q2 Q1 Q0 = 1 100 0101)
• The quantization code is 0101 = 5, so 5 × 2 + 33 = 43.
• The segment code is 100 = 4, so 43 × 2^4 = 688.
• Decrement by the bias: 688 – 33 = 655.
• P is 1, so the final result is -655.
• The quantization error is 1 (very small).
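The encoding and decoding steps from the algorithm slide, together with the -656 example, can be sketched as follows. The function names, the clipping limit, and the packing of P, segment and quantization code into one byte are assumptions made for illustration. The sketch follows only the listed steps and leaves out any further bit manipulation a production G711 implementation may perform.

```python
BIAS = 33
MAX_MAGNITUDE = 8191 - BIAS  # assumed clip so the biased value fits in 13 bits


def mulaw_encode(sample):
    """Segment-based µ-law approximation: signed linear sample -> 8-bit code."""
    sign = 1 if sample < 0 else 0
    biased = min(abs(sample), MAX_MAGNITUDE) + BIAS
    position = biased.bit_length() - 1        # most significant 1 bit, 5..12
    segment = position - 5                    # 3-bit segment code
    quant = (biased >> (position - 4)) & 0x0F  # 4 bits after the leading 1
    return (sign << 7) | (segment << 4) | quant


def mulaw_decode(code):
    """Inverse of mulaw_encode, up to the quantization error."""
    sign = (code >> 7) & 1
    segment = (code >> 4) & 0x07
    quant = code & 0x0F
    magnitude = ((quant * 2 + BIAS) << segment) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    code = mulaw_encode(-656)
    print(format(code, "08b"))   # P, segment and quantization bits
    print(mulaw_decode(code))    # reconstructed sample
```

Because the bias is at least 33, the biased value always has its leading 1 bit in positions 5 to 12, so the segment code always fits in 3 bits.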
Concepts Quantization
µ-Law Encoding
• Approximately linear for small input values and logarithmic for high input values
• The value of µ used in practice is 255
• Used for speech signals
• Used for PCM telephone systems in the US, Canada and Japan
A-Law Encoding
• A linear segment for low-level inputs and a logarithmic segment for high-level inputs
• The value of A used in practice is 87.6
• Used for PCM telephone systems in Europe
Concepts Pulse Code Modulation (PCM) PCM Description
• Sampling results in a pulse amplitude modulated (PAM) signal
• PCM uniformly quantizes the PAM samples
• The results of PCM are PCM words
• Each PCM word has l = log2(L) bits, where L is the number of quantization levels [3]
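A minimal sketch of how a sample becomes a PCM word, assuming a uniform quantizer over [-x_max, x_max] and a plain binary index as the word. The function names and the choice of index coding are illustrative.

```python
import math


def pcm_word_length(levels):
    """Number of bits per PCM word: l = log2(L), rounded up."""
    return math.ceil(math.log2(levels))


def pcm_encode(sample, levels, x_max=1.0):
    """Uniformly quantize a sample in [-x_max, x_max] and return its PCM word."""
    q = 2.0 * x_max / levels
    index = int((sample + x_max) / q)
    index = max(0, min(levels - 1, index))   # clamp to the valid level range
    return format(index, "0{}b".format(pcm_word_length(levels)))


if __name__ == "__main__":
    for s in (-1.0, -0.3, 0.0, 0.3, 1.0):
        print(s, pcm_encode(s, levels=8))
```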
Concepts Differential Pulse Code Modulation (DPCM) DPCM Description
• Signals that are sampled at a high rate are highly correlated from sample to sample.
• The difference between neighbouring samples will therefore not be large.
• Instead of quantizing each sample, why not quantize the difference?
• This results in a quantizer that needs far fewer bits. [7]
• In its simplest form, the prediction of the current sample is the previous sample (first order).
• More than one previous sample can be used in the prediction (N-th order).
• Problems with this approach?
Concepts Differential Pulse Code Modulation (DPCM) DPCM Example [7]
• It is clear from the table that the quantization error adds up, producing an output signal that is completely different from the original one.
Concepts Differential Pulse Code Modulation (DPCM) DPCM Prediction
• Previously, the input to the predictor in the encoder was different from the one in the decoder.
• The difference between the two predictors led to the reconstruction error e[n] = x[n] – x’[n].
• To solve this problem completely, the same predictor that is used in the decoder is also used in the encoder.
• Therefore the reconstruction error at the decoder output is the same as the quantization error at the encoder.
• There is no accumulation of quantization error.
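The difference between the two schemes can be sketched with a first-order predictor (prediction = previous sample). The function names, the rounding quantizer and the ramp test signal are illustrative assumptions, not taken from the slides.

```python
def quantize(d, q):
    """Round a difference to the nearest multiple of the step size q."""
    return q * round(d / q)


def dpcm_open_loop(x, q):
    """Encoder predicts from the ORIGINAL previous sample.

    The decoder only has its own reconstruction, so the two predictors
    disagree and the quantization errors accumulate.
    """
    diffs, prev = [], 0.0
    for s in x:
        diffs.append(quantize(s - prev, q))
        prev = s
    out, rec = [], 0.0
    for d in diffs:              # decoder: running sum of the differences
        rec += d
        out.append(rec)
    return out


def dpcm_closed_loop(x, q):
    """Encoder predicts from the RECONSTRUCTED previous sample, like the decoder."""
    out, rec = [], 0.0
    for s in x:
        d = quantize(s - rec, q)  # this difference is what gets transmitted
        rec += d                  # identical update on the decoder side
        out.append(rec)
    return out


if __name__ == "__main__":
    x = [0.4 * n for n in range(1, 21)]   # slow ramp
    q = 1.0
    err_open = max(abs(a - b) for a, b in zip(x, dpcm_open_loop(x, q)))
    err_closed = max(abs(a - b) for a, b in zip(x, dpcm_closed_loop(x, q)))
    print(err_open, err_closed)
```

On the ramp, every open-loop difference is quantized to zero, so the output never moves and the error keeps growing. In the closed-loop version the error stays within half a step.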
Concepts Adaptive Differential Pulse Code Modulation (ADPCM) ADPCM Description
• As the name suggests, ADPCM builds on PCM and DPCM and adds adaptation.
• The “A” in ADPCM stands for “Adaptive”.
• In DPCM, the difference between x[k] and x[k-1] is transmitted instead of x[k].
• To further reduce the number of bits per sample, ADPCM adapts the quantization levels to the characteristics of the analog signal.
• The original 32 kbps ADPCM used 4 bits per sample. [9]
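The adaptation idea can be illustrated with a toy codec in which the step size grows after large codes and shrinks after small ones. This is a sketch under assumed parameters (the growth and shrink factors, the step limits and all function names are hypothetical), and it is not the adaptation rule of any ITU standard.

```python
import math


def adapt_step(step, code, max_code, grow=1.5, shrink=0.9, lo=1e-3, hi=1.0):
    """Hypothetical rule: widen the step after large codes, narrow it otherwise."""
    if abs(code) > max_code // 2:
        return min(hi, step * grow)
    return max(lo, step * shrink)


def adpcm_encode(x, bits=4, step=0.05):
    """Return one signed integer code per sample (15 levels for 4 bits)."""
    max_code = 2 ** (bits - 1) - 1
    codes, rec = [], 0.0
    for s in x:
        code = max(-max_code, min(max_code, round((s - rec) / step)))
        codes.append(code)
        rec += code * step                      # same reconstruction as the decoder
        step = adapt_step(step, code, max_code)  # driven by the code only
    return codes


def adpcm_decode(codes, bits=4, step=0.05):
    """Rebuild the signal; the step size is re-derived from the codes alone."""
    max_code = 2 ** (bits - 1) - 1
    out, rec = [], 0.0
    for code in codes:
        rec += code * step
        out.append(rec)
        step = adapt_step(step, code, max_code)
    return out


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * n / 40) for n in range(200)]
    y = adpcm_decode(adpcm_encode(x))
    print(max(abs(a - b) for a, b in zip(x, y)))
```

Because the step size is updated from the transmitted codes only, the decoder can follow the adaptation without any side information.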
Standards, Examples & Applications G711 G711 Description
• A waveform codec that was released in 1972
• Its formal name is Pulse Code Modulation (PCM) of voice frequencies, since it uses PCM in its encoding
• G711 achieves a 64 kbps bit rate (8 kHz sampling frequency × 8 bits per sample)
• G711 defines two main compression algorithms
  • µ-law (used in North America & Japan)
  • A-law (used in Europe and the rest of the world)
• µ-law and A-law take 14-bit and 13-bit signed linear PCM samples, respectively, as input and compress them to 8-bit samples
• Applications
  • Public Switched Telephone Network (PSTN)
  • WiFi phones (VoWLAN)
  • Wideband IP telephony
  • Audio & video conferencing
  • H.320 & H.323 specifications
Standards, Examples & Applications G726 G726 Description
• G726 converts a 64 kbps A-law or µ-law PCM channel to and from a 40, 32, 24 or 16 kbps channel.
• The conversion is applied to the raw PCM samples using the ADPCM encoding technique.
• The different rates are achieved by adapting the number of quantization levels:
  • 4 levels (2 bits, 16 kbps)
  • 7 levels (3 bits, 24 kbps)
  • 15 levels (4 bits, 32 kbps)
  • 31 levels (5 bits, 40 kbps)
• Includes (and supersedes) G721 and G723 [12]
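The rate table follows directly from the 8 kHz sampling frequency, as the short sketch below shows. The constant and function names are illustrative.

```python
SAMPLE_RATE_HZ = 8000      # sampling frequency of the underlying PCM channel
PCM_RATE_KBPS = 64         # 8 kHz x 8 bits (G711)


def adpcm_rate_kbps(bits_per_sample):
    """Channel bit rate in kbps for a given ADPCM word length."""
    return SAMPLE_RATE_HZ * bits_per_sample // 1000


if __name__ == "__main__":
    for bits in (2, 3, 4, 5):
        rate = adpcm_rate_kbps(bits)
        print(bits, rate, PCM_RATE_KBPS / rate)   # bits, kbps, compression ratio
```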
Performance Comparison (figure) [1]
Summary & Conclusion
Summary
• We covered quantization concepts in all their flavors.
• We discussed the category of waveform coding (PCM, DPCM and ADPCM).
• We presented the ITU standards (G711 and G726) and mentioned some examples and applications.
• Finally, we compared the most prominent speech codecs.
Conclusion
• Speech coding is an important concept that is required to use the existing bandwidth efficiently.
• There are many important metrics to keep in mind when doing speech coding, and a good speech coder must balance them. The most important ones are:
  • Data rate
  • Speech quality
  • Delay
• Waveform codecs achieve the best speech quality as well as low delays.
• Vocoders achieve a low data rate, but at the cost of delay and speech quality.
• Hybrid coders achieve acceptable speech quality, delay and data rate.
References
[1] Wai C. Chu: Speech Coding Algorithms: Foundation & Evolution of Standardized Coders
[2] Speech Coding: http://www-mobile.ecs.soton.ac.uk/speech_codecs/
[3] Sklar: Digital Communications: Fundamentals and Applications
[4] A-Law and mu-Law Companding Implementations Using the TMS320C54x
[5] Michael Langer: Data Compression – Introduction to Lossy Compression
[6] Signal Quantization and Compression Overview: http://www.ee.ucla.edu/~dsplab/sqc/over.html
[7] Wajih Abu-Al-Saud: Ch. VI Sampling & Pulse Code Mod., Lecture 25
[8] Yuli You: Audio Coding: Theory and Applications
[9] Tarmo Anttalainen: Introduction to Telecommunications Network Engineering
[10] Wikipedia, G711: http://en.wikipedia.org/wiki/G.711
[11] David Salomon: Data Compression: The Complete Reference
[12] ITU (CCITT) Recommendation G.726, ADPCM
Questions & Discussion
Thank you!!