
Speech Processing

Veton Këpuska


  1. Speech Processing Speech Coding

  2. Speech Coding • Definition: Speech coding is a process that leads to the representation of analog waveforms with sequences of binary digits. • Even though the availability of high-bandwidth communication channels has increased, speech coding for bit-rate reduction has retained its importance: • Reduced bit-rate transmission is required for cellular networks • Voice over IP • Coded speech • Is less sensitive than analog signals to transmission noise • Is easier to protect against (bit) errors, encrypt, multiplex, and packetize • A typical scenario is depicted in the next slide (Figure 12.1)

  3. Digital Telephone Communication System

  4. Categorization of Speech Coders • Waveform coders: quantize speech samples directly and operate at high bit rates, in the range of 16-64 kbps (bps = bits per second) • Hybrid coders: partly waveform coders and partly speech model-based coders; operate in the mid bit-rate range of 2.4-16 kbps • Vocoders: largely model-based and operate in a low bit-rate range of 1.2-4.8 kbps; tend to be of lower quality than waveform and hybrid coders

  5. Quality Measurements • The quality of coding is viewed as the closeness of the processed speech to the original speech or some other desired speech waveform: • Naturalness • Degree of background artifacts • Intelligibility • Speaker identifiability • Etc.

  6. Quality Measurements • Subjective measurements: • The Diagnostic Rhyme Test (DRT) measures intelligibility. • The Diagnostic Acceptability Measure (DAM) and Mean Opinion Score (MOS) tests provide a more complete quality judgment. • Objective measurements: • Segmental Signal-to-Noise Ratio (SNR) – average SNR over short-time segments • Articulation Index – relies on an average SNR across frequency bands.

  7. Quality Measurements • A more complete list and definition of subjective and objective measures can be found in: • J.R. Deller, J.G. Proakis, and J.H.L. Hansen, “Discrete-Time Processing of Speech Signals,” Macmillan Publishing Co., New York, NY, 1993 • S.R. Quackenbush, T.P. Barnwell, and M.A. Clements, “Objective Measures of Speech Quality,” Prentice Hall, Englewood Cliffs, NJ, 1988

  8. Statistical Models • The speech waveform is viewed as a random process. • Various estimates are important from this statistical perspective: • Probability density • Mean, variance, and autocorrelation • One approach to estimating the probability density function (pdf) of x[n] is through a histogram: • Count the number of occurrences of speech-sample values in each of a set of amplitude ranges, for many speech samples over a long time duration. • Normalize the area of the resulting curve to unity (a code sketch follows below).
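A minimal sketch of the histogram-based pdf estimate (assuming NumPy; the helper name and bin count are illustrative, not from the slides):

```python
import numpy as np

def estimate_pdf(x, num_bins=100):
    """Estimate the pdf of a signal via a normalized histogram.

    Counts occurrences of sample values in num_bins amplitude ranges,
    then normalizes so the area under the curve is unity.
    """
    counts, edges = np.histogram(x, bins=num_bins)
    widths = np.diff(edges)                     # width of each amplitude range
    pdf = counts / (counts.sum() * widths)      # normalize: total area = 1
    centers = 0.5 * (edges[:-1] + edges[1:])    # bin centers, for plotting
    return centers, pdf

# Synthetic Laplacian-distributed samples standing in for a long speech record:
x = np.random.laplace(scale=0.1, size=100_000)
centers, pdf = estimate_pdf(x)
print((pdf * (centers[1] - centers[0])).sum())  # ≈ 1.0 (uniform bin widths)
```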

  9. Statistical Models • The histogram of speech (Davenport; Paez and Glisson) was shown to approximate a gamma density: p_x(x) = (√3 / (8π σ_x |x|))^(1/2) e^(−√3|x|/(2σ_x)), where σ_x is the standard deviation of the pdf. • A simpler approximation is given by the Laplacian pdf of the form: p_x(x) = (1/(√2 σ_x)) e^(−√2|x|/σ_x) (both densities are sketched in code below).
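The two analytic densities above, sketched for comparison against such a histogram (plain NumPy; the small epsilon guarding the gamma density's singularity at x = 0 is an implementation assumption):

```python
import numpy as np

def gamma_pdf(x, sigma):
    """Gamma-density approximation of the speech pdf."""
    ax = np.abs(x) + 1e-12   # guard the singularity at x = 0
    return np.sqrt(np.sqrt(3.0) / (8.0 * np.pi * sigma * ax)) * \
        np.exp(-np.sqrt(3.0) * ax / (2.0 * sigma))

def laplacian_pdf(x, sigma):
    """Simpler Laplacian approximation of the speech pdf."""
    return np.exp(-np.sqrt(2.0) * np.abs(x) / sigma) / (np.sqrt(2.0) * sigma)
```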

  10. PDF of Speech

  11. Scalar Quantization • Assume that a sequence x[n] was obtained from a speech waveform that has been lowpass-filtered and sampled at a suitable rate with infinite amplitude precision. • The samples x[n] are quantized to a finite set of amplitudes denoted by x̂[n]. • Associated with the quantizer is a quantization step size. • Quantization allows the amplitudes to be represented by a finite set of bit patterns – symbols. • Encoding: mapping of x̂[n] to a finite set of symbols; this mapping yields a sequence of codewords denoted by c[n] (Figure 12.3a). • Decoding: the inverse process, whereby the transmitted sequence of codewords c′[n] is transformed back to a sequence of quantized samples x̂′[n] (Figure 12.3b).

  12. Scalar Quantization

  13. Fundamentals • Assume a signal amplitude is quantized into M levels. • The quantizer operator is denoted by Q(x); thus x̂[n] = Q(x[n]), • where x̂_i, 1 ≤ i ≤ M, denotes the M possible reconstruction levels (quantization levels), and • x_i, 0 ≤ i ≤ M, denotes the M+1 possible decision levels. • If x_{i−1} < x[n] ≤ x_i, then x[n] is quantized to the reconstruction level x̂_i, • and x̂[n] is the quantized sample of x[n].

  14. Fundamentals • Scalar quantization example: • Assume there are M = 4 reconstruction levels. • The amplitude of the input signal x[n] falls in the range [0, 1]. • Decision levels and reconstruction levels are equally spaced: • Decision levels are {0, 1/4, 1/2, 3/4, 1} • Reconstruction levels are assumed to be {1/8, 3/8, 5/8, 7/8} • See Figure 12.4 in the next slide, and the code sketch below.
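A minimal sketch of this 2-bit example (NumPy; function and variable names are illustrative):

```python
import numpy as np

decision = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # M + 1 = 5 decision levels
recon = np.array([0.125, 0.375, 0.625, 0.875])     # M = 4 reconstruction levels

def quantize_2bit(x):
    """Map samples in [0, 1] to one of the M = 4 reconstruction levels."""
    # Find which decision interval each sample falls into
    idx = np.clip(np.searchsorted(decision, x, side='right') - 1, 0, 3)
    return recon[idx], idx                         # quantized values, 2-bit codes

xq, codes = quantize_2bit(np.array([0.05, 0.30, 0.55, 0.99]))
print(xq)      # [0.125 0.375 0.625 0.875]
print(codes)   # [0 1 2 3]
```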

  15. Example of Uniform 2-bit Quantizer

  16. Uniform Quantizer • A uniform quantizer is one whose decision and reconstruction levels are uniformly spaced. Specifically: • x_i − x_{i−1} = Δ, 1 ≤ i ≤ M • x̂_i − x̂_{i−1} = Δ, 2 ≤ i ≤ M • Δ is the step size, equal to the spacing between two consecutive decision levels, which is the same as the spacing between two consecutive reconstruction levels (Exercise 12.1). • Each reconstruction level is attached to a symbol – the codeword. Binary numbers are typically used to represent the quantized samples (Figure 12.4).

  17. Uniform Quantizer • Codebook: collection of codewords. • In general, with a B-bit binary codebook there are 2^B different quantization (or reconstruction) levels. • The bit rate is defined as the number of bits B per sample multiplied by the sampling rate f_s: I = B·f_s. • The decoder inverts the coder operation, taking the codeword back to a quantized amplitude value (e.g., the codeword 01 back to its reconstruction level). • Often the goal of speech coding/decoding is to keep the bit rate as low as possible while maintaining a required level of quality. • Because the sampling rate is fixed for most applications, this goal implies that the bit rate is reduced by decreasing the number of bits per sample (a general B-bit quantizer is sketched below).
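A sketch of a general B-bit uniform quantizer over a symmetric range, plus the bit-rate arithmetic; the symmetric-range, mid-rise design is an assumption for illustration:

```python
import numpy as np

def uniform_quantize(x, B, x_max):
    """B-bit uniform mid-rise quantizer over [-x_max, x_max]."""
    levels = 2 ** B
    delta = 2.0 * x_max / levels                           # step size
    # Codewords 0 .. 2^B - 1; out-of-range samples are clipped (overload)
    codes = np.clip(np.floor((x + x_max) / delta), 0, levels - 1).astype(int)
    xq = -x_max + (codes + 0.5) * delta                    # reconstruction levels
    return xq, codes

B, fs = 8, 8000
print("bit rate I = B * fs =", B * fs, "bps")              # 64000 bps
```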

  18. Uniform Quantizer • Designing a uniform scalar quantizer requires knowledge of the maximum value of the sequence. • Typically the range of the speech signal is expressed in terms of the standard deviation of the signal. • Specifically, it is often assumed that −4σ_x ≤ x[n] ≤ 4σ_x, where σ_x is the signal's standard deviation. • Under the assumption that speech samples obey a Laplacian pdf, approximately 0.35% of the speech samples fall outside the range −4σ_x ≤ x[n] ≤ 4σ_x. • Assume a B-bit binary codebook ⇒ 2^B quantization levels. • Maximum signal value x_max = 4σ_x.

  19. Uniform Quantizer • For the uniform quantizer step size Δ we get: Δ = 2x_max / 2^B = 8σ_x / 2^B. • The quantization step size Δ relates directly to the notion of quantization noise.

  20. Quantization Noise • Two classes of quantization noise: • Granular distortion • Overload distortion • Granular distortion: • x̂[n] = x[n] + e[n], where x[n] is the unquantized signal and e[n] is the quantization noise. • For a given step size Δ, the magnitude of the quantization noise e[n] can be no greater than Δ/2, that is: −Δ/2 ≤ e[n] ≤ Δ/2. • Figure 12.5 depicts this property.

  21. Quantization Noise

  22. Quantization Noise • Overload distortion: • Maximum-value constraint: x_max = 4σ_x (−4σ_x ≤ x[n] ≤ 4σ_x) • For a Laplacian pdf, 0.35% of the speech samples fall outside the range of the quantizer. • Clipped samples incur a quantization error in excess of Δ/2. • Due to the small number of clipped samples, it is common to neglect these infrequent large errors in theoretical calculations.

  23. Quantization Noise • Statistical model of quantization noise: • The desired approach for analyzing the quantization error in numerous applications. • The quantization error is considered an ergodic white-noise random process. • The autocorrelation function of such a process is expressed as: r_e[m] = E(e[n] e[n+m]) = σ_e² δ[m].

  24. Quantization Error • The previous expression states that the process is uncorrelated with itself. • Furthermore, it is also assumed that the quantization noise and the input signal are uncorrelated, i.e., E(x[n] e[n+m]) = 0 for all m. • The final assumption is that the pdf of the quantization noise is uniform over the quantization interval: • p_e(e) = 1/Δ for −Δ/2 ≤ e ≤ Δ/2, and p_e(e) = 0 otherwise.

  25. Quantization Error • The stated assumptions are not always valid. • Consider a slowly (e.g., linearly) varying signal ⇒ then e[n] also changes linearly and is signal-dependent (see Figure 12.5 in the previous slide). • Correlated quantization noise can be annoying. • When the quantization step Δ is small, the assumptions that the noise is uncorrelated with itself and with the signal are roughly valid, provided the signal fluctuates rapidly among all quantization levels. • The quantization error then approaches a white-noise process with an impulsive autocorrelation and a flat spectrum. • One can force e[n] to be white noise and uncorrelated with x[n] by adding white noise to x[n] prior to quantization.

  26. Quantization Error • The process of adding white noise prior to quantization is known as dithering (a sketch follows below). • This decorrelation technique was shown to be useful not only in improving the perceptual quality of the quantization noise for speech but also for image signals. • Signal-to-Noise Ratio (SNR): • A measure to quantify the severity of the quantization noise. • Relates the strength of the signal to the strength of the quantization noise.
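A minimal dithering sketch, reusing the uniform_quantize helper sketched earlier; the uniform dither of amplitude Δ/2 is one common choice, assumed here:

```python
import numpy as np

def dithered_quantize(x, B, x_max, rng=None):
    """Add white noise before quantization to decorrelate the error e[n]."""
    rng = rng or np.random.default_rng(0)
    delta = 2.0 * x_max / 2 ** B
    dither = rng.uniform(-delta / 2, delta / 2, size=x.shape)
    return uniform_quantize(x + dither, B, x_max)   # helper sketched earlier
```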

  27. Quantization Error • SNR is defined as: SNR = σ_x²/σ_e² = E(x²[n]) / E(e²[n]). • Given the assumptions of • Quantizer range 2x_max, • Quantization interval Δ = 2x_max/2^B for a B-bit quantizer, and • A uniform pdf of the noise, it can be shown that (see Exercise 12.2): σ_e² = Δ²/12 = x_max² / (3·2^(2B)).

  28. Quantization Error • Thus the SNR can be expressed as: SNR = σ_x²/σ_e² = 3·2^(2B) (σ_x/x_max)². • Or in decibels (dB) as: SNR(dB) = 10 log₁₀(σ_x²/σ_e²) ≈ 6.02B + 4.77 − 20 log₁₀(x_max/σ_x). • Because x_max = 4σ_x, SNR(dB) ≈ 6B − 7.2 (a numerical check is sketched below).
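A quick numerical check of the 6-dB-per-bit rule, reusing the uniform_quantize sketch with Laplacian test samples (the test parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_x = 0.25
# Laplacian with variance sigma_x^2 (variance of np.random.laplace is 2*scale^2)
x = rng.laplace(scale=sigma_x / np.sqrt(2), size=200_000)

for B in (6, 8, 10):
    xq, _ = uniform_quantize(x, B, x_max=4 * sigma_x)
    e = xq - x
    snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(e ** 2))
    print(B, "bits:", round(snr_db, 1), "dB; theory ≈", round(6 * B - 7.2, 1), "dB")
```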

  29. Quantization Error • The presented quantization scheme is called pulse code modulation (PCM). • B bits per sample are transmitted as a codeword. • Advantages of this scheme: • It is instantaneous (no coding delay) • It is independent of the signal content (voice, music, etc.) • Disadvantages: • It requires a minimum of 11 bits per sample to achieve “toll quality” (equivalent to typical telephone quality) • For a 10000 Hz sampling rate, the required bit rate is: I = (11 bits/sample) × (10000 samples/sec) = 110,000 bps = 110 kbps • For a CD-quality signal with a sample rate of 20000 Hz and 16 bits/sample, SNR(dB) = 96 − 7.2 = 88.8 dB and the bit rate is 320 kbps.

  30. Nonuniform Quantization • Uniform quantization may not be optimal (the SNR may not be as large as possible for a given number of decision and reconstruction levels). • Consider, for example, a speech signal for which x[n] is much more likely to be in one particular region than in another (low values occurring much more often than high values). • This implies that decision and reconstruction levels are not being utilized effectively with uniform intervals over x_max. • A nonuniform quantization that is optimal (in a least-squared-error sense) for a particular pdf is referred to as the Max quantizer. • An example of a nonuniform quantizer is given in the figure in the next slide.

  31. Nonuniform Quantization

  32. Nonuniform Quantization • Max quantizer • Problem definition: for a random variable x with a known pdf, find the set of M quantizer levels that minimizes the quantization error. • That is, find the decision and reconstruction levels, x_i and x̂_i respectively, that minimize the mean-squared-error (MSE) distortion measure D = E[(x − x̂)²], • where E denotes expected value and x̂ is the quantized version of x. • It turns out that the optimal decision level x_k is the midpoint of the two adjacent reconstruction levels: x_k = (x̂_k + x̂_{k+1})/2, 1 ≤ k ≤ M−1.

  33. Nonuniform Quantization • Max quantizer (cont.) • The optimal reconstruction level x̂_k is the centroid of p_x(x) over the interval x_{k−1} ≤ x ≤ x_k: • x̂_k = (∫ x p_x(x) dx) / (∫ p_x(x) dx), with both integrals taken over x_{k−1} ≤ x ≤ x_k. • It is interpreted as the mean value of x over the interval x_{k−1} ≤ x ≤ x_k for the normalized pdf p̃(x). • Solving the last two equations for x_k and x̂_k is a nonlinear problem in these two variables. • It is solved iteratively, which requires an estimate of the pdf (and that can be difficult to obtain); an iteration sketch follows below.
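A sketch of the standard iterative (Lloyd-Max) solution of these two conditions on a discretized pdf; the grid discretization and the initialization are assumptions for illustration:

```python
import numpy as np

def lloyd_max(pdf_vals, grid, M, iters=200):
    """Alternate the two Max-quantizer conditions until they settle."""
    recon = np.linspace(grid[0], grid[-1], M)   # initial reconstruction levels
    for _ in range(iters):
        # Optimal decision levels: midpoints of adjacent reconstruction levels
        decision = np.concatenate(([grid[0]],
                                   0.5 * (recon[:-1] + recon[1:]),
                                   [grid[-1]]))
        # Optimal reconstruction levels: centroid of the pdf in each interval
        for k in range(M):
            mask = (grid >= decision[k]) & (grid <= decision[k + 1])
            mass = pdf_vals[mask].sum()
            if mass > 0:
                recon[k] = (grid[mask] * pdf_vals[mask]).sum() / mass
    return decision, recon

# Example: 4-level Max quantizer for a unit-variance Laplacian pdf
g = np.linspace(-6, 6, 4001)
p = np.exp(-np.sqrt(2) * np.abs(g)) / np.sqrt(2)
dec, rec = lloyd_max(p, g, M=4)
```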

  34. Nonuniform Quantization

  35. Companding • An alternative to the nonuniform quantizer is companding. • It is based on the fact that a uniform quantizer is optimal for a uniform pdf. • Thus, if a nonlinearity is applied to the waveform x[n] to form a new sequence g[n] whose pdf is uniform, then • a uniform quantizer can be applied to g[n] to obtain ĝ[n], as depicted in Figure 12.10 in the next slide.

  36. Companding

  37. Companding • In practice, a number of nonlinear transformations that only approximate the uniform-density transformation are used, since they do not require a pdf measurement: specifically, A-law and μ-law companding. • μ-law coding is given by: T(x[n]) = x_max [ log(1 + μ|x[n]|/x_max) / log(1 + μ) ] sign(x[n]). • The CCITT international standard coder at 64 kbps is an example application of μ-law coding: • a μ-law transformation followed by 7-bit uniform quantization, giving toll-quality speech. • Equivalent quality with straight uniform quantization would require 11 bits. • A code sketch of μ-law compression and expansion follows below.
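A sketch of μ-law compression and its inverse (μ = 255 is the common North American telephony choice; the function names are illustrative):

```python
import numpy as np

def mu_law_compress(x, x_max, mu=255.0):
    """T(x) = x_max * log(1 + mu|x|/x_max) / log(1 + mu) * sign(x)."""
    return x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)

def mu_law_expand(y, x_max, mu=255.0):
    """Inverse of mu_law_compress."""
    return (x_max / mu) * np.expm1(np.abs(y) * np.log1p(mu) / x_max) * np.sign(y)

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
y = mu_law_compress(x, x_max=1.0)
print(np.allclose(mu_law_expand(y, x_max=1.0), x))   # True
```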

  38. Adaptive Coding • Nonuniform quantizers are optimal for the long-term pdf of the speech signal. • However, considering that speech is a highly time-varying signal, one has to question whether a single pdf derived from a long-time speech waveform is a reasonable assumption. • Changes in the speech waveform: • Temporal and spectral variations due to transitions from unvoiced to voiced speech • Rapid volume changes • Approach: • Estimate a short-time pdf derived over 20-40 ms intervals. • Short-time pdf estimates are more accurately described by a Gaussian pdf, regardless of the speech class.

  39. Adaptive Coding • A pdf derived from a short-time speech segment more accurately represents the speech nonstationarity. • One approach is to assume a pdf of a specific shape, in particular a Gaussian, with unknown variance σ². • Measure the local variance, then adapt a nonuniform quantizer to the resulting local pdf. • This approach is referred to as adaptive quantization. • For a Gaussian we have: p_x(x) = (1/√(2π σ_x²)) e^(−x²/(2σ_x²)).

  40. Adaptive Coding • Measure the variance σ_x² of a sequence x[n] and use the resulting pdf to design an optimal Max quantizer. • Note that a change in the variance simply scales the time signal: • if E(x²[n]) = σ_x², then E[(βx[n])²] = β²σ_x² • So we need to design only one nonuniform quantizer with unity variance and scale its decision and reconstruction levels according to the particular variance, or, equivalently, • fix the quantizer and apply a time-varying gain to the signal according to the estimated variance (scale the signal to match the quantizer; a feed-forward sketch follows below).
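A feed-forward sketch of the second option: estimate a short-time gain per frame, normalize to unit variance, apply one fixed quantizer, and rescale (the frame length, the reuse of uniform_quantize, and x_max = 4 for a unit-variance input are assumptions):

```python
import numpy as np

def adaptive_quantize(x, B, frame_len=160):
    """Feed-forward adaptive quantization: per-frame gain + fixed quantizer."""
    out = np.empty_like(x)
    gains = []                                   # side information to transmit
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        g = np.std(frame) + 1e-12                # local gain (variance) estimate
        xq, _ = uniform_quantize(frame / g, B, x_max=4.0)  # unit-variance design
        out[start:start + frame_len] = g * xq    # rescale to the local level
        gains.append(g)
    return out, gains
```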

  41. Adaptive Coding

  42. Adaptive Coding • There are two possible approaches to estimating a time-varying variance σ²[n]: • Feed-forward method (shown in Figure 12.11), where the variance (or gain) estimate is obtained from the input. • Feedback method, where the estimate is obtained from the quantizer output. • Advantage – no need to transmit extra side information (the quantized variance) • Disadvantage – additional sensitivity to transmission errors in the codewords • Adaptive quantizers can achieve higher SNR than μ-law companding. • μ-law companding is generally preferred for high-rate waveform coding because of its lower background noise when the transmission channel is idle. • Adaptive quantization is useful in a variety of other coding schemes.

  43. Differential and Residual Quantization • The methods presented so far are examples of instantaneous quantization. • Those approaches do not take advantage of the fact that speech is a highly correlated signal: • Short-time (over 10-15 samples), as well as • Long-time (over a pitch period) • In this section, methods that exploit the short-time correlation are investigated.

  44. Differential and Residual Quantization • Short-time correlation: • Neighboring samples are “self-similar”, that is, they do not change too rapidly from one to the next. • The difference of adjacent samples should therefore have a lower variance than the variance of the signal itself. • This difference thus makes more effective use of the quantization levels: • Higher SNR for a fixed number of quantization levels. • Predict the next sample from previous ones (finding the best prediction coefficients to yield a minimum mean-squared prediction error ⇒ same methodology as in the LPC analysis of Chapter 5). Two approaches: • Use a fixed prediction filter that reflects the average local correlation of the signal. • Allow the predictor to adapt short-time to the signal's local correlation. • The latter requires transmission of the quantized prediction coefficients as well as the prediction error.

  45. Differential and Residual Quantization • An illustration of a particular error-encoding scheme is presented in Figure 12.12 of the next slide. • In this scheme the following sequences are required: • x̃[n] – prediction of the input sample x[n]; this is the output of the predictor P(z), whose input is the quantized version x̂[n] of the input signal. • r[n] – prediction error (residual) signal, r[n] = x[n] − x̃[n]. • r̂[n] – quantized prediction error signal. • This approach is sometimes referred to as residual coding.

  46. Differential and Residual Quantization

  47. Differential and Residual Quantization • The quantizer in the previous scheme can be of any type: • Fixed • Adaptive • Uniform • Nonuniform • In any case, the parameters of the quantizer are determined so as to match the variance of r[n]. • Differential quantization can also be applied to: • The speech signal itself • Parameters that represent speech: • LPC – linear prediction coefficients • Cepstral coefficients obtained from homomorphic filtering • Sinewave parameters, etc.

  48. Differential and Residual Quantization • Consider the quantization error of the quantized residual: r̂[n] = r[n] + e_r[n]. • From Figure 12.12 we can then express the quantized input x̂[n] as: • x̂[n] = x̃[n] + r̂[n] = x̃[n] + r[n] + e_r[n] = x[n] + e_r[n].

  49. Differential and Residual Quantization • The quantized signal samples differ from the input only by the quantization error e_r[n]. • Since e_r[n] is the quantization error of the residual: • ⇒ if the prediction of the signal is accurate, then the variance of r[n] will be smaller than the variance of x[n], • ⇒ so a quantizer with a given number of levels can be adjusted to give a smaller quantization error than would be possible when quantizing the signal directly (a first-order loop is sketched below).
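A minimal first-order rendering of the Figure 12.12 loop (the predictor coefficient a = 0.9, the residual-range estimate, and the reuse of uniform_quantize are illustrative assumptions, not the slides' exact design):

```python
import numpy as np

def dpcm(x, B, a=0.9):
    """First-order DPCM: quantize the residual, predict from quantized output."""
    x_hat = np.zeros(len(x))
    prev = 0.0                                    # last quantized sample
    # Rough feed-forward estimate of the residual's range for the quantizer
    r_max = 4 * np.std(x - a * np.concatenate(([0.0], x[:-1])))
    for n in range(len(x)):
        x_tilde = a * prev                        # prediction from quantized past
        r = x[n] - x_tilde                        # residual r[n]
        r_hat, _ = uniform_quantize(np.array([r]), B, x_max=r_max)
        x_hat[n] = x_tilde + r_hat[0]             # x_hat[n] = x[n] + e_r[n]
        prev = x_hat[n]
    return x_hat
```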

  50. Differential and Residual Quantization • The differential coder of Figure 12.12 is referred to as: • Differential PCM (DPCM) when used with • a fixed predictor, and • fixed quantization. • Adaptive Differential PCM (ADPCM) when used with • adaptive prediction (i.e., adapting the predictor to the local correlation), and • adaptive quantization (i.e., adapting the quantizer to the local variance of r[n]). • ADPCM yields the greatest gains in SNR for a fixed bit rate. • The international coding standard CCITT G.721, with toll-quality speech at 32 kbps (8000 samples/sec × 4 bits/sample), was designed based on ADPCM techniques. • To achieve higher quality at lower rates it is necessary to: • Rely on speech model-based techniques, and • Exploit long-time prediction as well as short-time prediction.
