
Speech Processing



  1. Speech Processing Speech Coding

  2. Speech Coding • Definition: • Speech coding is a process that leads to the representation of analog waveforms with sequences of binary digits. • Even though the availability of high-bandwidth communication channels has increased, speech coding for bit-rate reduction has retained its importance: • Reduced bit-rate transmission is required for cellular networks • Voice over IP • Coded speech: • Is less sensitive than analog signals to transmission noise • Is easier to: • protect against (bit) errors • encrypt • multiplex, and • packetize • A typical scenario is depicted in the next slide (Figure 12.1). Veton Këpuska

  3. Digital Telephone Communication System

  4. Categorization of Speech Coders • Waveform Coders: • Quantize speech samples directly and operate at high bit rates in the range of 16-64 kbps (bps: bits per second). • Hybrid Coders: • Are partly waveform coders and partly speech-model-based coders, and operate in the mid bit-rate range of 2.4-16 kbps. • Vocoders: • Are largely model-based and operate at a low bit rate in the range of 1.2-4.8 kbps. • Tend to be of lower quality than waveform and hybrid coders.

  5. Quality Measurements • Quality of coding is viewed as the closeness of the processed speech to the original speech or to some other desired speech waveform, judged in terms of: • Naturalness • Degree of background artifacts • Intelligibility • Speaker identifiability • Etc.

  6. Quality Measurements • Subjective Measurement: • The Diagnostic Rhyme Test (DRT) measures intelligibility. • The Diagnostic Acceptability Measure (DAM) and the Mean Opinion Score (MOS) test provide a more complete quality judgment. • Objective Measurement: • Segmental Signal-to-Noise Ratio (SNR): the average SNR over short-time segments. • Articulation Index: relies on an average SNR across frequency bands.
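A minimal sketch of the segmental SNR described above, assuming two equal-length sample lists and non-overlapping frames of fixed length. The 160-sample default (20 ms at an assumed 8 kHz rate) and the small `eps` guard are illustrative choices, not values from these slides; practical versions often also skip silent frames, which is omitted here.

```python
import math

def segmental_snr_db(x, y, frame_len=160, eps=1e-12):
    """Average of per-frame SNRs (dB) between original x and processed y.

    frame_len=160 (20 ms at an assumed 8 kHz rate) is an illustrative choice.
    eps guards against log(0) in silent or error-free frames.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    num_frames = len(x) // frame_len
    if num_frames == 0:
        raise ValueError("signal shorter than one frame")
    total = 0.0
    for k in range(num_frames):
        seg_x = x[k * frame_len:(k + 1) * frame_len]
        seg_y = y[k * frame_len:(k + 1) * frame_len]
        signal_energy = sum(v * v for v in seg_x)
        noise_energy = sum((a - b) ** 2 for a, b in zip(seg_x, seg_y))
        total += 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
    return total / num_frames

# Halving the amplitude leaves an error as large as the output itself,
# so every frame has SNR = 10*log10(4).
x = [math.sin(0.1 * n) for n in range(800)]
y = [0.5 * v for v in x]
print(round(segmental_snr_db(x, y), 2))  # 6.02
```

Averaging per-frame dB values, rather than taking one global SNR, keeps loud segments from masking poor quality in quiet ones.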

  7. Quality Measurements • A more complete list and definition of subjective and objective measures can be found in: • J.R. Deller, J.G. Proakis, and J.H.L. Hansen, "Discrete-Time Processing of Speech Signals," Macmillan Publishing Co., New York, NY, 1993. • S.R. Quackenbush, T.P. Barnwell, and M.A. Clements, "Objective Measures of Speech Quality," Prentice Hall, Englewood Cliffs, NJ, 1988.

  8. Statistical Models

  9. Statistical Models • The speech waveform is viewed as a random process. • Various estimates are important from this statistical perspective: • Probability density • Mean, variance, and autocorrelation • One approach to estimating the probability density function (pdf) of x[n] is through a histogram: • Count the number of occurrences of speech-sample values falling in different amplitude ranges (bins), for many speech samples over a long time duration. • Normalize the area of the resulting curve to unity.
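The histogram procedure above can be sketched as follows. This is a minimal version under stated assumptions: equal-width bins over a chosen range `[x_min, x_max]`, samples outside that range ignored, and Gaussian random numbers standing in for speech samples purely for illustration.

```python
import random

def histogram_pdf(samples, num_bins, x_min, x_max):
    """Histogram estimate of a pdf over [x_min, x_max], normalized to unit area.

    Samples outside [x_min, x_max] are ignored in this sketch.
    Returns (bin_centers, densities).
    """
    width = (x_max - x_min) / num_bins
    counts = [0] * num_bins
    for s in samples:
        if x_min <= s <= x_max:
            k = min(int((s - x_min) / width), num_bins - 1)  # s == x_max goes in the last bin
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [x_min, x_max]")
    # Dividing by total * width makes sum(density * width) == 1 (unit area).
    densities = [c / (total * width) for c in counts]
    centers = [x_min + (k + 0.5) * width for k in range(num_bins)]
    return centers, densities

# Gaussian samples stand in for speech here (an assumption for the demo).
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
centers, dens = histogram_pdf(samples, 40, -4.0, 4.0)
print(round(sum(dens) * (8.0 / 40), 6))  # area under the estimate
```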

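  10. Statistical Models • The histogram of speech (Davenport; Paez & Glisson) was shown to approximate a gamma density: $$p_x(x) = \left(\frac{\sqrt{3}}{8\pi\sigma_x |x|}\right)^{1/2} e^{-\frac{\sqrt{3}|x|}{2\sigma_x}}$$ where $\sigma_x$ is the standard deviation of the pdf. • A simpler approximation is given by the Laplacian pdf of the form: $$p_x(x) = \frac{1}{\sqrt{2}\,\sigma_x} e^{-\frac{\sqrt{2}|x|}{\sigma_x}}$$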
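The two pdf models can be evaluated directly. This sketch simply codes the gamma and Laplacian expressions for a given standard deviation; note that the gamma density is unbounded at x = 0, so it is only evaluated at nonzero amplitudes. The comparison points are arbitrary.

```python
import math

def gamma_pdf(x, sigma):
    """Gamma-density model of speech amplitudes (unbounded at x = 0, so x must be nonzero)."""
    a = abs(x)
    return (math.sqrt(math.sqrt(3) / (8 * math.pi * sigma * a))
            * math.exp(-math.sqrt(3) * a / (2 * sigma)))

def laplacian_pdf(x, sigma):
    """Laplacian model of speech amplitudes with standard deviation sigma."""
    return math.exp(-math.sqrt(2) * abs(x) / sigma) / (math.sqrt(2) * sigma)

# Compare the two models for unit standard deviation.
for x in (0.01, 0.5, 1.0, 3.0):
    print(f"x={x}: gamma={gamma_pdf(x, 1.0):.4f}  laplacian={laplacian_pdf(x, 1.0):.4f}")
```

Both expressions integrate to one and have variance $\sigma_x^2$; the gamma model puts much more probability near zero amplitude, which is what makes it a closer fit to the speech histogram.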

  11. PDF of Speech

  12. PDF of Modeled Speech

  13. PDF of Speech • Knowing the pdf of speech is essential for optimizing the quantization of analog (continuous-valued) samples.

  14. Quantization of Speech & Audio Signals: Scalar Quantization, Vector Quantization

  15. Time Quantization (Sampling) of Analog Signals [Figure: Analog-to-digital conversion: x(t) → analog low-pass filter → sample and hold → analog-to-digital converter → DSP, with panels a), b), c)] • a) Continuous signal x(t). • b) Sampled signal with sampling period T satisfying the Nyquist rate, as specified by the Sampling Theorem. • c) Digital sequence x[n] obtained after sampling and quantization.

  16. Example • Assume that the input continuous-time signal is a pure periodic signal represented by the following expression: $$x(t) = A\cos(\Omega_0 t + \theta) = A\cos(2\pi f_0 t + \theta)$$ where A is the amplitude of the signal, $\Omega_0$ is the angular frequency in radians per second (rad/s), $\theta$ is the phase in radians, and $f_0$ is the frequency in cycles per second, measured in hertz (Hz). Assuming that the continuous-time signal x(t) is sampled every T seconds, or alternatively with the sampling rate $f_s = 1/T$, the discrete-time signal x[n] obtained by setting t = nT is: $$x[n] = x(nT) = A\cos(\Omega_0 nT + \theta) = A\cos\left(2\pi \frac{f_0}{f_s} n + \theta\right)$$

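  17. Example (cont.) • The alternative representation of x[n]: $$x[n] = A\cos(2\pi F_0 n + \theta) = A\cos(\omega_0 n + \theta)$$ reveals additional properties of the discrete-time signal. • $F_0 = f_0/f_s$ defines the normalized frequency (cycles per sample), and the digital frequency $\omega_0$ (radians per sample) is defined in terms of the normalized frequency: $$\omega_0 = 2\pi F_0 = 2\pi \frac{f_0}{f_s} = \Omega_0 T$$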
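A short sketch of the sampling relations above, assuming the cosine form of the signal. It generates x[n] from the normalized frequency $F_0 = f_0/f_s$ and checks that this matches evaluating x(t) directly at t = nT. The 1000 Hz tone and 8000 Hz sampling rate are arbitrary example values.

```python
import math

def sample_sinusoid(A, f0, theta, fs, num_samples):
    """x[n] = A*cos(2*pi*F0*n + theta), with normalized frequency F0 = f0/fs."""
    F0 = f0 / fs                # normalized frequency, cycles per sample
    omega0 = 2 * math.pi * F0   # digital frequency, radians per sample
    return [A * math.cos(omega0 * n + theta) for n in range(num_samples)]

# A 1000 Hz tone sampled at 8000 Hz has F0 = 0.125: one period every 8 samples.
x = sample_sinusoid(A=1.0, f0=1000.0, theta=0.0, fs=8000.0, num_samples=16)
T = 1 / 8000.0
direct = [math.cos(2 * math.pi * 1000.0 * n * T) for n in range(16)]  # x(t) at t = nT
print(max(abs(a - b) for a, b in zip(x, direct)) < 1e-9)  # True
```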

  18. Reconstruction of Digital Signals [Figure: Digital-to-analog conversion: y[n] → digital-to-analog converter → analog low-pass filter → y(t), with panels a), b), c)] • a) Processed digital signal y[n]. • b) Continuous signal representation ya(nT). • c) Low-pass filtered continuous signal y(t).

  19. Scalar Quantization

  20. Conceptual Representation of ADC [Figure: x(t) → C/D → Quantizer → Coder] • This conceptual abstraction allows us to assume that the sequence x[n] is obtained with infinite precision. Those values are scalar quantized to a set of finite-precision amplitudes denoted here by $\hat{x}[n]$. • Furthermore, quantization allows this finite-precision set of amplitudes to be represented by a corresponding set of (bit) patterns or symbols, c[n].

  21. Without loss of generality, it can be assumed that: • Input signals cover a finite range of values, defined by the minimal and maximal values $x_{min}$ and $x_{max}$, respectively. This assumption in turn implies that: • The set of symbols representing $\hat{x}[n]$ is finite. • Encoding: • The process of mapping a finite set of values to a finite set of symbols is known as encoding; it is performed by the coder, as in the figure in the previous slide. • Mapping: • Thus one can view quantization and coding as a mapping of an infinite-precision value x[n] to a finite-precision representation $\hat{x}[n]$ picked from a finite set of symbols.

  22. Scalar Quantization • Quantization, therefore, is a mapping of a value x[n], $x_{min} \le x[n] \le x_{max}$, to $\hat{x}[n]$. The quantizer operator, denoted by Q(x), is defined by: $$\hat{x}[n] = Q(x[n]) = \hat{x}_i, \quad x_{i-1} < x[n] \le x_i$$ where $\hat{x}_i$ denotes one of L possible quantization levels, $1 \le i \le L$, and $x_i$ represents one of L+1 decision levels, $0 \le i \le L$. • The above expression is interpreted as follows: if $x_{i-1} < x[n] \le x_i$, then x[n] is quantized to the quantization level $\hat{x}_i$, and $\hat{x}[n]$ is considered the quantized sample of x[n]. Given the limited range of input values and the finite number of symbols, the quantizer is characterized by its quantization step size $\Delta_i$, defined as the difference between two consecutive decision levels: $$\Delta_i = x_i - x_{i-1}$$
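The quantizer operator Q(x) can be sketched for arbitrary (not necessarily uniform) decision levels with a binary search. This is a minimal illustration, assuming increasing decision levels and mapping out-of-range inputs to the outermost levels; the nonuniform level values in the usage lines are hypothetical.

```python
import bisect

def quantize(x, decision_levels, reconstruction_levels):
    """Scalar quantizer Q(x): returns x_hat_i when x_{i-1} < x <= x_i.

    decision_levels = [x_0, ..., x_L] (increasing),
    reconstruction_levels = [x_hat_1, ..., x_hat_L].
    Inputs below x_0 or above x_L map to the outermost levels (overload).
    """
    L = len(reconstruction_levels)
    if len(decision_levels) != L + 1:
        raise ValueError("L reconstruction levels need L + 1 decision levels")
    # Search only the interior boundaries x_1..x_{L-1}; the result i satisfies x_{i-1} < x <= x_i.
    i = bisect.bisect_left(decision_levels, x, 1, L)
    return reconstruction_levels[i - 1]

def step_sizes(decision_levels):
    """Delta_i = x_i - x_{i-1} for i = 1..L."""
    return [b - a for a, b in zip(decision_levels, decision_levels[1:])]

# Hypothetical nonuniform 4-level quantizer with finer steps near zero.
dec = [-1.0, -0.4, 0.0, 0.4, 1.0]
rec = [-0.7, -0.2, 0.2, 0.7]
print(quantize(0.1, dec, rec), quantize(-0.4, dec, rec), step_sizes(dec))
```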

  23. Scalar Quantization • Assume that a sequence x[n] was obtained from a speech waveform that has been lowpass-filtered and sampled at a suitable rate with infinite amplitude precision. • The x[n] samples are quantized to a finite set of amplitudes denoted by $\hat{x}[n]$. • Associated with the quantizer is a quantization step size. • Quantization allows the amplitudes to be represented by a finite set of bit patterns (symbols). • Encoding: • Mapping of $\hat{x}[n]$ to a finite set of symbols. • This mapping yields a sequence of codewords denoted by c[n] (Figure 12.3a in the next slide). • Decoding: • The inverse process, whereby the transmitted sequence of codewords c'[n] is transformed back to a sequence of quantized samples $\hat{x}'[n]$ (Figure 12.3b in the next slide).

  24. Scalar Quantization

  25. Fundamentals • Assume a signal amplitude is quantized into L levels. • The quantizer operator is denoted by Q(x); thus $$\hat{x}[n] = Q(x[n]) = \hat{x}_i, \quad x_{i-1} < x[n] \le x_i$$ • where $\hat{x}_i$ denotes the L possible reconstruction levels (quantization levels), $1 \le i \le L$, and • $x_i$ denotes the L+1 possible decision levels, $0 \le i \le L$. • If $x_{i-1} < x[n] \le x_i$, then x[n] is quantized to the reconstruction level $\hat{x}_i$. • $\hat{x}[n]$ is the quantized sample of x[n].

  26. Fundamentals • Scalar Quantization Example: • Assume there are L=4 reconstruction levels. • The amplitude of the input signal x[n] falls in the range [0,1]. • Decision levels and reconstruction levels are equally spaced: • Decision levels are [0, 1/4, 1/2, 3/4, 1]. • Reconstruction levels are assumed to be [1/8, 3/8, 5/8, 7/8]. • See Figure 12.4 in the next slide.
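For equally spaced levels, the interval index can be computed in closed form instead of searched for. A minimal sketch for the L = 4 example on [0, 1], assuming midpoint reconstruction levels and clipping of out-of-range inputs:

```python
import math

def uniform_quantize(x, x_min=0.0, x_max=1.0, L=4):
    """Uniform quantizer with L levels on [x_min, x_max].

    Returns (i, x_hat_i) with x_{i-1} < x <= x_i and x_hat_i the midpoint of that interval.
    Values outside the range are clipped to the first or last interval.
    """
    delta = (x_max - x_min) / L
    i = math.ceil((x - x_min) / delta)  # index of the decision interval containing x
    i = min(max(i, 1), L)
    return i, x_min + (i - 0.5) * delta

# The L = 4 example: decision levels 0, 1/4, 1/2, 3/4, 1.
# 0.5 lies on the boundary x_2 and therefore maps to the lower level 3/8.
for x in (0.1, 0.3, 0.5, 0.9):
    print(x, uniform_quantize(x))
```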

  27. Example of Uniform 2-bit Quantizer

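  28. Example • Assume there are $L = 2^4 = 16$ reconstruction levels, that input values fall within the range $[x_{min}=-1, x_{max}=1]$, and that each value in this range is equally likely. Decision levels and reconstruction levels are equally spaced, with $\Delta_i = \Delta = (x_{max} - x_{min})/L = 1/8$: • Decision levels: $x_i = x_{min} + i\Delta$, $0 \le i \le L$, i.e., [-1, -7/8, -3/4, …, 3/4, 7/8, 1]. • Reconstruction levels: $\hat{x}_i = x_{min} + (i - 1/2)\Delta$, $1 \le i \le L$, i.e., [-15/16, -13/16, …, 13/16, 15/16].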
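The level tables for such examples can be generated exactly with rational arithmetic. This sketch uses Python's `fractions.Fraction` so the printed levels match the hand-written fractions; midpoint reconstruction levels are assumed, as in the uniform quantizer definition.

```python
from fractions import Fraction

def uniform_levels(x_min, x_max, L):
    """Exact step size, decision levels x_0..x_L, and reconstruction levels x_hat_1..x_hat_L."""
    x_min, x_max = Fraction(x_min), Fraction(x_max)
    delta = (x_max - x_min) / L
    decision = [x_min + i * delta for i in range(L + 1)]
    reconstruction = [x_min + (i - Fraction(1, 2)) * delta for i in range(1, L + 1)]
    return delta, decision, reconstruction

delta, dec, rec = uniform_levels(-1, 1, 2 ** 4)
print(delta)  # 1/8
print(", ".join(str(d) for d in dec))
print(", ".join(str(r) for r in rec))
```

Calling `uniform_levels(0, 1, 4)` reproduces the L = 4 example from the earlier slide.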

  29. 16-Level (4-bit) Quantization Example

  30. Uniform Quantizer • A uniform quantizer is one whose decision and reconstruction levels are uniformly spaced. Specifically: $$x_i - x_{i-1} = \Delta, \quad 1 \le i \le L$$ $$\hat{x}_i = \frac{x_i + x_{i-1}}{2}, \quad 1 \le i \le L$$ • $\Delta$ is the step size, equal to the spacing between two consecutive decision levels, which is the same as the spacing between two consecutive reconstruction levels. • Each reconstruction level is assigned a symbol, the codeword. Binary numbers are typically used to represent the quantized samples (as in Figure 12.4 in a previous slide).

  31. Uniform Quantizer • Codebook: the collection of codewords. • In general, with a B-bit binary codebook there are $2^B$ different quantization (or reconstruction) levels. • The bit rate is defined as the number of bits B per sample multiplied by the sample rate $f_s$: $I = B f_s$. • The decoder inverts the coder operation, taking a codeword back to a quantized amplitude value (e.g., 01 → its assigned reconstruction level $\hat{x}_i$). • Often the goal of speech coding/decoding is to keep the bit rate as low as possible while maintaining a required level of quality. • Because the sampling rate is fixed for most applications, this goal implies that the bit rate be reduced by decreasing the number of bits per sample.
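The coder/decoder pair and the bit-rate formula can be sketched as below. The natural-binary codeword assignment (lowest level → all zeros) is an assumption made for illustration, and the 8-bit, 8000 Hz configuration is a hypothetical example.

```python
import math

def encode(x, x_min, x_max, B):
    """Coder: map a sample to a B-bit codeword via a uniform quantizer with L = 2**B levels.

    Natural-binary codeword order (lowest level -> all zeros) is an assumption.
    """
    L = 2 ** B
    delta = (x_max - x_min) / L
    i = min(max(math.ceil((x - x_min) / delta), 1), L)  # level index, 1..L
    return format(i - 1, f"0{B}b")

def decode(codeword, x_min, x_max):
    """Decoder: map a codeword back to its reconstruction level (interval midpoint)."""
    B = len(codeword)
    delta = (x_max - x_min) / 2 ** B
    return x_min + (int(codeword, 2) + 0.5) * delta

def bit_rate(B, fs):
    """I = B * fs, in bits per second."""
    return B * fs

codewords = [encode(x, 0.0, 1.0, 2) for x in (0.1, 0.3, 0.6, 0.9)]
print(codewords)                                  # ['00', '01', '10', '11']
print([decode(c, 0.0, 1.0) for c in codewords])   # [0.125, 0.375, 0.625, 0.875]
print(bit_rate(8, 8000))                          # 64000
```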

  32. Uniform Quantizer • Designing a uniform scalar quantizer requires knowledge of the maximum value of the sequence. • Typically the range of the speech signal is expressed in terms of the standard deviation of the signal. • Specifically, it is often assumed that $-4\sigma_x \le x[n] \le 4\sigma_x$, where $\sigma_x$ is the signal's standard deviation. • Under the assumption that speech samples obey a Laplacian pdf, approximately 0.35% of speech samples fall outside the range $-4\sigma_x \le x[n] \le 4\sigma_x$. • Assume a B-bit binary codebook ⇒ $2^B$ quantization levels. • Maximum signal value: $x_{max} = 4\sigma_x$.
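The 0.35% figure follows from the Laplacian pdf: integrating both tails gives $P(|x| > k\sigma_x) = e^{-\sqrt{2}k}$. This sketch evaluates that expression for k = 4 and cross-checks it by Monte Carlo. It generates unit-variance Laplacian samples as a random sign times an exponential variable with rate $\sqrt{2}$, which is one standard construction.

```python
import math
import random

def laplacian_tail(k):
    """P(|x| > k*sigma) for a Laplacian pdf: both tails integrate to exp(-sqrt(2)*k)."""
    return math.exp(-math.sqrt(2) * k)

print(f"{100 * laplacian_tail(4):.2f}%")  # 0.35%

# Monte Carlo check with unit-variance Laplacian samples.
random.seed(0)
N = 200_000
samples = [random.choice((-1, 1)) * random.expovariate(math.sqrt(2)) for _ in range(N)]
frac = sum(abs(v) > 4.0 for v in samples) / N
print(frac)  # close to 0.0035
```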

  33. Uniform Quantizer • For the uniform quantization step size $\Delta$ we get: $$\Delta = \frac{2x_{max}}{2^B} = \frac{8\sigma_x}{2^B}$$ • The quantization step size $\Delta$ relates directly to the notion of quantization noise.

  34. Quantization Noise • Two classes of quantization noise: • Granular distortion • Overload distortion • Granular Distortion: $$\hat{x}[n] = x[n] + e[n]$$ • where x[n] is the unquantized signal and e[n] is the quantization noise. • For a given step size $\Delta$, the magnitude of the quantization noise e[n] can be no greater than $\Delta/2$, that is: $$-\frac{\Delta}{2} \le e[n] \le \frac{\Delta}{2}$$ • Figure 12.5 depicts this property, where $e[n] = \hat{x}[n] - x[n]$.
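A quick numerical check of the step size and the granular-error bound, assuming a midpoint-reconstruction uniform quantizer over $[-x_{max}, x_{max}]$ with $x_{max} = 4\sigma_x$. Uniformly distributed test inputs inside the range are an arbitrary choice; the clipped input at the end shows an overload error exceeding $\Delta/2$.

```python
import math
import random

def quantize_pcm(x, x_max, B):
    """Uniform B-bit quantizer over [-x_max, x_max] with midpoint reconstruction levels."""
    L = 2 ** B
    delta = 2 * x_max / L
    i = min(max(math.ceil((x + x_max) / delta), 1), L)  # clip overload samples
    return -x_max + (i - 0.5) * delta

sigma_x, B = 1.0, 3
x_max = 4 * sigma_x
delta = 2 * x_max / 2 ** B  # = 8 * sigma_x / 2**B
print(delta)                # 1.0

random.seed(1)
x = [random.uniform(-x_max, x_max) for _ in range(10_000)]
e = [quantize_pcm(v, x_max, B) - v for v in x]  # e[n] = x_hat[n] - x[n]
print(max(abs(v) for v in e) <= delta / 2 + 1e-12)  # granular error stays within delta/2

# An input beyond x_max is clipped, so its error exceeds delta/2 (overload).
print(10.0 - quantize_pcm(10.0, x_max, B))
```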

  35. Quantization Noise

  36. Example • For a periodic sine-wave signal, use 3-bit and 8-bit quantizers. The input periodic signal is given by the following expression: • The MATLAB fix function is used to simulate quantization. The following figure depicts the result of the analysis.
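A rough Python analogue of this experiment. MATLAB's `fix` rounds toward zero, which `math.trunc` also does. The scaling of a unit-amplitude signal by $2^{B-1}$ before truncation and the 50-sample sine period are assumptions, since the slide's exact signal and setup are not given. Because truncation (not rounding) is used, the error here is bounded by one step rather than half a step.

```python
import math

def quantize_fix(x, B):
    """Truncation quantizer for |x| <= 1, mimicking MATLAB's fix() (round toward zero).

    Scaling by 2**(B-1) is an illustrative assumption, not necessarily the slides' setup.
    """
    scale = 2 ** (B - 1)
    return math.trunc(x * scale) / scale

# Hypothetical test signal: unit-amplitude sine with a 50-sample period.
x = [math.sin(2 * math.pi * n / 50) for n in range(200)]
for B in (3, 8):
    e = [quantize_fix(v, B) - v for v in x]
    print(f"B={B}: max |e[n]| = {max(abs(v) for v in e):.4f}")
```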

  37. $L = 2^3 = 8$ & $2^8 = 256$ Levels Quantization • Plot a) represents the sequence x[n] with infinite precision, b) represents the quantized version $\hat{x}[n]$ with L=8, c) represents the quantization error e[n] for B=3 bits (L=8 quantization levels), and d) is the quantization error for B=8 bits (L=256 quantization levels).

  38. Quantization Noise • Overload Distortion: • Maximum-value constant: $x_{max} = 4\sigma_x$ ($-4\sigma_x \le x[n] \le 4\sigma_x$). • For a Laplacian pdf, 0.35% of the speech samples fall outside the range of the quantizer. • Clipped samples incur a quantization error in excess of $\Delta/2$. • Due to the small number of clipped samples, it is common to neglect the infrequent large errors in theoretical calculations.

  39. Quantization Noise • Statistical Model of Quantization Noise: • A desirable approach in analyzing the quantization error in numerous applications. • The quantization error is considered an ergodic white-noise random process. • The autocorrelation function of such a process is expressed as: $$r_e[m] = E(e[n]e[n+m]) = \sigma_e^2\,\delta[m]$$

  40. Quantization Error • The previous expression states that the process is uncorrelated. • Furthermore, it is also assumed that the quantization noise and the input signal are uncorrelated, i.e., • $E(x[n]e[n+m]) = 0, \ \forall m$. • The final assumption is that the pdf of the quantization noise is uniform over the quantization interval: $$p_e(e) = \begin{cases} \frac{1}{\Delta}, & -\frac{\Delta}{2} \le e \le \frac{\Delta}{2} \\ 0, & \text{otherwise} \end{cases}$$
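The three modeling assumptions can be checked empirically for a small step size and a rapidly fluctuating input. This sketch uses white, uniformly distributed input samples (an assumed stand-in that visits all levels) and an 8-bit uniform quantizer. It estimates the error power relative to $\Delta^2/12$ (the variance of a uniform pdf of width $\Delta$), the lag-1 autocorrelation of e[n], and the correlation between x[n] and e[n].

```python
import math
import random

def quantize_pcm(x, x_max, B):
    """Uniform B-bit quantizer over [-x_max, x_max] with midpoint reconstruction levels."""
    L = 2 ** B
    delta = 2 * x_max / L
    i = min(max(math.ceil((x + x_max) / delta), 1), L)
    return -x_max + (i - 0.5) * delta

random.seed(3)
x_max, B = 1.0, 8
delta = 2 * x_max / 2 ** B
# Rapidly fluctuating input that visits all levels (white, uniform: an assumption).
x = [random.uniform(-x_max, x_max) for _ in range(50_000)]
e = [quantize_pcm(v, x_max, B) - v for v in x]

energy_e = sum(v * v for v in e)
var_ratio = (energy_e / len(e)) / (delta ** 2 / 12)          # near 1 for a uniform error pdf
rho_lag1 = sum(a * b for a, b in zip(e, e[1:])) / energy_e    # near 0 for white noise
rho_xe = sum(a * b for a, b in zip(x, e)) / math.sqrt(sum(v * v for v in x) * energy_e)  # near 0
print(round(var_ratio, 3), round(rho_lag1, 3), round(rho_xe, 3))
```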

  41. Quantization Error • The stated assumptions are not always valid. • Consider a slowly (e.g., linearly) varying signal ⇒ then e[n] also changes linearly and is signal dependent (see the figure in the next slide). • Correlated quantization noise can be annoying. • When the quantization step $\Delta$ is small, the assumptions of the noise being uncorrelated with itself and with the signal are roughly valid, provided the signal fluctuates rapidly among all quantization levels. • The quantization error then approaches a white-noise process with an impulsive autocorrelation and a flat spectrum. • One can force e[n] to be white noise and uncorrelated with x[n] by adding white noise to x[n] prior to quantization.

  42. Example of Quantization Error due to Correlation • Example of a slowly varying signal that causes the quantization error to be correlated. Plot • a) represents the sequence x[n] with infinite precision, • b) represents the quantized version $\hat{x}[n]$, • c) represents the quantization error e[n] for B=3 bits (L=8 quantization levels), and • d) is the quantization error for B=8 bits (L=256 quantization levels). Note the reduction in correlation with the increase in the number of quantization levels, which implies a decrease of the step size $\Delta$.

  43. Quantization Error • The process of adding white noise is known as dithering. • This decorrelation technique was shown to be useful not only for improving the perceptual quality of the quantization noise in speech but also for image signals. • Signal-to-Noise Ratio (SNR): • A measure to quantify the severity of the quantization noise. • Relates the strength of the signal to the strength of the quantization noise.
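A small experiment illustrating dithering on a slowly varying input. Without dither, a ramp produces a sawtooth-shaped error whose neighboring samples are almost perfectly correlated; adding white noise before quantization breaks that correlation. The dither is assumed here to be uniform over one step width, and the ramp is kept inside the quantizer range so that no clipping occurs. Both are illustrative choices.

```python
import math
import random

def quantize_pcm(x, x_max, B):
    """Uniform B-bit quantizer over [-x_max, x_max] with midpoint reconstruction levels."""
    L = 2 ** B
    delta = 2 * x_max / L
    i = min(max(math.ceil((x + x_max) / delta), 1), L)
    return -x_max + (i - 0.5) * delta

def lag1_correlation(e):
    """Normalized lag-1 autocorrelation: near 1 for a slowly varying error, near 0 for white noise."""
    return sum(a * b for a, b in zip(e, e[1:])) / sum(v * v for v in e)

x_max, B = 1.0, 3
delta = 2 * x_max / 2 ** B
N = 10_000
ramp = [-0.75 + 1.5 * n / N for n in range(N)]  # slowly varying input, kept inside the range
random.seed(2)
e_plain = [quantize_pcm(x, x_max, B) - x for x in ramp]
# Dither: add white noise, uniform over one step (an assumed choice), before quantizing.
e_dither = [quantize_pcm(x + random.uniform(-delta / 2, delta / 2), x_max, B) - x for x in ramp]
print(round(lag1_correlation(e_plain), 3), round(lag1_correlation(e_dither), 3))
```

The dithered error has a larger total power, but it no longer follows the signal; this is the decorrelation trade-off that dithering makes.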

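  44. Quantization Error • SNR is defined as: $$SNR = \frac{\sigma_x^2}{\sigma_e^2} = \frac{E(x^2[n])}{E(e^2[n])}$$ • Given the assumptions of: • Quantizer range: $2x_{max}$, • Quantization interval: $\Delta = 2x_{max}/2^B$ for a B-bit quantizer, and • Uniform pdf of the quantization noise, it can be shown that: $$\sigma_e^2 = \frac{\Delta^2}{12} = \frac{x_{max}^2}{3 \cdot 2^{2B}}$$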

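  45. Quantization Error • Thus the SNR can be expressed as: $$SNR = \frac{\sigma_x^2}{\sigma_e^2} = \frac{3 \cdot 2^{2B}}{(x_{max}/\sigma_x)^2}$$ • Or in decibels (dB) as: $$SNR(dB) = 10\log_{10}\left(\frac{\sigma_x^2}{\sigma_e^2}\right) \approx 6.02B + 4.77 - 20\log_{10}\left(\frac{x_{max}}{\sigma_x}\right)$$ • Because $x_{max} = 4\sigma_x$, $SNR(dB) \approx 6B - 7.2$.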
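The SNR formula can be evaluated directly and compared with a measurement. This sketch codes $10\log_{10}(3 \cdot 2^{2B}/(x_{max}/\sigma_x)^2)$ and checks it against a simulated 8-bit quantizer. The test input is Gaussian and limited to $\pm 4\sigma_x$. That choice is an assumption which mirrors the slide's neglect of overload, so that only granular noise is measured.

```python
import math
import random

def pcm_snr_db(B, load_factor=4.0):
    """Theoretical uniform-PCM SNR: 10*log10(3 * 2**(2B) / (x_max/sigma_x)**2)."""
    return 10 * math.log10(3 * 2 ** (2 * B) / load_factor ** 2)

def quantize_pcm(x, x_max, B):
    """Uniform B-bit quantizer over [-x_max, x_max] with midpoint reconstruction levels."""
    L = 2 ** B
    delta = 2 * x_max / L
    i = min(max(math.ceil((x + x_max) / delta), 1), L)
    return -x_max + (i - 0.5) * delta

B, sigma_x = 8, 1.0
x_max = 4 * sigma_x
print(round(pcm_snr_db(B), 2), 6 * B - 7.2)  # exact formula vs. the 6B - 7.2 rule of thumb

# Gaussian samples restricted to +/- 4 sigma_x, so no overload occurs (an assumption).
random.seed(4)
draws = (random.gauss(0.0, sigma_x) for _ in range(20_000))
x = [v for v in draws if abs(v) <= x_max]
e = [quantize_pcm(v, x_max, B) - v for v in x]
snr_measured = 10 * math.log10(sum(v * v for v in x) / sum(v * v for v in e))
print(round(snr_measured, 2))
```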

  46. Quantization Error • The presented quantization scheme is called pulse code modulation (PCM). • B bits per sample are transmitted as a codeword. • Advantages of this scheme: • It is instantaneous (no coding delay). • It is independent of the signal content (voice, music, etc.). • Disadvantages: • It requires a minimum of 11 bits per sample to achieve "toll quality" (equivalent to typical telephone quality). • For a 10,000 Hz sampling rate, the required bit rate is: I = (11 bits/sample) × (10,000 samples/s) = 110,000 bps = 110 kbps. • For a CD-quality signal with a sample rate of 20,000 Hz and 16 bits/sample, SNR(dB) = 96 - 7.2 = 88.8 dB, and the bit rate is 320 kbps.
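The bit-rate and SNR figures quoted for the two PCM configurations can be reproduced with $I = B f_s$ and the $6B - 7.2$ dB rule of thumb. This is a minimal sketch; the helper name `pcm_summary` is hypothetical.

```python
def pcm_summary(B, fs):
    """Return (bit rate I = B*fs in bps, SNR estimate in dB from the 6B - 7.2 rule of thumb)."""
    return B * fs, 6 * B - 7.2

for label, B, fs in (("toll quality", 11, 10_000), ("CD-quality example", 16, 20_000)):
    rate, snr = pcm_summary(B, fs)
    print(f"{label}: {rate / 1000:.0f} kbps, about {snr:.1f} dB SNR")
```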

  47. Nonuniform Quantization • Uniform quantization may not be optimal: the SNR may not be as large as possible for a given number of decision and reconstruction levels. • Consider, for example, a speech signal for which x[n] is much more likely to be in one particular region than in others (low values occurring much more often than high values). • This implies that decision and reconstruction levels are not utilized effectively with uniform intervals over the range $\pm x_{max}$. • A nonuniform quantizer that is optimal (in a least-squared-error sense) for a particular pdf is referred to as the Max quantizer. • An example of a nonuniform quantizer is given in the figure in the next slide.

  48. Nonuniform Quantization

  49. Nonuniform Quantization • Max Quantizer • Problem definition: for a random variable x with a known pdf, find the set of L quantizer levels that minimizes the quantization error. • That is, find the decision and reconstruction levels $x_i$ and $\hat{x}_i$, respectively, that minimize the mean-squared error (MSE) distortion measure: $$D = E[(\hat{x} - x)^2]$$ • E denotes expected value and $\hat{x}$ is the quantized version of x. • It turns out that the optimal decision level $x_k$ is given by: $$x_k = \frac{\hat{x}_{k+1} + \hat{x}_k}{2}, \quad 1 \le k \le L-1$$

  50. Nonuniform Quantization • Max Quantizer (cont.) • The optimal reconstruction level $\hat{x}_k$ is the centroid of $p_x(x)$ over the interval $x_{k-1} \le x \le x_k$: $$\hat{x}_k = \frac{\int_{x_{k-1}}^{x_k} p_x(x)\,x\,dx}{\int_{x_{k-1}}^{x_k} p_x(x)\,dx} = \int_{x_{k-1}}^{x_k} \tilde{p}_x(x)\,x\,dx$$ • It is interpreted as the mean value of x over the interval $x_{k-1} \le x \le x_k$ for the normalized pdf $\tilde{p}_x(x)$. • Solving the last two equations for $x_k$ and $\hat{x}_k$ is a nonlinear problem in these two variables. • An iterative solution is used, which requires obtaining the pdf (this can be difficult).
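The iterative solution can be sketched with Lloyd's algorithm, which alternates the two Max conditions: decision levels at midpoints between reconstruction levels, then reconstruction levels at cell centroids. Here a set of training samples replaces the pdf, so centroids become sample means. That substitution, the unit-variance Laplacian training data, the uniform starting point, and the fixed iteration count are all assumptions of this sketch, not the Max procedure as stated on the slide.

```python
import bisect
import math
import random

def quantize(x, dec, rec):
    """Map x to rec[i], where dec holds the L-1 interior decision levels (x_{i-1} < x <= x_i)."""
    return rec[bisect.bisect_left(dec, x)]

def mse(samples, dec, rec):
    return sum((quantize(x, dec, rec) - x) ** 2 for x in samples) / len(samples)

def lloyd_max(samples, L, iterations=50):
    """Alternate the two Max conditions on training samples (Lloyd's algorithm).

    Sample means replace the pdf integrals of the centroid condition (an assumption).
    Returns (interior decision levels x_1..x_{L-1}, reconstruction levels x_hat_1..x_hat_L).
    """
    lo, hi = min(samples), max(samples)
    rec = [lo + (k + 0.5) * (hi - lo) / L for k in range(L)]  # start from a uniform quantizer
    for _ in range(iterations):
        dec = [(rec[k] + rec[k + 1]) / 2 for k in range(L - 1)]  # x_k = (x_hat_k + x_hat_{k+1}) / 2
        cells = [[] for _ in range(L)]
        for x in samples:
            cells[bisect.bisect_left(dec, x)].append(x)
        # Centroid condition: each reconstruction level becomes the mean of its cell.
        rec = [sum(c) / len(c) if c else rec[k] for k, c in enumerate(cells)]
    dec = [(rec[k] + rec[k + 1]) / 2 for k in range(L - 1)]
    return dec, rec

random.seed(5)
# Unit-variance Laplacian training data: random sign times an exponential with rate sqrt(2).
data = [random.choice((-1, 1)) * random.expovariate(math.sqrt(2)) for _ in range(5000)]
dec, rec = lloyd_max(data, L=4)
print([round(r, 3) for r in rec], round(mse(data, dec, rec), 4))
```

Each half-step can only lower (or keep) the mean-squared error on the training data, so the result is never worse than the uniform starting quantizer, although it may converge to a local rather than global optimum.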
