Speech Processing

Speech Processing Speech Coding

Speech Coding
• Definition:
  • Speech coding is the process of representing analog waveforms with sequences of binary digits.
• Although high-bandwidth communication channels have become more widely available, speech coding for bit-rate reduction has retained its importance:
  • Reduced bit-rate transmission is required for cellular networks
  • Voice over IP
• Coded speech:
  • Is less sensitive than analog signals to transmission noise
  • Is easier to:
    • Protect against (bit) errors
    • Encrypt
    • Multiplex
    • Packetize
• A typical scenario is depicted in the next slide (Figure 12.1).
Veton Këpuska

Digital Telephone Communication System

Categorization of Speech Coders
• Waveform Coders:
  • Quantize speech samples directly and operate at high bit rates, in the range of 16-64 kbps (bps: bits per second).
• Hybrid Coders:
  • Are partly waveform coders and partly speech model-based coders, and operate in the mid bit-rate range of 2.4-16 kbps.
• Vocoders:
  • Are largely model-based and operate in the low bit-rate range of 1.2-4.8 kbps.
  • Tend to be of lower quality than waveform and hybrid coders.

Quality Measurements
• Quality of coding is viewed as the closeness of the processed speech to the original speech or to some other desired speech waveform, in terms of:
  • Naturalness
  • Degree of background artifacts
  • Intelligibility
  • Speaker identifiability
  • Etc.

Quality Measurements
• Subjective Measurement:
  • The Diagnostic Rhyme Test (DRT) measures intelligibility.
  • The Diagnostic Acceptability Measure (DAM) and Mean Opinion Score (MOS) tests provide a more complete quality judgment.
• Objective Measurement:
  • Segmental Signal-to-Noise Ratio (SNR): the average SNR over short-time segments.
  • Articulation Index: relies on an average SNR across frequency bands.
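The segmental SNR described above can be sketched as follows. This is a minimal illustration, not a standardized measure: the frame length, the non-overlapping framing, and the rule of skipping frames with zero signal or zero error energy are assumptions, and the data are toy values rather than speech.

```python
import math

def segmental_snr_db(x, x_hat, frame_len):
    """Average the per-frame SNR (in dB) over consecutive short-time frames."""
    frame_snrs = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        frame_hat = x_hat[start:start + frame_len]
        signal_energy = sum(v * v for v in frame)
        noise_energy = sum((v - w) ** 2 for v, w in zip(frame, frame_hat))
        # Skip silent or error-free frames, where the log ratio is undefined.
        if signal_energy > 0.0 and noise_energy > 0.0:
            frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with nonzero signal and noise energy")
    return sum(frame_snrs) / len(frame_snrs)

# Toy data (hypothetical): a loud frame and a quiet frame with the same error.
x = [1.0] * 8 + [0.1] * 8
x_hat = [v + 0.01 for v in x]
seg_snr = segmental_snr_db(x, x_hat, frame_len=8)  # frames near 40 dB and 20 dB
```

Averaging in dB per frame gives quiet segments the same weight as loud ones, which a single SNR computed over the whole signal would not.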

Quality Measurements
• A more complete list and definitions of subjective and objective measures can be found in:
  • J.R. Deller, J.G. Proakis, and J.H.L. Hansen, “Discrete-Time Processing of Speech Signals”, Macmillan Publishing Co., New York, NY, 1993.
  • S.R. Quackenbush, T.P. Barnwell, and M.A. Clements, “Objective Measures of Speech Quality”, Prentice Hall, Englewood Cliffs, NJ, 1988.

Statistical Models

Statistical Models
• The speech waveform is viewed as a random process.
• Various estimates are important from this statistical perspective:
  • Probability density
  • Mean, variance, and autocorrelation
• One approach to estimating the probability density function (pdf) of x[n] is through a histogram:
  • Count the number of occurrences of speech sample values in different ranges, $x_0 - \Delta/2 \le x \le x_0 + \Delta/2$, for many speech samples over a long time duration.
  • Normalize the area of the resulting curve to unity.
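The two histogram steps above (count, then normalize the area to unity) can be sketched as below. The function name, the bin count, and the range are illustrative assumptions, and the samples are Gaussian noise standing in for speech, which is not available here.

```python
import random

def histogram_pdf(samples, num_bins, lo, hi):
    """Estimate a pdf by counting samples per bin and normalizing the area to 1."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against rounding pushing a value into bin num_bins.
            counts[min(int((s - lo) / width), num_bins - 1)] += 1
    total = sum(counts)
    # Dividing by (total * width) makes the bar areas sum to one.
    density = [c / (total * width) for c in counts]
    centers = [lo + (k + 0.5) * width for k in range(num_bins)]
    return centers, density, width

# Stand-in data: Gaussian noise, NOT real speech (assumption for illustration).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
centers, density, width = histogram_pdf(samples, num_bins=40, lo=-4.0, hi=4.0)
area = sum(d * width for d in density)
```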

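Statistical Models
• The histogram of speech (Davenport, Paez & Glisson) was shown to approximate a gamma density:
  $$p_x(x) = \left(\frac{\sqrt{3}}{8\pi\sigma_x |x|}\right)^{1/2} e^{-\frac{\sqrt{3}\,|x|}{2\sigma_x}}$$
  where $\sigma_x$ is the standard deviation of the pdf.
• A simpler approximation is given by the Laplacian pdf:
  $$p_x(x) = \frac{1}{\sqrt{2}\,\sigma_x} e^{-\frac{\sqrt{2}\,|x|}{\sigma_x}}$$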
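As a quick numerical check of the two models, the sketch below codes the standard gamma and Laplacian densities parameterized by the standard deviation $\sigma_x$, and integrates the Laplacian to confirm unit area and variance $\sigma_x^2$. The function names and the midpoint-rule integrator are illustrative choices, not part of the original slides.

```python
import math

def laplacian_pdf(x, sigma_x):
    """Laplacian density with standard deviation sigma_x."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma_x) / (math.sqrt(2.0) * sigma_x)

def gamma_pdf(x, sigma_x):
    """Gamma density with standard deviation sigma_x (unbounded at x = 0)."""
    ax = abs(x)
    scale = math.sqrt(math.sqrt(3.0) / (8.0 * math.pi * sigma_x * ax))
    return scale * math.exp(-math.sqrt(3.0) * ax / (2.0 * sigma_x))

def integrate(f, lo, hi, steps):
    """Midpoint-rule numerical integration."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) * h

sigma_x = 1.0
area = integrate(lambda x: laplacian_pdf(x, sigma_x), -20.0, 20.0, 40000)
variance = integrate(lambda x: x * x * laplacian_pdf(x, sigma_x), -20.0, 20.0, 40000)
```

The gamma density is left out of the numerical check because of its integrable singularity at zero, which a plain midpoint rule handles poorly.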

PDF of Speech

PDF of Modeled Speech

PDF of Speech
• Knowing the pdf of speech is essential for optimizing the quantization of analog (continuous-valued) samples.

Quantization of Speech & Audio Signals
• Scalar Quantization
• Vector Quantization

Time Quantization (Sampling) of Analog Signals
Figure: Analog-to-Digital Conversion. Block diagram: x(t) → Analog Low-pass Filter → Sample and Hold → Analog-to-Digital Converter → DSP.
• a) Continuous signal x(t).
• b) Sampled signal with sampling period T satisfying the Nyquist rate, as specified by the Sampling Theorem.
• c) Digital sequence x[n] obtained after sampling and quantization.

Example
• Assume that the input continuous-time signal is a pure periodic signal represented by the following expression:
  $$x(t) = A\cos(\Omega_0 t + \phi) = A\cos(2\pi f_0 t + \phi)$$
  where A is the amplitude of the signal, $\Omega_0$ is the angular frequency in radians per second (rad/sec), $\phi$ is the phase in radians, and $f_0$ is the frequency in cycles per second, measured in Hertz (Hz).
• Assuming that the continuous-time signal x(t) is sampled every T seconds, or equivalently at the sampling rate $f_s = 1/T$, the discrete-time signal x[n] obtained by setting t = nT is:
  $$x[n] = x(nT) = A\cos(\Omega_0 nT + \phi) = A\cos(2\pi f_0 nT + \phi)$$

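Example (cont.)
• An alternative representation of x[n],
  $$x[n] = A\cos\left(2\pi \frac{f_0}{f_s} n + \phi\right) = A\cos(2\pi F_0 n + \phi) = A\cos(\omega_0 n + \phi),$$
  reveals additional properties of the discrete-time signal.
• $F_0 = f_0/f_s$ defines the normalized frequency, and the digital frequency $\omega_0$ is defined in terms of the normalized frequency:
  $$\omega_0 = 2\pi F_0 = \Omega_0 T$$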
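The relation between the analog frequency $f_0$, the normalized frequency $F_0 = f_0/f_s$, and the digital frequency $\omega_0 = 2\pi F_0$ can be checked numerically. The sketch below assumes a cosine form for the sinusoid and uses hypothetical values (a 1 kHz tone sampled at 8 kHz); neither comes from the slides.

```python
import math

def sample_sinusoid(A, f0, phi, fs, num_samples):
    """Return x[n] = A*cos(w0*n + phi) together with F0 = f0/fs and w0 = 2*pi*F0."""
    F0 = f0 / fs                 # normalized frequency (cycles per sample)
    w0 = 2.0 * math.pi * F0      # digital frequency (radians per sample)
    x = [A * math.cos(w0 * n + phi) for n in range(num_samples)]
    return x, F0, w0

# Hypothetical values: 1 kHz tone sampled at 8 kHz, so one period spans 8 samples.
x, F0, w0 = sample_sinusoid(A=1.0, f0=1000.0, phi=0.0, fs=8000.0, num_samples=16)
```

Only the ratio $f_0/f_s$ matters after sampling: a 2 kHz tone sampled at 16 kHz produces the same sequence.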

Reconstruction of Digital Signals
Figure: Digital-to-Analog Conversion. Block diagram: DSP → y[n] → Digital-to-Analog Converter → ya(nT) → Analog Low-pass Filter → y(t).
• a) Processed digital signal y[n].
• b) Continuous signal representation ya(nT).
• c) Low-pass filtered continuous signal y(t).

Scalar Quantization

Conceptual Representation of ADC
Figure: x(t) → C/D → Quantizer → Coder.
• This conceptual abstraction allows us to assume that the sequence x[n] is obtained with infinite precision. Those values are scalar quantized to a set of finite-precision amplitudes, denoted here by $\hat{x}[n]$.
• Furthermore, quantization allows this finite-precision set of amplitudes to be represented by a corresponding set of (bit) patterns or symbols, c[n].

Without loss of generality, it can be assumed that:
• Input signals cover a finite range of values, bounded by the minimal and maximal values $x_{min}$ and $x_{max}$, respectively. This assumption in turn implies that the set of symbols representing $\hat{x}[n]$ is finite.
• Encoding:
  • The process of mapping a finite set of values to a finite set of symbols is known as encoding. It is performed by the coder, as in the figure in the previous slide.
• Mapping:
  • Thus one can view quantization and coding as a mapping of the infinite-precision value of x[n] to a finite-precision representation picked from a finite set of symbols.

Scalar Quantization
• Quantization, therefore, is a mapping of a value x[n], $x_{min} \le x[n] \le x_{max}$, to $\hat{x}[n]$. The quantizer operator, denoted by Q(x), is defined by:
  $$\hat{x}[n] = Q(x[n]) = \hat{x}_i, \quad x_{i-1} < x[n] \le x_i$$
  where $\hat{x}_i$ denotes one of L possible quantization levels, $1 \le i \le L$, and $x_i$ represents one of L+1 decision levels, $0 \le i \le L$.
• The above expression is interpreted as follows: if $x_{i-1} < x[n] \le x_i$, then x[n] is quantized to the quantization level $\hat{x}_i$, and $\hat{x}[n]$ is considered the quantized sample of x[n].
• Because the range of input values is limited and the number of symbols is finite, quantization is characterized by its quantization step size $\Delta_i$, defined as the difference between two consecutive decision levels:
  $$\Delta_i = x_i - x_{i-1}$$

Scalar Quantization
• Assume that a sequence x[n] was obtained from a speech waveform that has been lowpass-filtered and sampled at a suitable rate with infinite amplitude precision.
• The x[n] samples are quantized to a finite set of amplitudes denoted by $\hat{x}[n]$.
• Associated with the quantizer is a quantization step size.
• Quantization allows the amplitudes to be represented by a finite set of bit patterns (symbols).
• Encoding:
  • Mapping of $\hat{x}[n]$ to a finite set of symbols.
  • This mapping yields a sequence of codewords denoted by c[n] (Figure 12.3a in the next slide).
• Decoding:
  • The inverse process, whereby the transmitted sequence of codewords c’[n] is transformed back to a sequence of quantized samples (Figure 12.3b in the next slide).

Scalar Quantization

Fundamentals
• Assume a signal amplitude is quantized into L levels.
• The quantizer operator is denoted by Q(x); thus
  $$\hat{x}[n] = Q(x[n]) = \hat{x}_i, \quad x_{i-1} < x[n] \le x_i$$
• $\hat{x}_i$ denotes the L possible reconstruction levels (quantization levels), with $1 \le i \le L$.
• $x_i$ denotes the L+1 possible decision levels, with $0 \le i \le L$.
• If $x_{i-1} < x[n] \le x_i$, then x[n] is quantized to the reconstruction level $\hat{x}_i$.
• $\hat{x}[n]$ is the quantized sample of x[n].

Fundamentals
• Scalar Quantization Example:
  • Assume there are L=4 reconstruction levels.
  • The amplitude of the input signal x[n] falls in the range [0,1].
  • Decision levels and reconstruction levels are equally spaced:
    • Decision levels are [0, 1/4, 1/2, 3/4, 1].
    • Reconstruction levels are assumed to be the interval midpoints [1/8, 3/8, 5/8, 7/8].
  • See Figure 12.4 in the next slide.

Example of Uniform 2-bit Quantizer
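The 2-bit example can be reproduced with a short sketch. It assumes midpoint reconstruction levels and a natural binary codeword assignment (00 for the lowest interval); the actual codeword mapping of Figure 12.4 is not recoverable from the text, so that part is hypothetical, as is the function name.

```python
import math

def uniform_quantize(x, x_min, x_max, L):
    """Map x to (interval index i, reconstruction level, binary codeword)."""
    delta = (x_max - x_min) / L
    # Interval i satisfies x_{i-1} < x <= x_i; clamp so x_min and x_max stay in range.
    i = math.ceil((x - x_min) / delta)
    i = min(max(i, 1), L)
    x_hat = x_min + (i - 0.5) * delta          # midpoint of the interval
    bits = (L - 1).bit_length()                # 2 bits for L = 4
    codeword = format(i - 1, "0{}b".format(bits))  # assumed natural binary code
    return i, x_hat, codeword

# Inputs in [0, 1] with L = 4, as in the example above.
low = uniform_quantize(0.3, 0.0, 1.0, 4)    # falls in (1/4, 1/2]
high = uniform_quantize(0.9, 0.0, 1.0, 4)   # falls in (3/4, 1]
```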

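Example
• Assume there are $L = 2^4 = 16$ reconstruction levels, that input values fall within the range $[x_{min} = -1, x_{max} = 1]$, and that each value in this range is equally likely.
• Decision levels and reconstruction levels are equally spaced: $\Delta_i = \Delta = (x_{max} - x_{min})/L = 1/8$.
• Decision Levels: $x_i = x_{min} + i\Delta$, $i = 0, \ldots, L$, that is, $[-1, -7/8, \ldots, 7/8, 1]$.
• Reconstruction Levels: the interval midpoints $\hat{x}_i = x_{min} + (i - 1/2)\Delta$, $i = 1, \ldots, L$, that is, $[-15/16, -13/16, \ldots, 13/16, 15/16]$.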

16-Level (4-bit) Quantization Example

Uniform Quantizer
• A uniform quantizer is one whose decision and reconstruction levels are uniformly spaced. Specifically:
  $$x_i - x_{i-1} = \Delta, \quad 1 \le i \le L$$
  $$\hat{x}_i = \frac{x_i + x_{i-1}}{2}, \quad 1 \le i \le L$$
• $\Delta$ is the step size, equal to the spacing between two consecutive decision levels, which is the same as the spacing between two consecutive reconstruction levels.
• Each reconstruction level is assigned a symbol, the codeword. Binary numbers are typically used to represent the quantized samples (as in Figure 12.4 in a previous slide).

Uniform Quantizer
• Codebook: the collection of codewords.
• In general, with a B-bit binary codebook there are $2^B$ different quantization (or reconstruction) levels.
• The bit rate is defined as the number of bits B per sample multiplied by the sample rate $f_s$: $I = B f_s$.
• The decoder inverts the coder operation, taking the codeword back to a quantized amplitude value (e.g., 01 → the corresponding reconstruction level $\hat{x}_i$).
• Often the goal of speech coding/decoding is to keep the bit rate as low as possible while maintaining a required level of quality.
• Because the sampling rate is fixed for most applications, this goal implies that the bit rate be reduced by decreasing the number of bits per sample.

Uniform Quantizer
• Designing a uniform scalar quantizer requires knowledge of the maximum value of the sequence.
• Typically the range of the speech signal is expressed in terms of the standard deviation of the signal.
• Specifically, it is often assumed that $-4\sigma_x \le x[n] \le 4\sigma_x$, where $\sigma_x$ is the signal’s standard deviation.
• Under the assumption that speech samples obey a Laplacian pdf, approximately 0.35% of speech samples fall outside the range $-4\sigma_x \le x[n] \le 4\sigma_x$.
• Assume a B-bit binary codebook ⇒ $2^B$ quantization levels.
• Maximum signal value: $x_{max} = 4\sigma_x$.

Uniform Quantizer
• For the uniform quantization step size $\Delta$ we get:
  $$2x_{max} = \Delta\, 2^B \quad\Rightarrow\quad \Delta = \frac{2x_{max}}{2^B}$$
• The quantization step size $\Delta$ relates directly to the notion of quantization noise.
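A short numerical sketch ties the design assumptions together: the step size $\Delta = 2x_{max}/2^B$ with $x_{max} = 4\sigma_x$, and the fraction of Laplacian samples outside $\pm 4\sigma_x$, which for a Laplacian pdf is $e^{-4\sqrt{2}}$. The function and parameter names are illustrative.

```python
import math

def step_size(sigma_x, B, range_factor=4.0):
    """Step size of a B-bit uniform quantizer covering +/- range_factor * sigma_x."""
    x_max = range_factor * sigma_x
    return 2.0 * x_max / (2 ** B)

def laplacian_overload_fraction(range_factor=4.0):
    """P(|x| > range_factor * sigma_x) when x follows a Laplacian pdf."""
    return math.exp(-math.sqrt(2.0) * range_factor)

delta_8bit = step_size(sigma_x=1.0, B=8)     # 8 / 256
overload = laplacian_overload_fraction()     # roughly 0.0035, i.e. 0.35%
```

Each extra bit halves the step size, while the overload fraction depends only on how many standard deviations the quantizer range covers.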

Quantization Noise
• Two classes of quantization noise:
  • Granular Distortion
  • Overload Distortion
• Granular Distortion:
  $$\hat{x}[n] = x[n] + e[n]$$
  • x[n] is the unquantized signal and e[n] is the quantization noise.
  • For a given step size $\Delta$, the magnitude of the quantization noise e[n] can be no greater than $\Delta/2$, that is:
    $$-\frac{\Delta}{2} \le e[n] \le \frac{\Delta}{2}$$
  • Figure 12.5 depicts this property, where:
    $$e[n] = \hat{x}[n] - x[n]$$

Quantization Noise

Example
• Quantize a periodic sine-wave signal with a 3-bit and an 8-bit quantizer. The input periodic signal is given by the following expression:
• The MATLAB fix function is used to simulate quantization. The following figure depicts the result of the analysis.

$L = 2^3 = 8$ and $L = 2^8 = 256$ Levels Quantization
• Plot a) represents the sequence x[n] with infinite precision, b) represents the quantized version $\hat{x}[n]$ for L=8, c) represents the quantization error e[n] for B=3 bits (L=8 quantization levels), and d) is the quantization error for B=8 bits (L=256 quantization levels).
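The experiment can be approximated in Python. This sketch is not the original MATLAB script: the test signal (unit-amplitude sine, 64 samples per period) is hypothetical, and quantization uses midpoint reconstruction rather than the truncation performed by MATLAB's fix. It checks that the granular error never exceeds $\Delta/2$ for either word length.

```python
import math

def quantize(x, B, x_max=1.0):
    """B-bit uniform quantizer on [-x_max, x_max] with midpoint reconstruction."""
    L = 2 ** B
    delta = 2.0 * x_max / L
    i = math.ceil((x + x_max) / delta)   # interval index, x_{i-1} < x <= x_i
    i = min(max(i, 1), L)                # clamp to the valid range 1..L
    return -x_max + (i - 0.5) * delta

# Hypothetical test signal: unit-amplitude sine, 64 samples per period.
x = [math.sin(2.0 * math.pi * n / 64.0) for n in range(128)]

max_error = {}
for B in (3, 8):
    errors = [quantize(v, B) - v for v in x]
    max_error[B] = max(abs(err) for err in errors)
```

With $x_{max} = 1$, the bound $\Delta/2$ is 0.125 for B=3 and about 0.0039 for B=8.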

Quantization Noise
• Overload Distortion
  • Maximum-value constant: $x_{max} = 4\sigma_x$ ($-4\sigma_x \le x[n] \le 4\sigma_x$).
  • For a Laplacian pdf, 0.35% of the speech samples fall outside the range of the quantizer.
  • Clipped samples incur a quantization error in excess of $\Delta/2$.
  • Because the number of clipped samples is small, it is common to neglect the infrequent large errors in theoretical calculations.

Quantization Noise
• Statistical Model of Quantization Noise
  • This is the desired approach to analyzing the quantization error in numerous applications.
  • The quantization error is considered an ergodic white-noise random process.
  • The autocorrelation function of such a process is expressed as:
    $$r_e[m] = E(e[n]\,e[n+m]) = \begin{cases} \sigma_e^2, & m = 0 \\ 0, & m \ne 0 \end{cases}$$

Quantization Error
• The previous expression states that the process is uncorrelated.
• Furthermore, it is also assumed that the quantization noise and the input signal are uncorrelated, i.e.,
  $$E(x[n]\,e[n+m]) = 0, \quad \forall m$$
• The final assumption is that the pdf of the quantization noise is uniform over the quantization interval:
  $$p_e(e) = \begin{cases} \frac{1}{\Delta}, & -\frac{\Delta}{2} \le e \le \frac{\Delta}{2} \\ 0, & \text{otherwise} \end{cases}$$
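These three assumptions can be checked empirically for a rapidly fluctuating input. The sketch below is a simulation under stated assumptions, not a proof: the input is white Gaussian noise standing in for speech, the quantizer has no clipping (overload is ignored), and the 8-bit step size uses $x_{max} = 4\sigma_x$. A uniform pdf over $[-\Delta/2, \Delta/2]$ has variance $\Delta^2/12$, which is the value the measured noise power is compared with.

```python
import math
import random

def quantize(x, delta):
    """Uniform quantizer with step delta and midpoint outputs (no clipping)."""
    return (math.floor(x / delta) + 0.5) * delta

def correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

B, sigma_x = 8, 1.0
delta = 2.0 * 4.0 * sigma_x / 2 ** B          # step size for x_max = 4*sigma_x
rng = random.Random(1)
x = [rng.gauss(0.0, sigma_x) for _ in range(50000)]   # stand-in for speech
e = [quantize(v, delta) - v for v in x]

noise_power = sum(v * v for v in e) / len(e)  # compare with delta**2 / 12
predicted = delta ** 2 / 12.0
rho_xe = correlation(x, e)                    # signal/noise correlation
rho_ee1 = correlation(e[:-1], e[1:])          # lag-1 noise autocorrelation
```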

Quantization Error
• The stated assumptions are not always valid.
• Consider a slowly (linearly) varying signal: e[n] then also changes linearly and is signal dependent (see the figure in the next slide).
• Correlated quantization noise can be annoying.
• When the quantization step $\Delta$ is small, the assumptions that the noise is uncorrelated with itself and with the signal are roughly valid, provided the signal fluctuates rapidly among all quantization levels.
• The quantization error then approaches a white-noise process with an impulsive autocorrelation and flat spectrum.
• One can force e[n] to be white noise and uncorrelated with x[n] by adding white noise to x[n] prior to quantization.

Example of Quantization Error due to Correlation
• Example of a slowly varying signal that causes the quantization error to be correlated. Plot
  • a) represents the sequence x[n] with infinite precision,
  • b) represents the quantized version $\hat{x}[n]$,
  • c) represents the quantization error e[n] for B=3 bits (L=8 quantization levels), and
  • d) is the quantization error for B=8 bits (L=256 quantization levels).
• Note the reduction in correlation level as the number of quantization levels increases, which implies a decrease of the step size $\Delta$.

Quantization Error
• The process of adding white noise prior to quantization is known as dithering.
• This decorrelation technique was shown to be useful not only in improving the perceptual quality of the quantization noise in speech, but also with image signals.
• Signal-to-Noise Ratio (SNR)
  • A measure that quantifies the severity of the quantization noise.
  • Relates the strength of the signal to the strength of the quantization noise.
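A small experiment illustrates why dithering helps. It is a sketch under assumptions that are not in the slides: the dither is uniform noise one step wide, the input is a constant inside a single quantization cell (an extreme case of a slowly varying signal), and the quantizer has no clipping. Without dither, the output and the error are locked to the input value; with dither, the output averages to the input.

```python
import math
import random

def quantize(x, delta):
    """Uniform quantizer with step delta and midpoint outputs (no clipping)."""
    return (math.floor(x / delta) + 0.5) * delta

delta = 1.0
x_const = 0.3 * delta

# Without dither: every sample maps to the same level, so the error is a
# fixed, signal-dependent offset.
plain = quantize(x_const, delta)

# With dither (assumed uniform over one step): the output toggles between
# neighboring levels and tracks the input on average.
rng = random.Random(2)
dithered = [quantize(x_const + rng.uniform(-delta / 2.0, delta / 2.0), delta)
            for _ in range(20000)]
mean_dithered = sum(dithered) / len(dithered)
```

The price is a higher noise level per sample, traded for an error that no longer follows the signal.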

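Quantization Error
• SNR is defined as:
  $$SNR = \frac{\sigma_x^2}{\sigma_e^2} = \frac{E(x^2[n])}{E(e^2[n])} \approx \frac{\frac{1}{N}\sum_{n=0}^{N-1} x^2[n]}{\frac{1}{N}\sum_{n=0}^{N-1} e^2[n]}$$
• Given the assumptions of
  • Quantizer range: $2x_{max}$,
  • Quantization interval: $\Delta = 2x_{max}/2^B$ for a B-bit quantizer, and
  • Uniform pdf,
  it can be shown that:
  $$\sigma_e^2 = \frac{\Delta^2}{12} = \frac{(2x_{max}/2^B)^2}{12} = \frac{x_{max}^2}{3 \cdot 2^{2B}}$$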

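Quantization Error
• Thus the SNR can be expressed as:
  $$SNR = \frac{\sigma_x^2}{\sigma_e^2} = \frac{3 \cdot 2^{2B}}{(x_{max}/\sigma_x)^2}$$
• Or in decibels (dB) as:
  $$SNR(dB) = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2} = 10\left(\log_{10} 3 + 2B\log_{10} 2\right) - 20\log_{10}\frac{x_{max}}{\sigma_x} \approx 6B + 4.77 - 20\log_{10}\frac{x_{max}}{\sigma_x}$$
• Because $x_{max} = 4\sigma_x$, SNR(dB) ≈ 6B − 7.2.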
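The rule of thumb SNR(dB) ≈ 6B − 7.2 can be compared with the unrounded expression $10\log_{10}(3 \cdot 2^{2B}) - 20\log_{10}(x_{max}/\sigma_x)$, which follows from $\sigma_e^2 = \Delta^2/12$ and $\Delta = 2x_{max}/2^B$. The function name is illustrative.

```python
import math

def pcm_snr_db(B, x_max_over_sigma=4.0):
    """SNR in dB of a B-bit uniform quantizer under the uniform-noise model."""
    return 10.0 * math.log10(3.0 * 2.0 ** (2 * B)) - 20.0 * math.log10(x_max_over_sigma)

snr_11 = pcm_snr_db(11)   # rule of thumb: 6*11 - 7.2 = 58.8 dB
snr_16 = pcm_snr_db(16)   # rule of thumb: 6*16 - 7.2 = 88.8 dB
gain_per_bit = pcm_snr_db(12) - pcm_snr_db(11)
```

Each additional bit adds about 6.02 dB, so the rounded rule slightly underestimates the SNR for large B.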

Quantization Error
• The presented quantization scheme is called pulse code modulation (PCM).
• B bits per sample are transmitted as a codeword.
• Advantages of this scheme:
  • It is instantaneous (no coding delay).
  • It is independent of the signal content (voice, music, etc.).
• Disadvantages:
  • It requires a minimum of 11 bits per sample to achieve “toll quality” (equivalent to typical telephone quality).
  • For a 10,000 Hz sampling rate, the required bit rate is: I = (11 bits/sample) × (10,000 samples/sec) = 110,000 bps = 110 kbps.
  • For a CD-quality signal with a sample rate of 20,000 Hz and 16 bits/sample, SNR(dB) = 96 − 7.2 = 88.8 dB and the bit rate is 320 kbps.
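The two bit-rate figures follow directly from the definition bit rate = (bits per sample) × (sample rate). A minimal sketch, with an illustrative function name:

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz):
    """PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz

toll_quality_bps = pcm_bit_rate(11, 10000)   # toll-quality example above
cd_example_bps = pcm_bit_rate(16, 20000)     # 16-bit, 20 kHz example above
```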

Nonuniform Quantization
• Uniform quantization may not be optimal: for a given number of decision and reconstruction levels, the SNR may not be as large as possible.
• Consider, for example, a speech signal for which x[n] is much more likely to be in one particular region than in others (low values occurring much more often than high values).
• This implies that decision and reconstruction levels are not utilized effectively when they are spaced uniformly over the range $\pm x_{max}$.
• A nonuniform quantizer that is optimal (in a least-squared-error sense) for a particular pdf is referred to as the Max quantizer.
• An example of a nonuniform quantizer is given in the figure in the next slide.

Nonuniform Quantization

Nonuniform Quantization
• Max Quantizer
  • Problem definition: for a random variable x with a known pdf, find the set of M quantizer levels that minimizes the quantization error.
  • That is, find the decision levels $x_i$ and reconstruction levels $\hat{x}_i$ that minimize the mean-squared error (MSE) distortion measure:
    $$D = E[(x - \hat{x})^2]$$
  • E denotes the expected value and $\hat{x}$ is the quantized version of x.
  • It turns out that the optimal decision level $x_k$ is given by:
    $$x_k = \frac{\hat{x}_{k+1} + \hat{x}_k}{2}, \quad 1 \le k \le M-1$$

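Nonuniform Quantization
• Max Quantizer (cont.)
  • The optimal reconstruction level $\hat{x}_k$ is the centroid of $p_x(x)$ over the interval $x_{k-1} \le x \le x_k$:
    $$\hat{x}_k = \frac{\int_{x_{k-1}}^{x_k} x\, p_x(x)\, dx}{\int_{x_{k-1}}^{x_k} p_x(x)\, dx} = \int_{x_{k-1}}^{x_k} x\, \tilde{p}_x(x)\, dx$$
  • It is interpreted as the mean value of x over the interval $x_{k-1} \le x \le x_k$ for the normalized pdf $\tilde{p}_x(x)$.
  • Solving the last two equations for $x_k$ and $\hat{x}_k$ is a nonlinear problem in these two variables.
  • An iterative solution is used, which requires obtaining the pdf (this can be difficult).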
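One common way to solve the two optimality conditions iteratively is to alternate between them: set each decision level to the midpoint of its neighboring reconstruction levels, then set each reconstruction level to the centroid of its interval. The sketch below follows that alternating scheme under several assumptions that are not in the slides: a unit-variance Laplacian pdf, a range truncated to $\pm 8\sigma_x$ instead of an infinite one, midpoint-rule integration, and a fixed iteration count.

```python
import math

def laplacian_pdf(x, sigma_x=1.0):
    """Laplacian density with standard deviation sigma_x."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma_x) / (math.sqrt(2.0) * sigma_x)

def centroid(pdf, a, b, steps=400):
    """Mean of x over [a, b] under the pdf, by midpoint-rule integration."""
    h = (b - a) / steps
    num = den = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        p = pdf(x)
        num += x * p
        den += p
    return num / den if den > 0.0 else 0.5 * (a + b)

def midpoint_decisions(recon, x_min, x_max):
    """Decision levels halfway between neighboring reconstruction levels."""
    inner = [0.5 * (recon[k] + recon[k + 1]) for k in range(len(recon) - 1)]
    return [x_min] + inner + [x_max]

def max_quantizer(pdf, x_min, x_max, M, iterations=200):
    """Alternate the two optimality conditions, starting from a uniform quantizer."""
    delta = (x_max - x_min) / M
    recon = [x_min + (k + 0.5) * delta for k in range(M)]
    for _ in range(iterations):
        decisions = midpoint_decisions(recon, x_min, x_max)
        recon = [centroid(pdf, decisions[k], decisions[k + 1]) for k in range(M)]
    return midpoint_decisions(recon, x_min, x_max), recon

# Assumed setup: 4 levels, unit-variance Laplacian, range truncated to +/- 8.
decisions, recon = max_quantizer(laplacian_pdf, -8.0, 8.0, M=4)
```

For this pdf the levels crowd toward zero, where samples are most likely, and spread out in the tails, which is the behavior the nonuniform quantizer is meant to capture.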
