Speech Processing

Speech Processing Short-Time Fourier Transform Analysis and Synthesis

Short-Time Fourier Transform Analysis and Synthesis Minimum-Phase Synthesis
• Speech and audio signals are time-varying and can be regarded as stochastic signals that carry information.
• This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (i.e., time-varying formants and harmonics).
• The discrete-time short-time Fourier transform (STFT) consists of a separate FT for each time instant, computed from the signal in the neighborhood of that instant.
• When the FT in the STFT analysis is replaced by the discrete FT (DFT), the resulting STFT is discrete in both time and frequency.
• This discrete STFT is distinguished from the discrete-time STFT, which is continuous in frequency.
• In linear prediction and homomorphic processing, an underlying source/filter model is assumed. This leads to model-based analysis/synthesis.
• Both of those analysis methods implicitly relied on short-time analysis, which is presented here.
• In short-time analysis systems no such model restrictions apply.
Veton Këpuska

Short-Time Analysis (STFT)
• Two views of the STFT are explored:
• Fourier-transform view
• Filter-bank view

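Fourier-Transform View
• Recall (from Chapter 3) the discrete-time STFT:
  $X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$
• $w[n]$ is a finite-length, symmetric sequence (i.e., a window) of length $N_w$.
• $w[n] \neq 0$ for $n \in [0, N_w-1]$.
• $w[n]$ is called the analysis window or analysis filter.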

Fourier-Transform View
• $x[n]$: time-domain signal.
• $f_n[m] = x[m]\,w[n-m]$: the short-time section of $x[m]$ at time $n$, that is, the signal in frame $n$.
• $X(n,\omega)$: Fourier transform of the short-time windowed section $f_n[m]$.
• Computing the DFT samples the STFT in frequency:
  $X(n,k) = X(n,\omega)\big|_{\omega = \frac{2\pi}{N}k}$

Fourier-Transform View
• Thus $X(n,k)$ is the STFT evaluated at every $\omega = \frac{2\pi}{N}k$.
• Frequency sampling interval: $2\pi/N$
• Frequency sampling factor: $N$
• DFT:
  $X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km}$
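A minimal sketch of the discrete STFT sum, assuming the convention $X(n,k) = \sum_m x[m]\,w[n-m]\,e^{-j2\pi km/N}$ with $x$ and $w$ taken as zero outside their stored samples. The function name `discrete_stft`, the rectangular window, and the test tone are illustrative choices, not part of the slides.

```python
import cmath
import math

def discrete_stft(x, w, N, n):
    """X(n, k) = sum_m x[m] w[n-m] exp(-j 2 pi k m / N) for k = 0..N-1.

    x is treated as zero outside 0..len(x)-1 and w as zero outside 0..len(w)-1.
    """
    Nw = len(w)
    X = []
    for k in range(N):
        acc = 0j
        # w[n-m] is non-zero only for n-Nw+1 <= m <= n
        for m in range(max(0, n - Nw + 1), min(len(x), n + 1)):
            acc += x[m] * w[n - m] * cmath.exp(-2j * math.pi * k * m / N)
        X.append(acc)
    return X

# Complex exponential at bin 3 of an N = 16 point DFT, rectangular window of length 16
N = 16
x = [cmath.exp(2j * math.pi * 3 * m / N) for m in range(64)]
w = [1.0] * N
X = discrete_stft(x, w, N, n=40)
peak = max(range(N), key=lambda k: abs(X[k]))
print(peak, round(abs(X[peak]), 6))
```

Because the window spans exactly one period of the tone, all the energy falls in bin $k = 3$ and the other bins vanish up to rounding.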


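Example 7.1
• Let $x[n]$ be a periodic impulse train sequence:
  $x[n] = \sum_{l=-\infty}^{\infty} \delta[n-lP]$
• Also let $w[n]$ be a triangle of length $P$.
[Figure: the impulse train $x[n]$, with impulses at multiples of $P$, and the $P$-point triangular window $w[n]$]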

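Example 7.1
  $X(n,\omega) = \sum_{m=-\infty}^{\infty} \left[\sum_{l=-\infty}^{\infty} \delta[m-lP]\right] w[n-m]\, e^{-j\omega m} = \sum_{l=-\infty}^{\infty} w[n-lP]\, e^{-j\omega lP}$
• The bracketed impulse train is non-zero only for $m = lP$.
• Each term of the result is a window located at $lP$ with linear phase $-\omega lP$.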

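Example 7.1
• Since the translated windows $w[n-lP]$ do not overlap, for each $n$ the magnitude $|X(n,\omega)|$ is constant in $\omega$ and the phase $\angle X(n,\omega)$ is linear in $\omega$.
• Computing the DFT for $N = P$ gives:
  $X(n,k) = \sum_{l=-\infty}^{\infty} w[n-lP]\, e^{-j\frac{2\pi}{P}k\,lP} = \sum_{l=-\infty}^{\infty} w[n-lP]$, since $e^{-j2\pi kl} = 1$.
• The result is the sum of translated, non-overlapping windows with a phase shift of zero (a consequence of the frequency sampling).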
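The claims of Example 7.1 can be checked numerically. This is a sketch under assumptions: the period $P = 8$, the frame index $n = 35$, and a causal triangle on $0 \le n \le P-1$ (rather than one centered at the origin) are illustrative choices.

```python
import cmath
import math

P = 8                                    # period of the impulse train (illustrative)
x = [1.0 if m % P == 0 else 0.0 for m in range(10 * P)]
# P-point triangle, strictly positive on 0..P-1 and taken as zero elsewhere
w = [1.0 - abs(n - (P - 1) / 2) / ((P + 1) / 2) for n in range(P)]

def stft_at(n, omega):
    """X(n, omega) = sum_m x[m] w[n-m] exp(-j omega m)."""
    return sum(x[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(max(0, n - P + 1), min(len(x), n + 1)))

n = 35                                   # this window covers only the impulse at m = 32
omegas = [2 * math.pi * i / 64 for i in range(64)]
mags = [abs(stft_at(n, om)) for om in omegas]

# Frequency sampling with N = P, i.e. omega = 2 pi k / P
X_k = [stft_at(n, 2 * math.pi * k / P) for k in range(P)]

print(max(mags) - min(mags))             # spread of |X(n, omega)| over omega
print([round(v.real, 6) for v in X_k], w[3])
```

The magnitude is flat across $\omega$, and every DFT sample equals the window value $w[n-lP]$ (here $w[3]$) with zero phase.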

Spectrogram $|X(n,\omega)|^2$
• If the analysis window length is ≤ the pitch period ⇒ wideband spectrogram ⇒ vertical striations.
• Otherwise ⇒ narrowband spectrogram ⇒ horizontal striations.
• How often should the analysis window be applied to the signal?
• $X(n,k)$ is decimated by a temporal decimation factor $L$:
• $X(nL,k) = \mathrm{DFT}\{f_{nL}[m]\}$
• The sections $f_{nL}[m]$ are a subset of the sections $f_n[m]$.
• How to choose the sampling rates in time ($L$) and in frequency ($N$, the FFT length) is addressed in a later section.
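A minimal sketch of a time-decimated spectrogram $|X(pL,k)|^2$, assuming the same STFT convention as above. The function name `spectrogram`, the Hann-type window, and the values $N = N_w = 32$, $L = 8$ are illustrative assumptions.

```python
import cmath
import math

def spectrogram(x, w, N, L):
    """Return |X(pL, k)|^2 for frames p = 0, 1, ... and bins k = 0..N-1."""
    Nw = len(w)
    frames = []
    for n in range(0, len(x), L):        # temporal decimation: every L-th time index
        row = []
        for k in range(N):
            acc = 0j
            for m in range(max(0, n - Nw + 1), min(len(x), n + 1)):
                acc += x[m] * w[n - m] * cmath.exp(-2j * math.pi * k * m / N)
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames

# Real sinusoid at bin 4 of a 32-point DFT, Hann-type window of the same length
N, Nw, L = 32, 32, 8
x = [math.cos(2 * math.pi * 4 * m / N) for m in range(128)]
w = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / Nw) for n in range(Nw)]
S = spectrogram(x, w, N, L)

peak = max(range(N // 2 + 1), key=lambda k: S[8][k])   # frame p = 8 is fully inside the signal
print(len(S), len(S[0]), peak)
```

With $L = 8$ the 128-sample signal yields 16 frames instead of 128, while the spectral peak of the tone remains at bin 4.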

[Figure: analysis window $w[pL-m]$ applied to $x[m]$ at positions $p = 1, 2, 3$, spaced $L$ samples apart]

[Figure: spectrogram $|X(n,\omega)|^2$]

Fourier-Transform View
• Note that, as a function of $\omega$, $X(n,\omega)$ is periodic with period $2\pi$ (as is the Fourier transform) and is Hermitian symmetric for real sequences: $X(n,-\omega) = X^{*}(n,\omega)$.
• For real sequences:
• $\mathrm{Re}\{X(n,\omega)\}$ and $|X(n,\omega)|$ are symmetric in $\omega$.
• $\mathrm{Im}\{X(n,\omega)\}$ and $\arg\{X(n,\omega)\}$ are anti-symmetric in $\omega$.
• A time shift results in a linear phase shift (as in the Fourier transform):
  $x[n-n_0] \;\leftrightarrow\; e^{-j\omega n_0}\, X(n-n_0,\omega)$
• Thus, a shift by $n_0$ in the original time sequence introduces a linear phase, but also a shift in time, corresponding to a shift of each short-time section by $n_0$.
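Both properties can be verified numerically. The sketch below assumes the definition $X(n,\omega) = \sum_m x[m]\,w[n-m]\,e^{-j\omega m}$; the test signal, the window, and the values of $n$, $n_0$ and $\omega$ are arbitrary illustrative choices.

```python
import cmath
import math

def stft(x, w, n, omega):
    """X(n, omega) = sum_m x[m] w[n-m] exp(-j omega m); x and w are zero outside their lists."""
    Nw = len(w)
    return sum(x[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(max(0, n - Nw + 1), min(len(x), n + 1)))

x = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(40)]   # arbitrary real signal
w = [1.0, 2.0, 3.0, 2.0, 1.0]                                          # arbitrary real window
n0 = 7
x_shifted = [0.0] * n0 + x                                             # x[n - n0]

n, omega = 25, 0.9
lhs = stft(x_shifted, w, n, omega)
rhs = cmath.exp(-1j * omega * n0) * stft(x, w, n - n0, omega)
print(abs(lhs - rhs))                    # time-shift property

# Hermitian symmetry for real x and w: X(n, -omega) = conj(X(n, omega))
herm_err = abs(stft(x, w, n, -omega) - stft(x, w, n, omega).conjugate())
print(herm_err)
```

Both differences are zero up to floating-point rounding.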

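Filtering View
• In this interpretation the window is regarded as a filter whose impulse response is $w[n]$.
• Thus $w[n]$ is referred to as the analysis filter.
• Fix the frequency at $\omega = \omega_0$:
  $X(n,\omega_0) = \sum_{m=-\infty}^{\infty} \left(x[m]\, e^{-j\omega_0 m}\right) w[n-m]$
• This equation is the convolution of the sequence $x[n]e^{-j\omega_0 n}$ with the sequence $w[n]$. Thus:
  $X(n,\omega_0) = \left(x[n]\, e^{-j\omega_0 n}\right) * w[n]$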

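Filtering View
• The product $x[n]e^{-j\omega_0 n}$ is a modulation of $x[n]$: it shifts the spectrum of $x[n]$ by $\omega_0$, so that the content around $\omega_0$ is moved to the origin before filtering with $w[n]$.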

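Filtering View
• Alternate view:
  $X(n,\omega_0) = e^{-j\omega_0 n}\left(x[n] * w[n]\, e^{j\omega_0 n}\right)$
• The discrete STFT can also be interpreted from the filtering viewpoint:
  $X(n,k) = e^{-j\frac{2\pi}{N}kn}\left(x[n] * w[n]\, e^{j\frac{2\pi}{N}kn}\right)$
• This equation leads to the interpretation of the discrete STFT as the output of the filter bank shown in the next slide.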
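The Fourier-transform view and the two filtering views give the same number, as the following sketch checks at one time index and one bin. It assumes the bandpass filters take the modulated-window form $h_k[n] = w[n]e^{j2\pi kn/N}$; the helper `convolve_at`, the signal, and the window are illustrative.

```python
import cmath
import math

def convolve_at(a, b, n):
    """(a * b)[n] for finite sequences that both start at index 0."""
    return sum(a[m] * b[n - m] for m in range(len(a)) if 0 <= n - m < len(b))

N = 8
x = [math.sin(0.7 * m) + 0.2 * m for m in range(30)]      # arbitrary test signal
w = [0.25, 0.75, 1.0, 0.75, 0.25]                         # arbitrary analysis window
k, n = 3, 17
wk = 2 * math.pi * k / N

# Fourier-transform view: DFT sample of the short-time section f_n[m] = x[m] w[n-m]
ft_view = sum(x[m] * w[n - m] * cmath.exp(-1j * wk * m)
              for m in range(len(x)) if 0 <= n - m < len(w))

# Filtering view 1: modulate x[n] by exp(-j wk n), then filter with w[n]
x_mod = [x[m] * cmath.exp(-1j * wk * m) for m in range(len(x))]
lowpass_view = convolve_at(x_mod, w, n)

# Filtering view 2: filter with h_k[n] = w[n] exp(j wk n), then demodulate
h_k = [w[i] * cmath.exp(1j * wk * i) for i in range(len(w))]
bandpass_view = cmath.exp(-1j * wk * n) * convolve_at(x, h_k, n)

print(abs(ft_view - lowpass_view), abs(ft_view - bandpass_view))
```

Repeating the second form for $k = 0, \dots, N-1$ gives the $N$-channel filter bank.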

Filtering View
[Figure 7.5: the discrete STFT as the output of a filter bank]

Filtering View
• General properties:
• If $x[n]$ has length $N$ and $w[n]$ has length $M$, then $X(n,\omega)$ has length $N+M-1$ along $n$ (the length of their convolution).
• The bandwidth of $X(n,\omega_0)$ is less than or equal to that of $w[n]$.
• The sequence $X(n,\omega_0)$ has its spectrum centered at the origin.

Example 7.2
• Consider a Gaussian window of the form:
  $w[n] = e^{-a(n-n_0)^2}$
• The discrete STFT with DFT length $N$ can therefore be considered as a bank of filters with impulse responses:
  $h_k[n] = w[n]\, e^{j\frac{2\pi}{N}kn}$
• For $x[n] = \delta[n]$ ⇒ $x[n] * h_k[n] = h_k[n]$
• If $N = 50$, corresponding to bandpass filters spaced by 200 Hz for a sampling rate of 10000 samples/s, then:
  $h_k[n] = w[n]\, e^{j\frac{2\pi}{50}kn}$

Example 7.2
• For $k = 0, 5, 10, 15$ the filters $h_k[n] = w[n]\,e^{j\frac{2\pi}{50}kn}$ are obtained, centered at 0, 1000, 2000, and 3000 Hz, respectively.

Example 7.2
[Figure 7.6: the bandpass filters $h_k[n]$ for $k = 0, 5, 10, 15$]

Example 7.3
• Consider the filter bank of Example 7.2, which was designed with a Gaussian window.
• Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters $h_k[n]$ for $k = 0, 5, 10,$ and $15$, as presented on the previous slide and depicted in Figure 7.6.

Example 7.3
• After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure, but centered at the origin.

Time-Frequency Resolution Tradeoffs
• As seen in Chapter 3, the basic issue in analysis window selection is the compromise between a long window, which shows signal detail in frequency, and a short window, which represents fine temporal structure:
  $X(n,\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(\theta)\, e^{j\theta n}\, X(\omega+\theta)\, d\theta$
• Since both $X(\omega)$ and $W(\omega)$ are periodic with period $2\pi$, the linear convolution is essentially circular.
• From the equation above:
• $W(\omega)$ smears (smooths) $X(\omega)$.
• For good frequency resolution $W(\omega)$ should be as narrow as possible, ideally $W(\omega) = \delta(\omega)$.
• $W(\omega) = \delta(\omega)$ would require an infinitely long $w[n]$.
• An infinitely long window gives poor time resolution.
• The two goals conflict.
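The smearing by $W(\omega)$ can be made concrete with a small sketch. It assumes two complex tones at 0.5 and 0.7 rad/sample and a rectangular window; these values, and the function name `windowed_spectrum`, are illustrative and not taken from the slides.

```python
import cmath

def windowed_spectrum(Nw, omega, w1=0.5, w2=0.7):
    """FT at frequency omega of two complex tones seen through an Nw-point rectangular window."""
    return sum((cmath.exp(1j * w1 * m) + cmath.exp(1j * w2 * m)) * cmath.exp(-1j * omega * m)
               for m in range(Nw))

ratios = {}
for Nw in (16, 128):
    at_tone = abs(windowed_spectrum(Nw, 0.5))    # magnitude at the first tone
    midway = abs(windowed_spectrum(Nw, 0.6))     # magnitude halfway between the tones
    # a ratio well below 1 means a clear dip between the tones, i.e. they are resolved
    ratios[Nw] = midway / at_tone
    print(Nw, round(ratios[Nw], 3))
```

With the 16-point window the two tones merge into a single hump (ratio above 1). The 128-point window separates them, at the cost of averaging over eight times as many samples in time.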

Example 7.4
• Figure 7.8 depicts the time-frequency resolution tradeoff.

Time-Frequency Resolution Tradeoffs
• From the previous example, the smoothing interpretation of the STFT is not valid for non-stationary sequences.
• For steady (stationary) signals, long analysis windows are appropriate and yield good frequency resolution, as depicted in the next figure.

Time-Frequency Resolution Tradeoffs
• However, for short and transient signals (plosive speech, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events.
• Shorter windows yield poorer frequency resolution.

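Short-Time Synthesis
• How can the original sequence be recovered from its discrete-time STFT?
• The inversion is represented mathematically by a synthesis equation that expresses a sequence in terms of its discrete-time STFT.
• Recall that for $f_n[m] = x[m]\,w[n-m]$:
  $X(n,\omega) = \sum_{m=-\infty}^{\infty} f_n[m]\, e^{-j\omega m}$
• Thus:
  $f_n[m] = x[m]\,w[n-m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega m}\, d\omega$
• Wherever the window value $w[n-m]$ is non-zero, $x[m]$ can be recovered completely.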

Short-Time Synthesis
• For each $n$, taking the inverse Fourier transform of the corresponding function of frequency yields the sequence $f_n[m]$.
• Evaluating $f_n[m]$ at $m = n$ gives $f_n[n] = x[n]\,w[0]$.
• For $w[0] \neq 0$, $x[n]$ is obtained as $f_n[n]/w[0]$.
• The process of taking the inverse Fourier transform of $X(n,\omega)$ for a specific $n$ and then dividing by $w[0]$ is the synthesis equation for the discrete-time STFT:
  $x[n] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$

Short-Time Synthesis
• In contrast to the discrete-time STFT $X(n,\omega)$, the discrete STFT $X(n,k)$ is not always invertible.
• Example 1.
• Consider the case when $w[n]$ is bandlimited with bandwidth $B$.

Short-Time Synthesis
• Note that if there are frequency components of $x[n]$ that do not pass through any of the filter regions of the discrete STFT, then
• the discrete STFT is not a unique representation of $x[n]$, and
• it is not invertible.
• Example 2.
• Consider $X(n,k)$ decimated in time by a factor $L$, i.e., the STFT is computed every $L$ samples.
• $w[n]$ is non-zero over its length $N_w$.
• If $L > N_w$, there are gaps in time where $x[n]$ is not represented.
• Thus in such cases, again, the discrete STFT is not invertible.

[Figure: for $L > N_w$, successive windows $w[pL-m]$ of length $N_w$ leave gaps in the coverage of $x[m]$]

Short-Time Synthesis
• Conclusion:
• Constraints must be adopted to ensure uniqueness and invertibility:
• Proper (adequate) frequency sampling: $B \ge 2\pi/N_w$ ($B$ is the window bandwidth)
• Proper temporal decimation: $L \le N_w$
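The temporal constraint can be illustrated with a small sketch: when $L > N_w$, two different signals can share the same decimated STFT. The function name `decimated_stft`, the rectangular window with $N_w = 4$, and the test signals are illustrative assumptions.

```python
import cmath
import math

def decimated_stft(x, w, N, L):
    """X(pL, k) for all frame positions pL < len(x) + len(w) - 1 and k = 0..N-1."""
    Nw = len(w)
    out = []
    for n in range(0, len(x) + Nw - 1, L):
        out.append([sum(x[m] * w[n - m] * cmath.exp(-2j * math.pi * k * m / N)
                        for m in range(max(0, n - Nw + 1), min(len(x), n + 1)))
                    for k in range(N)])
    return out

w = [1.0, 1.0, 1.0, 1.0]          # Nw = 4
N = 4
x1 = [float(m % 5) for m in range(24)]
x2 = list(x1)
x2[7] += 10.0                     # sample 7 is not covered by any window when L = 6

def max_diff(L):
    """Largest difference between the decimated STFTs of x1 and x2."""
    A, B = decimated_stft(x1, w, N, L), decimated_stft(x2, w, N, L)
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

print(max_diff(6), max_diff(4))
```

With $L = 6 > N_w$ the two STFTs are identical, so $x[n]$ cannot be recovered uniquely. With $L = 4 = N_w$ the change in sample 7 is visible.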

Filter Bank Summation (FBS) Method
• The traditional short-time synthesis method is commonly referred to as Filter Bank Summation (FBS).
• FBS is best described in terms of the filtering interpretation of the discrete STFT.
• The discrete STFT is considered to be the set of outputs of a bank of filters.
• The output of each filter is modulated with a complex exponential.
• The modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b) on slide 18).

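Filter Bank Summation (FBS) Method
• Recall the synthesis equation given earlier:
  $x[n] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$
• The FBS method carries out a discrete version of this equation using the discrete STFT $X(n,k)$:
  $y[n] = \frac{1}{N\, w[0]}\sum_{k=0}^{N-1} X(n,k)\, e^{j\frac{2\pi}{N}nk}$
• The goal is to derive conditions that ensure $y[n] = x[n]$.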

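Filter Bank Summation (FBS) Method
• From Figure 7.5, analysis followed by synthesis produces $y[n]$ from $x[n]$:
  $y[n] = \frac{1}{N\, w[0]}\sum_{k=0}^{N-1}\left[\sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km}\right] e^{j\frac{2\pi}{N}kn}$
• Interchanging the order of summation, this equation reduces to:
  $y[n] = \frac{1}{N\, w[0]}\; x[n] * \left(w[n]\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}kn}\right)$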

Filter Bank Summation (FBS) Method
• Furthermore:
  $\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}kn} = N\sum_{r=-\infty}^{\infty}\delta[n-rN]$

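Filter Bank Summation (FBS) Method
• Thus:
  $y[n] = \frac{1}{w[0]}\; x[n] * \left(w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN]\right)$
• $y[n]$ is the output of the convolution of $x[n]$ with the product of the analysis window and a periodic impulse sequence.
• Note: $\frac{1}{w[0]}\, w[n]\sum_{r}\delta[n-rN]$ reduces to $\delta[n]$ if:
• the window length satisfies $N_w \le N$, or
• for $N_w > N$, $w[rN] = 0$ for $r \neq 0$, that is:
  $w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN] = w[0]\,\delta[n]$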


Filter Bank Summation (FBS) Method
• This constraint is known as the FBS constraint.
• It must be fulfilled to ensure exact signal synthesis with the FBS method.
• The constraint is commonly expressed in the frequency domain:
  $\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$
• This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.
• In conclusion, a filter bank with $N$ filters, based on an analysis filter of length less than or equal to $N$, is always an all-pass system.
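A minimal sketch of FBS synthesis, assuming $y[n] = \frac{1}{N\,w[0]}\sum_k X(n,k)\,e^{j2\pi nk/N}$ with the STFT recomputed on the fly. The function name `fbs_synthesis`, the test signal, and the 6-point window are illustrative. It compares a case that meets the FBS constraint ($N_w \le N$) with one that violates it.

```python
import cmath
import math

def fbs_synthesis(x, w, N):
    """y[n] = 1/(N w[0]) * sum_k X(n, k) exp(j 2 pi n k / N), with X(n, k) computed on the fly."""
    Nw = len(w)
    y = []
    for n in range(len(x)):
        total = 0j
        for k in range(N):
            X_nk = sum(x[m] * w[n - m] * cmath.exp(-2j * math.pi * k * m / N)
                       for m in range(max(0, n - Nw + 1), n + 1))
            total += X_nk * cmath.exp(2j * math.pi * k * n / N)
        y.append((total / (N * w[0])).real)
    return y

x = [math.sin(0.4 * m) + 0.1 * m for m in range(20)]      # arbitrary test signal
w = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]                        # Nw = 6, w[0] != 0

err_ok = max(abs(a - b) for a, b in zip(fbs_synthesis(x, w, N=8), x))    # Nw <= N
err_bad = max(abs(a - b) for a, b in zip(fbs_synthesis(x, w, N=4), x))   # Nw > N and w[4] != 0
print(err_ok, err_bad)
```

With $N = 8$ the reconstruction is exact up to rounding. With $N = 4$ the term $w[4] \neq 0$ adds a scaled copy of $x[n-4]$, so $y[n] \neq x[n]$.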

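Generalized FBS Method
• Note:
  $x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\sum_{r=-\infty}^{\infty} f[n,n-r]\, X(r,\omega)\right] e^{j\omega n}\, d\omega$
• The "smoothing" function $f[n,m]$ is referred to as the time-varying synthesis filter.
• It can be shown that any $f[n,m]$ that fulfills the condition below makes the synthesis equation above valid (Exercise 7.6):
  $\sum_{m=-\infty}^{\infty} f[n,-m]\, w[m] = 1$
• Note also that the basic FBS method is obtained by setting the synthesis filter to be a non-smoothing filter, $f[n,m] = \delta[m]$ (with the window normalized so that $w[0] = 1$).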

Generalized FBS Method
• Consider the discrete STFT with decimation factor $L$. The generalized FBS synthesized signal is given by:
  $y[n] = \frac{L}{N}\sum_{r=-\infty}^{\infty}\sum_{k=0}^{N-1} f[n,n-rL]\, X(rL,k)\, e^{j\frac{2\pi}{N}nk}$
• Furthermore, consider a time-invariant smoothing filter: $f[n,m] = f[m]$
• That is: $f[n,n-rL] = f[n-rL]$

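Generalized FBS Method
• Thus:
  $y[n] = \frac{L}{N}\sum_{r=-\infty}^{\infty} f[n-rL]\sum_{k=0}^{N-1} X(rL,k)\, e^{j\frac{2\pi}{N}nk}$
• This synthesis gives $y[n] = x[n]$ when the following constraint is satisfied by the analysis and synthesis filters together with the temporal decimation and frequency sampling factors:
  $L\sum_{r=-\infty}^{\infty} f[n-rL]\, w[rL-n+pN] = \delta[p]$ for all $n$
• For $f[m] = \delta[m]$ and $L = 1$ this method reduces to the basic FBS method.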

Generalized FBS Method
• The case of interest is $L > 1$, with $f[n]$ used as an interpolator ⇒ interpolation FBS methods:
• Helical interpolation (Portnoff)
• Weighted overlap-add method (Crochiere)

Overlap-Add (OLA) Method
• The FBS method was motivated by the filtering view of the STFT.
• The OLA method was motivated by the Fourier-transform view of the STFT.
• In the OLA method:
• the inverse DFT is taken for each fixed time in the discrete STFT,
• an overlap-and-add operation between the short-time sections is performed.
• This works provided that the analysis window is designed so that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence.
• The basic idea is that the redundancy within overlapping segments, and the averaging of the redundant samples, removes the effect of windowing.

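Overlap-Add (OLA) Method
• Recall the short-time synthesis relation:
  $x[n] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$
• If $x[n]$ is averaged over many short-time segments and normalized by $W(0)$, then
  $x[n] = \frac{1}{2\pi\, W(0)}\int_{-\pi}^{\pi}\sum_{p=-\infty}^{\infty} X(p,\omega)\, e^{j\omega n}\, d\omega$
  where
  $W(0) = \sum_{n=-\infty}^{\infty} w[n]$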

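Overlap-Add (OLA) Method
• The discretized version of OLA is given by:
  $y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} X(p,k)\, e^{j\frac{2\pi}{N}kn}\right]$
• The inverse DFT in brackets equals the short-time section $f_p[n] = x[n]\,w[p-n]$ provided that the DFT length is at least the window length ($N \ge N_w$). The expression for $y[n]$ thus becomes:
  $y[n] = x[n]\,\frac{1}{W(0)}\sum_{p=-\infty}^{\infty} w[p-n]$
• Provided that
  $\sum_{p=-\infty}^{\infty} w[p-n] = W(0)$
  we have $y[n] = x[n]$.
• This condition is always true, because the sum of the values of a sequence equals its Fourier transform evaluated at $\omega = 0$ (the DC value).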

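Overlap-Add (OLA) Method
• For decimation in time by a factor of $L$, it can be shown (Exercise 7.4) that if the analysis window satisfies
  $\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$
• then $x[n]$ can be synthesized using the following equation:
  $y[n] = \frac{L}{W(0)}\sum_{p=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} X(pL,k)\, e^{j\frac{2\pi}{N}kn}\right]$
• The window condition above is the general constraint imposed by the OLA method. It requires that the sum of all the analysis windows (obtained by sliding $w[n]$ in $L$-point increments) add up to a constant, as shown in the next figure.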
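A minimal sketch of decimated OLA synthesis, assuming $y[n] = \frac{L}{W(0)}\sum_p \mathrm{IDFT}\{X(pL,k)\}[n]$ with $N \ge N_w$. The function name `ola_synthesis`, the 8-point Hann-type window, and the test signal are illustrative. The window satisfies the OLA constraint for $L = 4$ but not for $L = 5$.

```python
import cmath
import math

def ola_synthesis(x, w, N, L):
    """Overlap-add: y[n] = (L / W(0)) * sum_p IDFT_k{X(pL, k)}[n], assuming N >= len(w)."""
    Nw, W0 = len(w), sum(w)
    y = [0.0] * len(x)
    for n0 in range(0, len(x) + Nw - 1, L):               # frame positions n0 = pL
        support = range(max(0, n0 - Nw + 1), min(len(x), n0 + 1))
        # analysis: N-point DFT of the short-time section x[m] w[n0 - m]
        X = [sum(x[m] * w[n0 - m] * cmath.exp(-2j * math.pi * k * m / N) for m in support)
             for k in range(N)]
        # synthesis: inverse DFT evaluated on the section's support, then overlap-add
        for n in support:
            f = sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            y[n] += (L / W0) * f.real
    return y

Nw, N = 8, 8
w = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / Nw) for i in range(Nw)]  # Hann-type window
x = [math.sin(0.3 * m) + 0.05 * m for m in range(40)]                        # arbitrary test signal

errs = {}
for L in (4, 5):
    y = ola_synthesis(x, w, N, L)
    errs[L] = max(abs(a - b) for a, b in zip(y, x))
    print(L, errs[L])
```

For $L = 4$ the shifted windows sum to $W(0)/L = 1$ at every sample and the signal is recovered up to rounding. For $L = 5$ the sum varies with $n$, leaving a visible reconstruction error.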

Overlap-Add (OLA) Method
[Figure: analysis windows shifted in $L$-point increments, summing to a constant]
