Speech Signal Representations I

Presentation Transcript


  1. Speech Signal Representations I
     Seminar Speech Recognition 2002, F.R. Verhage

  2. Speech Signal Representations I
     Decomposition of the speech signal x[n] as a source e[n] passed through a linear time-varying filter h[n].
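
Written out, this decomposition is a convolution. The following is a sketch in standard notation, under the assumption (not stated on the slide) that the time-varying filter h[n] can be treated as fixed within one short analysis frame:

```latex
x[n] = e[n] * h[n] = \sum_{k=-\infty}^{\infty} e[k]\, h[n-k]
\qquad\Longleftrightarrow\qquad
X(e^{j\omega}) = E(e^{j\omega})\, H(e^{j\omega})
```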

  3. Speech Signal Representations I
     Estimation of the filter, inspired by:
     • Speech production models
       • Linear Predictive Coding (LPC)
       • Cepstral analysis
     • Speech perception models (part II)
       • Mel-frequency cepstrum
       • Perceptual Linear Prediction (PLP)
     Speech recognizers estimate the filter characteristics and ignore the source.

  4. Speech Signal Representations I: Short-Time Fourier Analysis
     • Spectrogram
       • Representation of a signal that highlights several of its properties, based on short-time Fourier analysis
       • Two-dimensional: time on the horizontal axis, frequency on the vertical axis
       • Third ‘dimension’: gray or color level indicating energy

  5. Speech Signal Representations I: Short-Time Fourier Analysis
     • Spectrogram
       • Narrow band
         • Long windows (> 20 ms) → narrow bandwidth
         • Lower time resolution, better frequency resolution
       • Wide band
         • Short windows (< 10 ms) → wide bandwidth
         • Good time resolution, lower frequency resolution
       • Pitch synchronous
         • Requires knowledge of the local pitch period
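
The narrow-band versus wide-band trade-off can be checked numerically. The sketch below is only an illustration under assumed values: a hypothetical test signal made of two neighbouring pitch harmonics (1000 Hz and 1100 Hz, as for a 100 Hz pitch), an 8 kHz sampling rate, and Hamming windows of 40 ms and 5 ms. It compares the short-time spectrum on a harmonic with the spectrum halfway between the two harmonics.

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window with n_samples points."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def short_time_magnitude(signal, fs, win_ms, freq_hz):
    """Magnitude of the Fourier transform at freq_hz of the first
    win_ms milliseconds of the signal, after Hamming windowing."""
    n_samples = int(round(fs * win_ms / 1000.0))
    window = hamming(n_samples)
    acc = 0j
    for n in range(n_samples):
        acc += window[n] * signal[n] * cmath.exp(-2j * math.pi * freq_hz * n / fs)
    return abs(acc)


# Assumed test signal: two adjacent harmonics of a 100 Hz pitch.
fs = 8000
signal = [math.cos(2 * math.pi * 1000 * n / fs) + math.cos(2 * math.pi * 1100 * n / fs)
          for n in range(400)]


def harmonic_contrast(win_ms):
    """Ratio of the spectrum on a harmonic to the spectrum between harmonics."""
    on_harmonic = short_time_magnitude(signal, fs, win_ms, 1000.0)
    between = short_time_magnitude(signal, fs, win_ms, 1050.0)
    return on_harmonic / between


narrow_band = harmonic_contrast(40.0)  # long window: harmonics resolved
wide_band = harmonic_contrast(5.0)     # short window: harmonics smeared together
print(narrow_band, wide_band)
```

With the long window the ratio is large, because the individual harmonics show up as separate peaks; with the short window it stays close to 1, because the two harmonics merge into one broad peak.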

  6. Speech Signal Representations I: Short-Time Fourier Analysis
     • Spectrogram

  7. Speech Signal Representations I: Short-Time Fourier Analysis
     • Window analysis
       • Series of short segments (analysis frames)
       • Short enough that the signal is approximately stationary within a frame
       • Frame length usually constant, 20-30 ms
       • Frames may overlap
       • Different types of window function (w_m[n]):
         • Rectangular (equivalent to applying no window function)
         • Hamming
         • Hanning
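
A minimal sketch of this framing step, in plain Python. The 25 ms frame length and 10 ms hop are assumed values chosen inside the 20-30 ms range above, and the function names `make_window` and `frame_signal` are hypothetical helpers for illustration only.

```python
import math


def make_window(kind, length):
    """Return a symmetric window of the given length (length >= 2)."""
    if kind == "rectangular":
        return [1.0] * length
    if kind == "hamming":
        a0, a1 = 0.54, 0.46
    elif kind == "hann":  # often called "Hanning"
        a0, a1 = 0.5, 0.5
    else:
        raise ValueError("unknown window: " + kind)
    return [a0 - a1 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(signal, fs, frame_ms=25.0, hop_ms=10.0, kind="hamming"):
    """Cut the signal into overlapping frames and apply the window to each.
    Trailing samples that do not fill a whole frame are dropped."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = make_window(kind, frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([window[n] * signal[start + n] for n in range(frame_len)])
    return frames


# Assumed example: one second of a 110 Hz sine sampled at 8 kHz.
fs = 8000
one_second = [math.sin(2 * math.pi * 110 * n / fs) for n in range(fs)]
frames = frame_signal(one_second, fs)
print(len(frames), len(frames[0]))
```

Because the hop is shorter than the frame, consecutive frames overlap by 15 ms in this configuration.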

  8. Speech Signal Representations I: Short-Time Fourier Analysis
     • Window analysis
       • Window length N must be long enough relative to the pitch period M (both in samples)
         • Rectangular: N ≥ M
         • Hamming, Hanning: N ≥ 2M
       • Pitch period not known in advance → prepare for the lowest expected pitch (50 Hz, i.e. the longest pitch period, 20 ms) → at least 20 ms for rectangular or 40 ms for Hamming/Hanning
       • But longer windows give an averaged spectrum instead of distinct spectra → the rectangular window has better time resolution
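
The 20 ms and 40 ms figures follow directly from the pitch period at 50 Hz. A small worked example; the function name and the 8 kHz sampling rate are assumptions made for illustration:

```python
def min_window_ms(lowest_pitch_hz, pitch_periods_needed):
    """Shortest window (in ms) that spans the required number of pitch periods."""
    pitch_period_ms = 1000.0 / lowest_pitch_hz
    return pitch_periods_needed * pitch_period_ms


lowest_pitch_hz = 50.0
rectangular_ms = min_window_ms(lowest_pitch_hz, 1)  # N >= M
hamming_ms = min_window_ms(lowest_pitch_hz, 2)      # N >= 2M

# Converted to samples at an assumed sampling rate of 8 kHz.
fs = 8000
rectangular_samples = int(rectangular_ms * fs / 1000)
hamming_samples = int(hamming_ms * fs / 1000)
print(rectangular_ms, hamming_ms, rectangular_samples, hamming_samples)
```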

  9. Speech Signal Representations I: Short-Time Fourier Analysis

  10. Speech Signal Representations I: Short-Time Fourier Analysis

  11. Speech Signal Representations I: Short-Time Fourier Analysis

  12. Speech Signal Representations I: Short-Time Fourier Analysis

  13. Speech Signal Representations I: Short-Time Fourier Analysis

  14. Speech Signal Representations I: Short-Time Fourier Analysis

  15. Speech Signal Representations I: Short-Time Fourier Analysis

  16. Speech Signal Representations I: Short-Time Fourier Analysis
      • Window analysis
        • Frequency response is not completely zero outside the main lobe → spectral leakage
        • Second lobe of a Hamming window is approx. 43 dB below the main lobe → less spectral leakage
        • Hamming, Hanning and triangular windows offer less spectral leakage → rectangular windows are rarely used despite their better time resolution
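
The leakage difference can be estimated by evaluating each window's frequency response and measuring its highest side lobe relative to the main lobe. This is a rough sketch in plain Python; the window length of 64 samples and the frequency grid of 2048 points are arbitrary assumptions, and the result is only as accurate as that grid.

```python
import cmath
import math


def window_response(window, n_freqs=2048):
    """Magnitude of the window's Fourier transform on a grid from 0 to pi."""
    mags = []
    for k in range(n_freqs):
        omega = math.pi * k / n_freqs
        value = sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window))
        mags.append(abs(value))
    return mags


def peak_sidelobe_db(window):
    """Highest side lobe in dB relative to the main-lobe peak at omega = 0."""
    mags = window_response(window)
    k = 1
    # Walk down the main lobe until the magnitude stops decreasing.
    while k < len(mags) and mags[k] < mags[k - 1]:
        k += 1
    # Everything from the first minimum onwards belongs to the side lobes.
    return 20 * math.log10(max(mags[k - 1:]) / mags[0])


length = 64
rectangular = [1.0] * length
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
           for n in range(length)]

rect_db = peak_sidelobe_db(rectangular)
hamming_db = peak_sidelobe_db(hamming)
print(round(rect_db, 1), round(hamming_db, 1))
```

The Hamming value should land near the 43 dB quoted above, while the rectangular window's highest side lobe is only about 13 dB below its main lobe.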

  17. Speech Signal Representations I: Short-Time Fourier Analysis

  18. Speech Signal Representations I: Short-Time Fourier Analysis

  19. Speech Signal Representations I: Short-Time Fourier Analysis

  20. Speech Signal Representations I: Short-Time Fourier Analysis

  21. Speech Signal Representations I: Short-Time Fourier Analysis
      Short-time spectrum of male voiced speech
      • Time signal /ah/, local pitch 110 Hz
      • 30 ms rectangular window
      • 15 ms rectangular window
      • 30 ms Hamming window
      • 15 ms Hamming window

  22. Speech Signal Representations I: Short-Time Fourier Analysis
      Short-time spectrum of female voiced speech
      • Time signal /aa/, local pitch 200 Hz
      • 30 ms rectangular window
      • 15 ms rectangular window
      • 30 ms Hamming window
      • 15 ms Hamming window

  23. Speech Signal Representations I: Short-Time Fourier Analysis
      Short-time spectrum of unvoiced speech
      • Time signal
      • 30 ms rectangular window
      • 15 ms rectangular window
      • 30 ms Hamming window
      • 15 ms Hamming window

  24. Speech Signal Representations I: Linear Predictive Coding
      • LPC, a.k.a. auto-regressive (AR) modeling
      • An all-pole filter is a good approximation of speech, with p as the order of the LPC analysis
      • Predicts the current sample as a linear combination of the past p samples
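
In a common textbook convention, the all-pole model and the predictor can be written as follows. The sign convention for the coefficients a_k is an assumption here; some references define A(z) with plus signs instead.

```latex
H(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)}
\\[6pt]
x[n] = \sum_{k=1}^{p} a_k\, x[n-k] + e[n],
\qquad
\tilde{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k]
```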

  25. Speech Signal Representations I: Linear Predictive Coding
      • To estimate the predictor coefficients (a_k), use a short-term analysis technique
      • Per segment, minimize the total squared prediction error
      • Take the derivative with respect to each a_k and equate it to 0; this gives a set of p linear equations: the Yule-Walker equations
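
The minimization can be sketched as follows, assuming the predictor x̃[n] = Σ a_k x[n−k] with positive-sign coefficients and writing φ[i,k] for the correlation terms of one segment (this symbol is introduced here for illustration):

```latex
E = \sum_{n} e^{2}[n] = \sum_{n} \Bigl( x[n] - \sum_{k=1}^{p} a_k\, x[n-k] \Bigr)^{2}
\\[6pt]
\frac{\partial E}{\partial a_i} = -2 \sum_{n} x[n-i] \Bigl( x[n] - \sum_{k=1}^{p} a_k\, x[n-k] \Bigr) = 0,
\qquad i = 1, \dots, p
\\[6pt]
\sum_{k=1}^{p} a_k\, \phi[i,k] = \phi[i,0],
\qquad
\phi[i,k] = \sum_{n} x[n-i]\, x[n-k]
```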

  26. Speech Signal Representations I: Linear Predictive Coding
      • Solution of the Yule-Walker equations:
        • Any standard matrix inversion package
        • Due to the special form of the matrix, efficient solutions exist:
          • Covariance method, using the Cholesky decomposition
          • Autocorrelation method, using windows; results in equations with Toeplitz matrices, solved by the Durbin recursion algorithm
          • Lattice method, equivalent to the Levinson-Durbin recursion; often used in fixed-point implementations because lack of precision doesn't result in unstable filters
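
A minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, in plain Python. It assumes the predictor convention x̃[n] = Σ a_k x[n−k] (positive-sign coefficients); the function names are hypothetical, and the frame passed to `autocorrelation` is assumed to be windowed already.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz system sum_k a_k r[|i-k|] = r[i], i = 1..order.

    Returns (a, reflection, error), where a[k-1] holds a_k, reflection
    holds the reflection coefficients of each stage, and error is the
    final prediction error energy.
    """
    a = []
    reflection = []
    error = r[0]
    for i in range(1, order + 1):
        # Correlation not yet explained by the order-(i-1) predictor.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / error
        # Order update: a_j <- a_j - k * a_(i-j), then append a_i = k.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        reflection.append(k)
        error *= (1.0 - k * k)
    return a, reflection, error


# Assumed example: the exact autocorrelation of a first-order AR process
# with coefficient 0.5, so a second-order fit should return a_2 = 0.
r = [1.0, 0.5, 0.25]
a, reflection, error = levinson_durbin(r, 2)
print(a, reflection, error)
```

The reflection coefficients produced along the way are the same quantities the lattice method works with, which is why the two approaches are described as equivalent.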

  27. Speech Signal Representations I: Linear Predictive Coding

  28. Speech Signal Representations I: Linear Predictive Coding

  29. Speech Signal Representations I: Linear Predictive Coding
      • Spectral analysis via LPC
        • All-pole (IIR) filter
        • Spectral peaks near the angles of the roots of the denominator (the poles)
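
To illustrate, the LPC spectrum can be evaluated directly from the predictor coefficients as 1/|A(e^jω)|. The sketch below assumes the convention A(z) = 1 − Σ a_k z^−k and uses a hypothetical two-pole filter with a pole pair at radius 0.95 and 500 Hz (sampling rate 8 kHz); all of these values are assumptions for the example.

```python
import cmath
import math


def lpc_magnitude(a, freq_hz, fs):
    """|H| = 1 / |1 - sum_k a_k exp(-j*omega*k)| at one frequency."""
    omega = 2 * math.pi * freq_hz / fs
    denominator = 1 - sum(a_k * cmath.exp(-1j * omega * (k + 1))
                          for k, a_k in enumerate(a))
    return 1.0 / abs(denominator)


# Assumed example: one complex pole pair at radius 0.95 and 500 Hz, so that
# A(z) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2.
fs = 8000
radius, resonance_hz = 0.95, 500.0
theta = 2 * math.pi * resonance_hz / fs
a = [2 * radius * math.cos(theta), -radius ** 2]

freqs = [10.0 * i for i in range(401)]  # 0 Hz to 4000 Hz in 10 Hz steps
spectrum = [lpc_magnitude(a, f, fs) for f in freqs]
peak_hz = freqs[spectrum.index(max(spectrum))]
print(peak_hz)
```

The peak falls close to the pole angle rather than exactly on it; the closer the pole lies to the unit circle, the smaller the offset and the sharper the peak.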

  30. Speech Signal Representations I: Linear Predictive Coding
      • Prediction error
        • Should be (approximately) the excitation
        • Unvoiced speech: expect white noise; OK
        • Voiced speech: expect an impulse train; not OK
          • All-pole assumption not altogether valid
          • Real speech not perfectly periodic
          • Pitch synchronous analysis gives better results
      • LPC order
        • Larger p gives lower prediction errors
        • Too large a p results in fitting the individual harmonics → poorer separation between filter and source

  31. Speech Signal Representations I: Linear Predictive Coding
      • Prediction error
        • Inverse LPC filter gives the residual signal
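
A small sketch of inverse filtering in plain Python, assuming the predictor convention x̃[n] = Σ a_k x[n−k] and zero signal history before the first sample. The synthetic impulse-train excitation and the coefficient values are made up for the example; with real speech the recovered residual only approximates the excitation.

```python
def predict(x, a, n):
    """Linear prediction of x[n] from the previous len(a) samples."""
    return sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)


def synthesize(excitation, a):
    """All-pole synthesis: x[n] = e[n] + sum_k a_k x[n-k]."""
    x = []
    for n, e in enumerate(excitation):
        x.append(e + predict(x, a, n))
    return x


def lpc_residual(x, a):
    """Inverse filter A(z): e[n] = x[n] - sum_k a_k x[n-k]."""
    return [x[n] - predict(x, a, n) for n in range(len(x))]


# Assumed example: impulse train with a period of 20 samples through a
# stable two-pole filter.
a = [1.2, -0.8]
excitation = [1.0 if n % 20 == 0 else 0.0 for n in range(100)]
speech_like = synthesize(excitation, a)
residual = lpc_residual(speech_like, a)
print(max(abs(r - e) for r, e in zip(residual, excitation)))
```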

  32. Speech Signal Representations I: Linear Predictive Coding
      • Alternatives for the predictor coefficients
        • Line Spectral Frequencies
          • Local sensitivity
          • Efficiency
        • Reflection Coefficients
          • Guaranteed stable → useful when coefficients are interpolated over time
        • Log-area ratios
          • Flat spectral sensitivity
        • Roots of the polynomial
          • Represent resonance frequencies and bandwidths
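
For the last item, a root of the predictor polynomial maps to a resonance frequency through its angle and to an approximate 3 dB bandwidth through its radius. The sketch below takes an already computed complex root as input; the function name, the example pole and the 8 kHz sampling rate are assumptions for illustration.

```python
import cmath
import math


def pole_to_resonance(pole, fs):
    """Map one complex pole to (frequency in Hz, approximate bandwidth in Hz)."""
    freq_hz = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth_hz = -math.log(abs(pole)) * fs / math.pi
    return freq_hz, bandwidth_hz


# Assumed example: a pole at radius 0.95 and an angle corresponding to 500 Hz.
fs = 8000
pole = cmath.rect(0.95, 2 * math.pi * 500 / fs)
freq_hz, bandwidth_hz = pole_to_resonance(pole, fs)
print(freq_hz, bandwidth_hz)
```

Poles closer to the unit circle give narrower bandwidths, which is why this representation is convenient for reading off formant-like resonances.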

  33. Speech Signal Representations I: Cepstral Processing
      • A homomorphic transformation converts a convolution into a sum
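
One such transformation is the real cepstrum, the inverse Fourier transform of the log magnitude spectrum: if x[n] = e[n] * h[n], the cepstrum of x is the cepstrum of e plus the cepstrum of h. The sketch below checks this numerically with a naive DFT in plain Python. The two short sequences are made-up stand-ins for source and filter, and the DFT size of 16 is an assumption chosen to be at least as long as the convolved sequence.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive DFT of x, implicitly zero-padded to n_fft points."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(len(x)))
            for k in range(n_fft)]


def real_cepstrum(x, n_fft):
    """Inverse DFT of the log magnitude spectrum (spectrum must be non-zero)."""
    log_mag = [math.log(abs(value)) for value in dft(x, n_fft)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                for k in range(n_fft)).real / n_fft
            for n in range(n_fft)]


def convolve(a, b):
    """Linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_value in enumerate(a):
        for j, b_value in enumerate(b):
            out[i + j] += a_value * b_value
    return out


# Assumed stand-ins for source e[n] and filter h[n]; both have spectra
# without zeros, so the logarithm is always defined.
e = [1.0, 0.5]
h = [1.0, -0.3, 0.2]
x = convolve(e, h)

n_fft = 16
c_x = real_cepstrum(x, n_fft)
c_e = real_cepstrum(e, n_fft)
c_h = real_cepstrum(h, n_fft)
max_difference = max(abs(c_x[n] - (c_e[n] + c_h[n])) for n in range(n_fft))
print(max_difference)
```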

  34. Speech Signal Representations I: Cepstral Processing

  35. Speech Signal Representations I: Cepstral Processing
