
ECE 598: The Speech Chain



Presentation Transcript


  1. ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters

  2. Today • Fourier Series (Discrete Fourier Transform) • Sawtooth wave example • Finding the coefficients • Spectrogram • Frequency Response • Speech Source-Filter Theory • Speech Sources • Transient • Frication • Aspiration • Voicing • Formant Transitions

  3. Topic #1: Discrete Fourier Transform

  4. Sawtooth Wave • Sawtooth first harmonic: • Amplitude: 0.64 • Phase: 0.52π radians

  5. Sawtooth Second Harmonic • Sawtooth second harmonic: • Amplitude: 0.32 • Phase: 0.53π radians

  6. Sawtooth Third Harmonic • Sawtooth third harmonic: • Amplitude: 0.21 • Phase: 0.55π radians

  7. Sawtooth: Calculating the Fourier Transform • Average over one full period of (125Hz sawtooth * 125Hz cosine) = -0.03

  8. Sawtooth: Calculating the Fourier Transform • Average over one full period of (125Hz sawtooth * 125Hz sine) = 0.636

  9. Sawtooth: Amplitude and Phase of the First Harmonic • First Fourier Coefficient of the Sawtooth Wave: • -0.03 + 0.636j • |-0.03 + 0.636j| = 0.64 • phase(-0.03 + 0.636j) = 0.52π • X1 = -0.03 + 0.636j = 0.64 e^(j0.52π) • First Harmonic: • h1(t) = 0.64 e^(j(250πt + 0.52π)) • Re{h1(t)} = 0.64 cos(250πt + 0.52π)
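
The averaging on slides 7 to 12 can be checked numerically. The sketch below is not taken from the slides: it assumes a sawtooth that rises from -2 toward +2 over one period, sampled at 8000 Hz, and it assumes the coefficient is defined as the average of x(t)·e^(-jkω0t) over one period. The transcript states neither the waveform's amplitude nor the sign convention, so both are guesses chosen because they reproduce the quoted values. With this convention the imaginary part is minus the average of (sawtooth × sine).

```python
import numpy as np

FS = 8000            # sampling frequency in Hz (from the slides)
F0 = 125             # sawtooth fundamental in Hz (from the slides)
N = FS // F0         # 64 samples per period

n = np.arange(N)
t = n / FS
# ASSUMPTION: sawtooth rising from -2 toward +2 over one period.
sawtooth = 4.0 * n / N - 2.0


def fourier_coefficient(x, k):
    """Average of x(t) * exp(-j*k*w0*t) over one period (assumed convention)."""
    w0 = 2 * np.pi * F0
    return np.mean(x * np.exp(-1j * k * w0 * t))


X1 = fourier_coefficient(sawtooth, 1)
X2 = fourier_coefficient(sawtooth, 2)
for k, X in ((1, X1), (2, X2)):
    print(f"X{k} = {X.real:.3f} + {X.imag:.3f}j, "
          f"amplitude = {abs(X):.2f}, phase = {np.angle(X) / np.pi:.2f} pi")
```

Under these assumptions the printed amplitudes and phases should agree with the slide values to rounding.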

  10. Sawtooth: Calculating the Fourier Transform • Average over one full period of (125Hz sawtooth * 250Hz cosine) = -0.03

  11. Sawtooth: Calculating the Fourier Transform • Average over one full period of (125Hz sawtooth * 250Hz sine) = 0.3173

  12. Sawtooth: Amplitude and Phase of the Second Harmonic • Second Fourier Coefficient of the Sawtooth Wave: • -0.03 + 0.3173j • |-0.03 + 0.3173j| = 0.32 • phase(-0.03 + 0.3173j) = 0.53π • X2 = -0.03 + 0.3173j = 0.32 e^(j0.53π) • Second Harmonic: • h2(t) = 0.32 e^(j(500πt + 0.53π)) • Re{h2(t)} = 0.32 cos(500πt + 0.53π)

  13. How to Reconstruct a Sawtooth from its Harmonics • At sampling frequency of FS=8000Hz (8000 samples/second), the period of a 125Hz sawtooth is • (8000 samples/second)/(125 cycles/second) = 64 samples • The “0”th harmonic is DC (0*125Hz = 0 Hz) • So we need to add up 63 harmonics: • x(t) = h1(t) + h2(t) + … + h63(t)
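
A minimal sketch of the reconstruction on slide 13, under the same assumptions as before (a sawtooth rising from -2 toward +2, and coefficients defined as the average of x(t)·e^(-jkω0t) over one period; neither is stated in the slides). It builds the 63 complex harmonics h_k(t) = X_k e^(jkω0t), sums them, and adds back the DC term to compare against the original samples.

```python
import numpy as np

FS, F0 = 8000, 125
N = FS // F0                        # 64 samples per period
n = np.arange(N)
sawtooth = 4.0 * n / N - 2.0        # ASSUMPTION: rising sawtooth, -2 toward +2

# Row k of `basis` holds exp(+j*k*w0*t) sampled over one period.
k = np.arange(N).reshape(-1, 1)
basis = np.exp(2j * np.pi * k * n / N)

# X[k] = average over one period of x(t) * exp(-j*k*w0*t).
X = (sawtooth * basis.conj()).mean(axis=1)

# h_k(t) = X[k] * exp(j*k*w0*t) for k = 1..63 (k = 0 is DC).
harmonics = X[1:, None] * basis[1:]
reconstruction = harmonics.sum(axis=0)

dc = X[0]
max_error = np.max(np.abs(reconstruction + dc - sawtooth))
print("harmonics summed:", harmonics.shape[0])
print("largest reconstruction error:", max_error)
```

The error should be at the level of floating-point round-off, and the imaginary parts of the sum cancel because harmonics k and 64-k are complex conjugates for a real signal.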

  14. Why Does This Work??? • It’s a property of sines and cosines • They are a BASIS SET, which means that… • Let < > mean “average over one full period”: • <cos²(mω0t)> = 0.5 for any nonzero integer m • <cos(mω0t) cos(nω0t)> = 0 for positive integers m ≠ n • The two properties above also hold for sine • <cos(mω0t) sin(nω0t)> = 0 for any integers m and n • ω0 = 2πF0 = 2π/T0 is the “fundamental frequency” in radians/second
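
The orthogonality properties above can be checked numerically. This is only an illustrative sketch: the 64-sample period and the harmonic numbers m = 3 and n = 5 are arbitrary choices, not values from the slides.

```python
import numpy as np

N = 64                                   # samples in one period (arbitrary)
w0t = 2 * np.pi * np.arange(N) / N       # w0*t sampled over one full period


def avg(values):
    """Average over one full period."""
    return float(np.mean(values))


m, n = 3, 5                              # two different harmonic numbers
cos_sq = avg(np.cos(m * w0t) ** 2)
sin_sq = avg(np.sin(m * w0t) ** 2)
cos_cos = avg(np.cos(m * w0t) * np.cos(n * w0t))
sin_sin = avg(np.sin(m * w0t) * np.sin(n * w0t))
cos_sin = avg(np.cos(m * w0t) * np.sin(n * w0t))

print(cos_sq, sin_sq, cos_cos, sin_sin, cos_sin)
```

The first two averages should come out at 0.5 and the remaining three at zero, up to round-off, which is what lets each coefficient be found by averaging the signal against one sinusoid at a time.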

  15. Topic #2: Spectrogram

  16. Spectrogram [Figure: spectrogram; axes: Frequency (Hz) vs. Time (seconds)]

  17. How to Create a Spectrogram • Divide the waveform into overlapping windows • Example: window length = 6ms • Example: window spacing = one per 2ms • Result: neighboring windows overlap by 4ms • Fourier transform each frame to create the “short-time Fourier transform” Xk(t) • Read: kth frequency bin, taken from the frame at time t seconds • Could also write X(t,f) = Fourier coefficient from the frame at time=t seconds, from the frequency bin centered at f Hz. • In pixel (k,t), plot 20*log10(abs(Xk(t))) • The Fourier transform expressed in decibels!
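
The recipe above can be sketched in a few lines of NumPy. The 6 ms window and 2 ms spacing come from the slide; the Hamming taper, the small constant added before the logarithm, the function name `spectrogram_db`, and the 1000 Hz test tone are assumptions made for illustration only.

```python
import numpy as np


def spectrogram_db(x, fs, window_ms=6.0, hop_ms=2.0):
    """Short-time Fourier transform in decibels; rows are frames, columns are bins."""
    win_len = int(round(fs * window_ms / 1000))   # 48 samples at 8000 Hz
    hop = int(round(fs * hop_ms / 1000))          # 16 samples at 8000 Hz
    window = np.hamming(win_len)                  # ASSUMPTION: Hamming taper
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    stft = np.fft.rfft(frames, axis=1)            # X_k(t): one row per frame
    return 20 * np.log10(np.abs(stft) + 1e-12)    # epsilon avoids log10(0)


fs = 8000
t = np.arange(fs) / fs
tone = np.cos(2 * np.pi * 1000 * t)               # hypothetical test signal
S = spectrogram_db(tone, fs)

freqs = np.fft.rfftfreq(48, d=1 / fs)             # bin centre frequencies in Hz
strongest_bin = int(np.argmax(S.mean(axis=0)))
print(S.shape, freqs[strongest_bin])
```

With a 48-sample window the bins are about 167 Hz apart, so a 6 ms window trades frequency resolution for the time resolution needed to see individual glottal pulses.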

  18. Reading a Spectrogram [Figure: annotated spectrogram; axes: Frequency (Hz) vs. Time (seconds); labeled regions: Turbulence (Frication or Aspiration), Stop, Nasals, Vowel, Glide (Extreme Formant Frequencies)] • Each vertical stripe is a glottal closure instant (BANG: high energy at all frequencies) • Inter-bang period = pitch period ≈ 10 ms

  19. Topic #3: Frequency Response and Speech Source-Filter Theory

  20. Frequency Response • Any sound can be windowed • Call the window length “N” • Its windows can be Fourier transformed: x(t) = h1(t) + h2(t) + … = X1 e^(jω0t) + X2 e^(j2ω0t) + … • What happens when you filter a cosine? • Input = e^(jω0t) → Output = H(ω0) e^(jω0t) • Input = e^(j2ω0t) → Output = H(2ω0) e^(j2ω0t) • …

  21. Frequency Response • If x(t) goes through any filter: x(t) → ANY FILTER → y(t) • Then we can find y(t) using the frequency response! • x(t) = X1 e^(jω0t) + X2 e^(j2ω0t) + … • y(t) = H(ω0) X1 e^(jω0t) + H(2ω0) X2 e^(j2ω0t) + …

  22. Frequency Response • In general, we can write • Y(ω) = H(ω) X(ω)
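
The claim that a filter only scales a complex exponential by H(ω) can be illustrated in discrete time. In this sketch the two-tap averaging filter `h` is a hypothetical example, not a filter from the lecture, and ω0 is expressed in radians per sample for a 125 Hz component at an 8000 Hz sampling rate.

```python
import numpy as np

h = np.array([0.5, 0.5])               # HYPOTHETICAL filter: 2-tap average
w0 = 2 * np.pi * 125 / 8000            # 125 Hz at FS = 8000 Hz, in rad/sample
n = np.arange(256)
x = np.exp(1j * w0 * n)                # input: complex exponential


def freq_response(taps, w):
    """H(w) = sum over m of taps[m] * exp(-j*w*m) for an FIR filter."""
    m = np.arange(len(taps))
    return np.sum(taps * np.exp(-1j * w * m))


y = np.convolve(x, h)[: len(x)]        # filter output, same length as input
H = freq_response(h, w0)

# Skip the first len(h) - 1 samples, where the filter is still filling up.
steady = slice(len(h) - 1, None)
matches = np.allclose(y[steady], H * x[steady])
print(abs(H), np.angle(H), matches)
```

Once the start-up samples are skipped, the output equals H(ω0) times the input, so the filter changes only the amplitude and phase of each harmonic.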

  23. Examples of Filters • Low-Pass Filter: • Eliminates all frequency components above some cutoff frequency • Result sounds muffled • High-Pass Filter: • Eliminates all frequency components below some cutoff frequency • Result sounds fizzy • Band-Pass Filter: • Eliminates all components except those between f1 and f2 • Result: sounds like an old radio broadcast • Band-Stop Filter: • Eliminates only the frequency components between f1 and f2 • Result: usually hard to hear that anything’s happened!
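
As a rough illustration of a low-pass filter acting through Y(ω) = H(ω)X(ω), the sketch below zeroes every DFT bin above a cutoff. The 200 Hz and 3000 Hz components and the 1000 Hz cutoff are arbitrary assumptions, and zeroing bins of a finite block is an idealization rather than a practical filter design.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                         # one second of samples
low = np.cos(2 * np.pi * 200 * t)              # component below the cutoff
high = 0.5 * np.cos(2 * np.pi * 3000 * t)      # component above the cutoff
x = low + high

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)      # bin frequencies in Hz

cutoff_hz = 1000.0                             # ASSUMPTION: arbitrary cutoff
H = (freqs <= cutoff_hz).astype(float)         # ideal low-pass: 1 below, 0 above
y = np.fft.irfft(H * X, n=len(x))              # Y = H * X, back to time domain

residual = np.max(np.abs(y - low))
print("largest difference from the 200 Hz component:", residual)
```

Swapping the comparison in `H` gives the high-pass, band-pass, and band-stop cases in the same way.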

  24. Examples of Filters • A Room: • Adds echoes to the signal • A Quarter-Wave Resonator: • Emphasizes frequency components at f=(c/4L)+(nc/2L), for all integer n • Components at other frequencies are passed without any emphasis • A Vocal Tract: • Emphasizes any part of the input that is near a formant frequency • Any energy at other frequencies is passed without emphasis
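
The quarter-wave resonator formula f = c/4L + nc/2L is easy to evaluate. The speed of sound (354 m/s) and tube length (17.7 cm) below are illustrative assumptions, picked so the numbers come out round; they are not given in the slides.

```python
def quarter_wave_resonances(length_m, count, c=354.0):
    """First `count` resonances (Hz) of a quarter-wave tube: c/4L + n*c/2L."""
    return [c / (4 * length_m) + n * c / (2 * length_m) for n in range(count)]


# ASSUMPTION: c = 354 m/s and L = 0.177 m are illustrative values only.
resonances = quarter_wave_resonances(0.177, 3)
print([round(f, 1) for f in resonances])
```

With these values the resonances fall near 500, 1500, and 2500 Hz, spaced c/2L apart.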

  25. The Source-Filter Model of Speech Production • Input: Weak Sources • Glottal closure • Turbulence (frication, aspiration) • Filter: The Vocal Tract • Energy near formant frequency is emphasized by a factor of 10 or a factor of 100 • Energy at other frequencies is passed without emphasis • Speech = (Vocal Tract Transfer Function) × (Excitation) • S(ω) = T(ω) E(ω)
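
A toy version of the source-filter model, offered only as a sketch: the excitation is an impulse train standing in for glottal closures, and the vocal tract is reduced to a single two-pole resonator. The 100 Hz pitch, the 500 Hz formant, and the pole radius 0.97 are all assumed values, and a real vocal tract has several formants.

```python
import numpy as np

fs = 8000
n_samples = fs                 # one second, so each DFT bin is 1 Hz wide
f0 = 100                       # ASSUMPTION: pitch of 100 Hz
formant_hz = 500               # ASSUMPTION: a single formant at 500 Hz
r = 0.97                       # ASSUMPTION: pole radius (sets the bandwidth)

# Source: one impulse per pitch period (a crude glottal-closure train).
excitation = np.zeros(n_samples)
excitation[:: fs // f0] = 1.0

# Filter: two-pole resonator y[i] = x[i] + a1*y[i-1] + a2*y[i-2].
theta = 2 * np.pi * formant_hz / fs
a1, a2 = 2 * r * np.cos(theta), -r * r
speech = np.zeros(n_samples)
for i in range(n_samples):
    speech[i] = excitation[i]
    if i >= 1:
        speech[i] += a1 * speech[i - 1]
    if i >= 2:
        speech[i] += a2 * speech[i - 2]

E = np.abs(np.fft.rfft(excitation))     # flat across the pitch harmonics
S = np.abs(np.fft.rfft(speech))         # harmonics reshaped by the resonator
peak_hz = int(np.argmax(S))             # bin index equals frequency in Hz
print("strongest output harmonic (Hz):", peak_hz)
```

The excitation has equal energy at every multiple of the pitch, and the filter boosts the harmonics nearest the formant, which is the shaping that S(ω) = T(ω)E(ω) describes.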

  26. Topic #4: Speech Sources. Presented as: Events in the Release of a Stop Consonant

  27. Events in the Release of a Stop “Burst” = transient + frication (the part of the spectrogram whose transfer function has poles only at the front cavity resonance frequencies, not at the back cavity resonances).

  28. Pre-voicing during Closure • To make a voiced stop in most European languages: the tongue root is relaxed, allowing it to expand, so that the vocal folds can continue vibrating for a little while after oral closure. • Result is a low-frequency “voice bar” that may continue well into closure. • In English, closure voicing is typical of read speech, but not casual speech. • “the bug”

  29. Burst = Transient and Frication • Frication = Turbulence at a Constriction • Turbulence striking an obstacle makes noise • Front cavity resonance frequency: FR = c/(4Lf) • The fricative “sh” (this ship)

  30. Aspiration = Turbulence at the Glottis (like /h/). Transfer Function During Aspiration…

  31. “the mom” Formant Transitions: Labial Consonants “the bug”

  32. “the supper” Formant Transitions: Alveolar Consonants “the tug”

  33. “the shoe” Formant Transitions: Post-alveolar Consonants “the zsazsa”

  34. “the gut” Formant Transitions: Velar Consonants “sing a song”

  35. Formant Transitions: A Perceptual Study The study: (1) Synthesize speech with different formant patterns, (2) record subject responses. Delattre, Liberman and Cooper, J. Acoust. Soc. Am. 1955.

  36. Perception of Formant Transitions
