1 / 38

ECE 598: The Speech Chain

ECE 598: The Speech Chain. Lecture 8: Formant Transitions; Vocal Tract Transfer Function. Today. Perturbation Theory: A different way to estimate vocal tract resonant frequencies, useful for consonant transitions Syllable-Final Consonants: Formant Transitions Vocal Tract Transfer Function

Download Presentation

ECE 598: The Speech Chain

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function

  2. Today • Perturbation Theory: • A different way to estimate vocal tract resonant frequencies, useful for consonant transitions • Syllable-Final Consonants: Formant Transitions • Vocal Tract Transfer Function • Uniform Tube (Quarter-Wave Resonator) • During Vowels: All-Pole Spectrum • Q • Bandwidth • Nasal Vowels: Sum of two transfer functions gives spectral zeros

  3. Topic #1:Perturbation Theory

  4. Perturbation Theory(Chiba and Kajiyama, The Vowel, 1940) A(x) is constant everywhere, except for one small perturbation. Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation.

  5. Conservation of Energy Under Perturbation

  6. Conservation of Energy Under Perturbation

  7. “Sensitivity” Functions

  8. Sensitivity Functions for the Quarter-Wave Resonator (Lips Open) 0 x L • Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it. /AA/ /ER/ /IY/ /W/

  9. Sensitivity Functions for the Half-Wave Resonator (Lips Rounded) 0 x L • Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it. /L,OW/ /UW/

  10. Formant Frequencies of Vowels From Peterson & Barney, 1952

  11. Topic #2:Formant Transitions, Syllable-Final Consonant

  12. Events in the Closure of a Nasal Consonant Formant Transitions Vowel Nasalization Nasal Murmur

  13. Formant Transitions: A Perturbation Theory Model

  14. “the mom” Formant Transitions: Labial Consonants “the bug”

  15. “the supper” Formant Transitions: Alveolar Consonants “the tug”

  16. “the shoe” Formant Transitions: Post-alveolar Consonants “the zsazsa”

  17. “the gut” Formant Transitions: Velar Consonants “sing a song”

  18. Topic #3:Vocal Tract Transfer Functions

  19. Transfer Function • “Transfer Function” T(w)=Output(w)/Input(w) • In speech, it’s convenient to write T(w)=UL(w)/UG(w) • UL(w) = volume velocity at the lips • UG(w) = volume velocity at the glottis • T(0) = 1 • Speech recorded at a microphone = pressure • PR(w) = R(w)T(w)UG(w) • R(w) = jrf/r = “radiation characteristic” • r = density of air • r = distance to the microphone • f = frequency in Hertz

  20. Transfer Function of an Ideal Uniform Tube • Ideal Terminations: • Reflection coefficient at glottis: zero velocity, g=1 • Reflection coefficient at lips: zero pressure, g=-1 • Obviously, this is an approximation, but it gives… T(w) = 1/cos(wL/c) = …(w+w3)(w+w2)(w+w1)(w-w1)(w-w2)(w-w3)… wn = npc/L – pc/2L Fn = nc/2L – c/4L w12w22w32…

  21. Transfer Function of an Ideal Uniform Tube Peaks are actually infinite in height (figure is clipped to fit the display)

  22. Transfer Function of a Non-Ideal Uniform Tube • Almost ideal terminations: • At glottis: velocity almost zero, g≈1 • At lips: pressure almost zero, g≈-1 T(w) = 1/(j/Q +cos(wL/c)) … at Fn=nc/2L – c/4L,… T(2pFn) = -jQ 20log10|T(2pFn)| = 20log10Q

  23. Transfer Function of a Non-Ideal Uniform Tube

  24. Transfer Function of a Vowel: Height of First Peak is Q1=F1/B1 (2pFn)2+(pBn)2 ∞ T(w) = P (jw+j2pFn+pBn)(jw-j2pFn+pBn) T(2pF1) ≈ (2pF1)2/(j4pF1pB1) = -jF1/B1 Call Qn = Fn/Bn T(2pF1) ≈ -jQ1 20log10|T(2pF1)| ≈ 20log10Q1 n=1

  25. Transfer Function of a Vowel: Bandwidth of a Peak is Bn (2pFn)2+(pBn)2 T(w) = P (jw+j2pFn+pBn)(jw-j2pFn+pBn) T(2pF1+pB1) ≈ (2pF1)2/((j4pF1)(pB1+pB1)) = -jQ1/2 At f=F1+0.5Bn, |T(w)|=0.5Qn 20log10|T(w)| = 20log10Q1 – 3dB ∞ n=1

  26. Amplitudes of Higher Formants: Include the Rolloff (2pFn)2+(pBn)2 ∞ T(w) = P (jw+j2pFn+pBn)(jw-j2pFn+pBn) At f above F1 T(2pf) ≈ (F1/f) T(2pF2) ≈ (-jF2/B2)(F1/F2) 20log10|T(2pF2)| ≈ 20log10Q2 – 20log10(F2/F1) 1/f Rolloff: 6 dB per octave (per doubling of frequency) n=1

  27. Vowel Transfer Function: Synthetic Example L1 = 20log10(500/80)=16dB L2 = 20log10(1500/240) – 20log10(F2/F1) = 16dB – 9.5dB L3 = 20log10(2500/600) – 20log10(F3/F1) – 20log10(F3/F2) B2 = 240Hz B1 = 80Hz B3 = 600Hz? (hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau) F4 peak completely swamped by rolloff from lower formants

  28. Shorthand Notation for the Spectrum of a Vowel snsn* ∞ T(s) = P (s-sn)(s-sn*) s = jw sn = -pBn+j2pFn sn* = -pBn-j2pFn snsn* = |sn|2 = (2pFn)2+(pBn)2 T(0) = 1 20log10|T(0)| = 0dB n=1

  29. Another Shorthand Notation for the Spectrum of a Vowel 1 ∞ T(s) = P (1-s/sn)(1-s/sn*) n=1

  30. Topic #4:Nasalized Vowels

  31. Vowel Nasalization Nasalized Vowel Nasal Consonant

  32. Nasalized Vowel PR(w) = R(w)(UL(w)+UN(w)) UN(w) = Volume Velocity from Nostrils PR(w) = R(w)(TL(w)+TN(w))UG(w) = R(w)T(w)UG(w) T(w) = TL(w) + TN(w)

  33. Nasalized Vowel T(s) = TL(s)+TN(s) = (1-s/sLn)(1-s/sLn*) + (1-s/sNn)(1-s/sNn*) = (1-s/sLn)(1-s/sLn*)(1-s/sNn)(1-s/sNn*) 1/sZn = ½(1/sLn+1/sNn) sZn = nth spectral zero T(s) = 0 if s=sZn 1 1 2(1-s/sZn)(1-s/sZn*)

  34. The “Pole-Zero Pair” 20log10T(w) = 20log10(1/(1-s/sLn)(1-s/sLn*)) + 20log10((1-s/sZn)(1-s/sZn*)/(1-s/sNn)(1-s/sNn*)) = original vowel log spectrum + log spectrum of a pole-zero pair

  35. Additive Terms in the Log Spectrum

  36. Transfer Function of a Nasalized Vowel

  37. Pole-Zero Pairs in the Spectrogram Nasal Pole Zero Oral Pole

  38. Summary • Perturbation Theory: • Squeeze near a velocity peak: formant goes down • Squeeze near a pressure peak: formant goes up • Formant Transitions • Labial closure: loci near 250, 1000, 2000 Hz • Alveolar closure: loci near 250, 1700, 3000 Hz • Velar closure: F2 and F3 come together (“velar pinch”) • Vocal Tract Transfer Function • T(s) = P snsn*/(s-sn)(s-sn*) • T(w=2pFn) = Qn = Fn/Bn • 3dB bandwidth = Bn Hertz • T(0) = 1 • Nasal Vowels: • Sum of two transfer functions gives a spectral zero between the oral and nasal poles • Pole-zero pair is a local perturbation of the spectrum

More Related