ECE 598: The Speech Chain

ECE 598: The Speech Chain Lecture 8: Formant Transitions; Vocal Tract Transfer Function

Today • Perturbation Theory: • A different way to estimate vocal tract resonant frequencies, useful for consonant transitions • Syllable-Final Consonants: Formant Transitions • Vocal Tract Transfer Function • Uniform Tube (Quarter-Wave Resonator) • During Vowels: All-Pole Spectrum • Q • Bandwidth • Nasal Vowels: Sum of two transfer functions gives spectral zeros

Topic #1:Perturbation Theory

Perturbation Theory(Chiba and Kajiyama, The Vowel, 1940) A(x) is constant everywhere, except for one small perturbation. Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation.

Conservation of Energy Under Perturbation

“Sensitivity” Functions

Sensitivity Functions for the Quarter-Wave Resonator (Lips Open) 0 x L • Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it. /AA/ /ER/ /IY/ /W/

Sensitivity Functions for the Half-Wave Resonator (Lips Rounded) 0 x L • Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it. /L,OW/ /UW/

Formant Frequencies of Vowels From Peterson & Barney, 1952

Topic #2:Formant Transitions, Syllable-Final Consonant

Events in the Closure of a Nasal Consonant Formant Transitions Vowel Nasalization Nasal Murmur

Formant Transitions: A Perturbation Theory Model

“the mom” Formant Transitions: Labial Consonants “the bug”

“the supper” Formant Transitions: Alveolar Consonants “the tug”

“the shoe” Formant Transitions: Post-alveolar Consonants “the zsazsa”

“the gut” Formant Transitions: Velar Consonants “sing a song”

Topic #3:Vocal Tract Transfer Functions

Transfer Function • “Transfer Function” T(w)=Output(w)/Input(w) • In speech, it’s convenient to write T(w)=UL(w)/UG(w) • UL(w) = volume velocity at the lips • UG(w) = volume velocity at the glottis • T(0) = 1 • Speech recorded at a microphone = pressure • PR(w) = R(w)T(w)UG(w) • R(w) = jrf/r = “radiation characteristic” • r = density of air • r = distance to the microphone • f = frequency in Hertz

Transfer Function of an Ideal Uniform Tube • Ideal Terminations: • Reflection coefficient at glottis: zero velocity, g=1 • Reflection coefficient at lips: zero pressure, g=-1 • Obviously, this is an approximation, but it gives… T(w) = 1/cos(wL/c) = …(w+w3)(w+w2)(w+w1)(w-w1)(w-w2)(w-w3)… wn = npc/L – pc/2L Fn = nc/2L – c/4L w12w22w32…

Transfer Function of an Ideal Uniform Tube Peaks are actually infinite in height (figure is clipped to fit the display)

Transfer Function of a Non-Ideal Uniform Tube • Almost ideal terminations: • At glottis: velocity almost zero, g≈1 • At lips: pressure almost zero, g≈-1 T(w) = 1/(j/Q +cos(wL/c)) … at Fn=nc/2L – c/4L,… T(2pFn) = -jQ 20log10|T(2pFn)| = 20log10Q

Transfer Function of a Non-Ideal Uniform Tube

Transfer Function of a Vowel: Height of First Peak is Q1=F1/B1 (2pFn)2+(pBn)2 ∞ T(w) = P (jw+j2pFn+pBn)(jw-j2pFn+pBn) T(2pF1) ≈ (2pF1)2/(j4pF1pB1) = -jF1/B1 Call Qn = Fn/Bn T(2pF1) ≈ -jQ1 20log10|T(2pF1)| ≈ 20log10Q1 n=1

Transfer Function of a Vowel: Bandwidth of a Peak is Bn (2pFn)2+(pBn)2 T(w) = P (jw+j2pFn+pBn)(jw-j2pFn+pBn) T(2pF1+pB1) ≈ (2pF1)2/((j4pF1)(pB1+pB1)) = -jQ1/2 At f=F1+0.5Bn, |T(w)|=0.5Qn 20log10|T(w)| = 20log10Q1 – 3dB ∞ n=1

Amplitudes of Higher Formants: Include the Rolloff (2pFn)2+(pBn)2 ∞ T(w) = P (jw+j2pFn+pBn)(jw-j2pFn+pBn) At f above F1 T(2pf) ≈ (F1/f) T(2pF2) ≈ (-jF2/B2)(F1/F2) 20log10|T(2pF2)| ≈ 20log10Q2 – 20log10(F2/F1) 1/f Rolloff: 6 dB per octave (per doubling of frequency) n=1

Vowel Transfer Function: Synthetic Example L1 = 20log10(500/80)=16dB L2 = 20log10(1500/240) – 20log10(F2/F1) = 16dB – 9.5dB L3 = 20log10(2500/600) – 20log10(F3/F1) – 20log10(F3/F2) B2 = 240Hz B1 = 80Hz B3 = 600Hz? (hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau) F4 peak completely swamped by rolloff from lower formants

Shorthand Notation for the Spectrum of a Vowel snsn* ∞ T(s) = P (s-sn)(s-sn*) s = jw sn = -pBn+j2pFn sn* = -pBn-j2pFn snsn* = |sn|2 = (2pFn)2+(pBn)2 T(0) = 1 20log10|T(0)| = 0dB n=1

Another Shorthand Notation for the Spectrum of a Vowel 1 ∞ T(s) = P (1-s/sn)(1-s/sn*) n=1

Topic #4:Nasalized Vowels

Vowel Nasalization Nasalized Vowel Nasal Consonant

Nasalized Vowel PR(w) = R(w)(UL(w)+UN(w)) UN(w) = Volume Velocity from Nostrils PR(w) = R(w)(TL(w)+TN(w))UG(w) = R(w)T(w)UG(w) T(w) = TL(w) + TN(w)

Nasalized Vowel T(s) = TL(s)+TN(s) = (1-s/sLn)(1-s/sLn*) + (1-s/sNn)(1-s/sNn*) = (1-s/sLn)(1-s/sLn*)(1-s/sNn)(1-s/sNn*) 1/sZn = ½(1/sLn+1/sNn) sZn = nth spectral zero T(s) = 0 if s=sZn 1 1 2(1-s/sZn)(1-s/sZn*)

The “Pole-Zero Pair” 20log10T(w) = 20log10(1/(1-s/sLn)(1-s/sLn*)) + 20log10((1-s/sZn)(1-s/sZn*)/(1-s/sNn)(1-s/sNn*)) = original vowel log spectrum + log spectrum of a pole-zero pair

Additive Terms in the Log Spectrum

Transfer Function of a Nasalized Vowel

Pole-Zero Pairs in the Spectrogram Nasal Pole Zero Oral Pole

Summary • Perturbation Theory: • Squeeze near a velocity peak: formant goes down • Squeeze near a pressure peak: formant goes up • Formant Transitions • Labial closure: loci near 250, 1000, 2000 Hz • Alveolar closure: loci near 250, 1700, 3000 Hz • Velar closure: F2 and F3 come together (“velar pinch”) • Vocal Tract Transfer Function • T(s) = P snsn*/(s-sn)(s-sn*) • T(w=2pFn) = Qn = Fn/Bn • 3dB bandwidth = Bn Hertz • T(0) = 1 • Nasal Vowels: • Sum of two transfer functions gives a spectral zero between the oral and nasal poles • Pole-zero pair is a local perturbation of the spectrum

ECE 598: The Speech Chain