
Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, jhasegaw@uiuc.edu, University of Illinois at Urbana-Champaign, USA. Lecture 2: Acoustics of Vowel and Glide Production.



  1. Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, jhasegaw@uiuc.edu, University of Illinois at Urbana-Champaign, USA

  2. Lecture 2: Acoustics of Vowel and Glide Production
    • One-Dimensional Linear Acoustics
        • The Acoustic Wave Equation
        • Transmission Lines
        • Standing Wave Patterns
    • One-Tube Models
        • Schwa
        • Front cavity resonance of fricatives
    • Two-Tube Models
        • The vowel /a/
        • Helmholtz Resonator
        • The vowels /u,i,e/
    • Perturbation Theory
        • The vowels /u/, /o/ revisited
        • Glides

  3. 1. One-Dimensional Acoustic Wave Equation and Solutions

  4. Acoustics: Constitutive Equations

  5. Acoustic Plane Waves: Time Domain

  6. Acoustic Plane Waves: Frequency Domain

  7. Solution for a Tube with Constant Area and Hard Walls
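
A standard statement of one-dimensional linear acoustics in a hard-walled tube of constant area A, written with the forward and backward waves P+ and P- that the summary slide refers to. The symbols are assumptions rather than the lecture's own notation: p is pressure, u is volume velocity, ρ is air density, c is the speed of sound, and Ω is radian frequency.

```latex
% Constitutive equations (conservation of momentum and of mass), constant area A
-\frac{\partial p}{\partial x} = \frac{\rho}{A}\,\frac{\partial u}{\partial t},
\qquad
-\frac{\partial u}{\partial x} = \frac{A}{\rho c^{2}}\,\frac{\partial p}{\partial t}
\quad\Longrightarrow\quad
\frac{\partial^{2} p}{\partial x^{2}} = \frac{1}{c^{2}}\,\frac{\partial^{2} p}{\partial t^{2}}

% Time domain: forward- and backward-travelling plane waves
p(x,t) = p^{+}\!\left(t - \tfrac{x}{c}\right) + p^{-}\!\left(t + \tfrac{x}{c}\right),
\qquad
u(x,t) = \frac{A}{\rho c}\left[p^{+}\!\left(t - \tfrac{x}{c}\right) - p^{-}\!\left(t + \tfrac{x}{c}\right)\right]

% Frequency domain
P(x,j\Omega) = P^{+} e^{-j\Omega x/c} + P^{-} e^{j\Omega x/c},
\qquad
U(x,j\Omega) = \frac{A}{\rho c}\left(P^{+} e^{-j\Omega x/c} - P^{-} e^{j\Omega x/c}\right)
```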

  8. 2. One-Tube Models

  9. Boundary Conditions (tube extending from x = 0 to x = L)

  10. Resonant Frequencies
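
Applying boundary conditions to the frequency-domain solution P(x,jΩ) = P+e^{-jΩx/c} + P-e^{jΩx/c}, U(x,jΩ) = (A/ρc)(P+e^{-jΩx/c} - P-e^{jΩx/c}) gives the resonant frequencies. This sketch assumes an ideal closed glottis at x = 0 (U = 0) and, for the quarter-wave case, an ideal open lip end at x = L (P = 0); radiation and wall losses are ignored, and U here denotes volume velocity.

```latex
% Closed at x = 0:
U(0,j\Omega) = \frac{A}{\rho c}\left(P^{+} - P^{-}\right) = 0 \;\Rightarrow\; P^{-} = P^{+}

% Quarter-wave resonator, open at x = L:
P(L,j\Omega) = P^{+}\left(e^{-j\Omega L/c} + e^{j\Omega L/c}\right) = 2P^{+}\cos\!\left(\frac{\Omega L}{c}\right) = 0
\;\Rightarrow\; F_n = \frac{(2n-1)\,c}{4L}, \quad n = 1, 2, \dots

% Half-wave resonator, also closed at x = L:
U(L,j\Omega) = \frac{A}{\rho c}P^{+}\left(e^{-j\Omega L/c} - e^{j\Omega L/c}\right) = -2j\,\frac{A}{\rho c}P^{+}\sin\!\left(\frac{\Omega L}{c}\right) = 0
\;\Rightarrow\; F_n = \frac{n\,c}{2L}, \quad n = 0, 1, 2, \dots
```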

  11. Standing Wave Patterns

  12. Standing Wave Patterns: Quarter-Wave Resonators (tube closed at the left end, open at the right end)

  13. Standing Wave Patterns: Half-Wave Resonators (tube closed at both ends; tube open at both ends)

  14. Schwa and /ʌ/ (inverted v; the vowels in “a tug”): F1 = 500 Hz = c/4L, F2 = 1500 Hz = 3c/4L, F3 = 2500 Hz = 5c/4L
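
The quarter-wave and half-wave formulas can be turned into a small calculator. This is a sketch: the speed of sound c = 35000 cm/s and the 17.5 cm tube length (the length for which c/4L = 500 Hz at that c) are assumed values, not numbers given on the slides, and `quarter_wave_resonances` and `half_wave_resonances` are hypothetical helper names.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air


def quarter_wave_resonances(length_cm, count, c=SPEED_OF_SOUND_CM_PER_S):
    """Tube closed at one end, open at the other: F_n = (2n - 1) c / 4L, n = 1, 2, ..."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def half_wave_resonances(length_cm, count, c=SPEED_OF_SOUND_CM_PER_S):
    """Tube closed (or open) at both ends: F_n = n c / 2L, starting from the 0 Hz mode."""
    return [n * c / (2.0 * length_cm) for n in range(count)]


print(quarter_wave_resonances(17.5, 3))  # [500.0, 1500.0, 2500.0]
print(half_wave_resonances(17.5, 3))     # [0.0, 1000.0, 2000.0]
```

With the assumed 17.5 cm length, the quarter-wave series reproduces the schwa formants on the slide; the half-wave series includes the 0 Hz entry that later slides list for closed cavities.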

  15. Front Cavity Resonances of a Fricative. /s/: front cavity resonance = 4500 Hz = c/4L if the front cavity length is L = 1.9 cm. /sh/: front cavity resonance = 2200 Hz = c/4L if the front cavity length is L = 4.0 cm.
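
Inverting F = c/4L gives the front-cavity length implied by a measured fricative resonance. Again c = 35000 cm/s is an assumed value and `front_cavity_length_cm` is a hypothetical name; with that c, the slide's 1.9 cm and 4.0 cm are recovered after rounding.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value


def front_cavity_length_cm(resonance_hz, c=SPEED_OF_SOUND_CM_PER_S):
    """Length L of a quarter-wave front cavity whose lowest resonance is c/4L."""
    if resonance_hz <= 0:
        raise ValueError("resonance must be positive")
    return c / (4.0 * resonance_hz)


for fricative, f_hz in (("/s/", 4500.0), ("/sh/", 2200.0)):
    print(fricative, round(front_cavity_length_cm(f_hz), 1), "cm")
# /s/ 1.9 cm
# /sh/ 4.0 cm
```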

  16. 3. Two-Tube Models

  17. Conservation of Mass at the Juncture of Two Tubes: total liters/second transmitted = (velocity) × (tube area), so when A2 = A1/2, the velocity doubles: U2(x,t) = 2U1(x,t).

  18. Two-Tube Model: Two Different Sets of Waves. Tube 1 carries incident wave P1+ and reflected wave P1-; tube 2 carries incident wave P2- and reflected wave P2+.
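
A sketch of the physics at the juncture, assuming each tube's characteristic impedance is ρc/A. Under that assumption, the pressure reflection coefficient for a wave going from tube 1 into tube 2 is r = (A1 - A2)/(A1 + A2); texts that track volume velocity instead of pressure use the opposite sign. The function names are hypothetical.

```python
def pressure_reflection_coefficient(area_1, area_2):
    """Reflection of a pressure wave travelling from tube 1 into tube 2.

    Assumes characteristic impedance Z = rho*c/A, so
    r = (Z2 - Z1) / (Z2 + Z1) = (A1 - A2) / (A1 + A2).
    """
    return (area_1 - area_2) / (area_1 + area_2)


def particle_velocity_after_juncture(velocity_1, area_1, area_2):
    """Conservation of mass: velocity x area (volume velocity) is equal on both sides."""
    return velocity_1 * area_1 / area_2


A1, A2 = 2.0, 1.0  # A2 = A1/2, as on the conservation-of-mass slide
r = pressure_reflection_coefficient(A1, A2)
print(r)                                              # reflected amplitude is 1/3 of incident
print(particle_velocity_after_juncture(1.0, A1, A2))  # 2.0: the velocity doubles

# Consistency check, unit incident pressure in tube 1 and no wave returning from tube 2:
# pressure is 1 + r on both sides, and volume velocity (times rho*c) must match too.
volume_velocity_1 = A1 * (1.0 - r)  # A1 * (P1+ - P1-)
volume_velocity_2 = A2 * (1.0 + r)  # A2 * P2+
print(abs(volume_velocity_1 - volume_velocity_2) < 1e-12)  # True
```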

  19. Two-Tube Model: Solution in the Time Domain

  20. Two-Tube Model in the Frequency Domain
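
The exact resonances of a two-tube model can also be found numerically. Instead of the reflection-coefficient bookkeeping the summary mentions, this sketch uses an equivalent formulation: write a standing wave in each tube, impose U = 0 at the glottis and P = 0 at the lips, and match pressure and volume velocity at the juncture. A nontrivial solution then requires A_back·sin(kL_back)·sin(kL_front) = A_front·cos(kL_back)·cos(kL_front), with k = 2πF/c. The root search (grid scan plus bisection), c = 35000 cm/s, and all names are assumptions.

```python
import math

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value


def juncture_mismatch(freq_hz, len_back, area_back, len_front, area_front,
                      c=SPEED_OF_SOUND_CM_PER_S):
    """Zero exactly at a resonance of a back tube (closed at the glottis)
    joined to a front tube (open at the lips)."""
    k = 2.0 * math.pi * freq_hz / c
    a, b = k * len_back, k * len_front
    return area_back * math.sin(a) * math.sin(b) - area_front * math.cos(a) * math.cos(b)


def two_tube_formants(len_back, area_back, len_front, area_front,
                      max_hz=5000.0, step_hz=1.0):
    """Scan a frequency grid for sign changes, then refine each one by bisection."""
    def f(hz):
        return juncture_mismatch(hz, len_back, area_back, len_front, area_front)

    roots = []
    lo, f_lo = step_hz, f(step_hz)
    while lo < max_hz:
        hi = lo + step_hz
        f_hi = f(hi)
        if f_lo == 0.0:
            roots.append(lo)
        elif f_lo * f_hi < 0.0:
            left, right, f_left = lo, hi, f_lo
            for _ in range(60):  # halve the bracket until it is negligibly small
                mid = 0.5 * (left + right)
                f_mid = f(mid)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            roots.append(0.5 * (left + right))
        lo, f_lo = hi, f_hi
    return roots


# Equal areas reduce to one 17.5 cm quarter-wave tube.
print([round(f) for f in two_tube_formants(8.75, 1.0, 8.75, 1.0)[:3]])  # [500, 1500, 2500]
# A narrow back tube and a much wider front tube approach the decoupled c/4L values.
print([round(f) for f in two_tube_formants(8.8, 1.0, 12.6, 100.0)[:2]])
```

Using the sin/cos form rather than tangents keeps the mismatch function continuous, so every sign change on the grid brackets a genuine resonance.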

  21. Approximate Solution of the Two-Tube Model, A1 >> A2 (tube lengths LBACK and LFRONT). Approximate solution: assume that the two tubes are completely decoupled, so that the formants include F(BACK CAVITY) = c/4LBACK and F(FRONT CAVITY) = c/4LFRONT.

  22. The Vowels /AA/, /AH/: LBACK = 8.8 cm → F2 = c/4LBACK = 1000 Hz; LFRONT = 12.6 cm → F1 = c/4LFRONT = 700 Hz
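
The decoupled approximation is simple enough to compute directly: each tube contributes its own quarter-wave series (2n - 1)c/4L. This sketch assumes c = 35000 cm/s; with that value, the slide's lengths give about 694 Hz and 994 Hz, which the slide rounds to 700 Hz and 1000 Hz. The function name is hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value


def decoupled_formants(len_back_cm, len_front_cm, max_hz=3500.0, c=SPEED_OF_SOUND_CM_PER_S):
    """Treat the back and front tubes as independent quarter-wave resonators
    and pool their resonances up to max_hz."""
    formants = []
    for length in (len_back_cm, len_front_cm):
        n = 1
        while (2 * n - 1) * c / (4.0 * length) <= max_hz:
            formants.append((2 * n - 1) * c / (4.0 * length))
            n += 1
    return sorted(formants)


# /AA/: LBACK = 8.8 cm, LFRONT = 12.6 cm
print([round(f) for f in decoupled_formants(8.8, 12.6)])  # [694, 994, 2083, 2983, 3472]
```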

  23. Acoustic Impedance Z(x,jΩ)

  24. Low-Frequency Approximations of Acoustic Impedance

  25. Helmholtz Resonator: resonance where -Z1(x,jΩ) = Z2(x,jΩ)
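
Under the usual low-frequency approximations, a closed cavity of volume V behaves as an acoustic compliance C, and a short, narrow tube of length L and area A behaves as an acoustic mass M. Identifying Z1 with the back cavity and Z2 with the front constriction is an assumption about the slide's labeling; with that choice, the resonance condition yields the 1/(2π√(MC)) formula used on the /i/ and /u/ slides.

```latex
Z_1(j\Omega) \approx \frac{1}{j\Omega C}, \quad C = \frac{V}{\rho c^{2}}
\qquad\qquad
Z_2(j\Omega) \approx j\Omega M, \quad M = \frac{\rho L}{A}

-Z_1(j\Omega) = Z_2(j\Omega)
\;\Rightarrow\; \frac{j}{\Omega C} = j\Omega M
\;\Rightarrow\; \Omega^{2} = \frac{1}{MC}
\;\Rightarrow\; F = \frac{1}{2\pi\sqrt{MC}} = \frac{c}{2\pi}\sqrt{\frac{A}{L\,V}}
```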

  26. The Vowel /i/. Back cavity = pharynx, with resonances at 0 Hz, 2000 Hz, 4000 Hz. Front cavity = palatal constriction, with resonances at 0 Hz, 2500 Hz, 5000 Hz. Back cavity volume = 70 cm³ and front cavity length/area = 7 cm⁻¹ → 1/(2π√(MC)) = 250 Hz. The Helmholtz resonance replaces all 0 Hz partial-tube resonances, giving formants at 250 Hz, 2000 Hz, and 2500 Hz.
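
Plugging the slide's cavity volume and length/area ratio into F = 1/(2π√(MC)) is a one-line calculation, taking M = ρL/A for the constriction and C = V/(ρc²) for the cavity. This sketch assumes c = 35000 cm/s and ρ = 1.14e-3 g/cm³ (ρ cancels in the product MC) and uses the hypothetical name `helmholtz_frequency`. It gives about 252 Hz for /i/; the /u/ values on the next slide give about 279 Hz, in the same range as the 250 Hz quoted there.

```python
import math

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value
AIR_DENSITY_G_PER_CM3 = 1.14e-3     # assumed value; cancels out of M * C


def helmholtz_frequency(cavity_volume_cm3, length_over_area_per_cm,
                        rho=AIR_DENSITY_G_PER_CM3, c=SPEED_OF_SOUND_CM_PER_S):
    mass = rho * length_over_area_per_cm             # acoustic mass M = rho * L / A
    compliance = cavity_volume_cm3 / (rho * c ** 2)  # compliance C = V / (rho * c^2)
    return 1.0 / (2.0 * math.pi * math.sqrt(mass * compliance))


print(round(helmholtz_frequency(70.0, 7.0)))   # /i/: about 252 Hz
print(round(helmholtz_frequency(200.0, 2.0)))  # /u/: about 279 Hz
```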

  27. The Vowel /u/: A Two-Tube Model. Back cavity = mouth + pharynx, with resonances at 0 Hz, 1000 Hz, 2000 Hz. Front cavity = lips, with resonances at 0 Hz, 18000 Hz, … Back cavity volume = 200 cm³ and front cavity length/area = 2 cm⁻¹ → 1/(2π√(MC)) = 250 Hz. The Helmholtz resonance replaces all 0 Hz partial-tube resonances, giving formants at 250 Hz, 1000 Hz, and 2000 Hz.
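
The rule "the Helmholtz resonance replaces all 0 Hz partial-tube resonances" can be written as a small merge step. This is a sketch with a hypothetical function name; the input lists are the decoupled cavity resonances from the /i/ and /u/ slides, and the Helmholtz frequency is the slides' rounded 250 Hz.

```python
def formants_with_helmholtz(back_resonances_hz, front_resonances_hz, helmholtz_hz):
    """Drop every 0 Hz partial-tube resonance, add the single Helmholtz
    resonance in their place, and sort the result."""
    nonzero = [f for f in back_resonances_hz + front_resonances_hz if f > 0.0]
    return sorted(nonzero + [helmholtz_hz])


# /i/: pharynx 0, 2000, 4000 Hz; palatal constriction 0, 2500, 5000 Hz
print(formants_with_helmholtz([0.0, 2000.0, 4000.0], [0.0, 2500.0, 5000.0], 250.0))
# [250.0, 2000.0, 2500.0, 4000.0, 5000.0]

# /u/: mouth + pharynx 0, 1000, 2000 Hz; lips 0, 18000 Hz
print(formants_with_helmholtz([0.0, 1000.0, 2000.0], [0.0, 18000.0], 250.0))
# [250.0, 1000.0, 2000.0, 18000.0]
```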

  28. The Vowel /u/: A Four-Tube Model (pharynx, velar tongue-body constriction, mouth, lips). Two Helmholtz resonators = two low-frequency formants! F1 = 250 Hz, F2 = 500 Hz, F3 = pharynx resonance, c/2L = 2000 Hz.

  29. 4. Perturbation Theory

  30. Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940): A(x) is constant everywhere, except for one small perturbation. Method: 1. Compute formants of the “unperturbed” vocal tract. 2. Perturb the formant frequencies to match the area perturbation.

  31. Conservation of Energy Under Perturbation

  32. Conservation of Energy Under Perturbation

  33. “Sensitivity” Functions

  34. Sensitivity Functions for the Quarter-Wave Resonator (Lips Open), plotted over 0 ≤ x ≤ L, with labels /AA/, /ER/, /IY/, /W/

  35. Sensitivity Functions for the Half-Wave Resonator (Lips Rounded), plotted over 0 ≤ x ≤ L, with labels /L, OW/ and /UW/
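
A numerical sketch of the perturbation method. It assumes a uniform lossless tube with a closed glottis at x = 0, either open at the lips (quarter-wave, k_n = (2n - 1)π/2L) or closed there (half-wave, k_n = nπ/L, n ≥ 1). The sensitivity function is taken as local kinetic minus potential energy, normalized so their sum is 1: S_n(x) = sin²(k_n x) - cos²(k_n x). The relative formant shift is then ΔF_n/F_n ≈ (1/L)∫ S_n(x) ΔA(x)/A dx. This is a standard energy-based form, but its normalization and the function names here are assumptions, not the lecture's own equations.

```python
import math


def sensitivity(x, n, length, resonator="quarter"):
    """S_n(x): kinetic minus potential energy density of mode n, normalized so
    that their sum is 1. Closed glottis at x = 0, lips at x = length."""
    if resonator == "quarter":   # open at the lips
        k = (2 * n - 1) * math.pi / (2.0 * length)
    elif resonator == "half":    # closed at the lips; requires n >= 1
        k = n * math.pi / length
    else:
        raise ValueError("resonator must be 'quarter' or 'half'")
    return math.sin(k * x) ** 2 - math.cos(k * x) ** 2


def relative_formant_shift(delta_area_ratio, n, length, resonator="quarter", steps=1000):
    """Approximate dF_n/F_n = (1/L) * integral of S_n(x) * dA(x)/A dx (midpoint rule)."""
    h = length / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += sensitivity(x, n, length, resonator) * delta_area_ratio(x) * h
    return total / length


L = 17.5  # assumed tube length, cm
print(relative_formant_shift(lambda x: 0.1, 1, L))                         # uniform change: ~0
print(relative_formant_shift(lambda x: 0.1 if x >= L / 2 else 0.0, 1, L))  # wider lip half: ~ +0.0318
print(relative_formant_shift(lambda x: -0.5 if x >= 15.0 else 0.0, 1, L))  # lip constriction: negative
```

Widening the front half of the tube by 10% raises F1 by about 0.1/π ≈ 3.2%, constricting near the lips lowers it, and a uniform area change leaves it unchanged, consistent with lossless-tube formants depending only on area ratios.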

  36. Formant Frequencies of Vowels, from Peterson & Barney (1952)

  37. Summary
    • The acoustic wave equation is easiest to solve in the frequency domain, for example:
        • Solve two boundary-condition equations for P+ and P-, or
        • Solve the two-tube model (four equations in four unknowns)
    • Quarter-Wave Resonator: open at one end, closed at the other
        • Schwa or /ʌ/ (“a tug”)
        • Front cavity resonance of a fricative or stop
    • Half-Wave Resonator: closed at the glottis, nearly closed at the lips
        • /uw/
    • Two-Tube Models
        • Exact solution: use the reflection coefficient
        • Approximate solution: decouple the tubes and solve each separately
    • Helmholtz Resonator
        • When the two-tube model seems to have resonances at 0 Hz, use the Helmholtz resonance frequency instead, computed with low-frequency approximations of acoustic impedance
        • /iy/: F1 is a Helmholtz resonance
        • /uw/ and /ow/: both F1 and F2 are Helmholtz resonances
    • Perturbation Theory
        • Perturbed area → perturbed formants
        • The sensitivity function explains most vowels and glides in one simple chart
