Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonolog

Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson jhasegaw@uiuc.edu University of Illinois at Urbana-Champaign, USA. Lecture 3: Spectral Dynamics and the Production of Consonants.

Presentation Transcript


  1. Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, jhasegaw@uiuc.edu, University of Illinois at Urbana-Champaign, USA

  2. Lecture 3: Spectral Dynamics and the Production of Consonants • International Phonetic Alphabet • Events in the Closure of a Nasal Consonant • Formant transitions: a perturbation model • Nasalized vowel • Nasal murmur • Events in the Release of a Stop Consonant • Pre-voicing (voiced stops in carefully read English) • Transient (stops and affricates) • Frication (stops, affricates, and fricatives) • Aspiration (aspirated stops and /h/) • Formant Transitions (any consonant-vowel transition) • Formant Tracking • Does it help Speech Recognition? • Methods for Vowels, and for Aspiration & Nasals • Reminder – lab 1 due Monday!

  3. International Phonetic Alphabet: Purpose and Brief History • Purpose of the alphabet: to provide a universal notation for the sounds of the world’s languages • “Universal” = If any language on Earth distinguishes two phonemes, IPA must also distinguish them • “Distinguish” = Meaning of a word changes when the phoneme changes, e.g. “cat” vs. “bat.” • Very Brief History: • 1867: Alexander Melville Bell publishes a distinctive-feature-based phonetic notation in “Visible Speech: The Science of Universal Alphabetics.” His notation is rejected as being too expensive to print • 1886: International Phonetic Association founded in Paris by phoneticians from across Europe • 1991: Unicode provides a standard method for including IPA notation in computer documents

  4. International Phonetic Alphabet: Vowels Pinyin ARPABET (Approx.) / u (zhu)/ UW o UH / oa/ OW / oAH / AO a (ma)AA Pinyin ARPABET (Approx.) i /u (xu) IY / UX EY EH a (zhang)AE a (ma) Pinyin:e ARPA:AX

  5. IPA: Regular Consonants Tongue Body Tongue Blade Q NG DX HH/HV R Y ARPABET: F/V (labiodental), TH/DH (dental), S/Z (alveolar), SH/ZH (postalveolar or palatal) Pinyin: s (alveolar), x (postalveolar), sh/r (retroflex)

  6. Affricates and Doubly-Articulated Consonants • Doubly-articulated consonants, ARPABET: WH, W • Affricates in English and Chinese (Pinyin, ARPABET, IPA): • Alveolar: Pinyin c/z, IPA ts/dz • Post-alveolar: Pinyin q/j, ARPABET CH/JH, IPA tʃ/dʒ • Retroflex: Pinyin ch/zh, IPA ʈʂ/ɖʐ

  7. Non-Pulmonic Consonants

  8. Events in the Closure of a Syllable-Final Nasal Consonant

  9. Events in the Closure of a Nasal Consonant Formant Transitions Vowel Nasalization Nasal Murmur

  10. Formant Transitions: A Perturbation Theory Model

  11. Formant Transitions: Labial Consonants. Examples: “the mom”, “the bug”

  12. Formant Transitions: Alveolar Consonants. Examples: “the supper”, “the tug”

  13. Formant Transitions: Post-alveolar Consonants. Examples: “the shoe”, “the zsazsa”

  14. Formant Transitions: Velar Consonants. Examples: “the gut”, “sing a song”

  15. Formant Transitions: A Perceptual Study The study: (1) Synthesize speech with different formant patterns, (2) record subject responses. Delattre, Liberman and Cooper, J. Acoust. Soc. Am. 1955.

  16. Perception of Formant Transitions: Conclusions

  17. Vowel Nasalization

  18. Vowel Nasalization

  19. Additive Terms in the Log Spectrum

  20. Transfer Function of a Nasalized Vowel

  21. Nasal Murmur. Examples: “the mug”, “the nut”, “sing a song”. Observations: • Low-frequency resonance (about 300 Hz) always present • Low-frequency resonance has wide bandwidth (about 150 Hz) • Energy of low-frequency resonance is very constant • Most high-frequency resonances cancelled by zeros • Different places of articulation have different high-frequency spectra • High-frequency spectrum is talker-dependent and variable
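The low-frequency murmur resonance described above (about 300 Hz, about 150 Hz bandwidth) can be illustrated with a two-pole digital resonator. This is only a sketch, not the model used in the lecture: it assumes the common mapping from an analog resonance to a digital pole (radius exp(-pi*B/fs), angle 2*pi*F/fs), a 16 kHz sampling rate, and the hypothetical function names `resonator_pole` and `magnitude_db`. It ignores the zeros and the higher resonances listed on the slide.

```python
import cmath
import math


def resonator_pole(freq_hz, bandwidth_hz, fs_hz):
    """Upper-half-plane pole of a two-pole resonator (assumed mapping)."""
    radius = math.exp(-math.pi * bandwidth_hz / fs_hz)
    angle = 2.0 * math.pi * freq_hz / fs_hz
    return cmath.rect(radius, angle)


def magnitude_db(pole, freq_hz, fs_hz):
    """Magnitude in dB of H(z) = 1 / ((1 - p z^-1)(1 - conj(p) z^-1))."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    denom = (1 - pole * z_inv) * (1 - pole.conjugate() * z_inv)
    return -20.0 * math.log10(abs(denom))


fs = 16000.0  # assumed sampling rate
murmur_pole = resonator_pole(300.0, 150.0, fs)

# Print the response at a few frequencies to see the low-frequency emphasis.
for f in (100.0, 300.0, 1000.0, 3000.0):
    print(f"{f:6.0f} Hz: {magnitude_db(murmur_pole, f, fs):6.1f} dB")
```

Because the bandwidth is wide relative to the center frequency, the response falls off gently around 300 Hz rather than forming a sharp peak.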

  22. Resonances of a Nasal Consonant Reference: Fujimura, JASA 1962

  23. Anti-Resonances of a Nasal Consonant

  24. Events in the Release of a Stop (Plosive) Consonant

  25. Events in the Release of a Stop “Burst” = transient + frication (the part of the spectrogram whose transfer function has poles only at the front cavity resonance frequencies, not at the back cavity resonances).

  26. Events in the Release of a Stop: Transient, Frication, Aspiration, Voicing. Examples: aspirated (/t/), unaspirated (/b/)

  27. Pre-voicing during Closure • To make a voiced stop in most European languages: the tongue root is relaxed, allowing it to expand, so that the vocal folds can continue vibrating for a little while after oral closure. • Result is a low-frequency “voice bar” that may continue well into closure. • In English, closure voicing is typical of read speech, but not of casual speech. Example: “the bug”

  28. Transient: The Release of Pressure

  29. Transfer Function During Transient and Frication: Poles • Turbulence striking an obstacle makes noise • Front cavity resonance frequency: F_R = c/(4 L_f), where c is the speed of sound and L_f is the length of the front cavity
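As a worked illustration of the quarter-wavelength formula F_R = c/(4 L_f), the sketch below evaluates the front cavity resonance for a few cavity lengths. The speed of sound (35400 cm/s, a typical value for warm, moist air) and the example lengths are assumptions chosen for illustration, not values from the lecture, and `front_cavity_resonance_hz` is a hypothetical helper name.

```python
def front_cavity_resonance_hz(front_cavity_length_cm,
                              speed_of_sound_cm_per_s=35400.0):
    """Quarter-wavelength resonance F_R = c / (4 * L_f), in Hz."""
    if front_cavity_length_cm <= 0:
        raise ValueError("front cavity length must be positive")
    return speed_of_sound_cm_per_s / (4.0 * front_cavity_length_cm)


# Illustrative (assumed) front cavity lengths in centimeters.
for length_cm in (1.0, 2.5, 5.0):
    f_r = front_cavity_resonance_hz(length_cm)
    print(f"L_f = {length_cm:.1f} cm -> F_R = {f_r:.0f} Hz")
```

The shorter the cavity in front of the constriction, the higher the resonance, which is why the burst spectrum carries information about place of articulation.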

  30. Transfer Function During Frication: An Important Zero

  31. Transfer Function During Frication: An Important Zero

  32. Transfer Function During Aspiration

  33. Are Formant Frequencies Useful for Speech Recognition? • Kopec and Bush (1992): WER(formants alone) > WER(cepstrum alone) > WER(formants and cepstrum together) • How should we track formants? • In vowels: Autoregressive (AR) modeling (also known as LPC) • In aspiration, nasals: Autoregressive Moving Average (ARMA) modeling. Problem: no closed-form solution • In aspiration, nasals: Exponentially Weighted Autoregressive (EWAR; Zheng and Hasegawa-Johnson, ICASSP 2004)

  34. Formant Tracking for Vowels: Autoregressive Model (LPC)
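A minimal sketch of formant estimation with an autoregressive (LPC) model is shown below. It uses the standard autocorrelation method and reads formant candidates off the roots of the prediction polynomial; it is not necessarily the exact procedure presented in the lecture. The function names, the bandwidth threshold, and the synthetic test signal (resonances at 500 Hz and 1500 Hz, 8 kHz sampling rate, white-noise excitation) are all assumptions made for illustration.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC. Returns a with a[0] = 1, A(z) = sum a[k] z^-k."""
    w = frame * np.hamming(len(frame))
    n = len(w)
    # Autocorrelation lags 0..order.
    r = np.array([np.dot(w[: n - k], w[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def formants_from_lpc(a, fs_hz, max_bandwidth_hz=400.0):
    """Convert LPC poles to formant candidates (Hz), sorted ascending."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(poles) * fs_hz / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs_hz / np.pi
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])


def synthesize_all_pole(excitation, a):
    """Filter excitation through 1/A(z) by direct recursion."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


# Synthetic "vowel": two assumed resonances (frequency, bandwidth) in Hz.
fs = 8000.0
resonances = [(500.0, 60.0), (1500.0, 90.0)]
pole_list = []
for f, b in resonances:
    p = np.exp(-np.pi * b / fs) * np.exp(2j * np.pi * f / fs)
    pole_list += [p, np.conj(p)]
true_a = np.real(np.poly(pole_list))

rng = np.random.default_rng(0)
signal = synthesize_all_pole(rng.standard_normal(4000), true_a)

estimated = formants_from_lpc(lpc_coefficients(signal, order=4), fs)
print("estimated formants (Hz):", np.round(estimated, 1))
```

The model order here matches the number of poles in the synthetic signal; for real speech a higher order is normally needed, and the all-pole assumption degrades in nasals and aspiration, which is the motivation for the ARMA and EWAR models on the following slides.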

  35. Formant Tracking for Aspiration: “Auto-Regressive Moving Average” Model (ARMA)

  36. Formant Tracking for Aspiration: “Exponentially Weighted Auto-Regressive” Model (EWAR) (Zheng and Hasegawa-Johnson, ICSLP 2004)

  37. Solving the EWAR Model

  38. Results: Stop Classification, MFCC alone vs. MFCC+formants

  39. Results: Stop Classification, MFCC alone vs. MFCC+formants

  40. Summary • International Phonetic Alphabet: • Useful on any computer with Unicode • International encoding for all sounds of the world’s languages • Events in a nasal closure: • Formant transitions (perturbation model) • Vowel nasalization (sum of TFs) • Nasal murmur (impedance match at juncture) • Events in release of a stop: • Pre-voicing in English voiced stops (read speech) • Transient (dp/dt ~ dA/dt) • Frication ((zero at f=0)/(front cavity resonances)) • Aspiration ((zero at f=0)/(same poles as the vowel)) • Formant tracking • In a vowel: use LPC • In aspiration, frication, or nasal murmur: ARMA is theoretically optimal, but computationally expensive • Aspiration, etc.: EWAR can be a good approximation to ARMA
