
Sarah Borys

Recognition of Prosodic Factors and Detection of Landmarks for Improvements to Continuous Speech Recognition. Sarah Borys. Motivation. This research attempts to address some of the problems in modern speech recognition through the use of prosody and acoustic landmarks. Speech Recognition.



Presentation Transcript


  1. Recognition of Prosodic Factors and Detection of Landmarks for Improvements to Continuous Speech Recognition Sarah Borys

  2. Motivation • This research attempts to address some of the problems in modern speech recognition through the use of prosody and acoustic landmarks.

  3. Speech Recognition problems associated with the proposal

  4. Word and Phone Transcriptions
     Word transcription (start time, end time, word):
       5800000 6800000 seventy
       6800000 13000000 six
     Phone transcription (start time, end time, phone):
       5800000 6800000 s
       6800000 7600000 eh
       7600000 7900000 v
       7900000 8300000 en
       8300000 8600000 t
       8600000 9300000 iy
       9300000 10700000 s
       10700000 11900000 ih
       11900000 12200000 k
       12200000 13000000 s
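The phone-level lines on slide 4 follow a "start end label" layout. A minimal sketch for reading them is given here, assuming (the slide does not say so) that the times are in units of 100 ns, as in HTK label files. The names `parse_labels` and `UNITS_PER_SECOND` are hypothetical.

```python
# Assumption: times are in 100 ns units (HTK label convention); not stated on the slide.
UNITS_PER_SECOND = 10_000_000

PHONE_LABELS = """\
5800000 6800000 s
6800000 7600000 eh
7600000 7900000 v
7900000 8300000 en
8300000 8600000 t
8600000 9300000 iy
9300000 10700000 s
10700000 11900000 ih
11900000 12200000 k
12200000 13000000 s
"""


def parse_labels(text):
    """Return a list of (start_seconds, end_seconds, label) tuples."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, label = line.split()
        segments.append(
            (int(start) / UNITS_PER_SECOND, int(end) / UNITS_PER_SECOND, label)
        )
    return segments


segments = parse_labels(PHONE_LABELS)
for start, end, label in segments:
    print(f"{label:>2}  {start:.2f}-{end:.2f} s  (duration {end - start:.2f} s)")
```

Under the 100 ns assumption, the ten phones run back to back with no gaps between consecutive segments.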

  5. Phonemes • Vowels aa ae iy ow ey uw • Consonants p t k b d g s sh z th n l

  6. Prosody • Changes word meaning. • Changes the way phones sound

  7. Prosody • Wanted. Chief justice of the Massachusetts supreme court.

  8. Prosody • Wanted. Chief justice of the Massachusetts supreme court • Intonation - [Wanted].[Chief justice of the Massachusetts supreme court].

  9. Prosody • Wanted. Chief justice of the Massachusetts supreme court • Intonation - [Wanted].[Chief justice of the Massachusetts supreme court]. • Pitch Accent - Wanted. Chief justice of the Massachusetts supreme court.

  10. Prosody • Wanted. Chief justice of the Massachusetts supreme court • Intonation - [Wanted].[Chief justice of the Massachusetts supreme court]. • Pitch Accent - Wanted. Chief justice of the Massachusetts supreme court. • Function - Wanted. Chief justice of the Massachusetts supreme court.

  11. Prosody • Wanted. Chief justice of the Massachusetts supreme court • Intonation - [Wanted].[Chief justice of the Massachusetts supreme court]. • Pitch Accent - Wanted. Chief justice of the Massachusetts supreme court. • Function - Wanted. Chief justice of the Massachusetts supreme court. • Co-Articulation - Wanted. Chief justice of the Massachusetts supreme court.

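  12. Hidden Markov Models [Figure: three-state left-to-right HMM with states 1, 2, 3; self-transitions a11, a22, a33; forward transitions a12, a23; and skip transition a13]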
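The topology on slide 12 can be written as a transition matrix. The sketch below uses made-up probabilities (the slide gives only the symbols a11, a12, and so on), and it treats state 3 as absorbing because no exit arc is listed. `TRANSITIONS` and `path_probability` are hypothetical names.

```python
# Hypothetical transition probabilities; the slide only names the arcs.
# Row i holds the probabilities of moving from state i+1 to states 1, 2, 3.
TRANSITIONS = [
    [0.6, 0.3, 0.1],  # a11, a12, a13 (a13 skips state 2)
    [0.0, 0.7, 0.3],  # a22, a23 (no backward transition)
    [0.0, 0.0, 1.0],  # a33 (assumed absorbing: no exit arc is shown)
]


def path_probability(path, transitions=TRANSITIONS):
    """Probability of a state path (0-based indices), given that it starts in path[0]."""
    prob = 1.0
    for prev, nxt in zip(path, path[1:]):
        prob *= transitions[prev][nxt]
    return prob


# A path that stays in state 1 once, then moves 1 -> 2 -> 3.
print(path_probability([0, 0, 1, 2]))
# A path that tries to move backwards (3 -> 2) has probability zero.
print(path_probability([0, 2, 1]))
```

The zeros below the diagonal are what make the model left-to-right: a state sequence can stay, advance, or skip ahead, but never go back.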

  13. Experiments • Phone splitting • Landmark detection

  14. Phone Splitting • Determine the log likelihoods of phonetic and prosodic splits • Compare the log likelihoods

  15. Prosodic Contexts • Independent • Phrase final • Phrase initial • Accent • Function

  16. Phonetic Contexts • Left fricative • Right fricative • Left stop • Right stop • Left nasal • Right nasal • Left liquid • Right liquid • Left vowel • Right vowel • Example: in "f ow n", the phone "ow" has a left fricative and a right nasal
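The context labels on slide 16 can be derived from a phone's neighbours. The sketch below assumes a small phone-to-class table built from the inventory on slide 5, plus "f" and "m", which appear elsewhere in the deck. `PHONE_CLASS` and `phonetic_contexts` are hypothetical names, not the author's code.

```python
# Assumed manner classes; the phone lists follow slide 5, with "f" and "m" added.
CLASS_MEMBERS = {
    "fricative": "f s sh z th",
    "stop": "p t k b d g",
    "nasal": "m n",
    "liquid": "l",
    "vowel": "aa ae iy ow ey uw",
}
PHONE_CLASS = {
    phone: manner
    for manner, phones in CLASS_MEMBERS.items()
    for phone in phones.split()
}


def phonetic_contexts(left, right):
    """Return the context labels for a phone with the given left and right neighbours."""
    labels = []
    if left in PHONE_CLASS:
        labels.append(f"left {PHONE_CLASS[left]}")
    if right in PHONE_CLASS:
        labels.append(f"right {PHONE_CLASS[right]}")
    return labels


# The slide's example: "ow" in the sequence f ow n.
print(phonetic_contexts("f", "n"))
```

Neighbours missing from the table (for example silence) simply contribute no label.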

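  17. Log Likelihood • Independent log likelihood $= \log p(O \mid \lambda_1)$, where $O = [o_1, o_2, \ldots, o_N]$ are the short-time spectra and $\lambda$ denotes the model parameters • Context log likelihood $= \log \left[ p([o_1 \ldots o_M] \mid \lambda_2)\, p([o_{M+1} \ldots o_N] \mid \lambda_3) \right]$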
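The two quantities on slide 17 can be compared with a toy example. This sketch swaps in a single one-dimensional Gaussian for each model λ (the presentation uses HMMs over short-time spectra), so it only illustrates the comparison, not the actual experiment. All function names and the data are hypothetical.

```python
import math


def fit_gaussian(frames):
    """Maximum-likelihood mean and variance of scalar frames (variance floored)."""
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    return mean, max(var, 1e-6)


def gaussian_log_likelihood(frames, mean, var):
    """Sum of log N(x; mean, var) over the frames."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var) for x in frames
    )


def independent_log_likelihood(frames):
    """log p(O | lambda_1): one model for all N frames."""
    return gaussian_log_likelihood(frames, *fit_gaussian(frames))


def context_log_likelihood(frames, m):
    """log [p(o_1..o_M | lambda_2) p(o_M+1..o_N | lambda_3)]: one model per context."""
    first, second = frames[:m], frames[m:]
    return gaussian_log_likelihood(
        first, *fit_gaussian(first)
    ) + gaussian_log_likelihood(second, *fit_gaussian(second))


# Made-up scalar "frames": the first four and last four come from different contexts.
frames = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
print(independent_log_likelihood(frames))
print(context_log_likelihood(frames, 4))
```

Because the product of the two likelihoods becomes a sum in the log domain, the context log likelihood is just the sum of the per-context log likelihoods.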

  18. Post Processing • Example: The phone “ow”

  19. Post Processing • Calculate the weighted average as follows: $WA = \frac{98}{115}(-832.69507) + \frac{17}{115}(-1315.40808)$

  20. Post Processing • If WA > LL, then split into prosodic allophones • If WA ≤ LL, then do not split into prosodic allophones • For "ow": WA = -904.05264 > LL = -986.11096, so the phone is split
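The post-processing rule on slides 19 and 20 can be reproduced with the numbers given for "ow". In this sketch the weights 98 and 17 are interpreted as per-context counts, which is an assumption, and `weighted_average` and `should_split` are hypothetical names.

```python
def weighted_average(log_likelihoods, counts):
    """Average of the per-context log likelihoods, weighted by the given counts."""
    total = sum(counts)
    return sum(c * ll for ll, c in zip(log_likelihoods, counts)) / total


def should_split(wa, ll):
    """Split into prosodic allophones only if the weighted average beats LL."""
    return wa > ll


# Values from the slides; 98 and 17 are assumed to be counts per context.
wa = weighted_average([-832.69507, -1315.40808], [98, 17])
ll = -986.11096

print(f"WA = {wa:.4f}, LL = {ll:.4f}, split = {should_split(wa, ll)}")
```

The computed weighted average agrees with the -904.05264 shown on slide 20 to within rounding, so the rule selects a split for this phone.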

  21. Results • Almost all phonetic context splits result in log likelihood improvements for all phones. • Prosodic splits also cause the log likelihood to improve.

  22. Word Recognition Accuracy

  23. Experiments • Phone splitting • Landmark detection

  24. Distinctive Features • b [-sonorant, -continuant, +lips] • s [-sonorant, +continuant, +blade] • m [+sonorant, -continuant, +lips] • y [+sonorant, +continuant, +blade] • aa [+sonorant, +continuant, +syllabic, +low, -front]
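The feature bundles on slide 24 map naturally onto a lookup table. The sketch below stores them as dictionaries and reports which features distinguish two phones. `FEATURES` and `differing_features` are hypothetical names, and a feature specified for only one of the two phones is counted as a difference, which is a design choice of this sketch.

```python
# Feature bundles copied from slide 24 ("+" / "-" values).
FEATURES = {
    "b": {"sonorant": "-", "continuant": "-", "lips": "+"},
    "s": {"sonorant": "-", "continuant": "+", "blade": "+"},
    "m": {"sonorant": "+", "continuant": "-", "lips": "+"},
    "y": {"sonorant": "+", "continuant": "+", "blade": "+"},
    "aa": {
        "sonorant": "+",
        "continuant": "+",
        "syllabic": "+",
        "low": "+",
        "front": "-",
    },
}


def differing_features(phone_a, phone_b):
    """Names of features whose values differ or that only one phone specifies."""
    fa, fb = FEATURES[phone_a], FEATURES[phone_b]
    return {name for name in fa.keys() | fb.keys() if fa.get(name) != fb.get(name)}


print(differing_features("b", "m"))          # differ only in sonorant
print(sorted(differing_features("b", "s")))  # continuancy and place of articulation
```

In this table "b" and "m" share place and continuancy and differ only in [sonorant], which is the kind of contrast the manner-feature classifiers have to detect.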

  25. Distinctive Features and Landmarks • In order to correctly detect and recognize speech, the different manner features must be correctly recognized and classified. • In order to classify manner features correctly, landmarks need to be correctly identified.

  26. Landmarks problems associated with the proposal

  27. Distinctive Features and Landmarks • Speech • Consonantal • Continuant • Sonorant • Syllabic

  28. Support Vector Machines • Given a set of N observations $y_1, \ldots, y_N$ • Each $y_i$ is associated with some $x_i \in \mathbb{R}^n$ for $1 \le i \le N$ • Observation pairs $(x_i, y_i)$ are drawn from an unknown probability distribution $P(x, y)$ • Is there an optimal function F that can learn the mapping $x_i \to y_i$?
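One way to learn such a mapping is a linear support vector machine. The slide does not say which kernel or training method was used, so the sketch below is only an illustration: it trains a linear SVM by subgradient descent on the regularised hinge loss, using made-up two-dimensional data with labels in {+1, -1}. All names and hyperparameters are hypothetical.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=100):
    """Subgradient descent on the L2-regularised hinge loss; returns (w, b)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: move towards classifying x correctly.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regulariser shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """F(x): the sign of the decision function, as +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable toy data: x_i in R^2, y_i in {+1, -1}.
samples = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print([predict(w, b, x) for x in samples])
```

For a landmark detector, each $x_i$ would be an acoustic feature vector and each $y_i$ a binary label for one distinctive feature; that pairing is an interpretation of the deck, not something the slide spells out.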

  29. Results

  30. Results

  31. Conclusions • The incorporation of prosody into the phoneme model leads to significant improvements in speech recognition rates. • Landmarks are detectable with high accuracy and have the potential to improve recognition rates.

  32. Future Work • Prosody and conversational speech • Prosody and landmark detection • Landmark detection and continuous speech recognition
