
Detection of Vowel Onset Point in Speech

S.R. Mahadeva Prasanna & Jinu Mariam Zachariah, Department of Computer Science & Engineering, Indian Institute of Technology Madras, India. Email: {prasanna,jinu}@cs.iitm.ernet.in http://speech.cs.iitm.ernet.in


Presentation Transcript


  1. Detection of Vowel Onset Point in Speech. S.R. Mahadeva Prasanna & Jinu Mariam Zachariah, Department of Computer Science & Engineering, Indian Institute of Technology Madras, India. Email: {prasanna,jinu}@cs.iitm.ernet.in http://speech.cs.iitm.ernet.in

  2. Objective & Organization
     • Significance of VOP for speech analysis
     • Issues in VOP detection
     • Acoustic-phonetic description of VOP
     • Acoustic cues for VOP detection
     • Detection of VOP in speech
     • Begin-end detection using VOP
     • Summary and conclusions

  3. Significance of VOP for Speech Analysis
     • Syllabification
     • Begin-end detection
     • Speech recognition
     • Speech enhancement
     • Vowel/Non-vowel classification

  4. Issues in VOP Detection
     • CVs with voiced consonants
     • Nasals, semivowels and aspirated sounds

  5. Acoustic-Phonetic Description of VOP
     • Changes in excitation source and vocal tract system characteristics
     • C and V regions in terms of acoustic features
     Acoustic Cues for VOP Detection
     • Formant transition (Ftr(t))
     • Epoch intervals (Ei(t))
     • Strength of instants (Si(t))
     • Itakura distance (Id(t))
     • Ratio of signal energy to residual energy (Sr(t))
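One of the cues listed on this slide, the ratio of signal energy to residual energy Sr(t), can be illustrated with a short sketch. The slide does not say how the residual is obtained, so the code below assumes a linear prediction (LP) residual computed per frame with the autocorrelation method. The function names, frame size, hop, LP order, and synthetic test signal are all hypothetical choices made for illustration, not the authors' settings.

```python
import numpy as np

def lp_residual(frame, order=10):
    """LP residual of one frame (autocorrelation method; an assumed choice)."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations, with a tiny ridge for numerical stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + (1e-9 * r[0] + 1e-12) * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Inverse filter A(z) = 1 - sum_k a[k] z^-k applied to the frame.
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[:n]

def energy_ratio_contour(signal, fs, frame_ms=20.0, hop_ms=10.0, order=10):
    """Frame-wise ratio of signal energy to LP residual energy."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hamming(frame_len)
    ratios = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        residual = lp_residual(frame, order)
        ratios.append(np.sum(frame ** 2) / (np.sum(residual ** 2) + 1e-12))
    return np.array(ratios)

# Synthetic stand-in for a CV unit: a noise-like segment followed by a
# strongly periodic, vowel-like segment (hypothetical test signal).
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(fs // 4) / fs
noise = 0.1 * rng.standard_normal(t.size)
tone = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
        + 0.01 * rng.standard_normal(t.size))
signal = np.concatenate([noise, tone])
sr = energy_ratio_contour(signal, fs)
```

In this toy signal the ratio stays close to 1 in the noise-like part, where linear prediction removes little energy, and rises sharply in the periodic part. A rise of this kind is what makes such a contour a plausible cue for a vowel onset.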

  6. Manual VOP Detection

  7. VOP Detection in Isolated CVs: Performance

  8. VOP Detection in Continuous Speech

  9. Performance of VOP Detection Algorithm: Clean Speech, Degraded Speech

  10. Begin-End Detection using VOP
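The slide gives no detail on how the VOPs are turned into utterance boundaries. As a rough sketch only, one simple possibility is to anchor the begin point slightly before the first detected VOP and the end point slightly after the last one. The function name and both margins below are hypothetical and are not taken from the presentation.

```python
def begin_end_from_vops(vop_times, duration, pre_margin=0.10, post_margin=0.30):
    """Estimate utterance begin/end (seconds) from detected VOP times.

    Assumption: the first and last VOPs anchor the utterance, and the fixed
    margins cover the leading consonant and the trailing vowel.
    """
    if not vop_times:
        return None  # no VOPs detected, so no speech region is proposed
    begin = max(0.0, min(vop_times) - pre_margin)
    end = min(duration, max(vop_times) + post_margin)
    return begin, end

# Hypothetical VOP times (seconds) in a 2-second recording.
region = begin_end_from_vops([0.52, 0.91, 1.40], duration=2.0)
```

Because the boundaries depend on detected VOPs rather than on an energy threshold, a scheme like this is less likely to be triggered by background noise that contains no vowel-like events.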

  11. Summary & Conclusions
     • Automatic VOP detection algorithm using source features
     • Acoustic-phonetic description for VOP
     • Acoustic cues for VOP detection based on instants of significant excitation
     • Begin-end detection using the knowledge of VOP
     • Need for a method to combine spectral features with proposed source features for robust detection of VOP

  12. Paper # 1410: Detection of Vowel Onset Point in Speech

  13. S.R. Mahadeva Prasanna & Jinu Mariam Zachariah

  14. Department of Computer Science & Engineering Indian Institute of Technology Madras, India Email: {prasanna,jinu}@cs.iitm.ernet.in http://speech.cs.iitm.ernet.in

  15. ABSTRACT Sound units in many languages are syllabic in nature, and frequently used syllables are of the consonant-vowel (CV) type. The vowel onset point (VOP) is an important event in CV units. Knowledge of VOPs helps in many applications, such as speech recognition, speaker recognition, speech enhancement, begin-end detection, segmentation of speech into vowel-like and nonvowel-like units, and finding the duration of vowels. In this paper we describe parameters (features) useful for manually identifying the VOPs for different types of CV units. We propose an automatic algorithm for detecting VOPs in continuous speech, motivated by the nature of speech production and perception. The speech signal is the result of exciting a time-varying vocal tract system with a time-varying excitation. Changes in both the source and the system characteristics around the VOP are useful for detecting VOPs; in this paper we use the changes in the source characteristics. The performance of the proposed algorithm is evaluated on 25 sentences in which a total of 236 VOPs were identified manually. Of these, 216 VOPs (about 91.5%) were detected within a resolution of ±30 ms. Compared to the energy-based approach, VOP-based begin-end detection significantly improves the performance of a text-dependent speaker verification system. For a telephone database of 32 speakers, consisting of 480 genuine utterances and 1984 impostor utterances, the equal error rate (EER) of the system improved from 5.9% to 2.6%.
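The evaluation described in the abstract counts a manually marked VOP as detected when an automatic detection falls within ±30 ms of it. A minimal sketch of such scoring follows. The one-to-one matching rule, the function name, and the example times are assumptions made for illustration, since the abstract does not spell out the matching procedure.

```python
def score_vops(reference, detected, tolerance=0.030):
    """Match detections to reference VOPs within +/- tolerance seconds.

    Assumption: each detection may be matched to at most one reference VOP.
    Returns (hits, misses, spurious).
    """
    detected = sorted(detected)
    used = [False] * len(detected)
    hits = 0
    for ref in sorted(reference):
        best = None
        for i, det in enumerate(detected):
            if used[i] or abs(det - ref) > tolerance:
                continue
            if best is None or abs(det - ref) < abs(detected[best] - ref):
                best = i
        if best is not None:
            used[best] = True
            hits += 1
    return hits, len(reference) - hits, used.count(False)

# Hypothetical example: three manual VOPs, four automatic detections.
result = score_vops([0.10, 0.45, 0.80], [0.12, 0.50, 0.79, 1.20])

# Detection rate implied by the figures reported in the abstract.
detection_rate = round(100 * 216 / 236, 1)
```

With the reported counts, 216 of 236 VOPs corresponds to a detection rate of about 91.5%, leaving 20 manually marked VOPs without a detection inside the ±30 ms window.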
