
Stress Detection

J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan Univ. http://mirlab.org/jang. Stress detection (SD) for English: given an English word and its pronunciation, detect the stress position of the pronunciation.


Presentation Transcript


  1. Stress Detection J.-S. Roger Jang (張智星) MIR Lab, CSIE Dept., National Taiwan Univ. http://mirlab.org/jang

  2. Intro to Stress Detection • Stress detection (SD) for English • Given an English word and its pronunciation • Detect the stress position of the pronunciation • Applications • Computer-assisted pronunciation training (CAPT) • Similar to… • Tone recognition in Mandarin Chinese • Intonation scoring

  3. Examples of Stress in English Words • Every multisyllabic English word has a stressed syllable • Examples • Dictionary: stressed at syllable 1 • Tomorrow: stressed at syllable 2 • International: stressed at syllable 3

  4. Steps in Stress Detection • Preprocessing • Use forced alignment to find vowel locations • Feature extraction • Extract features for each vowel • Model construction • Build a classifier for vowel-based stress detection • Postprocessing • Combine the vowel-level results into a word-based stress decision

  5. Forced Alignment (1/2) • A process that aligns an utterance with its corresponding canonical phone sequence • Example: International
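The alignment idea can be sketched as a small dynamic program. This is a toy illustration, not the ASRA engine: it assumes a hypothetical table `loglik[t][p]` that already holds the log-likelihood of frame `t` under the `p`-th phone of the canonical sequence, and it forces the frames through the phones in order, with at least one frame per phone.

```python
def forced_align(loglik):
    """Align T frames to P phones in a fixed left-to-right order.

    loglik[t][p] is an assumed log-likelihood of frame t under phone p.
    Returns one inclusive (start_frame, end_frame) pair per phone.
    """
    T, P = len(loglik), len(loglik[0])
    if T < P:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    best = [[NEG] * P for _ in range(T)]          # best path score ending at (t, p)
    advanced = [[False] * P for _ in range(T)]    # True if path entered p at frame t
    best[0][0] = loglik[0][0]                     # the path must start in phone 0
    for t in range(1, T):
        for p in range(P):
            stay = best[t - 1][p]
            move = best[t - 1][p - 1] if p > 0 else NEG
            if move > stay:
                best[t][p] = move + loglik[t][p]
                advanced[t][p] = True
            else:
                best[t][p] = stay + loglik[t][p]
    # Backtrace from the last frame, which must belong to the last phone.
    bounds = [[0, 0] for _ in range(P)]
    p = P - 1
    bounds[p][1] = T - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][p]:
            bounds[p][0] = t
            p -= 1
            bounds[p][1] = t - 1
    return [tuple(b) for b in bounds]


# Toy scores: frames 0-1 favor phone 0, frames 2-4 favor phone 1.
toy = [[0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
print(forced_align(toy))
```

Once each phone has a frame range, the vowel segments needed for stress detection are simply the ranges of the vowel phones.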

  6. Forced Alignment (2/2) • Applications of forced alignment • Speech scoring (based on timbre only) • Utterance verification • Our forced alignment engine • ASRA (Automatic Speech Recognition & Assessment): For voice command recognition and speech assessment (scoring)

  7. Corpora for Stress Detection • Merriam-Webster dictionary • Website • Some statistics • No. of pronunciations: 21950 • Usable files: 14994, each satisfying • No. of syllables > 1 • Available in our dictionary • Valid output from ASRA • In-house recordings • Recordings from MSAR for several years • Available upon request

  8. Speech Corpus for Lexical Stress Detection • Merriam-Webster Online Dictionary's lexical pronunciations • http://www.merriam-webster.com • All utterances are pronounced by native speakers

  9. Stress Detection based on Vowel Classification • SD is based on vowel classification due to the following observations • Each word has a stressed syllable • Each syllable is usually composed of a consonant and a vowel • Vowels are always voiced (have pitch) • Therefore • Each vowel is classified as "unstressed" or "stressed" • To determine the stressed syllable in an utterance, pick the vowel with one of • Max likelihood of the class "Stressed" • Min likelihood of the class "Unstressed" • Max difference of the above two
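The three word-level decision rules above can be sketched as follows. The function name, the criterion labels, and the per-vowel scores are hypothetical; the sketch also assumes that "difference" means the stressed log-likelihood minus the unstressed one, which the slide does not spell out.

```python
def pick_stressed_vowel(vowel_scores, criterion="diff"):
    """Return the 1-based index of the vowel judged to carry the stress.

    vowel_scores holds one (loglik_stressed, loglik_unstressed) pair per
    vowel, in syllable order. The scores are assumed to come from a
    vowel-level classifier.
    """
    if criterion == "max_stressed":
        keys = [s for s, _ in vowel_scores]
    elif criterion == "min_unstressed":
        keys = [-u for _, u in vowel_scores]      # smallest u wins
    elif criterion == "diff":
        keys = [s - u for s, u in vowel_scores]   # assumed: stressed minus unstressed
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return max(range(len(keys)), key=keys.__getitem__) + 1


# Made-up scores for a three-syllable word.
scores = [(-3.0, -1.0), (-1.5, -4.0), (-1.0, -1.2)]
for rule in ("max_stressed", "min_unstressed", "diff"):
    print(rule, pick_stressed_vowel(scores, rule))
```

With these made-up scores the first rule picks syllable 3 while the other two pick syllable 2, which shows that the choice of rule can change the word-level result.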

  10. Features for Vowels • Vowel-based features • Pitch: min, mean, max, range, std, slope, etc. • Volume: min, mean, max, range, std, slope, etc. • Duration (normalized by speech rate) • Legendre polynomial fitting for pitch & volume • Spectrally emphasized versions of the above • …
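A minimal sketch of two of the feature families above, computed on one vowel's frame-level contour (pitch or volume). The function names, the choice of population standard deviation, the frame-index time axis, and the Legendre order of 2 are all assumptions; the slides do not state the exact definitions used.

```python
from statistics import mean, pstdev


def contour_stats(values):
    """Summary statistics of one vowel's contour, one value per frame."""
    n = len(values)
    m = mean(values)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    # Least-squares slope against the frame index (0.0 for a single frame).
    slope = (sum((t - t_mean) * (v - m) for t, v in enumerate(values)) / denom
             if n > 1 else 0.0)
    return {"min": min(values), "mean": m, "max": max(values),
            "range": max(values) - min(values),
            "std": pstdev(values), "slope": slope}


def legendre_fit(values, order=2):
    """Least-squares Legendre coefficients with the contour mapped to [-1, 1]."""
    n = len(values)
    if n < 2 or n <= order:
        raise ValueError("contour too short for the requested order")
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]

    def basis(x):
        # Bonnet recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
        ps = [1.0, x]
        for k in range(1, order):
            ps.append(((2 * k + 1) * x * ps[k] - k * ps[k - 1]) / (k + 1))
        return ps[:order + 1]

    B = [basis(x) for x in xs]
    m = order + 1
    # Augmented normal equations (B^T B) c = B^T y.
    A = [[sum(r[i] * r[j] for r in B) for j in range(m)]
         + [sum(r[i] * y for r, y in zip(B, values))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


pitch = [100, 110, 120, 130]   # made-up pitch values for one vowel
print(contour_stats(pitch))
print(legendre_fit(pitch))
```

The low-order Legendre coefficients roughly capture the level, tilt, and curvature of the contour, so they describe its shape independently of the vowel's length in frames.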

  11. Lexical Stress Detection – Experiment 1 • 10-fold cross-validation • Classifier: SVM • Feature set • E: Root mean square energy • D: Duration • P: Pitch • S: Root mean square spectral emphasis energy • PS: Pitch slope • CE: Legendre coefficients of the root mean square energy contour • CP: Legendre coefficients of the pitch contour • CS: Legendre coefficients of the spectral emphasis energy contour

  12. Lexical Stress Detection – Experiment 2 • 10-fold cross-validation • Classifier: SVM • Syllable-number-independent classifier vs. syllable-number-dependent classifier

  13. Lexical Stress Detection – Experiment 3 • 10-fold cross-validation • GMMC: Gaussian mixture model classifier • NBC: Naïve Bayes classifier • QC: Quadratic classifier • SVMC: Support vector machine classifier
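The evaluation protocol shared by the three experiments can be sketched as below, using a Gaussian naïve Bayes classifier as a stand-in for the NBC entry. Everything here is illustrative: the function names, the variance floor, the random fold assignment, and the toy data are assumptions, and no accuracy from the original experiments is reproduced.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_gaussian_nb(X, y):
    """Per-class log prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        # The variance floor (an assumption) avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in d) / len(d), 1e-6)
                     for d, m in zip(dims, means)]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def score(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=lambda label: score(model[label]))


def cross_validate(X, y, k=10, seed=0):
    """Overall accuracy when every sample is predicted by a model not trained on it."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_gaussian_nb(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Made-up, well-separated vowel feature vectors, for illustration only.
X = [[0.01 * i, 1.0] for i in range(20)] + [[10 + 0.01 * i, 5.0] for i in range(20)]
y = ["unstressed"] * 20 + ["stressed"] * 20
print(cross_validate(X, y))
```

Swapping in another classifier only requires replacing the `fit_gaussian_nb` and `predict_gaussian_nb` pair; the fold handling stays the same.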

  14. Lexical Stress Detection – Error Analysis • Error types: • Wrong ground truth / more than one pronunciation of the word • conduct2[kənˋdʌkt] / [ˋkɑndʌkt] • Compound word with two primary stressed syllables • worldwide2[ˋwɝldˋwaɪd] • histochemistry5[ˋhɪstəˋkɛmɪstrɪ] • Word with both a primary and a secondary stressed syllable • deposition4[͵dɛpəˋzɪʃən] • cafeteria5[͵kæfəˋtɪrɪə]

  15. Lexical Stress Detection – Error Analysis • Error Types: • Wrong result from Pitch Tracking • elegant3[ˋɛləgənt] • Wrong result from Forced Alignment • peremptory4[pəˋrɛmptərɪ]

  16. More on Stress Detection • ASRA • Chapter 20 of online tutorial on Audio Signal Processing • Demo • Recognition • goDemoVc.m in ASR • Web • Assessment • goDemoSa.m in ASR • Web • Stress detection • Application note • Demo
