Stress Detection J.-S. Roger Jang (張智星) MIR Lab, CSIE Dept., National Taiwan Univ. http://mirlab.org/jang
Intro to Stress Detection • Stress detection (SD) for English • Given an English word and its pronunciation • Detect the stress position of the pronunciation • Applications • Computer-assisted pronunciation training (CAPT) • Similar to… • Tone recognition in Mandarin Chinese • Intonation scoring
Examples of Stress in English Words • Each multisyllabic English word has a stressed syllable • Examples • Dictionary: stressed at syllable 1 • Tomorrow: stressed at syllable 2 • International: stressed at syllable 3
Steps in Stress Detection • Preprocessing • Use forced alignment to find vowel locations • Feature extraction • Extract features for each vowel • Model construction • Build a classifier for vowel-based stress detection • Postprocessing • Combine the vowel-based results into a word-based stress decision
Forced Alignment (1/2) • A process that aligns an utterance with its corresponding canonical phonetic transcription • Example: International
Forced Alignment (2/2) • Applications of forced alignment • Speech scoring (based on timbre only) • Utterance verification • Our forced alignment engine • ASRA (Automatic Speech Recognition & Assessment): For voice command recognition and speech assessment (scoring)
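The preprocessing step uses the alignment only to locate vowels. A minimal sketch of that lookup follows, assuming the aligner returns a list of (phone, start, end) tuples. The tuple format, the ARPAbet-style phone labels, and all timings are hypothetical and chosen for illustration; the slides do not describe ASRA's actual output format.

```python
# Hypothetical vowel inventory using ARPAbet-style labels (an assumption,
# not ASRA's phone set).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def vowel_segments(alignment):
    """Return (syllable_index, phone, start, end) for each vowel, in order.

    Assumes one vowel per syllable, so the n-th vowel marks syllable n.
    """
    segments = []
    for phone, start, end in alignment:
        if phone.lower() in VOWELS:
            segments.append((len(segments) + 1, phone, start, end))
    return segments


# Made-up alignment for "international" (times in seconds).
alignment = [
    ("ih", 0.00, 0.07), ("n", 0.07, 0.12), ("t", 0.12, 0.17),
    ("er", 0.17, 0.26), ("n", 0.26, 0.31), ("ae", 0.31, 0.45),
    ("sh", 0.45, 0.54), ("ax", 0.54, 0.60), ("n", 0.60, 0.65),
    ("ax", 0.65, 0.71), ("l", 0.71, 0.80),
]
segments = vowel_segments(alignment)
for index, phone, start, end in segments:
    print(index, phone, start, end)
```

Each returned segment gives the time span from which the vowel features are later extracted.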
Corpora for Stress Detection • Merriam-Webster dictionary • Website • Some statistics • No. of pronunciations: 21950 • Usable files: 14994, each satisfying • No. of syllables > 1 • Available in our dictionary • Valid output from ASRA • In-house recordings • Recordings from MSAR for several years • Available upon request
Speech Corpus for Lexical Stress Detection • Merriam-Webster Online Dictionary’s lexical pronunciations • http://www.merriam-webster.com • All utterances are pronounced by native speakers
Stress Detection based on Vowel Classification • SD is based on vowel classification due to the following observations • Each word has a stressed syllable • Each syllable is usually composed of a consonant and a vowel • Vowels are always voiced (have pitch) • Therefore • Each vowel is classified as “stressed” or “unstressed” • To determine the stressed syllable in an utterance, choose the vowel with one of • Max likelihood of the class “stressed” • Min likelihood of the class “unstressed” • Max difference of the above two
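The three word-level decision rules can be sketched as follows. This is an illustration under stated assumptions: the classifier is taken to return a pair of log-likelihoods per vowel, and the function name, rule names, and scores are hypothetical rather than taken from the slides.

```python
def pick_stressed(log_lik, rule="difference"):
    """Return the 1-based index of the vowel chosen as stressed.

    log_lik: one (loglik_stressed, loglik_unstressed) pair per vowel.
    rule: "max_stressed", "min_unstressed", or "difference".
    """
    if rule == "max_stressed":
        scores = [s for s, _ in log_lik]
    elif rule == "min_unstressed":
        # Negate so the smallest "unstressed" likelihood gets the top score.
        scores = [-u for _, u in log_lik]
    elif rule == "difference":
        scores = [s - u for s, u in log_lik]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.index(max(scores)) + 1


# Made-up scores for a three-syllable word such as "tomorrow".
log_lik = [(-4.1, -1.0), (-1.2, -3.5), (-3.0, -1.4)]
for rule in ("max_stressed", "min_unstressed", "difference"):
    print(rule, pick_stressed(log_lik, rule))
```

Because exactly one vowel is selected per word, every rule returns a single stress position even when the vowel-level classifier would label several vowels (or none) as stressed.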
Features for Vowels • Vowel-based features • Pitch: min, mean, max, range, std, slope, etc. • Volume: min, mean, max, range, std, slope, etc. • Duration (normalized by speech rate) • Legendre polynomial fitting for pitch & volume • Spectrally emphasized versions of the above • …
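A minimal sketch of the statistical features for one vowel follows. Two points are assumptions rather than details from the slides: the slope is a least-squares slope over the frame index, and duration is normalized by multiplying by the speech rate in syllables per second. All names and input values are hypothetical.

```python
from statistics import mean, pstdev


def contour_stats(values):
    """Min, mean, max, range, std, and least-squares slope of one contour."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    return {
        "min": min(values),
        "mean": y_bar,
        "max": max(values),
        "range": max(values) - min(values),
        "std": pstdev(values),
        "slope": sxy / sxx if sxx else 0.0,  # per-frame change
    }


def vowel_features(pitch, volume, duration, speech_rate):
    """Collect pitch, volume, and duration features for one vowel."""
    feats = {}
    for name, contour in (("pitch", pitch), ("volume", volume)):
        for key, value in contour_stats(contour).items():
            feats[f"{name}_{key}"] = value
    # Assumed normalization: duration relative to the mean syllable length.
    feats["duration_norm"] = duration * speech_rate
    return feats


# Made-up frame-level pitch (Hz) and volume values for one vowel.
feats = vowel_features([100, 110, 120, 130], [0.2, 0.4, 0.3], 0.12, 4.0)
print(feats)
```

The Legendre-fit and spectral-emphasis features from the list are not covered by this sketch.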
Lexical Stress Detection – Experiment 1 • 10-fold cross-validation • Classifier: SVM • Feature set • E: Root mean square energy • D: Duration • P: Pitch • S: Root mean square spectral-emphasis energy • PS: Pitch slope • CE: Legendre coefficients of the root mean square energy contour • CP: Legendre coefficients of the pitch contour • CS: Legendre coefficients of the spectral-emphasis energy contour
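The CE, CP, and CS features summarize the shape of a contour with a few Legendre coefficients. A minimal sketch using NumPy's `numpy.polynomial.legendre.legfit` follows. The polynomial degree (3) and the mapping of the vowel's frames onto [-1, 1] are assumptions, since the slides do not specify them, and the contour is made up.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_coeffs(contour, degree=3):
    """Least-squares Legendre coefficients of a contour, lowest order first.

    The frames are mapped onto [-1, 1] so that vowels of different
    lengths yield comparable coefficients.
    """
    y = np.asarray(contour, dtype=float)
    x = np.linspace(-1.0, 1.0, len(y))
    return legendre.legfit(x, y, degree)


# Made-up pitch contour rising linearly from 100 Hz to 140 Hz.
contour = np.linspace(100.0, 140.0, 21)
coeffs = legendre_coeffs(contour)
print(coeffs)
```

The coefficient of order 0 captures the contour's level, order 1 its overall slope, and higher orders its curvature, so a short vector describes the contour regardless of how many frames the vowel spans.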
Lexical Stress Detection – Experiment 2 • 10-fold cross-validation • Classifier: SVM • Syllable-number-independent classifier vs. syllable-number-dependent classifier
Lexical Stress Detection – Experiment 3 • 10-fold cross-validation • Classifiers • GMMC: Gaussian mixture model classifier • NBC: Naïve Bayes classifier • QC: Quadratic classifier • SVMC: Support vector machine classifier
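All three experiments use 10-fold cross-validation. The procedure can be sketched in plain Python as below. The nearest-centroid classifier is only a stand-in to keep the example self-contained and is not one of the classifiers compared in the slides; the data, function names, and one-dimensional feature are likewise hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples, labels, fit, predict, k=10):
    """Return overall accuracy, training on k-1 folds and testing on the rest."""
    correct = 0
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train = [i for i in range(len(samples)) if i not in held_out]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)


def fit_centroids(xs, ys):
    """Stand-in classifier: store the mean feature value of each class."""
    model = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        model[label] = sum(members) / len(members)
    return model


def predict_centroid(model, x):
    """Assign the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


# Made-up, well-separated one-dimensional feature values.
samples = [0.1 * i for i in range(20)] + [5 + 0.1 * i for i in range(20)]
labels = ["unstressed"] * 20 + ["stressed"] * 20
accuracy = cross_validate(samples, labels, fit_centroids, predict_centroid)
print(accuracy)
```

Swapping in another classifier only requires replacing the `fit` and `predict` callables, which is how the four classifiers in this experiment could be compared on identical folds.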
Lexical Stress Detection – Error Analysis • Error types • Wrong ground truth, or more than one pronunciation of the word • conduct (2 syllables) [kənˋdʌkt] / [ˋkɑndʌkt] • Complex word with two primary-stressed syllables • worldwide (2 syllables) [ˋwɝldˋwaɪd] • histochemistry (5 syllables) [ˋhɪstəˋkɛmɪstrɪ] • Word with both a primary-stressed and a secondary-stressed syllable • deposition (4 syllables) [͵dɛpəˋzɪʃən] • cafeteria (5 syllables) [͵kæfəˋtɪrɪə]
Lexical Stress Detection – Error Analysis • Error types • Wrong result from pitch tracking • elegant (3 syllables) [ˋɛləgənt] • Wrong result from forced alignment • peremptory (4 syllables) [pəˋrɛmptərɪ]
More on Stress Detection • ASRA • Chapter 20 of online tutorial on Audio Signal Processing • Demo • Recognition • goDemoVc.m in ASR • Web • Assessment • goDemoSa.m in ASR • Web • Stress detection • Application note • Demo