This presentation examines the acoustic properties of incorrectly recognized words in the Switchboard corpus. The aim is to reduce the word error rate by focusing lattice rescoring on regions rich in unstressed syllables and on vocalic-initial words.
Acoustic Signature of Incorrectly Recognized Words (Switchboard)
Steven Greenberg
July 19, 2004
Incorrectly Recognized Words
For the WS04 Acoustic Landmark Detection task, the group will be rescoring recognition lattices to ascertain whether detection of landmarks and other acoustic properties can reduce the word error rate in Switchboard. The baseline recognition systems correctly recognize ca. 80% of the words. Therefore, if a method can be developed to reliably identify words that are likely to be incorrectly recognized, it would be possible to focus the rescoring effort on this subset of Switchboard. Can this be done? I believe so, because the diagnostic evaluation performed for the year 2000 Switchboard evaluation (Greenberg and Chang, 2000) showed that certain acoustic parameters were highly correlated with word recognition error. These data are shown on the following slides.
Syllable Stress and Word Error Rate
An hour's subset of Switchboard was manually labeled with respect to stress accent by two trained transcribers (high concordance level). The data were used to ascertain whether there was a correlation between stress accent level and the word recognition error rate. The probability of a deletion error is MUCH higher in unstressed syllables.
[Figure legend: Unstressed, Intermediate Stress, Fully Stressed]
Syllable Structure Also Correlated with WER
A separate analysis demonstrated that WER was also correlated with syllable structure. Words beginning with a vowel were far more likely to be incorrectly recognized than words beginning with a consonant, particularly if the word is monosyllabic (the greatest number of instances).
Automatic Labeling of Stress Accent
A system for automatic labeling of stress accent has been developed at ICSI (Greenberg et al., 2002). This labeling system (AutoSAL) is as accurate as a trained human transcriber. An example of AutoSAL's output from the Switchboard corpus is shown below.
A Sample of AutoSAL
An example of AutoSAL's output from the Switchboard corpus is shown below for a single speaker.
Acoustic Basis of AutoSAL
The acoustic parameters associated with AutoSAL's performance are shown below. The most important parameters are: (a) nucleus duration, (b) normalized energy of the nucleus relative to other nuclei over ca. 3 s of context, and (c) the spectral contour associated with the nucleus.
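Parameter (b) can be illustrated with a minimal sketch. The slides do not say how AutoSAL normalizes energy, so the z-score used here is an assumption, as are the function name `normalized_nucleus_energy`, the input layout (midpoint time and energy per nucleus), and the centered 3 s window.

```python
from statistics import mean, pstdev

def normalized_nucleus_energy(nuclei, context_s=3.0):
    """Score each nucleus energy against the other nuclei in its local context.

    nuclei: list of (midpoint_time_s, energy) tuples (hypothetical layout).
    context_s: total width of the context window centered on each nucleus.
    Returns one z-score per nucleus. The z-score is an assumed normalization,
    not necessarily the one AutoSAL uses.
    """
    half = context_s / 2.0
    scores = []
    for t, e in nuclei:
        # Collect energies of all nuclei (including this one) inside the window.
        window = [en for (tn, en) in nuclei if abs(tn - t) <= half]
        mu = mean(window)
        sigma = pstdev(window)
        # An isolated nucleus or a flat window has no spread; score it as 0.
        scores.append((e - mu) / sigma if sigma > 0 else 0.0)
    return scores

# Toy values for illustration only; these are not Switchboard measurements.
example = [(0.2, 60.0), (0.5, 70.0), (0.9, 60.0), (5.0, 55.0)]
scores = normalized_nucleus_energy(example)
```

A nucleus that is louder than its neighbors receives a positive score, which is the kind of relative prominence cue the slide describes.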
SVM Implementation of AutoSAL
Amit and Vidja will be implementing an SVM version of AutoSAL using the ICSI labels for training. They will train the SVMs to distinguish Accented from Unaccented nuclei. This SVM version of AutoSAL will then be used to label each syllable nucleus in the Switchboard corpus. Those regions of the corpus where there are a high number of unaccented syllables will form the focus of the potential rescoring effort.
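The last step, picking out regions with many unaccented syllables, might look like the sketch below. It assumes the classifier has already produced one Accented/Unaccented label per nucleus in time order. The sliding-window approach, the function name `flag_unaccented_regions`, and the window size and threshold are all hypothetical choices; the slides do not specify how a "region" is defined.

```python
def flag_unaccented_regions(labels, window=5, min_unaccented=4):
    """Find stretches of syllables dominated by unaccented nuclei.

    labels: list of bools in time order, True = Accented, False = Unaccented.
    window, min_unaccented: hypothetical tuning values, not from the slides.
    Returns merged (start, end) index ranges, end exclusive.
    """
    flagged = []
    for start in range(0, len(labels) - window + 1):
        end = start + window
        unaccented = sum(1 for lab in labels[start:end] if not lab)
        if unaccented >= min_unaccented:
            # Merge with the previous range when the windows overlap or touch.
            if flagged and start <= flagged[-1][1]:
                flagged[-1] = (flagged[-1][0], end)
            else:
                flagged.append((start, end))
    return flagged

# Toy label sequence for illustration only.
toy_labels = [True, False, False, False, False, False, True, True, True, False]
regions = flag_unaccented_regions(toy_labels)
```

The returned index ranges could then be mapped back to time spans in the lattice to select candidates for rescoring.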
Vocalic-Initial Words
Amit already has reliable vowel detectors as part of his landmark system. He will also be developing SVMs to detect the individual constituents of syllables as part of the pronunciation modeling effort. Syllables beginning with a vowel (i.e., lacking a consonantal onset) are likely to be incorrectly recognized by conventional Switchboard recognition systems. Hence, these words will also be flagged for potential rescoring.
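The flagging rule itself is simple once each word has a phone sequence. The sketch below is only a stand-in: it assumes ARPAbet-style phone labels (for example from a pronunciation dictionary or a forced alignment), whereas the plan described above relies on acoustic vowel detectors. The names `ARPABET_VOWELS`, `is_vocalic_initial`, and `flag_for_rescoring` are hypothetical.

```python
# ARPAbet vowel symbols as used in CMUdict, without stress digits.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

def is_vocalic_initial(phones):
    """Return True when the first phone is a vowel (no consonantal onset)."""
    if not phones:
        return False
    # Strip a trailing stress digit such as the "1" in "AE1".
    first = phones[0].rstrip("012").upper()
    return first in ARPABET_VOWELS

def flag_for_rescoring(words):
    """words: list of (word, phones) pairs; returns the vocalic-initial words."""
    return [word for word, phones in words if is_vocalic_initial(phones)]

# Toy hypothesis for illustration only.
hypothesis = [
    ("and", ["AE1", "N", "D"]),
    ("the", ["DH", "AH0"]),
    ("i", ["AY1"]),
]
flagged_words = flag_for_rescoring(hypothesis)
```

Restricting the flag to monosyllabic words, which the earlier analysis found most error-prone, would be a straightforward extension.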
That's All
Many Thanks for Your Time and Attention