
Acoustic Signature of Incorrectly Recognized Words (Switchboard) Steven Greenberg July 19, 2004


Presentation Transcript


  1. Acoustic Signature of Incorrectly Recognized Words (Switchboard) Steven Greenberg July 19, 2004

  2. Incorrectly Recognized Words. For the WS04 Acoustic Landmark Detection task, the group will be rescoring recognition lattices to ascertain whether detection of landmarks and other acoustic properties can reduce word error rate in Switchboard. The baseline recognition systems correctly recognize ca. 80% of the words. Therefore, if a method can be developed to reliably identify words that are likely to be incorrectly recognized, it would be possible to focus the rescoring effort on this subset of Switchboard. Can this be done? I believe so, because the diagnostic evaluation performed for the year 2000 Switchboard evaluation (Greenberg and Chang, 2000) showed that certain acoustic parameters were highly correlated with word recognition error. These data are shown on the following slides.

  3. Syllable Stress and Word Error Rate. [Figure legend: Unstressed, Intermediate Stress, Fully Stressed.] An hour’s subset of Switchboard was manually labeled with respect to stress accent by two trained transcribers (high concordance level). The data were used to ascertain whether there was a correlation between stress-accent level and the word recognition error rate. The probability of a deletion error is much higher in unstressed syllables.

  4. Syllable Structure Also Correlated with WER. A separate analysis demonstrated that WER was also correlated with syllable structure. Words beginning with a vowel were far more likely to be incorrectly recognized than words beginning with a consonant, particularly if the word is monosyllabic (the greatest number of instances).

  5. Automatic Labeling of Stress Accent. A system for automatic labeling of stress accent has been developed at ICSI (Greenberg et al., 2002). This labeling system (AutoSAL) is as accurate as a trained human transcriber. An example of AutoSAL’s output from the Switchboard corpus is shown below.

  6. Automatic Labeling of Stress Accent. A system for automatic labeling of stress accent has been developed at ICSI (Greenberg et al., 2002). This labeling system (AutoSAL) is as accurate as a trained human transcriber.

  7. A Sample of AutoSAL. An example of AutoSAL’s output from the Switchboard corpus is shown below for a single speaker.

  8. Acoustic Basis of AutoSAL. The acoustic parameters associated with AutoSAL’s performance are shown below. The most important parameters are: (a) nucleus duration, (b) normalized energy of the nucleus relative to other nuclei over ca. 3 s of context, and (c) the spectral contour associated with the nucleus.
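
Parameters (a) and (b) above can be illustrated with a minimal sketch. The slides do not say how the energy is normalized, so the z-score used here, the window centered on the nucleus, and the `Nucleus` record itself are all assumptions made for illustration, not AutoSAL's actual procedure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Nucleus:
    """Hypothetical record for one syllable nucleus (not an AutoSAL type)."""
    start: float   # onset time in seconds
    end: float     # offset time in seconds
    energy: float  # e.g. mean log energy over the nucleus (assumed units)

    @property
    def duration(self) -> float:
        # Parameter (a): nucleus duration in seconds.
        return self.end - self.start

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2.0


def normalized_energy(nuclei, index, context_s=3.0):
    """Parameter (b): energy of one nucleus relative to nearby nuclei.

    Assumption: a z-score over all nuclei whose midpoints fall within a
    window of `context_s` seconds centered on the target nucleus.
    """
    target = nuclei[index]
    half = context_s / 2.0
    window = [n.energy for n in nuclei
              if abs(n.midpoint - target.midpoint) <= half]
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0.0:
        # Only one nucleus in context, or all energies identical.
        return 0.0
    return (target.energy - mu) / sigma
```

For example, a nucleus whose energy is below the mean of its neighbors within the 3 s window gets a negative score, which under these assumptions would be weak evidence that it is unaccented.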

  9. SVM Implementation of AutoSAL. Amit and Vidja will be implementing an SVM version of AutoSAL using the ICSI labels for training. They will train the SVMs to distinguish Accented from Unaccented nuclei. This SVM version of AutoSAL will then be used to label each syllable nucleus in the Switchboard corpus. Those regions of the corpus where there are a high number of unaccented syllables will form the focus of the potential rescoring effort.
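
The last step, selecting regions with many unaccented syllables, might look roughly like the sketch below. It assumes the accent labels are already available as one boolean per syllable nucleus; the sliding-window size, the threshold, and the function name are hypothetical choices, since the slides do not define what counts as "a high number".

```python
def flag_unaccented_regions(accented, window=4, min_unaccented=3):
    """Return merged [start, end) index ranges over the nucleus sequence.

    `accented` is a list of booleans, one per syllable nucleus, in time
    order (True = Accented, False = Unaccented). A window is flagged when
    at least `min_unaccented` of its `window` nuclei are unaccented;
    overlapping or adjacent flagged windows are merged into one region.
    Both thresholds are illustrative assumptions.
    """
    regions = []
    for start in range(0, len(accented) - window + 1):
        end = start + window
        unaccented = sum(1 for a in accented[start:end] if not a)
        if unaccented >= min_unaccented:
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of starting a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

The returned index ranges could then be mapped back to time spans in the lattice so that rescoring is restricted to those stretches.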

  10. Vocalic-Initial Words. Amit already has reliable vowel detectors as part of his landmark system. He will also be developing SVMs to detect the individual constituents of syllables as part of the pronunciation modeling effort. Syllables beginning with a vowel (i.e., lacking a consonantal onset) are likely to be incorrectly recognized by conventional Switchboard recognition systems. Hence, these words will also be flagged for potential rescoring.
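
As a rough illustration of the flagging step, the sketch below marks hypothesized words whose first phone is a vowel. It works on phone strings rather than on acoustic vowel detectors, and it assumes ARPAbet phone labels with optional stress digits; both the label set and the function names are assumptions for this example, not part of the landmark system described above.

```python
# ARPAbet vowel symbols (stress digits removed). Assumed phone inventory.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def strip_stress(phone):
    """Drop a trailing ARPAbet stress digit, e.g. 'AE1' -> 'AE'."""
    return phone.rstrip("012")


def is_vocalic_initial(phones):
    """True if the word's first phone is a vowel (no consonantal onset)."""
    return bool(phones) and strip_stress(phones[0]) in ARPABET_VOWELS


def flag_for_rescoring(hypothesis):
    """Return the words in a (word, phones) sequence that begin with a vowel."""
    return [word for word, phones in hypothesis if is_vocalic_initial(phones)]
```

Given a hypothesis such as `[("and", ["AE1", "N", "D"]), ("the", ["DH", "AH0"])]`, only "and" would be flagged.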

  11. That’s All. Many Thanks for Your Time and Attention.
