
TANDEM ACOUSTIC MODELING IN LARGE-VOCABULARY RECOGNITION




  1. TANDEM ACOUSTIC MODELING IN LARGE-VOCABULARY RECOGNITION Daniel P.W. Ellis, Rita Singh, and Sunil Sivadas. ICASSP 2001. Presented by 汪逸婷, 2012/10/22

  2. Outline • Introduction • The SPINE1 tandem system • Experimental results • Discussion • Conclusions

  3. 1. Introduction • Neural networks (NNs) • When used to estimate the posterior probabilities of a closed set of subword units, they allow discriminative training in a natural and efficient manner. • They also make few assumptions about the statistics of the input features. • They cope well with highly correlated and unevenly distributed features (such as spectral energy features).
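The tandem idea behind these slides can be sketched in a few lines: the NN's softmax posteriors are logged (undoing the softmax compression) and decorrelated with PCA/KLT before being handed to the GMM-HMM. This is an illustrative sketch with made-up activations, not the paper's actual network; the frame count (100) and unit count (46) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MLP output-layer activations: 100 frames x 46 subword units.
logits = rng.normal(size=(100, 46))

# Softmax gives per-frame posterior probabilities over the subword units.
posteriors = np.exp(logits - logits.max(axis=1, keepdims=True))
posteriors /= posteriors.sum(axis=1, keepdims=True)

# Tandem trick: take logs to undo the softmax compression, then decorrelate
# with PCA (a KLT) so the features better suit diagonal-covariance GMMs.
log_post = np.log(posteriors)
centered = log_post - log_post.mean(axis=0)
cov = centered.T @ centered / len(centered)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # largest-variance directions first
tandem_features = centered @ eigvecs[:, order]

# Each frame now yields a decorrelated feature vector for the GMM-HMM.
print(tandem_features.shape)  # (100, 46)
```

After the projection the feature dimensions are uncorrelated, which is exactly the property the GMM slide below says diagonal-covariance models want.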

  4. 1. Introduction • Gaussian mixture models (GMMs) • Often used to build independent distribution models for each subword unit. • Work best when supplied with low-dimensional, decorrelated input features. • On small tasks: NN-HMM > GMM-HMM. • On large-vocabulary tasks (DARPA/NIST evaluations): NN-HMM << GMM-HMM. • Equivalent adaptation is much more difficult for NN-based systems.

  5. 1. Introduction • A tandem system for the NRL SPINE1 task. • Questions: • Whether GMM systems outperform NNs on large tasks. • Would an NN feature preprocessor continue to confer an advantage on larger tasks involving more contextual variability? • Would model adaptation schemes such as MLLR remain effective in the new feature space defined by the network outputs?

  6. 1. Introduction • Corpus: • NRL SPINE1. • 5,000-word vocabulary. • Utterances are predominantly noisy. • Signal-to-noise ratios range from 5 dB to 20 dB. • The data consists of human-human dialogs in a battleship game. • WERs on noisy digits tasks were at 1% or below for the best cases; the very best systems on SPINE1 achieve about 25%.
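The quoted 5 dB to 20 dB figures follow the usual power-ratio definition of SNR. As a quick illustrative check (not from the paper; the tone frequencies and amplitudes are arbitrary assumptions):

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """SNR in dB: 10*log10 of signal power over noise power."""
    return 10.0 * np.log10(np.mean(signal**2) / np.mean(noise**2))

# A tone with 10x the noise amplitude has 100x the power, i.e. 20 dB SNR.
t = np.linspace(0, 1, 8000, endpoint=False)
signal = 1.0 * np.sin(2 * np.pi * 440 * t)
noise = 0.1 * np.sin(2 * np.pi * 1000 * t)
print(round(snr_db(signal, noise)))  # 20
```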

  7. 2. The SPINE1 tandem system

  8. 2.1 System training

  9. 3. Experimental results

  10. 4. Discussion • The large improvement is mostly eliminated for the context-dependent models. • MLLR yields a greater improvement for the tandem features than for the MFC and PLP features (the results are comparable for context-dependent models). • The modeling advantages for CD classes gained by the net's feature-space remapping appear to be largely nullified.
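For context, MLLR adapts a GMM-HMM by applying a shared affine transform to the Gaussian means, mu' = A mu + b, estimated by maximum likelihood from a small amount of adaptation data. A minimal sketch of the transform-application step only (the ML estimation of A and b is omitted, and the dimensions and placeholder transform here are assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: 8 Gaussian means in a 13-dimensional feature space.
means = rng.normal(size=(8, 13))

# MLLR estimates a shared affine transform (A, b) from adaptation data;
# here a placeholder transform stands in to show how it is applied.
A = np.eye(13) * 0.9
b = np.full(13, 0.5)

# Every mean in the regression class is moved by the same transform,
# so a few seconds of data can adapt thousands of Gaussians at once.
adapted_means = means @ A.T + b

print(adapted_means.shape)  # (8, 13)
```

Because the transform acts on the means in whatever feature space the models live in, it applies unchanged to the tandem (network-output) feature space, which is why the MLLR comparison in this slide is meaningful.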

  11. 5. Conclusions • A tandem system combining an NN, trained to estimate posterior probabilities of context-independent (CI) phone classes, with a GMM-HMM recognizer can achieve significant reductions in WER. • Further work is needed to extend the benefits to CD models.
