
Context-Regularized Deep Neural Networks for Multi-Task Learning in ASR

This study presents a methodology for enhancing context-dependent deep neural networks for ASR through multi-task training and regularization. The classification of context-independent monophone units is optimised jointly with the main context-dependent targets; this auxiliary task acts as a regulariser and yields performance improvements of 3%-10% over baseline systems. Future research will explore incorporating this technique into full-sequence training for further improvements.
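A minimal sketch of this multi-task setup is given below, in PyTorch. It is illustrative only: the layer sizes, target counts, and interpolation weight `lam` are assumptions rather than the configuration used in the presented work. A shared stack of hidden layers feeds two softmax heads, one over context-dependent tied states and one over context-independent monophones, and the frame-level cross-entropy losses are combined.

```python
import torch
import torch.nn as nn

class MultiTaskAcousticModel(nn.Module):
    """Shared hidden layers with two output heads: one over context-dependent
    tied states (senones) and one over context-independent monophones.
    All dimensions below are illustrative assumptions."""
    def __init__(self, feat_dim=440, hidden_dim=2048, n_layers=6,
                 n_cd_states=6000, n_monophones=120):
        super().__init__()
        layers, in_dim = [], feat_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.Sigmoid()]
            in_dim = hidden_dim
        self.shared = nn.Sequential(*layers)
        self.cd_head = nn.Linear(hidden_dim, n_cd_states)     # primary task
        self.mono_head = nn.Linear(hidden_dim, n_monophones)  # auxiliary task

    def forward(self, x):
        h = self.shared(x)
        return self.cd_head(h), self.mono_head(h)

def multitask_loss(cd_logits, mono_logits, cd_targets, mono_targets, lam=0.2):
    """Joint objective: context-dependent cross-entropy plus a weighted
    monophone cross-entropy that acts as a regulariser (lam is assumed)."""
    ce = nn.functional.cross_entropy
    return ce(cd_logits, cd_targets) + lam * ce(mono_logits, mono_targets)

# One illustrative training step on dummy frame-level data
model = MultiTaskAcousticModel()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(32, 440)               # minibatch of acoustic feature frames
cd_y = torch.randint(0, 6000, (32,))   # tied-state labels per frame
mono_y = torch.randint(0, 120, (32,))  # monophone labels per frame
opt.zero_grad()
cd_logits, mono_logits = model(x)
multitask_loss(cd_logits, mono_logits, cd_y, mono_y).backward()
opt.step()
```

At decoding time only the context-dependent head would be used; the monophone head serves purely as a training-time regulariser.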


Presentation Transcript


  1. Regularization of context-dependent deep neural networks with context-independent multi-task training. Peter Bell and Steve Renals. 2015/11/10, Ming-Han Yang

  2. Outline • Abstract • Introduction • Context Modelling in DNNs • Methodology • Baseline DNN system • Regularization and multitask learning • Acoustic features • Experiments • ASR task • Results • Conclusions

  3. Abstract

  4. Introduction

  5. Context Modelling in DNNs

  6. Methodology (1) – Baseline DNN system

  7. Methodology (2) – Regularization & multitask learning

  8. Methodology (3) – Acoustic features

  9. Experiments (1) – ASR tasks
  • TED English transcription task in the IWSLT evaluation campaign
  • Results reported on the dev2010, tst2010 and tst2011 sets
  • 8-11 single-speaker talks of approximately 10 minutes’ duration, pre-segmented by the IWSLT organisers
  • In-domain acoustic model training data was derived from 813 publicly available TED talks dating from before the end of 2010, giving 143 hours of speech for acoustic model training
  • Trigram LM trained on 312M words from data sources prescribed by the IWSLT evaluation (TED talks + Europarl + News Crawl + Gigaword)
  • The vocabulary was limited to 64K words, selected according to unigram counts (see the sketch below)
  • Decoding was performed using HTK’s HDecode; we did not perform any rescoring with a larger LM in this work
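As an illustration of how such a frequency-based vocabulary cut-off can be applied, a minimal Python sketch is given below. The file names and the equal pooling of the four corpora are hypothetical assumptions; the actual data preparation prescribed by the IWSLT evaluation may differ.

```python
from collections import Counter

def select_vocabulary(corpus_paths, vocab_size=64000):
    """Keep the most frequent word types across the LM training corpora,
    ranked by raw unigram count. Corpus weighting is an assumption here."""
    counts = Counter()
    for path in corpus_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
    return [word for word, _ in counts.most_common(vocab_size)]

# Hypothetical file names for the four prescribed data sources
vocab = select_vocabulary(["ted.txt", "europarl.txt", "news_crawl.txt", "gigaword.txt"])
```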

  10. Experiments (2) – Results

  11. Experiments (3) – Results

  12. Experiments (4) – Results

  13. Conclusion We have presented a simple but effective method of improving the performance of context-dependent hybrid DNN systems by jointly optimising the classification performance of monophone units. This acts as a regulariser and, we believe, encourages more relevant discrimination in the lower layers of the network. It yields performance improvements of 3%-10% over the baseline. In future, we will investigate the use of this technique with full-sequence training, where the targets used in DNN optimisation are more closely matched to the word error rate.

  14. THANK YOU
