On Improvement of CI-based GMM Selection in Sphinx 3

Arthur Chan, Ravishankar Mosur, Alexander Rudnicky
Computer Science Department, Carnegie Mellon University


Presentation Transcript


• CMU Sphinx is an open source speech recognition system.
• Recent development (Sphinx 3.6) has focused on building a real-time continuous HMM system and on speaker adaptation.
• In this work, we describe improvements to the GMM computation that reduce the computation in the Viterbi search by 10%-30% across different tasks.
• The algorithms are freely available at www.cmusphinx.org.

Context-Independent Senone-Based GMM Selection (CIGMMS)

Summary: a technique for speeding up Gaussian computation (Lee 2001, Chan 2004).
Idea: use the context-independent (CI) senone score as an approximate score for its context-dependent (CD) senones.
Procedure:
1. Compute all CI scores and form a beam (the CI beam) below the highest score.
2. For every CD senone:
   a. If its base CI score is within the beam, compute the detailed CD score.
   b. Otherwise, back off to the CI score.

Issues of the Basic CIGMMS

Issue 1: Unpredictable per-frame performance. Because selection is beam-based, the number of CD scores computed per frame varies a great deal.
Issue 2: Poor pruning characteristics. A large number of CD senones fall back to the same CI score, which makes pruning in the search less effective.

Three Enhancements

1. Bound the number of CD GMMs to be computed per frame.
2. When the best Gaussian index (BGI) from the previous frame is available and the CD senone is out of the beam, compute the CD GMM score from the previous BGI only. Motivation: the score of the best Gaussian is a good approximation of the full GMM score, and the previous frame's BGI is a good approximation of the current frame's BGI.
3. Use a tightened CI beam every N frames. Motivation: this is similar to dropping senone computation every N frames and reusing the previous frame's scores (Chan 2004), which significantly reduced computation but hurt accuracy. Narrowing the CI beam every N frames instead preserves the very best scoring senones in the current frame and improves accuracy, and the tightening factor provides more flexible control. (A minimal code sketch of the procedure with these enhancements appears at the end of this transcript.)

Experimental Results

Assumption behind Enhancement 2: BGIs in adjacent frames are usually the same. But how often? (This depends on the GMM size.)
Table: percentages of adjacent BGIs that are the same.
Conclusions: adjacent BGIs are quite consistent, even in noisy tasks, but less consistent for the top-scoring senones (not shown in the table; this motivates Enhancement 3).
Table: word error rates and execution times.
Summary: cumulative speedup of up to 37% with only a slight increase in WER.
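The poster gives the CIGMMS procedure and its three enhancements only at the level of prose; the sketch below is one way to put them together. It is not the Sphinx 3.6 implementation: the function names (gmm_score, cigmms_frame), the parameters (ci_beam, tighten, tighten_every, max_cd), and the diagonal-covariance GMM layout are illustrative assumptions.

```python
# Minimal sketch of CIGMMS with the three enhancements described above.
# Not the Sphinx 3.6 source; names and GMM layout are assumptions.
import numpy as np

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def gmm_score(x, means, variances, weights):
    """Full GMM log-likelihood plus the index of the best Gaussian (BGI)."""
    comps = np.array([np.log(w) + log_gaussian(x, m, v)
                      for w, m, v in zip(weights, means, variances)])
    return np.logaddexp.reduce(comps), int(np.argmax(comps))

def cigmms_frame(x, ci_gmms, cd_gmms, ci_parent, prev_bgi, frame_idx,
                 ci_beam=-30.0, tighten=0.5, tighten_every=3, max_cd=2000):
    """
    Score one frame.
      ci_gmms   : list of (means, variances, weights) for CI senones
      cd_gmms   : list of (means, variances, weights) for CD senones
      ci_parent : CD senone index -> index of its base CI senone
      prev_bgi  : CD senone index -> BGI from the previous frame, or None
    Returns per-CD-senone scores and the updated BGI cache.
    """
    # Step 1: compute every CI score and form a beam below the best one.
    ci_scores = np.array([gmm_score(x, *g)[0] for g in ci_gmms])
    beam = ci_beam
    # Enhancement 3: tighten the CI beam every N frames, keeping only the
    # very best senones exact in those frames (tighten is a factor in (0, 1]).
    if frame_idx % tighten_every == 0:
        beam *= tighten
    threshold = ci_scores.max() + beam  # beam is a negative log offset

    cd_scores = np.empty(len(cd_gmms))
    new_bgi = [None] * len(cd_gmms)
    computed = 0
    for s, gmm in enumerate(cd_gmms):
        base = ci_scores[ci_parent[s]]
        # Enhancement 1: bound the number of full CD GMM evaluations per frame.
        if base >= threshold and computed < max_cd:
            cd_scores[s], new_bgi[s] = gmm_score(x, *gmm)
            computed += 1
        elif prev_bgi[s] is not None:
            # Enhancement 2: out of beam, but the previous frame's BGI is
            # available -> score only that Gaussian component.
            means, variances, weights = gmm
            k = prev_bgi[s]
            cd_scores[s] = np.log(weights[k]) + log_gaussian(x, means[k], variances[k])
            new_bgi[s] = k
        else:
            # Basic CIGMMS fallback: reuse the approximate CI score.
            cd_scores[s] = base
    return cd_scores, new_bgi
```

A caller would hold the returned BGI list and pass it back in on the next frame; keeping that per-senone cache across frames is what makes Enhancement 2 cheap, since out-of-beam senones cost one Gaussian evaluation instead of a full GMM or a bare CI backoff.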
