Performance Improvement of GMM Computation in Sphinx 3.6 Arthur Chan Carnegie Mellon University Mar 10, 2005
This seminar • Not very refined; some info is missing. • ~30 slides. • Outline: • Overview of GMM Computation in Sphinx 3.X (x<5) (<- This part is not new.) • 3 Improvements with Experimental Results (<- This part is new.) • Discussion
Computation at every frame in Sphinx • Senone (GMM) computation produces scores for the search • The search feeds back information for pruning the GMM computation
Computation of GMMs in a Continuous HMM ASR system • Order of Computation: • #Frames x #GMMs x #Gaussians x Feature length • Typical Numbers: • #Frames = 1000 • #GMMs = 5000 • #Gaussians = 8 to 64 • Feature length = 39 • Not practical to compute them fully.
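The order-of-computation estimate above can be checked with a quick back-of-the-envelope calculation; the variable names below are illustrative, not from Sphinx:

```python
# Rough cost of fully evaluating all GMMs for one utterance,
# using the typical numbers from the slide.
n_frames = 1000      # ~10 s of speech at a 10 ms frame shift
n_gmms = 5000        # tied states (senones)
n_gaussians = 32     # mixtures per senone (8 to 64 on the slide)
feat_len = 39        # e.g., MFCC + deltas + delta-deltas

per_dim_ops = n_frames * n_gmms * n_gaussians * feat_len
print(per_dim_ops)   # 6,240,000,000 per-dimension operations per utterance
```

Billions of per-dimension operations for a single utterance, which is why full computation is impractical in real time.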
Overview of GMM Computation in Sphinx 3.X (x<=5) • Philosophy • No single technique will give the best accuracy/speed trade-off. • Techniques in the literature can be categorized and combined in a systematic manner. • Four-Level Categorization of GMM Computation Techniques • Frame-level (Down-sampling) • GMM-level (CI-based GMM Selection) • Gaussian-level (VQ-based and SVQ-based Gaussian Selection) • Component-level (Sub-vector quantization) • Sphinx 3.4: 75-80% speed gain with ~5-10% relative degradation.
Fast GMM Computation: Level 1: Frame Selection • Compute GMMs in every other frame only • Improvement: skip computation only when the current frame is similar to the previous frame
Fast GMM Computation: Level 2: Senone/GMM Selection • Compute a GMM only when its base phone is highly likely • Others are backed off to the base-phone scores • Similar to: • Julius (Akinobu, 1999) • Microsoft's Rich-Get-Richer (RGR) heuristics
Fast GMM Computation: Level 3: Gaussian Selection
Fast GMM Computation: Level 4: Component Selection (LDA)
Frame-level and GMM-level Techniques in S3.X (X<=5) • Frame-level: • Skipping frames: • Only compute GMMs for 1 out of N frames • Copy the most recently computed scores. • GMM-level: • Use the CI GMM as an approximate score • If a CD GMM has a good CI GMM score (within a beam) • Compute the full CD score • If not • Back off to the CI score. • A good CI GMM score is defined as • Being within the beam of the best CI GMM score.
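The GMM-level logic above (CI-based GMM selection, CIGMMS) can be sketched as follows; the function names and data layout here are illustrative stand-ins, not the actual Sphinx 3 code:

```python
def cigmms_frame(ci_scores, compute_cd_score, ci_to_cd, beam):
    """CI-based GMM selection for one frame (illustrative sketch).

    ci_scores: dict mapping CI phone id -> log score (already computed, cheap).
    compute_cd_score: callable(cd_id) -> full CD senone log score (expensive).
    ci_to_cd: dict mapping CI phone id -> list of CD senone ids.
    beam: log-domain beam width (>= 0).
    """
    best_ci = max(ci_scores.values())
    cd_scores = {}
    for ci_id, score in ci_scores.items():
        within_beam = score >= best_ci - beam
        for cd_id in ci_to_cd[ci_id]:
            # Compute the full CD score only for promising CI phones;
            # otherwise back off to the (already computed) CI score.
            cd_scores[cd_id] = compute_cd_score(cd_id) if within_beam else score
    return cd_scores
```

With a tight beam, only a small fraction of CD senones pay the full mixture-evaluation cost; the rest reuse the CI score.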
Weaknesses of the Frame-level and GMM-level Techniques • Frame-level • Deteriorates performance significantly (>10%) • Hard to tune. • GMM-level • The number of GMMs computed varies from frame to frame. • ->Worst-case performance is poor • The CI score is used to back off • ->Search performance degrades because many scores are the same.
Baseline experiments • Tested on 3 tasks • Tested in a tough condition • Manually tuned • Tuned on the test set (Sorry, couldn't get the dev. set.) • Optimized one dimension at a time. • Very close to optimal • Goals • Faster • Graceful degradation (<5%)
Proposed Methods (A glance) • The goals of the 3 methods • Method 1: Try to reduce the variance of GMM computation time. • Method 2: Try to make CI-GMMS better behaved. • Method 2 and a half: Try to make down-sampling better behaved. • Didn't work. We will try to analyze why. • Method 3: An idea inspired by the analysis.
Method 1: Use a fixed upper bound on the GMMs computed in each frame • Only compute the CD scores if • The corresponding CI score is within the CI beam AND • The number of CD GMMs computed would not exceed a certain number. • Advantages: • Per-utterance GMM computation becomes more predictable. • Get a better bargain in trading off computation.
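Method 1 adds a hard cap on top of the CIGMMS condition. A minimal sketch, again with illustrative names rather than the actual Sphinx 3.6 implementation:

```python
def cigmms_with_cap(ci_scores, compute_cd_score, ci_to_cd, beam, max_cd):
    """Method 1 sketch: CIGMMS plus a hard cap on CD GMMs computed per frame."""
    best_ci = max(ci_scores.values())
    cd_scores, n_computed = {}, 0
    # Visit CI phones best-first so the cap preferentially keeps good senones.
    for ci_id in sorted(ci_scores, key=ci_scores.get, reverse=True):
        score = ci_scores[ci_id]
        for cd_id in ci_to_cd[ci_id]:
            if score >= best_ci - beam and n_computed < max_cd:
                cd_scores[cd_id] = compute_cd_score(cd_id)
                n_computed += 1
            else:
                cd_scores[cd_id] = score  # back off to the CI score
    return cd_scores
```

The cap bounds the worst-case frame, which is exactly the weakness of plain CIGMMS noted earlier (the number of GMMs computed varies from frame to frame).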
Method 2: Use the best Gaussian index from the previous frame • Best Gaussian index: what does it mean? • The index of the best Gaussian score in a GMM. • Why is it useful? • Two major reasons from the literature: • 1. In practice, the best Gaussian score dominates the GMM score (up to 95-99%). • 2. Usually, the collision rate of the best Gaussian indices between the current and previous frames is quite high (the literature says ~70%). • (Q: Are these assumptions really correct?)
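Assumption 1 (the best Gaussian dominates the mixture score) can be seen in a toy calculation; the numbers below are made up for illustration:

```python
import math

# Per-component log(weight * density) values for one GMM at one frame.
# One component clearly dominates, as is typical in practice.
log_weighted = [-5.0, -12.0, -15.0]

exact = math.log(sum(math.exp(x) for x in log_weighted))  # full log-sum
approx = max(log_weighted)                                # best-Gaussian approximation
print(exact - approx)  # small gap: the best component dominates the sum
```

When the components are well separated in log space, the max approximation loses almost nothing, which is what makes index-based shortcuts attractive.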
Method 2 (Algorithm) • In CIGMMS, • for each non-computed senone (backed off to CI): • If the best index from the previous frame is available, assume it is the current best index • Compute the GMM • This improves the smoothing performance of CIGMMS • Better accuracy • We can use a tighter beam.
Method 2 and a half (Algorithm) • In frame-dropping: • When the last best index is available, assume it is the current best index. • Compute the GMM.
Results • Not shown • Because there is no improvement • Why doesn't a better approximation give any gain?
Comparison of Different Types of GMM Score Approximation • GMM scores • Use the current best index • Not plausible, because the whole GMM needs to be computed first. • Use the previous score • But the current frame's information is not used. • Use the previous best index • If the two assumptions are true, this is a good method. • Use the corresponding CI score • Replace the CD score by the CI score. Hurts the best-performing senones.
Analysis 1: Log-likelihood distortion if the current index is used. (Is assumption 1 correct?)
Analysis 2: Is the collision rate always 70%? • On average, YES • For the top senones in a noisy task, NO • In the ICSI task, the hit rate for the top 50 senones sometimes drops to 50%
Analysis 3: Relative magnitude of distortion caused by different approximations • If the distortion from using the current index is 1: • In frame-dropping (significant degradation): • Distortion from using the previous index is • Comm.: 20 (in 2 mix), 40 (in 32 mix) • ICSI: 10 (in 2 mix), 20 (in 32 mix) • Distortion from using the previous score • Not tested because of time constraints. • Ad-hoc observation: < using the previous index • but >> better than the CI score. • In CI-GMM selection, not much degradation • But • the distortion from using the CI score is ~100x that of using the previous index • ~200-1000
Some thoughts • Why doesn't frame-dropping work if the distortion is not low? • Why does CI-GMM selection work if the distortion is so high? • My answer: • It doesn't matter which approximation was used • What matters is whether the best scores are computed. • CIGMMS still keeps the best GMM scores. • Frame-dropping always throws away the N best GMM scores.
Method 3 • Motivation: • At every frame, the best senone scores still need to be computed, even in frames we want to ignore. • Concern: • How do we preserve the effectiveness of down-sampling?
Method 3 • Another very simple idea. • Trick: Use CIGMMS for **every** frame. • But for alternate frames, or frames we want to "ignore": • Multiply the CI-GMMS beam by a factor F (0 <= F <= 1).
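The beam schedule above can be sketched in a few lines; the function and parameter names are illustrative, not from Sphinx:

```python
def method3_beam(base_beam, frame_idx, skip_period, factor):
    """Method 3 sketch: run CIGMMS on *every* frame, but tighten the CI beam
    by a factor F in [0, 1] on frames a down-sampler would have skipped.
    F = 0 reduces to down-sampling; F = 1 reduces to plain CIGMMS."""
    if frame_idx % skip_period == 0:
        return base_beam            # "kept" frame: full beam
    return base_beam * factor       # "skipped" frame: tightened beam
```

Because even tightened frames run CIGMMS, the best senone scores survive on every frame, which is exactly what the earlier analysis identified as the key property.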
Method 3 (Discussion) • Advantages of the scheme • Best senone scores are still computed when F > 0 • More tunable • The tightening factor is a real number • Preserves the properties of both CIGMMS and down-sampling: • When F = 0, equivalent to down-sampling • When F = 1, equivalent to CI-based GMM selection • A smoothing between the frame level and the GMM level. • The idea is effectively a dynamic beam.
Conclusion • Only a 20-25% gain obtained from the 3 computation improvements. (90% last time) • Pruned and non-pruned conditions are different scenarios • Jointly optimizing two levels would give around a 5-10% solid gain. • It's time to leave GMM computation and work on some other things.
Side note: Snapshots of Recent Development of Sphinx 3.6 • The use of the per-frame CI GMM score is still not optimal • Jim: "Why don't you use lexical retrieval? It's very easy to implement." • Still no improvement in search • Alex: "Seriously... When can you implement a search using lexical tree copies?" • The ICSI/CALO Meeting task gives us a lot of fun/pain. • Sphinx 3: the 20-30% improvement doesn't always show up. • "Arthur, do you want to say something?" • Some S3 and ST functions look really funny/awful. • Yitao: "*Sigh*". • Dave, Evandro: (shake their heads)
Acknowledgement • Thanks • Ravi • Alex • Evandro • Dave