I-SMOOTH FOR IMPROVED MINIMUM CLASSIFICATION ERROR TRAINING
Haozheng Li, Cosmin Munteanu
Pei-ning Chen, Department of Computer Science & Information Engineering, National Taiwan Normal University
Outline • Introduction • MLE & MCE • I-smooth Method on MCE • Experimental results • Conclusions
Introduction • The most common DT methods are minimum classification error (MCE) and maximum mutual information (MMI) training (or the MMI variant, minimum phone error (MPE)). Both of these DT methods focus on reducing the empirical error on the training set. • One of the more important drawbacks of DT, the overfitting problem, seems to be addressed only through solutions that are either a departure from the DT framework or have limited practical applicability.
Introduction (cont.) • Interpolation between MLE and the discriminative objective function has been applied in MMIE and MPE, either in a manner that depends on the amount of data available for each Gaussian or directly as hard interpolation. • They instead use I-smoothing to interpolate between two MCEs with different steepness parameters, as opposed to interpolating between MLE and MCE.
MLE • MLE training uses the Baum-Welch algorithm to softly align each training utterance, assigning the aligned speech to the hidden states and adjusting the model parameters in an iterative procedure.
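A minimal sketch of the occupancy-weighted re-estimation step this describes, assuming the per-frame occupancies (gamma) have already been produced by forward-backward; all names here are illustrative, not from the paper:

```python
import numpy as np

def mle_mean_update(gamma, frames):
    """Re-estimate one Gaussian mean from Baum-Welch soft alignments.

    gamma  : (T,) per-frame occupancy of this state/component,
             i.e. the "soft" assignment of each frame to it
    frames : (T, D) observation vectors of the utterance
    """
    occ = gamma.sum()                     # total soft count assigned
    weighted = gamma[:, None] * frames    # occupancy-weighted frames
    return weighted.sum(axis=0) / occ     # weighted average = new mean
```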
MCE • MCE aims to minimize a smoothed count of empirical errors. This discriminative method makes use of the additional information from competing models to train the model. • Misclassification measure:
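The formula itself did not survive extraction; the standard string-level form of the MCE misclassification measure, which the surrounding slides appear to follow, is (symbols assumed):

$$ d(Y) = -g_{\mathrm{cor}}(Y;\Lambda) + g_{\mathrm{com}}(Y;\Lambda), $$

where $g_{\mathrm{cor}}$ and $g_{\mathrm{com}}$ are the log-likelihood scores of the utterance $Y$ along the correct and the best competing state sequences, so $d(Y) > 0$ indicates a misclassification.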
MCE (cont.) The subscripts cor and com indicate whether the state a component belongs to lies in the correct or in the competing state sequence, respectively. θ(O) indicates that the normalized speech O is weighted by the function L.
MCE (cont.) • If the steepness parameter r is very large, the curve of L(d(Y)) is sharp. • Training then selects only utterances that lie in the score-boundary area to tune the model parameters.
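For reference, the loss L(d(Y)) in standard MCE is a sigmoid with steepness r (a standard form, assumed here rather than taken from the missing slide):

$$ L(d(Y)) = \frac{1}{1 + e^{-r\,d(Y)}}, $$

whose derivative is concentrated around $d(Y) = 0$ when $r$ is large, which is why only boundary-area utterances receive significant parameter updates.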
MCE (cont.) • Implementation details for the baseline MCE: (1) The average frame misclassification measure d(Y)/T is used instead of d(Y) to aid the tuning process. (2) A chopped sigmoid is used (see the sketch below). (3) Variable learning rates are used.
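The slides do not spell out the chopped sigmoid; a plausible sketch, assuming it simply clips the standard sigmoid to exact 0/1 outside an interval so that far-from-boundary utterances contribute no gradient (r and tau are illustrative parameters, not from the paper):

```python
import numpy as np

def chopped_sigmoid(d, r=1.0, tau=5.0):
    """Sigmoid loss clipped to constants outside [-tau, tau].

    Utterances with |d| > tau get a flat loss, hence zero gradient,
    so training focuses on the score-boundary region.
    """
    d = np.asarray(d, dtype=float)
    loss = 1.0 / (1.0 + np.exp(-r * d))   # standard MCE sigmoid
    loss = np.where(d < -tau, 0.0, loss)  # well classified: ignore
    loss = np.where(d > tau, 1.0, loss)   # badly misclassified: flat
    return loss
```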
I-smooth Method on MCE • I-smoothing means increasing the number of Gaussian occupancies while keeping the average data values and average squared data values invariant. • They propose a different strategy for dealing with over-training: interpolation is applied between the occupancies obtained from MCE runs with different steepness parameters, rather than interpolating with MLE.
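In the usual MPE-style notation (assumed here, not shown on the slide), I-smoothing adds $\tau$ occupancies that carry the average and average-squared values of the smoothing data, leaving those averages intact:

$$ \gamma' = \gamma + \tau, \qquad \theta'(O) = \theta(O) + \frac{\tau}{\tilde\gamma}\,\tilde\theta(O), \qquad \theta'(O^2) = \theta(O^2) + \frac{\tau}{\tilde\gamma}\,\tilde\theta(O^2), $$

where the tilded statistics come from the smoothing data (MLE statistics in MPE; confusion-data statistics in the method proposed here).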
I-smooth Method on MCE (cont.) • They use the misclassification measure to define another data selection criterion: • The selected data are close to the boundary area and may cause some confusion during training; as such, the selected speech data are referred to as confusion data.
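The criterion itself is missing from the extracted slide; a plausible reconstruction, assuming a simple threshold $\beta$ on the per-frame misclassification measure, would be

$$ \left|\frac{d(Y)}{T}\right| < \beta, $$

i.e., utterances whose average frame score lies within a band around the decision boundary are selected as confusion data.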
I-smooth Method on MCE (cont.) (1) In theory, the boundary data define a more accurate score boundary, but because so little of it is available, it may not represent the true boundary-data distribution. (2) On the other hand, Conf_OCC is much larger than Bound_OCC; however, the confusion data define a less accurate score boundary.
I-smooth Method on MCE (cont.) (1) If a component has a Bound_OCC larger than a certain threshold, we assume its assigned boundary data reflect the true boundary-data distribution and need not be interpolated. (2) Assuming Conf_OCC is less affected by random factors, the ratio between Bound_OCC and Conf_OCC should lie within a certain range for each Gaussian component (a sketch of this decision rule follows).
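A sketch of this per-Gaussian decision rule; the threshold values and the smoothing constant tau are illustrative, not from the paper:

```python
OCC_THRESHOLD = 20.0  # enough boundary occupancy to trust it as-is
MIN_RATIO = 0.05      # Bound_OCC/Conf_OCC below this triggers smoothing

def smoothed_stats(bound_occ, bound_x, bound_x2,
                   conf_occ, conf_x, conf_x2, tau=10.0):
    """I-smooth one Gaussian's boundary statistics with confusion data.

    *_occ : scalar occupancy; *_x, *_x2 : first/second-order statistics.
    """
    if bound_occ >= OCC_THRESHOLD or bound_occ / conf_occ >= MIN_RATIO:
        return bound_occ, bound_x, bound_x2  # boundary data reliable
    # Add tau occupancies carrying the confusion-data averages, which
    # leaves those averages unchanged (the I-smoothing update above).
    occ = bound_occ + tau
    x = bound_x + tau * (conf_x / conf_occ)
    x2 = bound_x2 + tau * (conf_x2 / conf_occ)
    return occ, x, x2
```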
Conclusions • In this paper they propose a method that interpolates between the boundary data and the confusion data, aimed at alleviating the overfitting problem in MCE. • They interpolate the boundary data with the confusion data when the boundary occupancy is small and when the occupancy ratio between the two is small.