Minimum Phone Error (MPE) Model and Feature Training
ShihHsiang 2006
Difference
• MPE vs. ORCE
• ORCE (Overall Risk Criterion Estimation) focuses on word error rate and is implemented on N-best lists
• MPE focuses on phone accuracy, is implemented on a word graph, and also introduces a prior distribution over the newly estimated models (I-smoothing)
• MPE vs. MMI
• MMI treats the correct transcription as the numerator lattice and the whole word graph as the denominator lattice (the competing sequences)
• MPE treats all possible correct sequences in the word graph as the numerator lattice, and all possible wrong sequences as the denominator lattice
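The numerator/denominator contrast above can be sketched numerically. The following toy example (all likelihoods and accuracy values are made up, and a short N-best list stands in for a real word graph) contrasts the MMI criterion, which scores only the correct transcription against the whole graph, with the MPE criterion, a posterior-weighted expected phone accuracy over all paths:

```python
import numpy as np

# Toy hypothesis list: log-likelihoods and raw phone accuracies (hypothetical values)
log_liks = np.array([-10.0, -11.0, -13.0])   # combined acoustic + LM scores per path
phone_acc = np.array([5.0, 3.0, 1.0])        # phone accuracy of each hypothesis
correct = 0                                   # index of the reference transcription

# Path posteriors over the "graph" (softmax of log-likelihoods)
post = np.exp(log_liks - log_liks.max())
post /= post.sum()

# MMI: correct transcription (numerator) vs. all competing paths (denominator)
f_mmi = log_liks[correct] - np.log(np.exp(log_liks).sum())

# MPE: expected phone accuracy, weighting every path by its posterior
f_mpe = float(post @ phone_acc)
```

Maximizing `f_mmi` pushes probability mass onto the single reference path, while maximizing `f_mpe` rewards any path in proportion to how many phones it gets right.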
fMPE (cont.)
• Feature-space minimum phone error (fMPE) is a discriminative training method that adds an offset, computed through a trained transform matrix, to the original features: y_t = x_t + M h_t, where x_t is the current feature and h_t is a high-dimensional feature vector built from the current frame and averages of neighboring frames
• Each vector h_t contains 10,000 Gaussian posterior probabilities, and the Gaussian likelihoods are evaluated with no priors
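A minimal sketch of the fMPE offset y_t = x_t + M h_t, with toy sizes rather than the 10,000-Gaussian setup described above; the Gaussians here are unit-variance with equal (no) priors, as the slide states:

```python
import numpy as np

rng = np.random.default_rng(0)
D, G = 4, 8                       # toy feature dimension and Gaussian count
x_t = rng.normal(size=D)          # current feature frame
means = rng.normal(size=(G, D))   # Gaussian means (unit variance, no priors)

# Posterior probabilities of each Gaussian given the current frame
log_lik = -0.5 * ((x_t - means) ** 2).sum(axis=1)
h_t = np.exp(log_lik - log_lik.max())
h_t /= h_t.sum()                  # h_t: high-dimensional posterior vector

M = np.zeros((D, G))              # transform initialized to zero
y_t = x_t + M @ h_t               # offset feature; equals x_t before training
```

Initializing M to zero means fMPE starts from the unmodified baseline features, so the discriminative training can only improve on them.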
fMPE (cont.)
• Objective function: the MPE criterion is maximized with respect to the transformation matrix M by gradient descent
• The direct differential is the derivative of the objective with respect to M taken through the transformed features y_t
fMPE (cont.)
• When only the direct differential is used to update the transformation matrix, significant improvements are obtained but are quickly lost once the acoustic model is retrained with ML
• The indirect differential therefore aims to reflect the change in the model caused by ML retraining on the new features
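The two-part update can be sketched as follows. This is a hedged illustration only: the two gradient terms are random placeholders standing in for the real lattice-based statistics, and `lr` is a hypothetical step size.

```python
import numpy as np

rng = np.random.default_rng(1)
D, G = 4, 8
M = np.zeros((D, G))                     # current feature transform

# Placeholders for the two gradient contributions (not real statistics):
direct_grad = rng.normal(size=(D, G))    # dF/dM through the features y_t
indirect_grad = rng.normal(size=(D, G))  # dF/dM through the ML-retrained model
lr = 0.01                                # hypothetical learning rate

# Gradient step on the MPE objective combining both differentials
M_new = M + lr * (direct_grad + indirect_grad)
```

Dropping `indirect_grad` from the sum reproduces the failure mode described above: the features improve, but the gain evaporates once the acoustic model is re-estimated with ML.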
offset fMPE
• Offset fMPE differs from the original fMPE in the definition of the high-dimensional vector h_t of posterior probabilities: each component is built from the posterior of the i-th Gaussian at time t together with dimension-dependent offset terms
• The number of Gaussians needed is about 1,000, significantly lower than the 100,000 of the original fMPE
Dimension-weighted offset fMPE
• Different from offset fMPE, which gives the same weight to every dimension of the feature offset vector
• Dimension-weighted offset fMPE calculates the posterior probability separately on each dimension of the feature offset vector
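The per-dimension weighting can be sketched as below (toy sizes, my own illustration of the idea rather than the exact formulation in the slides): instead of one posterior per Gaussian shared across all feature dimensions, each dimension d gets its own posterior computed from the 1-D Gaussian likelihoods of that dimension.

```python
import numpy as np

rng = np.random.default_rng(2)
D, G = 4, 8
x_t = rng.normal(size=D)
means = rng.normal(size=(G, D))   # unit-variance Gaussian means

# Per-dimension log-likelihoods: entry (i, d) scores Gaussian i on dimension d alone
log_lik_1d = -0.5 * (x_t - means) ** 2          # shape (G, D)

# Normalize each column separately: column d holds the posteriors for dimension d
h = np.exp(log_lik_1d - log_lik_1d.max(axis=0))
h /= h.sum(axis=0)
```

Each column of `h` sums to one, so every feature dimension carries its own posterior distribution over the Gaussians rather than a single shared one.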
Experiments (on MATBN) • Error rates (%) for MPE and fMPE for different features, on different acoustic levels.
Experiments (cont.) • CER(%) for offset fMPE and dimension-weighted offset fMPE with different features
Connect to SPLICE
• Decomposition Scheme 1
Connect to SPLICE (cont.)
• Compensation of the original feature is carried out by adding a large number of bias vectors, each of which is computed as a full-rank rotation of a small set of posterior probabilities
• Maximum-likelihood estimation: the dominant posterior term is assumed to be greater than the remaining (n-1) terms
Connect to SPLICE (cont.)
• Decomposition Scheme 2
Connect to SPLICE (cont.) • The compensation vector consists of a linear weighted sum of a set of frame-independent correction vectors, where the weight is the posterior probability associated with the corresponding correction vector • The key difference is • the bias vector for compensation in fMPE is specific to each time frame t • the bias vector in feature-space stochastic matching is common over all frames in the utterance
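The SPLICE-style view above can be sketched as follows (toy sizes; the bias and mean values are hypothetical): the compensation is a posterior-weighted sum of correction vectors b_i that are shared by all frames, while only the weights p(i | x_t) change from frame to frame.

```python
import numpy as np

rng = np.random.default_rng(3)
D, K = 4, 5
x_t = rng.normal(size=D)
bias = rng.normal(size=(K, D))    # correction vectors, common over all frames
means = rng.normal(size=(K, D))   # Gaussian means used to weight them

# Frame-dependent posterior weights p(i | x_t)
log_lik = -0.5 * ((x_t - means) ** 2).sum(axis=1)
p = np.exp(log_lik - log_lik.max())
p /= p.sum()

# Compensated feature: posterior-weighted sum of frame-independent biases
y_t = x_t + p @ bias
```

This makes the contrast with fMPE concrete: here only `p` depends on the frame, whereas in fMPE the effective bias M h_t is itself specific to each time frame t.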