Minimum Phone Error (MPE) Model and Feature Training
ShihHsiang 2006
Difference
• MPE vs. ORCE
• ORCE (Overall Risk Criterion Estimation) focuses on word error rate and is implemented on N-best lists
• MPE focuses on phone accuracy, is implemented on a word graph, and also introduces a prior distribution over the newly estimated models (I-smoothing)
• MPE vs. MMI
• MMI treats the correct transcription as the numerator lattice and the whole word graph as the denominator lattice (the competing sequences)
• MPE treats all possible correct sequences in the word graph as the numerator lattice, and all possible wrong sequences as the denominator lattice
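The numerator/denominator contrast above can be sketched numerically. The following toy example (all likelihoods and accuracy values are made up, and a short N-best list stands in for a real word graph) contrasts the MMI criterion, which scores only the correct transcription against the whole graph, with the MPE criterion, a posterior-weighted expected phone accuracy over all paths:

```python
import numpy as np

# Toy hypothesis list: log-likelihoods and raw phone accuracies (hypothetical values)
log_liks = np.array([-10.0, -11.0, -13.0])   # combined acoustic + LM scores per path
phone_acc = np.array([5.0, 3.0, 1.0])        # phone accuracy of each hypothesis
correct = 0                                   # index of the reference transcription

# Path posteriors over the "graph" (softmax of log-likelihoods)
post = np.exp(log_liks - log_liks.max())
post /= post.sum()

# MMI: correct transcription (numerator) vs. all competing paths (denominator)
f_mmi = log_liks[correct] - np.log(np.exp(log_liks).sum())

# MPE: expected phone accuracy, weighting every path by its posterior
f_mpe = float(post @ phone_acc)
```

Maximizing `f_mmi` pushes probability mass onto the single reference path, while maximizing `f_mpe` rewards any path in proportion to how many phones it gets right.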
fMPE (cont.)
• Feature-space minimum phone error (fMPE) is a discriminative training method that adds an offset, computed through a trained transform matrix, to the original features: y_t = x_t + M h_t, where x_t is the current feature and h_t is a high-dimensional feature vector built from the current frame and averages of neighboring frames
• Each vector h_t contains 10,000 Gaussian posterior probabilities, and the Gaussian likelihoods are evaluated with no priors
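A minimal sketch of the fMPE offset y_t = x_t + M h_t, with toy sizes rather than the 10,000-Gaussian setup described above; the Gaussians here are unit-variance with equal (no) priors, as the slide states:

```python
import numpy as np

rng = np.random.default_rng(0)
D, G = 4, 8                       # toy feature dimension and Gaussian count
x_t = rng.normal(size=D)          # current feature frame
means = rng.normal(size=(G, D))   # Gaussian means (unit variance, no priors)

# Posterior probabilities of each Gaussian given the current frame
log_lik = -0.5 * ((x_t - means) ** 2).sum(axis=1)
h_t = np.exp(log_lik - log_lik.max())
h_t /= h_t.sum()                  # h_t: high-dimensional posterior vector

M = np.zeros((D, G))              # transform initialized to zero
y_t = x_t + M @ h_t               # offset feature; equals x_t before training
```

Initializing M to zero means fMPE starts from the unmodified baseline features, so the discriminative training can only improve on them.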
fMPE (cont.)
• Objective function: the MPE criterion is maximized with respect to the transformation matrix M by gradient descent
• The direct differential is the derivative of the objective with respect to M taken through the transformed features y_t
fMPE (cont.)
• When only the direct differential is used to update the transformation matrix, significant improvements are obtained but are quickly lost once the acoustic model is retrained with ML
• The indirect differential therefore aims to reflect the change in the model caused by ML retraining on the new features
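The two-part update can be sketched as follows. This is a hedged illustration only: the two gradient terms are random placeholders standing in for the real lattice-based statistics, and `lr` is a hypothetical step size.

```python
import numpy as np

rng = np.random.default_rng(1)
D, G = 4, 8
M = np.zeros((D, G))                     # current feature transform

# Placeholders for the two gradient contributions (not real statistics):
direct_grad = rng.normal(size=(D, G))    # dF/dM through the features y_t
indirect_grad = rng.normal(size=(D, G))  # dF/dM through the ML-retrained model
lr = 0.01                                # hypothetical learning rate

# Gradient step on the MPE objective combining both differentials
M_new = M + lr * (direct_grad + indirect_grad)
```

Dropping `indirect_grad` from the sum reproduces the failure mode described above: the features improve, but the gain evaporates once the acoustic model is re-estimated with ML.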
offset fMPE
• Offset fMPE differs from the original fMPE in the definition of the high-dimensional vector h_t of posterior probabilities: each component is built from the posterior of the i-th Gaussian at time t together with dimension-dependent offset terms
• The number of Gaussians needed is about 1,000, significantly lower than the 100,000 of the original fMPE
Dimension-weighted offset fMPE
• Different from offset fMPE, which gives the same weight to every dimension of the feature offset vector
• Dimension-weighted offset fMPE calculates the posterior probability separately on each dimension of the feature offset vector
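The per-dimension weighting can be sketched as below (toy sizes, my own illustration of the idea rather than the exact formulation in the slides): instead of one posterior per Gaussian shared across all feature dimensions, each dimension d gets its own posterior computed from the 1-D Gaussian likelihoods of that dimension.

```python
import numpy as np

rng = np.random.default_rng(2)
D, G = 4, 8
x_t = rng.normal(size=D)
means = rng.normal(size=(G, D))   # unit-variance Gaussian means

# Per-dimension log-likelihoods: entry (i, d) scores Gaussian i on dimension d alone
log_lik_1d = -0.5 * (x_t - means) ** 2          # shape (G, D)

# Normalize each column separately: column d holds the posteriors for dimension d
h = np.exp(log_lik_1d - log_lik_1d.max(axis=0))
h /= h.sum(axis=0)
```

Each column of `h` sums to one, so every feature dimension carries its own posterior distribution over the Gaussians rather than a single shared one.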
Experiments (on MATBN) • Error rates (%) for MPE and fMPE for different features, on different acoustic levels.
Experiments (cont.) • CER(%) for offset fMPE and dimension-weighted offset fMPE with different features
Connect to SPLICE
• Decomposition Scheme 1
Connect to SPLICE (cont.)
• Compensation of the original feature is carried out by adding a large number of bias vectors, each of which is computed as a full-rank rotation of a small set of posterior probabilities
• Maximum-likelihood estimation: the dominant posterior term is assumed to be greater than the remaining (n-1) terms
Connect to SPLICE (cont.)
• Decomposition Scheme 2
Connect to SPLICE (cont.) • The compensation vector consists of a linear weighted sum of a set of frame-independent correction vectors, where the weight is the posterior probability associated with the corresponding correction vector • The key difference is • the bias vector for compensation in fMPE is specific to each time frame t • the bias vector in feature-space stochastic matching is common over all frames in the utterance
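The SPLICE-style view above can be sketched as follows (toy sizes; the bias and mean values are hypothetical): the compensation is a posterior-weighted sum of correction vectors b_i that are shared by all frames, while only the weights p(i | x_t) change from frame to frame.

```python
import numpy as np

rng = np.random.default_rng(3)
D, K = 4, 5
x_t = rng.normal(size=D)
bias = rng.normal(size=(K, D))    # correction vectors, common over all frames
means = rng.normal(size=(K, D))   # Gaussian means used to weight them

# Frame-dependent posterior weights p(i | x_t)
log_lik = -0.5 * ((x_t - means) ** 2).sum(axis=1)
p = np.exp(log_lik - log_lik.max())
p /= p.sum()

# Compensated feature: posterior-weighted sum of frame-independent biases
y_t = x_t + p @ bias
```

This makes the contrast with fMPE concrete: here only `p` depends on the frame, whereas in fMPE the effective bias M h_t is itself specific to each time frame t.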