Segmental GPD training of HMM based speech recognizer Authors: W. Chou, B. H. Juang and C. H. Lee Presenter: Yi-Ning Huang
Outline • Introduction • The system configuration • Segmental GPD training of HMMs • Parameter transformations • Experimental evaluation • Summary and discussion
Introduction • GPD: "generalized probabilistic descent" • In this paper, we propose a segment-based training method, segmental GPD training, for speech recognizers using hidden Markov models and Viterbi decoding.
The main features of our approach can be summarized as follows: • The algorithm is based on the principle of minimum recognition error rate, in which segmentation and discriminative training are jointly optimized. • The algorithm can be initialized from a given HMM, regardless of whether it has been trained according to other criteria or directly generated from a training set with (non-optimal) uniform segmentation.
The algorithm handles both errors and correct recognition cases in a theoretically consistent way, and is adaptively adjusted to achieve an optimal configuration with the maximum possible separation between confusable classes. • The algorithm can be used either off-line or on-line, with the ability to learn new features from any new training sources. • The algorithm is consistent with the HMM framework and does not require major modification of the current system. Moreover, it is theoretically justified to converge to a (at least local) minimum of the recognition error rate.
The system configuration • The observation probability density of vector $x$ in the $j$-th state of the $i$-th word HMM is a Gaussian mixture: $b_j^{(i)}(x) = \sum_{k=1}^{K} c_{jk}^{(i)}\,\mathcal{N}\big(x;\,\mu_{jk}^{(i)},\,\Sigma_{jk}^{(i)}\big)$, where the $c_{jk}^{(i)}$ are the mixture weights.
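A minimal NumPy sketch of this state observation density, assuming diagonal covariances; the function name and array layout are illustrative, not from the paper.

```python
import numpy as np

def log_obs_density(x, c, mu, var):
    """Log of the state observation density b_j^(i)(x): a Gaussian
    mixture with diagonal covariances (an assumption of this sketch).

    x   : (D,) observation vector
    c   : (K,) mixture weights for this state (non-negative, sum to 1)
    mu  : (K, D) mixture means
    var : (K, D) diagonal variances
    """
    D = x.shape[0]
    # Per-component log Gaussian density, computed in the log domain.
    log_norm = -0.5 * (D * np.log(2.0 * np.pi) + np.sum(np.log(var), axis=1))
    log_quad = -0.5 * np.sum((x - mu) ** 2 / var, axis=1)
    comp = np.log(c) + log_norm + log_quad
    # log sum_k c_k N(x; mu_k, var_k), via a stable log-sum-exp.
    m = comp.max()
    return m + np.log(np.sum(np.exp(comp - m)))
```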
The log-likelihood score of the input utterance $X$ along its optimal path in the $i$-th model $\lambda_i$: $g_i(X;\Lambda) = \sum_{t=1}^{T(X)} \big[\log a^{(i)}_{\bar q_{t-1}\bar q_t} + \log b^{(i)}_{\bar q_t}(x_t)\big]$ • $\bar q = (\bar q_1, \ldots, \bar q_{T(X)})$: the state sequence along the optimal (Viterbi) path • $x_t$: the observation vector at time $t$ • $T(X)$: the number of frames in the input utterance $X$ • $a^{(i)}_{\bar q_{t-1}\bar q_t}$: the state transition probability from state $\bar q_{t-1}$ to state $\bar q_t$
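To make the score concrete, here is a sketch that accumulates the two log terms along a state path already produced by Viterbi decoding; it reuses `log_obs_density` from the sketch above, and all names are illustrative.

```python
def path_log_likelihood(X, path, log_A, state_params):
    """g_i(X; Lambda): sum of log transition and log observation
    scores along a given (e.g. Viterbi-optimal) state sequence.

    X            : (T, D) observation vectors x_1..x_T
    path         : length T+1 state indices q_0..q_T along the path
    log_A        : (S, S) log transition probabilities
    state_params : per-state (c, mu, var) tuples for log_obs_density
    """
    g = 0.0
    for t in range(1, len(path)):
        q_prev, q = path[t - 1], path[t]
        c, mu, var = state_params[q]
        g += log_A[q_prev, q] + log_obs_density(X[t - 1], c, mu, var)
    return g
```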
Define the classification error count function for the $i$-th class: $e_i(X) = 1$ if $X \in C_i$ is misrecognized, i.e., $g_i(X;\Lambda) < \max_{j \neq i} g_j(X;\Lambda)$, and $e_i(X) = 0$ otherwise. The goal of training is to reduce the expected error rate $E[e_i(X)]$; training results are often measured by the empirical error rate over the $N$ training tokens, $\frac{1}{N}\sum_{n=1}^{N} e(X_n)$.
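A small sketch of the empirical error rate under the usual decision rule (pick the class with the highest score); the array layout is an assumption of this sketch.

```python
import numpy as np

def empirical_error_rate(scores, labels):
    """Fraction of training tokens whose true class does not attain
    the highest discriminant score g_i(X; Lambda).

    scores : (N, W) array with scores[n, i] = g_i(X_n; Lambda)
    labels : (N,) true class indices
    """
    return float(np.mean(scores.argmax(axis=1) != labels))
```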
Segmental GPD training of HMMs • In segmental GPD training, the loss function is constructed through the following steps: • Define the misclassification measure for each class $i$: $d_i(X) = -g_i(X;\Lambda) + \log\Big[\frac{1}{W-1}\sum_{j \neq i} e^{\eta\, g_j(X;\Lambda)}\Big]^{1/\eta}$, where $\eta$ is a positive number and $W$ is the total number of classes.
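A sketch of the misclassification measure, computed with a stable log-sum-exp; the function name is illustrative.

```python
import numpy as np

def misclassification_measure(g, i, eta):
    """d_i(X) = -g_i + (1/eta) * log[(1/(W-1)) * sum_{j!=i} exp(eta*g_j)].

    g   : (W,) discriminant scores g_j(X; Lambda) for all classes
    i   : true class index
    eta : positive constant; large eta approximates
          d_i ~ -g_i + max_{j != i} g_j (score of the best competitor)
    """
    W = g.shape[0]
    others = np.delete(g, i)
    m = (eta * others).max()
    lse = m + np.log(np.sum(np.exp(eta * others - m)))  # log sum exp(eta*g_j)
    return -g[i] + (lse - np.log(W - 1)) / eta
```

A positive $d_i(X)$ indicates a likely misrecognition of $X$; a negative value indicates a correct decision.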
Define the smoothed loss function for each class: $\ell_i(X;\Lambda) = \frac{1}{1 + e^{-\gamma d_i(X;\Lambda)}}$, with smoothing constant $\gamma > 0$ • Define the loss function for the entire training population: $\ell(X;\Lambda) = \sum_{i=1}^{W} \ell_i(X;\Lambda)\,\mathbf{1}(X \in C_i)$
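The sigmoid makes the 0-1 error count differentiable, which is what allows gradient descent; a one-line sketch:

```python
import numpy as np

def smoothed_loss(d, gamma=1.0):
    """l_i(X; Lambda) = 1 / (1 + exp(-gamma * d_i(X))): approaches the
    0-1 error count as gamma grows, but stays differentiable."""
    return 1.0 / (1.0 + np.exp(-gamma * d))
```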
The generalized probabilistic descent (GPD) algorithm adjusts the model parameters $\Lambda$ recursively according to $\Lambda_{n+1} = \Lambda_n - \epsilon_n U_n \nabla \ell(X_n;\Lambda_n)$, where $\epsilon_n$ is a small positive step size and $U_n$ is a positive-definite matrix.
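A sketch of the sequential update loop, taking $U_n$ as the identity and using an illustrative decreasing step-size schedule; the schedule and all names are assumptions of this sketch, not from the paper.

```python
def gpd_train(params, tokens, grad_loss, n_epochs=10, eps0=0.1):
    """Sequential GPD: visit one training token at a time and move the
    (transformed, unconstrained) parameters down the gradient of the
    smoothed loss for that token.

    params    : flat NumPy vector of transformed HMM parameters
    tokens    : iterable of (X, label) training pairs
    grad_loss : callable returning d l(X; Lambda) / d Lambda for one token
    """
    n = 0
    for _ in range(n_epochs):
        for X, label in tokens:
            eps = eps0 / (1.0 + n)  # decreasing step size eps_n
            params = params - eps * grad_loss(params, X, label)
            n += 1
    return params
```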
Parameter transformations • In segmental GPD training, the HMM parameters are adaptively adjusted according to the GPD update rule above. • A diagram of this training procedure is illustrated in Figure 1.
The following transformations are used in our approach: • Logarithm of the variance: $\tilde\sigma^{(i)}_{jkd} = \log \sigma^{(i)}_{jkd}$, where $\sigma^{(i)}_{jkd}$ is the variance of the $i$-th word, $j$-th state, $k$-th mixture and $d$-th dimension. • Transformed logarithm of the mixture weights: $c^{(i)}_{jk} = e^{\tilde c^{(i)}_{jk}} \big/ \sum_{l=1}^{L} e^{\tilde c^{(i)}_{jl}}$, where $L$ is the total number of mixture weights in the $j$-th state of the $i$-th word model.
Transformed logarithm of the transition probability: $a^{(i)}_{jk} = e^{\tilde a^{(i)}_{jk}} \big/ \sum_{m=1}^{M} e^{\tilde a^{(i)}_{jm}}$, where $M$ is the total number of states in the $i$-th word model.
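These transformations let the gradient updates run unconstrained while the actual HMM parameters stay valid (positive variances, normalized mixture weights and transition rows). A sketch of the mapping back, with illustrative names:

```python
import numpy as np

def to_constrained(tilde_sigma, tilde_c, tilde_a):
    """Map transformed (unconstrained) parameters back to HMM parameters.

    tilde_sigma : log variances      -> sigma = exp(tilde_sigma) > 0
    tilde_c     : (L,) weight logits -> softmax gives c_k > 0, sum 1
    tilde_a     : (M,) transition logits for one state's outgoing row
    """
    sigma = np.exp(tilde_sigma)
    c = np.exp(tilde_c - tilde_c.max())   # subtract max for stability
    c = c / c.sum()
    a = np.exp(tilde_a - tilde_a.max())
    a = a / a.sum()
    return sigma, c, a
```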
Experimental evaluation • First experiment: the English E-set (b, c, d, e, g, p, t, v, z) • 50 male and 50 female speakers (all native speakers of American English), recorded over local dialed-up telephone lines • 10-state, 5-mixture HMMs: 76% accuracy on the testing set and 89% on the training set before training; after 10 iterations of segmental GPD training, 88.3% (testing) and 99.6% (training) • 15-state, 3-mixture HMMs: 73.3% (testing) and 86.3% (training) before; 88.7% (testing) and 100% (training) after
Figure 2: Recognition curve of segmental GPD training (88.7% on testing data set)
Second experiment • TI database of connected digit utterances • Each string has a random length of 1 to 7 digits • Recorded from speakers in various regions of the U.S. • 8565 strings for training, 8578 strings for testing • 10-state, 64-mixture HMMs
Summary and discussion • We demonstrated the effectiveness of the proposed training algorithm in isolated word and connected digit recognition applications.