Flexible Speaker Adaptation using Maximum Likelihood Linear Regression • Authors: C. J. Leggetter, P. C. Woodland • Presenter: 陳亮宇 • Proc. ARPA Spoken Language Technology Workshop, 1995
Outline • Introduction • MLLR Overview • Fixed and Dynamic Regression Classes • Supervised Adaptation vs. Unsupervised Adaptation • Evaluation on WSJ Data • Conclusion
Introduction • Speaker Independent (SI) recognition systems • Poorer performance than SD systems • Easy to get lots of training data • Speaker Dependent (SD) recognition systems • Better performance • Difficult to get enough training data • Solution: SI system + adaptation with a small amount of SD data • Advantage: only a small amount of SD data is required • Problem: models with no adaptation data are not updated
Introduction (aim of the paper) • MLLR (Maximum Likelihood Linear Regression) approach • Parameter transformation technique • All models are updated even with little adaptation data • Adapts the SI system by transforming the mean parameters with a set of linear transforms • Dynamic Regression Classes approach • Optimizes the adaptation procedure at runtime • Allows all modes of adaptation to be performed in a single framework
MLLR Overview • Regression Classes • The set of Gaussians that share the same transformation • [Figure: SD data is used to estimate a transformation matrix (W) for each regression class, which then transforms the mixture components in that class]
MLLR Overview (cont.) • The SD mean is modeled as a linear transform of the SI mean • Therefore, for a single Gaussian distribution, the probability density function of state j generating a speech observation vector o of dimension n is:
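In the standard MLLR formulation, the adapted mean and the resulting state density can be written as below. The extended mean vector $\xi_j$, the offset term $\omega$, and the $n \times (n+1)$ shape of $W_j$ are notation assumed here rather than taken from the slide text.

```latex
\hat{\mu}_j = W_j \, \xi_j ,
\qquad
\xi_j = [\, \omega, \mu_{j1}, \ldots, \mu_{jn} \,]^{\top}

b_j(o) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_j|^{1/2}}
         \exp\!\left( -\tfrac{1}{2}
         (o - W_j \xi_j)^{\top} \Sigma_j^{-1} (o - W_j \xi_j) \right)
```

Setting $\omega = 1$ includes a bias offset in the regression; setting it to 0 would drop the offset.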
Estimation of MLLR matrices • Gaussian covariance matrices are diagonal • A set of T frames of adaptation data $O = o_1, o_2, \ldots, o_T$ • $W_j$ is tied between R Gaussians $j_1, j_2, \ldots, j_R$ • $W_j$ can be updated column by column:
Estimation of MLLR matrices (cont.) • $z_i$ is the ith column of Z • $\gamma_j(t)$ is the probability of occupying state j at time t while generating O • $c^{(r)}_{ii}$ is the ith diagonal element of the rth tied state inverse covariance, scaled by the total state occupation probability
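The estimation step can be sketched in plain Python for one regression class with diagonal covariances. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `solve_linear` and `estimate_mllr_transform`, the layout of the statistics, and the fixed offset term of 1 are all hypothetical choices. The sketch stores W as n x (n+1) and solves one small linear system per feature dimension; whether that unit is called a row or a column depends on how Z is laid out.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def estimate_mllr_transform(means, variances, occ, obs_sum):
    """Estimate one n x (n+1) transform W shared by R tied Gaussians.

    means[r]     : SI mean of Gaussian r (length n)
    variances[r] : diagonal covariance of Gaussian r (length n)
    occ[r]       : total occupation, the sum over t of gamma_r(t)
    obs_sum[r]   : sum over t of gamma_r(t) * o_t (length n)
    """
    n = len(means[0])
    # Extended mean vectors; the offset term is assumed to be 1.
    xis = [[1.0] + list(mu) for mu in means]
    w = []
    for i in range(n):
        g = [[0.0] * (n + 1) for _ in range(n + 1)]
        z = [0.0] * (n + 1)
        for xi, var, gamma, osum in zip(xis, variances, occ, obs_sum):
            inv_var = 1.0 / var[i]  # inverse variance of dimension i
            for p in range(n + 1):
                z[p] += inv_var * osum[i] * xi[p]
                for q in range(n + 1):
                    g[p][q] += inv_var * gamma * xi[p] * xi[q]
        # Solve G_i w_i = z_i for the i-th set of regression weights.
        w.append(solve_linear(g, z))
    return w
```

The system for each dimension is only solvable when the tied Gaussians have at least n+1 linearly independent extended means and non-zero occupation, which is one reason a regression class needs enough data.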
MLLR for Incremental Adaptation • Can be implemented by accumulating the time-dependent components separately • Accumulate the observation vectors associated with each Gaussian and the associated occupation probability • The MLLR equations can then be applied at any time
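The accumulation described above might look like the following sketch. The class name `MllrAccumulator` and its method are hypothetical; the point is only that the per-Gaussian occupation and weighted observation sums can be updated one frame at a time and read out whenever a new transform is wanted.

```python
class MllrAccumulator:
    """Running sufficient statistics for a set of Gaussians."""

    def __init__(self, num_gaussians, dim):
        # occ[r] accumulates gamma_r(t) over all frames seen so far.
        self.occ = [0.0] * num_gaussians
        # obs_sum[r] accumulates gamma_r(t) * o_t over all frames seen so far.
        self.obs_sum = [[0.0] * dim for _ in range(num_gaussians)]

    def add_frame(self, observation, posteriors):
        """Add one frame; posteriors[r] is gamma_r(t) for this frame."""
        for r, gamma in enumerate(posteriors):
            self.occ[r] += gamma
            for i, value in enumerate(observation):
                self.obs_sum[r][i] += gamma * value


# Usage: statistics grow frame by frame and can be inspected at any point.
acc = MllrAccumulator(num_gaussians=2, dim=2)
acc.add_frame([1.0, 2.0], [0.25, 0.75])
acc.add_frame([3.0, 0.0], [0.5, 0.5])
```

Because the statistics are simple sums, processing the data in several batches gives the same totals as processing it all at once.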
Fixed Regression Classes • Regression classes are predetermined by assessing the amount of adaptation data • Mixture component clustering procedure based on a likelihood measure • Number of regression classes is roughly proportional to the amount of adaptation data • Disadvantages: • Needs to know the amount of adaptation data in advance • Some regression classes might not have a sufficient amount of data • Poor estimates of the transformations • A class may be dominated by a specific mixture component
Dynamic Regression Classes • Mixture components are arranged into a tree • Leaves of the tree are: • For a small HMM system: individual mixture components • For a large HMM system: base classes, each containing a set of mixture components • These components are similar under a divergence measure • Leaves of the tree are then merged into groups of similar components based on a distance measure (divergence)
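One way such a tree could be used at adaptation time is sketched below. The selection rule is an assumption of this sketch rather than something stated on the slide: each leaf takes its transform from the lowest node above it (or itself) whose accumulated occupation reaches a threshold. The names `Node`, `assign_transforms`, and `min_occ` are hypothetical.

```python
class Node:
    """A regression tree node; leaves hold base classes or components."""

    def __init__(self, name, children=(), occ=0.0):
        self.name = name
        self.children = list(children)
        # Leaves carry their own occupation; internal nodes sum their children.
        self.occ = occ if not self.children else sum(c.occ for c in self.children)


def assign_transforms(node, min_occ, inherited=None, out=None):
    """Map each leaf name to the lowest node with occ >= min_occ on its path.

    A leaf maps to None when no node on its path has enough data.
    """
    if out is None:
        out = {}
    owner = node.name if node.occ >= min_occ else inherited
    if not node.children:
        out[node.name] = owner
    for child in node.children:
        assign_transforms(child, min_occ, owner, out)
    return out


# Usage: leaf "a" has plenty of data, the other leaves share coarser transforms.
tree = Node("root", [
    Node("n1", [Node("a", occ=50.0), Node("b", occ=5.0)]),
    Node("n2", [Node("c", occ=3.0), Node("d", occ=2.0)]),
])
owners = assign_transforms(tree, min_occ=10.0)
```

With more adaptation data, more nodes pass the threshold and the transforms become more specific, without the class structure having to be fixed in advance.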
Supervised Adaptation vs. Unsupervised Adaptation • Note: the fixed regression class approach was used • [Figure: supervised vs. unsupervised adaptation using the RM corpus]
Evaluation on WSJ Data • Experiment settings • Dynamic regression classes approach • Baseline Speaker Independent system (see Section 5.1 of the paper) • S3 test: • Static supervised adaptation for non-native speakers • S4 test: • Incremental unsupervised adaptation for native speakers
Regression Class Tree Settings • Distance measure: • Divergence between mixture components • A clustering algorithm was used to generate 750 base classes • 750 mixture components were chosen initially • The nearest 10 components to each were assigned to its base class • The remaining components were assigned to base classes using an average distance measure from all the existing members • The regression tree was then built using a similar distance measure • Base classes are compared on a pairwise basis using the average divergence between all members of each class
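The distance computations above can be illustrated with a short sketch. The slide does not give the divergence formula, so this example assumes the symmetric Kullback-Leibler divergence between diagonal-covariance Gaussians; the function names and the representation of a component as a `(mean, variance)` pair are hypothetical.

```python
def divergence(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        total += va / vb + vb / va - 2.0
        total += (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)
    return 0.5 * total


def nearest_base_class(component, base_classes):
    """Return the index of the base class with the smallest average
    divergence between the component and all existing class members."""
    mean, var = component

    def avg_dist(members):
        return sum(divergence(mean, var, m, v) for m, v in members) / len(members)

    return min(range(len(base_classes)), key=lambda k: avg_dist(base_classes[k]))


# Usage: a component near the origin joins the class centred near the origin.
classes = [
    [([0.0], [1.0]), ([0.2], [1.0])],
    [([5.0], [1.0])],
]
chosen = nearest_base_class(([0.1], [1.0]), classes)
```

The same average-divergence idea extends to comparing two base classes when building the tree, by averaging over all pairs of members.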
S4 Test Results • Note: increasing the update interval gives a large reduction in adaptation computation with only a small drop in performance
Adaptation in Nov’94 Hi-P0 HTK System • Unsupervised adaptation • Adapted on 15 sentences from each speaker, taken from unfiltered newspaper articles • About 15 million parameters in this HMM set • Used 750 base classes
Conclusion • MLLR approach can be used for both static and incremental adaptation • MLLR approach can be used for both supervised and unsupervised adaptation • Dynamic regression classes