
A Study on Speaker Adaptation of Continuous Density HMM Parameters

This study explores adaptive estimation and Bayesian (MAP) adaptation of the parameters of continuous density hidden Markov models (CDHMMs) for speaker adaptation.


Presentation Transcript


  1. A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE

  2. Outline • Introduction • Adaptive estimation of CDHMM parameters • Bayesian adaptation of Gaussian parameters • Experimental setup and recognition results • Summary

  3. Introduction • Adaptive learning • Adapting reference speech patterns or models to handle situations unseen in the training phase • For example: • Varying channel characteristics • Changing environmental noise • Varying transducers

  4. Introduction – MAP • Maximum a posteriori (MAP) estimation • Also called Bayesian adaptation • Given a prior distribution over the parameters, MAP maximizes their posterior distribution

  5. Adaptive estimation of CDHMM parameters • Sequence of SD observations Y = {y1, y2, …, yT} • λ = parameter set of the distribution function • Given training data Y, we want to estimate λ • If λ is assumed random with a prior distribution P0(λ), the MAP estimate is obtained by maximizing the posterior distribution, $\hat{\lambda}_{\mathrm{MAP}} = \arg\max_{\lambda}\, p(Y \mid \lambda)\, P_0(\lambda)$, where $p(Y \mid \lambda)$ is the likelihood function and $P_0(\lambda)$ the prior distribution • With a non-informative prior this reduces to the maximum likelihood estimate (MLE); a toy numerical sketch follows
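As a toy illustration of the MAP criterion above (not code from the paper), the sketch below finds the MAP estimate of a single Gaussian mean numerically by maximizing log-likelihood plus log-prior; all data and prior values are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy adaptation samples for one Gaussian component (illustrative values).
y = np.array([1.2, 0.8, 1.1, 0.9, 1.3])
sigma2 = 0.25           # known observation variance
mu0, tau2 = 0.0, 1.0    # prior mean and variance for mu

def neg_log_posterior(mu):
    # -[log p(Y | mu) + log P0(mu)], dropping mu-independent constants
    log_lik = -np.sum((y - mu) ** 2) / (2 * sigma2)
    log_prior = -(mu - mu0) ** 2 / (2 * tau2)
    return -(log_lik + log_prior)

mu_map = minimize_scalar(neg_log_posterior).x
print(f"MAP estimate of mu: {mu_map:.4f}")
```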

  6. Adaptive segmental k-means algorithm • Maximizes the state-likelihood of the observation sequences in an iterative manner using the segmental k-means training algorithm • s = state sequence • For a given model λ, find the optimal state sequence ŝ • Based on the state sequence ŝ, find the MAP estimate of λ • Alternate the two steps until convergence (a minimal sketch follows below)
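A minimal sketch of this loop, assuming a left-to-right HMM with a single Gaussian per state (the paper uses Gaussian mixtures) and the conjugate MAP mean update from slide 10; all names and values are illustrative:

```python
import numpy as np

def log_gauss(y, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def viterbi_segment(y, means, var):
    """Optimal state sequence for a left-to-right HMM (stay/advance only)."""
    T, S = len(y), len(means)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_gauss(y[0], means[0], var[0])
    for t in range(1, T):
        for s in range(S):
            lo = max(s - 1, 0)
            prev = score[t - 1, lo:s + 1]        # stay in s or advance from s-1
            k = int(np.argmax(prev))
            back[t, s] = lo + k
            score[t, s] = prev[k] + log_gauss(y[t], means[s], var[s])
    states = [S - 1]                             # force ending in the final state
    for t in range(T - 1, 0, -1):
        states.append(back[t, states[-1]])
    return np.array(states[::-1])

def map_mean(y_seg, mu0, tau2, sigma2):
    """Conjugate MAP update of a Gaussian mean (see slide 10)."""
    n = len(y_seg)
    w = tau2 / (tau2 + sigma2 / n)
    return w * np.mean(y_seg) + (1 - w) * mu0

# Alternate segmentation and MAP re-estimation for a few iterations.
y = np.array([0.1, 0.2, 0.9, 1.1, 2.0, 2.1])    # toy observation sequence
means = np.array([0.0, 1.0, 2.0])               # SI means serve as prior means
var = np.full(3, 0.2)                           # fixed, known variances
prior_means = means.copy()
for _ in range(5):
    s = viterbi_segment(y, means, var)
    for j in range(len(means)):
        if np.any(s == j):
            means[j] = map_mean(y[s == j], prior_means[j], 1.0, var[j])
```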

  7. The choices of prior distributions • Non-informative prior • Parameters are fixed but unknown and are to be estimated from the data • No preference as to what the parameter values should be • MAP = MLE • Informative prior • Knowledge about the parameters to be estimated is available • Choice of prior distribution depends on the acoustic models used to characterize the data

  8. Conjugate prior • Prior and posterior probabilities belong to the same distribution family • Analytical forms of some conjugate priors are available (see the example below)
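As a standard example (not taken from the paper): a Gaussian likelihood with known variance has a Gaussian conjugate prior on its mean, and the posterior stays in the normal family:

```latex
% Normal likelihood with known variance, normal prior on the mean:
% the posterior is again normal, so the prior is conjugate.
\begin{aligned}
y_i \mid \mu &\sim \mathcal{N}(\mu, \sigma^2), \qquad
\mu \sim \mathcal{N}(\mu_0, \tau^2) \\[4pt]
\mu \mid y_1, \dots, y_n &\sim \mathcal{N}\!\left(
  \frac{n\tau^2 \bar{y} + \sigma^2 \mu_0}{n\tau^2 + \sigma^2},\;
  \frac{\sigma^2 \tau^2}{n\tau^2 + \sigma^2} \right)
\end{aligned}
```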

  9. Bayesian adaptation of Gaussian parameters • 3 implementations of Bayesian adaptation • Gaussian mean • Gaussian variance • Gaussian mean and precision • μ = mean and σ² = variance of one component of a state observation distribution • Precision θ = 1/σ²

  10. Bayesian adaptation of the Gaussian mean • Observations y1, …, yn • μ is random with prior $\mu \sim \mathcal{N}(\mu_0, \tau^2)$ • σ² is fixed and known • MAP estimate for the parameter μ: $\hat{\mu} = \frac{\tau^2}{\tau^2 + \sigma^2/n}\,\bar{y} + \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}\,\mu_0$ • where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the sample mean

  11. Bayesian adaptation of the Gaussian mean (cont.) • The MAP estimate converges to the MLE when • a large number of samples is used for adaptation, or • a relatively large prior variance τ² is chosen (τ² ≫ σ²/n), i.e., a non-informative prior • (a numerical illustration follows below)
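A small numerical illustration of this convergence, using the slide-10 update with made-up values:

```python
# MAP mean update from slide 10; illustrative values only.
def map_mean(ybar, n, sigma2, mu0, tau2):
    w = tau2 / (tau2 + sigma2 / n)   # weight on the sample mean
    return w * ybar + (1 - w) * mu0

mu0, tau2, sigma2, ybar = 0.0, 0.5, 1.0, 2.0
for n in (1, 10, 100, 1000):
    print(n, round(map_mean(ybar, n, sigma2, mu0, tau2), 4))
# The estimate approaches the MLE (ybar = 2.0) as n grows,
# i.e. as tau2 becomes large relative to sigma2 / n.
```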

  12. Bayesian adaptation of the Gaussian variance • Mean μ is estimated by the sample mean • Variance σ² is given an informative prior that keeps it above a floor σ²min • σ²min is estimated from a large collection of speech data

  13. Bayesian adaptation of the Gaussian variance (cont.) • The adapted variance is the larger of the sample variance Sy² and the floor σ²min • Effective when an insufficient amount of sample data is available (a sketch follows below)
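A minimal sketch, assuming the usual "floor" reading of this prior; the transcript does not show equation (3.5) itself, so the exact form here is an assumption:

```python
import numpy as np

def adapt_variance(y_seg, sigma_min2):
    """Clamp the sample variance from below by sigma_min^2 (assumed form)."""
    s_y2 = np.var(y_seg)             # sample variance S_y^2
    return max(s_y2, sigma_min2)     # never fall below the floor

few_samples = np.array([1.01, 1.02, 0.99])           # too few to trust S_y^2
print(adapt_variance(few_samples, sigma_min2=0.05))  # -> 0.05 (floor active)
```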

  14. Bayesian adaptation of both Gaussian mean and precision • Both mean and precision parameters are random • The joint conjugate prior P0(μ, θ) is a normal-gamma distribution: a normal distribution over μ (given θ) multiplied by a gamma distribution over θ

  15. Bayesian adaptation of both Gaussian mean and precision (cont.) • MAP estimates of μ and σ² can be derived in closed form • The prior parameters can themselves be estimated from training data • (a sketch of the standard normal-gamma update follows below)
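A sketch of the textbook normal-gamma MAP update, with prior μ | θ ~ N(μ0, 1/(κθ)) and θ ~ Gamma(α, β). The paper's exact equations (3.7)-(3.9) are not shown in the transcript, so this is an assumed standard form with illustrative hyperparameters:

```python
import numpy as np

def map_mean_precision(y, mu0, kappa, alpha, beta):
    """Joint MAP of (mu, theta) under a normal-gamma prior (standard form)."""
    n, ybar, s_y2 = len(y), np.mean(y), np.var(y)
    kappa_n = kappa + n
    mu_hat = (kappa * mu0 + n * ybar) / kappa_n          # posterior mean of mu
    alpha_n = alpha + n / 2
    beta_n = (beta + 0.5 * n * s_y2
              + kappa * n * (ybar - mu0) ** 2 / (2 * kappa_n))
    theta_hat = (alpha_n - 0.5) / beta_n                 # joint posterior mode
    return mu_hat, 1.0 / theta_hat                       # adapted mean, variance

y = np.array([1.2, 0.8, 1.1, 0.9, 1.3])                 # toy adaptation data
print(map_mean_precision(y, mu0=0.0, kappa=1.0, alpha=2.0, beta=0.5))
```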

  16. Experimental setup • SD training data • 5 training utterances per word for each male speaker and 7 utterances for each female speaker • SD testing data • 10 utterances per word per speaker • Recorded over local dial-up telephone lines • Sampling rate = 6.67 kHz • 39-word vocabulary • 26 English letters • 10 digits • 3 command words (stop, error, repeat) • 2 sets of speech data • SI data for the SI model: 100 speakers (50 female, 50 male) • SD data for adaptation: 4 speakers (2 female, 2 male)

  17. Experimental setup (cont.) • Models are obtained using the segmental k-means training procedure • Maximum number of mixture components per state = 9 • Diagonal covariance matrices • 5-state HMMs • 2 sets of SI models • 1st set: as described above • 2nd set: single Gaussian distribution per state

  18. Experimental results 1 • Baseline recognition rate: • SD: 2 Gaussian mixtures per state per word

  19. Experimental results 2 • 5 adaptation experiments: • EXP1: SD mean and an SD variance (regular MLE) • EXP2: SD mean and a fixed variance estimate • EXP3: SA mean (3.1) with prior parameters (3.2-3.3) • EXP4: SD mean and an SA variance (3.5) • EXP5: SA estimates (3.7) with prior parameters (3.8-3.9)

  20. Experimental results 3

  21. Experimental results 4 • Figure legend: SD mean, SD variance (MLE); SD mean, fixed variance; SA mean (method 1); SD mean, SA variance (method 2); SA mean and precision (method 3)

  22. Experimental results 5

  23. Conclusions • Average recognition rate with all tokens incorporated = 96.1% • Performance improves when • more adaptation data are used • both mean and precision are adapted
