This presentation covers adaptive estimation and Bayesian (MAP) adaptation of continuous density hidden Markov model (CDHMM) parameters for speaker adaptation.
A Study on Speaker Adaptation of Continuous Density HMM Parameters • Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang • ICASSP 1990, IEEE • Presented by: 陳亮宇
Outline • Introduction • Adaptive estimation of CDHMM parameters • Bayesian adaptation of Gaussian parameters • Experimental setup and recognition results • Summary
Introduction • Adaptive learning • Adapting reference speech patterns or models to handle situations unseen in the training phase • For example: • Varying channel characteristics • Changing environmental noise • Varying transducers
Introduction – MAP • Maximum a posteriori (MAP) estimation • Also called Bayesian adaptation • Given a prior distribution for the parameters, MAP maximizes the posterior distribution of the parameters given the adaptation data
Adaptive estimation of CDHMM parameters • Sequence of SD observations Y = {y1, y2, …, yT} • λ = parameter set of the distribution function • Given training data Y, we want to estimate λ • If λ is assumed random with a prior distribution P0(λ), the MAP estimate of λ is obtained by solving

$$\hat{\lambda}_{\mathrm{MAP}} \;=\; \arg\max_{\lambda}\, p(\lambda \mid Y) \;=\; \arg\max_{\lambda}\, \underbrace{p(Y \mid \lambda)}_{\text{likelihood function}}\;\underbrace{P_0(\lambda)}_{\text{prior distribution}}$$

where p(λ | Y) is the posterior distribution; the prior plays a role analogous to that of the language model in decoding. With a non-informative prior, the MAP estimate reduces to the MLE.
Adaptive segmental k-means algorithm • Maximizes the state likelihood of the observation sequences in an iterative manner using the segmental k-means training algorithm • s = state sequence • For a given model λ, find the optimal state sequence ŝ • Based on the state sequence ŝ, find the MAP estimate λ̂ of the parameters, then iterate • A toy sketch of this loop is given below
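A minimal runnable sketch of the segment/re-estimate loop, under strong simplifying assumptions not made in the paper: one Gaussian per state with known variance, frame-to-state assignment by nearest state (standing in for a full Viterbi pass over the HMM), and MAP adaptation of the state means only (eq. 3.1 below). All names and values are illustrative.

```python
import numpy as np

def adaptive_segmental_kmeans(means, variances, prior_means, tau2, frames, n_iter=5):
    """Toy adaptive segmental k-means: alternate (1) frame-to-state
    segmentation and (2) MAP re-estimation of the state means.
    Simplified stand-in for the paper's algorithm (no transition model)."""
    means = np.asarray(means, dtype=float).copy()
    for _ in range(n_iter):
        # Step 1: assign each frame to its most likely state (segmentation)
        dists = (frames[:, None] - means[None, :]) ** 2 / variances[None, :]
        assign = dists.argmin(axis=1)
        # Step 2: MAP re-estimate each state mean from its assigned frames
        for s in range(len(means)):
            y = frames[assign == s]
            n = len(y)
            if n == 0:
                continue  # no adaptation data for this state; keep the prior
            # posterior mean: pulled from the prior mean toward the sample mean
            means[s] = (n * tau2 * y.mean() + variances[s] * prior_means[s]) \
                       / (n * tau2 + variances[s])
    return means
```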
The choice of prior distribution • Non-informative prior • Parameters are treated as fixed but unknown, to be estimated from the data • No preference as to what the parameter values should be • MAP = MLE • Informative prior • Knowledge about the parameters to be estimated is available • The choice of prior distribution depends on the acoustic models used to characterize the data
Conjugate prior • Prior and posterior distributions belong to the same family • Analytical forms of some conjugate priors are available, as in the standard example below
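As a standard textbook example (not specific to this paper): a Gaussian likelihood with known variance paired with a Gaussian prior on the mean yields a Gaussian posterior, so the normal prior is conjugate for the Gaussian mean.

```latex
% Conjugacy of the normal prior for a Gaussian mean (known variance):
y_i \mid \mu \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu \sim \mathcal{N}(\nu, \tau^2)
\;\Longrightarrow\;
\mu \mid y_1,\dots,y_n \sim
\mathcal{N}\!\left(\frac{n\tau^2\,\bar{y} + \sigma^2\,\nu}{n\tau^2 + \sigma^2},\;
\frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right)
```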
Bayesian adaptation of Gaussian parameters • Three implementations of Bayesian adaptation • Gaussian mean • Gaussian variance • Gaussian mean and precision • μ = mean and σ² = variance of one component of a state observation distribution • Precision θ = 1/σ²
Bayesian adaptation of the Gaussian mean • Observations y1, …, yn ~ N(μ, σ²) • μ is random, with a conjugate normal prior N(ν, τ²) • σ² is fixed and known • The MAP estimate of the parameter μ is (eq. 3.1):

$$\hat{\mu} \;=\; \frac{n\tau^2}{n\tau^2 + \sigma^2}\,\bar{y} \;+\; \frac{\sigma^2}{n\tau^2 + \sigma^2}\,\nu$$

where ȳ is the sample mean of the n adaptation samples and ν, τ² are the prior mean and variance.
Bayesian adaptation of the Gaussian mean (cont.) • MAP converges to MLE when • a large number of samples are used for training, or • a relatively large value is chosen for the prior variance τ² (τ² >> σ²/n), i.e., an effectively non-informative prior • A numerical sketch is given below
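A small runnable sketch of the mean update in eq. (3.1); all numbers are illustrative, not from the paper.

```python
import numpy as np

def map_mean(y, nu, tau2, sigma2):
    """MAP estimate of a Gaussian mean with known variance sigma2,
    given a normal prior N(nu, tau2) and adaptation samples y (eq. 3.1)."""
    n = len(y)
    w = n * tau2 / (n * tau2 + sigma2)  # weight on the sample mean
    return w * np.mean(y) + (1 - w) * nu

rng = np.random.default_rng(0)
# Few samples: the estimate stays close to the prior mean nu = 0.
print(map_mean(rng.normal(1.0, 0.5, size=3), nu=0.0, tau2=0.01, sigma2=0.25))
# Many samples and tau2 >> sigma2/n: MAP approaches the MLE (sample mean ~ 1.0).
print(map_mean(rng.normal(1.0, 0.5, size=1000), nu=0.0, tau2=10.0, sigma2=0.25))
```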
Bayesian adaptation of the Gaussian variance • Mean μ is estimated from the sample mean • Variance σ² is given an informative prior that constrains it to lie above a floor σ²min • σ²min is estimated from a large collection of speech data
Bayesian adaptation of the Gaussian variance (cont.) • The adapted variance parameter is (eq. 3.5):

$$\hat{\sigma}^2 \;=\; \max\!\left(\sigma^2_{\min},\; S_y^2\right)$$

• S²y is the sample variance • This flooring is effective when an insufficient amount of sample data is available
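A one-line sketch of this floored estimate, assuming the floor sigma2_min has already been estimated offline from a large speech collection:

```python
import numpy as np

def adapt_variance(y, sigma2_min):
    """Adapted variance: sample variance of the adaptation data,
    floored at sigma2_min to guard against underestimation
    when only a few samples are available."""
    return max(sigma2_min, float(np.var(y)))
```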
Bayesian adaptation of both Gaussian mean and precision • Both mean and precision parameters are random • The joint conjugate prior P0(μ, θ) is a normal-gamma distribution: a normal distribution over μ (given θ) multiplied by a gamma distribution over the precision θ; with prior parameters ν, κ, α, β,

$$P_0(\mu, \theta) \;\propto\; \underbrace{\sqrt{\theta}\,\exp\!\left(-\tfrac{\kappa\theta}{2}(\mu-\nu)^2\right)}_{\text{normal distribution}} \;\cdot\; \underbrace{\theta^{\alpha-1} e^{-\beta\theta}}_{\text{gamma distribution}}$$
Bayesian adaptation of both Gaussian mean and precision (cont.) • The joint MAP estimates of μ and σ² can be derived as (eq. 3.7):

$$\hat{\mu} = \frac{\kappa\nu + n\bar{y}}{\kappa + n}, \qquad \hat{\sigma}^2 = \frac{2\beta + \kappa(\hat{\mu}-\nu)^2 + \sum_{i=1}^{n}(y_i-\hat{\mu})^2}{n + 2\alpha - 1}$$

• The prior parameters (ν, κ, α, β) are estimated from speaker-independent training data (eqs. 3.8–3.9) • A sketch of this update appears below
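A compact sketch of the joint update in eq. (3.7), using the normal-gamma hyperparameter names assumed above (nu, kappa, alpha, beta), which may differ from the paper's exact notation:

```python
import numpy as np

def map_mean_precision(y, nu, kappa, alpha, beta):
    """Joint MAP estimate of a Gaussian mean and variance under a
    normal-gamma prior on (mean, precision); sketch of eq. (3.7)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = (kappa * nu + n * y.mean()) / (kappa + n)          # shrunk mean
    var = (2 * beta + kappa * (mu - nu) ** 2                # prior terms
           + ((y - mu) ** 2).sum()) / (n + 2 * alpha - 1)   # data term
    return mu, var
```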
Experimental setup • Two sets of speech data • SI data for training the SI models: 100 speakers (50 female, 50 male) • SD data for adaptation: 4 speakers (2 female, 2 male) • SD training data • 5 training utterances per word for each male speaker and 7 utterances for each female speaker • SD testing data • 10 utterances per word per speaker • Recorded over local dial-up telephone lines • Sampling rate = 6.67 kHz • 39-word vocabulary • 26 English letters • 10 digits • 3 command words (stop, error, repeat)
Experimental setup (cont.) • Models are obtained using the segmental k-means training procedure • Maximum number of mixture components per state = 9 • Diagonal covariance matrices • 5-state HMMs • 2 sets of SI models • 1st set: as described above • 2nd set: a single Gaussian distribution per state
Experimental results 1 • Baseline recognition rates (shown as a table in the original slides) • SD models: 2 Gaussian mixtures per state per word
Experimental results 2 • 5 adaptation experiments: • EXP1: SD mean and SD variance (regular MLE) • EXP2: SD mean and a fixed variance estimate • EXP3: SA mean (eq. 3.1) with prior parameters (eqs. 3.2–3.3) • EXP4: SD mean and SA variance (eq. 3.5) • EXP5: SA mean and precision (eq. 3.7) with prior parameters (eqs. 3.8–3.9)
Experimental results 4 • Recognition-rate comparison (shown as a plot in the original slides) of: SD mean, SD variance (MLE); SD mean, fixed variance; SA mean (method 1); SD mean, SA variance (method 2); SA mean and precision (method 3)
Conclusions • Average recognition rate with all adaptation tokens incorporated = 96.1% • Performance improves when • more adaptation data are used • both mean and precision are adapted