Gaussian Mixture Models and Acoustic Modeling • Lecture 9, Spoken Language Processing • Prof. Andrew Rosenberg
Acoustic Modeling • The goal of the Acoustic Model is to hypothesize a phone label based on acoustic observations. • The phone label is drawn from a phone inventory (e.g., IPA or ARPAbet). • The acoustic observations here will be MFCCs, though other feature representations are possible.
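As a rough formalization (standard notation, not taken from the slides): the acoustic model supplies a likelihood p(x | q) for acoustic observations x given a phone q, and a frame-level phone hypothesis can be formed as

\hat{q} = \arg\max_{q} \; p(x \mid q)\, p(q)

where p(q) is a prior over phones; the sequential models mentioned at the end of the lecture refine this rather than deciding each frame independently.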
Mixture Models • A Mixture Model is the weighted sum of a number of pdfs where the weights are determined by a multinomial distribution, π
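In symbols (standard mixture-model notation, assumed rather than copied from the slide), a mixture of K component pdfs p_k(x) is

p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x), \qquad 0 \le \pi_k \le 1, \quad \sum_{k=1}^{K} \pi_k = 1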
Gaussian Mixture Model • GMM: weighted sum of a number of Gaussians where the weights are determined by a multinomial, π
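A sketch of the corresponding density, where each component is a Gaussian with mean \mu_k and covariance \Sigma_k:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)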
Latent Variable Representation • The component assignment behind each observation can be viewed as a latent, or unobserved, variable; its prior is given by the mixture coefficients. • Training a GMM involves learning both the parameters of the individual Gaussian models and the mixture coefficients. • For a fixed set of data points x, the likelihood as a function of the GMM parameters may have multiple local optima rather than a single global one.
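One standard way to make the latent variable explicit (following the usual textbook treatment, not the slide itself): introduce a one-of-K indicator z, where z_k = 1 means component k generated x, so that

p(z_k = 1) = \pi_k, \qquad p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)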
Maximum Likelihood Optimization • Likelihood Function • Log likelihood • A log transform makes the optimization much simpler.
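Written out (standard forms, assumed here) for data X = {x_1, ..., x_N}:

p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

The log turns the product over data points into a sum, which is what makes the optimization (and numerical evaluation) simpler; note that the sum over components stays inside the log.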
Optimizing GMM parameters • Identifying the optimal parameters involves setting partial derivatives of the likelihood function to zero.
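Setting the derivative of the log likelihood with respect to \mu_k to zero gives the familiar weighted-average form (standard result, stated here for completeness):

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})

where \gamma(z_{nk}) is the responsibility of component k for point x_n; its explicit formula appears on the E-step slide below.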
Optimizing GMM parameters • Covariance Optimization
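The analogous result for the covariances (standard form, stated for completeness):

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}}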
Optimizing GMM parameters • Mixture Term
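Maximizing with respect to \pi_k under the constraint \sum_k \pi_k = 1 (via a Lagrange multiplier) gives:

\pi_k = \frac{N_k}{N}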
What’s the problem? • Circularity: the responsibilities are computed from the GMM parameters, yet they are needed to identify the optimal parameter settings. • The maximum likelihood function of the GMM has no closed-form solution that optimizes all three sets of parameters (means, covariances, and mixture weights) at once. • Expectation Maximization: • Keep one set of quantities fixed and optimize the other. • Here, • fix the responsibility terms and optimize the GMM parameters, • then fix the GMM parameters and optimize the responsibilities.
Expectation Maximization for GMMs • Initialize the parameters • Evaluate the log likelihood • Expectation-step: Evaluate the responsibilities • Maximization-step: Re-estimate Parameters • Evaluate the log likelihood • Check for convergence
EM for GMMs • E-step: Evaluate the Responsibilities
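Explicitly, the responsibility of component k for data point x_n is the posterior probability of that component (standard form):

\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}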
EM for GMMs • M-Step: Re-estimate Parameters
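A minimal NumPy/SciPy sketch of one EM iteration, assuming the data X (N x D) and current parameters are given; the names are illustrative rather than from the lecture, and a real implementation would compute the E-step in log space for numerical stability.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    """One EM iteration for a GMM.
    X: (N, D) data; weights: (K,); means: (K, D); covs: (K, D, D)."""
    N, _ = X.shape
    K = len(weights)

    # E-step: responsibilities gamma[n, k] = p(component k | x_n)
    gamma = np.zeros((N, K))
    for k in range(K):
        gamma[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the responsibilities
    Nk = gamma.sum(axis=0)                       # effective count per component
    new_means = (gamma.T @ X) / Nk[:, None]
    new_covs = np.zeros_like(covs)
    for k in range(K):
        diff = X - new_means[k]
        new_covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    new_weights = Nk / N
    return new_weights, new_means, new_covs
```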
Potential Problems • Incorrect number of Mixture Components • Singularities
Singularities • A minority of the data can have a disproportionate effect on the model likelihood. • For example…
Singularities • When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero. • Consider the likelihood function as the covariance goes to zero. • The likelihood approaches infinity.
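To make this concrete: if component k collapses onto a single D-dimensional point x_n, so \mu_k = x_n and \Sigma_k = \sigma^2 I, then (standard argument)

\mathcal{N}(x_n \mid x_n, \sigma^2 I) = \frac{1}{(2\pi)^{D/2} \sigma^{D}} \to \infty \quad \text{as } \sigma \to 0

so a single collapsed component can push the total log likelihood to infinity without the model describing the rest of the data any better.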
Training acoustic models • TIMIT • close, manual phonetic transcription • 2342 sentences • Extract MFCC vectors from each frame within each phone. • For each phone, train a GMM using Expectation Maximization. • These GMMs together form the Acoustic Model. • It is common to use 8 or 16 Gaussian mixture components.
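A hedged sketch of this recipe using scikit-learn; mfccs_for_phone is a hypothetical helper (not part of the lecture materials) standing in for "collect the MFCC frames aligned to this phone in TIMIT".

```python
from sklearn.mixture import GaussianMixture

def train_acoustic_model(phone_inventory, n_components=8):
    """Train one GMM per phone; the collection of GMMs is the acoustic model."""
    acoustic_model = {}
    for phone in phone_inventory:
        X = mfccs_for_phone(phone)   # hypothetical: (num_frames, num_mfcc) array for this phone
        gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
        gmm.fit(X)                   # EM training, as in the slides above
        acoustic_model[phone] = gmm
    return acoustic_model

# Scoring a new frame: pick the phone whose GMM assigns the highest log likelihood.
# best_phone = max(acoustic_model, key=lambda p: acoustic_model[p].score(frame.reshape(1, -1)))
```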
Sequential Models • Make a prediction every frame. • How often can phones change? • Encourage continuity in predictions. • Model phone transitions.
Next Class • Hidden Markov Models • Reading: J&M 5.5, 9.2