Gaussian Mixture Models and Acoustic Modeling


Presentation Transcript


  1. Gaussian Mixture Models and Acoustic Modeling Lecture 9 Spoken Language Processing Prof. Andrew Rosenberg

  2. Acoustic Modeling • The goal of the Acoustic Model is to hypothesize a phone label based on acoustic observations. • The phone label is defined by the phone inventory (e.g., IPA, ARPAbet). • Acoustic observations will be MFCCs, though other feature representations are possible.

  3. Gaussian Mixture Model

  4. Mixture Models • A Mixture Model is a weighted sum of probability density functions (pdfs), where the weights are determined by a multinomial distribution π, as written below.
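In standard notation, a mixture of K component densities p_k(x) with mixing weights π = (π_1, …, π_K) is

    p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0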

  5. Gaussian Mixture Model • GMM: weighted sum of a number of Gaussians where the weights are determined by a multinomial, π
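Spelled out, with 𝒩(x | μ_k, Σ_k) denoting a Gaussian density with mean μ_k and covariance Σ_k:

    p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)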

  6. Visualizing a GMM

  7. Latent Variable Representation • The mixture coefficients can be viewed as a latent, or unobserved, variable. • Training a GMM involves learning both the parameters of the individual Gaussians and the mixture coefficients. • For a fixed set of data points x, the GMM likelihood may have multiple local optima rather than a single global one.
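Equivalently (a standard construction, not shown on the slide itself), introduce a K-dimensional one-of-K latent variable z with p(z_k = 1) = π_k; marginalizing over z recovers the mixture:

    p(x) = \sum_{z} p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)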

  8. Maximum Likelihood Optimization • Likelihood Function • Log likelihood • A log transform makes the optimization much simpler.
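For a data set X = {x_1, …, x_N} of independent observations, these are:

    p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)

    \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)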

  9. Optimizing GMM parameters • Identifying the optimal parameters involves setting partial derivatives of the likelihood function to zero.
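Setting the partial derivative with respect to μ_k to zero gives a responsibility-weighted mean. Writing γ(z_nk) for the responsibility component k takes for point x_n:

    \gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

    \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})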

  10. Optimizing GMM parameters • Covariance Optimization
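The same procedure applied to Σ_k gives a responsibility-weighted covariance:

    \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf{T}}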

  11. Optimizing GMM parameters • Mixture Term
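Maximizing with respect to π under the constraint Σ_k π_k = 1 (e.g., via a Lagrange multiplier) yields the fraction of the data each component is responsible for:

    \pi_k = \frac{N_k}{N}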

  12. Maximum Likelihood Estimate

  13. What’s the problem? • Circularity: the responsibilities are assigned by the GMM parameters, yet they are used in identifying the parameters’ optimal settings. • The maximum likelihood function of the GMM has no closed-form solution for all three sets of parameters at once. • Expectation Maximization: • Keep one set of variables fixed and optimize the others. • Here: • fix the responsibility terms and optimize the GMM parameters, • then fix the GMM parameters and optimize the responsibilities.

  14.–15. Expectation Maximization for GMMs • Initialize the parameters • Evaluate the log likelihood • Expectation-step: evaluate the responsibilities • Maximization-step: re-estimate the parameters • Re-evaluate the log likelihood • Check for convergence; if not converged, repeat from the Expectation step

  16. EM for GMMs • E-step: Evaluate the Responsibilities

  17. EM for GMMs • M-Step: Re-estimate Parameters
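As a concrete illustration of the two steps, here is a minimal NumPy/SciPy sketch of EM for a GMM (the function and variable names are my own, not code from the lecture):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
        """Fit a K-component GMM to X (shape N x D) by expectation maximization."""
        rng = np.random.default_rng(seed)
        N, D = X.shape
        # Initialization: K random data points as means, shared data covariance,
        # uniform mixture weights.
        mu = X[rng.choice(N, size=K, replace=False)].copy()
        Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
        pi = np.full(K, 1.0 / K)
        prev_ll = -np.inf
        for _ in range(n_iter):
            # E-step: gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
            dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                                    for k in range(K)])
            ll = np.log(dens.sum(axis=1)).sum()      # log likelihood of the data
            gamma = dens / dens.sum(axis=1, keepdims=True)
            # M-step: re-estimate parameters from the responsibilities.
            Nk = gamma.sum(axis=0)                   # effective count per component
            mu = (gamma.T @ X) / Nk[:, None]
            for k in range(K):
                diff = X - mu[k]
                Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
                Sigma[k] += 1e-6 * np.eye(D)         # regularize: guards against collapse
            pi = Nk / N
            if ll - prev_ll < tol:                   # stop when the likelihood stops improving
                break
            prev_ll = ll
        return pi, mu, Sigma, ll

The small term added to each covariance is one common guard against the singularities discussed below.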

  18. Visual example of EM

  19. Potential Problems • Incorrect number of Mixture Components • Singularities

  20. Incorrect Number of Gaussians

  21. Incorrect Number of Gaussians

  22. Singularities • A minority of the data can have a disproportionate effect on the model likelihood. • For example…

  23. GMM example

  24. Singularities • When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero. • Consider the likelihood function as the covariance goes to zero. • The likelihood approaches infinity.
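Concretely, if a component’s mean collapses onto a single data point x_n with spherical covariance σ²I, that point’s contribution to the likelihood is

    \mathcal{N}(x_n \mid x_n, \sigma^2 I) = \frac{1}{(2\pi\sigma^2)^{D/2}}

which grows without bound as σ → 0, so naively maximizing the likelihood can drive one component onto a single point.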

  25. Training acoustic models • TIMIT • close, manual phonetic transcriptions • 2,342 distinct sentences • Extract MFCC vectors from each frame within each phone. • For each phone, train a GMM using Expectation Maximization. • These GMMs are the Acoustic Model. • It is common to use 8 or 16 Gaussian mixture components.
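A sketch of this recipe using scikit-learn’s GaussianMixture (which fits by EM); the input format for the frames is a hypothetical placeholder, since the lecture does not specify an implementation:

    from collections import defaultdict
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_acoustic_model(frames, n_components=8):
        """Train one GMM per phone.

        `frames` is assumed to be an iterable of (phone_label, mfcc_vector)
        pairs, e.g. pooled over TIMIT's time-aligned phone segments.
        """
        by_phone = defaultdict(list)
        for phone, mfcc in frames:
            by_phone[phone].append(mfcc)
        model = {}
        for phone, vectors in by_phone.items():
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type='diag')  # diagonal covariances are typical for MFCCs
            gmm.fit(np.asarray(vectors))                   # fits by EM internally
            model[phone] = gmm
        return model

    def classify_frame(model, mfcc):
        # Hypothesize the phone whose GMM assigns the frame the highest log likelihood.
        return max(model, key=lambda p: model[p].score(np.asarray(mfcc).reshape(1, -1)))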

  26. Sequential Models • Make a prediction every frame. • How often can phones change? • Encourage continuity in predictions. • Model phone transitions.

  27. Next Class • Hidden Markov Models • Reading: J&M 5.5, 9.2
