Learning Structured Models for Phone Recognition. Slav Petrov, Adam Pauls, Dan Klein. Acoustic Modeling. Motivation. Standard acoustic models impose many structural constraints We propose an automatic approach Use TIMIT Dataset MFCC features Full covariance Gaussians.
Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein
Motivation
• Standard acoustic models impose many structural constraints
• We propose an automatic approach
• Use TIMIT Dataset
• MFCC features
• Full covariance Gaussians
(Young and Woodland, 1994)
Phone Classification
HMMs for Phone Classification [Figure label: Temporal Structure]
Standard subphone/mixture HMM [Figure labels: Temporal Structure, Gaussian Mixtures]
Our Model vs. Standard Model [Figure labels: Fully Connected, Single Gaussians]
Hierarchical Baum-Welch Training [Figure values: 25.6%, 23.9%, 32.1%, 28.7%]
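The slide does not spell out how the hierarchy is built. The sketch below assumes one common reading: every HMM state is split into two substates, the copies are slightly perturbed to break symmetry, and Baum-Welch is then re-run on the larger model (the re-estimation step is not shown). The function name `split_states`, the equal sharing of incoming probability mass, and the Gaussian noise level are all hypothetical choices made for illustration, not details taken from the talk.

```python
import random

def split_states(trans, means, noise=0.01, seed=0):
    """Split every state of an HMM into two substates (illustrative sketch).

    trans: n x n row-stochastic transition matrix (list of lists)
    means: one emission mean vector per state
    Returns a 2n x 2n transition matrix and 2n perturbed mean vectors.
    """
    rng = random.Random(seed)
    new_trans = []
    for row in trans:
        # Probability mass entering a parent state is shared equally
        # between its two children (an assumption of this sketch).
        split_row = []
        for p in row:
            split_row.extend([p / 2.0, p / 2.0])
        # Both children inherit the parent's outgoing distribution.
        new_trans.append(list(split_row))
        new_trans.append(list(split_row))
    new_means = []
    for mean in means:
        for _ in range(2):
            # Small random offsets let EM pull the two copies apart.
            new_means.append([m + rng.gauss(0.0, noise) for m in mean])
    return new_trans, new_means

trans = [[0.9, 0.1], [0.2, 0.8]]
means = [[0.0, 1.0], [2.0, -1.0]]
new_trans, new_means = split_states(trans, means)
print(len(new_trans), "states after one split")
```

Repeating the split and retraining would double the number of substates at each round, which is one way to read the sequence of results shown on the slide.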
Phone Recognition
Merging
• Not all phones are equally complex
• Compute log likelihood loss from merging
[Figure: split model vs. model merged at one node, over time steps t-1, t, t+1]
Merging Criterion [Figure: time steps t-1, t, t+1]
Alignment Results
Inference
• State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
• Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
• Transcription: d - ae - d
Viterbi, Variational, ???
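The three levels on this slide are related by a many-to-one mapping: many substate sequences yield the same phone sequence, and many phone sequences yield the same transcription. A minimal sketch of that mapping, using the slide's own example, is given below. The helper names are hypothetical, and the code assumes that a substate label is a phone name followed by a numeric index and that consecutive identical phones always belong to one segment (so a genuinely repeated phone would be collapsed).

```python
from itertools import groupby

def states_to_phones(states):
    """Drop the trailing substate index from each label, e.g. 'ae5' -> 'ae'."""
    return [s.rstrip("0123456789") for s in states]

def collapse_repeats(phones):
    """Merge runs of identical phones into a single transcription symbol."""
    return [phone for phone, _ in groupby(phones)]

state_sequence = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5".split("-")
phones = states_to_phones(state_sequence)
transcription = collapse_repeats(phones)

print("-".join(phones))
print(" - ".join(transcription))  # d - ae - d
```

Because the mapping discards the substate indices, the single best state sequence need not correspond to the best transcription, which is the gap the slide's comparison of inference methods addresses.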
Variational Inference
• Variational Approximation: [equation not preserved]
• Solution: Posterior edge marginals
Conclusions
• Minimalist, Automatic Approach
  • Unconstrained
  • Accurate
• Phone Classification
  • Competitive with state-of-the-art discriminative methods despite being generative
• Phone Recognition
  • Better than standard state-tied triphone models
Thank you! http://nlp.cs.berkeley.edu