
SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE

SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE. Jaume Escofet Carmona IDIAP, Martigny, Switzerland UPC, Barcelona, Spain. Contents. Bayesian Networks Automatic Speech Recognition using Dynamic BNs Auxiliary variables


Presentation Transcript


  1. SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE Jaume Escofet Carmona IDIAP, Martigny, Switzerland UPC, Barcelona, Spain

  2. Contents • Bayesian Networks • Automatic Speech Recognition using Dynamic BNs • Auxiliary variables • Experiments with energy as an auxiliary variable • Conclusions

  3. What is a Bayesian Network?
     • A BN is a type of graphical model composed of:
       • A directed acyclic graph (DAG)
       • A set of variables $V = \{v_1, \dots, v_N\}$
       • A set of probability density functions $P(v_n \mid \mathrm{parents}(v_n))$
     • Joint distribution of V: $P(V) = \prod_{n=1}^{N} P(v_n \mid \mathrm{parents}(v_n))$
     • Example (graph over v1, v2, v3 in which v2 is the parent of both v1 and v3): $P(V) = P(v_1, v_2, v_3) = P(v_1 \mid v_2)\, P(v_2)\, P(v_3 \mid v_2)$
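The factorization in the three-variable example can be checked numerically. The sketch below is only an illustration: it assumes binary variables and uses made-up conditional probability tables (the slide speaks of density functions and gives no numbers), and the names `p_v2`, `p_v1_given_v2`, `p_v3_given_v2`, and `joint` are hypothetical.

```python
from itertools import product

# Hypothetical tables for three binary variables. The graph matches the
# slide's example: v2 is the parent of both v1 and v3.
p_v2 = {0: 0.6, 1: 0.4}                    # P(v2)
p_v1_given_v2 = {0: {0: 0.9, 1: 0.1},      # P(v1 | v2 = 0)
                 1: {0: 0.3, 1: 0.7}}      # P(v1 | v2 = 1)
p_v3_given_v2 = {0: {0: 0.8, 1: 0.2},      # P(v3 | v2 = 0)
                 1: {0: 0.5, 1: 0.5}}      # P(v3 | v2 = 1)


def joint(v1, v2, v3):
    """P(v1, v2, v3) = P(v1 | v2) * P(v2) * P(v3 | v2)."""
    return p_v1_given_v2[v2][v1] * p_v2[v2] * p_v3_given_v2[v2][v3]


# Summing the factored joint over all eight configurations should give 1
# (up to floating-point rounding).
total = sum(joint(v1, v2, v3) for v1, v2, v3 in product((0, 1), repeat=3))
print(total)
```

Only three small tables are stored instead of one table over all eight joint configurations, which is the saving the factorization buys.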

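  4. Automatic Speech Recognition (ASR)
     • Feature extraction (LPC, MFCC, ...) produces the observation sequence $X = \{x_1, \dots, x_T\}$
     • Statistical models (HMM, ANN, ...): M1: ‘cat’, M2: ‘dog’, …, MK: ‘tiger’
     • Recognized model: $M_j = \arg\max_{\{M_k\}} P(M_k \mid X) = \arg\max_{\{M_k\}} P(X \mid M_k)\, P(M_k)$
     • $P(X \mid M_k) = \prod_{t=1}^{T} p(x_t \mid q_t)\, p(q_t \mid q_{t-1})$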
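A minimal sketch of the decision rule on this slide, under several assumptions that are not in the slides: features are one-dimensional, each state emits from a single Gaussian, and the two word models and their numbers are invented. The slide writes P(X|Mk) as a product along one state sequence; the sketch sums that product over all state sequences with the standard forward algorithm, which is one common way to evaluate it. All function and variable names are hypothetical.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(X, model):
    """log P(X | M): forward algorithm in the log domain."""
    init, trans, means, variances = model
    K = len(init)
    alpha = [safe_log(init[k]) + log_gauss(X[0], means[k], variances[k])
             for k in range(K)]
    for x in X[1:]:
        alpha = [
            logsumexp([alpha[j] + safe_log(trans[j][k]) for j in range(K)])
            + log_gauss(x, means[k], variances[k])
            for k in range(K)
        ]
    return logsumexp(alpha)


# Hypothetical two-state left-to-right word models:
# (initial probs, transition matrix, state means, state variances).
MODELS = {
    "cat": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [0.0, 3.0], [1.0, 1.0]),
    "dog": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [5.0, 8.0], [1.0, 1.0]),
}
PRIORS = {"cat": 0.5, "dog": 0.5}  # P(Mk), assumed uniform here


def recognize(X):
    """argmax over Mk of log P(X | Mk) + log P(Mk)."""
    return max(MODELS,
               key=lambda w: log_likelihood(X, MODELS[w]) + math.log(PRIORS[w]))


print(recognize([0.1, -0.2, 2.9, 3.1]))
```

Working in the log domain avoids underflow when T is large, since the product over frames becomes a sum.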

  5. ASR with Dynamic Bayesian Networks
     • Example over t = 1, 2, 3, 4: phone qt = /k/, /a/, /a/, /t/, with acoustics xt observed at each time step
     • Equivalent to a standard HMM

  6. ASR with Dynamic Bayesian Networks
     • Transition from qt-1 to qt: $P(q_t \mid q_{t-1})$
     • Emission of xt from qt: $p(x_t \mid q_t = k) \sim \mathcal{N}_x(\mu_k, \Sigma_k)$

  7. Auxiliary information (1)
     • Main advantage of BNs:
       • Flexibility in defining dependencies between variables
     • Energy degrades system performance if it is appended to the feature vector
     • BNs allow us to use it in an alternative way:
       • Conditioning the emission distributions upon this auxiliary variable
       • Marginalizing it out in recognition

  8. Auxiliary information (2)
     • The value of at affects the value of xt
     • $p(x_t \mid q_t = k, a_t = z) \sim \mathcal{N}_x(\mu_k + B_k z, \Sigma_k)$
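The conditional emission on this slide is an ordinary Gaussian whose mean is shifted linearly by the auxiliary value. A minimal sketch follows, assuming a scalar auxiliary variable and a diagonal covariance (the slides do not state either), so that B_k reduces to one regression weight per feature dimension. The function name `log_emission` and the example numbers are hypothetical.

```python
import math


def log_emission(x, z, mean, b, var):
    """log p(x | q = k, a = z) for a diagonal-covariance Gaussian.

    x, mean, b, var are per-dimension lists for state k; z is the scalar
    auxiliary value. The mean of dimension d is mean[d] + b[d] * z.
    """
    total = 0.0
    for x_d, m_d, b_d, v_d in zip(x, mean, b, var):
        shifted = m_d + b_d * z
        total += -0.5 * (math.log(2.0 * math.pi * v_d)
                         + (x_d - shifted) ** 2 / v_d)
    return total


# Hypothetical parameters for one state.
mean_k = [1.0, -1.0]
b_k = [0.5, 0.0]   # second dimension does not depend on the auxiliary value
var_k = [1.0, 2.0]

# With z = 2.0 the shifted mean is [2.0, -1.0], so this x sits at the peak.
print(log_emission([2.0, -1.0], 2.0, mean_k, b_k, var_k))
```

Setting every entry of `b_k` to zero recovers the standard emission p(xt | qt = k) from slide 6.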

  9. Auxiliary information (3)
     • The value of the auxiliary variable can be influenced by the hidden state qt
     • $p(a_t \mid q_t = k) \sim \mathcal{N}_a(\mu_{ak}, \Sigma_{ak})$
     • $p(x_t \mid q_t = k, a_t = z) \sim \mathcal{N}_x(\mu_k + B_k z, \Sigma_k)$

  10. Auxiliary information (4)
     • Equivalent to appending the auxiliary variable to the feature vector
     • $p(x_t, a_t \mid q_t = k) \sim \mathcal{N}_{xa}(\mu_k^{xa}, \Sigma_k^{xa})$
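One way to see the link between this joint Gaussian and the factored form on slide 9 is the standard linear-Gaussian identity below. It is offered as a supporting derivation, not as something stated in the slides: if a_t given q_t = k is Gaussian with parameters (μ_ak, Σ_ak) and x_t given q_t = k, a_t = z is Gaussian with mean μ_k + B_k z and covariance Σ_k, then the pair (x_t, a_t) is jointly Gaussian with the following mean and covariance.

```latex
\mu_k^{xa} =
\begin{pmatrix}
  \mu_k + B_k \mu_{ak} \\
  \mu_{ak}
\end{pmatrix},
\qquad
\Sigma_k^{xa} =
\begin{pmatrix}
  \Sigma_k + B_k \Sigma_{ak} B_k^{\top} & B_k \Sigma_{ak} \\
  \Sigma_{ak} B_k^{\top}                & \Sigma_{ak}
\end{pmatrix}
```

The off-diagonal block B_k Σ_ak is what carries the dependence of x_t on a_t; it vanishes when B_k = 0.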

  11. Hiding auxiliary information
     • We can also marginalize out (hide) the auxiliary variable in recognition
     • Useful when:
       • It is noisy
       • It is not accessible
     • $p(x_t \mid q_t) = \int p(x_t \mid q_t, a_t)\, p(a_t \mid q_t)\, da_t$
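For the Gaussian forms on slides 8 and 9, this integral has a well-known closed form: the result is again Gaussian, with mean μ_k + B_k μ_ak and covariance Σ_k + B_k Σ_ak B_kᵀ. The sketch below checks that numerically in the scalar case. It assumes one-dimensional x and a and a single Gaussian per state (the experiments use mixtures for p(xt|qt)); all names and numbers are hypothetical.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_numeric(x, mu, b, var, mu_a, var_a, steps=4000, width=10.0):
    """Trapezoid-rule approximation of the integral over a of
    p(x | q, a) * p(a | q), on mu_a +/- width standard deviations."""
    sd = math.sqrt(var_a)
    lo, hi = mu_a - width * sd, mu_a + width * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        a = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gauss(x, mu + b * a, var) * gauss(a, mu_a, var_a)
    return total * h


def marginal_closed_form(x, mu, b, var, mu_a, var_a):
    """Closed form: N(x; mu + b * mu_a, var + b^2 * var_a)."""
    return gauss(x, mu + b * mu_a, var + b * b * var_a)


numeric = marginal_numeric(1.5, 0.5, 0.8, 1.0, 2.0, 0.5)
closed = marginal_closed_form(1.5, 0.5, 0.8, 1.0, 2.0, 0.5)
print(numeric, closed)
```

In practice the closed form means hiding the auxiliary variable costs no integration at recognition time: the state simply uses a Gaussian with a wider covariance.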

  12. Experimental setup
     • Isolated word recognition
     • Small vocabulary (75 words)
     • Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
     • p(xt|qt) modeled with 4 mixtures of Gaussians
     • p(at|qt) modeled with 1 Gaussian

  13. Experiments with Energy as an auxiliary variable
     • Energy: $E = \log \sum_{n=1}^{N} s^2[n]\, w^2[n]$
     • WER by system:

       System      Observed Energy    Hidden Energy
       System 1    6.9 %              5.3 %
       System 2    6.1 %              5.6 %
       System 3    5.8 %              5.9 %

     • Baseline WER: 5.9 %
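The energy formula on this slide can be computed per frame as in the sketch below. The slide does not say which window w[n] was used, so the Hamming window here is an assumption, as are the small floor that keeps the logarithm defined on silent frames and the names `hamming` and `log_energy`. The code indexes samples from 0 rather than from 1.

```python
import math


def hamming(N):
    """Hamming window of length N (assumes N >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def log_energy(frame, window, floor=1e-10):
    """E = log( sum_n s[n]^2 * w[n]^2 ), floored to avoid log(0)."""
    total = sum((s * w) ** 2 for s, w in zip(frame, window))
    return math.log(max(total, floor))


# Hypothetical 25 ms frame at 8 kHz: 200 samples of a 440 Hz sine.
N = 200
frame = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(N)]
print(log_energy(frame, hamming(N)))
```

The resulting scalar per frame is what would play the role of the auxiliary variable at.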

  14. Conclusions
     • BNs are more flexible than HMMs. You can easily:
       • Change the topology of the distributions
       • Hide variables when necessary
     • Energy can improve the system performance if used in a non-traditional way

  15. Questions?
