Using Neural Network Language Models for LVCSR
Holger Schwenk and Jean-Luc Gauvain
Presented by Erin Fitzgerald, CLSP Reading Group, December 10, 2004
Introduction
• Build and use neural networks to estimate LM posterior probabilities for ASR tasks
• Idea:
  • Project word indices onto a continuous space
  • The resulting smooth probability functions over word representations generalize better to unseen n-grams
  • Still an n-gram approach, but posteriors are interpolated for any possible context; no backing off
• Result: significant WER reduction at small computational cost
Architecture
Standard fully connected multilayer perceptron
[Figure: network diagram – the n-1 history words w_{j-n+1} … w_{j-1} are coded 1-of-N at the input (N = 51k), mapped through a shared projection layer (P = 50 per word) to a hidden layer (H ≈ 1k) and an output layer of N nodes giving p_i = P(w_j = i | h_j)]
Architecture – forward pass
[Figure: same diagram annotated with the layer computations]
• Hidden layer: d = tanh(M·c + b)
• Output layer: o = V·d + k, normalized by a softmax to give p_i = P(w_j = i | h_j)
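A minimal numpy sketch of this forward pass, using the dimensions from the slides (P = 50 per word, H ≈ 1k, N = 51k, a three-word history); the weight names follow the diagram, while the values and word indices are placeholders:

```python
import numpy as np

P, H, N, CONTEXT = 50, 1024, 51000, 3   # sizes taken from the slides (H rounded)

rng = np.random.default_rng(0)
R = rng.normal(0, 0.01, (N, P))            # shared projection: word index -> P dims
M = rng.normal(0, 0.01, (H, CONTEXT * P))  # projection-to-hidden weights
b = np.zeros(H)
V = rng.normal(0, 0.01, (N, H))            # hidden-to-output weights
k = np.zeros(N)

def forward(history):
    """Posteriors p_i = P(w_j = i | h_j) for one (n-1)-word history."""
    c = np.concatenate([R[w] for w in history])  # projection layer
    d = np.tanh(M @ c + b)                       # hidden layer
    o = V @ d + k                                # output activations
    o -= o.max()                                 # numerical stability
    p = np.exp(o)
    return p / p.sum()                           # softmax posteriors

probs = forward([17, 42, 7])   # hypothetical word indices
```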
Training
• Train with the standard back-propagation algorithm
• Error function: cross entropy
• Weight-decay regularization used
• Targets set to 1 for w_j and to 0 otherwise
• These outputs have been shown to converge to the posterior probabilities
• Back-propagation through the projection layer: the NN learns the best projection of words onto the continuous space for the probability estimation task
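A minimal sketch of one such training step, continuing the forward-pass sketch above; with a softmax output and cross-entropy error the output-layer error signal reduces to p - t, and the gradient is pushed back through V, M and the shared projection R (learning rate and weight-decay values are placeholders):

```python
def train_step(history, target_word, lr=1e-3, weight_decay=1e-5):
    """One SGD step with cross-entropy error, weight decay, and
    back-propagation through the projection layer (reuses R, M, b, V, k, P)."""
    global M, b, V, k
    c = np.concatenate([R[w] for w in history])
    d = np.tanh(M @ c + b)
    o = V @ d + k
    o -= o.max()
    p = np.exp(o); p /= p.sum()

    t = np.zeros_like(p); t[target_word] = 1.0   # target 1 for w_j, 0 otherwise
    delta_o = p - t                              # softmax + cross-entropy gradient
    delta_d = (V.T @ delta_o) * (1.0 - d**2)     # back through tanh
    delta_c = M.T @ delta_d                      # into the projection layer

    V -= lr * (np.outer(delta_o, d) + weight_decay * V)
    k -= lr * delta_o
    M -= lr * (np.outer(delta_d, c) + weight_decay * M)
    b -= lr * delta_d
    for i, w in enumerate(history):              # update the shared projection rows
        R[w] -= lr * delta_c[i * P : (i + 1) * P]
```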
Fast Recognition Techniques
• Lattice Rescoring
• Shortlists
• Regrouping
• Block mode
• CPU optimization
Fast Recognition Techniques
• Lattice Rescoring
  • Decode with a standard backoff LM to build lattices
• Shortlists
• Regrouping
• Block mode
• CPU optimization
Fast Recognition Techniques
• Lattice Rescoring
• Shortlists
  • The NN only predicts a high-frequency subset of the vocabulary
  • Redistributes the probability mass of the shortlist words
• Regrouping
• Block mode
• CPU optimization
Shortlist optimization
[Figure: the same network with the output layer restricted to the shortlist – outputs p_i = P(w_j = i | h_j) up to p_S = P(w_j = S | h_j), where S is the shortlist size rather than the full vocabulary N]
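A small sketch of how a shortlist probability could be combined with the backoff LM, following the redistribution idea above; backoff_prob, nn_probs and shortlist are illustrative names, not the authors' code:

```python
def shortlist_prob(word, history, nn_probs, backoff_prob, shortlist):
    """P(word | history) from a shortlist NN LM plus a backoff LM.

    nn_probs: dict word -> NN posterior over the shortlist for this history
    backoff_prob: function (word, history) -> backoff LM probability
    shortlist: set of the most frequent words (the NN's output vocabulary)
    """
    if word in shortlist:
        # Probability mass the backoff LM assigns to the shortlist for this history,
        # redistributed over the shortlist words by the NN posteriors
        p_s = sum(backoff_prob(v, history) for v in shortlist)
        return nn_probs[word] * p_s
    # Words outside the shortlist keep their backoff LM probability
    return backoff_prob(word, history)
```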
Fast Recognition Techniques
• Lattice Rescoring
• Shortlists
• Regrouping – optimization of lattice rescoring (#1)
  • Collect and sort the LM probability requests
  • All probability requests with the same history h_t need only one forward pass
• Block mode
• CPU optimization
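A minimal sketch of the regrouping idea, assuming the lattice produces (history, word) probability requests; forward is the routine from the earlier sketch and the other names are illustrative:

```python
from collections import defaultdict

def rescore_requests(requests, forward):
    """requests: iterable of (history, word_id) pairs from the lattice.
    Grouping by history means each distinct history is pushed through
    the network only once."""
    by_history = defaultdict(list)
    for history, word in requests:
        by_history[tuple(history)].append(word)

    probs = {}
    for history, words in by_history.items():
        p = forward(list(history))          # one forward pass per distinct history
        for w in words:
            probs[(history, w)] = p[w]
    return probs
```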
Fast Recognition Techniques
• Lattice Rescoring
• Shortlists
• Regrouping
• Block mode
  • Several examples are propagated through the NN at once
  • Takes advantage of faster matrix/matrix operations
• CPU optimization
Block mode calculations – single example
[Figure: forward pass for one example]
• d = tanh(M·c + b)
• o = V·d + k
Block mode calculations – bunch of examples
[Figure: the vectors c, d, o become matrices whose columns are the individual examples]
• D = tanh(M·C + B)
• O = V·D + K
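A minimal numpy sketch of block mode: histories are stacked column-wise so the per-example matrix/vector products become matrix/matrix products (sizes as in the earlier sketches; weight values and word indices are placeholders):

```python
import numpy as np

P, H, N, CONTEXT = 50, 1024, 51000, 3
rng = np.random.default_rng(0)
R = rng.normal(0, 0.01, (N, P))            # shared projection
M = rng.normal(0, 0.01, (H, CONTEXT * P))  # projection-to-hidden weights
b = np.zeros(H)
V = rng.normal(0, 0.01, (N, H))            # hidden-to-output weights
k = np.zeros(N)

def forward_block(histories):
    """Posteriors for a whole bunch of histories as one (N, batch) matrix."""
    C = np.stack([np.concatenate([R[w] for w in h]) for h in histories], axis=1)
    D = np.tanh(M @ C + b[:, None])          # hidden layer, all examples at once
    O = V @ D + k[:, None]                   # output activations
    O -= O.max(axis=0, keepdims=True)        # numerical stability
    E = np.exp(O)
    return E / E.sum(axis=0, keepdims=True)  # column-wise softmax

batch_probs = forward_block([[17, 42, 7], [3, 9, 1]])   # bunch of two histories
```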
Fast Recognition – Test Results
• Lattice Rescoring – 511 lattice nodes on average
• Shortlists (2000 words) – 90% prediction coverage
  • 3.8M 4-grams requested, 3.4M processed by the NN
• Regrouping – only 1M forward passes required
• Block mode – bunch size = 128
• CPU optimization
• Total processing < 9 minutes (0.03× real time); without the optimizations, about 10× slower
Fast Training Techniques
• Parallel implementations
  • Full connections require low latency; very costly
• Resampling techniques
• Optimal floating-point operation speed requires contiguous memory locations
Fast Training Techniques
• Floating-point precision – 1.5× faster
• Suppress internal calculations – 1.3× faster
• Bunch mode – 10+× faster
  • Forward and back-propagation for many examples at once
• Multiprocessing – 1.5× faster
• Overall: 47 hours reduced to 1h27m with bunch size 128
Application to ASR
• Neural net LM techniques focus on conversational telephone speech (CTS) because:
  • There is far less in-domain training data → data sparsity
  • The NN can only handle a small amount of training data
• New Fisher CTS data – 20M words (vs. 7M previously)
• Broadcast News (BN) data: 500M words
Application to CTS
• Baseline: train standard backoff LMs for each domain and then interpolate
• Experiment #1: interpolate the CTS neural net LM with the in-domain backoff LM
• Experiment #2: interpolate the CTS neural net LM with the full-data backoff LM
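A minimal sketch of the interpolation step in these experiments, assuming simple linear interpolation of the two models; the function names are illustrative and the weight would be tuned on held-out data:

```python
def interpolated_prob(word, history, nn_lm_prob, backoff_lm_prob, lam=0.5):
    """Linear interpolation of the neural net LM and a backoff LM.

    nn_lm_prob, backoff_lm_prob: functions (word, history) -> probability
    lam: interpolation weight (0.5 is a placeholder; tune on held-out data)
    """
    return lam * nn_lm_prob(word, history) + (1.0 - lam) * backoff_lm_prob(word, history)
```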
Application to CTS – Perplexity
• Baseline: train standard backoff LMs for each domain and then interpolate
  • In-domain PPL: 50.1; full-data PPL: 47.5
• Experiment #1: interpolate the CTS neural net LM with the in-domain backoff LM
  • In-domain PPL: 45.5
• Experiment #2: interpolate the CTS neural net LM with the full-data backoff LM
  • Full-data PPL: 44.2
Application to CTS – WER
• Baseline: train standard backoff LMs for each domain and then interpolate
  • In-domain WER: 19.9%; full-data WER: 19.3%
• Experiment #1: interpolate the CTS neural net LM with the in-domain backoff LM
  • In-domain WER: 19.1%
• Experiment #2: interpolate the CTS neural net LM with the full-data backoff LM
  • Full-data WER: 18.8%
Application to BN
• Only a subset of the 500M available words could be used for training – a 27M-word training set
• Still useful:
  • The NN LM gave a 12% PPL gain over a backoff LM trained on the same 27M-word set
  • The NN LM gave a 4% PPL gain over a backoff LM trained on the full 500M-word set
• Overall WER reduction of 0.3% absolute
Conclusion
• Neural net LMs provide significant improvements in PPL and WER
• Optimizations can speed up NN training by about 20× and keep lattice rescoring under 0.05× real time
• While the NN LM was developed for and works best on CTS, gains were found on the BN task too