Experiments in Adaptive Language Modeling
Lidia Mangu & Geoffrey Zweig
Motivation
• Multi-domain recognition: the IBM Superhuman Recognition Program covers
  • Switchboard / Fisher
  • Voicemail
  • Call Center
  • ICSI Meetings
• A one-size-fits-all LM may not suit every domain, even a gigantic one
Lots of Past Work
• Kneser & Steinbiss ’93, “On the Dynamic Adaptation of Stochastic Language Modeling”: tune mixing weights to suit a particular text
• Chen, Gauvain, Lamel, Adda & Adda ’01, “Language Model Adaptation for Broadcast News Transcription”: build and add new LMs from relevant training data
• Florian & Yarowsky ’99: hierarchical LMs
• Gao, Li & Lee ’00: upweight training counts whose frequency is similar to that in the test data
• Seymore & Rosenfeld ’97: interpolate topic LMs
• Bacchiani & Roark ’03: MAP adaptation for voicemail
• Many others
Plan of Attack
• No adaptation: the Superhuman LM, an 8-way interpolated LM built from multiple domains
• Baseline adaptation: adjust the interpolation weights per conversation
• Extended adaptation: build a new LM from relevant training data (see the sketch below)
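The extended-adaptation step is only outlined in the slides. Below is a minimal sketch of one plausible realization, assuming TF-IDF cosine similarity as the relevance criterion; the function names, the top_k parameter, and the similarity metric itself are illustrative assumptions, not details from the talk.

```python
# Hypothetical sketch of "extended adaptation": rank candidate training
# sentences by similarity to the decoded conversation and keep the
# closest ones for building a new LM. TF-IDF cosine similarity is an
# illustrative assumption; the talk does not name a selection metric.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_relevant(candidates, decoded_transcript, top_k=10_000):
    """candidates: list of training sentences; returns the top_k
    sentences most similar to the decoded conversation."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(candidates + [decoded_transcript])
    sims = cosine_similarity(tfidf[:-1], tfidf[-1]).ravel()
    best = sims.argsort()[::-1][:top_k]
    return [candidates[i] for i in best]
```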
Description of Atomic LMs
• SWB + CallHome: 3.4M words, 1.4M 3-grams
• Broadcast News: 148M words, 38M 3-grams
• Financial Call Centers: 655K words, 303K 3-grams
• UW Web data (conversational-like): 192M words, 48M 3-grams
• SWB Cellular: 244K words, 134K 3-grams
• UW Web data (meeting-like): 28M words, 12M 3-grams
• UW Newsgroup data: 102M words, 34M 3-grams
• Voicemail: 1.1M words, 551K 3-grams
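These eight atomic models are combined by linear interpolation, P(w | h) = Σᵢ λᵢ Pᵢ(w | h) with Σᵢ λᵢ = 1. A minimal sketch, assuming each component LM is wrapped as a probability function (all names and the component list below are placeholders, not the actual Superhuman LM components):

```python
# Minimal sketch of an 8-way linearly interpolated LM:
#   P(w | h) = sum_i lambda_i * P_i(w | h),  with sum_i lambda_i = 1.
def interpolated_prob(word, history, components, weights):
    """components: list of functions p(word, history) -> probability;
    weights: non-negative floats summing to 1 (one per component)."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(lam * p(word, history)
               for lam, p in zip(weights, components))
```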
Description of Lattice-Building Models & Process
• Generate lattices with a bigram LM
  • Word-internal acoustic context
  • 3.6K acoustic units; 142K Gaussians
  • PLP + VTLN + FMLLR + MMI
• LM rescoring with the 8-way interpolated LM
• Acoustic rescoring with a cross-word AM
  • 10K acoustic units; 589K Gaussians
  • PLP + VTLN + FMLLR + ML
• Adapt on the transcripts from the last step: adjust the interpolation weights to minimize perplexity on the decoded transcripts (see the sketch below)
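Minimizing perplexity of the interpolated LM on the decoded transcripts is equivalent to maximizing their average log-probability, and for fixed component models this can be done with EM over the mixture weights. The slides do not name the optimizer, so EM is an assumption here; a minimal sketch, assuming the per-token component probabilities have already been collected (shapes and names are illustrative):

```python
# Sketch of EM re-estimation of the interpolation weights on the decoded
# transcripts. Minimizing perplexity equals maximizing average
# log-probability, so EM on the mixture weights does the job.
import numpy as np

def em_weights(component_probs, n_iters=20):
    """component_probs: array of shape (n_tokens, n_models); entry
    [t, i] is P_i(w_t | h_t) on the adaptation text."""
    probs = np.asarray(component_probs)
    n_tokens, n_models = probs.shape
    lam = np.full(n_models, 1.0 / n_models)  # start from uniform weights
    for _ in range(n_iters):
        mix = probs @ lam                    # mixture probability per token
        post = probs * lam / mix[:, None]    # E-step: model responsibilities
        lam = post.mean(axis=0)              # M-step: re-normalized weights
    return lam
```

Each EM iteration provably does not increase perplexity on the adaptation text, so a handful of iterations per conversation suffices.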
Conclusions
• Simple adaptation is effective for a multi-domain system
• Contrasts with some previous results on Broadcast News
• Not very sensitive to initial decoding errors
• Dynamic LM construction remains to be explored