University of Sheffield
M4 speech recognition
Vincent Wan, Martin Karafiát
The Recogniser (block diagram; box labels only)
• Front end
• Trigram language model (SRILM)
• Word internal triphone models
• Cross word triphone models
• n-best lattice generation: best first decoding (Ducoder)
• Lattice rescoring: time synchronous decoding (HTK)
• MLLR adaptation (HTK), shown twice in the diagram
• Recognition output, shown twice in the diagram
System limitations
• N-best list rescoring not optimal
• Adaptation must be performed on two sets of acoustic models
• Many more hyper-parameters to tune manually
• SRILM is not efficient on very large language models (greater than 10e+9 words)
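One way to see why N-best list rescoring is not optimal: the rescoring pass can only reorder the hypotheses that survived the first pass, so a better word sequence that was pruned earlier can never be recovered. The sketch below is illustrative only; the function name, the `lm_weight` and `word_penalty` parameters, and the toy scores are assumptions, not taken from the M4 system.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=10.0, word_penalty=0.0):
    """Return the word sequences sorted best-first by combined log score.

    hypotheses: list of (words, acoustic_logprob) pairs from a first pass
    lm_score:   function mapping a list of words to a log probability
    """
    scored = []
    for words, acoustic_logprob in hypotheses:
        total = (acoustic_logprob
                 + lm_weight * lm_score(words)
                 + word_penalty * len(words))
        scored.append((total, words))
    # Highest combined log score first; only listed hypotheses can win.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]


# Toy example with made-up scores (hypothetical values).
nbest = [(["a", "b"], -10.0), (["a", "c"], -10.5)]
toy_lm = lambda words: -1.0 if words == ["a", "c"] else -2.0
best = rescore_nbest(nbest, toy_lm, lm_weight=1.0)[0]
```

The language model weight and word penalty are two of the hyper-parameters that such a multi-pass setup leaves to manual tuning.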
Advances since last meeting
• Models trained on two databases
  • SWITCHBOARD recogniser
    • Acoustic & language models trained on 200 hours of speech
  • ICSI meetings recogniser
    • Acoustic models trained on 40 hours of speech
    • Language model is a combination of SWB and ICSI
• Improvements mainly affect the Switchboard models
• 16 kHz sampling rate used throughout
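The slides do not say how the SWB and ICSI language models were combined. A common choice is linear interpolation of the two models' probabilities, sketched below under that assumption; the function name and the weight `lam` are hypothetical.

```python
def interpolate(p_swb, p_icsi, lam):
    """Linearly interpolate two probabilities of the same word given the same history.

    lam is the weight on the SWB model; (1 - lam) goes to the ICSI model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_swb + (1.0 - lam) * p_icsi


# Equal weighting of two made-up probabilities.
p = interpolate(0.2, 0.4, 0.5)
```

In practice the weight would typically be chosen to minimise perplexity on held-out in-domain text.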
Advances since last meeting
• Adaptation of word internal context dependent models
• Unified the phone sets and pronunciation dictionaries
• Improved the pronunciation dictionary for Switchboard
  • Now using the ICSI dictionary with missing pronunciations imported from the ISIP dictionary
• Better handling of multiple pronunciations during acoustic model training
• General bug fixes
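The dictionary merge described above (ICSI entries kept, ISIP used only to fill gaps) can be sketched as follows. This is a minimal illustration that assumes both dictionaries already use the unified phone set and are held in memory as word-to-pronunciations mappings; the function name and the toy entries are hypothetical.

```python
def merge_dictionaries(primary, fallback):
    """Keep every primary entry; import fallback entries only for missing words.

    Both arguments map a word to a list of pronunciations,
    each pronunciation being a list of phone symbols.
    """
    merged = {word: list(prons) for word, prons in primary.items()}
    for word, prons in fallback.items():
        if word not in merged:
            merged[word] = list(prons)
    return merged


# Toy entries with made-up phone strings.
icsi = {"meeting": [["m", "iy", "t", "ih", "ng"]]}
isip = {"meeting": [["m", "iy", "dx", "ih", "ng"]],
        "lapel": [["l", "ax", "p", "eh", "l"]]}
lexicon = merge_dictionaries(icsi, isip)
```

Words present in both dictionaries keep only their primary pronunciations, so the fallback never overrides an existing entry.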
Results overview
% word error rates
(results table not preserved in this transcript)
* Results from lapel mics
† Results from beamformer
Results: adaptation vs. direct training on ICSI
% word error rates
(results table not preserved in this transcript)
* Results from Ducoder using all pruning
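For reference, word error rate is the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recursion; it is not the scoring tool used for these results, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate in percent for two whitespace-tokenised strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat the mat")
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.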
Acoustic model adaptation issue
• Acoustic models are presently not very adaptive
• Better MLLR code required (next slide)
• More training data required
• Need to make better use of the combined ICSI/SWB training data for M4.
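As background on the MLLR step: mean adaptation applies a shared affine transform to the Gaussian mean vectors of the acoustic models, with the transform estimated to maximise the likelihood of the adaptation data. The sketch below shows only how an already estimated transform is applied to one mean; the estimation itself is omitted, and the function name and toy numbers are assumptions rather than HTK code.

```python
def mllr_transform_mean(A, b, mean):
    """Return the adapted mean A * mean + b for one Gaussian.

    A:    square matrix as a list of rows
    b:    bias vector
    mean: original mean vector
    """
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


# Toy 2-dimensional example with made-up values.
adapted = mllr_transform_mean([[1.0, 0.0], [0.0, 2.0]], [0.5, -1.0], [2.0, 3.0])
```

The same transform is shared by many Gaussians, which is what lets a small amount of adaptation data move a large model.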
Other news
• The next version of HTK’s adaptation code will be made available to M4 before the official public release.
• Sheffield to acquire HTK LVCSR decoder
  • Licensing issues to be resolved
  • May be able to make binaries available to M4 partners