Automatic Speech Recognition System
Experimental Study: Effect of Parameter Variation on WER Performance
Sanjay Patil, Jun-Won Suh
Human and Systems Engineering
Details of the experiment
• Details of the system:
  • HMM Speech Recognition System
  • TIDigits Database (41300 utterances, 12547 sentences), 11 words: zero to nine, plus "oh" (O)
  • Cross-word, loop grammar
• Objective:
  • To study ASR performance as a function of four parameters:
  • WER = fn(Frame, Window, IP, State-tying)
    • Frame = 5 ms to 50 ms
    • Window = 5 ms to 50 ms
    • IP = -10 to -200
    • State-tying = {split, merge, occupancy} => total number of tied states
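The Frame and Window parameters above can be made concrete with a small sketch. It assumes that "Frame" is the shift between successive analysis windows and "Window" is the window length, both in milliseconds, and that only complete windows are counted. The function names are hypothetical and are not part of the recognizer.

```python
def frame_count(duration_ms, frame_ms, window_ms):
    """Number of complete analysis windows that fit in a signal.

    Assumption: a window starts every `frame_ms` and spans `window_ms`;
    a trailing partial window is dropped.
    """
    if frame_ms <= 0 or window_ms <= 0:
        raise ValueError("frame_ms and window_ms must be positive")
    if duration_ms < window_ms:
        return 0
    return int((duration_ms - window_ms) // frame_ms) + 1


def frame_spans(duration_ms, frame_ms, window_ms):
    """(start, end) times in ms of each analysis window."""
    n = frame_count(duration_ms, frame_ms, window_ms)
    return [(i * frame_ms, i * frame_ms + window_ms) for i in range(n)]


if __name__ == "__main__":
    # One second of speech at three (Frame, Window) settings.
    for frame_ms, window_ms in [(10, 25), (15, 25), (50, 5)]:
        n = frame_count(1000, frame_ms, window_ms)
        print(f"frame={frame_ms} ms, window={window_ms} ms -> {n} feature vectors")
```

When the window is shorter than the frame shift (for example Frame = 50 ms, Window = 5 ms), part of the signal is never analyzed; when it is longer, successive windows overlap.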
Command line to run the experiment
• tidigit_decode -model_type xwrd_triphone -train_mode baum_welch -decode_mode loop_grammar
• Options:
  • -model_type: the type of model to build
    • xwrd_triphone: context-dependent cross-word triphone models
  • -train_mode: the training algorithm to use
    • baum_welch: the standard Baum-Welch (forward-backward) algorithm
  • -decode_mode: the type of decoding to perform
    • loop_grammar: decodes using a grammar in which any digit can follow any other digit with equal probability
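The loop_grammar description above can be illustrated with a minimal sketch: every word may follow every other word with the same probability. The word labels and the `loop_grammar` function are hypothetical; the grammar actually used by tidigit_decode may also contain silence or sentence-boundary symbols, which are left out here.

```python
# Hypothetical labels for the 11-word TIDigits vocabulary.
DIGITS = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "oh"]


def loop_grammar(words):
    """Transition table P(next | previous) for a uniform loop grammar."""
    p = 1.0 / len(words)
    return {prev: {nxt: p for nxt in words} for prev in words}


if __name__ == "__main__":
    grammar = loop_grammar(DIGITS)
    # Every transition has the same probability, 1/11.
    print(grammar["three"]["oh"])
```

Because every transition is equally likely, such a grammar gives the decoder no preference among digit sequences of the same length.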
Language Model
• Combining Acoustic and Language Models
• Language model contribution = $P(W)^{LM} \cdot IP^{N(W)}$, where $N(W)$ is the number of words in $W$
• LM: language model scale (we did not observe any change in WER)
• IP: insertion penalty, the penalty for inserting a new word
• IP is determined empirically to optimize recognition performance
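A sketch of how the insertion penalty enters the hypothesis score, written in the log domain. It assumes that the IP values quoted in this study (-10 to -200) are log-domain penalties added once per word; the slides do not state this explicitly. The acoustic scores below are made-up illustrative numbers, not experimental results, and all function names are hypothetical.

```python
import math


def combined_log_score(acoustic_logp, lm_logp, n_words, lm_scale, insertion_penalty):
    """log P(X|W) + LM * log P(W) + IP * N(W), with IP given in the log domain."""
    return acoustic_logp + lm_scale * lm_logp + insertion_penalty * n_words


def best_hypothesis(hypotheses, lm_scale, insertion_penalty, vocab_size=11):
    """Pick the best hypothesis under a uniform loop grammar (assumed)."""
    def score(hyp):
        words, acoustic_logp = hyp
        lm_logp = len(words) * math.log(1.0 / vocab_size)
        return combined_log_score(acoustic_logp, lm_logp, len(words),
                                  lm_scale, insertion_penalty)
    return max(hypotheses, key=score)


if __name__ == "__main__":
    # (word sequence, illustrative acoustic log-likelihood)
    hypotheses = [
        (["one", "two", "three"], -1000.0),
        (["one", "oh", "two", "oh", "three"], -950.0),  # fits better, but 2 extra words
    ]
    for ip in (-10.0, -200.0):
        words, _ = best_hypothesis(hypotheses, lm_scale=1.0, insertion_penalty=ip)
        print(f"IP={ip}: {' '.join(words)}")
```

With these illustrative numbers, the mild penalty favors the longer hypothesis and the strong penalty favors the shorter one, which is the trade-off tuned empirically.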
Test Results: Effect of Varying the Insertion Penalty on WER
• The same holds for the other (Frame, Window) combinations.
• The remaining two (Frame, Window) pairs are (10, 25) and (15, 25).
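For reference, WER is conventionally computed from the minimum number of substitutions, deletions, and insertions needed to turn the recognized word string into the reference, divided by the number of reference words. The sketch below uses that standard definition; the scoring tool actually used in this study is not stated, and the `wer` function is a hypothetical helper.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    print(wer("one two three", "one oh two three"))  # one inserted word
    print(wer("one two three", "one three"))         # one deleted word
```

A weak insertion penalty tends to raise the insertion count and a strong one tends to raise the deletion count, so WER is typically minimized somewhere in between.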
State-Tying Results
• These results are taken from Naveen Parihar’s MS thesis (see References).
References
• J. Picone, “Lecture,” [online]. Available: http://www.isip.msstate.edu/publications/courses
• X. Huang, A. Acero, H. Hon, Spoken Language Processing (Prentice Hall, 2001)
• F. Jelinek, Statistical Methods for Speech Recognition (The MIT Press, 1999)
• N. Parihar and J. Picone, “Aurora Working Group: DSR Front End LVCSR Evaluation – Baseline Recognition System Description,” [online]. Available: http://www.isip.msstate.edu/publications/reports/aurora_frontend/2001/report_072101_v7.pdf
• N. Parihar, “Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation,” MS Thesis, Dec. 2003, [online]
• J. Zhao, X. Zhang, A. Ganapathiraju, N. Deshmukh, J. Picone, “Tutorial for Decision Tree-Based State Tying for Acoustic Modeling,” June 1999, [online]
• S. J. Young, J. J. Odell, P. C. Woodland, “Tree-Based State Tying for High Accuracy Acoustic Modelling,” May 1994, [online]. Available: http://citeseer.ist.psu.edu/young94treebased.html