An Evaluation of Lattice Scoring Using a Smoothed Estimate of Word Accuracy Mohamed Kamel Omar and Lidia Mangu ICASSP 2007 IBM T.J. Watson Research Center
Outline • Introduction • Problem formulation • Implementation • MSWA algorithm • MSWA-CN algorithm • Experiments and results • Conclusions
Introduction • In ASR systems, the maximum a posteriori probability is the standard decoding criterion • It minimizes an estimate of the sentence-level error • This is inconsistent with the word-level evaluation metrics of ASR • The motivation of this paper is: • To select the hypothesis which minimizes an estimate of the word error rate over the hypothesis lattice • To avoid the computational infeasibility of calculating the pair-wise Levenshtein distance between every pair of possible paths
Introduction • In LVCSR systems, word lattices are commonly used as a compact representation of the alternative hypotheses • However, calculating pair-wise word error rates for the different hypotheses in a lattice is computationally infeasible • [L. Mangu et al. 2000] Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks • [V. Goel et al. 2006] Segmental Minimum Bayes-Risk Decoding for Automatic Speech Recognition Systems • [F. Wessel et al. 2001] Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities
Formulation • Given two lattices, a measure of the word accuracy of the hypothesis lattice with respect to the reference lattice can be approximated by $E_{H,R|Y}[A(H,R)] \approx \sum_{H}\sum_{R} P(H|Y)\,P(R|Y)\,\tilde{A}(H,R)$ (1) • $E_{H,R|Y}[\cdot]$ is the expected value over the joint probability mass function (PMF) of the hypothesis word sequence, H, and the reference word sequence, R, given the observation vector Y • $A(H,R)$ is the word accuracy of H with respect to R • $P(R|Y)$ is the posterior probability of the reference string estimated from the reference lattice • $P(H|Y)$ is the posterior probability of the hypothesis string estimated from the hypothesis lattice • $\tilde{A}(H,R)$ is a smoothed approximation of the word accuracy of H with respect to R which takes phonetic similarity into consideration
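To make eq. (1) concrete, here is a minimal Python sketch that evaluates the expected smoothed word accuracy when both lattices are flattened into N-best lists with posterior probabilities. The function names are illustrative, and smoothed_accuracy is a crude 0/1 stand-in for the paper's phonetically smoothed Ã(H, R).

```python
def smoothed_accuracy(hyp, ref):
    # Crude 0/1 stand-in for the phonetically smoothed A~(H, R):
    # fraction of reference positions matched exactly.
    matches = sum(h == r for h, r in zip(hyp, ref))
    return matches / max(len(ref), 1)

def expected_word_accuracy(hyp_nbest, ref_nbest):
    """Eq. (1): sum over H and R of P(H|Y) * P(R|Y) * A~(H, R),
    treating the two lattices as independent given the observations Y."""
    return sum(p_h * p_r * smoothed_accuracy(hyp, ref)
               for hyp, p_h in hyp_nbest
               for ref, p_r in ref_nbest)

hyp_nbest = [(("taiwan", "normal"), 0.6), (("typhoon", "normal"), 0.4)]
ref_nbest = [(("taiwan", "normal"), 0.7), (("taiwan", "formal"), 0.3)]
print(expected_word_accuracy(hyp_nbest, ref_nbest))
```

In the real system the sums run over all lattice paths, which cannot be enumerated; this is why the paper works with arc-level quantities instead of explicit path enumeration.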
Maximum Smoothed Word Accuracy Approach • Our goal is to select the word sequence in the hypothesis lattice which maximizes the estimate of the word accuracy (MSWA): $H^{*} = \arg\max_{H} P(H|Y) \sum_{R} P(R|Y)\,\tilde{A}(H,R)$ (2) • This word sequence can be estimated using the Viterbi algorithm • Alternatively, we can assign to each word arc $w$ in the hypothesis lattice the conditional value of the objective function in eq. (1) given this word arc, that is $F(w) = E_{H,R|Y}\big[\tilde{A}(H,R)\,\big|\,w \in H\big]$ (3)
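Continuing the N-best simplification from the previous sketch, the snippet below illustrates eqs. (2) and (3). The names mswa_decode and arc_conditional_score are hypothetical, and rendering eq. (2) as an argmax of P(H|Y) times the expected smoothed accuracy is an assumption consistent with eq. (1), not the paper's exact derivation.

```python
def smoothed_accuracy(hyp, ref):  # same 0/1 stub as in the eq. (1) sketch
    return sum(h == r for h, r in zip(hyp, ref)) / max(len(ref), 1)

def mswa_decode(hyp_nbest, ref_nbest):
    """Eq. (2): choose H* maximizing P(H|Y) * sum_R P(R|Y) * A~(H, R)."""
    return max(hyp_nbest,
               key=lambda hp: hp[1] * sum(p_r * smoothed_accuracy(hp[0], ref)
                                          for ref, p_r in ref_nbest))[0]

def arc_conditional_score(word, hyp_nbest, ref_nbest):
    """Eq. (3): the objective of eq. (1) conditioned on hypotheses that
    contain `word`, renormalized by their posterior mass."""
    mass = sum(p_h for hyp, p_h in hyp_nbest if word in hyp)
    if mass == 0.0:
        return 0.0
    return sum(p_h * p_r * smoothed_accuracy(hyp, ref)
               for hyp, p_h in hyp_nbest if word in hyp
               for ref, p_r in ref_nbest) / mass

hyp_nbest = [(("taiwan", "normal"), 0.6), (("typhoon", "normal"), 0.4)]
ref_nbest = [(("taiwan", "normal"), 1.0)]
print(mswa_decode(hyp_nbest, ref_nbest))                       # ('taiwan', 'normal')
print(arc_conditional_score("typhoon", hyp_nbest, ref_nbest))  # 0.5
```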
Smoothed Estimate of Word Accuracy • The approximate measure of the accuracy of a word arc $w$ in the hypothesis lattice is computed with respect to each path $r$ in the reference lattice that coincides with $w$ in time, even when that path starts or ends in the middle of a reference arc [Figure: example hypothesis and reference lattices; phone-level alignment of the hypothesis 颱風 (t_a ai, f_e eng) against the reference 台灣師]
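As a rough illustration of the phonetic smoothing, the sketch below scores a hypothesis arc by the fraction of its phones matched among the time-coinciding reference phones. This overlap rule is an assumption for illustration, not the paper's exact formula (which was shown as an equation on the original slide).

```python
def smoothed_arc_accuracy(hyp_phones, ref_phones):
    """Fraction of hypothesis phones matched among the time-coinciding
    reference phones; each reference phone may be credited only once."""
    pool = list(ref_phones)
    matched = 0
    for p in hyp_phones:
        if p in pool:
            pool.remove(p)
            matched += 1
    return matched / max(len(hyp_phones), 1)

# Echoing the slide's alignment example: hypothesis 颱風 (t_a ai f_e eng)
# scored against the time-overlapping reference phones of 台灣師.
print(smoothed_arc_accuracy(["t_a", "ai", "f_e", "eng"],
                            ["t_a", "ai", "u", "an", "sh"]))  # -> 0.5
```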
Implementation • Two requirements: • The forward-backward algorithm has to be applied to the reference lattice • The state sequence for each arc in the reference lattice has to be known • In this paper, two approaches were used to estimate the hypothesis which approximately maximizes the objective function: • The Viterbi-based MSWA algorithm, which estimates the word sequence according to eq. (2) • The MSWA-CN algorithm, which is based on the confusion network (CN) algorithm and estimates the best word sequence using the conditional values in eq. (3)
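The first requirement can be pictured with a small forward-backward pass over a toy reference lattice. The arc encoding, topologically ordered (start, end, word, likelihood) tuples, and the function name are assumptions for illustration.

```python
def arc_posteriors(arcs, start, end):
    """Arc posterior = forward mass * arc likelihood * backward mass,
    normalized by the total lattice mass."""
    alpha = {start: 1.0}                        # forward pass
    for s, e, _, lik in arcs:
        alpha[e] = alpha.get(e, 0.0) + alpha.get(s, 0.0) * lik
    beta = {end: 1.0}                           # backward pass
    for s, e, _, lik in reversed(arcs):
        beta[s] = beta.get(s, 0.0) + beta.get(e, 0.0) * lik
    total = alpha[end]
    return {(s, e, w): alpha.get(s, 0.0) * lik * beta.get(e, 0.0) / total
            for s, e, w, lik in arcs}

arcs = [(0, 1, "taiwan", 0.6), (0, 1, "typhoon", 0.4), (1, 2, "normal", 1.0)]
print(arc_posteriors(arcs, start=0, end=2))
```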
Viterbi-Based MSWA Algorithm • 1. Initialization: for each starting arc in the hypothesis lattice, the accumulated score is initialized • 2. Forward Propagation: the update equations of the Viterbi algorithm are applied to each non-starting arc
Viterbi-Based MSWA Algorithm • 3. Backtracking: follow the best predecessor of each arc back from the best ending arc • 4. Exit with the output word sequence
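Putting steps 1–4 together, here is a minimal sketch of the Viterbi-based recursion over a toy hypothesis lattice, assuming each arc already carries its local smoothed-accuracy gain against the reference lattice. The encoding and names are illustrative, not the paper's implementation.

```python
def viterbi_mswa(arcs, start, end):
    # node -> (best accumulated gain, predecessor node, word on best arc)
    best = {start: (0.0, None, None)}           # step 1: initialization
    for s, e, word, gain in arcs:               # step 2: forward propagation
        if s not in best:
            continue
        cand = best[s][0] + gain
        if e not in best or cand > best[e][0]:
            best[e] = (cand, s, word)
    words, node = [], end                       # step 3: backtracking
    while best[node][1] is not None:
        words.append(best[node][2])
        node = best[node][1]
    return list(reversed(words)), best[end][0]  # step 4: output sequence

arcs = [(0, 1, "taiwan", 0.9), (0, 1, "typhoon", 0.3), (1, 2, "normal", 0.8)]
print(viterbi_mswa(arcs, start=0, end=2))       # (['taiwan', 'normal'], 1.7)
```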
MSWA-CN Algorithm • 1. Initialization: • For each starting arc in the hypothesis lattice, the forward score is initialized • For each ending arc in the hypothesis lattice, the backward score is initialized • 2. Forward Propagation: the update equations of the forward propagation part of the algorithm are applied to each non-starting arc
MSWA-CN Algorithm • 3. Backward Propagation: the update equations of the backward propagation part of the algorithm are applied to each non-ending arc • 4. For each word arc in the hypothesis lattice, a combined score is computed • The confusion network algorithm is then used to find the best path in the hypothesis lattice after replacing the word posterior probability in the original algorithm with this combined score
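Below is a sketch of the forward and backward propagation and the arc-level combination in step 4, on the same toy encoding with precomputed arc gains. The max-based recursions are an assumption; the slide only states that the combined arc score replaces the word posterior in the confusion network algorithm.

```python
def mswa_cn_arc_scores(arcs, start, end):
    NEG = float("-inf")
    fwd = {start: 0.0}                          # step 2: forward propagation
    for s, e, _, gain in arcs:
        fwd[e] = max(fwd.get(e, NEG), fwd.get(s, NEG) + gain)
    bwd = {end: 0.0}                            # step 3: backward propagation
    for s, e, _, gain in reversed(arcs):
        bwd[s] = max(bwd.get(s, NEG), bwd.get(e, NEG) + gain)
    # Step 4: best total score of any path through each arc; these values
    # stand in for word posteriors when building the confusion network.
    return {(s, e, w): fwd.get(s, NEG) + gain + bwd.get(e, NEG)
            for s, e, w, gain in arcs}

arcs = [(0, 1, "taiwan", 0.9), (0, 1, "typhoon", 0.3), (1, 2, "normal", 0.8)]
print(mswa_cn_arc_scores(arcs, start=0, end=2))
```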
Experimental Setup • Corpus • DARPA 2004 Rich Transcription evaluation data (RT04) • 2005 broadcast news Arabic test set (BNAT05) • Two systems • Unvowelized system • Vowelized system • Feature extraction • LDA+MLLT • Acoustic modeling (penta-phone) • 4000K Gaussians trained with a combination of fMPE and MPE • Language modeling (lexicon size: 617K) • 4-gram LM trained with modified Kneser-Ney smoothing
Experimental Results • The main difference between the two systems is that the vowelized system explicitly models the short vowels, which are pronounced in Arabic but almost never transcribed.
Conclusions • In this paper, a new smoothed word accuracy (SWA) objective function for lattice scoring has been examined • Two algorithms which use the SWA objective function to estimate the best hypothesis in a given lattice have been described: • The Viterbi-based MSWA algorithm • The MSWA-CN algorithm • In the future, the authors intend to assess the usefulness of the conditional score in eq. (3) for confidence annotation