Posterior Probability as a Confidence Measure
Reporter: CHEN TZAN HWEI
Reference
• [1] F. K. Soong and W. K. Lo, "Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences," ICASSP 2005.
• [2] J. Razik, O. Mella, D. Fohr and J.-P. Haton, "Local Word Confidence Measure Using Word Graph and N-Best List," EuroSpeech 2005.
Introduction (1/2)
• Applications of confidence measures
• A speech translation system can put more weight on reliable words.
• A spoken dialogue system can decide whether to prompt the user to speak the whole utterance again, or to confirm only the uncertain part.
Introduction (2/2)
• Approaches proposed for measuring the confidence of speech recognition output
• Feature based: assess confidence from selected features (e.g., word duration, acoustic score); a minimal sketch follows below.
• Explicit (extra) model based: employ a candidate model together with competitor models.
• Posterior probability based: estimate the posterior probability of a recognized entity.
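As a concrete illustration of the feature-based approach, here is a minimal Python sketch that maps word-level features to a confidence score with a logistic function. The feature set and the weights are hypothetical illustrations, not values from either referenced paper.

import math

# Hypothetical weights; in practice they would be trained on
# labeled correct/incorrect recognition output.
WEIGHTS = {"duration": 0.8, "acoustic_score": 1.5, "bias": -0.3}

def feature_confidence(duration_sec: float, acoustic_score: float) -> float:
    """Map selected word-level features to a confidence in (0, 1)."""
    z = (WEIGHTS["duration"] * duration_sec
         + WEIGHTS["acoustic_score"] * acoustic_score
         + WEIGHTS["bias"])
    return 1.0 / (1.0 + math.exp(-z))

# Example: a short word with a weak acoustic score yields low confidence.
print(feature_confidence(duration_sec=0.12, acoustic_score=-1.4))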
Generalized Word Posterior Probability (1/5)
• In MAP-based speech recognition, the best recognized word string is the one that maximizes the posterior probability of the word sequence given the acoustics.
• The word posterior probability (WPP) is the posterior of a single recognized word, with its time boundaries, given the acoustics (see the formulas below).
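A reconstruction of the standard formulas behind these two bullets (textbook notation, not quoted verbatim from [1]): the MAP decision rule picks the word string maximizing the posterior,

\[
\hat{W} \;=\; \arg\max_{W} P(W \mid X) \;=\; \arg\max_{W} \, p(X \mid W)\, P(W),
\]

and the WPP of a recognized word w with boundaries (s, t) sums the posterior mass of all word strings containing that word at exactly those boundaries:

\[
P([w; s, t] \mid X) \;=\; \frac{\sum\limits_{W :\, \exists n,\; [w_n; s_n, t_n] = [w; s, t]} p(X \mid W)\, P(W)}{p(X)}.
\]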
Generalized Word Posterior Probability (2/5)
• Three issues are addressed to generalize the WPP:
• Reduced search space
• Relaxed time registration
• Optimal acoustic and language model weights
Generalized Word Posterior Probability (3/5)
• Reduced search space: in LVCSR, the search space is always pruned to keep the search tractable. The reduced search space can also be conveniently used when computing the WPP.
• Relaxed time registration: the starting and ending times of a word in LVCSR output can be affected by various factors.
Generalized Word Posterior Probability (4/5)
• Optimal acoustic and language model weights are needed because of:
• Differences in dynamic range
• Differences in the frequency of computation
• Independence assumptions
• The reduced search space
Generalized Word Posterior Probability (5/5)
• The generalized WPP (GWPP) and the generalized utterance posterior probability (GUPP) are defined with these relaxations (a sketch follows below).
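A sketch of the generalized forms, reconstructed from the relaxations listed above rather than quoted from [1]: with acoustic exponent α, language model exponent β, the sum restricted to the pruned lattice, and time registration relaxed from an exact boundary match to overlap,

\[
\mathrm{GWPP}([w; s, t] \mid X) \;=\; \frac{1}{p(X)} \sum_{\substack{W :\; \exists n,\; w_n = w,\\ (s_n, t_n)\, \cap\, (s, t) \,\neq\, \emptyset}} \prod_{m} p^{\alpha}\!\left(X_{s_m}^{t_m} \mid w_m\right) P^{\beta}(w_m \mid h_m),
\]

where h_m is the language model history of w_m. The utterance-level analogue normalizes the weighted score of the recognized sentence by the total weighted mass of the pruned search space:

\[
\mathrm{GUPP}(W \mid X) \;=\; \frac{p^{\alpha}(X \mid W)\, P^{\beta}(W)}{\sum_{W'} p^{\alpha}(X \mid W')\, P^{\beta}(W')}.
\]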
Experiment (1/4)
• Corpus: Chinese Basic Travel Expression Corpus
• Features: 12 MFCCs, 12 delta MFCCs, and delta power
• The word recognition accuracy is 91%.
Experiment (2/4)
• Evaluation measure (see below)
• A baseline was obtained for comparison.
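Judging from the figure captions on the next two slides, verification performance is evaluated as total errors over a grid of (α, β) weights; a plausible definition (an assumption, not quoted from [1]) is

\[
E(\tau; \alpha, \beta) \;=\; \#\{\text{correct words rejected}\} \;+\; \#\{\text{incorrect words accepted}\},
\]

for a decision threshold τ applied to the GWPP (or to the GUPP at the utterance level).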
Experiment (3/4)
Figure 1. Total error surfaces for word verification using GWPP.
Figure 2. Total error surfaces for utterance verification using GUPP.
Experiment (4/4)
Figure 3. Total error surfaces for utterance verification using the product of GWPPs.
Figure 4. CER at word and utterance levels.
Local word confidence measure – unigram-based measure
• The first measure uses only local and simple information.
• The exact-boundary constraint is too tight, so we relax the timing constraint (sketched below).
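A sketch of what such a local, unigram-based measure can look like (a reconstruction consistent with the slide, not quoted from [2]): normalize the weighted score of the hypothesis [w; s, t] by the scores of all hypotheses in the word graph G competing for the same time span,

\[
C_{\mathrm{uni}}([w; s, t]) \;=\; \frac{p^{\alpha}\!\left(X_{s}^{t} \mid w\right) P(w)}{\sum\limits_{[v; s', t'] \in G,\; s' = s,\; t' = t} p^{\alpha}\!\left(X_{s'}^{t'} \mid v\right) P(v)}.
\]

Relaxing the timing constraint replaces the exact-boundary condition s' = s, t' = t with a tolerance window around (s, t), quantified by the relaxation rate used in the experiments below.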
Local word confidence measure – bigram-based measure (1/2)
• Bigram information with the previous word.
• Bigram information with both the previous word and the next word (sketched below).
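A sketch of the bigram variants (again a reconstruction, not quoted from [2]): the unigram prior P(w) in the previous measure is replaced by bigram context gathered from the word graph,

\[
P(w) \;\longrightarrow\; \sum_{u} P(w \mid u) \quad \text{(previous word)}, \qquad
P(w) \;\longrightarrow\; \sum_{u} \sum_{v} P(w \mid u)\, P(v \mid w) \quad \text{(previous and next word)},
\]

where u and v range over the word hypotheses immediately preceding and following [w; s, t] in the graph.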
Local word confidence measure – bigram-based measure (2/2)
• The set of all possible previous or next words may be too vast.
• It disturbs the confidence measure with words that will never appear in any alternative sentence.
• In Eq. 8 and Eq. 9, the bigram is therefore computed only with previous and next words belonging to the N-best list (see the sketch below).
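To make the restriction concrete, here is a minimal Python sketch of the idea behind Eq. 8 and Eq. 9 as described above: the bigram context for a word is drawn only from words that actually occur in the N-best list. The data layout (position-aligned hypotheses) and the `bigram` lookup are hypothetical simplifications, not structures from [2].

from typing import Callable, List, Set

def nbest_context_words(nbest: List[List[str]], position: int) -> Set[str]:
    """Words appearing just before `position` in any N-best hypothesis.

    Positions are assumed aligned across hypotheses for simplicity.
    """
    return {hyp[position - 1] for hyp in nbest if 1 <= position < len(hyp)}

def restricted_bigram_score(word: str, position: int,
                            nbest: List[List[str]],
                            bigram: Callable[[str, str], float]) -> float:
    """Sum P(word | u) over predecessors u drawn only from the N-best list."""
    return sum(bigram(u, word) for u in nbest_context_words(nbest, position))

# Toy usage with a uniform dummy bigram model.
nbest = [["the", "cat", "sat"], ["the", "mat", "sat"]]
print(restricted_bigram_score("sat", 2, nbest, lambda u, w: 0.1))  # 0.2: two contexts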
Experiment (1/4)
• Corpus: French broadcast news
• 56 minutes (10,651 words) are used to tune the parameters; 53 minutes (10,841 words) for the test.
• The average recognition rate over both corpora is 70.9%.
• Evaluation metric: equal error rate (EER)
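The EER is the operating point at which false acceptances and false rejections are equally frequent:

\[
\mathrm{EER} \;=\; P_{\mathrm{FA}}(\tau^{*}) \;=\; P_{\mathrm{FR}}(\tau^{*}),
\]

where τ* is the decision threshold at which the two error rates cross.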
Experiment (2/4)
• Relaxation rate: for a hypothesis word, we take into account neighboring word hypotheses whose beginning time, ending time, and length fall within a window around those of the hypothesis (a plausible formalization below).
Table 1: Results using the unigram score.
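One plausible formalization of the relaxation windows (an assumption, not quoted from [2]), with hypothesis boundaries (s, t), length l = t − s, and relaxation rate ρ: a neighboring hypothesis [v; s', t'] is taken into account when

\[
|s' - s| \le \rho\, l, \qquad |t' - t| \le \rho\, l, \qquad |(t' - s') - l| \le \rho\, l,
\]

so that the rate of 0.2 used in Table 2 would admit boundary shifts of up to 20% of the word's duration.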
Experiment (3/4)
Table 2: Evolution of the score with different scale rates (unigram measure, relaxation rate = 0.2).
Table 3: Results using the previous-word bigram.
Experiment (4/4)
Table 4: Results using the previous- and next-word bigram.
Table 5: Results using the previous-word bigram with the N-best list.
Summary
• The local word CM can be computed directly in the first decoding pass.
• Could it be used in CM-based pruning?