A Survey of ICASSP 2013 Language Model. Department of Computer Science & Information Engineering, National Taiwan Normal University. Presenter: 郝柏翰. Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition.
A Survey of ICASSP 2013 Language Model Department of Computer Science & Information Engineering, National Taiwan Normal University Presenter: 郝柏翰
Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition Ebru Arısoy et al., IBM T.J. Watson Research Center, NY
Introduction • In this work, we propose an approximate method for converting a feedforward NNLM into a back-off n-gram language model that can be used directly in existing LVCSR decoders. • We convert NNLMs of increasing order to pruned back-off language models, using lower-order models to constrain the n-grams allowed in higher-order models.
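Method • A back-off n-gram language model stores an explicit probability for each n-gram in a selected set and, for every other n-gram, backs off to the lower-order probability scaled by a history-dependent back-off weight. • In this paper, we propose an approximate method for converting a feedforward NNLM into a back-off language model that can be directly used in existing state-of-the-art decoders. • The NNLM itself is expressed through two probabilities: the NNLM probability and the background language model probability.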
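The back-off lookup described on this slide can be illustrated with a minimal sketch. The function name `backoff_prob`, the dictionary layout, and the toy probabilities are assumptions made for illustration; they are not taken from the paper.

```python
def backoff_prob(word, history, probs, backoffs):
    """Return P(word | history) under a toy back-off n-gram model.

    probs maps (history_tuple, word) to an explicitly stored probability.
    backoffs maps history_tuple to its back-off weight (1.0 if absent).
    """
    history = tuple(history)
    if (history, word) in probs:
        # The n-gram is stored explicitly: use it directly.
        return probs[(history, word)]
    if not history:
        # No shorter history left to back off to.
        return 0.0
    # Drop the oldest history word and scale by the back-off weight.
    weight = backoffs.get(history, 1.0)
    return weight * backoff_prob(word, history[1:], probs, backoffs)


# Hypothetical toy model over the vocabulary {"a", "b"}.
probs = {((), "a"): 0.5, ((), "b"): 0.5, (("a",), "b"): 0.8}
backoffs = {("a",): 0.4}

p_explicit = backoff_prob("b", ["a"], probs, backoffs)   # stored bigram
p_backed_off = backoff_prob("a", ["a"], probs, backoffs)  # 0.4 * P(a)
```

In this toy model the two probabilities for the history "a" sum to one, which is what the back-off weight is chosen to guarantee.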
Method • Representing the NNLM probabilities exactly over the output vocabulary requires one explicit probability per (history, output word) pair, i.e. $|V|^{n-1} \times |V_o|$ parameters in general, where $V$ is the complete vocabulary and $V_o$ the output vocabulary. • While we can represent the overall NNLM as a back-off model exactly, the result is prohibitively large, as noted above. Pruning can be used to reduce the set of n-grams for which we explicitly store probabilities.
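To see why the exact representation is impractical, the following sketch counts one stored probability per (history, output word) pair. The vocabulary sizes and model order are hypothetical values chosen for illustration, not figures from the paper.

```python
def exact_backoff_size(vocab_size, output_vocab_size, order):
    """Count explicit probabilities needed for an exact representation.

    Assumes one stored probability for every (history, output word) pair,
    where a history consists of order - 1 words from the full vocabulary.
    """
    num_histories = vocab_size ** (order - 1)
    return num_histories * output_vocab_size


# Hypothetical sizes: 50k-word vocabulary, 20k-word output vocabulary, 4-gram.
size = exact_backoff_size(vocab_size=50_000, output_vocab_size=20_000, order=4)
print(f"{size:.3e} explicit probabilities")
```

Even at these modest sizes the count is far beyond what a decoder can store, which motivates pruning.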
Experiments • [Figure: "Before" and "After" panels, annotated "more smooth"]
Use of Latent Words Language Models in ASR: a Sampling-Based Implementation Ryo Masumura et al., NTT Media Intelligence Laboratories, Japan
Introduction • This paper applies the latent words language model (LWLM) to automatic speech recognition (ASR). LWLMs are trained taking into account related words, i.e., groupings of words that are similar in meaning and syntactic role. • In addition, this paper describes an approximation method of the LWLM for ASR, in which words are randomly sampled from the LWLM and a standard word n-gram language model is then trained on the sampled text.
Method • Hierarchical Pitman-Yor Language Model • If we apply the LWLM directly to one-pass decoding, we have to calculate the probability distribution over the current word given its context. • Latent Words Language Model • LWLMs are generative models with a latent variable for every observed word in a text.
Method • The latent variable, called the latent word, is generated from its context, and the observed word is generated from the latent word.
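The sampling-based approximation can be sketched as follows: draw latent words from their context, emit an observed word from each latent word, and collect n-gram counts from the sampled text. The transition and emission tables below are hypothetical toy values, the latent context is limited to one preceding latent word, and plain bigram counting stands in for the n-gram training step; none of this is the authors' implementation.

```python
import random
from collections import Counter

# Hypothetical toy parameters: latent-word transitions and word emissions.
TRANSITION = {
    "<s>": {"dog": 1.0},
    "dog": {"runs": 1.0},
    "runs": {"</s>": 1.0},
}
EMISSION = {
    "dog": {"dog": 0.7, "puppy": 0.3},
    "runs": {"runs": 0.6, "jogs": 0.4},
}


def draw(dist, rng):
    """Sample one key from a {item: probability} dictionary."""
    items = list(dist)
    return rng.choices(items, weights=[dist[i] for i in items], k=1)[0]


def sample_sentence(rng):
    """Generate observed words by walking the latent-word chain."""
    words, latent = [], "<s>"
    while True:
        latent = draw(TRANSITION[latent], rng)
        if latent == "</s>":
            return words
        words.append(draw(EMISSION[latent], rng))


def sampled_bigram_counts(num_sentences, seed=0):
    """Count bigrams over sampled text, as input to n-gram training."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_sentences):
        tokens = ["<s>"] + sample_sentence(rng) + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


counts = sampled_bigram_counts(1000)
```

The resulting counts could be passed to any standard n-gram estimator, so the decoder never has to marginalize over latent words at run time.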
Experiments • This result shows that we can construct an LWLM comparable to the HPYLM if we generate sufficient text data. Moreover, the highest performance was achieved with LWLM+HPYLM. This result shows that the LWLM possesses properties different from those of the HPYLM, and that further improvement is achieved when they are combined.
Incorporating Semantic Information to Selection of WEB Texts for Language Model of Spoken Dialogue System Koichiro Yoshino et al., Kyoto University, Japan
Introduction • A novel text selection approach for training a language model (LM) with Web texts is proposed for automatic speech recognition (ASR) of spoken dialogue systems. • Compared to the conventional approach based on the perplexity criterion, the proposed approach introduces a semantic-level relevance measure with the back-end knowledge base used in the dialogue system. • We focus on the predicate-argument (P-A) structure characteristic of the domain in order to filter semantically relevant sentences in the domain.
Method • Selection Based on Perplexity • For a sentence, its perplexity under a seed LM trained with the document set D is used as the selection criterion. • Selection Based on Semantic Relevance Measure • The relevance measure is defined from occurrence counts, where C(.) stands for an occurrence count and P(D) is a normalization factor determined by the size of D. γ is a smoothing factor estimated with a Dirichlet prior.
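Perplexity-based selection can be sketched as below. A unigram seed LM with add-one smoothing is substituted here purely to keep the example short; the seed LM, the threshold, and the example sentences are assumptions, not the configuration used in the paper.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Collect unigram counts from whitespace-tokenized seed sentences."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total, vocab_size):
    """Per-word perplexity under an add-one-smoothed unigram model."""
    words = sentence.split()
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


def select(candidates, seed_sentences, threshold):
    """Keep candidate sentences whose perplexity is below the threshold."""
    counts, total = train_unigram(seed_sentences)
    vocab = set(counts) | {w for s in candidates for w in s.split()}
    return [
        s for s in candidates
        if perplexity(s, counts, total, len(vocab)) < threshold
    ]


# Hypothetical in-domain seed text and Web candidates.
seed = ["who won the game", "who pitched the game"]
web = ["who won", "stock prices fell"]
selected = select(web, seed, threshold=10.0)
```

Sentences made up of words unseen in the seed text receive a high perplexity and are filtered out.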
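Method • For a P-A pair consisting of a predicate and an argument, we define the relevance score of the pair as the geometric mean of the relevance scores of its predicate and its argument. • For each sentence, we compute the mean of this pair score over the P-A pairs included in the sentence.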
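The pair-level and sentence-level scores can be sketched as follows. The word-level relevance scores are hypothetical placeholder values, and returning 0.0 for a sentence without P-A pairs is an assumption made for this example.

```python
import math


def pair_score(pred_score, arg_score):
    """Geometric mean of the predicate and argument relevance scores."""
    return math.sqrt(pred_score * arg_score)


def sentence_score(pa_pairs, word_scores):
    """Mean pair score over the P-A pairs found in one sentence."""
    if not pa_pairs:
        return 0.0  # assumption: a sentence with no P-A pair scores zero
    total = sum(pair_score(word_scores[p], word_scores[a]) for p, a in pa_pairs)
    return total / len(pa_pairs)


# Hypothetical word-level relevance scores for a baseball domain.
word_scores = {"hit": 0.9, "homerun": 0.4, "buy": 0.1, "ticket": 0.1}
pairs = [("hit", "homerun"), ("buy", "ticket")]
score = sentence_score(pairs, word_scores)
```

The geometric mean keeps a pair's score low unless both the predicate and the argument are relevant to the domain.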