Bayesian Learning for Latent Semantic Analysis • Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu • Presenter: Hsuan-Sheng Chiu
Reference • Chia-Sheng Wu, “Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval”, 2005 • Q. Huo and C.-H. Lee, “On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate”, 1997 Speech Lab. NTNU
Outline • Introduction • PLSA • ML (Maximum Likelihood) • MAP (Maximum A Posteriori) • QB (Quasi-Bayes) • Experiments • Conclusions
Introduction • LSA vs. PLSA • Linear algebra and probability • Semantic space and latent topics • Batch learning vs. incremental learning
PLSA • PLSA is a general machine learning technique that adopts the aspect model to represent co-occurrence data. • Topics (hidden variables) • Corpus (document-word pairs)
PLSA • Assume that d_i and w_j are conditionally independent given the associated latent topic z_k • Joint probability:
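For reference, the joint probability in the standard PLSA aspect model is usually written as follows, using the slide's symbols d_i, w_j and z_k and assuming K latent topics:

```latex
P(d_i, w_j) = P(d_i)\, P(w_j \mid d_i)
            = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k)\, P(z_k \mid d_i)
```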
ML PLSA • Log likelihood of Y: • ML estimation:
ML PLSA • Maximization:
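Under the usual PLSA setup, and assuming n(d_i, w_j) is the count of word w_j in document d_i for a corpus Y of M documents over a vocabulary of N words (symbols introduced here for illustration), the log likelihood and the constrained ML estimate take this form:

```latex
\begin{aligned}
\log P(Y \mid \theta) &= \sum_{i=1}^{M} \sum_{j=1}^{N} n(d_i, w_j)\,
  \log\Big[ P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k)\, P(z_k \mid d_i) \Big] \\
\theta_{\mathrm{ML}} &= \arg\max_{\theta} \log P(Y \mid \theta)
  \quad \text{s.t.} \quad \sum_{j=1}^{N} P(w_j \mid z_k) = 1,\ \
  \sum_{k=1}^{K} P(z_k \mid d_i) = 1
\end{aligned}
```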
ML PLSA • Complete data: • Incomplete data: • EM (Expectation-Maximization) algorithm • E-step • M-step
ML PLSA • E-step
ML PLSA • Auxiliary function:
ML PLSA • M-step: • Lagrange multiplier
ML PLSA • Differentiation • New parameter estimation:
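The EM steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `plsa_em`, `log_likelihood`, the random initialization and the toy count matrix are all hypothetical. The E-step computes P(z_k|d_i,w_j) in proportion to P(w_j|z_k)P(z_k|d_i), and the M-step renormalizes the expected counts, which is the closed form the Lagrange-multiplier derivation yields.

```python
import math
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(row)
    return [x / total for x in row]


def plsa_em(counts, num_topics, iters=50, seed=0):
    """ML PLSA trained with EM (sketch).

    counts[i][j] is n(d_i, w_j); every document is assumed to contain at least one word.
    Returns p_w_z[k][j] = P(w_j|z_k) and p_z_d[i][k] = P(z_k|d_i).
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Hypothetical random positive initialization; any strictly positive start works.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(num_words)]) for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(num_topics)]) for _ in range(num_docs)]

    for _ in range(iters):
        acc_w_z = [[0.0] * num_words for _ in range(num_topics)]
        acc_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for i in range(num_docs):
            for j in range(num_words):
                n = counts[i][j]
                if n == 0:
                    continue
                # E-step: P(z_k|d_i,w_j) is proportional to P(w_j|z_k) P(z_k|d_i)
                joint = [p_w_z[k][j] * p_z_d[i][k] for k in range(num_topics)]
                total = sum(joint)
                for k in range(num_topics):
                    weighted = n * joint[k] / total
                    acc_w_z[k][j] += weighted
                    acc_z_d[i][k] += weighted
        # M-step: the Lagrange-multiplier solution renormalizes the expected counts
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """sum_ij n(d_i,w_j) log sum_k P(w_j|z_k) P(z_k|d_i); the P(d_i) term is omitted."""
    ll = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:
                ll += n * math.log(sum(p_w_z[k][j] * p_z_d[i][k] for k in range(len(p_w_z))))
    return ll


counts = [[4, 3, 0, 0], [3, 5, 1, 0], [0, 0, 4, 6], [0, 1, 5, 3]]  # toy n(d_i, w_j)
p_w_z, p_z_d = plsa_em(counts, num_topics=2, iters=30)
print(log_likelihood(counts, p_w_z, p_z_d))
```

Because EM never decreases the likelihood, more iterations from the same seed should give a log likelihood at least as high. P(d_i) is left out because its ML estimate, n(d_i) divided by the total count, does not depend on the topic parameters.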
MAP PLSA • Estimation by maximizing the posterior probability: • Definition of prior distribution: • Dirichlet density: • Prior density (with Kronecker delta): • Assume the priors of P(w_j|z_k) and P(z_k|d_i) are independent
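A common way to write this prior, assuming independent Dirichlet densities with hyperparameters α_{j,k} for each topic's word distribution and β_{k,i} for each document's topic distribution (these symbol names are assumptions), together with the resulting MAP criterion:

```latex
\begin{aligned}
g(\theta) &\propto \prod_{k=1}^{K} \prod_{j=1}^{N} P(w_j \mid z_k)^{\alpha_{j,k} - 1}
  \prod_{i=1}^{M} \prod_{k=1}^{K} P(z_k \mid d_i)^{\beta_{k,i} - 1} \\
\theta_{\mathrm{MAP}} &= \arg\max_{\theta} \big[ \log P(Y \mid \theta) + \log g(\theta) \big]
\end{aligned}
```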
MAP PLSA • Considering the prior density: • Maximum a posteriori:
MAP PLSA • E-step: • Expectation • Auxiliary function:
MAP PLSA • M-step • Lagrange multiplier
MAP PLSA • Auxiliary function:
MAP PLSA • Differentiation • New parameter estimation:
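With Dirichlet priors, the MAP M-step typically differs from the ML one only by adding (hyperparameter - 1) pseudo-counts to the expected counts before renormalizing. The sketch below assumes that form; `expected_counts`, `map_plsa_step` and the toy values are hypothetical, and `alpha[k][j]` / `beta[i][k]` play the roles of α_{j,k} and β_{k,i}.

```python
def expected_counts(counts, p_w_z, p_z_d):
    """E-step: sum_i n(d_i,w_j) P(z_k|d_i,w_j) per topic and sum_j n(d_i,w_j) P(z_k|d_i,w_j) per document."""
    num_topics = len(p_w_z)
    acc_w_z = [[0.0] * len(counts[0]) for _ in range(num_topics)]
    acc_z_d = [[0.0] * num_topics for _ in counts]
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue
            joint = [p_w_z[k][j] * p_z_d[i][k] for k in range(num_topics)]
            total = sum(joint)
            for k in range(num_topics):
                weighted = n * joint[k] / total
                acc_w_z[k][j] += weighted
                acc_z_d[i][k] += weighted
    return acc_w_z, acc_z_d


def add_prior_and_normalize(expected, hyper):
    """MAP re-estimate of one Dirichlet-distributed row: (E[n] + alpha - 1), renormalized."""
    numer = [e + a - 1.0 for e, a in zip(expected, hyper)]
    total = sum(numer)
    return [x / total for x in numer]


def map_plsa_step(counts, p_w_z, p_z_d, alpha, beta):
    """One MAP-EM iteration (sketch).

    alpha[k][j]: hypothetical Dirichlet hyperparameter of P(w_j|z_k).
    beta[i][k]:  hypothetical Dirichlet hyperparameter of P(z_k|d_i).
    All hyperparameters are assumed >= 1 so the estimates stay non-negative.
    """
    acc_w_z, acc_z_d = expected_counts(counts, p_w_z, p_z_d)
    new_w_z = [add_prior_and_normalize(acc_w_z[k], alpha[k]) for k in range(len(acc_w_z))]
    new_z_d = [add_prior_and_normalize(acc_z_d[i], beta[i]) for i in range(len(acc_z_d))]
    return new_w_z, new_z_d


counts = [[2, 1, 0], [0, 1, 3]]
p_w_z = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
p_z_d = [[0.6, 0.4], [0.3, 0.7]]
alpha = [[3.0, 1.0, 1.0], [1.0, 1.0, 3.0]]  # hypothetical prior knowledge per topic
beta = [[1.0, 1.0], [1.0, 1.0]]
p_w_z, p_z_d = map_plsa_step(counts, p_w_z, p_z_d, alpha, beta)
```

Setting every hyperparameter to 1 recovers the ML update, while large hyperparameters pull the estimates toward the prior.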
QB PLSA • An online information system needs to update the model continuously. • Estimation by maximizing the posterior probability: • The posterior density is approximated by the closest tractable prior density with hyperparameters • Compared with MAP PLSA, the key difference in QB PLSA is the updating of the hyperparameters.
QB PLSA • Conjugate prior: • In Bayesian probability theory, a conjugate prior for a given likelihood is a prior distribution whose posterior distribution belongs to the same family of distributions. • A closed-form solution • A reproducible prior/posterior pair for incremental learning
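The Dirichlet prior is conjugate to the multinomial, so after observing counts n the posterior is again Dirichlet with hyperparameters α + n. The sketch below (hypothetical helper names and numbers) shows why this gives a reproducible prior/posterior pair: updating block by block yields the same hyperparameters as updating on all the data at once.

```python
def dirichlet_posterior(alpha, counts):
    """Dirichlet(alpha) prior x multinomial counts -> Dirichlet(alpha + counts) posterior."""
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mode(alpha):
    """Mode (MAP estimate) of a Dirichlet; valid when every alpha_j > 1."""
    total = sum(alpha) - len(alpha)
    return [(a - 1.0) / total for a in alpha]


prior = [2.0, 2.0, 2.0]            # hypothetical initial hyperparameters
batch1, batch2 = [3, 0, 1], [1, 4, 0]

# Incremental: the posterior after batch1 serves as the prior for batch2
sequential = dirichlet_posterior(dirichlet_posterior(prior, batch1), batch2)
# Batch: both blocks at once
at_once = dirichlet_posterior(prior, [a + b for a, b in zip(batch1, batch2)])
print(sequential, at_once)  # prints [6.0, 6.0, 3.0] [6.0, 6.0, 3.0]
print(dirichlet_mode(sequential))
```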
QB PLSA • Hyperparameter α:
QB PLSA • After rearrangement, the exponential of the posterior expectation function can be expressed as: • A reproducible prior/posterior pair is generated to build the hyperparameter updating mechanism
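A rough sketch of the incremental loop, assuming the hyperparameter update takes the conjugate form α_{j,k} ← α_{j,k} + Σ_i n(d_i,w_j) P(z_k|d_i,w_j), accumulated over each new block of documents. The function names, the uniform start for the new documents' P(z_k|d_i), `beta0` and the iteration count are hypothetical choices for illustration.

```python
def expected_counts(counts, p_w_z, p_z_d):
    """E-step: expected counts per (topic, word) and per (document, topic)."""
    num_topics = len(p_w_z)
    acc_w_z = [[0.0] * len(counts[0]) for _ in range(num_topics)]
    acc_z_d = [[0.0] * num_topics for _ in counts]
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue
            joint = [p_w_z[k][j] * p_z_d[i][k] for k in range(num_topics)]
            total = sum(joint)
            for k in range(num_topics):
                acc_w_z[k][j] += n * joint[k] / total
                acc_z_d[i][k] += n * joint[k] / total
    return acc_w_z, acc_z_d


def normalize_with_prior(expected, hyper):
    """MAP re-estimate of one Dirichlet row: (expected + hyper - 1), renormalized."""
    numer = [e + h - 1.0 for e, h in zip(expected, hyper)]
    total = sum(numer)
    return [x / total for x in numer]


def qb_plsa_block(counts, p_w_z, alpha, beta0=1.0, iters=10):
    """Adapt P(w|z) on one new block, then update the hyperparameters (sketch).

    counts[i][j] = n(d_i, w_j) for the new block; alpha[k][j] >= 1 are the current
    hyperparameters of P(w_j|z_k); beta0 is a hypothetical symmetric hyperparameter
    for the new documents' P(z_k|d_i).
    """
    num_topics = len(p_w_z)
    p_z_d = [[1.0 / num_topics] * num_topics for _ in counts]
    for _ in range(iters):
        acc_w_z, acc_z_d = expected_counts(counts, p_w_z, p_z_d)
        p_w_z = [normalize_with_prior(acc_w_z[k], alpha[k]) for k in range(num_topics)]
        p_z_d = [normalize_with_prior(row, [beta0] * num_topics) for row in acc_z_d]
    # Assumed QB hyperparameter update: add this block's expected counts to alpha
    acc_w_z, _ = expected_counts(counts, p_w_z, p_z_d)
    new_alpha = [[a + c for a, c in zip(alpha[k], acc_w_z[k])] for k in range(num_topics)]
    return p_w_z, new_alpha


p_w_z = [[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]  # e.g. from an initial ML model
alpha = [[2.0] * 4 for _ in range(2)]                 # hypothetical initial hyperparameters
for block in ([[3, 2, 0, 0], [0, 1, 4, 2]], [[0, 0, 5, 3], [2, 4, 0, 1]]):
    p_w_z, alpha = qb_plsa_block(block, p_w_z, alpha)
```

Only the hyperparameters and current estimates are carried forward, so earlier blocks never need to be revisited.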
Initial Hyperparameters • An open issue in Bayesian learning • If the initial prior knowledge is too strong, or after a large amount of adaptation data has been processed incrementally, new adaptation data usually have only a small impact on parameter updating in incremental training.
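A small numerical illustration of this effect, using the Dirichlet posterior mean as a stand-in for the parameter estimate (an assumption made for illustration): the same batch of 10 new counts moves the estimate less and less as the total prior mass Σα grows.

```python
def posterior_mean(alpha):
    """Mean of a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


def update_shift(alpha, new_counts):
    """Largest change in the posterior mean caused by one batch of new counts."""
    before = posterior_mean(alpha)
    after = posterior_mean([a + n for a, n in zip(alpha, new_counts)])
    return max(abs(x - y) for x, y in zip(before, after))


new_batch = [10, 0]                 # hypothetical adaptation counts
strengths = [2, 20, 200, 2000]      # total prior mass sum(alpha), split evenly
shifts = [update_shift([s / 2, s / 2], new_batch) for s in strengths]
print([round(x, 4) for x in shifts])  # prints [0.4167, 0.1667, 0.0238, 0.0025]
```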
Experiments • MED corpus: • 1033 medical abstracts with 30 queries • 7014 unique terms • 433 abstracts for ML training • 600 abstracts for MAP or QB training • Query subset for testing • K=8 • Reuters-21578: • 4270 documents for training • 2925 documents for QB learning • 2790 documents for testing • 13353 unique words • 10 categories
Conclusions • This paper presented an adaptive text modeling and classification approach for PLSA-based information systems. • Future work: • Extension of PLSA to bigrams or trigrams will be explored. • Application to spoken document classification and retrieval
Discriminative Maximum Entropy Language Model for Speech Recognition • Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien • Presenter: Hsuan-Sheng Chiu
Reference • R. Rosenfeld, S. F. Chen and X. Zhu, “Whole-sentence exponential language models: a vehicle for linguistic statistical integration”, 2001 • W.-H. Tsai, “An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition”, 2005
Outline • Introduction • Whole-sentence exponential model • Discriminative ME language model • Experiment • Conclusions
Introduction • Language models • Statistical n-gram model • Latent semantic language model • Structured language model • Based on the maximum entropy principle, different features can be integrated to establish an optimal probability distribution.
Whole-Sentence Exponential Model • Traditional method: • Exponential form: • Usage: • When used for speech recognition, the model is not suitable for the first pass of the recognizer and should instead be used to rescore N-best lists.
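In Rosenfeld et al.'s formulation the model is P(s) = P0(s) exp(Σ_i λ_i f_i(s)) / Z, where P0 is a baseline such as an n-gram model. Because Z is the same for every hypothesis, N-best rescoring only needs the unnormalized score. A minimal sketch, in which the `Hypothesis` class, the repeated-word feature, its weight, and the equal weighting of acoustic and language scores are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list            # recognized word sequence s
    acoustic_logp: float   # log p(X | s) from the first recognition pass
    baseline_logp: float   # log P0(s), e.g. from an n-gram model


def wsme_logscore(hyp, features, lambdas):
    """log P0(s) + sum_i lambda_i f_i(s); -log Z is omitted since it is shared by all hypotheses."""
    return hyp.baseline_logp + sum(lam * f(hyp.words) for f, lam in zip(features, lambdas))


def rescore(nbest, features, lambdas):
    """Pick the hypothesis with the best acoustic + whole-sentence LM score (equal weights assumed)."""
    return max(nbest, key=lambda h: h.acoustic_logp + wsme_logscore(h, features, lambdas))


def no_repeated_word(words):
    """Hypothetical whole-sentence feature: 1 if no word is immediately repeated."""
    return 1.0 if all(a != b for a, b in zip(words, words[1:])) else 0.0


nbest = [
    Hypothesis(["the", "the", "cat", "sat"], acoustic_logp=-100.0, baseline_logp=-12.0),
    Hypothesis(["the", "cat", "sat"], acoustic_logp=-101.0, baseline_logp=-12.5),
]
best = rescore(nbest, [no_repeated_word], [3.0])
print(" ".join(best.words))  # prints "the cat sat"
```

The feature can only be evaluated on a complete sentence, which is why this kind of model fits a rescoring pass rather than first-pass decoding.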
Whole-Sentence ME Language Model • Expectation of feature function: • Empirical: • Actual (under the model): • Constraint:
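Written out for the exponential model P(s) = P0(s) exp(Σ_i λ_i f_i(s)) / Z, with F features (F is a symbol introduced here) and empirical distribution P̃(s), the two expectations and the constraint are:

```latex
\begin{aligned}
P(s) &= \frac{1}{Z}\, P_0(s) \exp\Big( \sum_{i=1}^{F} \lambda_i f_i(s) \Big) \\
\tilde{E}[f_i] &= \sum_{s} \tilde{P}(s)\, f_i(s), \qquad
E_P[f_i] = \sum_{s} P(s)\, f_i(s) \\
E_P[f_i] &= \tilde{E}[f_i], \qquad i = 1, \dots, F
\end{aligned}
```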
Whole-Sentence ME Language Model • To solve the constrained optimization problem:
GIS (Generalized Iterative Scaling) algorithm
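A compact sketch of Generalized Iterative Scaling over a small, enumerable set of sentences. Real whole-sentence models cannot enumerate all sentences, and Rosenfeld et al. estimate the model expectation by sampling instead; here the event set, the baseline P0, the empirical distribution and both features are hypothetical toys. GIS needs non-negative features whose sum is the same constant C for every event, so a slack feature is appended.

```python
import math


def model_prob(events, base_prob, feats, lambdas):
    """P(x) = P0(x) exp(sum_i lambda_i f_i(x)) / Z over a finite event set."""
    unnorm = {x: base_prob[x] * math.exp(sum(lam * f(x) for lam, f in zip(lambdas, feats)))
              for x in events}
    z = sum(unnorm.values())
    return {x: u / z for x, u in unnorm.items()}


def gis(events, base_prob, empirical_prob, features, iters=500):
    """GIS for P(x) proportional to P0(x) exp(sum_i lambda_i f_i(x)) (sketch).

    Features must be non-negative, each with a positive empirical expectation.
    Returns the feature list (including the slack feature) and its weights.
    """
    C = max(sum(f(x) for f in features) for x in events) + 1.0
    # Slack feature: makes the feature sum equal to C for every event (and >= 1)
    feats = list(features) + [lambda x: C - sum(f(x) for f in features)]
    emp = [sum(empirical_prob[x] * f(x) for x in events) for f in feats]
    lambdas = [0.0] * len(feats)
    for _ in range(iters):
        p = model_prob(events, base_prob, feats, lambdas)
        model = [sum(p[x] * f(x) for x in events) for f in feats]
        # GIS update: lambda_i += (1/C) log(empirical expectation / model expectation)
        lambdas = [lam + math.log(e / m) / C for lam, e, m in zip(lambdas, emp, model)]
    return feats, lambdas


events = ["a b", "a c", "b c", "c c"]                  # toy "sentences"
base = {x: 0.25 for x in events}                       # hypothetical baseline P0
emp = {"a b": 0.4, "a c": 0.3, "b c": 0.2, "c c": 0.1}  # hypothetical empirical P~
features = [lambda s: 1.0 if s.startswith("a") else 0.0,  # hypothetical feature 1
            lambda s: float(s.count("c"))]                 # hypothetical feature 2
feats, lambdas = gis(events, base, emp, features, iters=2000)
```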
Discriminative ME Language Model • In general, an ME model can be regarded as a maximum likelihood model with a log-linear distribution. • We propose a discriminative language model based on the whole-sentence ME model (DME)
Discriminative ME Language Model • Acoustic features for ME estimation: • Sentence-level log-likelihood ratio of competing and target sentences • Feature weight parameter: • That is, the feature parameter is activated (set to one) for the speech signals observed in the training database
Discriminative ME Language Model • New estimation: • Upgrades the linguistic parameters to discriminative ones
Experiment • Corpus: TCC300 • 32 mixtures • 12 Mel-frequency cepstral coefficients • 1 log-energy and first derivatives • 4200 sentences for training, 450 for testing • Corpus: Academia Sinica CKIP balanced corpus • Five million words • Vocabulary: 32909 words
Conclusions • A new ME language model integrating linguistic and acoustic features for speech recognition • The derived ME language model inherently has discriminative power. • The DME model involves a constrained optimization procedure and is powerful for knowledge integration.
Relation between DME and MMI • MMI criterion: • Modified MMI criterion: • Expressing the ME model as an ML model:
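For comparison, the standard (unmodified) MMI criterion rewards the joint score of the reference transcription relative to all competing transcriptions. The sketch approximates the denominator with an N-best list, which is an assumption, and uses hypothetical scores:

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(utterances):
    """Sum over utterances r of log[p(X_r|W_r)P(W_r)] - log sum_W p(X_r|W)P(W).

    utterances: list of (ref_index, hyps); hyps is an N-best list of
    (acoustic_logp, lm_logp) pairs and hyps[ref_index] is the correct transcription W_r.
    The sum over W is restricted to the N-best list (an approximation).
    """
    total = 0.0
    for ref_index, hyps in utterances:
        joint = [a + lm for a, lm in hyps]
        total += joint[ref_index] - log_sum_exp(joint)
    return total


data = [(0, [(-10.0, -2.0), (-11.0, -1.0)]),   # reference ties with its competitor
        (1, [(-20.0, -3.0), (-19.5, -2.0)])]   # reference wins
print(mmi_objective(data))
```

The objective is at most zero and approaches zero as the reference dominates its competitors, so raising the language model score of correct sentences relative to competing ones increases it.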
Relation between DME and MMI • The optimal parameter: