Introducing Local Linear Matrix Factorization (LLMF) for document modeling, enhancing low dimensional representations while preserving local geometric relations. The method reduces overfitting, improves generalization, and avoids similarity biases. LLMF outperforms NMF, PLSA, and LDA in document classification tasks, showing superior accuracy on benchmark datasets. The approach leverages geometric relationships among documents, offering better semantic representations than existing methods. Future work aims to extend LLMF to parallel and distributed settings and explore its application in recommendation systems.
Local Linear Matrix Factorization for Document Modeling • Lu Bai, Jiafeng Guo, Yanyan Lan, Xueqi Cheng • Institute of Computing Technology, Chinese Academy of Sciences • bailu@software.ict.ac.cn
Outline • Introduction • Our approach • Experimental results • Conclusion
Background • Low dimensional representations can be produced by decomposing the document-word matrix into low-rank matrices (a minimal example is sketched below) • Preserving local geometric relations can improve the low dimensional representation • Smoothing the low dimensional representation • Improving the model’s generalization • Avoiding overfitting
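A minimal sketch of this idea, using scikit-learn's off-the-shelf NMF on a toy count matrix (not the tooling used in the paper): the document-word matrix X is decomposed into a document-factor matrix U and a factor-word matrix V.

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    X = rng.poisson(0.3, size=(100, 500)).astype(float)  # toy document-word counts

    model = NMF(n_components=20, init="nndsvda", max_iter=500, random_state=0)
    U = model.fit_transform(X)    # 100 x 20 low dimensional document representations
    V = model.components_         # 20 x 500 factor-word matrix
    print(U.shape, V.shape)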
Previous work • Our goal: a new low dimensional representation mining method that better exploits the geometric relationships among documents
Our approach • Basic ideas
Local Linear Matrix Factorization (LLMF) • Factorizing the document-term matrix X into low-rank factors U and V, as in NMF • ℓ2 penalties on U and V are used for reducing over-fitting • Factorizing the matrix with neighbors: each document is reconstructed as a linear combination of other documents • The normalized document-word matrix is used; normalization avoids the bias of long documents • S denotes the linear combination weights • A regularization parameter weights the ℓ1 norm of S • Picking document neighbors • Learning salient combination weights
Cont’ • Combining matrix factorization and local neighbor factorization: the learned representations U are required to respect the same local linear relations, i.e. U ≈ SU • Final objective function: minimize the NMF reconstruction error plus the local linear reconstruction error, the ℓ2 penalties on U and V, and the ℓ1 penalty on S (one written-out form is sketched below)
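The symbols below (X, U, V, S and the weights alpha and lambda) are reconstructed from the slide bullets rather than copied from the paper; the NumPy sketch simply evaluates one plausible written-out form of the combined objective.

    import numpy as np

    def llmf_objective(X, U, V, S, alpha=1.0, lam_u=0.1, lam_v=0.1, lam_s=0.1):
        fit = np.linalg.norm(X - U @ V.T, "fro") ** 2    # NMF-style reconstruction error
        local = np.linalg.norm(U - S @ U, "fro") ** 2    # local linear constraint on U
        reg = lam_u * np.linalg.norm(U, "fro") ** 2 + lam_v * np.linalg.norm(V, "fro") ** 2
        sparse = lam_s * np.abs(S).sum()                 # l1 term keeps few salient neighbors
        return fit + alpha * local + reg + sparse

    rng = np.random.default_rng(0)
    X = rng.poisson(0.3, size=(50, 200)).astype(float)
    U, V = rng.random((50, 10)), rng.random((200, 10))
    S = np.zeros((50, 50))
    print(llmf_objective(X, U, V, S))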
LLMF vs Others • Compared with models without geometric information • E.g. NMF, PLSA, LDA • LLMF smooths each document’s representation with its neighbors • Compared with models with geometric constraints • E.g. LapPLSA, LTM • LLMF is free of hand-picked similarity measures and neighborhood thresholds • LLMF is more robust in preserving local geometric structure under unbalanced data distributions
Model fitting • Estimating S first • The objective is not differentiable in S because of the ℓ1 norm • Solved with OWL-QN • Estimating U and V with S fixed • The objective is bi-convex in U and V • Solved with coordinate gradient descent (a simplified sketch follows)
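A simplified sketch of the two-stage fitting under the same assumed notation: plain soft-thresholded gradient steps stand in for OWL-QN when estimating S, and alternating gradient steps stand in for coordinate gradient descent on U and V (the ℓ2 regularizers are omitted for brevity).

    import numpy as np

    def soft_threshold(A, t):
        return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

    def fit_llmf(X, k=10, alpha=1.0, lam_s=0.1, lr=1e-3, iters=200, seed=0):
        rng = np.random.default_rng(seed)
        Xn = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)  # normalize docs by length
        n, m = X.shape
        # Stage 1: sparse neighbor weights so that row i of Xn ~ S[i] @ Xn, with diag(S) = 0
        S = np.zeros((n, n))
        for _ in range(iters):
            grad = 2 * (S @ Xn - Xn) @ Xn.T
            S = soft_threshold(S - lr * grad, lr * lam_s)
            np.fill_diagonal(S, 0.0)
        # Stage 2: alternate gradient steps on U and V with S held fixed
        U, V = rng.random((n, k)), rng.random((m, k))
        L = np.eye(n) - S
        for _ in range(iters):
            U -= lr * (2 * (U @ V.T - X) @ V + 2 * alpha * L.T @ L @ U)
            V -= lr * 2 * (U @ V.T - X).T @ U
            U, V = np.maximum(U, 0.0), np.maximum(V, 0.0)   # keep factors nonnegative
        return U, V, S

    U, V, S = fit_llmf(np.random.default_rng(1).poisson(0.3, (40, 120)).astype(float))
    print(U.shape, V.shape, S.shape)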
Experimental Settings • Data sets • 20news & la1 (from Weka) • Word stemming • Stop word removal (illustrated below)
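For context, a tiny illustration of the two preprocessing steps; the slide does not say which stemmer or stop word list was actually used, so PorterStemmer and scikit-learn's English stop word list are stand-ins.

    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

    stemmer = PorterStemmer()

    def preprocess(doc):
        tokens = [t for t in doc.lower().split() if t.isalpha()]   # crude tokenization
        return [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]

    print(preprocess("The documents are modeled with local linear matrix factorization"))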
Cont’ • Baseline methods • PLSA, LDA, NMF, LapPLSA • Parameter settings • Dimension of the low dimensional representation • Regularization weights for the ℓ2 norms on U and V • Regularization weight for the ℓ1 norm on S • Document classification (protocol sketched below) • LibSVM, linear kernel • Training set : testing set = 3:2
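A sketch of the classification protocol above: a linear-kernel SVM (scikit-learn's SVC wraps LIBSVM) trained on the learned low dimensional representations with a 3:2 train/test split; the representations U and labels y below are random placeholders.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    U = rng.random((500, 20))            # stand-in for learned document representations
    y = rng.integers(0, 5, size=500)     # stand-in for class labels

    U_tr, U_te, y_tr, y_te = train_test_split(U, y, test_size=0.4, random_state=0)
    clf = SVC(kernel="linear").fit(U_tr, y_tr)
    print("accuracy:", accuracy_score(y_te, clf.predict(U_te)))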
Cont’ • Document classification • LapPLSA and LLMF are better than NMF, PLSA, and LDA • LLMF achieves the highest accuracy among all methods on both datasets • LLMF is consistently better than pure NMF across different regularization weights
Conclusion • Conclusions • We propose a novel method, LLMF, for learning low dimensional representations of documents with local linear constraints • LLMF can better capture the rich geometric information among documents than methods based on independent pairwise relationships • Experiments on the 20news and la1 benchmarks show that the proposed approach learns better semantic representations than the baseline methods • Future work • We plan to extend LLMF to parallel and distributed settings • It is promising to apply LLMF to recommendation systems
References • D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003. • D. Cai, X. He, and J. Han. Locally consistent concept factorization for document clustering. TKDE, 23(6):902–913, 2011. • D. Cai, Q. Mei, J. Han, and C. Zhai. Modeling hidden topics on document manifold. In CIKM ’08, pages 911–920, New York, NY, USA, 2008. ACM. • T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1-2):177–196, 2001. • S. Huh and S. E. Fienberg. Discriminative topic modeling based on manifold learning. In KDD ’10, pages 653–662, New York, NY, USA, 2010. ACM.
Thanks!! Q&A