Improving Text Categorization Bootstrapping via Unsupervised Learning
Authors: ALFIO GLIOZZO, IDO DAGAN (TSLP, 2009)
Presenter: Bo-Sheng Wang
Outline
• Motivation
• Objectives
• Methodology
• Evaluation
• Experiments
• Conclusions
• Comments
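Motivation
• Supervised systems for text categorization require large amounts of hand-labeled texts.
• Intensional Learning (IL), which starts from category names or seed terms instead of labeled texts, inherently suffers from a score-scaling problem and has very little information about the intension of a category.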
Objectives
• Investigate and improve two specific weaknesses that inherently affect the IL schema.
• Techniques used: Latent Semantic Indexing (LSI) and the Gaussian Mixture (GM) algorithm.
Methodology-Gaussian Mixture Algorithm
The paper proposes mapping the similarity values into class posterior probabilities using unsupervised estimation of Gaussian mixtures.
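The mapping from similarity scores to posteriors can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the scores for one category are one-dimensional and follow a two-component mixture (texts in the category versus texts outside it), fitted by plain EM without any labels. The function names `fit_two_gaussians` and `posterior_in_category` and the synthetic scores are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(scores, iterations=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (no labels needed).

    Returns (weights, means, variances); index 1 is the higher-mean
    component, assumed here to model texts that belong to the category.
    """
    n = len(scores)
    mean_all = sum(scores) / n
    var_all = max(sum((s - mean_all) ** 2 for s in scores) / n, min_var)
    weights = [0.5, 0.5]
    means = [min(scores), max(scores)]  # start the components far apart
    variances = [var_all, var_all]

    for _ in range(iterations):
        # E-step: responsibility of component 1 for every score.
        resp = []
        for s in scores:
            p0 = weights[0] * gaussian_pdf(s, means[0], variances[0])
            p1 = weights[1] * gaussian_pdf(s, means[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)

        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the previous estimate

        # M-step: re-estimate means, variances, and mixing weights.
        m0 = sum((1.0 - r) * s for r, s in zip(resp, scores)) / n0
        m1 = sum(r * s for r, s in zip(resp, scores)) / n1
        v0 = sum((1.0 - r) * (s - m0) ** 2 for r, s in zip(resp, scores)) / n0
        v1 = sum(r * (s - m1) ** 2 for r, s in zip(resp, scores)) / n1
        means = [m0, m1]
        variances = [max(v0, min_var), max(v1, min_var)]
        weights = [n0 / n, n1 / n]

    if means[0] > means[1]:  # keep the higher-mean component at index 1
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def posterior_in_category(score, params):
    """Posterior probability that a score came from the higher-mean component."""
    weights, means, variances = params
    p0 = weights[0] * gaussian_pdf(score, means[0], variances[0])
    p1 = weights[1] * gaussian_pdf(score, means[1], variances[1])
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.5


# Synthetic similarity scores (illustrative only, not data from the paper).
random.seed(0)
scores = [random.gauss(0.1, 0.05) for _ in range(200)]
scores += [random.gauss(0.7, 0.1) for _ in range(100)]
params = fit_two_gaussians(scores)
print(posterior_in_category(0.8, params), posterior_in_category(0.05, params))
```

Because the output is a probability rather than a raw similarity, scores for different categories become comparable on a common scale, which is the score-scaling issue the slides mention.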
Evaluation-Impact of LSI Similarity and GM on IL Performance
Evaluation-Extensional vs. Intensional Learning
A major point of comparison between Intensional Learning (IL) and Extensional Learning (EL) is the amount of supervision required to obtain a given level of performance.
Conclusions
• We obtained competitive performance using only the category names as initial seeds.
• The approach drastically reduces the number of seeds required while significantly improving performance.
Comments
• Advantages
  • Performance
• Disadvantage
  • Time
• Applications
  • Text Mining