
Improving Text Categorization Bootstrapping via Unsupervised Learning

Presenter: Bo-Sheng Wang. Authors: Alfio Gliozzo, Ido Dagan. TSLP, 2009.


Presentation Transcript


  1. Presenter: Bo-Sheng Wang. Authors: Alfio Gliozzo, Ido Dagan. TSLP, 2009. Improving Text Categorization Bootstrapping via Unsupervised Learning

  2. Outline: Motivation, Objectives, Methodology, Evaluation, Experiments, Conclusions, Comments

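  3. Motivation: Supervised systems for text categorization require large amounts of hand-labeled texts. Intensional Learning (IL), which starts from only the category names or a few seed terms, avoids this cost, but it inherently suffers from a score-scaling problem and has very little information about the intension of a category.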

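  4. Objectives: Investigate and improve on two specific weaknesses that inherently affect the IL scheme, using Latent Semantic Indexing (LSI) to enrich the limited information carried by the seeds, and a Gaussian Mixture (GM) algorithm to address the score-scaling problem.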

  5. Methodology: Latent Semantic Indexing

  6. Vector Semantic Model

  7. Methodology: Latent Semantic Indexing

  8. Methodology: Latent Semantic Indexing
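The content of the LSI slides did not survive extraction. The sketch below illustrates why LSI helps when a category is described only by its name. It is a minimal sketch and assumes the following: a raw term-by-document count matrix, a seed vector that is one-hot on the category name, and projection onto the top-k left singular vectors. The paper's actual term weighting and projection may differ. The helper names (`lsi_projection`, `seed_similarities`, `one_hot`), the toy vocabulary, and the "sport"/"finance" categories are all hypothetical.

```python
import numpy as np


def lsi_projection(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return U_k, the first k left singular vectors of a terms x documents matrix."""
    U, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


def seed_similarities(term_doc: np.ndarray, seeds: dict, k: int) -> dict:
    """Similarity of every document (column) to every category's seed vector in LSI space.

    seeds maps a category name to a term-space vector built from its seed terms.
    Returns a dict mapping each category to a list with one similarity per document.
    """
    U_k = lsi_projection(term_doc, k)
    docs_latent = U_k.T @ term_doc  # k x n_docs
    sims = {}
    for cat, seed_vec in seeds.items():
        seed_latent = U_k.T @ seed_vec  # k-dimensional
        sims[cat] = [cosine(seed_latent, docs_latent[:, j])
                     for j in range(term_doc.shape[1])]
    return sims


# Hypothetical toy collection (rows = terms, columns = documents).
terms = ["football", "goal", "match", "bank", "stock", "market"]
# d0 = "football goal match", d1 = "goal match",
# d2 = "bank stock market",   d3 = "stock market"
A = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)


def one_hot(word: str) -> np.ndarray:
    """Term-space vector containing only the given word."""
    v = np.zeros(len(terms))
    v[terms.index(word)] = 1.0
    return v


# Each hypothetical category is seeded only with its name-like term.
seeds = {"sport": one_hot("football"), "finance": one_hot("bank")}
# k=2 fits this toy matrix; real collections typically use far more dimensions.
sims = seed_similarities(A, seeds, k=2)
```

Document d1 ("goal match") never contains "football", so in the raw term space its cosine with the sport seed is 0. In the 2-dimensional LSI space, the similarity is approximately 1.0 because "goal" and "match" co-occur with "football" in d0. The similarity of d1 to the finance seed stays approximately 0.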

  9. Methodology: Gaussian Mixture Algorithm. This paper proposes mapping the similarity values into class posterior probabilities using unsupervised estimation of Gaussian mixtures.

  10. Methodology: Gaussian Mixture Algorithm
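The mapping from similarity scores to posterior probabilities can be sketched with plain expectation-maximization (EM) on one category's scores. This is a minimal sketch under several assumptions. It uses two one-dimensional components, one for documents that belong to the category and one for those that do not, and treats the higher-mean component as the positive class. The function names, the initialization (splitting the sorted scores in half), and the variance floor are hypothetical choices made for illustration. They are not details taken from the paper.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(scores, iters=100, min_var=1e-4):
    """EM for a 1-D mixture of two Gaussians; returns (weights, means, variances)."""
    s = sorted(scores)
    n = len(s)
    # Initialize one component on the lower half of the scores and one on the upper half.
    lo, hi = s[: n // 2], s[n // 2:]
    mu = [sum(lo) / len(lo), sum(hi) / len(hi)]
    overall = sum(s) / n
    var0 = max(sum((x - overall) ** 2 for x in s) / n, min_var)
    var = [var0, var0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [w[c] * gaussian_pdf(x, mu[c], var[c]) for c in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for c in range(2):
            nc = sum(r[c] for r in resp) or 1e-300
            w[c] = nc / n
            mu[c] = sum(r[c] * x for r, x in zip(resp, scores)) / nc
            var[c] = max(
                sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, scores)) / nc,
                min_var,
            )
    return w, mu, var


def posterior_positive(x, w, mu, var):
    """P(positive | score), taking the higher-mean component as the positive class."""
    pos = 0 if mu[0] > mu[1] else 1
    p = [w[c] * gaussian_pdf(x, mu[c], var[c]) for c in range(2)]
    return p[pos] / (sum(p) or 1e-300)


# Hypothetical similarity scores of twelve documents to one category.
scores = [0.05, 0.10, 0.08, 0.12, 0.07, 0.11, 0.09, 0.06, 0.80, 0.85, 0.90, 0.82]
w, mu, var = fit_two_gaussians(scores)
probs = [posterior_positive(x, w, mu, var) for x in scores]
```

A separate mixture is fitted for each category. As a result, the outputs are posterior probabilities on a common 0 to 1 scale rather than raw similarity values, and this is how such a mapping addresses the score-scaling problem named in the motivation.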

  11. Seeds

  12. Evaluation: Impact of LSI Similarity and GM on IL Performance

  13. Evaluation: Extensional vs. Intensional Learning. A major aspect of the comparison between IL and Extensional Learning (EL) is the amount of supervision each requires to reach a given level of performance.

  14. Experiments

  15. Conclusions: We obtained competitive performance using only the category names as initial seeds. The approach drastically reduces the number of seeds while significantly improving performance.

  16. Comments • Advantage: performance • Disadvantage: time • Application: text mining
