
Improving Text Classification using Local Latent Semantic Indexing

Improving Text Classification using Local Latent Semantic Indexing. Presenter: CHANG, SHIH-JIE. Authors: Tao Liu, Zheng Chen, Benyu Zhang, Wei-ying Ma, Gongyi Wu. ICDM 2004.


Presentation Transcript


  1. Improving Text Classification using Local Latent Semantic Indexing Presenter: CHANG, SHIH-JIE Authors: Tao Liu, Zheng Chen, Benyu Zhang, Wei-ying Ma, Gongyi Wu. ICDM 2004.

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Global LSI ignores class discrimination. It does not improve the discriminative power of document classes, so it typically yields no gain in classification performance. • In local LSI, due to the weighting problem, the improvement in classification performance is very limited.

  4. Objectives • Propose a new local LSI method, Local Relevancy Weighted LSI (LRW-LSI), to solve this problem.

  5. Methodology - Local LSI • χ² statistic (QS-CHI): measures the association between a term and the topic. • Mutual information (QS-MI): measures how important a term is to a topic.
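The slide does not give formulas for either measure. As a rough illustration, the sketch below uses the standard feature-selection definitions of the chi-square statistic and (pointwise) mutual information, computed from a 2x2 term/topic contingency table. The function names and the count arguments `a`, `b`, `c`, `d` are hypothetical, and the exact QS-CHI and QS-MI variants used in the presentation may differ.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square association between a term and a topic.

    a: documents in the topic that contain the term
    b: documents outside the topic that contain the term
    c: documents in the topic that lack the term
    d: documents outside the topic that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        # The term never co-occurs with the topic.
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))
```

A term that occurs only in the topic's documents scores high on both measures, while a term spread evenly across topics scores zero.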

  6. Methodology - Local Relevancy Weighted LSI • LRW-LSI training:
     (1) An initial classifier IC of topic c is used to assign an initial relevancy score (rs) to each training document.
     (2) Each training document is weighted according to its relevancy score.
     (3) The top n documents are selected to generate the local term-by-document matrix of topic c.
     (4) A truncated SVD is performed to generate the local semantic space.
     (5) All other weighted training documents are folded into the new space.
     (6) All training documents, represented as local LSI vectors, are used to train a real classifier RC of topic c.
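Steps (2) to (5) of the training procedure can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide does not say how documents are weighted, so a sigmoid of the relevancy score with hypothetical parameters `a` and `b` is assumed here. The function name `lrw_lsi_fit` is also hypothetical, and the initial classifier IC (step 1) and real classifier RC (step 6) are left out, with the relevancy scores passed in directly.

```python
import numpy as np


def lrw_lsi_fit(X, rs, n_local, k, a=1.0, b=0.0):
    """Build a local LSI space for one topic.

    X: term-by-document matrix, shape (n_terms, n_docs)
    rs: initial relevancy score per document, shape (n_docs,)
    n_local: number of top-scoring documents used for the local SVD
    k: number of retained dimensions (k <= min(n_terms, n_local))
    """
    # Step 2: weight each document; the sigmoid form is an assumption.
    w = 1.0 / (1.0 + np.exp(-a * (rs + b)))
    Xw = X * w  # broadcasting scales column j by w[j]

    # Step 3: keep the n_local documents with the highest relevancy score.
    top = np.argsort(-rs)[:n_local]
    local = Xw[:, top]

    # Step 4: truncated SVD of the local term-by-document matrix.
    U, s, _ = np.linalg.svd(local, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]

    # Step 5: fold every weighted document into the local space using the
    # standard LSI fold-in, d_hat = Sigma_k^{-1} U_k^T d.
    Z = (Xw.T @ U_k) / s_k
    return U_k, s_k, Z
```

The rows of `Z` are the local LSI vectors that step (6) would pass to the real classifier. A new document would go through the same weighting and fold-in, using the stored `U_k` and `s_k`.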

  7. Methodology-Local Relevancy Weighted LSI

  8. Experiments

  9. Experiments

  10. Experiments

  11. Experiments

  12. Experiments

  13. Conclusions • LRW-LSI can improve the classification performance greatly using a much smaller dimension compared to the global LSI and local LSI methods.

  14. Comments • Advantages • LRW-LSI is quite effective. • Applications - Text Classification.
