
Text classification from unlabeled documents with bootstrapping and feature projection techniques

This paper presents an approach to text classification that learns from unlabeled documents and a title word for each category. By relying on these resources instead of labeled data, the proposed method aims to make text classification more cost-effective. The methodology splits each document into contexts, each a sequence of content words, and uses context similarity, word similarity, and centroid contexts to assign texts to categories. The experimental results indicate that the approach is effective for low-cost text classification in applications such as web mining. The presenter notes that the idea is practical but that the paper gives too few examples.



Presentation Transcript


  1. Text classification from unlabeled documents with bootstrapping and feature projection techniques. Presenter: You Lin Chen. Authors: Youngjoong Ko, Jungyun Seo. IPM, 2009.

  2. Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

  3. Motivation [Figure: documents of unknown category (???) and the categories Automobile, Sport, and Travel; labels: Training, Classifier] A text classifier that uses a supervised learning method requires a large number of labeled documents.

  4. Objectives [Figure: unlabeled documents (???), a classifier, and one title word per category] This paper proposes a new text classification method that uses unlabeled documents and the title word of each category for learning.

  5. Methodology

  6. Methodology http://www.cst.dk/online/pos_tagger/uk/index.html A sequence of 60 content words within a document is regarded as the window size for one context.
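
The windowing step on this slide might be sketched as follows. This is a minimal illustration, assuming the document has already been reduced to a list of content words (for example with a POS tagger). The function name `split_into_contexts`, the non-overlapping windows, and the shorter final window are assumptions for illustration, not details taken from the paper.

```python
from typing import List


def split_into_contexts(content_words: List[str], window: int = 60) -> List[List[str]]:
    """Cut a list of content words into consecutive, non-overlapping contexts.

    Assumption: the last context is kept even if it has fewer than
    `window` words; the slides do not say how a remainder is handled.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [content_words[i:i + window]
            for i in range(0, len(content_words), window)]
```

With 130 content words and the default window of 60, this yields two full contexts and one final context of 10 words.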

  7. Methodology Category 'Autos'; title word 'car'
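
One plausible reading of this slide is that contexts containing a category's title word serve as the starting point for that category. The sketch below illustrates that reading only; the function name `seed_contexts` and the exact-match rule are hypothetical and are not confirmed by the slides.

```python
from typing import Dict, List


def seed_contexts(contexts: List[List[str]],
                  title_words: Dict[str, str]) -> Dict[str, List[List[str]]]:
    """Group contexts by the category whose title word they contain.

    Assumption: a context seeds a category when the title word appears
    in it verbatim. A context may seed more than one category.
    """
    seeds: Dict[str, List[List[str]]] = {category: [] for category in title_words}
    for context in contexts:
        words = set(context)
        for category, title in title_words.items():
            if title in words:
                seeds[category].append(context)
    return seeds
```

For example, calling `seed_contexts(contexts, {"Autos": "car"})` collects every context that mentions `car` under the key `"Autos"`.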

  8. Methodology • Context similarity • (…, engine, …, buy, car, have, …) (…, engine, …, sell, car, is, …) • Centroid context (is, is, is, …, car, is, is, is, …) [Figure values: 0.01, 0.01]
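
The slide does not say how context similarity is computed. As a stand-in, the sketch below builds a centroid by summing word counts over a set of contexts and compares a context with that centroid by cosine similarity. Both choices, and the function names `centroid` and `cosine`, are assumptions for illustration rather than the measure used in the paper.

```python
import math
from collections import Counter
from typing import Iterable, List


def centroid(contexts: Iterable[List[str]]) -> Counter:
    """Sum the word counts of several contexts into one bag of words."""
    total: Counter = Counter()
    for context in contexts:
        total.update(context)
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because cosine similarity ignores vector length, summing counts instead of averaging them does not change the result.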

  9. Methodology • Word similarity • (…, X, car, price, …) (…, engine, …, X, car, price, …) • Assignment of remaining contexts to a category
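
The assignment step could look roughly like the sketch below: each remaining context goes to the category whose representative word set it resembles most. The default measure here is plain word overlap (Jaccard), used only as a placeholder for the paper's word-similarity-based measure; `jaccard`, `assign_contexts`, and the tie-breaking rule are hypothetical.

```python
from typing import Callable, Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two word sets (placeholder measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_contexts(remaining: List[List[str]],
                    category_words: Dict[str, Set[str]],
                    similarity: Callable[[Set[str], Set[str]], float] = jaccard
                    ) -> List[str]:
    """Return, for each remaining context, the most similar category.

    Assumption: ties go to the category listed first in `category_words`.
    """
    if not category_words:
        raise ValueError("at least one category is required")
    labels = []
    for context in remaining:
        words = set(context)
        best = max(category_words,
                   key=lambda cat: similarity(words, category_words[cat]))
        labels.append(best)
    return labels
```

Passing a different `similarity` function lets the same loop be reused with another measure.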

  10. Methodology

  11. Methodology

  12. Experiments

  13. Conclusion Labeled data is expensive, while unlabeled data is inexpensive and plentiful. The proposed method is useful for low-cost text classification.

  14. Comments • Advantage • The idea is practical. • Drawback • The paper gives too few examples. • Application • Web mining
