This paper presents a text classification approach that learns from unlabeled documents and the title words of each category. Because it relies on these inexpensive resources instead of labeled documents, the method aims to make text classification more cost-effective. Each document is split into contexts, each a sequence of consecutive content words. Contexts that contain a category's title word serve as centroid contexts. Context similarity and word similarity are then used to assign the remaining contexts to categories. The experimental results support the effectiveness of the approach and point to low-cost text classification in applications such as web mining. While the method shows promise, more examples and further refinement could make it more practical and efficient.
Text classification from unlabeled documents with bootstrapping and feature projection techniques. Presenter: You Lin Chen. Authors: Youngjoong Ko, Jungyun Seo. 2009, IPM (Information Processing & Management).
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation [Figure: categories Automobile, Sport, Travel; unlabeled documents (???); Training; Classifier] A text classifier that uses a supervised learning method requires a large number of labeled documents.
Objectives [Figure: unlabeled documents (???) and one title word per category feeding the classifier] This paper proposes a new text classification method that learns from unlabeled documents and the title word of each category.
Methodology POS tagger: http://www.cst.dk/online/pos_tagger/uk/index.html. A sequence of 60 content words within a document is regarded as one context, so the window size for one context is 60 content words.
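The context-splitting step can be sketched as follows. This is a minimal sketch that assumes non-overlapping windows. It approximates content words with a small hypothetical stopword list rather than the POS tagger the slide cites. Only the window size of 60 comes from the slide. `STOPWORDS`, `content_words`, and `split_into_contexts` are illustrative names, not the authors' code.

```python
import re

# Hypothetical stopword list standing in for POS tagging: the slides identify
# content words with a tagger, which this sketch does not reproduce.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in",
             "on", "for", "it", "this", "that", "with", "as", "by", "be"}

WINDOW = 60  # content words per context, as stated on the slide


def content_words(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def split_into_contexts(text: str, window: int = WINDOW) -> list[list[str]]:
    """Cut the content-word sequence into consecutive, non-overlapping windows.

    The last context may be shorter than `window`.
    """
    words = content_words(text)
    return [words[i:i + window] for i in range(0, len(words), window)]


contexts = split_into_contexts("The car has a new engine. " * 40)
print([len(c) for c in contexts])  # [60, 60, 40]
```

Each repetition of the sample sentence yields four content words ("car", "has", "new", "engine"), so 160 words become two full contexts and a shorter final one. Whether the paper uses overlapping windows or a different treatment of short tails is not stated on the slide.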
Methodology Category 'Autos'; title word 'car'
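Title words might select centroid contexts as in the following sketch. It assumes that a context containing a category's title word becomes a centroid context of that category. The 'Autos'/'car' pair is from the slide. The 'Travel'/'travel' pair, the function name, and the rule that sets aside contexts matching several categories are hypothetical.

```python
# Title words per category. 'Autos' -> 'car' is the slide's example;
# 'Travel' -> 'travel' is a hypothetical addition for illustration.
TITLE_WORDS = {"Autos": {"car"}, "Travel": {"travel"}}


def select_centroid_contexts(contexts, title_words):
    """Split contexts into per-category centroid contexts and the remainder.

    A context containing title words of exactly one category becomes a
    centroid context of that category. Contexts with no title word, or with
    title words of several categories (assumed ambiguous here), are returned
    as 'remaining' for the later assignment step.
    """
    centroids = {cat: [] for cat in title_words}
    remaining = []
    for ctx in contexts:
        words = set(ctx)
        hits = [cat for cat, titles in title_words.items() if words & titles]
        if len(hits) == 1:
            centroids[hits[0]].append(ctx)
        else:
            remaining.append(ctx)
    return centroids, remaining


ctxs = [["car", "engine", "buy"], ["ticket", "hotel", "travel"], ["price", "engine"]]
centroids, remaining = select_centroid_contexts(ctxs, TITLE_WORDS)
print(centroids["Autos"])  # [['car', 'engine', 'buy']]
print(remaining)           # [['price', 'engine']]
```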
Methodology • Context similarity • Example contexts: (…, engine, …, buy, car, have, …) and (…, engine, …, sell, car, is, …) • Centroid context: (is, is, is, …, car, is, is, is, …)
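The following sketch shows one way to score context similarity. It assumes each context is a bag-of-words count vector and compares contexts with cosine similarity. The slides do not give the paper's exact measure, so treat this as a stand-in. The two example contexts are the slide's, with the elided words dropped.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_similarity(ctx1: list[str], ctx2: list[str]) -> float:
    """Compare two contexts as bags of words."""
    return cosine(Counter(ctx1), Counter(ctx2))


# The two example contexts from the slide, with the elided words dropped.
c1 = ["engine", "buy", "car", "have"]
c2 = ["engine", "sell", "car", "is"]
print(context_similarity(c1, c2))  # 0.5: they share 'engine' and 'car'
```

Cosine similarity normalizes for context length, which matters when the final window of a document is shorter than the others.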
Methodology • Word similarity • Example contexts: (…, X, car, price, …) and (…, engine, …, X, car, price, …) • Assignment of remaining contexts to a category
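The two steps on this slide can be sketched separately. For word similarity, the sketch assumes that two words are similar when they co-occur with similar neighbours, such as "car" and "price" in the slide's example. For assignment, it pools each category's centroid contexts into one profile and gives each remaining context to the most similar category by plain cosine similarity. The `min_score` threshold is hypothetical. The slides do not say how the paper combines word similarity with the assignment step, so this is an illustration rather than the authors' procedure.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_vectors(contexts):
    """Map each word to counts of the other words that share a context with it."""
    vecs = defaultdict(Counter)
    for ctx in contexts:
        counts = Counter(ctx)
        for w in counts:
            for v, n in counts.items():
                if v != w:
                    vecs[w][v] += n
    return vecs


def word_similarity(w1, w2, vecs):
    """Cosine similarity of the two words' co-occurrence vectors."""
    return cosine(vecs.get(w1, Counter()), vecs.get(w2, Counter()))


def assign_remaining(remaining, centroids, min_score=0.0):
    """Give each remaining context to its most similar category, if any.

    Each category is represented by the pooled words of its centroid
    contexts; a context scoring <= min_score for every category stays
    unassigned.
    """
    profiles = {cat: Counter(w for ctx in ctxs for w in ctx)
                for cat, ctxs in centroids.items()}
    if not profiles:
        return {}
    assigned = defaultdict(list)
    for ctx in remaining:
        vec = Counter(ctx)
        cat, score = max(((c, cosine(vec, p)) for c, p in profiles.items()),
                         key=lambda pair: pair[1])
        if score > min_score:
            assigned[cat].append(ctx)
    return dict(assigned)


vecs = word_vectors([["dealer", "sedan", "car", "price"],
                     ["engine", "coupe", "car", "price"]])
print(word_similarity("sedan", "coupe", vecs))  # ~0.667: both co-occur with 'car' and 'price'
```

Usage: with centroid contexts `{"Autos": [["car", "engine", "buy"]], "Travel": [["ticket", "hotel", "travel"]]}`, the remaining context `["engine", "price"]` goes to Autos and `["hotel", "beach"]` goes to Travel. `["weather"]` shares no word with either profile and stays unassigned.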
Conclusion Labeled data is expensive, while unlabeled data is inexpensive and plentiful. The proposed method is therefore useful for low-cost text classification.
Comments • Advantage • The idea is practical. • Drawback • Too few examples are given. • Application • Web mining