Text classification from unlabeled documents with bootstrapping and feature projection techniques

Presenter: You Lin Chen Authors: Youngjoong Ko, Jungyun Seo. IPM, 2009.

Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

Motivation [Figure: documents labeled Automobile, Sport, and Travel are used for training a classifier that categorizes unknown (???) documents.] A text classifier built with a supervised learning method requires a large number of labeled documents.

Objectives [Figure: unlabeled (???) documents and one title word per category are given to the classifier.] This paper proposes a new text classification method that learns from unlabeled documents and the title word of each category.

Methodology

Methodology POS tagger: http://www.cst.dk/online/pos_tagger/uk/index.html A sequence of 60 content words within a document is regarded as one context (window size: 60).
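The 60-word context window can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the content words have already been extracted by a POS tagger, and it assumes consecutive, non-overlapping windows, which the slide does not specify. The function name `split_into_contexts` is hypothetical.

```python
def split_into_contexts(content_words, window_size=60):
    """Split a document's content words into consecutive contexts.

    Assumption: windows do not overlap; the last context may be shorter.
    """
    return [
        content_words[i:i + window_size]
        for i in range(0, len(content_words), window_size)
    ]


# Toy document of 150 placeholder content words.
words = [f"w{i}" for i in range(150)]
contexts = split_into_contexts(words)
print([len(c) for c in contexts])  # [60, 60, 30]
```

A sliding (overlapping) window would be an equally plausible reading; only the `range` step would change.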

Methodology Category ‘Autos’; title word ‘car’

Methodology • Context similarity • (…, engine, …, buy, car, have, …) (…, engine, …, sell, car, is, …) • Centroid context (is, is, is, …, car, is, is, is, …)
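To make the idea of comparing a context with a centroid context concrete, here is a small sketch. The slides do not give the similarity formula, so cosine similarity over word counts is used purely as a stand-in; the function name and the toy contexts are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(context_a, context_b):
    """Cosine similarity between two contexts treated as bags of words.

    Assumption: this measure is a stand-in, not the paper's formula.
    """
    counts_a, counts_b = Counter(context_a), Counter(context_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


centroid = ["engine", "buy", "car", "have"]    # contains the title word 'car'
candidate = ["engine", "sell", "car", "is"]    # shares 'engine' and 'car'
unrelated = ["match", "goal", "team", "score"]

print(cosine_similarity(centroid, candidate))  # 0.5
print(cosine_similarity(centroid, unrelated))  # 0.0
```

A context that shares words with a centroid context scores higher than one that shares none, which is the property the bootstrapping step relies on.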

Methodology • Word similarity • (…, X, car, price, …) (…, engine, …, X, car, price, …) • Assignment of remaining contexts to a category
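The assignment of remaining contexts to a category can be sketched as a nearest-centroid rule. This is only an illustration under stated assumptions: the slides do not define the scoring, so Jaccard word overlap is swapped in, and the names `jaccard` and `assign_contexts` as well as the toy data are hypothetical.

```python
def jaccard(context_a, context_b):
    """Word-overlap score; a stand-in for the paper's similarity measures."""
    set_a, set_b = set(context_a), set(context_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def assign_contexts(remaining, centroids_by_category, similarity=jaccard):
    """Assign each remaining context to the category whose best-matching
    centroid context is most similar to it."""
    assignments = []
    for context in remaining:
        best_category = max(
            centroids_by_category,
            key=lambda cat: max(
                (similarity(context, c) for c in centroids_by_category[cat]),
                default=0.0,
            ),
        )
        assignments.append(best_category)
    return assignments


centroids = {
    "Autos": [["engine", "buy", "car", "price"]],
    "Sport": [["team", "win", "match", "goal"]],
}
remaining = [
    ["sell", "car", "engine", "dealer"],
    ["goal", "team", "coach", "season"],
]
print(assign_contexts(remaining, centroids))  # ['Autos', 'Sport']
```

Passing the similarity function as a parameter keeps the rule independent of whichever measure is actually used.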

Methodology

Experiments

Conclusion Labeled data is expensive, while unlabeled data is inexpensive and plentiful. The proposed method is useful for low-cost text classification.

Comments • Advantage • The idea is practical. • Drawback • Too few examples are given. • Application • Web mining
