CIKM 2008, Napa Valley, California, October 26-30, 2008

Semi-supervised Text Categorization by Active Search

Zenglin Xu¹, Rong Jin², Kaizhu Huang¹, Michael R. Lyu¹, and Irwin King¹
¹ Department of Computer Science and Engineering, The Chinese University of Hong Kong, {zlxu, kzhuang, lyu, king}@cse.cuhk.edu.hk
² Department of Computer Science and Engineering, Michigan State University, rongjin@cse.msu.edu

1 Motivations
• Given only a small number of labeled documents, it is very challenging to build a reliable classifier.
• Unlabeled data are helpful in automated text categorization.
How can unlabeled documents be obtained?
• Unlabeled documents can be collected through Web search engines.
• Semi-supervised learning can take advantage of both the labeled and the unlabeled documents.

2 Contributions
• A general framework for semi-supervised text categorization that collects unlabeled documents via Web search engines.
• A novel discriminative query generation method.
• A categorization framework that significantly improves classification accuracy.

3 Framework & Model
The framework consists of three components:
• Query generation: generates the textual queries used for document retrieval.
• Document retrieval: retrieves Web documents through a Web search engine.
• Semi-supervised text categorization: uses both the labeled documents and the retrieved unlabeled Web documents.
1. Query generation: generate one query for every labeled document (a document is a pair (x, y); V_i: vocabulary of the i-th document; w: word weights; ξ: margin error).
2. Text categorization models (D: labeled documents; U: retrieved unlabeled documents):
• Auxiliary SVM (y* is given as input)
• Semi-supervised SVM (y* is an optimization variable)

4 Experiment results
• Data repositories: 20-newsgroup, Reuters-21578, Ohsumed
• Training data: 5 labeled documents in each category
• Each labeled document generates one query
• Each query returns 100 unlabeled documents
• Search engine: Google
• Auxi-SVM: Auxiliary SVM (optimization: QP)
• Semi-SVM: Semi-supervised SVM (optimization: CCCP)
• Accuracy improvement over SVM: Auxi-SVM 26%, Semi-SVM 34%
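The three components listed under Framework & Model can be sketched end to end in a few functions. This is a minimal sketch under loudly stated assumptions, not the authors' method: a Rocchio-style centroid scorer stands in for both SVMs (the poster solves them by QP and CCCP); the query for document i is simply the k words of V_i whose contribution y·w_t·tf_t toward the document's own label is largest, rather than the output of the poster's margin-based optimization with slack ξ; and `search` is a hypothetical callable standing in for Google (queried for 100 documents per query in the experiments). For the auxiliary variant, y* is assumed to be the label of the query that retrieved each document; the semi-supervised variant re-estimates y* with a self-training loop instead of CCCP. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: a centroid scorer stands in for the poster's SVMs,
# and `search` is a placeholder callable, not a real search-engine API.
# Labels are +1 / -1; documents are lists of tokens.

def train_centroid(docs, labels):
    """Linear weights w = mean tf of class +1 minus mean tf of class -1.
    Assumes both labels occur at least once."""
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (pos if y == 1 else neg).update(doc)
    n_pos, n_neg = labels.count(1), labels.count(-1)
    w = Counter()
    for t in set(pos) | set(neg):
        w[t] = pos[t] / n_pos - neg[t] / n_neg
    return w

def score(w, doc):
    """Linear score w . x, where x is the term-frequency vector of doc."""
    return sum(w[t] * c for t, c in Counter(doc).items())

def generate_query(doc, y, w, k=3):
    """One query per labeled document (x, y): the k words of V_i that
    push the document hardest toward its own label (ties alphabetical)."""
    tf = Counter(doc)  # V_i is the key set of tf
    ranked = sorted(tf, key=lambda t: (-y * w[t] * tf[t], t))
    return " ".join(ranked[:k])

def retrieve_pool(D, y_D, search, k=3):
    """Build U by issuing one query per labeled document. Each retrieved
    document keeps the label of the first query that returned it."""
    w = train_centroid(D, y_D)
    U, y_query, seen = [], [], set()
    for doc, y in zip(D, y_D):
        for u in search(generate_query(doc, y, w, k)):
            if tuple(u) not in seen:  # drop duplicates across queries
                seen.add(tuple(u))
                U.append(u)
                y_query.append(y)
    return U, y_query

def auxiliary_train(D, y_D, U, y_star):
    """Auxiliary variant: y* is a fixed input (assumed: the query labels)."""
    return train_centroid(D + U, y_D + y_star)

def semi_supervised_train(D, y_D, U, rounds=10):
    """Semi-supervised variant: y* is re-estimated from the current model
    and the model is retrained until y* stops changing. This self-training
    loop is a stand-in for CCCP, not the poster's solver."""
    w, y_star = train_centroid(D, y_D), None
    for _ in range(rounds):
        new = [1 if score(w, u) >= 0 else -1 for u in U]
        if new == y_star:  # pseudo-labels are stable
            break
        y_star = new
        w = train_centroid(D + U, y_D + y_star)
    return w, y_star
```

With a toy `search` backed by a dictionary, a word such as `coach` that appears only in retrieved documents gets zero weight from D alone but a negative weight once U is added; in this toy setting that is the effect the framework relies on.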