Implementation Details of the Text Classification Project

Prerak Sanghvi
Computer Science and Engineering Department
State University of New York at Buffalo
Spring 2001

Feature Selection Step

• Keywords are selected from the text by scoring every word and keeping the best-scoring ones. The score used here is Information Gain.
• For each unique word, we record the number of documents in each class in which the word occurs.

Feature Selection Step - Algorithm

    for each document d in training set
        for each word w in d
            if w has been encountered before
                increment the document count for Category(d) in record for w
            else
                create a new data record for w
                increment the document count for Category(d) in record for w
    for each word w
        using the record for w, calculate Information Gain
    select the NUM_KEYWORDS words with the highest Information Gain
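The counting and selection steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `count_document_frequencies` and `select_keywords`, the `(category, words)` input shape, and the pluggable `score` callback are hypothetical and do not come from the original project. Words are de-duplicated per document so that each record holds document counts rather than raw occurrence counts.

```python
from collections import defaultdict
import heapq


def count_document_frequencies(training_set):
    """Build one record per word: category -> number of documents containing it.

    training_set is assumed to be an iterable of (category, words) pairs,
    where words is the list of tokens of one document.
    """
    records = defaultdict(lambda: defaultdict(int))
    for category, words in training_set:
        # set() ensures a document is counted once per word,
        # however often the word is repeated inside it.
        for w in set(words):
            records[w][category] += 1
    return records


def select_keywords(records, score, num_keywords):
    """Return the num_keywords words whose records score highest.

    score is any function mapping a word's record to a number,
    for example an Information Gain implementation.
    """
    return heapq.nlargest(num_keywords, records, key=lambda w: score(records[w]))
```

With a toy score such as `lambda record: sum(record.values())`, `select_keywords` simply returns the words that occur in the most documents; substituting an Information Gain function gives the selection described in the slide.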

Feature Selection

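Information Gain

$$G(t) = -\sum_{i=1}^{m} \Pr(c_i)\,\log \Pr(c_i) \;+\; \Pr(t)\sum_{i=1}^{m} \Pr(c_i \mid t)\,\log \Pr(c_i \mid t) \;+\; \Pr(\bar{t})\sum_{i=1}^{m} \Pr(c_i \mid \bar{t})\,\log \Pr(c_i \mid \bar{t})$$

$$\Pr(c_i) = \frac{1}{20}$$

$$\Pr(t) = \frac{\sum_{i=1}^{m} Cat_i(t)}{\sum_{i=1}^{m}\sum_{j=1}^{w} Cat_i(j)}$$

$$\Pr(c_i \mid t) = \frac{Cat_i(t)}{\sum_{k=1}^{m} Cat_k(t)}$$

Here $Cat_i(t)$ is the number of documents of class $c_i$ in which word $t$ occurs, $m$ is the number of classes, and $w$ is the number of unique words. $\bar{t}$ denotes the absence of word $t$, so $\Pr(\bar{t}) = 1 - \Pr(t)$.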
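A minimal Python sketch of the Information Gain score may help make the formula concrete. It follows the slide's definitions of Pr(ci), Pr(t), and Pr(ci|t), and it reads the third term as the term for the absence of t. The slide gives no formula for the class probabilities given absence, so the complement counts used below are an assumption, as are the function name `information_gain`, its argument layout, and the base-2 logarithm.

```python
import math


def _plogp_sum(counts):
    """Sum of p*log2(p) over the distribution obtained by normalising counts.

    Zero counts contribute 0 (the usual 0*log 0 = 0 convention);
    an all-zero list yields 0.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(cat_counts_t, cat_totals):
    """Information Gain of a word t.

    cat_counts_t[i] is Cat_i(t): documents of class i containing t.
    cat_totals[i] is the sum of Cat_i(j) over all words j.
    """
    m = len(cat_counts_t)
    n_t = sum(cat_counts_t)
    n_all = sum(cat_totals)

    # First term: classes are equally likely, Pr(c_i) = 1/m (1/20 in the slide).
    class_entropy = -sum((1 / m) * math.log2(1 / m) for _ in range(m))

    p_t = n_t / n_all
    present = p_t * _plogp_sum(cat_counts_t)

    # ASSUMPTION: counts for "t absent" are the remaining counts in each class.
    absent_counts = [total - c for total, c in zip(cat_totals, cat_counts_t)]
    absent = (1 - p_t) * _plogp_sum(absent_counts)

    return class_entropy + present + absent
```

A word that occurs equally often in every class scores zero under this sketch, while a word concentrated in one class scores higher, which is the behaviour the keyword ranking relies on.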

Classification Algorithm
