Bayesian Document Classification: Learning and Prediction

Learning to Classify DocumentsEdwin ZhangComputer Systems Lab 2009-2010

Abstract • Classifying documents • Will use a Bayesian method and calculate conditional probability • Use a set of Training Documents • Choose a set of features

Introduction • Learning to Classify Documents • Use a Bayesian Method • Code in Python/Java

Background • Naïve Bayes Classifier/Bayesian Method • computes the conditional probability p(T|D) for a given document D for every topic • Assigns the document D to the topic with the largest conditional probability http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html

Development • Program has two steps: • Learning • Prediction • Learning • training documents • conditional probability • features selection http://www.dot.state.mn.us/consult/images/j0341469.jpg

Development • Prediction • Predicting what a unknown document is talking about based on prediction section http://www.deafsports.co.nz/WebImages/documents.jpg

Expected Results • Initially, the program may have trouble classifying documents into the correct category • As the program learns more and improves its formulas, it will get better at classifying documents into the correct categories.

Works Cited • http://www.nltk.org/book • My dad • Eyheramendy, Susana, and David Madigan. "A Flexible Bayesian Generalized Linear Model for Dichotomous Response Data with an Application to Text Categorization." Lecture Notes-Monograph Series 54 (2007): 76-91. JSTOR. Web. 25 Oct. 2009. <http://www.jstor.org/stable/20461460>.

Bayesian Document Classification: Learning and Prediction

Bayesian Document Classification: Learning and Prediction

Presentation Transcript

Computer Systems Lab TJHSST

Computer Systems Lab TJHSST

Learning to Classify Text

Computer Systems Lab TJHSST

Computer Systems Lab TJHSST

ECE354 - Computer Systems Lab II

Welcome to the Computer Systems Lab tjhsst/compsci

Intro to computer (LAB)

Computer Systems 2009-2010

Computer Systems 2009-2010

Simulating Traffic Congestion on Route 1 Jimmy O'Hara Computer Systems Lab 2009-2010

Active Learning to Classify Email

Learning to Classify Documents with Only a Small Positive Training Set

Graphics Engine Creation Joseph Hallahan Computer Systems Lab 2009-2010

Computer Systems 2009-2010

Graphics/Physics Engine Creation Joseph Hallahan Computer Systems Lab 2009-2010

Computer Systems Lab TJHSST

The Computer Systems Lab

Computer Systems Lab TJHSST

Computer Systems Lab Research at TJHSST

Enhancing the Enlargement of Images Tara Naughton Computer Systems Lab 2009-2010

Title of Project Joseph Hallahan Computer Systems Lab 2009-2010