80 likes | 101 Views
Learning to Classify Documents Edwin Zhang Computer Systems Lab 2009-2010. Abstract. Classifying documents Will use a Bayesian method and calculate conditional probability Use a set of Training Documents Choose a set of features. Introduction. Learning to Classify Documents
E N D
Learning to Classify DocumentsEdwin ZhangComputer Systems Lab 2009-2010
Abstract • Classifying documents • Will use a Bayesian method and calculate conditional probability • Use a set of Training Documents • Choose a set of features
Introduction • Learning to Classify Documents • Use a Bayesian Method • Code in Python/Java
Background • Naïve Bayes Classifier/Bayesian Method • computes the conditional probability p(T|D) for a given document D for every topic • Assigns the document D to the topic with the largest conditional probability http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html
Development • Program has two steps: • Learning • Prediction • Learning • training documents • conditional probability • features selection http://www.dot.state.mn.us/consult/images/j0341469.jpg
Development • Prediction • Predicting what a unknown document is talking about based on prediction section http://www.deafsports.co.nz/WebImages/documents.jpg
Expected Results • Initially, the program may have trouble classifying documents into the correct category • As the program learns more and improves its formulas, it will get better at classifying documents into the correct categories.
Works Cited • http://www.nltk.org/book • My dad • Eyheramendy, Susana, and David Madigan. "A Flexible Bayesian Generalized Linear Model for Dichotomous Response Data with an Application to Text Categorization." Lecture Notes-Monograph Series 54 (2007): 76-91. JSTOR. Web. 25 Oct. 2009. <http://www.jstor.org/stable/20461460>.