Tackling the Poor Assumptions of Naive Bayes Text Classifiers. Published by: Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger. Liang Lan, 11/19/2007
Outline • Introduce the Multinomial Naive Bayes Model for Text Classification. • The Poor Assumptions of the Multinomial Naive Bayes Model. • Solutions to some problems of the Naive Bayes Classifier.
Multinomial Naive Bayes Model for Text Classification • Given: • A description of the document d: f = (f_1, …, f_n), where f_i is the frequency count of word i occurring in document d. • A fixed set of classes: C = {1, 2, …, m}. • A parameter vector for each class: the parameter vector for class c is θ_c = (θ_c1, …, θ_cn), where θ_ci is the probability that word i occurs in class c. • Determine: • The class label of d.
Introduce the Multinomial Naive Bayes Model for Text Classification • The likelihood of a document is a product of the parameters of the words that appear in the document. • Classification selects the class with the largest posterior probability.
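The slide's equations are not reproduced in this text; reconstructed in the notation defined above (a sketch of the standard multinomial forms, not a verbatim copy of the slide), the likelihood and the decision rule are:

p(d \mid \theta_c) = \frac{\left(\sum_i f_i\right)!}{\prod_i f_i!}\,\prod_i \theta_{ci}^{\,f_i},
\qquad
l(d) = \arg\max_c \Big[\log p(\theta_c) + \sum_i f_i \log \theta_{ci}\Big]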
Parameter Estimation for Naive Bayes Model • The parameters θ_ci must be estimated from the training data. • With these estimates we obtain the MNB classifier. • For simplicity, we use a uniform class prior estimate, so the prior term can be dropped: l_MNB(d) = \arg\max_c \sum_i f_i w_{ci}, where w_{ci} = \log \hat{\theta}_{ci}.
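The estimate itself is not shown in this text; the paper's smoothed estimate has the form below (reconstructed; N_ci is the count of word i in class-c training documents, N_c = \sum_i N_{ci}, and α_i are smoothing pseudo-counts with α = \sum_i α_i):

\hat{\theta}_{ci} = \frac{N_{ci} + \alpha_i}{N_c + \alpha}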
The Poor Assumptions of the Multinomial Naive Bayes Model • Two systemic errors (occurring in any naive Bayes classifier): 1. Skewed data bias (caused by uneven training-set sizes). 2. Weight magnitude errors (caused by the independence assumption). • The multinomial distribution does not model text well.
Correcting the Skewed Data Bias • Having more training examples for one class than another can cause the classifier to prefer the better-represented class. • Solution: use Complement Naive Bayes (CNB), which estimates parameters from the complement of each class: Ñ_ci is the number of times word i occurred in documents in classes other than c.
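The complement estimate and decision rule are not shown in this text; reconstructed from the paper's description (a sketch, with Ñ_c = \sum_i Ñ_{ci} and α_i, α as before):

\hat{\tilde{\theta}}_{ci} = \frac{\tilde{N}_{ci} + \alpha_i}{\tilde{N}_c + \alpha},
\qquad
l_{\mathrm{CNB}}(d) = \arg\max_c \Big[\log p(\theta_c) - \sum_i f_i \log \hat{\tilde{\theta}}_{ci}\Big]

The minus sign reflects that a word common in the complement of class c counts as evidence against c.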
Correcting the Weight Magnitude Errors • These errors are caused by the independence assumption: e.g., "San Francisco" contributes twice the weight of "Boston" because its two words are counted independently. • Solution: normalize the weight vectors. We call the resulting classifier Weight-normalized Complement Naive Bayes (WCNB).
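The normalization itself is not shown in this text; in the paper it divides each weight by the magnitude of its class's weight vector (reconstructed, treat as a sketch):

w_{ci} \leftarrow \frac{w_{ci}}{\sum_k |w_{ck}|}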
Modeling Text Better • Transforming Term Frequency • Transforming by Document Frequency • Transforming Based on Length
Transforming Term Frequency • Empirical term distributions have heavier tails than the multinomial model predicts, looking more like a power-law distribution. • By applying a simple transform to the term frequencies, the multinomial model can be made to generate probabilities proportional to a class of power-law distributions.
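The transform is not reproduced in this text; the paper uses a log transform of the raw counts (reconstructed; the paper sets d = 1):

f_i \leftarrow \log(d + f_i) = \log(1 + f_i)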
Transforming by Document Frequency • Common words are unlikely to be related to the class of a document, but random variation can create apparent, fictitious correlations. • Discount the weight of common words using inverse document frequency (IDF), a common IR transform that discounts terms by their document frequency.
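The IDF transform used in the paper (reconstructed; δ_ij = 1 if word i occurs in training document j and 0 otherwise, with the sums running over all training documents):

f_i \leftarrow f_i \,\log \frac{\sum_j 1}{\sum_j \delta_{ij}}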
Transforming Based on Length • Longer documents have larger term counts, so their words produce disproportionately large jumps in the classification score. • Discount the influence of long documents by normalizing the term-frequency vector.
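The normalization is not shown in this text; the paper divides each count by the length (L2 norm) of the document's count vector (reconstructed, treat as a sketch):

f_i \leftarrow \frac{f_i}{\sqrt{\sum_k f_k^2}}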
Experiments comparing MNB, TWCNB, and the SVM show that TWCNB performs substantially better than MNB and approaches the SVM's performance.