Tackling the Poor Assumptions of Naive Bayes Text Classifiers. Published by: Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger. Liang Lan, 11/19/2007
Outline • Introduce the Multinomial Naive Bayes Model for Text Classification. • The Poor Assumptions of the Multinomial Naive Bayes Model. • Solutions to some problems of the Naive Bayes Classifier.
Multinomial Naive Bayes Model for Text Classification • Given: • A description of the document d: f = (f_1, …, f_n), where f_i is the frequency count of word i occurring in document d. • A fixed set of classes: C = {1, 2, …, m}. • A parameter vector for each class: the parameter vector for class c is θ_c = (θ_c1, …, θ_cn), where θ_ci is the probability that word i occurs in class c. • Determine: • The class label of d.
Introduce the Multinomial Naive Bayes Model for Text Classification • The likelihood of a document is a product of the parameters of the words that appear in the document. • Classification selects the class with the largest posterior probability.
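The slide's equations are not reproduced in this text; reconstructed in the notation defined above (a sketch of the standard multinomial forms, not a verbatim copy of the slide), the likelihood and the decision rule are:

p(d \mid \theta_c) = \frac{\left(\sum_i f_i\right)!}{\prod_i f_i!}\,\prod_i \theta_{ci}^{\,f_i},
\qquad
l(d) = \arg\max_c \Big[\log p(\theta_c) + \sum_i f_i \log \theta_{ci}\Big]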
Parameter Estimation for Naive Bayes Model • The parameters θ_ci must be estimated from the training data. • With these estimates we obtain the MNB classifier. • For simplicity, we use a uniform class prior estimate, so the prior term can be dropped: l_MNB(d) = \arg\max_c \sum_i f_i w_{ci}, where w_{ci} = \log \hat{\theta}_{ci}.
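The estimate itself is not shown in this text; the paper's smoothed estimate has the form below (reconstructed; N_ci is the count of word i in class-c training documents, N_c = \sum_i N_{ci}, and α_i are smoothing pseudo-counts with α = \sum_i α_i):

\hat{\theta}_{ci} = \frac{N_{ci} + \alpha_i}{N_c + \alpha}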
The Poor Assumptions of the Multinomial Naive Bayes Model • Two systemic errors (occurring in any naive Bayes classifier): 1. Skewed data bias (caused by uneven training-set sizes). 2. Weight magnitude errors (caused by the independence assumption). • The multinomial distribution does not model text well.
Correcting the Skewed Data Bias • Having more training examples for one class than another can cause the classifier to prefer the better-represented class. • Solution: use Complement Naive Bayes (CNB), which estimates parameters from the complement of each class: Ñ_ci is the number of times word i occurred in documents in classes other than c.
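The complement estimate and decision rule are not shown in this text; reconstructed from the paper's description (a sketch, with Ñ_c = \sum_i Ñ_{ci} and α_i, α as before):

\hat{\tilde{\theta}}_{ci} = \frac{\tilde{N}_{ci} + \alpha_i}{\tilde{N}_c + \alpha},
\qquad
l_{\mathrm{CNB}}(d) = \arg\max_c \Big[\log p(\theta_c) - \sum_i f_i \log \hat{\tilde{\theta}}_{ci}\Big]

The minus sign reflects that a word common in the complement of class c counts as evidence against c.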
Correcting the Weight Magnitude Errors • These errors are caused by the independence assumption: e.g., "San Francisco" contributes twice the weight of "Boston" because its two words are counted independently. • Solution: normalize the weight vectors. We call the resulting classifier Weight-normalized Complement Naive Bayes (WCNB).
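The normalization itself is not shown in this text; in the paper it divides each weight by the magnitude of its class's weight vector (reconstructed, treat as a sketch):

w_{ci} \leftarrow \frac{w_{ci}}{\sum_k |w_{ck}|}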
Modeling Text Better • Transforming Term Frequency • Transforming by Document Frequency • Transforming Based on Length
Transforming Term Frequency • Empirical term distributions have heavier tails than the multinomial model predicts, looking more like a power-law distribution. • By applying a simple transform to the term frequencies, the multinomial model can be made to generate probabilities proportional to a class of power-law distributions.
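The transform is not reproduced in this text; the paper uses a log transform of the raw counts (reconstructed; the paper sets d = 1):

f_i \leftarrow \log(d + f_i) = \log(1 + f_i)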
Transforming by Document Frequency • Common words are unlikely to be related to the class of a document, but random variation can create apparent, fictitious correlations. • Discount the weight of common words using inverse document frequency (IDF), a common IR transform that discounts terms by their document frequency.
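The IDF transform used in the paper (reconstructed; δ_ij = 1 if word i occurs in training document j and 0 otherwise, with the sums running over all training documents):

f_i \leftarrow f_i \,\log \frac{\sum_j 1}{\sum_j \delta_{ij}}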
Transforming Based on Length • Longer documents have larger term counts, so their words produce disproportionately large jumps in the classification score. • Discount the influence of long documents by normalizing the term-frequency vector.
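The normalization is not shown in this text; the paper divides each count by the length (L2 norm) of the document's count vector (reconstructed, treat as a sketch):

f_i \leftarrow \frac{f_i}{\sqrt{\sum_k f_k^2}}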
Experiments comparing MNB, TWCNB, and the SVM show that TWCNB performs substantially better than MNB and approaches the SVM's performance.