Tackling the Poor Assumptions of Naive Bayes Text Classifiers
Published by: Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger
Presented by Liang Lan, 11/19/2007


Presentation Transcript


  1. Tackling the Poor Assumptions of Naive Bayes Text Classifiers. Published by: Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger. Presented by Liang Lan, 11/19/2007

  2. Outline • Introduce the Multinomial Naive Bayes Model for Text Classification. • The Poor Assumptions of the Multinomial Naive Bayes Model. • Solutions to some problems of the Naive Bayes classifier.

  3. Multinomial Naive Bayes Model for Text Classification • Given: a description of the document d as a word-count vector f = (f_1, ..., f_n), where f_i is the frequency count of word i occurring in document d; a fixed set of classes C = {1, 2, ..., m}; and a parameter vector for each class, θ_c = (θ_c1, ..., θ_cn), where θ_ci is the probability that word i occurs in class c. • Determine: the class label of d.

  4. Introduce the Multinomial Naive Bayes Model for Text Classification • The likelihood of a document is a product of the parameters of the words that appear in the document. • Selecting the class with the largest posterior probability
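The likelihood and the class-selection rule are not reproduced in this transcript; a standard statement of the multinomial likelihood and the maximum-a-posteriori decision rule (an assumed reconstruction in conventional notation) is:

\[ p(d \mid \theta_c) = \frac{(\sum_i f_i)!}{\prod_i f_i!} \prod_i \theta_{ci}^{\,f_i}, \qquad l(d) = \arg\max_c \Big[ \log p(c) + \sum_i f_i \log \theta_{ci} \Big]. \]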

  5. Parameter Estimation for the Naive Bayes Model • The parameters θ_ci must be estimated from the training data; this gives the MNB classifier. • For simplicity, we use a uniform class prior. • The decision rule is l_MNB(d) = argmax_c Σ_i f_i w_ci, where w_ci = log θ̂_ci.
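The estimate itself is not shown in the transcript; the usual smoothed (Dirichlet/Laplace) form, with pseudo-counts α_i and α = Σ_i α_i, is assumed here:

\[ \hat{\theta}_{ci} = \frac{N_{ci} + \alpha_i}{N_c + \alpha}, \qquad w_{ci} = \log \hat{\theta}_{ci}, \qquad l_{\mathrm{MNB}}(d) = \arg\max_c \sum_i f_i\, w_{ci}, \]

where N_ci is the number of times word i appears in training documents of class c and N_c = Σ_i N_ci.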

  6. The Poor Assumptions of the Multinomial Naive Bayes Model • Two systemic errors (occurring in any naive Bayes classifier): 1. Skewed data bias (caused by uneven training set sizes). 2. Weight magnitude errors (caused by the independence assumption). • The multinomial model does not model text well.

  7. Correcting the Skewed Data Bias • Having more training examples for one class than another can cause the classifier to prefer one class over the other. • Solution: use Complement Naive Bayes (CNB), which estimates a class's parameters from the documents of all other classes: Ñ_ci is the number of times word i occurred in documents in classes other than c.
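The CNB estimate and decision rule are not reproduced in this transcript; an assumed reconstruction in the same notation is:

\[ \tilde{\theta}_{ci} = \frac{\tilde{N}_{ci} + \alpha_i}{\tilde{N}_c + \alpha}, \qquad l_{\mathrm{CNB}}(d) = \arg\max_c \Big[ \log p(c) - \sum_i f_i \log \tilde{\theta}_{ci} \Big], \]

where \(\tilde{N}_c = \sum_i \tilde{N}_{ci}\). The minus sign reflects that the complement parameters measure evidence against membership in class c.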

  8. Correcting the Weight Magnitude Errors • These errors are caused by the independence assumption. Ex.: because “San Francisco” is two words, it receives twice the weight of a single word such as “Boston”. • Solution: normalize the weight vectors. We call this Weight-normalized Complement Naive Bayes (WCNB).
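The normalization step is not reproduced in the transcript; the assumed form, dividing each class's log-parameter vector by its L1 norm, is:

\[ w_{ci} = \frac{\log \tilde{\theta}_{ci}}{\sum_k \big| \log \tilde{\theta}_{ck} \big|}. \]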

  9. Modeling Text Better • Transforming Term Frequency • Transforming by Document Frequency • Transforming Based on Length

  10. Transforming Term Frequency • Empirical term distributions have heavier tails than the multinomial model predicts, instead appearing like a power-law distribution. • A power-law probability can be rewritten in a form the multinomial model can generate, so we can produce probabilities proportional to a class of power-law distributions via a simple transform of the term frequencies (sketched below).
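The transform itself is missing from the transcript; a plausible reconstruction (assuming the common log transform with offset d, and d = 1 in practice) is:

\[ p(f_i) \propto (d + f_i)^{\log \theta_i} = \theta_i^{\log(d + f_i)} \quad\Longrightarrow\quad f'_i = \log(d + f_i) \;\overset{d=1}{=}\; \log(1 + f_i). \]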

  11. Transforming by Document Frequency • Common words are unlikely to be related to the class of a document, but random variation can create apparent, fictitious correlations. • Discount the weight of common words using inverse document frequency (a common IR transform), which discounts terms by their document frequency (sketched below).
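The transform is not reproduced here; the standard IDF form is assumed, where N is the number of training documents and df_i is the number of documents containing word i:

\[ f'_i = f_i \,\log \frac{N}{df_i}. \]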

  12. Transforming Based on Length • The jump in likelihood from repeated terms is disproportionately large for long documents. • Discount the influence of document length by transforming the term frequency (sketched below).
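The transform is not reproduced here; the assumed form rescales the document's term-frequency vector to unit L2 norm:

\[ f'_i = \frac{f_i}{\sqrt{\sum_k (f_k)^2}}. \]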

  13. The New Naive Bayes Procedure: TWCNB (Transformed Weight-normalized Complement Naive Bayes), which combines the three term-frequency transforms with complement parameter estimation and weight normalization. A sketch is given below.
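The procedure is not reproduced in this transcript. Below is a minimal Python sketch, assuming the steps described on the previous slides (log term-frequency transform, IDF weighting, length normalization, complement parameter estimation, weight normalization); the function names, the smoothing parameter alpha, and the toy data are illustrative, not taken from the paper.

import numpy as np

def _transform(X, idf):
    """Apply the three text transforms to a word-count matrix:
    TF log transform, IDF reweighting, and length normalization."""
    X = np.log1p(np.asarray(X, dtype=float))          # f' = log(1 + f)
    X *= idf                                          # f' = f * log(N / df)
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True)) + 1e-12
    return X / norms                                  # f' = f / sqrt(sum_k f_k^2)

def twcnb_train(X, y, alpha=1.0):
    """Sketch of TWCNB training (assumed steps, not the paper's exact
    pseudocode): transform counts, estimate complement parameters,
    take logs, and normalize each class's weight vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_docs, n_words = X.shape
    df = np.count_nonzero(X, axis=0) + 1.0            # document frequency of each word
    idf = np.log(n_docs / df)
    Xt = _transform(X, idf)

    classes = np.unique(y)
    weights = np.zeros((len(classes), n_words))
    for k, c in enumerate(classes):
        comp = Xt[y != c].sum(axis=0)                 # complement counts for class c
        theta = (comp + alpha) / (comp.sum() + alpha * n_words)
        w = np.log(theta)
        weights[k] = w / np.abs(w).sum()              # weight normalization (WCNB)
    return classes, weights, idf

def twcnb_predict(X, classes, weights, idf):
    """Label each document with the class whose complement score is smallest
    (complement weights measure evidence *against* membership in c)."""
    Xt = _transform(X, idf)
    return classes[np.argmin(Xt @ weights.T, axis=1)]

# Tiny usage example with made-up word counts (purely illustrative)
if __name__ == "__main__":
    X_train = [[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]]
    y_train = [0, 0, 1, 1]
    classes, W, idf = twcnb_train(X_train, y_train)
    print(twcnb_predict([[1, 0, 2], [0, 2, 1]], classes, W, idf))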

  14. Experimental results comparing MNB, TWCNB, and the SVM show that TWCNB's performance is substantially better than MNB's and approaches the SVM's performance.
