Naïve Bayes

Presentation Transcript


  1. Naïve Bayes Advanced Statistical Methods in NLP Ling572 January 19, 2012

  2. Roadmap • Naïve Bayes • Multi-variate Bernoulli event model (recap) • Multinomial event model • Analysis • HW#3

  3. Naïve Bayes Models in Detail • (McCallum & Nigam, 1998) • Alternate models for Naïve Bayes Text Classification • Multivariate Bernoulli event model • Binary independence model • Features treated as binary – counts ignored • Multinomial event model • Unigram language model

  4. Multivariate Bernoulli Event Text Model • Each document: • Result of |V| independent Bernoulli trials • I.e., for each word in the vocabulary: does the word appear in the document? • From the general Naïve Bayes perspective: • Each word corresponds to two outcomes, $w_t$ (present) and $\bar{w}_t$ (absent) • In each doc, either $w_t$ or $\bar{w}_t$ appears • So every document always has exactly |V| elements

  5. Training & Testing • Laplace-smoothed training: • $P(w_t \mid c_j) = \dfrac{1 + \sum_{i=1}^{|D|} B_{it}\, P(c_j \mid d_i)}{2 + \sum_{i=1}^{|D|} P(c_j \mid d_i)}$, where $B_{it} = 1$ if $w_t$ occurs in $d_i$ and 0 otherwise • MAP decision rule classification: • $\arg\max_c P(c) \prod_{t=1}^{|V|} \big( B_t\, P(w_t \mid c) + (1 - B_t)(1 - P(w_t \mid c)) \big)$
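As a minimal sketch of this model (not the course's reference implementation): the function names and the dense 0/1 NumPy matrix layout below are assumptions for illustration.

```python
import numpy as np

def train_bernoulli_nb(X, y, n_classes):
    """Laplace-smoothed multivariate Bernoulli Naive Bayes.
    X: (n_docs, |V|) 0/1 matrix; y: integer labels 0..n_classes-1."""
    n_docs, vocab_size = X.shape
    prior = np.zeros(n_classes)
    p_word = np.zeros((n_classes, vocab_size))
    for c in range(n_classes):
        docs = X[y == c]
        prior[c] = len(docs) / n_docs
        # P(w_t|c) = (1 + # class-c docs containing w_t) / (2 + # class-c docs)
        p_word[c] = (1 + docs.sum(axis=0)) / (2 + len(docs))
    return prior, p_word

def classify_bernoulli_nb(x, prior, p_word):
    """MAP rule: argmax_c P(c) * prod_t [B_t P(w_t|c) + (1-B_t)(1-P(w_t|c))],
    computed in log space for numerical stability."""
    log_post = np.log(prior) + (x * np.log(p_word)
                                + (1 - x) * np.log(1 - p_word)).sum(axis=1)
    return int(np.argmax(log_post))
```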

  6. Multinomial Event Model

  7. Multinomial Distribution • Trial: select a word according to its probability • Possible outcomes: {w1,w2,…,w|V|}

  8. Multinomial Distribution • Trial: select a word according to its probability • Possible outcomes: {w1,w2,…,w|V|} • Document is viewed as result of: • One trial for each position • P(word = wi) = pi • Σipi= 1

  9. Multinomial Distribution • Trial: select a word according to its probability • Possible outcomes: {w1,w2,…,w|V|} • Document is viewed as result of: • One trial for each position • P(word = w_i) = p_i • Σ_i p_i = 1 • $P(X_1 = x_1, X_2 = x_2, \ldots, X_{|V|} = x_{|V|}) = \dfrac{n!}{x_1! \cdots x_{|V|}!}\, p_1^{x_1} \cdots p_{|V|}^{x_{|V|}}$, where $n = \sum_t x_t$ is the document length
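As a quick sketch, this pmf can be computed directly in Python (the function name is mine, for illustration):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1=x1, ..., X|V|=x|V|) for a document of n = sum(counts) trials."""
    n = sum(counts)
    # number of distinct orderings of this bag of words: n! / (x1! ... x|V|!)
    n_orderings = factorial(n) // prod(factorial(x) for x in counts)
    return n_orderings * prod(p ** x for x, p in zip(counts, probs))
```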

  12. Example • Consider a vocabulary V with only three words: • a, b, c Due to F. Xia

  13. Example • Consider a vocabulary V with only three words: • a, b, c • Document di contains only 2 word instances Due to F. Xia

  14. Example • Consider a vocabulary V with only three words: • a, b, c • Document di contains only 2 word instances • For each position: • P(w=a) = p1, P(w=b) = p2, P(w=c) = p3 Due to F. Xia

  15. Example • Consider a vocabulary V with only three words: • a, b, c • Document di contains only 2 word instances • For each position: • P(w=a) = p1, P(w=b) = p2, P(w=c) = p3 • What is the probability that we see ‘a’ once and ‘b’ once in di? Due to F. Xia

  16. Example (cont’d) • How many possible sequences? Due to F. Xia

  17. Example (cont’d) • How many possible sequences? 3^2 = 9 • Sequences: aa, ab, ac, bb, ba, bc, ca, cb, cc Due to F. Xia

  18. Example (cont’d) • How many possible sequences? 3^2 = 9 • Sequences: aa, ab, ac, bb, ba, bc, ca, cb, cc • How many sequences with one ‘a’ and one ‘b’? Due to F. Xia

  19. Example (cont’d) • How many possible sequences? 3^2 = 9 • Sequences: aa, ab, ac, bb, ba, bc, ca, cb, cc • How many sequences with one ‘a’ and one ‘b’? • $n!/(x_1! \cdots x_{|V|}!) = 2!/(1!\,1!\,0!) = 2$ • Probability of the sequence ‘ab’ is: Due to F. Xia

  20. Example (cont’d) • How many possible sequences? 3^2 = 9 • Sequences: aa, ab, ac, bb, ba, bc, ca, cb, cc • How many sequences with one ‘a’ and one ‘b’? • $n!/(x_1! \cdots x_{|V|}!) = 2!/(1!\,1!\,0!) = 2$ • Probability of the sequence ‘ab’ is: p1*p2 • Probability of the sequence ‘ba’? Due to F. Xia

  21. Example (cont’d) • How many possible sequences? 3^2 = 9 • Sequences: aa, ab, ac, bb, ba, bc, ca, cb, cc • How many sequences with one ‘a’ and one ‘b’? • $n!/(x_1! \cdots x_{|V|}!) = 2!/(1!\,1!\,0!) = 2$ • Probability of the sequence ‘ab’ is: p1*p2 • Probability of the sequence ‘ba’: p2*p1 = p1*p2 • So probability of seeing ‘a’ once and ‘b’ once is: Due to F. Xia

  22. Example (cont’d) • How many possible sequences? 3^2 = 9 • Sequences: aa, ab, ac, bb, ba, bc, ca, cb, cc • How many sequences with one ‘a’ and one ‘b’? • $n!/(x_1! \cdots x_{|V|}!) = 2!/(1!\,1!\,0!) = 2$ • Probability of the sequence ‘ab’ is: p1*p2 • Probability of the sequence ‘ba’: p2*p1 = p1*p2 • So probability of seeing ‘a’ once and ‘b’ once is: • 2!/(1!·1!·0!) · p1·p2 = 2·p1·p2 Due to F. Xia
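Using the multinomial_pmf sketch from above, the worked example checks out (the probability values here are illustrative):

```python
p = [0.5, 0.3, 0.2]                    # illustrative values for p1, p2, p3
# one 'a', one 'b', no 'c' in a two-word document
print(multinomial_pmf([1, 1, 0], p))   # 0.3, i.e. 2 * p1 * p2 = 2 * 0.5 * 0.3
```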

  23. Multinomial Event Model • Document is sequence of word events drawn from vocabulary V. • Assume document length independent of class • Assume (Naïve Bayes) words independent of context

  24. Multinomial Event Model • Document is sequence of word events drawn from vocabulary V. • Assume document length independent of class • Assume (Naïve Bayes) words independent of context • Define $N_{it}$ = # of occurrences of $w_t$ in document $d_i$

  25. Multinomial Event Model • Document is sequence of word events drawn from vocabulary V. • Assume document length independent of class • Assume (Naïve Bayes) words independent of context • Define $N_{it}$ = # of occurrences of $w_t$ in document $d_i$ • Then under the multinomial event model: • $P(d_i \mid c_j) = P(|d_i|)\, |d_i|! \prod_{t=1}^{|V|} \dfrac{P(w_t \mid c_j)^{N_{it}}}{N_{it}!}$

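In practice this likelihood is computed in log space, and the length prior P(|d_i|) and the factorial terms are identical for every class, so only the class-dependent part matters for classification. A sketch of that part (names and array layout are mine):

```python
import numpy as np

def doc_log_likelihood(counts, log_p_word):
    """Class-dependent part of log P(d|c): sum_t N_it * log P(w_t|c).
    counts: (|V|,) word-count vector; log_p_word: (n_classes, |V|)."""
    return log_p_word @ counts
```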

  29. Training • P(c_j|d_i) = 1 if document d_i is of class c_j, and 0 otherwise • So, with Laplace smoothing: • $P(w_t \mid c_j) = \dfrac{1 + \sum_{i=1}^{|D|} N_{it}\, P(c_j \mid d_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N_{is}\, P(c_j \mid d_i)}$ • i.e., add-one-smoothed counts of $w_t$ in class-$c_j$ documents, normalized by |V| plus the total number of word occurrences in class-$c_j$ documents • Contrast this with multivariate Bernoulli, which counts the documents containing $w_t$ rather than the occurrences of $w_t$
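A minimal sketch of this estimator with hard (0/1) class labels; the dense count-matrix layout and function name are assumptions for illustration:

```python
import numpy as np

def train_multinomial_nb(X, y, n_classes):
    """Laplace-smoothed multinomial Naive Bayes.
    X: (n_docs, |V|) word-count matrix; y: integer labels 0..n_classes-1."""
    n_docs, vocab_size = X.shape
    log_prior = np.zeros(n_classes)
    log_p_word = np.zeros((n_classes, vocab_size))
    for c in range(n_classes):
        docs = X[y == c]
        log_prior[c] = np.log(len(docs) / n_docs)
        word_counts = docs.sum(axis=0)   # occurrences of each w_t in class c
        # P(w_t|c) = (1 + count of w_t in class c) / (|V| + total words in class c)
        log_p_word[c] = np.log((1 + word_counts) / (vocab_size + word_counts.sum()))
    return log_prior, log_p_word
```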

  34. Testing • To classify a document d_i compute: • $\arg\max_c P(c)\, P(d_i \mid c) = \arg\max_c P(c) \prod_{t=1}^{|V|} P(w_t \mid c)^{N_{it}}$ • (the length and factorial terms drop out: they are the same for every class)
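Putting the training and testing sketches together (again, illustrative names, log space for stability):

```python
def classify_multinomial_nb(counts, log_prior, log_p_word):
    """argmax_c [ log P(c) + sum_t N_it * log P(w_t|c) ]."""
    return int(np.argmax(log_prior + log_p_word @ counts))

# Hypothetical usage with a 4-document, 3-word toy corpus:
# X = np.array([[2,0,0],[1,1,0],[0,0,3],[0,1,2]]); y = np.array([0,0,1,1])
# log_prior, log_p_word = train_multinomial_nb(X, y, n_classes=2)
# classify_multinomial_nb(np.array([1,1,0]), log_prior, log_p_word)  # -> 0
```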

  36. Two Naïve Bayes Models • Multi-variate Bernoulli event model: • Models binary presence/absence of word feature

  37. Two Naïve Bayes Models • Multi-variate Bernoulli event model: • Models binary presence/absence of word feature • Multinomial event model: • Models counts of word features, unigram models

  38. Two Naïve Bayes Models • Multi-variate Bernoulli event model: • Models binary presence/absence of word feature • Multinomial event model: • Models counts of word features, unigram models • In experiments on a range of different text classification corpora, multinomial model usually outperforms multivariate Bernoulli (McCallum & Nigam, 1998)

  39. Thinking about Performance • Naïve Bayes: conditional independence assumption • Clearly unrealistic, but performance is often good • Why?

  40. Thinking about Performance • Naïve Bayes: conditional independence assumption • Clearly unrealistic, but performance is often good • Why? • Classification based on sign, not magnitude • Direction of classification usually right • Multivariate Bernoulli vs Multinomial • Why does multinomial perform better?

  41. Thinking about Performance • Naïve Bayes: conditional independence assumption • Clearly unrealistic, but performance is often good • Why? • Classification based on sign, not magnitude • Direction of classification usually right • Multivariate Bernoulli vs Multinomial • Why does multinomial perform better? • Captures additional information: presence/absence+freq • What if we wanted to include other types of features?

  42. Thinking about Performance • Naïve Bayes: conditional independence assumption • Clearly unrealistic, but performance is often good • Why? • Classification based on sign, not magnitude • Direction of classification usually right • Multivariate Bernoulli vs Multinomial • Why does multinomial perform better? • Captures additional information: presence/absence plus frequency • What if we wanted to include other types of features? • Multivariate Bernoulli: a new binary feature is just another Bernoulli trial • Multinomial: can’t mix different feature types within a single multinomial distribution

  43. Model Comparison • [Slides 43-50 presented charts comparing multivariate Bernoulli and multinomial classification accuracy across text classification corpora (McCallum & Nigam, 1998); the figures are not preserved in this transcript.]
