CS 679: Text Mining

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 679: Text Mining. Lecture #5: Conjugate Priors. Slides by Eric Ringger.


Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 679: Text Mining Lecture #5: Conjugate Priors Slides by Eric Ringger

  2. Announcements • Reading Report #3: • Meila & Heckerman on EM • Still discussing fundamental ideas • Potential Publications • Due today • In preparation for pre-proposal • This week: rank and select with me • Reading Report #4: • Russell & Norvig 14.5 • Due: Wednesday

  3. Objectives • Introduce the Beta and Dirichlet distributions • Explain the idea of a “conjugate prior”

  4. Q: What space do the parameters of a categorical or multinomial live in? A: Probability Simplex • 1-D / 2 parameters (Bernoulli / binomial) • 2-D / 3 parameters • 3-D / 4 parameters
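The space named on this slide can be written down compactly. The following is a sketch in notation that is not taken from the slides (θ for the parameter vector and K for the number of outcomes are assumptions made here for illustration):

```latex
\Delta^{K-1} = \left\{ \theta \in \mathbb{R}^{K} \;:\; \theta_k \ge 0 \ \text{for all } k, \quad \sum_{k=1}^{K} \theta_k = 1 \right\}
```

With K = 2 (Bernoulli / binomial) this set is a line segment, with K = 3 a triangle, and with K = 4 a tetrahedron. The sum-to-one constraint removes one degree of freedom, which is why K parameters live in a (K-1)-dimensional space.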

  5. Beta Distribution • Parameters α, β determine form of the density • On the 1-D simplex • For these examples, we have a symmetric Beta with α = β • (Figure: three plots of the density f for symmetric parameter settings)

  6. Preliminaries: Functions Credit: Wikipedia

  7. Preliminaries: Functions Credit: Wikipedia

  8. Beta Distribution Revisited • Parameters α, β determine form of the density • On the 1-D simplex • For these examples, we have a symmetric Beta with α = β • (Figure: three plots of the density f for symmetric parameter settings)
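The slide's plots did not survive, but the shapes of a symmetric Beta are easy to reproduce. Below is a minimal sketch that assumes the standard Beta density, f(x; α, β) = Γ(α+β) / (Γ(α) Γ(β)) · x^(α-1) · (1-x)^(β-1). The function name `beta_pdf` and the parameter values are hypothetical choices for illustration, not the values used on the slides.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


# Symmetric cases (a == b); these particular values are illustrative only.
for a in (0.5, 1.0, 5.0):
    values = [round(beta_pdf(x, a, a), 3) for x in (0.1, 0.5, 0.9)]
    print(f"alpha = beta = {a}: f(0.1), f(0.5), f(0.9) = {values}")
```

With α = β < 1 the density is U-shaped, with α = β = 1 it is uniform, and with α = β > 1 it peaks at 0.5.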

  9. Dirichlet Distribution • Probability density of the Dirichlet distribution when K=3 for various parameter vectors α • Panels: α=(6, 2, 2), α=(3, 7, 5), α=(6, 2, 6), α=(2, 3, 4) • Credit: Wikipedia

  10. Symmetric Dirichlet How the log of the density function changes when K=3 as we change the vector α from α=(0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual αi's equal to one another. Credit: Wikipedia
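The effect described on this slide can be checked numerically. The sketch below assumes the standard Dirichlet log density, log Γ(Σ αi) - Σ log Γ(αi) + Σ (αi - 1) log xi, and compares the center of the simplex with a point near a corner. The function name and the two test points are hypothetical choices for illustration.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log density of Dirichlet(alpha) at a point x in the interior of the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1) * math.log(xi) for a, xi in zip(alpha, x))


center = (1 / 3, 1 / 3, 1 / 3)
near_corner = (0.98, 0.01, 0.01)

# Symmetric parameter vectors: every component equals a.
for a in (0.3, 1.0, 2.0):
    alpha = (a, a, a)
    print(a, dirichlet_log_pdf(center, alpha), dirichlet_log_pdf(near_corner, alpha))
```

For αi < 1 the density is higher near the corners than at the center, for αi = 1 it is flat, and for αi > 1 the mass concentrates toward the center.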

  11. Dirichlet Distribution • Density function • Dirichlet distribution of order K: • Where: • xi >= 0 • αi > 0 • Denominator: Credit: Wikipedia
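The equation images on this slide were lost in extraction. For reference, the standard form of the Dirichlet density of order K, which is presumably what the slide showed given its Wikipedia credit, is:

```latex
f(x_1, \dots, x_K;\ \alpha_1, \dots, \alpha_K)
  = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1},
\qquad
B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}
```

Here the x_i also sum to 1, so the density is defined on the probability simplex, and the denominator B(α) is the multivariate Beta function expressed through the Gamma function.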

  12. Beta-Binomial Conjugacy 1. Assume a binomial model of our data: 2. Given data, use Bayes' law to update the model: 3. Employ a prior distribution over the model parameter, with its own parameters: 4. New version of (2):
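The equations for the four steps did not survive extraction. The following is a sketch of the standard derivation, in notation assumed here rather than taken from the slides: θ is the success probability, k is the number of successes in n trials, and α, β are the parameters of the Beta prior.

```latex
\begin{align*}
\text{(1)}\quad & p(k \mid \theta) = \binom{n}{k}\, \theta^{k} (1-\theta)^{n-k} \\
\text{(2)}\quad & p(\theta \mid k) = \frac{p(k \mid \theta)\, p(\theta)}{p(k)} \\
\text{(3)}\quad & p(\theta) = \mathrm{Beta}(\theta;\ \alpha, \beta) \propto \theta^{\alpha-1} (1-\theta)^{\beta-1} \\
\text{(4)}\quad & p(\theta \mid k) \propto \theta^{k+\alpha-1} (1-\theta)^{n-k+\beta-1}
  \ \Longrightarrow\ p(\theta \mid k) = \mathrm{Beta}(\theta;\ \alpha + k,\ \beta + n - k)
\end{align*}
```

The posterior has the same functional form as the prior, which is what makes the Beta a conjugate prior for the binomial.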

  13. Beta-Binomial Conjugacy
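In practice, Beta-Binomial conjugacy means the posterior is obtained by adding observed counts to the prior parameters: a Beta(α, β) prior combined with s successes and f failures yields a Beta(α + s, β + f) posterior. The sketch below illustrates this; the function names and the numbers are hypothetical, chosen only for the example.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (2.0, 2.0)  # illustrative symmetric prior
posterior = beta_binomial_update(*prior, successes=7, failures=3)
print("posterior parameters:", posterior)
print("prior mean:", beta_mean(*prior), "posterior mean:", beta_mean(*posterior))
```

Because the update is just addition, processing the data in one batch or in several smaller batches gives the same posterior.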

  14. Dirichlet-Multinomial Conjugacy • Generalization of Beta-Binomial Conjugacy
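The generalization works the same way with K categories: a Dirichlet(α) prior combined with observed category counts n yields a Dirichlet(α + n) posterior. A minimal sketch follows, with hypothetical function names and made-up counts.

```python
def dirichlet_multinomial_update(alpha, counts):
    """Return posterior Dirichlet parameters: prior pseudo-counts plus observed counts."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet(alpha) distribution: alpha normalized to sum to 1."""
    total = sum(alpha)
    return [a / total for a in alpha]


prior = [1.0, 1.0, 1.0]  # illustrative uniform prior over three categories
counts = [5, 0, 2]       # illustrative observed counts
posterior = dirichlet_multinomial_update(prior, counts)
print("posterior parameters:", posterior)
print("posterior mean:", dirichlet_mean(posterior))
```

Note that the category with zero observations still receives nonzero posterior mean, because the prior parameters act as pseudo-counts. With K = 2 the update reduces to the Beta-Binomial case.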

  15. Mixture of Multinomials Model in One Slide • Mixture model: • (Graphical model diagram with labels ci, xi,j, V, N)
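The mixture-model equation on this slide was lost. As a hedged sketch, a generic mixture over clusters can be written as follows, assuming that c_i is the cluster of document i, that x_{i,j} is the j-th feature of document i with V features per document, and that there are N documents. These readings of the diagram labels are assumptions, not confirmed by the slide.

```latex
p(x_i) = \sum_{c} p(c_i = c) \prod_{j=1}^{V} p(x_{i,j} \mid c_i = c),
\qquad i = 1, \dots, N
```

Each document is generated by first drawing a cluster and then drawing its features from that cluster's distribution; the sum marginalizes over the unobserved cluster.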

  16. Clustering Methods • Algorithms compared by Meila & Heckerman: • Probabilistic HAC • Like Ward’s method • EM = Expectation Maximization • CEM = Classification EM = “Hard EM” • Winner-take-all E step • Analogous to probabilistic k-means (using a mixture of multinomials, MM, instead of a mixture of Gaussians, MG)
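The difference between the EM and CEM E steps can be sketched for a mixture of multinomials. This is an illustration under stated assumptions, not the procedure from Meila & Heckerman: the function names, the two-cluster setup, and all probabilities below are hypothetical.

```python
import math


def log_joint(counts, log_prior, log_word_probs):
    """For each cluster c: log p(c) + sum_j counts[j] * log p(word j | c).

    The multinomial coefficient is omitted because it is the same for every cluster.
    """
    return [lp + sum(n * lw for n, lw in zip(counts, lws))
            for lp, lws in zip(log_prior, log_word_probs)]


def soft_e_step(counts, log_prior, log_word_probs):
    """EM: posterior probability of each cluster for one document."""
    scores = log_joint(counts, log_prior, log_word_probs)
    top = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def hard_e_step(counts, log_prior, log_word_probs):
    """CEM / hard EM: winner-take-all assignment to the most probable cluster."""
    scores = log_joint(counts, log_prior, log_word_probs)
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if c == best else 0.0 for c in range(len(scores))]


# Illustrative model: 2 clusters over a 3-word vocabulary.
log_prior = [math.log(0.5), math.log(0.5)]
log_word_probs = [[math.log(p) for p in (0.7, 0.2, 0.1)],
                  [math.log(p) for p in (0.1, 0.2, 0.7)]]
doc_counts = [3, 1, 0]  # word counts for one document

print("soft:", soft_e_step(doc_counts, log_prior, log_word_probs))
print("hard:", hard_e_step(doc_counts, log_prior, log_word_probs))
```

The soft E step keeps fractional responsibilities for every cluster, while the hard E step collapses them to a single assignment, which is the sense in which CEM resembles k-means.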

  17. Next • EM!