This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
CS 679: Text Mining
Lecture #5: Conjugate Priors
Slides by Eric Ringger
Announcements
• Reading Report #3: Meila & Heckerman on EM
  • Still discussing fundamental ideas
• Potential Publications
  • Due today
  • In preparation for pre-proposal
  • This week: rank and select with me
• Reading Report #4: Russell & Norvig 14.5
  • Due: Wednesday
Objectives
• Introduce the Beta and Dirichlet distributions
• Explain the idea of a “conjugate prior”
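Probability Simplex
Q: What space do the parameters of a categorical or multinomial distribution live in?
A: The probability simplex.
• 1-D simplex / 2 parameters (Bernoulli / binomial): a line segment
• 2-D simplex / 3 parameters: a triangle
• 3-D simplex / 4 parameters: a tetrahedron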
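To make the simplex constraint concrete, here is a minimal sketch of a membership check: a parameter vector is valid when every entry is non-negative and the entries sum to one. The function name `on_simplex` and the tolerance are illustrative assumptions, not something from the slides.

```python
import math


def on_simplex(theta, tol=1e-9):
    """Return True if theta is a valid categorical/multinomial parameter vector."""
    # Every probability must be non-negative.
    nonnegative = all(t >= 0.0 for t in theta)
    # The probabilities must sum to one (up to floating-point tolerance).
    sums_to_one = math.isclose(sum(theta), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one


print(on_simplex([0.3, 0.7]))        # 2 parameters: a point on the line segment
print(on_simplex([0.2, 0.3, 0.5]))   # 3 parameters: a point in the triangle
print(on_simplex([0.6, 0.6]))        # sums to 1.2, so not on the simplex
```

Because the entries must sum to one, a vector with K parameters has only K - 1 free dimensions, which is why 2 parameters live on a 1-D simplex.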
Beta Distribution
• Parameters α, β determine the form of the density
• On the 1-D simplex
• For these examples, we have a symmetric Beta with α = β
[Figure: three plots of the density f(θ); parameter values not recovered]
Preliminaries: Functions Credit: Wikipedia
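The slide images did not survive, so the following is an inference: the functions previewed here are most likely the Gamma and Beta functions, which normalize the Beta and Dirichlet densities. Under that assumption, their standard definitions are:

```latex
\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt,
\qquad \Gamma(n) = (n-1)! \ \text{for positive integers } n

B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1-t)^{\beta-1}\, dt
                 = \frac{\Gamma(\alpha)\, \Gamma(\beta)}{\Gamma(\alpha + \beta)}
```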
Beta Distribution Revisited
• Parameters α, β determine the form of the density
• On the 1-D simplex
• For these examples, we have a symmetric Beta with α = β
[Figure: three plots of the density f(θ); parameter values not recovered]
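As a rough illustration of how the parameters shape the density, the sketch below evaluates the standard Beta density with `math.lgamma` for a few symmetric settings. The parameter values (0.5, 1.0, 5.0) are illustrative choices, not the values plotted on the slide, and `theta` is an assumed name for the point on the 1-D simplex.

```python
import math


def beta_pdf(theta, alpha, beta):
    """Beta(alpha, beta) density at theta, for 0 < theta < 1."""
    # log B(alpha, beta) = log Gamma(alpha) + log Gamma(beta) - log Gamma(alpha + beta)
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_density = (
        (alpha - 1) * math.log(theta)
        + (beta - 1) * math.log(1 - theta)
        - log_norm
    )
    return math.exp(log_density)


# Symmetric case: alpha == beta (values chosen only for illustration).
for a in (0.5, 1.0, 5.0):
    values = [beta_pdf(t, a, a) for t in (0.1, 0.5, 0.9)]
    print(a, [round(v, 3) for v in values])
```

In the symmetric case the density is U-shaped for α = β < 1, uniform for α = β = 1, and peaked at 0.5 for α = β > 1.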
Dirichlet Distribution
[Figure: probability density of the Dirichlet distribution when K=3 for various parameter vectors α. Panels: α=(6, 2, 2), α=(3, 7, 5), α=(6, 2, 6), α=(2, 3, 4).] Credit: Wikipedia
Symmetric Dirichlet
[Figure: how the log of the density function changes when K=3 as we change the vector α from α=(0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual α_i equal to one another.] Credit: Wikipedia
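Dirichlet Distribution
• Density function of the Dirichlet distribution of order K:
  $f(x_1, \dots, x_K; \alpha_1, \dots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$
• Where:
  • $x_i \ge 0$ and $\sum_{i=1}^{K} x_i = 1$
  • $\alpha_i > 0$
• Denominator: $B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$
Credit: Wikipedia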
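The density above can be evaluated in log space, which avoids overflow in the Gamma terms. This is a minimal sketch using only the standard library; the function name `dirichlet_log_pdf` is an assumption, and the sketch restricts x to the interior of the simplex (all x_i > 0) so that the logarithms are defined.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log density of Dirichlet(alpha) at x, for x strictly inside the simplex."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("every alpha_i must be positive")
    if any(xi <= 0 for xi in x) or not math.isclose(sum(x), 1.0, abs_tol=1e-9):
        raise ValueError("x must have positive entries that sum to one")
    # log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)
    log_b = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    # log f(x) = sum_i (alpha_i - 1) log x_i - log B(alpha)
    return sum((a - 1) * math.log(xi) for a, xi in zip(alpha, x)) - log_b


# With alpha = (1, 1, 1) the density is uniform over the simplex.
print(math.exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])))
```

With K = 2 the same function reduces to the Beta density, since B(α) becomes the two-argument Beta function.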
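Beta-Binomial Conjugacy
1. Assume a binomial model of our data (k successes in n trials): $P(k \mid \theta) = \binom{n}{k} \theta^{k} (1-\theta)^{n-k}$
2. Given data k, use Bayes' law to update the model: $P(\theta \mid k) = \frac{P(k \mid \theta)\, P(\theta)}{P(k)}$
3. Employ a prior distribution over θ with parameters α, β: $P(\theta \mid \alpha, \beta) = \frac{\theta^{\alpha-1} (1-\theta)^{\beta-1}}{B(\alpha, \beta)}$
4. New version of (2): $P(\theta \mid k, \alpha, \beta) \propto \theta^{k+\alpha-1} (1-\theta)^{n-k+\beta-1}$, i.e. a Beta(α + k, β + n - k) density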
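The four steps above collapse to a simple parameter update: a Beta(α, β) prior combined with k successes in n binomial trials gives a Beta(α + k, β + n - k) posterior. The sketch below shows that update; the function names and the example numbers are illustrative assumptions.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return the posterior Beta parameters after observing binomial data."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    failures = trials - successes
    # Conjugacy: observed counts are simply added to the prior parameters.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (2.0, 2.0)  # illustrative prior, not taken from the slides
posterior = beta_binomial_update(*prior, successes=7, trials=10)
print(posterior)              # (9.0, 5.0)
print(beta_mean(*posterior))  # posterior mean, pulled from 0.7 toward the prior mean 0.5
```

Because the posterior is again a Beta distribution, the prior parameters can be read as pseudo-counts of successes and failures seen before the data.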
Dirichlet-Multinomial Conjugacy • Generalization of Beta-Binomial Conjugacy
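A sketch of the generalization, assuming the standard argument: write θ for the multinomial parameter vector (the variable the Dirichlet density is defined over) and $\mathbf{n} = (n_1, \dots, n_K)$ for the observed category counts. These symbol names are assumptions, since the slide's own notation did not survive.

```latex
\begin{aligned}
P(\theta \mid \alpha) &= \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}
  && \text{(Dirichlet prior)} \\
P(\mathbf{n} \mid \theta) &\propto \prod_{k=1}^{K} \theta_k^{n_k}
  && \text{(multinomial likelihood)} \\
P(\theta \mid \mathbf{n}, \alpha) &\propto \prod_{k=1}^{K} \theta_k^{n_k + \alpha_k - 1}
  && \Rightarrow\ \theta \mid \mathbf{n}, \alpha \sim \mathrm{Dir}(\alpha + \mathbf{n})
\end{aligned}
```

With K = 2 this is exactly the Beta-Binomial update: the observed counts are added to the prior parameters.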
Mixture of Multinomials Model in One Slide
Mixture model: [equation not recovered]
[Figure: graphical model with nodes c_i and x_{i,j} and plates labeled V and N]
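The slide's equation did not survive, so the following is a sketch under the assumption that the model takes the usual mixture form: document i is generated by first drawing a cluster c_i, then drawing its word counts x_i from that cluster's multinomial, giving P(x_i) = Σ_c P(c) P(x_i | c). The names `weights` and `thetas` are hypothetical, and every word probability is assumed to be positive.

```python
import math


def log_multinomial(counts, theta):
    """Log probability of a count vector under a multinomial with parameters theta."""
    n = sum(counts)
    # log of the multinomial coefficient n! / (c_1! ... c_V!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(t) for c, t in zip(counts, theta) if c > 0)


def log_mixture_likelihood(counts, weights, thetas):
    """Log of sum_c P(c) * P(counts | c), computed with the log-sum-exp trick."""
    terms = [
        math.log(w) + log_multinomial(counts, theta)
        for w, theta in zip(weights, thetas)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Two clusters over a three-word vocabulary (toy numbers for illustration).
weights = [0.4, 0.6]
thetas = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
print(log_mixture_likelihood([3, 1, 0], weights, thetas))
```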
Clustering Methods
• Algorithms compared by Meila & Heckerman:
  • Probabilistic HAC
    • Like Ward’s method
  • EM = Expectation Maximization
  • CEM = Classification EM = “Hard EM”
    • Winner-take-all E step
    • Analogous to probabilistic k-means (using a mixture of multinomials instead of a mixture of Gaussians)
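The difference between EM and CEM shows up in the E step. As a minimal sketch, suppose `log_joint[c]` holds log P(c) + log P(x | c) for one document x and each cluster c (a hypothetical input, computed elsewhere). EM keeps the full posterior over clusters, while CEM assigns the document entirely to the best cluster.

```python
import math


def soft_e_step(log_joint):
    """EM: posterior responsibilities P(c | x), normalized in log space."""
    m = max(log_joint)
    unnormalized = [math.exp(v - m) for v in log_joint]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def hard_e_step(log_joint):
    """CEM ("hard EM"): winner-take-all assignment to the most probable cluster."""
    best = max(range(len(log_joint)), key=lambda c: log_joint[c])
    return [1.0 if c == best else 0.0 for c in range(len(log_joint))]


# Toy joint scores for one document and three clusters (illustrative only).
log_joint = [math.log(0.02), math.log(0.06), math.log(0.02)]
print(soft_e_step(log_joint))  # fractional memberships that sum to one
print(hard_e_step(log_joint))  # [0.0, 1.0, 0.0]
```

In both variants the M step then re-estimates the cluster parameters from these (fractional or one-hot) memberships.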
Next • EM!