
Analysis of Social Media MLD 10-802, LTI 11-772


Presentation Transcript


  1. Analysis of Social Media MLD 10-802, LTI 11-772 • William Cohen • 10-09-010

  2. Stochastic blockmodel graphs • Last week: spectral clustering • Theory suggests it will work for graphs produced by a particular generative model • Question: can you directly maximize Pr(structure,parameters|data) for that model?

  3. Outline • Stochastic block models & inference question • Review of text models • Mixture of multinomials & EM • LDA and Gibbs (or variational EM) • Block models and inference • Mixed-membership block models • Multinomial block models and inference w/ Gibbs • Bestiary of other probabilistic graph models • Latent-space models, exchangeable graphs, p1, ERGM

  4. Review – supervised Naïve Bayes • Naïve Bayes Model: Compact representation • [Plate diagrams: class C generating words W1 … WN for each of M documents, shown both expanded and in plate form, with parameter β]

  5. Review – supervised Naïve Bayes • Multinomial Naïve Bayes • For each document d = 1,…,M • Generate Cd ~ Mult(· | π) • For each position n = 1,…,Nd • Generate wn ~ Mult(· | β, Cd) • [Plate diagram: C, words W1 … WN, M documents, parameter β]

  6. Review – supervised Naïve Bayes • Multinomial naïve Bayes: Learning • Maximize the log-likelihood of observed variables w.r.t. the parameters: • Convex function: global optimum • Solution:
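
  The log-likelihood and its maximizing solution on this slide were shown as images. For reference, the standard closed-form MLE for multinomial Naïve Bayes (writing π for the class prior and β for the per-class word distributions, symbols assumed here; the original slide may also include smoothing terms) is:

      π̂_c = #{d : Cd = c} / M
      β̂_{w|c} = (Σ_{d : Cd = c} count of w in d) / (Σ_{d : Cd = c} Nd)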

  7. Review – unsupervised Naïve Bayes • Mixture model: unsupervised naïve Bayes model • Joint probability of words and classes: • But classes are not visible • [Plate diagram: latent class Z generating words W; N words per document, M documents, parameter β]

  8. Review – unsupervised Naïve Bayes • Mixture model: learning • Not a convex function • No global optimum solution • Solution: Expectation Maximization • Iterative algorithm • Finds local optimum • Guaranteed to maximize a lower-bound on the log-likelihood of the observed data

  9. Review – unsupervised Naïve Bayes • Mixture model: EM solution • E-step: • M-step: • Key capability: estimate the distribution of latent variables given observed variables
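
  The E-step and M-step formulas on this slide were images in the original. A standard version for the mixture-of-multinomials model, writing γ_{d,c} for the posterior responsibility of class c for document d (symbols assumed here), is:

      E-step:  γ_{d,c} = Pr(Zd = c | wd) ∝ π_c · Π_{n=1..Nd} β_{w_{d,n}|c}
      M-step:  π_c = (1/M) Σ_d γ_{d,c},   β_{w|c} ∝ Σ_d γ_{d,c} · (count of w in d)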

  10. Review - LDA

  11. Review - LDA • Motivation • Assumptions: 1) documents are i.i.d. 2) within a document, words are i.i.d. (bag of words) • For each document d = 1,…,M • Generate θd ~ D1(…) • For each word n = 1,…,Nd • Generate wn ~ D2(· | θd,n) • Now pick your favorite distributions for D1, D2 • [Plate diagram: θ, w; N words per document, M documents]

  12. “Mixed membership” • Latent Dirichlet Allocation • For each document d = 1,…,M • Generate θd ~ Dir(· | α) • For each position n = 1,…,Nd • Generate zn ~ Mult(· | θd) • Generate wn ~ Mult(· | βzn) • [Plate diagram: α, θ, z, w; N positions per document, M documents; K topic distributions β]
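
  As an illustration of the generative process on this slide (not code from the lecture), here is a minimal Python sketch; the function name, toy dimensions, and the symmetric Dirichlet prior on the topic-word distributions β are assumptions:

    import numpy as np

    def generate_lda_corpus(M=100, K=10, V=1000, N=50, alpha=0.1, eta=0.01, seed=0):
        """Sample a toy corpus from the LDA generative process.

        M: documents, K: topics, V: vocabulary size, N: words per document,
        alpha/eta: Dirichlet hyperparameters (assumed symmetric here).
        """
        rng = np.random.default_rng(seed)
        # one word distribution beta_k per topic
        beta = rng.dirichlet(np.full(V, eta), size=K)            # K x V
        corpus = []
        for _ in range(M):
            theta = rng.dirichlet(np.full(K, alpha))              # topic mixture theta_d
            z = rng.choice(K, size=N, p=theta)                    # topic z_n per position
            w = np.array([rng.choice(V, p=beta[k]) for k in z])   # word w_n from topic z_n
            corpus.append((z, w))
        return beta, corpus

    beta, corpus = generate_lda_corpus()

  Running it returns the K topic-word distributions and, per document, the sampled topic assignments and word ids.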

  13. LDA’s view of a document

  14. LDA topics

  15. Review - LDA • Latent Dirichlet Allocation • Parameter learning: • Variational EM • Numerical approximation using lower-bounds • Results in biased solutions • Convergence has numerical guarantees • Gibbs Sampling • Stochastic simulation • unbiased solutions • Stochastic convergence

  16. Review - LDA • Gibbs sampling • Applicable when the joint distribution is hard to work with directly, but each variable’s conditional distribution given the others is known • The sequence of samples comprises a Markov chain • The stationary distribution of the chain is the joint distribution • Key capability: estimate the distribution of one latent variable given the other latent variables and the observed variables.

  17. Why does Gibbs sampling work? • What’s the fixed point? • The stationary distribution of the chain is the joint distribution • When will it converge (in the limit)? • When the graph defined by the chain is connected • How long will it take to converge? • Depends on the second eigenvalue of the chain’s transition matrix

  18. Called “collapsed Gibbs sampling” since you’ve marginalized away some variables • From: Parameter estimation for text analysis – Gregor Heinrich
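
  The derivation on this slide (from Heinrich’s tutorial) appeared as an image. The resulting collapsed Gibbs conditional for LDA, with symmetric hyperparameters α and η assumed here, has the standard form:

      Pr(z_{d,n} = k | all other z’s, w) ∝ (n_{d,k} + α) · (n_{k,w_{d,n}} + η) / (n_{k,·} + V·η)

  where n_{d,k} counts words in document d assigned to topic k, n_{k,w} counts assignments of word w to topic k, n_{k,·} is the total number of words assigned to topic k, V is the vocabulary size, and all counts exclude the token currently being resampled.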

  19. Review - LDA “Mixed membership” • Latent Dirichlet Allocation • Randomly initialize each zm,n • Repeat for t = 1, … • For each doc m, word n • Find Pr(zm,n = k | other z’s) • Sample zm,n according to that distribution • [Plate diagram: α, θ, z, w; N, M]
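
  Putting slides 18–19 together, here is a minimal collapsed Gibbs sampler sketch in Python; it is an illustration under the assumptions above (symmetric α and η, fixed iteration count), not the lecture’s code:

    import numpy as np

    def collapsed_gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
        """Collapsed Gibbs sampling for LDA. docs: list of word-id lists."""
        rng = np.random.default_rng(seed)
        n_dk = np.zeros((len(docs), K))   # words in doc d assigned to topic k
        n_kw = np.zeros((K, V))           # assignments of word w to topic k
        n_k = np.zeros(K)                 # total words assigned to topic k
        z = [rng.integers(K, size=len(doc)) for doc in docs]   # random init

        for d, doc in enumerate(docs):    # counts for the initial assignment
            for n, w in enumerate(doc):
                k = z[d][n]
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for n, w in enumerate(doc):
                    k = z[d][n]
                    # remove this token's current assignment from the counts
                    n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                    # Pr(z = k | other z's, w): doc-topic term times topic-word term
                    p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
                    k = rng.choice(K, p=p / p.sum())
                    # add the new assignment back and record it
                    n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
                    z[d][n] = k
        return z, n_dk, n_kw

  Each sweep removes a token’s assignment from the counts, computes Pr(zm,n = k | other z’s) from the remaining counts, and samples a new assignment, as described on the slide above.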

  20. Outline • Stochastic block models & inference question • Review of text models • Mixture of multinomials & EM • LDA and Gibbs (or variational EM) • Block models and inference • Mixed-membership block models • Multinomial block models and inference w/ Gibbs • Bestiary of other probabilistic graph models • Latent-space models, exchangeable graphs, p1, ERGM

  21. Statistical Models of Networks • Want a generative probabilistic model that’s amenable to analysis… • … but more expressive than Erdos-Renyi • One approach: exchangeable graph model • Exchangeable: X1, X2 are exchangeable if Pr(X1,X2,W) = Pr(X2,X1,W) • A generalization of i.i.d.-ness • It’s a Bayesian thing

  22. Review - LDA • Motivation • Assumptions: 1) documents are i.i.d. 2) within a document, words are i.i.d. (bag of words) • For each document d = 1,…,M • Generate θd ~ D1(…) • For each word n = 1,…,Nd • Generate wn ~ D2(· | θd,n) • Docs and words are exchangeable. • [Plate diagram: θ, w; N words per document, M documents]

  23. Stochastic Block models: assume 1) nodes within a block z and 2) edges between blocks zp, zq are exchangeable • [Plate diagram: block assignments zp, zq and edge indicators apq; N nodes, N² node pairs, with block and edge parameters]

  24. Stochastic Block models: assume 1) nodes within a block z and 2) edges between blocks zp, zq are exchangeable • Gibbs sampling: • Randomly initialize zp for each node p • For t = 1, … • For each node p • Compute Pr(zp | other z’s) • Sample zp • [Plate diagram: block assignments zp, zq and edge indicators apq; N nodes, N² node pairs] • See: Snijders & Nowicki, 1997, Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure
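
  As a sketch of the Gibbs sampler described on this slide (not the estimator from Snijders & Nowicki), the following Python code alternates between resampling block assignments and block-to-block edge probabilities for a directed graph; the Beta prior, the uniform prior over blocks, and all names are assumptions:

    import numpy as np

    def gibbs_sbm(A, K, a=1.0, b=1.0, iters=100, seed=0):
        """Gibbs sampler sketch for a simple directed stochastic blockmodel.

        A: N x N binary adjacency matrix (no self-loops), K: number of blocks,
        a, b: Beta prior on the block-to-block edge probabilities eta.
        """
        rng = np.random.default_rng(seed)
        N = A.shape[0]
        z = rng.integers(K, size=N)                 # random initial block assignment
        eta = rng.beta(a, b, size=(K, K))           # edge prob from block k to block l

        for _ in range(iters):
            # resample each node's block given eta and the other assignments
            for p in range(N):
                mask = np.arange(N) != p
                logp = np.zeros(K)
                for k in range(K):
                    out_p = eta[k, z]               # prob of edge p -> q for each q
                    in_p = eta[z, k]                # prob of edge q -> p for each q
                    logp[k] = np.sum(A[p, mask] * np.log(out_p[mask]) +
                                     (1 - A[p, mask]) * np.log(1 - out_p[mask]))
                    logp[k] += np.sum(A[mask, p] * np.log(in_p[mask]) +
                                      (1 - A[mask, p]) * np.log(1 - in_p[mask]))
                pz = np.exp(logp - logp.max())
                z[p] = rng.choice(K, p=pz / pz.sum())
            # resample eta given the assignments (conjugate Beta update)
            for k in range(K):
                for l in range(K):
                    pairs = np.outer(z == k, z == l) & ~np.eye(N, dtype=bool)
                    ones = A[pairs].sum()
                    eta[k, l] = rng.beta(a + ones, b + pairs.sum() - ones)
        return z, eta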

  25. Mixed Membership Stochastic Block models • [Plate diagram: per-node membership vectors, pair-specific block indicators zp→·, z·→q, and edge indicators apq; N nodes, N² pairs] • Airoldi et al., JMLR 2008

  26. Mixed Membership Stochastic Block models

  27. Mixed Membership Stochastic Block models

  28. Parkkinen et al. paper

  29. Another mixed membership block model

  30. Another mixed membership block model • z = (zi, zj) is a pair of block ids • nz = #pairs labeled z • qz1,i = #links to i from block z1 • qz1,· = #outlinks in block z1 • δ = indicator for diagonal • M = #nodes

  31. Another mixed membership block model

  32. Another mixed membership block model

  33. Outline • Stochastic block models & inference question • Review of text models • Mixture of multinomials & EM • LDA and Gibbs (or variational EM) • Block models and inference • Mixed-membership block models • Multinomial block models and inference w/ Gibbs • Bestiary of other probabilistic graph models • Latent-space models, exchangeable graphs, p1, ERGM

  34. Exchangeable Graph Model • Defined by a 2^k x 2^k table q(b1, b2) • Draw a length-k bit string b(n), like 01101, for each node n from a uniform distribution • For each pair of nodes n, m • Flip a coin with bias q(b(n), b(m)) • If it’s heads, connect n, m • Picking the bit strings can be more complicated: • Pick a k-dimensional vector u from a multivariate normal w/ variance α and covariance β – so the ui’s are correlated • Pass each ui thru a sigmoid so it’s in [0,1] – call that pi • Pick bi using pi
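
  A minimal Python sketch of this generative process, assuming a user-supplied q table and the α/β covariance structure described above (everything else, including names and default values, is illustrative):

    import numpy as np

    def sample_exchangeable_graph(N, k, q, alpha=2.0, beta=1.0, seed=0):
        """Sketch of the exchangeable graph model on the slide above.

        q: 2^k x 2^k table of edge probabilities, indexed by the integer
        encoding of each node's length-k bit string.
        """
        rng = np.random.default_rng(seed)
        # correlated Gaussian vector per node, squashed to [0,1] by a sigmoid
        cov = np.full((k, k), beta) + np.eye(k) * (alpha - beta)
        u = rng.multivariate_normal(np.zeros(k), cov, size=N)
        p = 1.0 / (1.0 + np.exp(-u))
        bits = (rng.random((N, k)) < p).astype(int)       # b_i ~ Bernoulli(p_i)
        codes = bits @ (1 << np.arange(k))                # integer code of each bit string

        A = np.zeros((N, N), dtype=int)
        for n in range(N):
            for m in range(n + 1, N):
                # flip a coin with bias q(b(n), b(m))
                if rng.random() < q[codes[n], codes[m]]:
                    A[n, m] = A[m, n] = 1
        return A, bits

  For example, q = np.full((2**k, 2**k), 0.05) with larger entries on the diagonal gives an assortative graph in which nodes with similar bit strings are more likely to connect.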

  35. Exchangeable Graph Model • If α is big then ux, uy are really big (or small), so px, py will end up in a corner • Pick a k-dimensional vector u from a multivariate normal w/ variance α and covariance β – so the ui’s are correlated • Pass each ui thru a sigmoid so it’s in [0,1] – call that pi • Pick bi using pi • [Figure: the unit square of (px, py) values, with mass pushed toward the corners]

  36. The p1 model for a directed graph • Parameters, per node i: • θ: background edge probability • αi: “expansiveness” – how extroverted is i? • βi: “popularity” – how much do others want to be with i? • ρi: “reciprocation” – how likely is i to respond to an incoming link with an outgoing one? • A logistic-regression-like procedure can be used to fit this to data from a graph
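
  In its simplest standard form (with a single global reciprocity parameter ρ rather than the per-node ρi listed above, an assumption here), the p1 model treats each dyad (i, j) independently, with:

      Pr(no edge between i and j)  ∝ 1
      Pr(i→j only)                 ∝ exp(θ + αi + βj)
      Pr(j→i only)                 ∝ exp(θ + αj + βi)
      Pr(both i→j and j→i)         ∝ exp(2θ + αi + βj + αj + βi + ρ)

  Because the dyads are independent given the parameters, the model is log-linear, which is what makes the logistic-regression-like fitting procedure on the slide possible.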

  37. Exponential Random Graph Model • Basic idea: • Define some features of the graph (e.g., number of edges, number of triangles, …) • Build a MaxEnt-style model based on these features
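
  Concretely, the ERGM puts an exponential-family (MaxEnt) distribution over whole graphs:

      Pr(G) = exp( Σ_k θ_k · f_k(G) ) / Z(θ)

  where the f_k(G) are the chosen graph features (edge count, triangle count, …), the θ_k are their weights, and Z(θ) normalizes over all graphs on the same node set. The intractable normalizer is what makes exact inference hard and motivates MCMC-based fitting.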

  38. Latent Space Model • Each node i has a latent position in Euclidean space, z(i) • The z(i)’s are drawn from a mixture of Gaussians • The probability of interaction between i and j depends on the distance between z(i) and z(j) • Inference is a little more complicated… [Handcock & Raftery, 2007]
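
  One standard form of the latent space model (Hoff, Raftery & Handcock, 2002) makes the edge log-odds fall off with Euclidean distance:

      logit Pr(edge between i and j) = β0 − ||z(i) − z(j)||

  The clustered variant cited on the slide additionally draws the z(i) from a mixture of Gaussians, so the posterior over positions also yields a clustering of the nodes.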

  39. Outline • Stochastic block models & inference question • Review of text models • Mixture of multinomials & EM • LDA and Gibbs (or variational EM) • Block models and inference • Mixed-membership block models • Multinomial block models and inference w/ Gibbs • Bestiary of other probabilistic graph models • Latent-space models, exchangeable graphs, p1, ERGM
