
Stochastic Block Models of Mixed Membership

Stochastic Block Models of Mixed Membership. Edo Airoldi (1,2), Dave Blei (2), Steve Fienberg (1), Eric Xing (1). (1) Carnegie Mellon University; (2) Princeton University. SAMSI, High Dimensional Inference and Random Matrices, September 17th, 2006.


Presentation Transcript


  1. Stochastic Block Models of Mixed Membership • Edo Airoldi (1,2), Dave Blei (2), Steve Fienberg (1), Eric Xing (1) • (1) Carnegie Mellon University, (2) Princeton University • SAMSI, High Dimensional Inference and Random Matrices, September 17th, 2006

  2. The Scientific Problem • Protein-protein interactions in yeast • Different studies test protein interactions with different technologies (and different precision) • [Figures: interaction graphs; expression graphs]

  3. The Data: Interaction Graphs • $M$ proteins in a graph (nodes) • $M^2$ observations on pairs of proteins • Edges are random quantities, $Y[n,m]$ • Interactions are not independent • Interacting proteins form a protein complex • $T$ graphs on the same set of proteins • Partial annotations for each protein, $X[n]$ • Example: $M = 871$ nodes, $M^2 \approx 750\text{K}$ entries

  4. The Scientific Problems • What are stable protein complexes? • They perform many cellular processes • A protein may be a member of several complexes • How many are there? • How do stable protein complexes interact? • Test hypotheses (inform new analyses) • Learn complex-to-complex interaction patterns

  5. More Network Data • Disease spread • Electronic circuit • Food web • Internet • Social network

  6. An Abstraction of the Data • A collection of unipartite graphs: $G_{1:T} = (Y_{1:T}, N)$ • Integer, real, or multivariate edge weights: $Y_t = \{ Y_t[nm] : n, m \in N \}$ • Node-specific (multivariate) attributes: $X_{1:T} = \{ X_t[n] : n \in N \}$ • Partially observable $Y_{1:T}$ and $X_{1:T}$
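The abstraction on slide 6 can be sketched as a small container type. This is only an illustration under stated assumptions: the class name `GraphCollection`, its field names, and the convention that a missing dictionary key means "not observed" are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Tuple

Node = Hashable
Pair = Tuple[Node, Node]


@dataclass
class GraphCollection:
    """T graphs over one shared node set N (hypothetical container)."""

    nodes: List[Node]
    # edges[t][(n, m)] stores Y_t[nm]; an absent key means "not observed".
    edges: List[Dict[Pair, float]] = field(default_factory=list)
    # attrs[t][n] stores X_t[n]; an absent key means "not observed".
    attrs: List[Dict[Node, Tuple[float, ...]]] = field(default_factory=list)

    def y(self, t: int, n: Node, m: Node) -> Optional[float]:
        """Return Y_t[nm], or None when that pair was not observed."""
        return self.edges[t].get((n, m))

    def observed_fraction(self, t: int) -> float:
        """Share of the |N|^2 ordered pairs that are observed in graph t."""
        return len(self.edges[t]) / (len(self.nodes) ** 2)
```

Storing only observed pairs keeps the partial observability of $Y_{1:T}$ explicit: a zero edge and a missing edge are different things in this layout.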

  7. The Challenge • Given the data abstraction and the goals of the analysis: • Can we posit a rich class of models that is instrumental for thinking about the scientific problems we face? • Is that class amenable to theoretical analysis?

  8. Modeling Ideas • Hierarchical Bayes • Latent variables encode semantic elements • Assume structure on observable-latent elements • Combination of two classes of models: (1) models of mixed membership and (2) network models (block models) = stochastic block models of mixed membership

  9. Graphical Model Representation • [Figure: graphical model with two components, stochastic blocks and mixed membership]

  10. A Hierarchical Likelihood • Mixed membership vectors (latent): $\theta_i$, $\theta_j$ • Group-to-group patterns (latent): matrix $\eta$ with entries $\eta_{gh}$, e.g., $\eta_{23} = 0.9$ • Interactions (observed): $y_{ij} = 1$ • $\Pr(y_{ij} = 1 \mid \theta_i, \theta_j, \eta) = \theta_i^{\top} \eta\, \theta_j$
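The likelihood on slide 10 is a bilinear form: the membership vector of node i, times the group-to-group matrix, times the membership vector of node j. A minimal sketch follows. The names `edge_probability`, `theta_i`, `theta_j`, and `eta` are hypothetical labels chosen here, and every matrix entry except the (2, 3) entry of 0.9 is made up for illustration.

```python
from typing import Sequence


def edge_probability(
    theta_i: Sequence[float],
    theta_j: Sequence[float],
    eta: Sequence[Sequence[float]],
) -> float:
    """Pr(y_ij = 1) as the bilinear form theta_i^T eta theta_j."""
    k = len(eta)
    return sum(
        theta_i[g] * eta[g][h] * theta_j[h]
        for g in range(k)
        for h in range(k)
    )


# Hypothetical 3-group matrix; only the entry for groups (2, 3), 0.9,
# appears on the slide. The other values are assumptions.
ETA = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.9],
    [0.1, 0.1, 0.8],
]

# Node i fully in group 2, node j fully in group 3: picks out eta_23.
print(edge_probability([0.0, 1.0, 0.0], [0.0, 0.0, 1.0], ETA))

# Node i split evenly between groups 1 and 2: a weighted average of
# eta_13 and eta_23.
print(edge_probability([0.5, 0.5, 0.0], [0.0, 0.0, 1.0], ETA))
```

With pure memberships the formula reduces to a single matrix entry, which is the ordinary stochastic block model; mixed memberships average over entries.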

  11. More Modeling Issues • Technical :: Sparsity • Introduce a parameter that modulates the relative importance of ones and zeros (binary edges) in the cost function that drives the clustering • Biological :: Ribosomes & Distress • Some protein complexes act like hubs because they are involved in, e.g., protein production or cell recovery (Y2H technology is invasive)
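One possible reading of the sparsity idea on slide 11 is a Bernoulli log-likelihood in which zeros are down-weighted relative to ones. The sketch below illustrates that reading only; the function name `weighted_log_lik` and the parameter `zero_weight` are hypothetical, and the slides do not say that the authors' cost function has exactly this form.

```python
import math
from typing import Sequence


def weighted_log_lik(
    y: Sequence[int],
    p: Sequence[float],
    zero_weight: float,
) -> float:
    """Bernoulli log-likelihood where each zero counts `zero_weight` times.

    zero_weight = 1.0 gives the usual log-likelihood; values below 1.0
    reduce the influence of the (many) absent edges. This weighting scheme
    is an assumption made for illustration.
    """
    total = 0.0
    for y_e, p_e in zip(y, p):
        if y_e == 1:
            total += math.log(p_e)
        else:
            total += zero_weight * math.log(1.0 - p_e)
    return total


# One observed edge among four pairs, all with predicted probability 0.5.
edges = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]
print(weighted_log_lik(edges, probs, zero_weight=1.0))
print(weighted_log_lik(edges, probs, zero_weight=0.1))
```

In a sparse graph the zeros vastly outnumber the ones, so without some such adjustment the fit is driven mostly by non-interactions.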

  12. Large Scale Computation • Masses of data • 750K observations in a small problem (M = 871) • 2.5M observations with M = 1578 • 3M expressions for 6K genes/proteins in yeast • Variational inference [Jordan et al., 2001] • Naïve implementation does not work • We develop a novel “nested” variational algorithm
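The first two observation counts on slide 12 follow from the M-squared rule stated earlier in the deck (one observation per ordered pair of proteins). A quick check, with a hypothetical helper name:

```python
def num_pair_observations(num_nodes: int) -> int:
    """Number of ordered pairs (n, m), self-pairs included: M squared."""
    return num_nodes ** 2


for m in (871, 1578):
    print(m, num_pair_observations(m))
# 871 -> 758641 (about 750K); 1578 -> 2490084 (about 2.5M)
```

The quadratic growth is why doubling the number of proteins roughly quadruples the data a per-pair algorithm has to touch.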

  13. Example: A Scientific Question • Do protein-protein interactions (PPI) contain information about functions? • [Figure: raw data, model, approximate posterior on membership vectors, and functional annotations; example protein YLD014W]

  14. Interactions in Yeast (MIPS) • Do PPI contain information about functions? • [Figure: example protein YLD014W, with an axis running from 1 to 15]

  15. Results: Identifiability • In this example we map latent groups to known functional categories • [Figure panels: known annotations; unknown annotations]

  16. Results: Functional Annotations

  17. Results: Mixed Membership • The estimated membership vectors support the mixed membership assumption

  18. Results: Stochastic Block Model

  19. General Bayesian Formulation • Assumptions for unipartite graphs • Population: existence of $K$ sub-populations • Latent variable: mixed membership vectors, $\theta[n] \sim D$ • Subject: exchangeable edges given blocks and memberships, $Y[nm] \sim f(\,\cdot \mid \theta[n]^{\top} \eta\, \theta[m])$ • Sampling scheme: the graphs are IID • Additional data, e.g., attributes, annotations • Integrated model formulation (descriptive/predictive)
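The formulation on slide 19 can be read as a recipe for simulating a graph. The sketch below makes two choices the slide leaves open, so treat both as assumptions: the distribution D is taken to be a Dirichlet, and f is taken to be a Bernoulli whose success probability is the block-matrix entry for a pair of group indicators drawn from the two membership vectors. Averaged over those indicators, the edge probability is the bilinear form of memberships and block matrix. All function and variable names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(rng: random.Random, alpha: Sequence[float]) -> List[float]:
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_graph(
    rng: random.Random,
    num_nodes: int,
    alpha: Sequence[float],
    eta: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Sample membership vectors and a binary adjacency matrix."""
    k = len(alpha)
    groups = range(k)
    # One mixed membership vector per node (assumed Dirichlet).
    theta = [sample_dirichlet(rng, alpha) for _ in range(num_nodes)]
    y = [[0] * num_nodes for _ in range(num_nodes)]
    for n in range(num_nodes):
        for m in range(num_nodes):
            # Each node picks the group it acts in for this particular pair.
            g = rng.choices(groups, weights=theta[n])[0]
            h = rng.choices(groups, weights=theta[m])[0]
            # Edge is Bernoulli with the group-to-group probability.
            y[n][m] = 1 if rng.random() < eta[g][h] else 0
    return theta, y


rng = random.Random(0)
block = [[0.9, 0.05], [0.05, 0.9]]  # made-up 2-group matrix
theta, y = sample_graph(rng, num_nodes=6, alpha=[0.5, 0.5], eta=block)
for row in y:
    print(row)
```

Because a node redraws its group for every pair, the same protein can behave as a member of different complexes in different interactions, which is the mixed membership assumption.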

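  20. Variational Algorithms • Nested algorithm: init ($\gamma_i\ \forall i$); while (log-lik has not converged): loop over pairs $ij$: init $\phi_{ij}$; while (log-lik has not converged): update $\phi_{ij}$; then partially update ($\gamma_i$, $\gamma_j$) • Naïve algorithm: init ($\gamma_i\ \forall i$, $\phi_{ij}\ \forall ij$); while (log-lik has not converged): update ($\phi_{ij}\ \forall ij$), then update ($\gamma_i\ \forall i$) • We trade space for time but …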
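The difference between the two schedules on slide 20 is mostly one of control flow and memory. The skeleton below shows only that control flow: the actual variational update formulas are not given on the slide, so they are passed in as caller-supplied functions. The names `gamma` (per-node parameters) and `phi` (per-pair parameters), the fixed inner iteration count, and the convergence test on an `objective` callback are all assumptions made for this sketch.

```python
from typing import Any, Callable, Dict, Hashable, Sequence, Tuple

Node = Hashable
Pair = Tuple[Node, Node]


def naive_schedule(nodes: Sequence[Node], pairs: Sequence[Pair],
                   init_gamma: Callable, init_phi: Callable,
                   update_phi: Callable, update_gamma: Callable,
                   objective: Callable, tol: float = 1e-6,
                   max_iter: int = 100) -> Dict[Node, Any]:
    """Update every pair parameter, then every node parameter, and repeat."""
    gamma = {i: init_gamma(i) for i in nodes}
    phi = {p: init_phi(p) for p in pairs}  # all pair parameters kept in memory
    prev = objective(gamma)
    for _ in range(max_iter):
        for p in pairs:
            phi[p] = update_phi(p, phi[p], gamma)
        for i in nodes:
            gamma[i] = update_gamma(i, phi)
        cur = objective(gamma)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return gamma


def nested_schedule(nodes: Sequence[Node], pairs: Sequence[Pair],
                    init_gamma: Callable, init_phi: Callable,
                    update_phi: Callable, partial_update_gamma: Callable,
                    objective: Callable, tol: float = 1e-6,
                    max_iter: int = 100, inner_iter: int = 20) -> Dict[Node, Any]:
    """Re-fit one pair at a time and fold it into the node parameters."""
    gamma = {i: init_gamma(i) for i in nodes}
    prev = objective(gamma)
    for _ in range(max_iter):
        for p in pairs:
            phi_p = init_phi(p)  # only one pair's parameters live at a time
            for _ in range(inner_iter):  # stand-in for an inner convergence test
                phi_p = update_phi(p, phi_p, gamma)
            gamma = partial_update_gamma(p, phi_p, gamma)
        cur = objective(gamma)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return gamma
```

In this sketch the naïve schedule holds a parameter for every pair at once, while the nested schedule recomputes each pair's parameter from scratch and discards it, which costs extra computation per sweep but needs far less memory.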

  21. Variational Algorithms for MMSB • [Figure panels: nested vs. naïve] • On a single machine, we empirically observed faster convergence (which offsets the extra computation) and more stable paths to convergence.

  22. Take Home Points • Bayesian formulation is integral to the biology • A novel class of models that combines mixed membership (MM) for soft clustering with network models for dependent data • Latent aspects → patterns that correlate with, and help predict, functional processes in the cell • Current implementation allows for fast inference on large matrices through variational approximation → considerable opportunity to improve both the computation and the efficiency of the approximation

  23. References • Data & problems: Gavin et al. (2002) Nature; Ho et al. (2002) Nature; Mewes et al. (2004) Nucleic Acids Research; Krogan et al. (2006) Nature • Mixed membership models: Pritchard et al. (2000); Erosheva (2002); Rosenberg et al. (2002); Blei et al. (2003); Xing et al. (2003ab); Erosheva et al. (2004); Airoldi et al. (2005); Blei & Lafferty (2006); Xing et al. (2006) • Stochastic network models: Wasserman et al. (1980, 1994, 1996); Fienberg et al. (1985); Frank & Strauss (1986); Nowicki & Snijders (2001); Hoff et al. (2002); Airoldi et al. (2006) • More material on the Web at: http://www.cs.cmu.edu/~eairoldi/ • ICML Workshop on “Statistical Network Analysis: Models, Issues and New Directions” on June 29 at Carnegie Mellon, Pittsburgh PA: http://nlg.cs.cmu.edu/
