Learning Module Networks • Eran Segal, Stanford University • Joint work with: Aviv Regev (Harvard), Nir Friedman (Hebrew U.), Dana Pe’er (Hebrew U.), Daphne Koller (Stanford)
Learning Bayesian Networks [Figure: data table with columns MSFT, INTL, NVLS, MOT] • Density estimation • Model data distribution in population • Probabilistic inference: • Prediction • Classification • Dependency structure • Interactions between variables • Causality • Scientific discovery
Stock Market [Figure: daily prices of MSFT, DELL, INTL, NVLS and MOT, Jan.’02 – Jan.’03, and a Bayesian network over the same five stocks] • Learn dependency of stock prices as a function of: • Global influencing factors • Sector influencing factors • Price of other major stocks
Stock Market [Figure: fragment of the learned BN] • 4411 stocks (variables) • 273 trading days (instances), Jan.’02 – Mar.’03 • Problems: • Statistical robustness • Interpretability
Key Observation • Many stocks depend on the same influencing factors in much the same way • Example: Intel, Novellus, Motorola and Dell depend on the price of Microsoft • Many other domains have similar characteristics: • Gene expression • Collaborative filtering • Computer network performance • …
The Module Network Idea [Figure: a Bayesian network over MSFT, MOT, DELL, INTL, AMAT, HPQ with one CPD per variable (CPD 1–CPD 6), next to a module network in which Module I (MSFT), Module II (MOT, DELL, INTL) and Module III (AMAT, HPQ) each have a single shared CPD (CPD 1–CPD 3)]
Problems and Solutions • Statistical robustness: share parameters and dependencies between variables with similar behavior • Interpretability: explicit modeling of modular structure
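A rough parameter count for the six-stock example illustrates why sharing helps robustness. This sketch assumes linear Gaussian CPDs (k weights, an intercept and a variance for k parents), which the slides do not specify, and assumes the Bayesian network uses the same parent sets as the module network, variable by variable.

```python
def cpd_params(num_parents: int) -> int:
    """Free parameters of one linear Gaussian CPD (assumed CPD family):
    one weight per parent, plus an intercept and a variance."""
    return num_parents + 2

# Bayesian network: every variable has its own CPD (toy parent sets).
bn_parents = {
    "MSFT": [],
    "MOT": ["MSFT"], "DELL": ["MSFT"], "INTL": ["MSFT"],
    "AMAT": ["DELL", "INTL"], "HPQ": ["DELL", "INTL"],
}

# Module network: one shared CPD template per module.
module_parents = {"MI": [], "MII": ["MSFT"], "MIII": ["DELL", "INTL"]}

bn_total = sum(cpd_params(len(p)) for p in bn_parents.values())
module_total = sum(cpd_params(len(p)) for p in module_parents.values())
print(f"Bayesian network: {bn_total} parameters")   # 19
print(f"Module network:   {module_total} parameters")  # 9
```

With thousands of stocks grouped into a few hundred modules, the same counting gives roughly one CPD per module instead of one per variable.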
Outline • Module Network • Probabilistic model • Learning the model • Experimental results
Module Network Components • Module assignment function: A(MSFT) = MI; A(MOT) = A(DELL) = A(INTL) = MII; A(AMAT) = A(HPQ) = MIII • Set of parents for each module: Pa(MI) = ∅; Pa(MII) = {MSFT}; Pa(MIII) = {DELL, INTL} • CPD template for each module [Figure: Module I = {MSFT}, Module II = {MOT, INTL, DELL}, Module III = {AMAT, HPQ}]
Ground Bayesian Network • A module network induces a ground BN over X • A module network defines a coherent probability distribution over X if the ground BN is acyclic [Figure: the module network (Modules I–III) and its ground BN over MSFT, MOT, INTL, DELL, AMAT, HPQ]
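The three components, and the ground BN they induce, fit in two dictionaries. This minimal sketch uses the running six-stock example; the names `assignment`, `module_parents` and `ground_bn` are illustrative, not from the authors' code, and CPD templates are omitted because only the structure matters here. Each variable simply inherits the parent set of its module.

```python
# Module assignment function A (hypothetical encoding of the running example).
assignment = {
    "MSFT": "MI",
    "MOT": "MII", "DELL": "MII", "INTL": "MII",
    "AMAT": "MIII", "HPQ": "MIII",
}
# Parent set of each module.
module_parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}

def ground_bn(assignment, module_parents):
    """Ground BN: every variable gets the parent set of the module it belongs to."""
    return {x: set(module_parents[m]) for x, m in assignment.items()}

for child, parents in ground_bn(assignment, module_parents).items():
    print(child, "<-", sorted(parents))
```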
Module Graph • Nodes correspond to modules • Mi → Mj if at least one variable in Mi is a parent of Mj • Theorem: The ground BN is acyclic if the module graph is acyclic • Acyclicity checked efficiently using the module graph [Figure: module graph MI → MII → MIII for the running example]
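The module graph and its acyclicity check can be sketched with Python's standard `graphlib` module (Python 3.9+): preparing a topological sort raises `CycleError` exactly when the graph has a cycle. The helper names are illustrative. In the toy example, moving DELL into Module III creates a self-loop MIII → MIII, because DELL is one of Module III's parents.

```python
from graphlib import CycleError, TopologicalSorter

def module_graph(assignment, module_parents):
    """Edge Mi -> Mj whenever some variable assigned to Mi is a parent of Mj.
    Stored as {Mj: set of predecessor modules}, the mapping graphlib expects."""
    return {m: {assignment[p] for p in ps} for m, ps in module_parents.items()}

def is_acyclic(assignment, module_parents):
    """By the slide's theorem, an acyclic module graph implies an acyclic ground BN."""
    try:
        TopologicalSorter(module_graph(assignment, module_parents)).prepare()
        return True
    except CycleError:
        return False

assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
module_parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}

print(is_acyclic(assignment, module_parents))                      # True
print(is_acyclic({**assignment, "DELL": "MIII"}, module_parents))  # False
```

The check runs on a graph with one node per module, which is why it stays cheap even when there are thousands of variables.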
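Learning Overview • Given data D, find assignment function A and structure S that maximize the Bayesian score: $\mathrm{score}(S, A : D) = \log P(D \mid S, A) + \log P(S, A)$, i.e. the marginal likelihood plus the assignment/structure prior • Marginal data likelihood: $P(D \mid S, A) = \int P(D \mid S, A, \theta)\, P(\theta \mid S, A)\, d\theta$, i.e. the data likelihood integrated against the parameter prior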
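Likelihood Function • The likelihood function decomposes by modules, with one term per module: MI, MII | MSFT and MIII | DELL, INTL • Each term depends only on the sufficient statistics of (X, Y), where X is a variable in the module and Y its parents, accumulated over all instances (Instance 1, 2, 3, …) and all variables in the module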
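The module-level likelihood can be illustrated for Module II under an assumption the slide does not make: a shared linear Gaussian CPD X ~ N(a + b·MSFT, σ²) for every X in {MOT, DELL, INTL}. Because the three variables share one CPD, their (child, parent) pairs are pooled into a single set of sums, and the maximum-likelihood log-likelihood of the whole module follows from those sums alone. This is the ML likelihood, not the Bayesian marginal likelihood used in the score, and the data are made up.

```python
import math

def pooled_stats(data, module_vars, parent):
    """Sufficient statistics of (child X, parent U), pooled over every instance
    and every variable in the module, since they all share one CPD."""
    n = su = sx = suu = sux = sxx = 0.0
    for row in data:                 # one row = one instance (e.g. a trading day)
        u = row[parent]
        for var in module_vars:
            x = row[var]
            n += 1
            su += u; sx += x
            suu += u * u; sux += u * x; sxx += x * x
    return n, su, sx, suu, sux, sxx

def module_loglik(stats):
    """Max log-likelihood of the shared CPD X ~ N(a + b*U, var), from sums only."""
    n, su, sx, suu, sux, sxx = stats
    b = (n * sux - su * sx) / (n * suu - su * su)
    a = (sx - b * su) / n
    # Residual sum of squares expanded in terms of the pooled sums.
    rss = sxx + n * a * a + b * b * suu - 2 * a * sx - 2 * b * sux + 2 * a * b * su
    var = rss / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

# Toy instances (made-up numbers): Module II = {MOT, DELL, INTL}, parent MSFT.
data = [
    {"MSFT": 1.0, "MOT": 0.9, "DELL": 1.2, "INTL": 1.1},
    {"MSFT": 2.0, "MOT": 2.1, "DELL": 1.8, "INTL": 2.3},
    {"MSFT": 3.0, "MOT": 2.7, "DELL": 3.2, "INTL": 3.1},
]
stats = pooled_stats(data, ["MOT", "DELL", "INTL"], "MSFT")
print(round(module_loglik(stats), 3))
```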
Bayesian Score Decomposition • The Bayesian score decomposes by modules: the term for module j depends only on module j's variables and module j's parents • Example: deleting INTL from the parents of Module III changes only Module III's term
Bayesian Score Decomposition • The Bayesian score decomposes by modules • Example: changing A(MOT) from 2 (Module II) to 1 (Module I) changes only the terms of Modules I and II
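Because the score is a sum of per-module terms, a learner can cache one term per module and recompute only what an operator touches. In this sketch, `DecomposedScore` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for the real Bayesian module score. The call log shows that deleting a parent of Module III rescores one module, while moving MOT from Module II to Module I rescores two.

```python
class DecomposedScore:
    """Keeps one cached score term per module; the total score is their sum."""

    def __init__(self, assignment, module_parents, module_score):
        self.assignment = dict(assignment)
        self.parents = {m: set(ps) for m, ps in module_parents.items()}
        self.module_score = module_score  # hypothetical per-module score function
        self.cache = {m: self._score(m) for m in self.parents}

    def _score(self, m):
        members = sorted(x for x, a in self.assignment.items() if a == m)
        return self.module_score(members, sorted(self.parents[m]))

    def total(self):
        return sum(self.cache.values())

    def delete_parent(self, m, parent):
        """Structure change: only module m's term is recomputed."""
        self.parents[m].discard(parent)
        self.cache[m] = self._score(m)

    def reassign(self, var, new_m):
        """Assignment change Mi -> Mj: only the terms of Mi and Mj are recomputed."""
        old_m = self.assignment[var]
        self.assignment[var] = new_m
        for m in (old_m, new_m):
            self.cache[m] = self._score(m)


calls = []  # log of which modules (by member list) were rescored

def toy_score(members, parents):
    """Stand-in score, NOT the Bayesian score: penalizes members and parents."""
    calls.append(tuple(members))
    return -float(len(members) * (len(parents) + 1))


assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
module_parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}

s = DecomposedScore(assignment, module_parents, toy_score)
calls.clear()
s.delete_parent("MIII", "INTL")  # touches Module III only
s.reassign("MOT", "MI")          # touches Modules II and I only
print(s.total(), calls)
```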
Algorithm Overview • Find assignment function A and structure S that maximize the Bayesian score • Steps: find initial assignment A, then alternate between improving dependency structure S and improving assignment function A
Initial Assignment Function • Find variables that are similar across instances, e.g. A(MOT) = MII, A(INTL) = MII, A(DELL) = MII [Figure: data matrix of variables (stocks) AMAT, MOT, DELL, MSFT, INTL, HPQ against instances (trading days) x[1]–x[4], with the variables regrouped into clusters 1, 2 and 3]
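The slide only says to find variables that are similar across instances. One simple way to do that, assumed here for illustration, is k-means over each variable's profile of values across instances; the cluster index then serves as the initial module. The profiles are made-up numbers, `initial_assignment` is a hypothetical name, and the deterministic initialization (the first k variables as centers) is a simplification.

```python
import math

def initial_assignment(profiles, k, max_iters=20):
    """k-means over variables: each variable is the vector of its values across
    instances, and its cluster index becomes its initial module."""
    names = list(profiles)
    centers = [list(profiles[n]) for n in names[:k]]  # simplistic deterministic init
    assign = {}
    for _ in range(max_iters):
        # Assignment step: nearest center in Euclidean distance.
        new_assign = {
            n: min(range(k), key=lambda c: math.dist(profiles[n], centers[c]))
            for n in names
        }
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: each center moves to the mean profile of its members.
        for c in range(k):
            members = [profiles[n] for n in names if assign[n] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Made-up values of each stock on instances x[1]..x[4].
profiles = {
    "MSFT": [0.0, 0.0, 0.0, 0.0],
    "MOT": [5.0, 5.1, 4.9, 5.0],
    "AMAT": [-5.0, -5.2, -4.8, -5.0],
    "DELL": [5.2, 4.9, 5.0, 5.1],
    "INTL": [4.8, 5.0, 5.1, 4.9],
    "HPQ": [-4.9, -5.0, -5.1, -5.0],
}
print(initial_assignment(profiles, k=3))
```

On real data one would typically standardize each variable first, or cluster on correlation instead of raw distance; the slides do not say which similarity measure was used.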
Learning Dependency Structure • Heuristic search with operators: • Add/delete parent for module • Cannot reverse edges • Handle acyclicity: • Can be checked efficiently on the module graph • Efficient computation: • After applying an operator for module Mj, only update the score of operators for module Mj [Figure: candidate operators such as adding MSFT to Module II or INTL to Module I or Module III, checked against the module graph MI, MII, MIII]
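A sketch of the module-level search, assuming greedy hill-climbing (the slide says only "heuristic search"). Each operator toggles one candidate parent of one module; an addition is rejected if the module graph becomes cyclic, and after an operator is applied to module Mj only Mj's operator gains are refreshed. `greedy_structure_search` and `module_score` are hypothetical; the toy stand-in score simply rewards the parent sets of the running example.

```python
from graphlib import CycleError, TopologicalSorter

def acyclic(assignment, parents):
    """Module-graph check: Mi -> Mj if a variable in Mi is a parent of Mj."""
    graph = {m: {assignment[p] for p in ps} for m, ps in parents.items()}
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False

def greedy_structure_search(assignment, module_score, max_steps=100):
    """Greedy hill-climbing with add/delete-parent operators on modules.
    module_score(m, parents) is a hypothetical per-module score term."""
    modules = sorted(set(assignment.values()))
    candidates = sorted(assignment)  # any variable may become a parent
    parents = {m: set() for m in modules}
    cache = {m: module_score(m, frozenset()) for m in modules}
    gains = {}

    def refresh(m):
        # Score change from toggling each candidate parent of module m.
        for v in candidates:
            gains[(m, v)] = module_score(m, frozenset(parents[m] ^ {v})) - cache[m]

    for m in modules:
        refresh(m)
    for _ in range(max_steps):
        applied = False
        for (m, v), gain in sorted(gains.items(), key=lambda kv: -kv[1]):
            if gain <= 0:
                break
            new = parents[m] ^ {v}
            # Deleting a parent cannot create a cycle; an addition must be checked.
            if v in parents[m] or acyclic(assignment, {**parents, m: new}):
                parents[m] = new
                cache[m] += gain
                refresh(m)  # only operators of module m change their score
                applied = True
                break
        if not applied:
            break
    return parents

TARGET = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}

def toy_score(m, ps):
    """Stand-in score, NOT the Bayesian score: +2 per target parent, -1 otherwise."""
    return 2.0 * len(ps & TARGET[m]) - 1.0 * len(ps - TARGET[m])

assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
print(greedy_structure_search(assignment, toy_score))
```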
Learning Dependency Structure • Structure search done at module level • Parent selection • Reduced search space relative to BN • Acyclicity checking • Individual variables only used for computation of sufficient statistics
Learning Assignment Function • A(DELL)=MI • Score: 0.7 • A(DELL)=MII • Score: 0.9 • A(DELL)=MIII • Score: cyclic! (DELL is a parent of Module III, so it cannot also be a member of it) [Figure: DELL tentatively placed in Module I, Module II and Module III]
Ideal Algorithm • Learn the module assignment of all variables simultaneously
Problem • Due to acyclicity, the assignments of variables cannot be optimized separately • Example: A(DELL) = Module IV, A(MSFT) = Module III [Figure: module network with Modules I–IV and the corresponding module graph MI, MII, MIII, MIV]
Learning Assignment Function • Sequential update algorithm • Iterate over all variables • For each variable, find its optimal assignment given the current assignment to all other variables • Efficient computation • When changing assignment from Mi to Mj, only need to recompute score for modules i and j
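A sketch of the sequential update under stated assumptions: `sequential_update` and `module_score` are hypothetical names, the toy stand-in score is a negative within-module sum of squares (not the Bayesian score), and acyclicity is checked with a topological sort from Python's `graphlib`. Each candidate move from Mi to Mj recomputes only the terms for Mi and Mj, and moves that make the module graph cyclic are skipped.

```python
from graphlib import CycleError, TopologicalSorter

def acyclic(assignment, parents):
    graph = {m: {assignment[p] for p in ps} for m, ps in parents.items()}
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False

def sequential_update(assignment, parents, module_score, max_sweeps=10):
    """Move one variable at a time to its best module, holding the others fixed.
    module_score(members, parents) is a hypothetical per-module score term."""
    assignment = dict(assignment)
    modules = sorted(parents)

    def term(m):
        members = frozenset(x for x, a in assignment.items() if a == m)
        return module_score(members, parents[m])

    cache = {m: term(m) for m in modules}
    for _ in range(max_sweeps):
        changed = False
        for var in sorted(assignment):
            old = assignment[var]
            best_gain, best = 0.0, None
            for new in modules:
                if new == old:
                    continue
                assignment[var] = new             # tentative move
                if acyclic(assignment, parents):  # skip moves that create a cycle
                    terms = {old: term(old), new: term(new)}  # only Mi and Mj change
                    gain = terms[old] + terms[new] - cache[old] - cache[new]
                    if gain > best_gain:
                        best_gain, best = gain, (new, terms)
                assignment[var] = old             # undo
            if best is not None:
                assignment[var] = best[0]
                cache.update(best[1])
                changed = True
        if not changed:
            break
    return assignment

# Made-up one-dimensional values and a stand-in score (NOT the Bayesian score).
VALUE = {"MSFT": 0.0, "MOT": 5.0, "DELL": 5.2, "INTL": 4.8, "AMAT": -5.0, "HPQ": -5.2}

def toy_score(members, parents):
    vals = [VALUE[x] for x in members]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return -sum((v - mean) ** 2 for v in vals)

start = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
         "AMAT": "MIII", "HPQ": "MII"}  # HPQ starts in the "wrong" module
parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}
print(sequential_update(start, parents, toy_score))
```

DELL and INTL are never moved into Module III here, because both are parents of that module and the move would create a self-loop in the module graph.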
Learning the Model • Initialize module assignment A • Optimize structure S • Optimize module assignment A: • For each variable, find its optimal assignment given the current assignment to all other variables [Figure: stocks AMAT, MSFT, HPQ, DELL, MOT, INTL being reassigned among Modules I–III]
Related Work [Table: Bayesian networks, parameter sharing, PRMs, OOBNs and module networks compared on whether they learn structure, share structure, share parameters and learn parameter sharing; one entry cites Langseth et al.]
Outline • Module Network • Probabilistic model • Learning the model • Experimental results • Statistical validation • Case study: Gene regulation
Learning Algorithm Performance [Figure 1: variables changed (% of total, 0–50) vs. algorithm iterations (0–20), with structure-change iterations marked; Figure 2: Bayesian score (average per gene, -131 to -128) vs. algorithm iterations (0–20)]
Generalization to Test Data • Synthetic data: 10 modules, 500 variables • Best performance achieved for models with 10 modules • Gain beyond 100 instances is small [Figure: test data likelihood per instance (-800 to -450) vs. number of modules (0–200), for 25, 50, 100, 200 and 500 instances]
Structure Recovery Graph • Synthetic data: 10 modules, 500 variables • 74% of 2250 parent-child relationships recovered [Figure: recovered structure (% correct, 0–90) vs. number of modules (0–200), for 25, 50, 100, 200 and 500 instances]
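The slide does not spell out how recovery is counted. One plausible reading, assumed here, is the percentage of true parent-child pairs in the ground BN that also appear in the learned ground BN; 74% of 2250 relationships would then be about 1665 recovered pairs. In the toy example below, the hypothetical learned model misses INTL as a parent of Module III, and both models share one assignment for simplicity (in general they need not).

```python
def ground_edges(assignment, module_parents):
    """Parent -> child pairs of the ground BN induced by a module network."""
    return {(p, x) for x, m in assignment.items() for p in module_parents[m]}

def recovery_rate(true_edges, learned_edges):
    """Percentage of true parent-child relationships present in the learned model."""
    return 100.0 * len(true_edges & learned_edges) / len(true_edges)

assignment = {"MSFT": "MI", "MOT": "MII", "DELL": "MII", "INTL": "MII",
              "AMAT": "MIII", "HPQ": "MIII"}
true_parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL", "INTL"}}
learned_parents = {"MI": set(), "MII": {"MSFT"}, "MIII": {"DELL"}}  # INTL missed

true_edges = ground_edges(assignment, true_parents)
learned_edges = ground_edges(assignment, learned_parents)
rate = recovery_rate(true_edges, learned_edges)
print(f"{rate:.1f}% of {len(true_edges)} recovered")  # 71.4% of 7 recovered
```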
Stock Market • 4411 variables (stocks), 273 instances (trading days) • Comparison to Bayesian networks (cross validation) [Figure: test data log-likelihood, gain per instance over Bayesian network performance (0 to 600), vs. number of modules (0–300)]
Regulatory Networks • Learn structure of regulatory networks: • Which genes are regulated by each regulator
Gene Expression Data • Measures mRNA level for all genes in one condition • Learn dependency of the expression of genes as a function of expression of regulators [Figure: expression matrix of genes × experiments, colored from induced to repressed]
Gene Expression • 2355 variables (genes), 173 instances (arrays) • Comparison to Bayesian networks [Figure: test data log-likelihood, gain per instance over Bayesian network performance (-150 to 150), vs. number of modules (0–500)]
Biological Evaluation • Find sets of co-regulated genes (regulatory modules) • Find the regulators of each module • [Figure annotations: 46/50, 30/50] Segal et al., Nature Genetics, 2003
Experimental Design • Hypothesis: Regulator ‘X’ activates process ‘Y’ • Experiment: Knock out ‘X’ and repeat experiment [Figure: regulation program with true/false tests on HAP4 and Ypl230W, the role of Ypl230W marked ‘?’] Segal et al., Nature Genetics, 2003
Differentially Expressed Genes [Figure: time-course expression in wild type (wt) and in the Ypl230w, Kin82 and Ppt1 knockouts, sampled over 0–24 hrs. and 0–60 min.; annotated with 341, 281 and 602 differentially expressed genes at fold-change thresholds >16x, >4x and >4x] Segal et al., Nature Genetics, 2003
Biological Experiments Validation • Were the differentially expressed genes predicted as targets? • Rank modules by enrichment for differentially expressed genes • All regulators regulate predicted modules
Ypl230w (module #, module, significance): • 39 Protein folding: 7/23, 1e-4 • 29 Cell differentiation: 6/41, 2e-2 • 5 Glycolysis and folding: 5/37, 4e-2 • 34 Mitochondrial and protein fate: 5/37, 4e-2
Ppt1 (module #, module, significance): • 14 Ribosomal and phosphate metabolism: 8/32, 9e-3 • 11 Amino acid and purine metabolism: 11/53, 1e-2 • 15 mRNA, rRNA and tRNA processing: 9/43, 2e-2 • 39 Protein folding: 6/23, 2e-2 • 30 Cell cycle: 7/30, 2e-2
Kin82 (module #, module, significance): • 3 Energy and osmotic stress I: 8/31, 1e-4 • 2 Energy, osmolarity & cAMP signaling: 9/64, 6e-3 • 15 mRNA, rRNA and tRNA processing: 6/43, 2e-2
Segal et al., Nature Genetics, 2003
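Each row above reports a module's overlap with the differentially expressed genes (e.g. 7/23) and a significance value, but the slide does not name the test. Assuming a standard one-sided hypergeometric enrichment test, the p-value for seeing at least k differentially expressed genes among the n genes of a module can be sketched as follows. `enrichment_pvalue` is a hypothetical name; N (genes considered) and K (differentially expressed genes overall) must come from the experiment and are not given here, so the usage line uses small toy numbers.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: a module of n genes drawn from N genes,
    K of which are differentially expressed (one-sided enrichment test)."""
    hits = range(k, min(n, K) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)

# Toy numbers only: all 5 module genes hit, with 5 of 10 genes differentially expressed.
print(enrichment_pvalue(5, 5, 5, 10))  # equals 1/252
```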
Summary • Probabilistic model for learning modules of variables and their structural dependencies • Improved performance over Bayesian networks • Statistical robustness • Interpretability • Application to gene regulation • Reconstruction of many known regulatory modules • Prediction of targets for unknown regulators