Computational methods for inferring cellular networks II Stat 877 Apr 17th, 2014 Sushmita Roy
RECAP from last time • A regulatory network has structure and parameters • Network reconstruction • Identify structure and parameters from data • Classes of methods for network reconstruction • Per-gene vs Per-module • Sparse candidate is an example of a per-gene method • Key idea: restrict the parent set to a skeleton defined by “good” candidates • Good candidates: high mutual information OR high predictive power
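A minimal sketch of the "good candidates" idea, assuming discretized expression values and using empirical mutual information as the ranking criterion. The function names (`mutual_information`, `top_candidates`) and the toy data are hypothetical, not taken from the Sparse candidate implementation.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def top_candidates(data, target, k):
    """Return the k variables with the highest mutual information with `target`."""
    scores = {
        name: mutual_information(values, data[target])
        for name, values in data.items()
        if name != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (assumption): X3 is an exact copy of X1, X2 is only weakly related.
data = {
    "X1": [0, 0, 1, 1, 0, 1, 0, 1],
    "X2": [0, 1, 0, 1, 0, 1, 0, 1],
    "X3": [0, 0, 1, 1, 0, 1, 0, 1],
}
candidates = top_candidates(data, "X3", k=1)
```

The parent search for X3 would then be restricted to `candidates` instead of all variables.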
Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks
Module Networks • Motivation: • Most complex systems have too many variables • Not enough data to robustly learn dependencies among them • Large networks are hard to interpret • Key idea: Group similarly behaving variables into “modules” and learn parameters for each module • Relevance to gene regulatory networks • Genes that are co-expressed are likely regulated in similar ways Segal et al 2005
An expression module • Set of genes that behave similarly across conditions [Figure: gene expression heatmap with genes grouped into modules; Gasch & Eisen, 2002]
Modeling questions in Module Networks • What is the mathematical definition of a module? • All variables in a module have the same conditional probability distributions • How to model the CPD between parent and children? • Regression Tree • How to learn module networks?
Defining a Module Network • Denoted by M = (S, A, Θ) • S: Structure, specifying the parents of each module • A: Assignment of Xi to module k, A(Xi) = k • Θ: Parameters of the CPD P(Mj | PaMj), where PaMj are the parents of module Mj • Each variable Xi in Mj has the same conditional distribution
Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME
Bayesian network vs Module network • Bayesian network • CPD per random variable • Learning only requires searching for parents • Module network • CPD per module • Learning requires both parent search and module membership assignment
Learning a Module Network • Given • training dataset D = {x1, ..., xN} • number of modules K • Learn • Module assignment of each Xi to a module • CPDs Θ • The parents of each module
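Score of a Module network • The likelihood of a module network decomposes over modules: $L(M : D) = \prod_{j=1}^{K} L_j(\mathrm{Pa}_{M_j}, X^j : D)$ • $L(M : D)$: likelihood of module network $M$ given data $D$ • $L_j$: likelihood of module $j$ • $K$: number of modules, $X^j$: $j$th module, $\mathrm{Pa}_{M_j}$: parents of module $M_j$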
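The idea that the score is built from one likelihood term per module can be sketched as follows. This is a simplified illustration, not the full module network score: it assumes modules have no parents, so every variable in a module shares a single Gaussian, and it fits that Gaussian by maximum likelihood. All names and the toy data are hypothetical.

```python
import math


def gaussian_loglik(values, mean, var):
    """Log-likelihood of `values` under a Gaussian with the given mean and variance."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        for v in values
    )


def module_loglik(data, members, min_var=1e-6):
    """Log-likelihood of one module: all member variables share one Gaussian."""
    pooled = [v for name in members for v in data[name]]
    mean = sum(pooled) / len(pooled)
    var = max(sum((v - mean) ** 2 for v in pooled) / len(pooled), min_var)
    return gaussian_loglik(pooled, mean, var)


def network_loglik(data, assignment):
    """Sum of per-module log-likelihoods (a product of likelihoods in log space)."""
    modules = {}
    for name, j in assignment.items():
        modules.setdefault(j, []).append(name)
    return sum(module_loglik(data, members) for members in modules.values())


# Toy data (assumption): X1, X2 behave alike; X3, X4 behave alike.
data = {
    "X1": [0.1, -0.2, 0.0],
    "X2": [0.2, -0.1, 0.1],
    "X3": [5.0, 5.2, 4.9],
    "X4": [5.1, 4.8, 5.0],
}
good = {"X1": 0, "X2": 0, "X3": 1, "X4": 1}
bad = {"X1": 0, "X3": 0, "X2": 1, "X4": 1}
```

Grouping similarly behaving variables (`good`) gives a higher total log-likelihood than mixing them (`bad`), which is what drives module assignment during learning.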
Module initialization as clustering of variables for module network
Module re-assignment • Two requirements • Must preserve the acyclic structure • Must improve score • Perform sequential update: • Compute the delta score of moving a variable from one module to another while keeping the other variables fixed • Apply the move only if it satisfies both requirements
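A minimal sketch of the sequential update, under two loud assumptions: the score is a stand-in (negative within-module squared deviation, not the Bayesian score), and modules have no parents here, so the acyclicity check is omitted. The function names are hypothetical.

```python
def within_module_score(data, assignment):
    """Stand-in score: negative squared deviation from each module's mean profile."""
    modules = {}
    for var, j in assignment.items():
        modules.setdefault(j, []).append(data[var])
    score = 0.0
    for profiles in modules.values():
        for column in zip(*profiles):  # one column per sample
            mean = sum(column) / len(column)
            score -= sum((v - mean) ** 2 for v in column)
    return score


def reassign(data, assignment, n_modules, score=within_module_score):
    """Move one variable at a time to the module with the best positive delta score."""
    assignment = dict(assignment)
    improved = True
    while improved:
        improved = False
        for var in data:
            current = assignment[var]
            base = score(data, assignment)
            best_module, best_delta = current, 0.0
            for j in range(n_modules):
                if j == current:
                    continue
                assignment[var] = j  # tentative move, other variables fixed
                delta = score(data, assignment) - base
                if delta > best_delta + 1e-12:
                    best_module, best_delta = j, delta
            assignment[var] = best_module
            improved = improved or best_module != current
    return assignment


# Toy data (assumption): two clear groups, deliberately mixed at the start.
data = {
    "X1": [0.0, 0.0, 1.0],
    "X2": [0.0, 0.1, 1.0],
    "X3": [5.0, 5.0, 4.0],
    "X4": [5.0, 5.1, 4.0],
}
start = {"X1": 0, "X2": 1, "X3": 0, "X4": 1}
result = reassign(data, start, n_modules=2)
```

Because every accepted move strictly increases the score, the loop terminates; a real implementation would also reject moves that create a cycle among modules.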
Regression tree to capture CPD • Each path captures a mode of regulation of X3 by X1 and X2 • Expression of target modeled using Gaussians at each leaf node [Figure: regression tree for target X3 with internal tests X1 > e1 and X2 > e2, each with YES/NO branches]
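A regression-tree CPD of this kind can be sketched as a small recursive data structure. The tree shape (X2 tested only on the YES branch of X1), the thresholds, and the leaf parameters below are illustrative assumptions, not values from the slide.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    mean: float  # Gaussian mean of the target at this leaf
    var: float   # Gaussian variance of the target at this leaf


@dataclass
class Split:
    regulator: str    # name of the regulator tested at this node
    threshold: float  # test is: regulator value > threshold
    if_no: "Node"
    if_yes: "Node"


Node = Union[Leaf, Split]


def find_leaf(node, regulators):
    """Follow the YES/NO tests down to the leaf selected by the regulator values."""
    while isinstance(node, Split):
        passed = regulators[node.regulator] > node.threshold
        node = node.if_yes if passed else node.if_no
    return node


def log_density(node, regulators, target_value):
    """Gaussian log-density of the target under the leaf chosen by the regulators."""
    leaf = find_leaf(node, regulators)
    return (-0.5 * math.log(2 * math.pi * leaf.var)
            - (target_value - leaf.mean) ** 2 / (2 * leaf.var))


# Hypothetical tree for X3: thresholds e1 = e2 = 0.5, made-up leaf Gaussians.
tree = Split(
    "X1", 0.5,
    if_no=Leaf(mean=0.0, var=1.0),
    if_yes=Split("X2", 0.5, if_no=Leaf(mean=1.0, var=1.0), if_yes=Leaf(mean=3.0, var=0.5)),
)
```

Each root-to-leaf path corresponds to one mode of regulation; in a module network, the same tree is shared by every gene in the module.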
Assessing the value of using Module Networks • Generate data, D, from a known module network, Mtrue • Mtrue was in turn learned from real data • 10 modules, 500 variables • Learn a module network, M, from D • Assess M's quality using: • Test data likelihood (higher is better) • Agreement in parent-child relationships between M and Mtrue
Test data likelihood Each line type represents size of training data
Module networks perform better than simple Bayesian networks • Gain in test data likelihood over Bayesian network
Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005
The Respiration and Carbon Module Regulation tree
Global View of Modules • modules for common processes often share common • regulators • binding site motifs
Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks
Per-gene vs per-module • Per-gene methods • Precise regulatory programs per gene • No modular organization revealed/captured • Per-module methods • Modular organization-> simpler representation • Gene-specific regulatory information is lost
Can we combine the strengths of both approaches? • MERLIN: per-gene, module-constrained [Figure: example networks over regulators X1 to X4 and targets Y1, Y2, comparing the per-module, per-gene, and MERLIN representations]
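Bayesian formulation of network inference • The graph $G$ is an unknown random variable • Optimize the posterior distribution of the graph given the data: $P(G \mid D) \propto P(D \mid G)\, P(G)$ • $P(G)$: graph prior • $P(D \mid G)$: data likelihood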
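A prior to combine per-gene and per-module methods • Let $P(G)$ distribute independently over edges: $P(G) = \prod_{X_j \to X_i \in G} P(X_j \to X_i) \prod_{X_j \to X_i \notin G} \left(1 - P(X_j \to X_i)\right)$ (first product: present edges; second product: absent edges) • Define prior probability of edge presence: $P(X_j \to X_i) = \frac{1}{1 + \exp\left(-(p + r\, m_{ij})\right)}$ • $p$: graph structure complexity • $m_{ij}$: module support for an edge • $r$: module prior strength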
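A sketch of how such an edge-wise prior might be computed. It assumes a logistic form in which a sparsity term and a module-support term are added before squashing to a probability; the parameter names (`sparsity`, `module_strength`), their default values, and the helper names are hypothetical choices for illustration.

```python
import math


def edge_prior(module_support, sparsity=-5.0, module_strength=4.0):
    """Assumed logistic prior: more module support gives a higher edge probability."""
    return 1.0 / (1.0 + math.exp(-(sparsity + module_strength * module_support)))


def log_graph_prior(present, candidate_edges, support, **params):
    """Log prior of a graph whose edges are independent.

    present: set of edges in the graph; candidate_edges: all edges considered;
    support: dict mapping an edge to its module support (0.0 if missing).
    """
    total = 0.0
    for edge in candidate_edges:
        p = edge_prior(support.get(edge, 0.0), **params)
        total += math.log(p) if edge in present else math.log(1.0 - p)
    return total


# Toy example (assumption): X1 -> Y1 has full module support, X2 -> Y1 has none.
candidates = [("X1", "Y1"), ("X2", "Y1")]
support = {("X1", "Y1"): 1.0}
supported_graph = {("X1", "Y1")}
unsupported_graph = {("X2", "Y1")}
```

With a negative sparsity term, an unsupported edge is unlikely a priori, while an edge backed by the module is penalized far less; setting `module_strength=0` reduces this to a plain sparsity prior.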
Behavior of graph structure prior Probability of edge
Quantifying module support • For each candidate Xj for Xi’s regulator set
MERLIN: Learning upstream regulators of regulatory modules • Inputs: measurements from multiple conditions; candidate regulators (transcription factors, signaling proteins; examples shown: MCK1, HOG1, ATF1, RAP1) • Expression clustering gives initial modules • Iterate: update regulators using new modules; revisit modules using expression & regulatory programs • Output: final reconstructed network of regulators, modules, and targets • Roy et al., PLoS Comput Biol, 2013
MERLIN correctly infers edges between true and inferred networks on simulated data • Precision = (# of correct edges) / (# of predicted edges) • Recall = (# of correct edges) / (# of true edges) • Methods compared: GENIE3, MODNET, LINEAR-REGRESSION, MERLIN [Figure: true network vs inferred network; precision (y-axis) vs recall (x-axis) for each method]
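Precision and recall over edges can be computed directly from the two edge sets. A minimal sketch, treating each directed edge as a (regulator, target) pair; the function name and the toy networks are hypothetical.

```python
def precision_recall(true_edges, predicted_edges):
    """Edge-level precision and recall of a predicted network against the truth."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    correct = len(true_edges & predicted_edges)
    precision = correct / len(predicted_edges) if predicted_edges else 0.0
    recall = correct / len(true_edges) if true_edges else 0.0
    return precision, recall


# Toy networks (assumption): 2 of 3 predicted edges are in the true network.
true_net = {("X1", "Y1"), ("X2", "Y1"), ("X2", "Y2"), ("X3", "Y2")}
inferred_net = {("X1", "Y1"), ("X2", "Y2"), ("X4", "Y2")}
precision, recall = precision_recall(true_net, inferred_net)
```

Sweeping a confidence threshold on the inferred edges and recomputing both numbers at each threshold traces out a precision-recall curve.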
Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks
Assessing confidence in the learned network • Typically the number of training samples is not sufficient to reliably determine the “right” network • One can however estimate the confidence of specific features of the network • Graph features f(G) • Examples of f(G) • An edge between two random variables • Order relations: Is X an ancestor of Y?
How to assess confidence in graph features? • What we want is $P(f(G) \mid D)$, which is $\sum_{G} f(G)\, P(G \mid D)$ • But it is not feasible to compute this sum over all possible graphs • Instead we will use a “bootstrap” procedure
Bootstrap to assess graph feature confidence • For i = 1 to m • Construct dataset Di by sampling with replacement N samples from dataset D, where N is the size of the original D • Learn a network Bi • For each feature of interest f, calculate confidence: $\mathrm{Conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(B_i)$
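The bootstrap loop can be sketched as below, with edges as the features of interest. The network learner is deliberately a stand-in (an undirected edge whenever the absolute Pearson correlation exceeds a threshold), not a Bayesian network learner; all names, the threshold, and the simulated data are assumptions.

```python
import random


def correlation_edges(samples, threshold=0.8):
    """Stand-in learner: edge between variables with |Pearson correlation| > threshold."""
    names = sorted(samples[0])
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xs = [s[a] for s in samples]
            ys = [s[b] for s in samples]
            mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
            sxx = sum((x - mx) ** 2 for x in xs)
            syy = sum((y - my) ** 2 for y in ys)
            sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            if sxx > 0 and syy > 0 and abs(sxy) / (sxx * syy) ** 0.5 > threshold:
                edges.add((a, b))
    return edges


def bootstrap_confidence(samples, learn_edges, m=50, seed=0):
    """Fraction of the m bootstrap networks in which each edge appears."""
    rng = random.Random(seed)
    n = len(samples)
    counts = {}
    for _ in range(m):
        # Sample n rows with replacement from the original dataset.
        resampled = [samples[rng.randrange(n)] for _ in range(n)]
        for edge in learn_edges(resampled):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: count / m for edge, count in counts.items()}


# Simulated data (assumption): X2 closely follows X1, X3 is independent noise.
rng = random.Random(1)
samples = []
for _ in range(30):
    x1 = rng.gauss(0, 1)
    samples.append({"X1": x1, "X2": x1 + 0.1 * rng.gauss(0, 1), "X3": rng.gauss(0, 1)})
confidence = bootstrap_confidence(samples, correlation_edges)
```

Edges absent from `confidence` never appeared in any bootstrap network and have confidence 0; swapping `correlation_edges` for a real structure learner leaves the loop unchanged.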
Does the bootstrap confidence represent real relationships? • Compare the confidence distribution to that obtained from randomized data • Shuffle the columns of each row (gene) separately, i.e., randomize each row independently • Repeat the bootstrap procedure [Figure: expression matrix with genes as rows and experimental conditions as columns]
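The randomization step can be sketched as an independent permutation of each gene's row. This keeps every gene's own distribution of values while breaking its relationships with other genes; the function name and the toy matrix are hypothetical.

```python
import random


def shuffle_rows(matrix, seed=0):
    """Return a copy of a genes x conditions matrix with each row permuted independently."""
    rng = random.Random(seed)
    shuffled = []
    for row in matrix:
        row = list(row)   # copy so the original matrix is left untouched
        rng.shuffle(row)  # permute this gene's values across conditions
        shuffled.append(row)
    return shuffled


# Toy matrix (assumption): 3 genes measured in 5 conditions.
expression = [
    [0.1, 0.5, 0.9, 1.3, 1.7],
    [0.2, 0.6, 1.0, 1.4, 1.8],
    [2.0, 1.5, 1.0, 0.5, 0.0],
]
randomized = shuffle_rows(expression)
```

Running the bootstrap on `randomized` gives the null distribution of confidence values against which the real-data confidences are compared.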
Bootstrap-based confidence differs between real and randomized data [Figure: distribution of feature confidence f for real data vs random data]
Example of a high confidence sub-network • Bootstrapped confidence Bayesian network highlights a subnetwork associated with yeast mating [Figure: one learned Bayesian network vs the bootstrapped confidence Bayesian network]
Summary • Biological systems are complex with many components • Learning networks from global expression data is challenging • We have seen three strategies to learn these networks • Sparse candidate • Module networks • Strategies to assess network structure confidence
Other problems in regulatory network inference • Combining different types of datasets to improve network structure • E.g. Motif and ChIP binding • Modeling dynamics in networks • Incorporate perturbations on regulatory nodes • Integrating upstream signaling networks with transcriptional networks • Learning context-specific networks • Differential wiring