Computational methods for inferring cellular networks II Stat 877 Apr 17th, 2014 Sushmita Roy
RECAP from last time • A regulatory network has structure and parameters • Network reconstruction: identify structure and parameters from data • Classes of methods for network reconstruction • Per-gene vs per-module • Sparse candidates is an example of a per-gene method • Key idea: restrict the parent set to a skeleton defined by “good” candidates • Good candidates: high mutual information OR high predictive power
Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks
Module Networks • Motivation: • Most complex systems have too many variables • Not enough data to robustly learn dependencies among them • Large networks are hard to interpret • Key idea: Group similarly behaving variables into “modules” and learn parameters for each module • Relevance to gene regulatory networks • Genes that are co-expressed are likely regulated in similar ways Segal et al 2005
An expression module: a set of genes that behave similarly across conditions (Gasch & Eisen, 2002)
Modeling questions in Module Networks • What is the mathematical definition of a module? • All variables in a module have the same conditional probability distribution • How to model the CPD between parents and children? • Regression tree • How to learn module networks?
Defining a Module Network • Denoted by (S, A, Θ) • S: structure, specifying the parents Pa_Mj of each module Mj • A: assignment of each variable Xi to a module k • Θ: parameters of the CPD P(Mj | Pa_Mj), where Pa_Mj are the parents of module Mj • Each variable Xi in Mj has the same conditional distribution (see the sketch below)
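To make the three components concrete, here is a minimal Python sketch of the definition; the class name ModuleNetwork and its fields (assignment, parents, theta) are hypothetical names chosen for illustration, not from the paper.

```python
from dataclasses import dataclass

# A minimal sketch of the definition above (names are hypothetical):
# - A: assignment of each variable X_i to a module index k
# - S: structure, i.e. the shared parent set Pa_Mj of each module
# - Theta: one CPD parameterization per module, shared by all its member variables
@dataclass
class ModuleNetwork:
    assignment: dict   # variable name -> module index (the assignment A)
    parents: dict      # module index -> list of parent variable names (the structure S)
    theta: dict        # module index -> parameters of P(Mj | Pa_Mj)

    def shared_cpd(self, var):
        # every variable assigned to module j uses the same parents and parameters
        j = self.assignment[var]
        return self.parents[j], self.theta[j]
```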
Bayesian network vs. module network (each variable takes three values: UP, DOWN, SAME)
Bayesian network vs Module network • Bayesian network • One CPD per random variable • Learning only requires searching for the parents of each variable • Module network • One CPD per module • Learning requires both a parent search and a module membership assignment
Learning a Module Network • Given • training dataset D = {x1, .., xN} • number of modules K • Learn • Module assignment of each Xi to a module • CPDs Θ • The parents of each module
Score of a Module network • The likelihood of a module network M given data D decomposes over modules: L(M : D) = ∏_{j=1..K} ∏_{Xi ∈ Mj} ∏_{n=1..N} P(xi[n] | Pa_Mj[n], θj) • K: number of modules, Mj: the jth module (its set of variables), Pa_Mj: parents of module Mj, θj: shared CPD parameters of module Mj
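The decomposition can be read off directly: the log-score is a sum over modules, over each module's member variables, and over samples, with one shared CPD per module. A minimal sketch follows, assuming a simple dictionary layout for the data and a generic cpd_logprob callback (both assumptions for illustration).

```python
def module_network_log_likelihood(data, assignment, parents, cpd_logprob):
    """Log-likelihood of a module network, decomposed as in the score above.

    data:        variable name -> list of N observed values (assumed layout)
    assignment:  variable name -> module index
    parents:     module index -> list of parent variable names (Pa_Mj)
    cpd_logprob: callback (module j, value of X_i, tuple of parent values) -> log prob;
                 shared by every variable assigned to module j (hypothetical helper)
    """
    n_samples = len(next(iter(data.values())))
    total = 0.0
    for xi, j in assignment.items():                  # sum over modules and their members
        pa = parents[j]
        for n in range(n_samples):                    # sum over samples
            pa_vals = tuple(data[p][n] for p in pa)   # values of the module's shared parents
            total += cpd_logprob(j, data[xi][n], pa_vals)
    return total
```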
Module initialization: the initial modules are obtained by clustering the variables
Module re-assignment • Two requirements • Must preserve the acyclic structure • Must improve the score • Perform sequential updates: compute the delta score of moving a variable from one module to another while keeping the other variables fixed (see the sketch below)
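A minimal sketch of this sequential update, assuming hypothetical helpers score_delta (the change in score if a variable moves) and creates_cycle (whether the move would violate acyclicity) are supplied by the surrounding learning procedure.

```python
def reassign_modules(variables, assignment, score_delta, creates_cycle):
    """Sequential module re-assignment sketch (score_delta and creates_cycle are
    hypothetical helpers supplied by the surrounding learning procedure)."""
    improved = True
    while improved:
        improved = False
        for var in variables:
            current = assignment[var]
            best_module, best_delta = current, 0.0
            for candidate in set(assignment.values()):
                if candidate == current or creates_cycle(var, candidate):
                    continue                          # requirement 1: preserve acyclicity
                delta = score_delta(var, current, candidate)
                if delta > best_delta:                # requirement 2: improve the score
                    best_module, best_delta = candidate, delta
            if best_module != current:
                assignment[var] = best_module         # move one variable, others stay fixed
                improved = True
    return assignment
```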
Regression tree to capture the CPD • Each path through the tree captures a mode of regulation of X3 by X1 and X2 • Internal nodes test a regulator's expression against a split value (e.g., X1 > e1?, then X2 > e2?) • The expression of the target is modeled using a Gaussian at each leaf node
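To illustrate how such a CPD is evaluated, here is a toy sketch with Gaussian leaves; the tree layout, the thresholds e1 and e2, and the leaf labels are illustrative assumptions rather than the actual tree from the slide.

```python
import math

def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def regression_tree_logprob(x3, x1, x2, e1, e2, leaves):
    """Evaluate a toy regression-tree CPD for target X3 given regulators X1, X2.

    Internal nodes test a regulator against a split value; each leaf holds the
    (mean, std) of a Gaussian over the target. Layout and labels are assumptions.
    """
    if x1 > e1:
        leaf = "x1_high_x2_high" if x2 > e2 else "x1_high_x2_low"
    else:
        leaf = "x1_low"
    mean, std = leaves[leaf]
    return gaussian_logpdf(x3, mean, std)

# example: three regulatory modes, each with its own Gaussian over X3
leaves = {"x1_low": (0.0, 1.0), "x1_high_x2_low": (1.5, 0.5), "x1_high_x2_high": (-2.0, 0.5)}
print(regression_tree_logprob(x3=1.2, x1=0.8, x2=-0.3, e1=0.5, e2=0.0, leaves=leaves))
```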
Assessing the value of using Module Networks • Generate data D from a known module network Mtrue • Mtrue was in turn learned from real data • 10 modules, 500 variables • Learn a module network M from D • Assess M's quality using: • Test data likelihood (higher is better) • Agreement in parent-child relationships between M and Mtrue
Test data likelihood (each line type represents the size of the training data)
Module networks have better performance than a simple Bayesian network • Gain in test data likelihood over the Bayesian network
Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005
The Respiration and Carbon Module: regulation tree
Global View of Modules • Modules for common processes often share common: • regulators • binding site motifs
Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks
Per-gene vs per-module • Per-gene methods • Precise regulatory programs per gene • No modular organization is revealed or captured • Per-module methods • Modular organization → simpler representation • Gene-specific regulatory information is lost
Can we combine the strengths of both approaches? • Per-module: one regulatory program (e.g., regulators X1, X2) shared by all genes in a module • Per-gene: a separate regulatory program for each target gene (Y1, Y2), drawing on regulators X1–X4 • MERLIN: per-gene, module-constrained
Bayesian formulation of network inference • The graph G is an unknown random variable • Optimize the posterior distribution of the graph given the data: P(G | D) ∝ P(D | G) P(G), where P(G) is the graph prior and P(D | G) is the data likelihood
A prior to combine per-gene and per-module methods • Let the prior distribute independently over edges • Define the prior probability of edge presence: P(G) = ∏_(present edges) p_ji × ∏_(absent edges) (1 − p_ji) • p_ji, the prior probability that the edge Xj → Xi is present, trades off graph structure complexity (a sparsity term) against the module support for the edge, scaled by a prior-strength parameter (see the sketch below)
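One plausible way to realize such an edge prior, shown purely as an illustration: a logistic function of a sparsity penalty plus module support scaled by the prior strength. The functional form and the default parameter values are assumptions, not taken from the MERLIN paper.

```python
import math

def edge_prior_prob(support, sparsity=-2.0, prior_strength=4.0):
    """Illustrative edge-presence prior (form and values are assumptions):
    a negative sparsity term penalizes graph complexity, while module support
    for the edge, scaled by the prior strength, raises the probability."""
    return 1.0 / (1.0 + math.exp(-(sparsity + prior_strength * support)))

def log_graph_prior(present_edges, all_candidate_edges, support):
    """log P(G): product of p over present edges and (1 - p) over absent edges."""
    logp = 0.0
    for edge in all_candidate_edges:
        p = edge_prior_prob(support.get(edge, 0.0))
        logp += math.log(p) if edge in present_edges else math.log(1.0 - p)
    return logp
```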
Behavior of the graph structure prior (probability of edge presence)
Quantifying module support • For each candidate Xj for Xi's regulator set, measure how strongly Xj is already supported as a regulator within Xi's module, e.g., the fraction of genes in Xi's module that have Xj in their regulator set
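A small sketch of one plausible way to compute this support, reading it as the fraction of genes in Xi's module whose regulator sets already contain Xj; the exact definition used by MERLIN may differ, so treat the details as assumptions.

```python
def module_support(candidate, target, module_of, regulators):
    """Fraction of genes in `target`'s module whose current regulator set already
    contains `candidate` (one plausible reading of the slide; details assumed).

    module_of:  gene -> module index
    regulators: gene -> set of currently assigned regulators
    """
    members = [g for g, m in module_of.items() if m == module_of[target]]
    if not members:
        return 0.0
    return sum(candidate in regulators.get(g, set()) for g in members) / len(members)
```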
MERLIN: Learning upstream regulators of regulatory modules • Input: expression measurements from multiple conditions and a list of candidate regulators (transcription factors and signaling proteins, e.g., MCK1, HOG1, ATF1, RAP1) • Expression clustering of the targets produces initial modules • Iterate: update regulators using the new modules, then revisit modules using both expression and the regulatory programs • Output: the final reconstructed network • Roy et al., PLoS Comp Bio, 2013
MERLIN correctly infers edges on simulated data (comparison between the true and inferred networks) • Compared methods: MERLIN, GENIE3, MODNET, LINEAR-REGRESSION (precision-recall comparison) • Precision = (# of correct edges) / (# of predicted edges) • Recall = (# of correct edges) / (# of true edges)
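The two metrics are straightforward to compute from edge sets; the sketch below uses the definitions on this slide (the toy edge lists are made up for the example).

```python
def precision_recall(predicted_edges, true_edges):
    """Precision and recall for edge prediction, using the definitions above."""
    predicted, true = set(predicted_edges), set(true_edges)
    correct = predicted & true
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(true) if true else 0.0
    return precision, recall

# toy example: 2 of 3 predicted edges are correct, out of 4 true edges
# -> precision = 2/3, recall = 1/2
print(precision_recall({("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2")},
                       {("X1", "Y1"), ("X2", "Y1"), ("X2", "Y2"), ("X4", "Y2")}))
```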
Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks
Assessing confidence in the learned network • Typically the number of training samples is not sufficient to reliably determine the “right” network • One can however estimate the confidence of specific features of the network • Graph features f(G) • Examples of f(G): • An edge between two random variables • Order relations: is X an ancestor of Y?
How to assess confidence in graph features? • What we want is P(f(G) | D) = Σ_G f(G) P(G | D), summing over all possible graphs G • But it is not feasible to compute this sum • Instead we will use a “bootstrap” procedure
Bootstrap to assess graph feature confidence • For i = 1 to m • Construct dataset Di by sampling with replacement N samples from dataset D, where N is the size of the original D • Learn a network Bi • For each feature of interest f, calculate the confidence as the fraction of bootstrap networks in which f holds: conf(f) = (1/m) Σ_i f(Bi) (see the sketch below)
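A minimal sketch of this procedure for edge features; the data layout and the learn_network callback are placeholders for whatever structure-learning method is being bootstrapped.

```python
import random

def bootstrap_edge_confidence(data, learn_network, m=100, seed=0):
    """Bootstrap confidence for edge features (data layout and the learn_network
    callback are placeholders for the structure learner being used).

    data:          list of N samples
    learn_network: callback, list of samples -> set of edges (the learned B_i)
    Returns: edge -> fraction of the m bootstrap networks containing that edge.
    """
    rng = random.Random(seed)
    counts = {}
    n = len(data)
    for _ in range(m):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # N samples with replacement
        for edge in learn_network(resample):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / m for edge, c in counts.items()}
```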
Does the bootstrap confidence represent real relationships? • Compare the confidence distribution to that obtained from randomized data • Randomize by shuffling the columns (experimental conditions) of each row (gene) independently • Repeat the bootstrap procedure on the randomized data
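A small sketch of the row-wise randomization, assuming the expression data is a gene-by-condition matrix stored as a list of equal-length rows.

```python
import random

def randomize_expression(matrix, seed=0):
    """Shuffle the columns (conditions) of each row (gene) independently,
    as in the randomization check above; `matrix` is a list of equal-length rows."""
    rng = random.Random(seed)
    shuffled = []
    for row in matrix:
        row = list(row)
        rng.shuffle(row)     # permute this gene's values across conditions
        shuffled.append(row)
    return shuffled
```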
Bootstrap-based confidence differs between real and randomized data (confidence distributions of feature f for real vs. random data)
Example of a high confidence sub-network • Comparing one learned Bayesian network with the bootstrapped-confidence Bayesian network highlights a subnetwork associated with yeast mating
Summary • Biological systems are complex, with many components • Learning networks from global expression data is challenging • We have seen several strategies for learning these networks and assessing them • Sparse candidates • Module networks (and the module-constrained per-gene method MERLIN) • Strategies to assess confidence in network structure
Other problems in regulatory network inference • Combining different types of datasets to improve network structure • e.g., motif and ChIP-binding data • Modeling dynamics in networks • Incorporating perturbations of regulatory nodes • Integrating upstream signaling networks with transcriptional networks • Learning context-specific networks • Differential wiring