Computational methods for inferring cellular networks II

Stat 877, Apr 17th, 2014. Sushmita Roy.



  1. Computational methods for inferring cellular networks II Stat 877 Apr 17th, 2014 Sushmita Roy

  2. RECAP from last time • A regulatory network has structure and parameters • Network reconstruction • Identify structure and parameters from data • Classes of methods for network reconstruction • Per-gene vs per-module • Sparse candidate is an example of a per-gene method • Key idea: restrict the parent set to a skeleton defined by “good” candidates • Good candidates: high mutual information OR high predictive power
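The candidate-selection idea in the recap can be sketched as follows. This is a minimal illustration, assuming discretized expression values and ranking by empirical mutual information only; the function names `mutual_information` and `candidate_parents` and the toy data are hypothetical and not taken from the lecture.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def candidate_parents(data, target, k):
    """Return the k variables with the highest mutual information with `target`."""
    scores = {
        name: mutual_information(values, data[target])
        for name, values in data.items()
        if name != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy discretized data (hypothetical): X1 tracks Y exactly, X2 is unrelated to Y.
data = {
    "Y":  ["UP", "UP", "DOWN", "DOWN", "SAME", "SAME"],
    "X1": ["UP", "UP", "DOWN", "DOWN", "SAME", "SAME"],
    "X2": ["UP", "DOWN", "UP", "DOWN", "UP", "DOWN"],
}
print(candidate_parents(data, "Y", k=1))
```

The parent search for Y would then be restricted to the returned candidates rather than to all variables.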

  3. Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks

  4. Module Networks • Motivation: • Most complex systems have too many variables • Not enough data to robustly learn dependencies among them • Large networks are hard to interpret • Key idea: Group similarly behaving variables into “modules” and learn parameters for each module • Relevance to gene regulatory networks • Genes that are co-expressed are likely regulated in similar ways (Segal et al 2005)

  5. An expression module • Set of genes that behave similarly across conditions • [Figure: expression data with genes grouped into modules; Gasch & Eisen, 2002]

  6. Modeling questions in Module Networks • What is the mathematical definition of a module? • All variables in a module have the same conditional probability distributions • How to model the CPD between parent and children? • Regression Tree • How to learn module networks?

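  7. Defining a Module Network • Denoted by $\mathcal{M} = (\mathcal{S}, \mathcal{A}, \Theta)$ • $\mathcal{S}$: Structure, specifying the parents of each module • $\mathcal{A}$: Assignment of $X_i$ to module $k$, $\mathcal{A}(X_i) = k$ • $\Theta$: Parameters of the CPD $P(M_j \mid \mathrm{Pa}_{M_j})$, where $\mathrm{Pa}_{M_j}$ are the parents of module $M_j$ • Each variable $X_i$ in $M_j$ has the same conditional distribution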

  8. Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME

  9. Bayesian network vs Module network • Bayesian network • CPD per random variable • Learning only requires searching for parents • Module network • CPD per module • Learning requires parent search and module membership assignment

  10. Learning a Module Network • Given • training dataset D = {x1, ..., xN} • number of modules • Learn • Module assignment of each Xi to a module • CPDs Θ • The parents of each module

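  11. Score of a Module network • The likelihood of the data under a module network decomposes over modules: $L(\mathcal{M} : D) = \prod_{j=1}^{K} L_j(\mathrm{Pa}_{M_j}, \mathbf{X}^j, \theta_{M_j \mid \mathrm{Pa}_{M_j}} : D)$ • $\mathcal{M}$: module network • $L_j$: likelihood of module $j$ • $D$: data • $K$: number of modules • $\mathbf{X}^j$: $j$th module • $\mathrm{Pa}_{M_j}$: parents of module $M_j$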
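A per-module score that sums to a network score can be sketched in a few lines. This is a deliberately simplified illustration, not the score used in the lecture: it assumes each module has no parents and a single Gaussian CPD shared by all its variables, and the names `gaussian_loglik`, `module_loglik`, and `network_loglik` are hypothetical.

```python
import math

def gaussian_loglik(values):
    """Log-likelihood of values under a Gaussian fitted by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    var = max(ss / n, 1e-6)  # floor avoids log(0) for constant data
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)

def module_loglik(module_rows):
    """All variables in a module share one CPD, so their values are pooled."""
    pooled = [v for row in module_rows for v in row]
    return gaussian_loglik(pooled)

def network_loglik(data, assignment):
    """Sum of per-module log-likelihoods (log of the product over modules)."""
    modules = {}
    for var, k in assignment.items():
        modules.setdefault(k, []).append(data[var])
    return sum(module_loglik(rows) for rows in modules.values())

# Toy expression profiles (hypothetical): g1, g2 are low; g3, g4 are high.
data = {
    "g1": [1.0, 1.1, 0.9],
    "g2": [1.0, 0.9, 1.1],
    "g3": [5.0, 5.1, 4.9],
    "g4": [5.0, 4.9, 5.1],
}
good = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
bad = {"g1": 0, "g3": 0, "g2": 1, "g4": 1}
print(network_loglik(data, good) > network_loglik(data, bad))
```

Because the score is a sum over modules, a change that touches only one or two modules requires recomputing only those terms.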

  12. Module network learning algorithm

  13. Module initialization as clustering of variables for module network

  14. Module re-assignment • Two requirements • Must preserve the acyclic structure • Must improve score • Perform sequential update: • The delta score of moving a variable from one module to another while keeping the other variables fixed
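One pass of the sequential update can be sketched as below. This is an illustrative sketch under stated assumptions: `module_parents` maps each module to the set of variables that regulate it, the score is any function of the data and the assignment, and `toy_score` is a hypothetical stand-in (negative within-module sum of squares), not the score from the lecture. Comparing total scores of the trial and current assignments is equivalent to checking the sign of the delta score.

```python
def has_cycle(assignment, module_parents):
    """True if the module-level graph has a directed cycle.
    There is an edge A(p) -> j whenever variable p is a parent of module j."""
    edges = {}
    for j, parents in module_parents.items():
        for p in parents:
            edges.setdefault(assignment[p], set()).add(j)
    state = {}  # module -> "visiting" or "done"

    def visit(m):
        if state.get(m) == "done":
            return False
        if state.get(m) == "visiting":
            return True
        state[m] = "visiting"
        if any(visit(c) for c in edges.get(m, ())):
            return True
        state[m] = "done"
        return False

    return any(visit(m) for m in list(edges))

def sequential_reassign(data, assignment, module_parents, n_modules, score):
    """Move one variable at a time, keeping all other variables fixed."""
    assignment = dict(assignment)
    for var in data:
        best_k, best_score = assignment[var], score(data, assignment)
        for k in range(n_modules):
            if k == assignment[var]:
                continue
            trial = {**assignment, var: k}
            if has_cycle(trial, module_parents):
                continue  # requirement 1: preserve the acyclic structure
            s = score(data, trial)
            if s > best_score:  # requirement 2: the move must improve the score
                best_k, best_score = k, s
        assignment[var] = best_k
    return assignment

def toy_score(data, assignment):
    """Hypothetical score: negative within-module sum of squares."""
    total = 0.0
    for k in set(assignment.values()):
        pooled = [v for var, m in assignment.items() if m == k for v in data[var]]
        mean = sum(pooled) / len(pooled)
        total -= sum((v - mean) ** 2 for v in pooled)
    return total

data = {
    "g1": [1.0, 1.1, 0.9],
    "g2": [1.0, 0.9, 1.1],
    "g3": [5.0, 5.1, 4.9],
    "g4": [5.0, 4.9, 5.1],
}
start = {"g1": 0, "g2": 1, "g3": 1, "g4": 1}  # g2 starts in the wrong module
result = sequential_reassign(data, start, {0: set(), 1: set()}, 2, toy_score)
print(result)
```

If g2 were itself a parent of module 0, the same move would be rejected, since a variable cannot sit in a module it regulates without creating a cycle.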

  15. Module re-assignment via sequential update

  16. Regression tree to capture CPD • [Figure: regression tree for X3 with internal tests X1 > e1 and X2 > e2, each with YES/NO branches] • Each path captures a mode of regulation of X3 by X1 and X2 • Expression of target modeled using Gaussians at each leaf node
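A regression-tree CPD with Gaussian leaves can be represented as a small recursive structure. The sketch below is illustrative only: the tree layout, the threshold values standing in for e1 and e2, and the leaf parameters are all assumptions, since the slide's figure does not survive in the transcript, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    mean: float  # Gaussian mean of the target at this leaf
    std: float   # Gaussian standard deviation at this leaf

@dataclass
class Split:
    parent: str       # regulator tested at this node, e.g. "X1"
    threshold: float  # split point, e.g. e1
    yes: "Node"       # subtree taken when parent value > threshold
    no: "Node"        # subtree taken otherwise

Node = Union[Leaf, Split]

def find_leaf(node, parent_values):
    """Follow the tests from the root down to a leaf."""
    while isinstance(node, Split):
        node = node.yes if parent_values[node.parent] > node.threshold else node.no
    return node

def log_density(tree, parent_values, x):
    """Log P(X3 = x | parents) under the Gaussian at the reached leaf."""
    leaf = find_leaf(tree, parent_values)
    z = (x - leaf.mean) / leaf.std
    return -0.5 * z * z - math.log(leaf.std * math.sqrt(2 * math.pi))

# Hypothetical tree: test X1 > 0.0 first, then X2 > 0.5 on the YES branch.
tree = Split(
    parent="X1", threshold=0.0,
    yes=Split(parent="X2", threshold=0.5, yes=Leaf(2.0, 0.5), no=Leaf(0.0, 0.5)),
    no=Leaf(-1.0, 0.5),
)
print(find_leaf(tree, {"X1": 1.0, "X2": 1.0}))
```

Each root-to-leaf path corresponds to one combination of regulator states, which is what the slide calls a mode of regulation.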

  17. Assessing the value of using Module Networks • Generate data D from a known module network, Mtrue • Mtrue was in turn learned from real data • 10 modules, 500 variables • Learn a module network M from D • Assess M's quality using: • Test data likelihood (higher is better) • Agreement in parent-child relationships between M and Mtrue

  18. Test data likelihood Each line type represents size of training data

  19. Recovery of graph structure

  20. Module networks have better performance than simple Bayesian networks • [Figure: gain in test data likelihood over Bayesian network]

  21. Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005

  22. The Respiration and Carbon Module Regulation tree

  23. Global View of Modules • Modules for common processes often share common • regulators • binding site motifs

  24. Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks

  25. Per-gene vs per-module • Per-gene methods • Precise regulatory programs per gene • No modular organization revealed/captured • Per-module methods • Modular organization-> simpler representation • Gene-specific regulatory information is lost

  26. Can we combine the strengths of both approaches? • [Figure: example networks over X1–X4 and Y1, Y2 comparing three approaches: per module, per gene, and MERLIN (per gene, module-constrained)]

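  27. Bayesian formulation of network inference • The graph $G$ is an unknown random variable • Optimize the posterior distribution of the graph given the data: $P(G \mid D) \propto P(D \mid G)\,P(G)$ • $P(G)$: graph prior • $P(D \mid G)$: data likelihood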

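  28. A prior to combine per-gene and per-module methods • Let $P(G)$ distribute independently over edges: $P(G) = \prod_{X_j \to X_i \in G} P(X_j \to X_i) \prod_{X_j \to X_i \notin G} \bigl(1 - P(X_j \to X_i)\bigr)$, with one product over present edges and one over absent edges • Define prior probability of edge presence: $P(X_j \to X_i) = \frac{1}{1 + \exp(-(p + r\,m_{ij}))}$ • $p$: graph structure complexity • $m_{ij}$: module support for an edge • $r$: module prior strength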
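An edge-wise prior of this kind can be sketched as follows. The logistic form, the parameter names `complexity` and `strength`, and their default values are assumptions made for illustration; the slide text names only the three ingredients (graph structure complexity, module support, prior strength) and that the prior factorizes over present and absent edges.

```python
import math

def edge_prior(module_support, complexity=-5.0, strength=4.0):
    """Assumed logistic prior on edge presence:
    sigmoid(complexity + strength * module_support)."""
    return 1.0 / (1.0 + math.exp(-(complexity + strength * module_support)))

def log_graph_prior(edges_present, support, complexity=-5.0, strength=4.0):
    """Independent edges: add log P(edge) for present edges
    and log(1 - P(edge)) for absent ones."""
    total = 0.0
    for edge, m in support.items():
        p = edge_prior(m, complexity, strength)
        total += math.log(p) if edge in edges_present else math.log(1.0 - p)
    return total

# Hypothetical module support for two candidate edges into Y1.
support = {("X1", "Y1"): 1.0, ("X2", "Y1"): 0.0}
with_supported = log_graph_prior({("X1", "Y1")}, support)
with_unsupported = log_graph_prior({("X2", "Y1")}, support)
print(with_supported > with_unsupported)
```

A negative `complexity` makes every edge unlikely a priori, which favors sparse graphs, while a positive `strength` raises the prior of edges that the current modules support.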

  29. Behavior of graph structure prior • [Plot: probability of edge]

  30. Quantifying module support • For each candidate Xj for Xi’s regulator set

  31. MERLIN: Learning upstream regulators of regulatory modules • Inputs: measurements from multiple conditions; candidate regulators (transcription factors, signaling proteins; e.g. MCK1, HOG1, ATF1, RAP1) • Initial modules from expression clustering • Update regulators using new modules • Revisit modules using expression & regulatory programs • Output: final reconstructed network linking regulators to module targets • Roy et al, PLoS Comp Bio, 2013

  32. MERLIN correctly infers edges between true and inferred networks on simulated data • Compare the inferred network against the true network • Precision = # of correct edges / # of predicted edges • Recall = # of correct edges / # of true edges • [Figure: precision vs. recall for GENIE3, MODNET, LINEAR-REGRESSION, and MERLIN]
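Precision and recall over edges can be computed directly from two edge sets. The sketch below assumes directed edges written as (regulator, target) pairs; the function name and the toy networks are hypothetical.

```python
def precision_recall(true_edges, predicted_edges):
    """Precision = correct / predicted, recall = correct / true."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    correct = len(true_edges & predicted_edges)
    precision = correct / len(predicted_edges) if predicted_edges else 0.0
    recall = correct / len(true_edges) if true_edges else 0.0
    return precision, recall

# Toy networks (hypothetical): 2 of the 3 predicted edges are in the true network.
true_edges = {("X1", "Y1"), ("X2", "Y1"), ("X2", "Y2"), ("X3", "Y2")}
predicted_edges = {("X1", "Y1"), ("X2", "Y2"), ("X4", "Y1")}
precision, recall = precision_recall(true_edges, predicted_edges)
print(precision, recall)
```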

  33. Goals for today • Per-module methods • Module network • Incorporating priors in graph structure learning • Combining per-gene and per-module methods • Assessing confidence in networks

  34. Assessing confidence in the learned network • Typically the number of training samples is not sufficient to reliably determine the “right” network • One can however estimate the confidence of specific features of the network • Graph features f(G) • Examples of f(G) • An edge between two random variables • Order relations: Is X an ancestor of Y?

  35. How to assess confidence in graph features? • What we want is $P(f(G) \mid D)$, which is $\sum_{G} f(G)\,P(G \mid D)$ • But it is not feasible to compute this sum • Instead we will use a “bootstrap” procedure

  36. Bootstrap to assess graph feature confidence • For $i = 1$ to $m$ • Construct dataset $D_i$ by sampling with replacement $N$ samples from dataset $D$, where $N$ is the size of the original $D$ • Learn a network $B_i$ from $D_i$ • For each feature of interest $f$, calculate confidence: $\mathrm{Conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(B_i)$
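The bootstrap loop can be sketched as below, with the confidence of an edge taken as the fraction of bootstrap networks that contain it. The learner here is a toy stand-in (an undirected edge wherever the absolute Pearson correlation exceeds a threshold), chosen only to keep the example self-contained; it is not a Bayesian network learner, and all names, thresholds, and data are hypothetical.

```python
import random
from itertools import combinations

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def learn_network(samples, names, threshold=0.8):
    """Toy stand-in learner: edge where |correlation| > threshold."""
    columns = {name: [s[i] for s in samples] for i, name in enumerate(names)}
    return {
        (a, b)
        for a, b in combinations(names, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    }

def bootstrap_confidence(samples, names, m=200, seed=0):
    """Confidence of each edge = fraction of m bootstrap networks containing it."""
    rng = random.Random(seed)
    n = len(samples)
    counts = {}
    for _ in range(m):
        # Sample N rows with replacement, where N is the original dataset size.
        resampled = [samples[rng.randrange(n)] for _ in range(n)]
        for edge in learn_network(resampled, names):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / m for edge, c in counts.items()}

# Toy data (hypothetical): Y closely follows X, Z alternates independently of X.
names = ["X", "Y", "Z"]
samples = [(float(i), 2.0 * i + 0.1 * (i % 3), float((-1) ** i)) for i in range(12)]
conf = bootstrap_confidence(samples, names)
print(sorted(conf.items()))
```

Any structure learner can be substituted for `learn_network`, and any binary graph feature (for example an ancestor relation) can be counted in place of edges.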

  37. Does the bootstrap confidence represent real relationships? • Compare the confidence distribution to that obtained from randomized data • Shuffle the columns of each row (gene) separately, so that each row is randomized independently • Repeat the bootstrap procedure • [Figure: expression matrix with genes as rows and experimental conditions as columns]
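The randomization step can be sketched as a per-row shuffle of the expression matrix. This is a minimal illustration assuming the matrix is a list of rows (one per gene) with one column per experimental condition; the function name and seed parameter are hypothetical.

```python
import random

def randomize_rows(matrix, seed=0):
    """Shuffle each row (gene) independently across columns (conditions)."""
    rng = random.Random(seed)
    shuffled = []
    for row in matrix:
        row = list(row)   # copy so the input matrix is left unchanged
        rng.shuffle(row)
        shuffled.append(row)
    return shuffled

# Toy matrix (hypothetical): 2 genes measured in 5 conditions.
expression = [
    [0.1, 0.5, 0.9, 1.3, 1.7],
    [2.0, 1.6, 1.2, 0.8, 0.4],
]
randomized = randomize_rows(expression)
print(randomized)
```

Each gene keeps its own distribution of values, but the pairing of values across genes within a condition is broken, so any dependency found in the randomized data is spurious.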

  38. Bootstrap-based confidence differs between real and random data • [Figure: confidence of features f in Real vs. Random data]

  39. Example of a high confidence sub-network • [Figure: one learned Bayesian network alongside the bootstrapped-confidence Bayesian network] • Highlights a subnetwork associated with yeast mating

  40. Summary • Biological systems are complex with many components • Learning networks from global expression data is challenging • We have seen three strategies to learn these networks • Sparse candidate • Module networks • Strategies to assess network structure confidence

  41. Other problems in regulatory network inference • Combining different types of datasets to improve network structure • E.g. Motif and ChIP binding • Modeling dynamics in networks • Incorporate perturbations on regulatory nodes • Integrating upstream signaling networks with transcriptional networks • Learning context-specific networks • Differential wiring
