Introduction to Molecular Networks BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Nov 27th, 2012
Different types of networks • Physical networks • Protein-DNA: interactions between regulatory proteins (transcription factors) and regulatory DNA • Protein-protein: interactions among proteins • Signaling networks: interactions between proteins and small molecules, and among proteins, that relay signals from outside the cell to the nucleus • Functional networks • metabolic: describe reactions through which enzymes convert substrates to products • genetic: describe interactions among genes that, when perturbed together, produce a significantly different phenotype than when perturbed individually • co-expression: describe the dependency between expression patterns of genes under different conditions
Protein-DNA interactions: transcriptional regulatory networks [Figure: E. coli network with 153 TFs (green & light red) and 1319 targets; S. cerevisiae network with 157 TFs and 4410 targets. Vargas and Santillan, 2008]
Detecting protein-DNA interactions • ChIP-chip • ChIP-seq • Promoter scanning for sequence-specific motifs • DNase I hypersensitivity mapping • Chromatin marks to identify “regulatory regions”, followed by scanning with sequence-specific motifs
Protein-DNA interaction example • goal: determine the (approximate) locations in the genome where a given protein binds [Figure: ChIP-chip and ChIP-seq binding profiles for transcription factors. Peter Park, Nature Reviews Genetics, 2009]
Protein-protein interaction networks [Figure: yeast and human networks. Node colors: red, lethal; green, non-lethal; yellow, slow growth. Edge colors: red, Rual et al.; blue, literature. Barabasi et al. 2003; Rual et al. 2005]
Detecting protein-protein interactions • Binary interactions • Yeast two-hybrid: uses a transcription factor with two domains (DNA-binding and activation), each fused to one of the proteins of interest; when the two proteins interact, the domains come together and drive a reporter gene • Protein complementation assay • Complexes • Tandem affinity purification (TAP) with mass spectrometry • Makes use of a TAP tag attached to a protein of interest; the protein and its complex are pulled down and purified in two steps [Figure panels: yeast two-hybrid, TAP, protein complementation. Shoemaker and Panchenko, 2007, PLoS Computational Biology; Xu et al., 2010, Protein Expression and Purification]
Metabolic networks [Figure from the KEGG database; nodes are gene products and other molecules]
Genetic interaction networks Dixon et al., 2009, Annu. Rev. Genet
Yeast genetic interaction network Costanzo et al, 2011
Computational challenges in networks • Identifying the connectivity • Structure and parameter learning • Using the connectivity to infer function and activation • Network-based predictive models • Analyzing the network structure • Graph clustering • Graph properties • Network motifs We will study these questions in the context of transcriptional regulatory networks
Network model representations • Unweighted graphs • Boolean networks • Bayesian networks and related graphical models • Differential equations • Petri nets • Constraint-based models • etc.
Transcriptional gene regulation • Input: transcription factor levels (trans) • Input: transcription factor binding sites (cis) • Output: mRNA levels • Example: the TFs Sko1 and Hot1 and their target gene HSP12 • A transcriptional regulatory network connects TFs to their target genes
Regulatory network inference from expression [Figure: expression-based network inference]
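Modeling a regulatory network • Structure: who are the regulators? E.g., Hot1 regulates HSP12, i.e., HSP12 is a target of Hot1 • Function: how do the regulators determine expression levels? The expression Y of HSP12 is modeled as a function ψ(X1, X2) of the levels X1, X2 of its regulators Hot1 and Sko1 • The function can be Boolean, linear, differential equations, probabilistic, ….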
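Network inference from expression is a computationally difficult problem • Given 2 TFs and 3 target genes, how many possible networks can there be? [Figure: some of the possible networks, not an exhaustive set] • Each of the 2 × 3 = 6 possible TF-target edges is either present or absent, so there are a total of $2^6 = 64$ possible networks.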
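The count of 64 can be checked by brute force. The sketch below enumerates every subset of the candidate TF-target edges; the TF and gene names (`TF1`, `G1`, …) and the helper `enumerate_networks` are hypothetical placeholders, and this approach is only feasible for toy sizes.

```python
from itertools import product

def enumerate_networks(tfs, targets):
    """Yield every possible TF -> target network as a frozenset of directed edges."""
    candidate_edges = [(tf, gene) for tf in tfs for gene in targets]
    # Each candidate edge is independently present (1) or absent (0)
    for mask in product([0, 1], repeat=len(candidate_edges)):
        yield frozenset(edge for edge, keep in zip(candidate_edges, mask) if keep)

networks = list(enumerate_networks(["TF1", "TF2"], ["G1", "G2", "G3"]))
print(len(networks))  # 64
```

The empty network and the fully connected network (all 6 edges) are both included in the count.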
Why is this problem so hard? • Assume we have n target genes and m TFs • Number of possible edges: $n \times m$ • For example, with 4500 target genes and 300 TFs we have 1.35 million possible edges! • Number of possible networks: $2^{n \times m}$ • Need clever methods to address this large space of possibilities.
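To get a feel for the size of $2^{n \times m}$, the sketch below computes the number of edges and the number of decimal digits in the network count, rather than the count itself. The helper name `search_space_size` is hypothetical.

```python
import math

def search_space_size(n_targets, m_tfs):
    """Return (number of candidate edges, number of decimal digits in 2**edges)."""
    edges = n_targets * m_tfs
    # 2**edges is too large to print usefully, so report how many digits it has
    digits = math.floor(edges * math.log10(2)) + 1
    return edges, digits

print(search_space_size(3, 2))       # (6, 2): 2**6 = 64 networks
print(search_space_size(4500, 300))  # (1350000, 406391)
```

With 4500 targets and 300 TFs the number of possible networks has over 400,000 decimal digits, which is why exhaustive search is out of the question.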
Two classes of expression-based methods • Per-gene/direct methods • Module based methods
Per-gene methods • Key idea: find the regulators that “best explain” the expression of a gene • Mutual Information • Context Likelihood of Relatedness • ARACNE • Probabilistic methods • Bayesian network: Sparse Candidates • Regression • TIGRESS • GENIE-3
Per-gene methods can be further classified based on how regulators are added • Pairwise: • Ask if TF Y and gene X have a high statistical correlation/mutual information • Examples are CLR and ARACNE • Higher-order: • Ask if a set of TFs {Y1, Y2, …, YK} together best explains the expression of X • Regression, Bayesian networks, dependency networks
Pairwise methods • ARACNE • CLR • Both need a principled way to pick a cutoff that separates edges from non-edges
Information theory for measuring dependence • I(X,Y) is the mutual information between two variables: $I(X,Y) = \sum_{x}\sum_{y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$ • It measures how much knowing X tells us about Y • P(Z) is the probability distribution of Z
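As a minimal sketch of the mutual information formula, the function below computes a plug-in estimate (in bits) from two discrete sequences. It assumes expression has already been discretized (e.g., into low/high levels); this discretization is an assumption of the sketch, and the published CLR and ARACNE tools use their own estimators for continuous data. The function name and example values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X, Y) in bits from two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Sum over observed (x, y) pairs of P(x,y) * log[P(x,y) / (P(x) P(y))]
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

x = [0, 0, 1, 1, 0, 1, 0, 1]   # hypothetical discretized expression of a TF
print(mutual_information(x, x))                          # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))    # 0.0
```

A variable shares all of its entropy (1 bit for a balanced binary variable) with itself, while two independent patterns share none.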
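ARACNE • Goal: get rid of indirect links • Consider a triplet of genes X1, X2, X3 with pairwise mutual information I(X1,X2), I(X1,X3), I(X2,X3) • In each triplet, exclude the edge with the lowest information, e.g., remove X2-X3 if $I(X_2,X_3) < \min(I(X_1,X_2), I(X_1,X_3))$ (the data processing inequality) • The excluded edge is the weakest link of the triplet and is interpreted as an indirect interaction: X2 and X3 may look dependent only because both depend on X1. Margolin et al. 2006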
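The triplet rule can be sketched as follows, assuming mutual information values have already been computed for the candidate edges (stored here in a hypothetical dict keyed by unordered gene pairs). This covers only the pruning step; ARACNE's preceding significance threshold on mutual information and its tolerance parameter are omitted.

```python
from itertools import combinations

def dpi_prune(mi):
    """Drop the weakest edge of every fully connected gene triplet (DPI step).

    mi: dict mapping frozenset({gene_a, gene_b}) -> mutual information of that edge.
    """
    genes = sorted({g for edge in mi for g in edge})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset({a, b}), frozenset({a, c}), frozenset({b, c})]
        if not all(e in mi for e in edges):
            continue  # only triplets where all three edges exist are examined
        weakest = min(edges, key=lambda e: mi[e])
        others = [mi[e] for e in edges if e != weakest]
        if mi[weakest] < min(others):
            to_remove.add(weakest)
    # Decisions use the original MI values and are applied together at the end
    return {e: v for e, v in mi.items() if e not in to_remove}

mi = {
    frozenset({"X1", "X2"}): 0.9,
    frozenset({"X1", "X3"}): 0.8,
    frozenset({"X2", "X3"}): 0.3,   # hypothetical values
}
print(sorted(tuple(sorted(e)) for e in dpi_prune(mi)))
# [('X1', 'X2'), ('X1', 'X3')]
```

Collecting removals first and deleting them at the end makes the result independent of the order in which triplets are visited.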
Context Likelihood of Relatedness (CLR) • For a gene j and regulator i, the context is defined by the mutual information of j with all other regulators, and the mutual information of i with all other target genes • Use the contexts to compute two background distributions of mutual information • Get a z-value for Mij with respect to each distribution, $z_i$ and $z_j$, setting negative values to 0 • The final score is $z_{ij} = \sqrt{z_i^2 + z_j^2}$ • Call an edge if zij is greater than a cutoff.
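Context Likelihood of Relatedness [Figure: Mij between regulator i and gene j, compared with the background distributions for i and for j] • A large zij means that Mij is unlikely to arise by chance from either background distribution • Use zij to decide whether gene i regulates gene j.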
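The CLR steps above can be sketched in plain Python, assuming a precomputed mutual information matrix `mi[i][j]` (rows are regulators, columns are targets). Each background is summarized by its mean and standard deviation; using the full row and column (including Mij itself) as the background, the example matrix, and the cutoff of 1.5 are all assumptions of this sketch.

```python
import math
from statistics import mean, pstdev

def clr_scores(mi):
    """CLR scores for an MI matrix mi[i][j] between regulator i and target gene j."""
    n_reg, n_tgt = len(mi), len(mi[0])
    # Background of regulator i: its MI with all targets (row i)
    row_stats = [(mean(row), pstdev(row)) for row in mi]
    # Background of target j: its MI with all regulators (column j)
    col_stats = [(mean(col), pstdev(col)) for col in zip(*mi)]
    z = [[0.0] * n_tgt for _ in range(n_reg)]
    for i in range(n_reg):
        for j in range(n_tgt):
            scores = []
            for mu, sd in (row_stats[i], col_stats[j]):
                # z-value of M_ij against one background; negatives clipped to 0
                scores.append(max(0.0, (mi[i][j] - mu) / sd) if sd > 0 else 0.0)
            z[i][j] = math.sqrt(scores[0] ** 2 + scores[1] ** 2)
    return z

mi = [[0.9, 0.2, 0.1], [0.2, 0.3, 0.2], [0.1, 0.2, 0.3]]  # hypothetical MI values
z = clr_scores(mi)
cutoff = 1.5  # hypothetical threshold
edges = [(i, j) for i in range(3) for j in range(3) if z[i][j] > cutoff]
print(edges)  # [(0, 0), (1, 1), (2, 2)]
```

Note that the pair (1, 1) is called an edge even though its raw MI (0.3) is far below that of (0, 0): it stands out relative to its own context, which is the point of CLR.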
Higher-order models for network inference • Bayesian networks • Dependency networks • Random variables encode expression levels, e.g., regulators X1 and X2 (in the figure, Sho1 and Msb2) and target Y3 (Ste20) • Structure: X1 and X2 are the regulators of Y3 • Function: Y3 = f(X1, X2) • Goal: learn the structure and function of these networks
Bayesian networks • a BN is a Directed Acyclic Graph (DAG) in which • the nodes denote random variables • each node X has a conditional probability distribution (CPD) representing P(X | Parents(X)) • the intuitive meaning of an arc from X to Y is that X directly influences Y • Provides a tractable way to work with large joint distributions
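The tractability comes from factorizing the joint distribution into one CPD per node: $P(X_1, \dots, X_n) = \prod_i P(X_i \mid \mathrm{Parents}(X_i))$. A minimal sketch with three binary nodes, Hot1 → HSP12 ← Sko1, is shown below; the probability values are made up for illustration and do not come from data.

```python
from itertools import product

# Hypothetical CPDs for a three-node network: Hot1 -> HSP12 <- Sko1 (all binary)
p_hot1 = {True: 0.3, False: 0.7}
p_sko1 = {True: 0.6, False: 0.4}
# P(HSP12 = True | Hot1, Sko1); values are made up for illustration
p_hsp12_true = {(True, True): 0.5, (True, False): 0.9,
                (False, True): 0.1, (False, False): 0.2}

def joint(hot1, sko1, hsp12):
    """P(Hot1, Sko1, HSP12) as the product of each node's CPD given its parents."""
    p_child = p_hsp12_true[(hot1, sko1)]
    return p_hot1[hot1] * p_sko1[sko1] * (p_child if hsp12 else 1 - p_child)

total = sum(joint(*assignment) for assignment in product([True, False], repeat=3))
print(round(total, 10))  # 1.0
```

Because every CPD is normalized, the product defines a valid joint distribution without ever storing the full joint table.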
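Bayesian networks for representing regulatory networks • The regulators of a target are its parents (unknown, shown as “?” in the figure) • The target Yi is the child • The conditional probability distribution (CPD) of Yi given its parents describes how the regulators determine its expression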
Example Bayesian network [Figure: a network over X1, …, X5 with parent and child nodes marked] • Assume each Xi is binary • With no independence assertions: needs $2^5$ measurements • With the independence assertions of the network: needs $2^3$ measurements
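Representing CPDs for discrete variables • CPDs can be represented using tables or trees • Consider the following case with Boolean variables A, B, C, D • P(D | A, B, C) as a table: one entry Pr(D = t) for each of the 8 assignments of A, B, C • P(D | A, B, C) as a tree: if A = f, Pr(D = t) = 0.9; if A = t and B = t, Pr(D = t) = 0.5; if A = t, B = f and C = f, Pr(D = t) = 0.8; if A = t, B = f and C = t, Pr(D = t) = 0.5 • The tree needs only 4 leaves where the table needs 8 rows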
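A tree CPD is a nested sequence of tests whose leaves hold probabilities, and expanding it gives the equivalent table. The sketch below assumes the branch reading A = f → 0.9; A = t, B = t → 0.5; A = t, B = f, C = f → 0.8; A = t, B = f, C = t → 0.5. The original figure was lost, so this assignment of branches is an assumption, and the function name is hypothetical.

```python
from itertools import product

def p_d_true(a, b, c):
    """Tree-structured CPD: Pr(D = t | A, B, C); each return statement is one leaf."""
    if not a:
        return 0.9              # A = f: B and C do not matter
    if b:
        return 0.5              # A = t, B = t: C does not matter
    return 0.5 if c else 0.8    # A = t, B = f: split on C

# Expanding the tree yields the equivalent full table, one row per assignment
table = {abc: p_d_true(*abc) for abc in product([False, True], repeat=3)}
for (a, b, c), p in table.items():
    print(*("t" if v else "f" for v in (a, b, c)), p)
```

The tree stores 4 numbers and the table 8; the saving grows quickly when many parent assignments share the same distribution.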
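Representing CPDs for continuous variables • Conditional Gaussian: the child X3 given its parents X1 and X2 is normally distributed, e.g., $X_3 \mid X_1, X_2 \sim \mathcal{N}(a_0 + a_1 X_1 + a_2 X_2, \sigma^2)$, with parameters $a_0, a_1, a_2, \sigma^2$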
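A minimal sketch of evaluating such a CPD, assuming the linear-Gaussian form in which the parents shift the mean linearly and the variance is fixed. The function name and parameter values are hypothetical.

```python
import math

def linear_gaussian_logpdf(x3, x1, x2, a0, a1, a2, sigma2):
    """log P(X3 = x3 | X1 = x1, X2 = x2) when X3 ~ N(a0 + a1*x1 + a2*x2, sigma2)."""
    mean = a0 + a1 * x1 + a2 * x2  # the parents shift the mean linearly
    return -0.5 * math.log(2 * math.pi * sigma2) - (x3 - mean) ** 2 / (2 * sigma2)

# Hypothetical parameters: a0 = 0.5, a1 = 1.0, a2 = -2.0, sigma2 = 0.25
at_mean = linear_gaussian_logpdf(0.5, 1.0, 0.5, 0.5, 1.0, -2.0, 0.25)
off_mean = linear_gaussian_logpdf(1.5, 1.0, 0.5, 0.5, 1.0, -2.0, 0.25)
print(at_mean > off_mean)  # True: values near the predicted mean are more likely
```

Learning this CPD amounts to estimating $a_0, a_1, a_2$ and $\sigma^2$, which is a linear regression of the child on its parents.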
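Dependency networks: a set of regression problems • For each gene Yi, find its regulators among the p candidate regulators X1, …, Xp • Function: linear regression, $Y_i \approx \sum_{j=1}^{p} b_j X_j$, fit with a regularization term that favors few nonzero $b_j$ • Regulators with nonzero $b_j$ become the regulators of Yi • One such regression problem is solved for each gene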
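One regression problem of a dependency network can be sketched with an L1 (lasso) penalty solved by coordinate descent. The choice of L1 as the regularization term, the absence of an intercept (data assumed centered), and the toy data are assumptions of this sketch, not details from the slide.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_b 0.5*||y - X b||^2 + lam*||b||_1 (no intercept).

    X: list of d rows, each holding p regulator values; y: list of d target values.
    """
    d, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(d)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of regulator j with the residual that excludes j's own term
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(d)
            )
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

# Toy data: the target depends only on regulator 0
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
y = [2.0, 0.0, 0.0, 2.0]
b = lasso_cd(X, y, lam=0.1)
print([j for j, coef in enumerate(b) if coef != 0.0])  # [0]
```

Repeating this fit with each gene as the target, and the candidate regulators as predictors, yields the full set of regression problems; the L1 penalty both shrinks coefficients (here 1.95 rather than 2) and sets irrelevant ones exactly to zero.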
Two classes of expression-based methods • Per-gene/direct methods • Module based methods
An expression module • A set of genes that are co-expressed in a set of conditions [Figure: genes grouped into modules. Gasch & Eisen, 2002]
Expression modules identified by expression clustering [Figure: a genes × experiments expression matrix clustered into modules M1, M2, M3]
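The slide does not name a clustering algorithm. As one possible choice, the sketch below groups gene expression profiles with k-means using Euclidean distance; k-means itself, the deterministic initialization from the first k genes, and the toy profiles are all assumptions.

```python
import math

def kmeans(profiles, k, n_iter=50):
    """Cluster gene expression profiles (equal-length lists) into k modules.

    Initialization uses the first k profiles so the sketch is deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assign each gene to its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in profiles]
        # Recompute each centroid as the mean profile of its member genes
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels

# Hypothetical profiles: rows are genes, columns are experiments
profiles = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0], [1.1, 2.1, 2.9],
            [5.2, 4.9, 5.1], [0.9, 2.0, 3.1]]
print(kmeans(profiles, k=2))  # [0, 1, 0, 1, 0]
```

In practice, correlation-based distances are often preferred for expression data, since co-expression is about the shape of the profile rather than its absolute level.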
Module Networks • Revisit the modules • Learn regulators per module [Figure: modules M1, M2, M3 of genes X1 to X8, each assigned a regulatory program over regulators such as Y1 and Y2] • Every gene in a module shares the same regulatory program. Lee et al. 2009; Segal et al. 2003
Modeling the relationship between regulators and targets • suppose we have a set of (8) genes that all have in their upstream regions the same activator/repressor binding sites
Modeling the relationship between regulators and targets [Figure: a regulation tree that tests X1 > e1 and X2 > e2, with YES/NO branches ending in activating or repressing regulation] • Each path captures a mode of regulation • Expression of the target is modeled using a Gaussian at each leaf node
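A minimal sketch of such a regulatory program: a hypothetical two-split tree sends each condition to a leaf based on regulator levels, and each leaf gets a Gaussian (mean, variance) estimated by pooling the expression of all genes in the module, reflecting that module genes share one program. The tree shape, the thresholds e1 and e2, the leaf names, and the data are assumptions, not the tree from the figure.

```python
from statistics import mean, pvariance

def leaf_of(x1, x2, e1, e2):
    """Hypothetical two-split regulation program; returns a leaf id for one condition."""
    if x1 > e1:
        return "x1_high"
    return "x2_high" if x2 > e2 else "both_low"

def fit_leaf_gaussians(conditions, module_expr, e1, e2):
    """Estimate a Gaussian (mean, variance) per leaf.

    conditions: list of (x1, x2) regulator levels, one per condition.
    module_expr: per-condition lists holding every module gene's expression;
    pooling all genes reflects that genes in a module share one program.
    """
    pooled = {}
    for (x1, x2), values in zip(conditions, module_expr):
        pooled.setdefault(leaf_of(x1, x2, e1, e2), []).extend(values)
    return {leaf: (mean(v), pvariance(v)) for leaf, v in pooled.items()}

conditions = [(2.0, 0.0), (1.5, 0.0), (0.0, 2.0), (0.0, -1.0)]
module_expr = [[1.0, 1.2], [0.8, 1.0], [0.5, 0.7], [-1.0, -1.2]]  # 2 genes per condition
leaves = fit_leaf_gaussians(conditions, module_expr, e1=1.0, e2=1.0)
for leaf, (mu, var) in sorted(leaves.items()):
    print(leaf, round(mu, 3), round(var, 3))
```

Learning a module network additionally searches over which regulators and thresholds to split on; this sketch only fits the leaf distributions for a fixed tree.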
Global View of Modules • modules for common processes often share common • regulators • binding site motifs