Introduction to Networks

Introduction to Networks Sushmita Roy BMI/CS 576 www.biostat.wisc.edu/bmi576 sroy@biostat.wisc.edu Nov 14th, 2013

Key concepts in networks • What are molecular networks? • Different types of networks • Algorithms for network reconstruction • Classes of methods for network inference • Representing networks as probabilistic graphical models • Bayesian network learning • Heuristics to improve network learning time • Algorithms for network-based analyses (subsequent lectures) • Dense subgraph-based interpretation of gene sets • Predicting function of nodes

A network • Describes connectivity patterns between the components of a system • Nodes: components • Edges: connections • Edges can have weights • The topology of a network is represented as a graph • Node and vertex are used interchangeably • Edge and interaction are used interchangeably

Why are networks important? • Most complex systems have a natural representation as a network • Examples of networks • Social networks • Internet • Molecular networks • This is what we will focus on

Different types of networks • Depends on what • the nodes represent • the edges represent • whether edges are directed or undirected • Molecular networks • Nodes are bio-molecules • Genes, proteins, metabolites • Edges represent interactions between molecules

Example directed and undirected networks [Figure: an undirected network and a directed network over the vertices/nodes A, B, C, D, E, F, with an edge and a directed edge labeled]
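As a minimal sketch of the directed/undirected distinction, the Python below stores a network as an adjacency map. The helper name `build_adjacency` and the edge list are hypothetical; they are not the edges drawn in the figure.

```python
from collections import defaultdict

def build_adjacency(edges, directed):
    """Return {node: set(neighbors)}; undirected edges are stored in both directions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so that nodes with no outgoing edge still appear as keys
        if not directed:
            adj[v].add(u)
    return dict(adj)

# Hypothetical edge list over nodes A-F (illustration only).
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")]
undirected = build_adjacency(edges, directed=False)
directed = build_adjacency(edges, directed=True)

print(undirected["B"])  # neighbors of B in both directions
print(directed["B"])    # only the nodes that B points to
```

In the undirected map every edge is visible from both endpoints; in the directed map an edge (u, v) is visible only from u.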

Different types of molecular networks • Physical networks • Transcriptional regulatory networks: interactions between regulatory proteins (transcription factors) and genes • Protein-protein: interactions among proteins • Signaling networks: interactions between proteins and small molecules, and among proteins that relay signals from outside the cell to the nucleus • Functional networks • metabolic: describe reactions through which enzymes convert substrates to products • genetic: describe interactions among genes which, when genetically perturbed together, produce a more significant phenotype than when perturbed individually • co-expression: describe the dependency between expression patterns of genes under different conditions

Transcriptional regulatory networks • Nodes: regulatory protein like a TF, or target gene • Edges: TF A regulates B • E. coli: 153 TFs (green & light red), 1319 targets • S. cerevisiae: 157 TFs and 4410 targets • Vargas and Santillan, 2008

Protein-protein interaction networks • Nodes: proteins • Edges: Protein A physically interacts with Protein B • [Figures: yeast and human interaction networks] Barabasi et al. 2003, Rual et al. 2005

Metabolic networks [Figure from KEGG database; legend: gene products, other molecules]

Genetic interaction networks Dixon et al., 2009, Annu. Rev. Genet

Problems in networks • Network reconstruction • Inference of edges between the nodes • Network structure analysis • Properties of networks • Networks as tools for analysis • Interpretation of gene sets • Using networks to infer function of a gene

Computational Network reconstruction • Given • A set of attributes associated with network nodes • Typically attributes are mRNA levels • Do • Infer what nodes interact with each other • Algorithms for network reconstruction can vary based on their meaning of interaction • Similarity • Mutual information • Predictive ability

Computational methods to infer networks • We will focus on transcriptional regulatory networks • These networks are inferred from gene expression data • Many methods to do network inference • We will focus on probabilistic graphical models

Modeling a regulatory network • Structure: Who are the regulators? (Hot1 regulates HSP12; HSP12 is a target of Hot1) • Function: How do they determine expression levels? • Choices of function: Boolean, linear, differential equations, probabilistic, … • [Figure: regulators Hot1 and Sko1 (inputs X1, X2) and target HSP12 (output Y), related by a function ψ(X1,X2)]

Mathematical representations of networks • Models differ in the function f that maps input system state to output state • Input: expression of neighbors (X1, X2) • Output: expression of node (X3) • Boolean networks: table of inputs X1, X2 and output X3 • Differential equations: rate equations • Probabilistic graphical models: probability distributions

Regulatory network inference from expression [Figure: a genes-by-experiments expression matrix is the input to expression-based network inference]

Two classes of expression-based methods • Per-gene/direct methods • Module based methods

Per-gene methods • Key idea: find the regulators that “best explain” expression of a gene • Mutual information: Context Likelihood of Relatedness (CLR), ARACNE • Probabilistic methods: Bayesian network (Sparse Candidates) • Regression: TIGRESS, GENIE-3

Per-gene methods can be further classified based on how regulators are added • Pairwise: • Ask if TF Y and gene X have a high statistical correlation/mutual information • Examples are CLR and ARACNE • Higher-order: • Ask if TFs {Y1,Y2..YK} explain expression of X best • Bayesian networks, Dependency networks

Higher order models for network inference • Based on a general class of models called probabilistic graphical models • Have a graph component with nodes representing random variables • Have a probabilistic component • Represent the joint distribution of the random variables corresponding to the nodes • Examples • Bayesian networks • Dependency networks

Bayesian networks (BN) • A BN compactly represents a joint probability distribution • It has two parts: • A graph which is directed and acyclic • A set of conditional distributions • Directed Acyclic Graph (DAG) • The nodes denote random variables X1… XN • The edges • encode statistical dependencies between the random variables • Establish parent child relationships • Each node Xi has a conditional probability distribution (CPD) representing P(Xi| Parents(Xi)) • Provides a tractable way to work with large joint distributions • The joint is written as a product of “local” conditional distributions, one per Xi

Bayesian network representation of a regulatory network • Random variables encode expression levels • Genes and their random variables: Sho1 is X1, Msb2 is X2, Ste20 is X3 • Structure: regulators (parents) X1 and X2, target (child) X3 • Parameters of the CPD for the child given its parents: P(X3|X1,X2)

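Bayesian networks compactly represent joint distributions • $P(X_1,\dots,X_N) = \prod_{i=1}^{N} P(X_i \mid \mathrm{Parents}(X_i))$ • Each factor in the product is a CPD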

Example Bayesian network of 5 variables [Figure: network over X1, X2, X3, X4, X5 with parents and child labeled] • Assume Xi is binary • No independence assertions: needs $2^5$ measurements • Independence assertions: needs $2^3$ measurements
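The saving can be checked on a small example. The sketch below assumes a hypothetical 5-variable structure (X1 and X2 are parents of X3; X3 and X4 are parents of X5) and made-up CPD values; neither is taken from the slide's figure. It evaluates the joint as a product of CPDs and compares table sizes.

```python
from itertools import product

# Hypothetical structure: node -> list of parents (illustration only).
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"], "X4": [], "X5": ["X3", "X4"]}

# Hypothetical CPDs: P(node = True | parent values), keyed by the tuple of parent values.
cpds = {
    "X1": {(): 0.3},
    "X2": {(): 0.6},
    "X3": {(False, False): 0.1, (False, True): 0.5, (True, False): 0.7, (True, True): 0.9},
    "X4": {(): 0.2},
    "X5": {(False, False): 0.05, (False, True): 0.4, (True, False): 0.6, (True, True): 0.95},
}

def joint_probability(assignment):
    """Product of local conditional probabilities, one factor per variable."""
    p = 1.0
    for node, pa in parents.items():
        p_true = cpds[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

names = list(parents)
total = sum(
    joint_probability(dict(zip(names, values)))
    for values in product([False, True], repeat=len(names))
)
full_joint_entries = 2 ** len(names)                                       # one entry per joint assignment
largest_cpd_entries = max(2 ** (1 + len(pa)) for pa in parents.values())  # child plus its parents

print(total, full_joint_entries, largest_cpd_entries)
```

With at most two parents per node, no single table is larger than $2^3$ entries, whereas the unfactored joint has $2^5$.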

CPD in Bayesian networks • The same structure can be parameterized in different ways • For example, for discrete variables we can use table or tree representations

Representing CPDs as tables • Consider the following case with Boolean variables X1, X2, X3, X4 • X1, X2, X3 are the parents of X4 • [Figure: P(X4|X1,X2,X3) as a table]

Estimating CPD table from data • Consider the four RVs from the previous slide • Assume we observe the following data [data table with columns X1, X2, X3, X4] • For each joint assignment to X1, X2, X3, estimate the probabilities for each possible value of X4 • For example, consider X1=T, X2=F, X3=T • P(X4=T|X1=T, X2=F, X3=T) = 2/4 • P(X4=F|X1=T, X2=F, X3=T) = 2/4
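The counting estimate can be sketched in a few lines of Python. The data rows below are hypothetical (the slide's data table is not reproduced here); they are chosen so that the parent assignment X1=T, X2=F, X3=T occurs four times, twice with X4=T. The function name `estimate_cpd` is also an assumption.

```python
from collections import Counter

def estimate_cpd(rows, child, parent_names):
    """Maximum likelihood CPD: count(parents, child) / count(parents)."""
    joint_counts = Counter()
    parent_counts = Counter()
    for row in rows:
        pa = tuple(row[p] for p in parent_names)
        joint_counts[(pa, row[child])] += 1
        parent_counts[pa] += 1
    return {key: n / parent_counts[key[0]] for key, n in joint_counts.items()}

T, F = True, False
# Hypothetical observations of (X1, X2, X3, X4).
observations = [
    (T, F, T, T), (T, F, T, T), (T, F, T, F), (T, F, T, F),
    (F, F, T, T), (F, T, F, F), (T, T, T, T), (F, T, F, F),
]
rows = [dict(zip(["X1", "X2", "X3", "X4"], obs)) for obs in observations]

cpd = estimate_cpd(rows, child="X4", parent_names=["X1", "X2", "X3"])
print(cpd[((T, F, T), T)])  # P(X4=T | X1=T, X2=F, X3=T)
print(cpd[((T, F, T), F)])  # P(X4=F | X1=T, X2=F, X3=T)
```

Parent assignments that never occur in the data get no entry at all, which is one reason small expression datasets make large parent sets hard to estimate.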

P(X4|X1,X2,X3) as a tree • X1, X2, X3 are the parents of X4 • A tree representation of a CPD (branches listed in the order f, t): X1 [f: Pr(X4=t) = 0.9; t: X2 [f: Pr(X4=t) = 0.5; t: X3 [f: Pr(X4=t) = 0.8; t: Pr(X4=t) = 0.5]]] • Allows more compact representation of CPDs. For example, along some branches the values of the remaining parents can be ignored.
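A tree CPD is a nested sequence of tests on the parents. The sketch below assumes one reading of the figure: X1 at the root, the f branch of each test ending in a leaf, and the t branch descending to the next test. That branch layout and the function name `p_x4_true` are assumptions.

```python
from itertools import product

def p_x4_true(x1, x2, x3):
    """Pr(X4 = t | X1, X2, X3) read off a tree CPD (assumed branch layout)."""
    if not x1:
        return 0.9          # X2 and X3 are never consulted on this branch
    if not x2:
        return 0.5          # X3 is never consulted on this branch
    return 0.5 if x3 else 0.8

# Expand the tree into the equivalent full table over the 8 parent assignments.
table = {pa: p_x4_true(*pa) for pa in product([False, True], repeat=3)}
leaves = 4  # number of leaves in the tree
print(len(table), leaves)
```

The tree stores four leaf values where the table stores eight rows, because several rows of the table share a leaf.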

The learning problems • Parameter learning on known structure • Given training data D, estimate parameters of the conditional distributions • Structure learning • Given training data D, find the statistical dependency structure, G and parameters that best describe D • Subsumes parameter learning

Structure learning using score-based search [Figure: candidate Bayesian networks are scored against the data using maximum likelihood parameters]

Learning network structure is computationally expensive • For N variables, the set of possible networks grows super-exponentially in N • Need approximate methods to search the space of networks
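To get a feel for the growth, the number of labeled DAGs on n nodes can be computed with Robinson's recurrence. This recurrence is a standard combinatorial result and is not taken from the slides; the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 6):
    print(n, count_dags(n))  # 1, 3, 25, 543, 29281
```

Already at 10 variables the count exceeds 10^18, so exhaustive enumeration is out of reach for networks with hundreds or thousands of genes.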

Heuristic search of Bayesian network structures • Make local changes to the network • Add an edge • Delete an edge • Reverse an edge • Evaluate score and select the network configuration with best score • We just need to check for cycles • Working with gene expression data requires additional considerations

Structure search operators [Figure: a current network over nodes A, B, C, D and the networks obtained by adding an edge and by deleting an edge; check for cycles]
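The local moves and the cycle check can be sketched as follows. This is an illustrative enumeration of neighboring networks, assuming edges are stored as a set of (parent, child) pairs; the function names and the example network are hypothetical. It does not score the neighbors.

```python
def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle; edges is a set of (parent, child) pairs."""
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on the current path, 2 = finished

    def visit(n):
        state[n] = 1
        for c in children[n]:
            if state[c] == 1 or (state[c] == 0 and visit(c)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)

def neighbors(nodes, edges):
    """Yield (operation, edge, new_edge_set) for every acyclic single-edge change."""
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                yield "delete", (u, v), edges - {(u, v)}  # deletion never creates a cycle
                flipped = (edges - {(u, v)}) | {(v, u)}
                if not has_cycle(nodes, flipped):
                    yield "reverse", (u, v), flipped
            elif (v, u) not in edges:
                added = edges | {(u, v)}
                if not has_cycle(nodes, added):
                    yield "add", (u, v), added

# Hypothetical current network: A -> B -> C, with D unconnected.
nodes = ["A", "B", "C", "D"]
edges = frozenset({("A", "B"), ("B", "C")})
for op, edge, _ in neighbors(nodes, edges):
    print(op, edge)
```

A hill-climbing search would score each yielded network and move to the best one; adding C -> A is not offered here because it would close the cycle A -> B -> C -> A.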

Decomposability of scores • Score of a graph G decomposes over individual variables • This enables us to efficiently compute the score effect of local changes • However, network inference from expression data is very challenging • Lots of nodes and not enough data • Good heuristics to prune the search space are highly desirable • Assess statistical significance of learned network structures
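Written out, decomposability means the score is a sum of per-family terms, so adding a single edge changes only one term. The symbol FamScore is an assumed name for the per-variable term; the slides do not name it.

```latex
\begin{align*}
\mathrm{Score}(G; D) &= \sum_{i=1}^{N} \mathrm{FamScore}\big(X_i, \mathrm{Pa}_G(X_i); D\big) \\
\Delta_{\text{add } X_j \to X_i} &= \mathrm{FamScore}\big(X_i, \mathrm{Pa}_G(X_i) \cup \{X_j\}; D\big)
  - \mathrm{FamScore}\big(X_i, \mathrm{Pa}_G(X_i); D\big)
\end{align*}
```

Deleting an edge likewise changes one term, and reversing an edge changes two (one for each endpoint), so none of the other N-1 or N-2 terms needs to be recomputed.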

Extensions to Bayesian networks to handle a large number of random variables • Sparse candidate algorithm • Bootstrap-based ideas to score high-confidence networks • Module networks (subsequent lecture)

The Sparse candidate algorithm: structure learning in Bayesian networks • Key idea: Identify k promising “candidate” parents for each node based on local measures such as correlation/mutual information • k<<N, N: number of random variables • Restrict the search to networks in which each node's parents are a subset of its “candidate” set • Possible pitfall • Early choices might exclude other good parents • Resolve using an iterative algorithm • Friedman, 1999

Sparse candidate algorithm notation • $B_n$: Bayesian network at iteration n • $C_i^n$: Candidate parent set for node $X_i$ at iteration n • $\mathrm{Pa}^n(X_i)$: Parents of $X_i$ in $B_n$

Sparse candidate algorithm • Input: • A data set D • An initial network $B_0$ • A parameter k: number of candidate parents per node • Output: • Network B • Loop until convergence • Restrict • Based on D and $B_{n-1}$, select candidate parents $C_i^{n-1}$ for variable $X_i$ • This defines a skeleton directed network $H_n$ • Maximize • Find network $B_n$ that maximizes the score Score($B_n$; D) among networks satisfying $\mathrm{Pa}^n(X_i) \subseteq C_i^{n-1}$ • Termination: Return $B_n$
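The control flow of the loop can be sketched as a small driver that takes the Restrict and Maximize steps as functions. Everything here is an assumption made for illustration: the function name, the three callables and their signatures, and the stopping rule (stop when the score no longer improves, or after `max_iters` iterations). It is not the reference implementation.

```python
def sparse_candidate(data, initial_network, k, restrict, maximize, score, max_iters=20):
    """Alternate Restrict and Maximize until the score stops improving.

    restrict(data, network, k) -> {node: candidate parents}, at most k per node
    maximize(data, candidates) -> best network whose parents come from the candidates
    score(network, data)       -> number, higher is better
    """
    network = initial_network
    best_score = score(network, data)
    for _ in range(max_iters):
        candidates = restrict(data, network, k)      # Restrict step
        new_network = maximize(data, candidates)     # Maximize step
        new_score = score(new_network, data)
        if new_score <= best_score:                  # assumed convergence test
            break
        network, best_score = new_network, new_score
    return network
```

Passing the two steps in as functions keeps the loop independent of the relevance measure used in Restrict and of the search procedure used in Maximize.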

The Restrict Step Measures of relevance

Information theoretic concepts • Kullback-Leibler (KL) Divergence • A “distance” between two distributions (it is not symmetric) • Mutual information • Mutual information between two random variables X and Y measures statistical dependence between X and Y • Also the KL Divergence between P(X,Y) and P(X)P(Y) • Conditional Mutual information • Measures the information between two variables given a third

KL Divergence • P(X), Q(X) are two distributions over X • $D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$

Mutual Information • Measure of statistical dependence between two random variables, X and Y • KL Divergence between the joint and product of marginals • $I(X;Y) = D_{KL}(P(X,Y) \,\|\, P(X)P(Y)) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$

Conditional Mutual Information • Measures the mutual information between X and Y, given Z • $I(X;Y \mid Z) = \sum_{x,y,z} P(x,y,z) \log \frac{P(x,y \mid z)}{P(x \mid z)\,P(y \mid z)}$ • If Z captures everything about X, knowing Y gives no more information about X, so the conditional mutual information is zero
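The three quantities can be computed directly from a joint probability table. The sketch below assumes distributions are given as dictionaries mapping outcome tuples to probabilities and uses base-2 logarithms (bits); the function names and the example distribution are hypothetical.

```python
from collections import defaultdict
from math import log2

def marginal(joint, axes):
    """Sum a joint table {outcome tuple: probability} down to the given positions."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return dict(out)

def kl_divergence(p, q):
    """D_KL(P || Q); q must be positive wherever p is positive."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

def mutual_information(joint_xy):
    """I(X;Y) as the KL divergence between the joint and the product of marginals."""
    px, py = marginal(joint_xy, [0]), marginal(joint_xy, [1])
    independent = {(x, y): px[(x,)] * py[(y,)] for (x,) in px for (y,) in py}
    return kl_divergence(joint_xy, independent)

def conditional_mutual_information(joint_xyz):
    """I(X;Y|Z) = sum p(x,y,z) log [ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]."""
    pz = marginal(joint_xyz, [2])
    pxz = marginal(joint_xyz, [0, 2])
    pyz = marginal(joint_xyz, [1, 2])
    return sum(
        p * log2(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
        for (x, y, z), p in joint_xyz.items()
        if p > 0
    )

# Hypothetical chain Y -> Z -> X in which each variable copies its parent exactly.
chain = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(mutual_information(marginal(chain, [0, 1])))  # X and Y are dependent
print(conditional_mutual_information(chain))        # but independent once Z is known
```

In the chain example Z captures everything about X, so conditioning on Z removes the dependence between X and Y even though their unconditional mutual information is one bit.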

Measuring relevance of candidate parents in the Restrict Step • A good parent for node Xi is one that has a strong statistical dependence with Xi • Mutual information provides a good measure of statistical dependence I(Xi; Xj) • Mutual information should be used only as a first approximation • Candidate parents need to be iteratively refined to avoid missing important dependences
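A first-pass Restrict step based only on mutual information might look like the sketch below. It assumes discretized expression values, a plug-in (empirical frequency) estimate of mutual information, and hypothetical data and function names.

```python
from collections import Counter
from math import log2

def empirical_mi(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    cxy, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())

def select_candidates(data, target, k):
    """Return the k variables with the highest mutual information with the target."""
    scores = {
        name: empirical_mi(data[target], column)
        for name, column in data.items()
        if name != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical discretized expression profiles over 8 experiments.
data = {
    "A": [0, 0, 1, 1, 0, 1, 0, 1],
    "B": [0, 1, 0, 1, 0, 0, 1, 1],  # empirically independent of A
    "C": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to A
    "D": [0, 0, 1, 1, 0, 1, 1, 0],  # agrees with A in 6 of 8 experiments
}
print(select_candidates(data, "A", k=2))  # ['C', 'D']
```

With k = 2, B is not selected because its pairwise mutual information with A is the lowest, which is exactly the situation that the iterative refinement of candidates is meant to correct.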

Mutual information can miss some parents • Consider the following true network [Figure: network over nodes A, B, C, D] • If I(A;C)>I(A;D)>I(A;B) and we are selecting two candidate parents, B will never be selected as a parent • How do we get B as a candidate parent? • Note that if we used mutual information alone to select candidates, we might be stuck with C and D

Sparse candidate restrict step • Three strategies to handle the effect of greedy choices in the beginning • One can estimate the discrepancy between the (in)dependencies in the network vs those in the data • KL Divergence between P(A,D) in the data vs P(A,D) from the network. • Measure how much the current parent set shields A from D • Conditional mutual information between A and D given the current parent set of A. • Measure how much the score improves on adding D

Measuring relevance of Y to X • $M_{Disc}(X,Y) = D_{KL}(P(X,Y) \,\|\, P_B(X,Y))$ • $M_{Shield}(X,Y) = I(X;Y \mid \mathrm{Pa}(X))$ • $M_{Score}(X,Y) = \mathrm{Score}(X; Y, \mathrm{Pa}(X), D)$

Performance of Sparse candidate over simple hill-climbing [Figure: results on Dataset 1 (100 variables) and Dataset 2 (200 variables)] • Score 15 seems to perform the best

Assessing confidence in the learned network • Given the large number of variables and small datasets, the data are not sufficient to reliably determine the “best” network • One can, however, estimate the confidence of specific properties of the network • Graph features f(G) • Examples of f(G) • Is there an edge between two random variables? • Order relations: Is X an ancestor of Y? • Is X in the Markov blanket of Y? • The Markov blanket of Y is defined as those variables that render Y independent from the rest of the network • Includes Y’s parents, children and parents of Y’s children
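The Markov blanket feature is easy to compute from a learned structure. The sketch below assumes the network is stored as a map from each node to its list of parents; the function name and the example DAG are hypothetical.

```python
def markov_blanket(parents, y):
    """Parents of y, children of y, and the other parents of y's children."""
    children = [x for x, pa in parents.items() if y in pa]
    blanket = set(parents[y]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(y)  # y appears among its children's parents but is not in its own blanket
    return blanket

# Hypothetical DAG: A -> C, B -> C, C -> D, E -> D, D -> F.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C", "E"], "E": [], "F": ["D"]}
print(sorted(markov_blanket(parents, "C")))  # ['A', 'B', 'D', 'E']
```

A feature such as "X is in the Markov blanket of Y" can then be evaluated on each of many learned networks, and the fraction of networks in which it holds serves as a confidence estimate.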
