THE JOHNS HOPKINS UNIVERSITY
550.635 TOPICS IN BIOINFORMATICS

ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context
Adam A. Margolin, Ilya Nemenman, Katia Basso, Chris Wiggins, Gustavo Stolovitzky, Riccardo Dalla-Favera and Andrea Califano

Reverse engineering of regulatory networks in human B cells
Katia Basso, Adam Margolin, Gustavo Stolovitzky, Ulf Klein, Riccardo Dalla-Favera and Andrea Califano

http://www.cis.jhu.edu/~sanchez/ARACNE_635_sanchez.ppt
Instructor: Dr. Donald Geman
Student: Francisco Sánchez Vega
Baltimore, 28 November 2006
Outline
• A very short introduction to MRF
• The ARACNE approach
  • Theoretical framework
  • Technical implementation
• Experiments and results
  • Synthetic data
  • Human B cells (paper by Basso et al.)
• Discussion
A very short introduction to MRFs
What is a Markov Random Field?
• Intuitive idea: start from the concept of a Markov chain
  Xt0, Xt1, …, X(t-1), Xt, X(t+1), …
  P(Xt | Xt0, Xt1, …, X(t-1)) = P(Xt | X(t-1))
"The world is a huge Markov chain" (Ronald A. Howard)
A very short introduction to MRFs
What is a Markov Random Field?
• A Markov random field is a model of the joint probability distribution over a set X of random variables.
• It generalizes Markov chains to a process indexed by the sites of a graph and satisfying a Markov property with respect to a neighborhood system.
A very short introduction to MRFs
MRF: a constructive definition
• Let X = {X1, …, XN} be the set of variables (the stochastic process) whose JPD we want to model.
• Start by considering a set of sites or vertices V = {V1, …, VN}.
• Define a neighborhood system Γ = {Γv, v ∈ V} where
  (1) Γv ⊂ V
  (2) v ∉ Γv
  (3) v ∈ Γu ⟺ u ∈ Γv
A very short introduction to MRFs
MRF: a constructive definition
• Example: an N × N square lattice
  V = {(i, j) : 1 ≤ i, j ≤ N}
  Γ(i,j) = {(k, l) ∈ V : 1 ≤ (i−k)² + (l−j)² ≤ c}
[Figure: neighborhoods of a lattice site for c = 1, c = 2 and c = 8]
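A minimal sketch of how such a neighborhood system can be enumerated for a given c (purely illustrative; the function name is made up):

```python
# Hypothetical sketch (not from the paper): the neighborhood of a site (i, j)
# on an N x N lattice for the order parameter c used on this slide.

def neighborhood(i, j, N, c):
    """Return the sites (k, l) with 1 <= (i-k)^2 + (j-l)^2 <= c."""
    return [(k, l)
            for k in range(1, N + 1)
            for l in range(1, N + 1)
            if 1 <= (i - k) ** 2 + (j - l) ** 2 <= c]

# c = 1 gives the 4 nearest neighbors, c = 2 adds the diagonals (8 neighbors).
print(len(neighborhood(5, 5, 10, 1)))  # 4
print(len(neighborhood(5, 5, 10, 2)))  # 8
```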
A very short introduction to MRFs
MRF: a constructive definition
• At this point, we can build an undirected graph G = (V, E):
  • Each vertex v ∈ V is associated with one of the random variables in X.
  • The set of edges E is given by the chosen neighborhood system.
[Figure: the lattice graphs induced by c = 1 and c = 2]
A very short introduction to MRFs
MRF: a constructive definition
• Clique: an induced complete subgraph of G.
[Figure: examples of cliques for c = 1 and c = 2]
A very short introduction to MRFs
MRF: a constructive definition
• Imagine that each variable Xv in X = {X1, …, XN} can take one of a finite number Lv of values.
• Define the configuration space S as the set of all possible joint assignments x = (x1, …, xN).
• We say that X is a MRF on (V, Γ) if
  (a) P(X = x) > 0, for all x ∈ S
  (b) P(Xv = xv | Xu = xu, u ≠ v) = P(Xv = xv | Xu = xu, u ∈ Γv), for all v ∈ V
A very short introduction to MRFs
MRF: a constructive definition
[Figure: the lattice MRF and its neighborhood structure for c = 1 and c = 2]
A very short introduction to MRFs
• Let C be the set of cliques of G. Let us now associate a specific function, that we will call a potential, φc : S → R, to each clique c ∈ C, where φc depends only on the variables in c.
• We say that a probability distribution p ∈ PS is a Gibbs distribution wrt (V, Γ) if it is of the form
  p(x) = (1/Z) exp( − Σ_{c∈C} φc(x) )
  where Z is the partition function:
  Z = Σ_{x∈S} exp( − Σ_{c∈C} φc(x) )
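A tiny illustrative example of such a Gibbs distribution, assuming a made-up three-variable chain with simple pairwise potentials (not from the slides):

```python
# Illustrative sketch: a Gibbs distribution over a tiny binary MRF.
# The graph, potentials and variable names below are made up for illustration.
import itertools
import math

# Three binary variables on a chain X1 - X2 - X3; the cliques are the two edges.
cliques = [(0, 1), (1, 2)]

def potential(clique, x):
    """A simple pairwise potential: neighboring sites prefer equal values."""
    i, j = clique
    return 0.0 if x[i] == x[j] else 1.0

def energy(x):
    return sum(potential(c, x) for c in cliques)

# Partition function Z = sum over all configurations of exp(-energy)
configs = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(-energy(x)) for x in configs)

def p(x):
    return math.exp(-energy(x)) / Z

print(sum(p(x) for x in configs))   # ≈ 1.0: a valid probability distribution
print(p((0, 0, 0)), p((0, 1, 0)))   # aligned configurations are more likely
```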
A very short introduction to MRFs
Correspondence Theorem
X is a MRF with respect to (V, Γ) if and only if p(x) = P(X = x) is a Gibbs distribution with respect to (V, Γ) (i.e. every given Markov Random Field has an associated Gibbs distribution and every given Gibbs distribution has an associated Markov Random Field).
Outline
• A very short introduction to MRF
• The ARACNE approach
  • Theoretical framework
  • Technical implementation
• Experiments and results
  • Synthetic data
  • Human B cells (paper by Basso et al.)
• Discussion
Theoretical framework
The name of the game: from microarray data to a graphical model.
Theoretical framework
• We need a strategy to deal with uncertainty under small samples: maximum entropy models.
• Philosophical basis:
  • Model all that is known and assume nothing about that which is unknown.
  • Given a certain dataset, choose a model which is consistent with it, but otherwise as "uniform" as possible.
Theoretical framework
MaxEnt: toy example (adapted from A. Berger)
• Paul, Mary, Jane, Bryan and Emily are five grad students who work in the same research lab.
• Let us try to model the probability distribution of the discrete random variable X ≡ "first person to arrive at the lab in the morning":
  p(X=Paul) + p(X=Mary) + p(X=Jane) + p(X=Bryan) + p(X=Emily) = 1
• If we do not know anything about them, the most "uniform" model is the one that maximizes the entropy H(p):
  p(X=Paul) = p(X=Mary) = p(X=Jane) = p(X=Bryan) = p(X=Emily) = 1/5
Theoretical framework
MaxEnt: toy example (adapted from A. Berger)
• Imagine that we know that Mary or Jane is the first one to arrive 30% of the time. In that case, we know:
  p(X=Paul) + p(X=Mary) + p(X=Jane) + p(X=Bryan) + p(X=Emily) = 1
  p(X=Mary) + p(X=Jane) = 3/10
• Again, we want to choose the most "uniform" model, i.e. maximize H(p(X)), although this time we need to respect the new constraint:
  p(X=Mary) = p(X=Jane) = (3/10)(1/2) = 3/20
  p(X=Paul) = p(X=Bryan) = p(X=Emily) = (7/10)(1/3) = 7/30
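As a sanity check, the same toy problem can be solved numerically. A minimal sketch (the solver choice and variable names are illustrative, not part of the original slides):

```python
# Sketch: solving the toy MaxEnt problem numerically and recovering the
# closed-form values on this slide (7/30 and 3/20).
import numpy as np
from scipy.optimize import minimize

names = ["Paul", "Mary", "Jane", "Bryan", "Emily"]

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},     # normalization
    {"type": "eq", "fun": lambda p: p[1] + p[2] - 0.3},   # Mary or Jane first 30% of the time
]
bounds = [(0.0, 1.0)] * 5
p0 = np.full(5, 0.2)

res = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(dict(zip(names, np.round(res.x, 4))))
# ≈ {'Paul': 0.2333, 'Mary': 0.15, 'Jane': 0.15, 'Bryan': 0.2333, 'Emily': 0.2333}
```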
Theoretical framework
MaxEnt extensions ~ constrained optimization problem
• S: countable set.
• PS: set of probability measures on S, PS = {p(x) : x ∈ S}.
• C: set of K constraints for the problem.
• PC: subset of PS that satisfies all the constraints in C.
• Let {φ1, …, φK} be a set of K functions S → R [feature functions]. We choose these functions so that each given constraint can be expressed as
  Σ_{x∈S} p(x) φi(x) = αi
• If x are samples from a r.v. X, this says E[φi(X)] = αi.
Theoretical framework
MaxEnt extensions ~ constrained optimization problem
• Ex. Let x be a sample from X ≡ "person to arrive first in the lab". In this case, S = {Paul, Mary, Jane, Bryan, Emily}.
• C = {c1}, with c1: p(X=Mary) + p(X=Jane) = 3/10.
• We can model c1 as follows. Define
  φ1(x) = 1 if x = Mary or x = Jane, 0 otherwise
  and α1 = 3/10, so that the constraint reads E[φ1(X)] = α1.
Theoretical framework
MaxEnt extensions ~ constrained optimization problem
• PC: subset of PS that satisfies all the constraints in C:
  PC = {p ∈ PS : Σ_{x∈S} p(x) φi(x) = αi, i = 1, …, K}
• Find p*(x) = arg max_{p ∈ PC} H(p(x))
• Of course, since p(x) is a probability measure, we also require
  Σ_{x∈S} p(x) = 1 and p(x) ≥ 0.
Theoretical framework
• Use Lagrange multipliers: introduce one multiplier λi per constraint and μ for the normalization, and maximize
  L(p, λ, μ) = H(p) + Σ_{i=1..K} λi ( Σ_{x∈S} p(x) φi(x) − αi ) + μ ( Σ_{x∈S} p(x) − 1 )
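Setting the derivative of this Lagrangian with respect to each p(x) to zero (a standard MaxEnt step, spelled out here for completeness) gives the exponential form stated on the next slide:

```latex
\frac{\partial \mathcal{L}}{\partial p(x)}
  = -\log p(x) - 1 + \sum_{i=1}^{K}\lambda_i\,\varphi_i(x) + \mu = 0
\quad\Longrightarrow\quad
\log p(x) = \sum_{i=1}^{K}\lambda_i\,\varphi_i(x) + (\mu - 1).
```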
Theoretical framework
• This leads to a solution of the form:
  p(x) = (1/Z) exp( Σ_{i=1..K} λi φi(x) )
• But this is a Gibbs distribution, and therefore, by the correspondence theorem, we can think in terms of the underlying Markov Random Field!
  • We can profit from previous knowledge of MRFs.
  • We have found the graphical model paradigm that we were looking for!
Technical implementation
The name of the game: from microarray data to a graphical model. We now move from the theoretical framework to the technical implementation.
Approximation of the interaction structure
• Consider the discrete, binary case. First order constraints: the N single-gene marginals P(gi), i.e. N constraints.
[Figure: a network of N nodes X1, …, XN]
Approximation of the interaction structure
• Consider the discrete, binary case. Second order constraints: the pairwise marginals P(gi, gj), i.e. N(N−1)/2 constraints.
Approximation of the interaction structure
• Consider the discrete, binary case. For jth order: on the order of (N choose j) constraints (see the small computation below).
• The higher the order, the more accurate our approximation will be…
  … provided we observe enough data!
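To get a feel for how quickly the number of constraints grows with the order, here is a small illustrative computation (N = 100 is an arbitrary choice, not a value from the paper):

```python
# Sketch: number of marginal constraints at each order for N genes.
# The value of N is purely illustrative of the combinatorial growth.
from math import comb

N = 100
for j in (1, 2, 3):
    print(f"order {j}: {comb(N, j)} marginals to estimate")
# order 1: 100
# order 2: 4950
# order 3: 161700
```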
Approximation of the interaction structure
[Figures from "The Elements of Statistical Learning" (Hastie and Tibshirani) and from Dr. Munos' lectures, Master MVA]
"…for M → ∞ (where M is sample set size) the complete form of the JPD is restored. In fact, M > 100 is generally sufficient to estimate 2-way marginals in genomics problems, while P(gi, gj, gk) requires about an order of magnitude more samples…"
Approximation of the interaction structure
• Therefore, the model they adopt is of the form:
  P(g1, …, gN) = (1/Z) exp( − Σi φi(gi) − Σ_{i<j} φij(gi, gj) )
• All gene pairs for which φij = 0 are declared mutually non-interacting:
  • Some of these pairs are statistically independent, i.e. P(gi, gj) ≈ P(gi)P(gj).
  • Some are pairs of genes that do not interact directly but are statistically dependent due to their interaction via other genes, i.e. P(gi, gj) ≠ P(gi)P(gj), but φij = 0.
  How can we extract this information?
Approximation of the interaction structure
• Therefore, the model they adopt is of the form:
  P(g1, …, gN) = (1/Z) exp( − Σi φi(gi) − Σ_{i<j} φij(gi, gj) )
• Three situations must be distinguished:
  • Pairs of genes for which the MI is a rightful indicator of dependency: P(gi, gj) = P(gi)P(gj) ⟺ φij = 0.
  • Pairs of genes that interact only through a third gene.
  • Pairs of genes whose direct interaction is balanced out by a third gene.
Mutual Information
• The mutual information between two random variables is a measure of the amount of information that one random variable contains about another.
• It is defined as the relative entropy (or Kullback–Leibler distance) between the joint distribution and the product of the marginals:
  I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / ( p(x) p(y) ) ]
• Alternative formulations:
  I(X; Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y)
Mutual Information estimation
Another toy example
Joint distribution P(X, Y):
  P(X=0, Y=0) = 2/10   P(X=0, Y=1) = 4/10
  P(X=1, Y=0) = 2/10   P(X=1, Y=1) = 2/10
Marginals:
  P(X=0) = 6/10; P(X=1) = 4/10
  P(Y=0) = 4/10; P(Y=1) = 6/10
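Plugging this toy table into the definition of mutual information (a small sketch, not part of the original slides):

```python
# Sketch: the mutual information of the toy joint distribution above,
# computed directly from I(X;Y) = sum p(x,y) log[ p(x,y) / (p(x)p(y)) ].
import numpy as np

pxy = np.array([[0.2, 0.4],    # P(X=0,Y=0), P(X=0,Y=1)
                [0.2, 0.2]])   # P(X=1,Y=0), P(X=1,Y=1)
px = pxy.sum(axis=1)           # [0.6, 0.4]
py = pxy.sum(axis=0)           # [0.4, 0.6]

I = sum(pxy[i, j] * np.log2(pxy[i, j] / (px[i] * py[j]))
        for i in range(2) for j in range(2))
print(I)  # ≈ 0.020 bits: X and Y are weakly dependent
```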
Mutual Information estimation
Another toy example
The Gaussian kernel estimator: each density is estimated with Gaussian kernels of width h centred on the data points,
  f(x) = (1/M) Σ_{i=1..M} (1 / (h√(2π))) exp( −(x − xi)² / (2h²) )
and the MI is estimated by averaging the log-ratio over the samples,
  I(X; Y) ≈ (1/M) Σ_{i=1..M} log [ f(xi, yi) / ( f(xi) f(yi) ) ]
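A minimal sketch of such a Gaussian-kernel MI estimator (bandwidth, sample size and data below are illustrative; this is not the actual ARACNE implementation):

```python
# Sketch of a Gaussian-kernel mutual information estimator, in the spirit of
# the estimator described above. Bandwidth h and the test data are illustrative.
import numpy as np

def gaussian_kernel_mi(x, y, h=0.25):
    """Estimate I(X;Y) in nats from paired samples using Gaussian kernel densities."""
    dx = (x[:, None] - x[None, :]) / h
    dy = (y[:, None] - y[None, :]) / h
    kx = np.exp(-0.5 * dx ** 2)
    ky = np.exp(-0.5 * dy ** 2)
    fx = kx.mean(axis=1) / (h * np.sqrt(2 * np.pi))        # f_hat(x_i)
    fy = ky.mean(axis=1) / (h * np.sqrt(2 * np.pi))        # f_hat(y_i)
    fxy = (kx * ky).mean(axis=1) / (2 * np.pi * h ** 2)    # f_hat(x_i, y_i)
    return np.mean(np.log(fxy / (fx * fy)))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + rng.normal(size=1000)                   # a dependent pair
z = rng.normal(size=1000)                       # independent of x
print(gaussian_kernel_mi(x, y))                 # noticeably larger
print(gaussian_kernel_mi(x, z))                 # much smaller
```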
Mutual Information estimation
Another toy example
[Figure: the joint P(X, Y) and the kernel estimates f(x, y), f(x) and f(y)]
Mutual Information estimation
Another toy example
[Figure: effect of the kernel width on the estimate, comparing the reference h = 1 with h' = 4h and h'' = h/2]
Mutual Information estimation
• Every pair of variables is copula-transformed: (X1, X2) → [FX1(X1), FX2(X2)], where FX(x) = P(X ≤ x).
• By the Probability Integral Transform theorem, we know that FX(X) ~ U[0, 1].
• Thus, the resulting variables have range between 0 and 1, and the marginals are uniform.
• The transformation is one-to-one, so the MI is unaffected.
• It reduces the influence of arbitrary transformations from microarray data preprocessing.
• No need for position-dependent kernel widths (h).
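A sketch of the copula (rank) transform using the empirical CDF (illustrative, not the paper's code):

```python
# Sketch of the copula (rank) transform described above: each variable is
# mapped through its empirical CDF, giving a uniform marginal on (0, 1].
import numpy as np

def copula_transform(x):
    """Empirical CDF transform: returns ranks scaled to (0, 1]."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / len(x)

rng = np.random.default_rng(1)
x = rng.lognormal(size=1000)     # heavily skewed, expression-like values
u = copula_transform(x)
print(u.min(), u.max())          # uniform-looking marginal on (0, 1]
# The transform is monotone (one-to-one), so mutual information is preserved.
```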
Mutual Information estimation
• The choice of h is critical for the accuracy of the MI estimate, but much less so for the estimation of MI ranks.
Technical implementation • At this point, we have a procedure to construct an undirected graph from our data. • In an “ideal world” (infinite samples, assumptions hold), we would be done. • Unfortunately, things are not so simple !
Technical implementation
• Problems: finite random samples, and possible higher-order interactions.
• Use hypothesis testing. For each pair of genes:
  • H0: the two genes are mutually independent (no edge)
  • HA: the two genes interact with each other (edge)
• We reject the null hypothesis H0 (i.e. we "draw an edge") when the MI between the two genes is large enough. We need to choose a statistical threshold I0.
Choice of the statistical threshold
• Chosen approach: random permutation analysis.
• Randomly shuffle the gene expression values and labels.
• The resulting variables are supposed to be mutually independent, but their estimated MI need not be equal to zero.
• Each threshold I0 can then be assigned a p-value (which measures the probability of getting an MI value higher than or equal to I0 just "by chance").
• Thus, by fixing a p-value, we obtain the desired I0, as sketched below.
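A sketch of this permutation procedure; mi_estimate stands for any MI estimator (for instance the Gaussian-kernel sketch above) and is a placeholder, not a function from the paper:

```python
# Sketch of the permutation analysis described above: shuffle the expression
# values to destroy any dependence, recompute the MI many times, and read off
# the threshold I0 corresponding to a chosen p-value.
import numpy as np

def permutation_threshold(x, y, mi_estimate, p_value=0.05, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    null_mi = np.array([mi_estimate(rng.permutation(x), y) for _ in range(n_perm)])
    # I0 is the (1 - p_value) quantile of the null MI distribution
    return np.quantile(null_mi, 1.0 - p_value)

# Example use (hypothetical): I0 = permutation_threshold(g1, g2, gaussian_kernel_mi)
```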
Choice of the statistical threshold
• Chosen approach: random permutation analysis.
[Figure: histogram of MI values (number of pairs vs. MI value) for the permuted data, with the threshold I0 marked in the tail]
Choice of the statistical threshold
• Chosen approach: random permutation analysis.
• Parameter fitting: use a known fact from large deviation theory, namely that for independent variables the probability of observing an MI value above I0 decays exponentially in I0 (with a rate that grows with the sample size). The p-value curve estimated from the permutations can therefore be fit and extrapolated to very small p-values.
[Figure: the fitted p-value curves]
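One way the fitting step could look, assuming (as argued above) that log P(I > I0) is approximately linear in I0; this is an illustrative sketch, not the authors' code:

```python
# Sketch: fit log P(I > I0) as a linear function of I0 from permutation data,
# then invert the fit to extrapolate to p-values too small to sample directly.
# null_mi is assumed to come from a permutation analysis like the one above.
import numpy as np

def fit_exponential_tail(null_mi, thresholds):
    """Return slope a and intercept b of log P(I > I0) ~ a*I0 + b."""
    null_mi = np.asarray(null_mi)
    thresholds = np.asarray(thresholds)
    tail_p = np.array([(null_mi > t).mean() for t in thresholds])
    mask = tail_p > 0                     # keep thresholds with nonzero empirical tail
    a, b = np.polyfit(thresholds[mask], np.log(tail_p[mask]), 1)
    return a, b

def threshold_for_pvalue(a, b, p):
    """Invert the fit: the I0 whose extrapolated p-value equals p."""
    return (np.log(p) - b) / a
```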
Technical implementation
• If the real distribution contains a genuine three-way interaction (φijk ≠ 0) while the pairwise terms are zero (φij = φik = φjk = 0), ARACNE will get it wrong, because of our assumption that only pairwise interactions are present.
• The extension of the algorithm to higher-order interactions is a possible subject of future research.
[Figure: real network vs. ARACNE's output]
Technical implementation
• If the real distribution is such that two genes interact directly (φij ≠ 0) and yet Iij = 0, ARACNE will not identify the edge between gi and gj.
• However, this situation is considered "biologically unrealistic".
[Figure: real network vs. ARACNE's output]
Technical implementation
• If the real distribution is such that two genes do not interact directly (φij = 0) and yet Iij ≠ 0 (because of an indirect interaction), then, as it is, ARACNE will put an edge between gi and gj.
• The algorithm can be improved using the Data Processing Inequality.
[Figure: real network vs. ARACNE's output]
The Data Processing Inequality
• The DPI is a well known theorem within the Information Theory community.
  Let X, Y, Z be three random variables that form a Markov chain X–Y–Z. Then I(X;Y) ≥ I(X;Z).
  Proof. By the chain rule, I(X; Y,Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y).
  X and Z are conditionally independent given Y, so I(X;Z|Y) = 0.
  Thus I(X;Y) = I(X;Z) + I(X;Y|Z) ≥ I(X;Z), since I(X;Y|Z) ≥ 0.
  Similarly, we can prove I(Y;Z) ≥ I(X;Z), and therefore:
  I(X;Z) ≤ min[ I(X;Y), I(Y;Z) ]
The Data Processing Inequality
• A consequence of the DPI is that no transformation of Y, however clever, can increase the information that Y contains about X (consider the Markov chain of the form X–Y–[Z = g(Y)]).
• From an intuitive point of view:
[Figure: a chain of three people (Mike, Francisco, Jean-Paul) passing information along, illustrating that information can only degrade along the chain]
The Data Processing Inequality
• Going back to our case study: suppose the estimated MIs in a gene triplet are
  I(gi; gj) = 0.1, I(gj; gk) = 0.2, I(gi; gk) = 0.3.
• Since I(gi; gj) is the smallest of the three, the DPI suggests that the gi–gj dependence is better explained through gk. So we assume that φij = 0, even though P(gi, gj) ≠ P(gi)P(gj). (See the pruning sketch below.)
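A sketch of the resulting DPI pruning step applied to an MI matrix; the 3×3 matrix reproduces the toy values above, and the tolerance parameter eps is illustrative:

```python
# Sketch of DPI pruning: for every triple (i, j, k), remove the edge (i, j)
# when both indirect edges carry more mutual information, as in the example above.
import numpy as np

def dpi_prune(mi, eps=0.0):
    """Return a boolean adjacency matrix after applying the Data Processing Inequality."""
    n = mi.shape[0]
    keep = mi > 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k in (i, j):
                    continue
                # Drop (i, j) if it is the weakest edge of the triangle (up to tolerance eps)
                if mi[i, j] < min(mi[i, k], mi[j, k]) * (1 - eps):
                    keep[i, j] = keep[j, i] = False
    return keep

mi = np.array([[0.0, 0.1, 0.3],   # g_i
               [0.1, 0.0, 0.2],   # g_j
               [0.3, 0.2, 0.0]])  # g_k
print(dpi_prune(mi))
# The weakest edge (MI = 0.1, between g_i and g_j) is the only one removed.
```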
The Data Processing Inequality
• But what if the underlying network is truly a three-gene loop? …Then ARACNE will break the loop at its weakest edge! (This is a known flaw of the algorithm.)
• Philosophy: "An interaction is retained iff there exist no alternate paths which are a better explanation for the information exchange between two genes."
• Claim: in practice, looking at the TP vs. FP tradeoff, it pays to simplify.
The Data Processing Inequality
• Theorem 1. If MIs can be estimated with no errors, then ARACNE reconstructs the underlying interaction network exactly, provided this network is a tree and has only pairwise interactions.
• Theorem 2. The Chow–Liu (CL) maximum mutual information tree is a subnetwork of the network reconstructed by ARACNE.
• Theorem 3. Let πik be the set of nodes forming the shortest path in the network between nodes i and k. Then, if MIs can be estimated without errors, ARACNE reconstructs an interaction network without false positive edges, provided: (a) the network consists only of pairwise interactions, and (b) for each j ∈ πik, Iij ≥ Iik. Further, ARACNE does not produce any false negatives, and the network reconstruction is exact iff (c) for each directly connected pair (ij) and for any other node k, we have Iij ≥ min(Ijk, Iik).