Problem

Problem • Limited number of experimental replications. • Postgenomic data intrinsically noisy. • Poor network reconstruction.

Problem • Limited number of experimental replications. • Postgenomic data intrinsically noisy. • Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?


• Which sources of prior knowledge are reliable? • How do we trade off the different sources of prior knowledge against each other and against the data?

Overview of the talk • Revision: Bayesian networks • Integration of prior knowledge • Empirical evaluation

Bayesian networks • Marriage between graph theory and probability theory. • Directed acyclic graph (DAG) representing conditional independence relations. • It is possible to score a network in light of the data: P(D|M), D: data, M: network structure. • We can infer how well a particular network explains the observed data. [Figure: example DAG with nodes A to F joined by directed edges]

Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

Bayesian networks versus causal networks [Figure: true causal graph over A, B and C, next to the graph over B and C when node A is unknown]

Bayesian networks versus causal networks [Figure: alternative DAGs over A, B and C] • Equivalence classes: networks with the same score P(D|M). • Equivalent networks cannot be distinguished in light of the data.
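Score equivalence can be checked numerically. The following is a minimal sketch, assuming a linear-Gaussian model and the BIC score as a stand-in for log P(D|M); neither choice is stated on the slide, and the function names and simulated data are hypothetical. The chain A→B→C and the fork A←B→C encode the same conditional independence and receive the same score, whereas the collider A→B←C does not.

```python
import numpy as np

def node_loglik(y, parent_columns):
    """Maximised Gaussian log-likelihood of y given its parents, and the parameter count."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(parent_columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    var = resid @ resid / n  # maximum-likelihood noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * var) + 1.0)
    return loglik, design.shape[1] + 1  # regression weights plus the variance

def bic_score(data, parents):
    """BIC approximation to log P(D|M) for a DAG given as {node: [parents]}."""
    n = len(next(iter(data.values())))
    score = 0.0
    for node, pa in parents.items():
        loglik, k = node_loglik(data[node], [data[p] for p in pa])
        score += loglik - 0.5 * k * np.log(n)
    return score

# Hypothetical data simulated from the chain A -> B -> C.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 0.8 * a + rng.normal(size=200)
c = -0.6 * b + rng.normal(size=200)
data = {"A": a, "B": b, "C": c}

s_chain = bic_score(data, {"A": [], "B": ["A"], "C": ["B"]})
s_fork = bic_score(data, {"A": ["B"], "B": [], "C": ["B"]})
s_collider = bic_score(data, {"A": [], "B": ["A", "C"], "C": []})
print(s_chain, s_fork, s_collider)
```

The first two scores agree up to rounding error, so the data alone cannot orient the edge between A and B.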

Symmetry breaking [Figure: equivalent DAGs over A, B and C; prior knowledge selects one of them] P(M|D) = P(D|M) P(M) / Z, D: data, M: network structure

P(D|M)

P(M) Prior knowledge: B is a transcription factor with binding sites in the upstream regions of A and C

P(M|D) ∝ P(D|M) P(M)
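How the prior breaks the symmetry can be seen in a small numerical sketch. It assumes the candidate structures can be enumerated; the log-likelihood and prior values are made-up illustrations, not results from the talk, and the function name is hypothetical.

```python
import numpy as np

def posterior_over_models(log_lik, log_prior):
    """Return P(M|D) for an enumerable set of candidate structures."""
    log_joint = np.asarray(log_lik, dtype=float) + np.asarray(log_prior, dtype=float)
    log_joint -= log_joint.max()    # subtract the maximum for numerical stability
    weights = np.exp(log_joint)
    return weights / weights.sum()  # the sum plays the role of Z

# Three equivalent networks: identical P(D|M), so only P(M) can separate them.
log_lik = [-120.0, -120.0, -120.0]    # hypothetical values
log_prior = np.log([0.2, 0.6, 0.2])   # hypothetical prior favouring the second network
posterior = posterior_over_models(log_lik, log_prior)
print(posterior)
```

With identical likelihoods the posterior simply reproduces the prior, which is the sense in which prior knowledge selects one member of an equivalence class.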

Learning Bayesian networks P(M|D) = P(D|M) P(M) / Z M: Network structure. D: Data

Use TF binding motifs in promoter sequences


Biological prior knowledge matrix • Indicates some knowledge about the relationship between genes i and j. • Define the energy of a graph G.
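The energy formula on this slide did not survive in the transcript. A minimal sketch, assuming the energy is the summed absolute difference between the prior knowledge matrix B (entries in [0, 1], with 0.5 meaning no knowledge) and the binary adjacency matrix G, could look as follows. The functional form, the function name and the example matrices are all assumptions.

```python
import numpy as np

def energy(B, G):
    """Assumed energy E(G) = sum over i != j of |B_ij - G_ij|."""
    B = np.asarray(B, dtype=float)
    G = np.asarray(G, dtype=float)
    off_diag = ~np.eye(B.shape[0], dtype=bool)  # self-loops are ignored
    return float(np.abs(B - G)[off_diag].sum())

# Hypothetical prior knowledge: edge 0 -> 1 is likely, edge 1 -> 2 is unlikely.
B = np.array([[0.5, 0.9, 0.5],
              [0.5, 0.5, 0.1],
              [0.5, 0.5, 0.5]])
G_agree = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])     # contains only 0 -> 1
G_disagree = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]])  # contains only 1 -> 2

print(energy(B, G_agree), energy(B, G_disagree))
```

Under this assumed form, a graph that agrees with the prior knowledge has the lower energy.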

Notation • Prior knowledge matrix: P → B (for “belief”) • Network structure: G (for “graph”) or M (for “model”) • P: probabilities

Energy of a network Prior distribution over networks
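The formula for the prior is missing from the transcript. Given the terms "energy" and "partition function", a Gibbs distribution is the natural reading; the form below is an assumption, with β denoting the trade-off parameter (written b1, b2 on later slides) and the sum running over all admissible network structures.

```latex
P(G \mid \beta) = \frac{e^{-\beta E(G)}}{Z(\beta)},
\qquad
Z(\beta) = \sum_{G' \in \mathcal{G}} e^{-\beta E(G')}
```

Under this form, β → 0 gives a flat prior that ignores the prior knowledge, and larger β concentrates the prior on low-energy networks.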

Sample networks and hyperparameters from the posterior distribution • Capture intrinsic inference uncertainty • Learn the trade-off parameters automatically P(M|D) = P(D|M) P(M) / Z


Energy of a network Rewriting the energy

Approximation of the partition function Partition function of a perfect gas
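The "perfect gas" analogy suggests treating the edges as non-interacting. A sketch of that idea, under two assumptions not confirmed by the transcript: the energy is the summed absolute difference |B_ij - G_ij|, and the acyclicity constraint is ignored so that the sum runs over all directed graphs without self-loops. The partition function then factorises over edges, which the brute-force enumeration below confirms for a small hypothetical matrix.

```python
import itertools
import numpy as np

def z_factorised(B, beta):
    """Product over potential edges of exp(-beta*B_ij) + exp(-beta*(1 - B_ij))."""
    b = B[~np.eye(B.shape[0], dtype=bool)]
    # Each potential edge is either absent (cost B_ij) or present (cost 1 - B_ij).
    return float(np.prod(np.exp(-beta * b) + np.exp(-beta * (1.0 - b))))

def z_brute_force(B, beta):
    """Explicit sum of exp(-beta*E(G)) over all directed graphs without self-loops."""
    b = B[~np.eye(B.shape[0], dtype=bool)]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=b.size):
        total += np.exp(-beta * np.abs(b - np.array(bits)).sum())
    return float(total)

# Hypothetical 3-node prior knowledge matrix (diagonal entries are unused).
B = np.array([[0.5, 0.9, 0.5],
              [0.5, 0.5, 0.1],
              [0.3, 0.7, 0.5]])
print(z_factorised(B, 1.5), z_brute_force(B, 1.5))
```

Because cyclic graphs are included, this is an approximation to the sum over DAGs rather than the exact value.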

Multiple sources of prior knowledge
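The slide's formula is not in the transcript. One way to extend a Gibbs-type prior to two sources, given here as an assumption, is to give each source k its own prior knowledge matrix B^(k), energy E_k and trade-off parameter β_k (b1 and b2 on the later slides):

```latex
P(G \mid \beta_1, \beta_2)
  = \frac{\exp\{-[\beta_1 E_1(G) + \beta_2 E_2(G)]\}}{Z(\beta_1, \beta_2)},
\qquad
E_k(G) = \sum_{i \neq j} \bigl| B^{(k)}_{ij} - G_{ij} \bigr|
```

A source whose β_k is close to zero has almost no influence on the prior, which is what makes the inferred values interpretable as a measure of how much each source is used.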

MCMC sampling scheme

Sample networks and hyperparameters from the posterior distribution Proposal probabilities Metropolis-Hastings scheme
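The hyperparameter move of such a scheme can be sketched in isolation. The code below is a simplified illustration and not the sampler from the talk: it keeps the graph G fixed, omits the data likelihood P(D|G) and the structure moves, and assumes a uniform prior on β over [0, beta_max], the absolute-difference energy, and the factorised partition function. The proposal is symmetric, so the Hastings correction equals 1. All names and settings are hypothetical.

```python
import numpy as np

def log_prior_graph(B, G, beta):
    """log P(G|beta) under the assumed Gibbs prior with factorised Z(beta)."""
    off = ~np.eye(B.shape[0], dtype=bool)
    b = B[off]
    log_z = np.sum(np.logaddexp(-beta * b, -beta * (1.0 - b)))
    return float(-beta * np.abs(B - G)[off].sum() - log_z)

def sample_beta(B, G, n_steps=5000, beta_max=30.0, width=3.0, seed=0):
    """Metropolis-Hastings samples of beta given a fixed graph G."""
    rng = np.random.default_rng(seed)
    beta = 0.5 * beta_max
    current = log_prior_graph(B, G, beta)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = beta + rng.uniform(-width, width)  # symmetric random walk
        if 0.0 <= proposal <= beta_max:               # outside the support: reject
            candidate = log_prior_graph(B, G, proposal)
            if np.log(rng.uniform()) < candidate - current:
                beta, current = proposal, candidate
        samples[t] = beta
    return samples

# Hypothetical prior knowledge that supports the chain 0 -> 1 -> 2.
B = np.array([[0.5, 0.9, 0.1],
              [0.1, 0.5, 0.9],
              [0.1, 0.1, 0.5]])
G_agree = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
G_disagree = 1 - G_agree - np.eye(3, dtype=int)  # the complementary graph

samples_agree = sample_beta(B, G_agree)
samples_disagree = sample_beta(B, G_disagree)
print(samples_agree.mean(), samples_disagree.mean())
```

When the graph agrees with the prior knowledge the sampled β values are large, and when it contradicts the prior knowledge they collapse towards zero, which illustrates how the trade-off parameter can be learned rather than set by hand.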

Bayesian networks with biological prior knowledge • Biological prior knowledge: information about the interactions between the nodes. • We use two distinct sources of biological prior knowledge. • Each source of biological prior knowledge is associated with its own trade-off parameter: b1 and b2. • The trade-off parameter indicates how much biological prior information is used. • The trade-off parameters are inferred. They are not set by the user!

Bayesian networks with two sources of prior knowledge [Figure: the data, source 1 (weight b1) and source 2 (weight b2) feed into BNs + MCMC, which returns the recovered networks and trade-off parameters]

Evaluation • Can the method automatically evaluate how useful the different sources of prior knowledge are? • Do we get an improvement in the regulatory network reconstruction? • Is this improvement optimal?

Raf regulatory network (from Sachs et al., Science 2005)


Evaluation: Raf signalling pathway • Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells • Deregulation → carcinogenesis • Extensively studied in the literature → gold standard network

Data • Prior knowledge

Flow cytometry data • Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins • 5400 cells have been measured under 9 different cellular conditions (cues) • Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
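The downsampling step can be sketched as follows. The slide does not say how the five subsets were drawn, so this assumes disjoint subsets sampled uniformly without replacement; the function name and the random stand-in matrix are hypothetical and do not reproduce the actual measurements.

```python
import numpy as np

def disjoint_subsets(n_rows, n_subsets=5, subset_size=100, seed=0):
    """Row indices for n_subsets non-overlapping subsets of subset_size rows each."""
    rng = np.random.default_rng(seed)
    picked = rng.choice(n_rows, size=n_subsets * subset_size, replace=False)
    return picked.reshape(n_subsets, subset_size)

# Stand-in for the 5400 x 11 measurement matrix (random numbers, not real data).
measurements = np.random.default_rng(1).normal(size=(5400, 11))

idx = disjoint_subsets(measurements.shape[0])
subsets = [measurements[rows] for rows in idx]
print(len(subsets), subsets[0].shape)
```

Each subset then has 100 instances of the 11 variables, a sample size comparable to the microarray experiments mentioned on the slide.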

Microarray example • Spellman et al. (1998): cell cycle, 73 samples • Tu et al. (2005): metabolic cycle, 36 samples [Figure: two panels, each with axes "Genes" and "time"]

Data • Prior knowledge
