Problem • Limited number of experimental replications. • Postgenomic data intrinsically noisy. • Poor network reconstruction.
Problem • Limited number of experimental replications. • Postgenomic data intrinsically noisy. • Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?
Which sources of prior knowledge are reliable? • How do we trade off the different sources of prior knowledge against each other and against the data?
Overview of the talk • Revision: Bayesian networks • Integration of prior knowledge • Empirical evaluation
Bayesian networks • Marriage between graph theory and probability theory. • Directed acyclic graph (DAG) representing conditional independence relations. • It is possible to score a network in light of the data: P(D|M), where D is the data and M is the network structure. • We can infer how well a particular network explains the observed data. [Figure: example DAG with nodes A to F, with the nodes and edges labelled]
Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.
Bayesian networks versus causal networks [Figure: the true causal graph over nodes A, B and C, next to the graph over B and C obtained when node A is unknown]
Bayesian networks versus causal networks • Equivalence classes: networks with the same score P(D|M). • Equivalent networks cannot be distinguished in light of the data. [Figure: four network structures over nodes A, B and C]
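The score equivalence stated above can be illustrated with a small numerical sketch. It fits two linear-Gaussian networks, A → B and B → A, to the same simulated data. The linear-Gaussian model, the simulated data and all function names are assumptions made for illustration and are not taken from the talk. The talk's P(D|M) is a marginal likelihood; this sketch uses the simpler maximised log-likelihood, which shows the same symmetry.

```python
import math
import random

def gaussian_loglik(residuals):
    """Maximised Gaussian log-likelihood of zero-mean residuals (ML variance)."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def marginal_loglik(x):
    """Log-likelihood of a root node: Gaussian with fitted mean and variance."""
    mean = sum(x) / len(x)
    return gaussian_loglik([v - mean for v in x])

def conditional_loglik(child, parent):
    """Log-likelihood of child given parent under a fitted linear regression."""
    n = len(child)
    mp, mc = sum(parent) / n, sum(child) / n
    sxy = sum((p - mp) * (c - mc) for p, c in zip(parent, child))
    sxx = sum((p - mp) ** 2 for p in parent)
    slope = sxy / sxx
    return gaussian_loglik(
        [(c - mc) - slope * (p - mp) for p, c in zip(parent, child)]
    )

# Simulated data (hypothetical): B depends linearly on A plus noise.
random.seed(0)
a = [random.gauss(0, 1) for _ in range(200)]
b = [0.8 * v + random.gauss(0, 0.5) for v in a]

score_a_to_b = marginal_loglik(a) + conditional_loglik(b, a)  # structure A -> B
score_b_to_a = marginal_loglik(b) + conditional_loglik(a, b)  # structure B -> A
score_empty = marginal_loglik(a) + marginal_loglik(b)         # no edge
```

The two directed structures receive the same score up to rounding, while the empty graph scores lower, so the data alone cannot orient the edge.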
Symmetry breaking [Figure: network structures over nodes A, B and C, with prior knowledge singling out one of them] P(M|D) = P(D|M) P(M) / Z • D: data. M: network structure.
P(M) Prior knowledge: B is a transcription factor with binding sites in the upstream regions of A and C
Learning Bayesian networks P(M|D) = P(D|M) P(M) / Z M: Network structure. D: Data
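As a minimal sketch of this formula, the snippet below normalises P(D|M) P(M) over a handful of candidate structures. The function name and all numbers are hypothetical. Enumerating structures like this is only feasible for very small networks, which is why the talk samples with MCMC instead.

```python
import math

def posterior_over_structures(log_likelihoods, log_priors):
    """Return P(M|D) for each candidate M from log P(D|M) and log P(M)."""
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    shift = max(log_joint)                       # log-sum-exp trick for stability
    weights = [math.exp(v - shift) for v in log_joint]
    z = sum(weights)                             # normalising constant, up to the shift
    return [w / z for w in weights]

# Hypothetical numbers: the first two structures have equal likelihood
# (same equivalence class); the prior favours the second one.
log_lik = [-120.0, -120.0, -125.0]
log_prior = [math.log(0.2), math.log(0.6), math.log(0.2)]
posterior = posterior_over_structures(log_lik, log_prior)
```

With equal likelihoods, the posterior ratio of the first two structures equals their prior ratio, which is the symmetry breaking described earlier.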
Overview of the talk • Revision: Bayesian networks • Integration of prior knowledge • Empirical evaluation
Biological prior knowledge matrix • Entry (i, j) indicates some knowledge about the relationship between genes i and j. • Define the energy of a graph G.
Notation • Prior knowledge matrix: P → B (for “belief”) • Network structure: G (for “graph”) or M (for “model”) • P: probabilities
Energy of a network Prior distribution over networks
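The formulas on this slide did not survive in the text. The sketch below therefore assumes one plausible form: an energy E(G) that sums the absolute deviations |B_ij − G_ij| between the belief matrix B (entries in [0, 1]) and the adjacency matrix G, and a prior proportional to exp(−b E(G)) with trade-off parameter b. Both forms, the function names and the example matrices are assumptions for illustration.

```python
import math

def energy(B, G):
    """Assumed energy: total absolute deviation between beliefs B and graph G."""
    n = len(B)
    return sum(abs(B[i][j] - G[i][j])
               for i in range(n) for j in range(n) if i != j)

def unnormalised_log_prior(B, G, beta):
    """log of exp(-beta * E(G)); the partition function is left out."""
    return -beta * energy(B, G)

# Hypothetical beliefs for 3 genes: 0.5 = no knowledge, 0.9 = edge believed
# present, 0.1 = edge believed absent.
B = [[0.5, 0.9, 0.5],
     [0.1, 0.5, 0.9],
     [0.5, 0.1, 0.5]]
G_agree = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # edges 0 -> 1 and 1 -> 2
G_clash = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]   # the same edges reversed
```

Under these assumptions a graph that agrees with the beliefs has lower energy and hence higher prior probability, and setting the trade-off parameter to zero makes the prior flat.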
Sample networks and hyperparameters from the posterior distribution • Capture intrinsic inference uncertainty • Learn the trade-off parameters automatically P(M|D) = P(D|M) P(M) / Z
Energy of a network Rewriting the energy
Approximation of the partition function Partition function of a perfect gas
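The perfect-gas analogy suggests treating the edges as non-interacting. The sketch below checks that idea numerically on a three-node example. It assumes the energy E(G) = sum of |B_ij − G_ij| (an assumed form, since the slide's formula is not in the text) and sums over all directed graphs, ignoring the acyclicity constraint, so that the sum factorises into a product of per-edge terms. All names and numbers are illustrative.

```python
import itertools
import math

def energy(B, G):
    """Assumed energy: total absolute deviation between beliefs B and graph G."""
    n = len(B)
    return sum(abs(B[i][j] - G[i][j])
               for i in range(n) for j in range(n) if i != j)

def partition_brute_force(B, beta):
    """Sum exp(-beta * E(G)) over every directed graph without self-loops."""
    n = len(B)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        G = [[0] * n for _ in range(n)]
        for (i, j), bit in zip(pairs, bits):
            G[i][j] = bit
        total += math.exp(-beta * energy(B, G))
    return total

def partition_factorised(B, beta):
    """The same sum written as a product of independent per-edge terms."""
    n = len(B)
    z = 1.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Edge absent contributes |B_ij - 0|, edge present |B_ij - 1|.
                z *= math.exp(-beta * B[i][j]) + math.exp(-beta * (1 - B[i][j]))
    return z

B = [[0.5, 0.9, 0.5],
     [0.1, 0.5, 0.9],
     [0.5, 0.1, 0.5]]
z_sum = partition_brute_force(B, 2.0)
z_prod = partition_factorised(B, 2.0)
```

The two values agree, and the product costs a number of terms quadratic in the number of nodes instead of exponential. The sum restricted to acyclic graphs is smaller, which is why this is an approximation.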
Sample networks and hyperparameters from the posterior distribution Proposal probabilities Metropolis-Hastings scheme
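A much reduced sketch of such a scheme is shown below. It samples only one trade-off parameter for a fixed graph, whereas the talk samples networks and hyperparameters jointly. It assumes the energy E(G) = sum of |B_ij − G_ij|, the factorised partition function over all directed graphs, a uniform hyperprior on [0, beta_max] and a symmetric uniform proposal. None of these choices is confirmed by the slides, and all names are illustrative.

```python
import math
import random

def energy(B, G):
    n = len(B)
    return sum(abs(B[i][j] - G[i][j])
               for i in range(n) for j in range(n) if i != j)

def log_partition(B, beta):
    """Assumed factorised partition function (acyclicity ignored)."""
    n = len(B)
    return sum(math.log(math.exp(-beta * B[i][j]) + math.exp(-beta * (1 - B[i][j])))
               for i in range(n) for j in range(n) if i != j)

def log_prior(B, G, beta):
    """log P(G | beta) under the assumed Gibbs form."""
    return -beta * energy(B, G) - log_partition(B, beta)

def sample_beta(B, G, n_steps=5000, beta_max=30.0, step=3.0, seed=0):
    """Metropolis-Hastings samples of beta given a fixed graph G."""
    rng = random.Random(seed)
    beta = rng.uniform(0.0, beta_max)
    samples = []
    for _ in range(n_steps):
        proposal = beta + rng.uniform(-step, step)      # symmetric proposal
        if 0.0 <= proposal <= beta_max:                 # uniform hyperprior support
            log_ratio = log_prior(B, G, proposal) - log_prior(B, G, beta)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                beta = proposal                         # accept the move
        samples.append(beta)                            # rejected moves repeat beta
    return samples

B = [[0.5, 0.9, 0.5], [0.1, 0.5, 0.9], [0.5, 0.1, 0.5]]
G_agree = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # consistent with the beliefs
G_clash = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]   # contradicts the beliefs
mean_agree = sum(sample_beta(B, G_agree)) / 5000
mean_clash = sum(sample_beta(B, G_clash)) / 5000
```

In this toy setting the sampled trade-off parameter stays large when the graph agrees with the prior knowledge and shrinks towards zero when it contradicts it, which is the behaviour one would want from an inferred trade-off parameter.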
Bayesian networks with biological prior knowledge • Biological prior knowledge: information about the interactions between the nodes. • We use two distinct sources of biological prior knowledge. • Each source of biological prior knowledge is associated with its own trade-off parameter: b1 and b2. • The trade-off parameter indicates how much biological prior information is used. • The trade-off parameters are inferred. They are not set by the user!
Bayesian networks with two sources of prior knowledge [Diagram: Data, Source 1 and Source 2 feed into BNs + MCMC, which returns the recovered networks and the trade-off parameters b1 and b2]
Overview of the talk • Revision: Bayesian networks • Integration of prior knowledge • Empirical evaluation
Evaluation • Can the method automatically evaluate how useful the different sources of prior knowledge are? • Do we get an improvement in the regulatory network reconstruction? • Is this improvement optimal?
Raf regulatory network (from Sachs et al., Science 2005)
Evaluation: Raf signalling pathway • Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells • Deregulation → carcinogenesis • Extensively studied in the literature → gold standard network
Flow cytometry data • Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins • 5400 cells have been measured under 9 different cellular conditions (cues) • Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
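The downsampling step can be sketched as follows. The slide only says that 5 separate subsets of 100 instances were taken from the 5400 cells. That the subsets are disjoint and drawn uniformly at random is an assumption here, as are the function name and the seed.

```python
import random

def draw_subsets(n_cells=5400, subset_size=100, n_subsets=5, seed=0):
    """Return disjoint lists of cell indices, one list per subset."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees that no cell is used twice.
    chosen = rng.sample(range(n_cells), subset_size * n_subsets)
    return [chosen[k * subset_size:(k + 1) * subset_size]
            for k in range(n_subsets)]

subsets = draw_subsets()
```

Each list of indices would then select the rows of the flow cytometry data matrix used for one reconstruction run.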
Microarray example • Spellman et al. (1998): cell cycle, 73 samples • Tu et al. (2005): metabolic cycle, 36 samples [Figures: one panel per data set, genes versus time]