Inferring gene regulatory networks from transcriptomic profiles. Dirk Husmeier, Biomathematics & Statistics Scotland
Overview • Introduction • Methodology • Circadian regulation in Arabidopsis • Application to synthetic biology • DREAM
[Figure: Network reconstruction from postgenomic data. Accuracy vs. computational complexity of mechanistic models, Bayesian networks, conditional independence graphs, and methods based on correlation and mutual information.]
Shortcomings: pairwise associations do not take the context of the system into consideration. A strong association between two genes may reflect a direct interaction, an indirect interaction, or co-regulation by a common regulator.
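A minimal sketch of the co-regulation case, assuming a hypothetical linear toy system in which gene 3 drives genes 1 and 2 while there is no edge between genes 1 and 2. The pairwise correlation between the two targets is nevertheless high (about 0.9 in theory for these hypothetical coefficients).

```python
import numpy as np

# Hypothetical toy system (not from the slides): gene 3 regulates genes 1 and 2,
# but genes 1 and 2 do not interact directly.
rng = np.random.default_rng(0)
n = 1000
x3 = rng.normal(size=n)                   # common regulator
x1 = 0.9 * x3 + 0.3 * rng.normal(size=n)  # target 1
x2 = 0.9 * x3 + 0.3 * rng.normal(size=n)  # target 2

# Pairwise (marginal) correlation between the two targets
corr_12 = np.corrcoef(x1, x2)[0, 1]
print(f"Corr(X1, X2) = {corr_12:.2f}")
```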
Conditional independence graphs (CIGs): a direct interaction between genes 1 and 2 shows up as a strong partial correlation $\pi_{12}$, i.e. the correlation conditional on all other domain variables, $\pi_{12} = \mathrm{Corr}(X_1, X_2 \mid X_3, \ldots, X_n)$. Partial correlations are obtained from the inverse of the covariance matrix.
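The link to the inverse covariance matrix can be sketched with the standard identity $\pi_{ij} = -\omega_{ij} / \sqrt{\omega_{ii}\,\omega_{jj}}$, where $\Omega = \Sigma^{-1}$. The data below are a hypothetical co-regulation example (gene 3 regulates genes 1 and 2 only): the correlation between genes 1 and 2 is high, while their partial correlation should be close to zero.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from the inverse covariance (precision) matrix.

    data: array of shape (n_observations, n_variables).
    Uses pi_ij = -omega_ij / sqrt(omega_ii * omega_jj), with Omega = Sigma^{-1}.
    """
    omega = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical co-regulation example: gene 3 regulates genes 1 and 2 only.
rng = np.random.default_rng(1)
n = 2000
x3 = rng.normal(size=n)
x1 = 0.9 * x3 + 0.3 * rng.normal(size=n)
x2 = 0.9 * x3 + 0.3 * rng.normal(size=n)
data = np.column_stack([x1, x2, x3])

corr = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)
print("Corr(X1, X2):", round(corr[0, 1], 2))
print("partial correlation pi_12:", round(pcor[0, 1], 2))
```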
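[Table: correlation vs. partial correlation for different network motifs. The pairwise correlation is high in every case, whereas the partial correlation is high for some motifs and low for others.]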
Problem: when the number of observations is smaller than the number of variables, the sample covariance matrix is singular, so it cannot be inverted to obtain the partial correlations.
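A small numerical check of the singularity problem, using hypothetical random data with 10 observations of 50 variables: the sample covariance has rank at most 9. One common remedy, which is an assumption here and not part of the slides, is to shrink the covariance towards a diagonal target; the shrinkage intensity below is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_vars = 10, 50                      # fewer observations than variables
data = rng.normal(size=(n_obs, n_vars))     # hypothetical expression data

sample_cov = np.cov(data, rowvar=False)     # 50 x 50, rank at most n_obs - 1
rank = np.linalg.matrix_rank(sample_cov)
print("rank of sample covariance:", rank, "of", n_vars)

# Assumed remedy (not from the slides): shrink towards the diagonal of the sample covariance.
lam = 0.2                                   # hypothetical shrinkage intensity
target = np.diag(np.diag(sample_cov))
shrunk_cov = (1 - lam) * sample_cov + lam * target
shrunk_rank = np.linalg.matrix_rank(shrunk_cov)
print("rank after shrinkage:", shrunk_rank)
precision = np.linalg.inv(shrunk_cov)       # now invertible
```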
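Description with differential equations: $\frac{d\mathbf{x}}{dt} = f(\mathbf{x}, q)$, where $\mathbf{x}$ are the concentrations, $q$ are the kinetic parameters, and $d\mathbf{x}/dt$ are the rates.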
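To make $\frac{d\mathbf{x}}{dt} = f(\mathbf{x}, q)$ concrete, here is a sketch with a hypothetical two-gene model (gene 1 is produced at a constant rate and represses gene 2 via a Hill-type term), integrated with forward Euler. The kinetics and the parameter values $q$ are illustrative assumptions, not a model from the talk.

```python
import numpy as np

def rates(x, q):
    """Hypothetical kinetics f(x, q): gene 1 produced at a constant rate, repressing gene 2."""
    x1, x2 = x
    dx1 = q[0] - q[1] * x1                    # production - degradation
    dx2 = q[2] / (1.0 + x1**2) - q[3] * x2    # Hill-type repression - degradation
    return np.array([dx1, dx2])

def simulate(x0, q, t_end=50.0, dt=0.01):
    """Forward-Euler integration of dx/dt = f(x, q)."""
    steps = int(round(t_end / dt))
    traj = np.empty((steps + 1, len(x0)))
    traj[0] = x0
    for k in range(steps):
        traj[k + 1] = traj[k] + dt * rates(traj[k], q)
    return traj

q = np.array([1.0, 0.5, 2.0, 0.4])            # hypothetical kinetic parameters
traj = simulate(np.array([0.0, 0.0]), q)
print("final concentrations:", traj[-1])      # approaches the steady state (2, 1)
```

Fitting such a model means searching over $q$, which is where the practical and conceptual problems listed next come in.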
Model $M$ with parameters $q$: probability theory gives the likelihood $p(D \mid q, M)$ of the data $D$.
1) Practical problem: numerical optimization of the likelihood with respect to $q$. 2) Conceptual problem: overfitting, because the maximum likelihood increases as the network complexity increases.
[Figure: Overfitting problem. The true pathway compared with alternative pathways: some give a poorer fit to the data, others an equal or better fit.]
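Regularization, e.g. the Bayesian information criterion (BIC): $\mathrm{BIC} = -2\log p(D \mid \hat{q}, M) + |q|\log N$. The first term is the data misfit evaluated at the maximum likelihood parameters $\hat{q}$; the second is the regularization term, which depends on the number of parameters $|q|$ and the number of data points $N$.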
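A sketch of both effects on a stand-in problem: polynomial regression with Gaussian noise replaces network models here (an assumption for illustration only), with polynomial degree playing the role of network complexity. The maximized log-likelihood never decreases as the degree grows, while the BIC penalty $|q|\log N$ counteracts this; for these hypothetical data (true degree 2), the BIC would typically be smallest near degree 2.

```python
import numpy as np

def max_log_likelihood(x, y, degree):
    """Gaussian log-likelihood at the ML parameters of a polynomial regression."""
    X = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    n = len(y)
    sigma2 = rss / n                          # ML estimate of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

def bic(log_lik, n_params, n_data):
    """BIC = data misfit term (-2 log L) + regularization term (|q| log N)."""
    return -2.0 * log_lik + n_params * np.log(n_data)

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.3 * rng.normal(size=n)   # hypothetical true model: degree 2

degrees = range(7)
log_liks = [max_log_likelihood(x, y, d) for d in degrees]
bics = [bic(ll, d + 2, n) for d, ll in zip(degrees, log_liks)]  # d+1 coefficients + noise variance
for d, ll, b in zip(degrees, log_liks, bics):
    print(f"degree {d}: log-likelihood {ll:9.2f}   BIC {b:9.2f}")
```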
[Figure: likelihood vs. complexity and BIC vs. complexity]
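Model selection: find the best pathway. Select the model with the highest posterior probability, $p(M \mid D) = \frac{p(D \mid M)\,p(M)}{p(D)}$. This requires an integration over the whole parameter space: $p(D \mid M) = \int p(D \mid q, M)\,p(q \mid M)\,dq$.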
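A minimal sketch of what the integral $p(D \mid M) = \int p(D \mid q, M)\,p(q \mid M)\,dq$ involves when it has to be computed numerically. It assumes a hypothetical one-parameter model ($y_i = w x_i + \varepsilon_i$, Gaussian noise with known standard deviation, Gaussian prior on $w$) and uses plain Monte Carlo sampling from the prior, a simpler scheme than the MCMC methods mentioned in the slides. Even for one parameter it needs many samples; with many parameters the cost explodes.

```python
import numpy as np

# Hypothetical toy model: y_i = w * x_i + noise, noise ~ N(0, SIGMA^2), prior w ~ N(0, TAU^2).
SIGMA, TAU = 0.5, 1.0

def log_likelihood(w, x, y):
    """log p(D | w) for an array of parameter values w (vectorized over w)."""
    resid = y[None, :] - w[:, None] * x[None, :]
    return (-0.5 * np.sum(resid**2, axis=1) / SIGMA**2
            - len(y) * np.log(np.sqrt(2 * np.pi) * SIGMA))

def log_marginal_mc(x, y, n_samples=100_000, seed=0):
    """Monte Carlo estimate of log p(D) = log E_prior[p(D | w)], sampling w from the prior."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, TAU, size=n_samples)
    ll = log_likelihood(w, x, y)
    m = ll.max()                              # log-sum-exp trick for numerical stability
    return m + np.log(np.mean(np.exp(ll - m)))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=20)
y = 0.8 * x + SIGMA * rng.normal(size=20)     # hypothetical data
print("log p(D) estimate:", log_marginal_mc(x, y))
```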
MCMC-based schemes for sampling the parameters $q$. Problem: excessive computational costs.
Bayesian networks: a marriage between graph theory and probability theory. Friedman et al. (2000), J. Comp. Biol. 7, 601-620
[Figure: Bayes net compared with an ODE model]
Model $M$ with parameters $q$: for Bayesian networks with suitable conjugate priors, the integral over $q$ is analytically tractable!
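Why conjugacy helps can be sketched on a hypothetical one-parameter linear-Gaussian model ($y_i = w x_i + \varepsilon_i$, known noise standard deviation, Gaussian prior on $w$). Integrating out $w$ gives the data a multivariate Gaussian distribution, so $\log p(D)$ has a closed form. This toy is only an illustration of the principle; it is not the BDe or BGe scores used for Bayesian networks, which also rely on conjugate priors.

```python
import numpy as np

SIGMA, TAU = 0.5, 1.0   # hypothetical noise and prior standard deviations

def log_marginal_exact(x, y):
    """Closed-form log p(D) for y = w x + noise with prior w ~ N(0, TAU^2).

    Integrating out w gives y ~ N(0, SIGMA^2 I + TAU^2 x x^T).
    """
    n = len(y)
    cov = SIGMA**2 * np.eye(n) + TAU**2 * np.outer(x, x)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=20)
y = 0.8 * x + SIGMA * rng.normal(size=20)     # hypothetical data
print("log p(D) =", log_marginal_exact(x, y))
```

No sampling is needed: one linear solve and one determinant replace the integral.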
Ideal scenario (large data sets, low noise): identify the best network structure.
Realistic scenario (limited number of experimental replications, high noise): uncertainty about the best network structure.
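Sample high-scoring networks and extract features, e.g. the marginal posterior probabilities of the edges. Probabilities near 1 indicate high-confidence edges, probabilities near 0 indicate high-confidence non-edges, and intermediate values express uncertainty about the edge.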
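Given a sample of networks (e.g. drawn by MCMC), the marginal posterior probability of an edge can be estimated as the fraction of sampled networks that contain it. A minimal sketch with a hypothetical sample of four 3-node networks stored as 0/1 adjacency matrices:

```python
import numpy as np

def edge_posteriors(sampled_adjacency):
    """Fraction of sampled networks containing each edge.

    sampled_adjacency: array (n_samples, n_nodes, n_nodes) of 0/1 adjacency
    matrices; entry [i, j] = 1 means an edge i -> j.
    """
    return np.asarray(sampled_adjacency, dtype=float).mean(axis=0)

# Hypothetical sample of four 3-node networks.
sample = np.array([
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
])
probs = edge_posteriors(sample)
print(probs)
```

Here the edge 1 -> 2 appears in every sample (high-confidence edge), 3 -> 1 never appears (high-confidence non-edge), and 1 -> 3 appears in one of four samples (uncertain).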
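Sampling with MCMC. [Figure: number of network structures vs. number of nodes; the number of possible structures grows super-exponentially with the number of nodes.]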
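The growth of the search space can be made concrete by counting labelled directed acyclic graphs (the structures a Bayesian network can take) with Robinson's recurrence, $a(n) = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k} 2^{k(n-k)} a(n-k)$ with $a(0) = 1$. This is a standard combinatorial result, used here only to illustrate why exhaustive enumeration is infeasible and sampling is needed.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 9):
    print(n, num_dags(n))
```

Already with 5 nodes there are 29281 structures, and with 6 nodes 3781503.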
Linearity assumption: $[A] = w_1[P_1] + w_2[P_2] + w_3[P_3] + w_4[P_4] + \text{noise}$, where the regulators $P_1, \ldots, P_4$ act on gene $A$ with interaction weights $w_1, \ldots, w_4$.
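Under the linearity assumption with Gaussian noise, the maximum likelihood estimate of the weights $w_1, \ldots, w_4$ is the least-squares solution. A sketch with hypothetical simulated regulator levels and hypothetical true weights:

```python
import numpy as np

rng = np.random.default_rng(6)
n_obs = 200
P = rng.normal(size=(n_obs, 4))                 # hypothetical levels of regulators P1..P4
w_true = np.array([1.5, -0.8, 0.0, 0.4])        # hypothetical interaction weights w1..w4
A = P @ w_true + 0.1 * rng.normal(size=n_obs)   # [A] = sum_i w_i [P_i] + noise

# Least squares = maximum likelihood under Gaussian noise
w_hat, *_ = np.linalg.lstsq(P, A, rcond=None)
print(np.round(w_hat, 2))
```

A weight estimated near zero (here $w_3$) corresponds to a regulator with no effect on $A$.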
Homogeneity assumption: the parameters don't change with time.
Our new model: heterogeneous dynamic Bayesian network. Here: 2 components
Our new model: heterogeneous dynamic Bayesian network. Here: 3 components
Extension of the model: parameters $q$, allocation vector $h$, and number of components $k$ (here: 3).
Analytically integrate out the parameters $q$; the allocation vector $h$ and the number of components $k$ (here: 3) remain to be inferred.
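A simplified sketch of the role of the allocation vector $h$: each time point is assigned to one of $k$ components, and each component has its own regression weights linking the regulator at time $t-1$ to the target at time $t$. This sketch assumes $h$ is known and fits each component by least squares; the model described in the slides instead integrates out the parameters analytically and infers $h$. The time series, weights, and segment boundaries below are hypothetical.

```python
import numpy as np

def fit_segmentwise(parents_prev, target_next, h, k):
    """Least-squares weights for each component of a heterogeneous linear DBN.

    parents_prev: (T-1, p) regulator levels at times t-1
    target_next:  (T-1,)  target levels at times t
    h:            (T-1,)  allocation vector, h[t] in {0, ..., k-1}
    Returns a (k, p) array with one weight vector per component.
    """
    weights = np.zeros((k, parents_prev.shape[1]))
    for c in range(k):
        rows = h == c
        coef, *_ = np.linalg.lstsq(parents_prev[rows], target_next[rows], rcond=None)
        weights[c] = coef
    return weights

rng = np.random.default_rng(7)
T, k = 301, 3
x = rng.normal(size=(T, 1))                        # hypothetical regulator time series
h = np.repeat(np.arange(k), (T - 1) // k)          # hypothetical allocation: 3 time segments
w_true = np.array([[2.0], [-1.0], [0.5]])          # hypothetical per-component weights
y_next = np.sum(w_true[h] * x[:-1], axis=1) + 0.1 * rng.normal(size=T - 1)

w_hat = fit_segmentwise(x[:-1], y_next, h, k)
print(np.round(w_hat, 2))
```

A single homogeneous linear model fitted to the same data would average over the three regimes and recover none of the component weights.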
Non-homogeneous model Non-linear model