Genetic network inference: from co-expression clustering to reverse engineering. Patrik D’haeseleer, Shoudan Liang and Roland Somogyi
The goal of this review • Principles of genetic network organization • Computational methods for extracting network architectures from experimental data
Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook
Genes encode proteins, some of which in turn regulate other genes • Goal: determine the structure of this intricate network of genetic regulatory interactions
Traditional approach: local • Examining and collecting data on a single gene, a single protein or a single reaction at a time • Moving beyond this local view leads to functional genomics
Functional Genomics • Specifically, functional genomics refers to the development and application of global experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics. • High-throughput, large-scale experimental methodologies combined with statistical and computational analysis of the results.
Functional Genomics (Cont.) • We need to define the mapping from sequence space to functional space.
Intermediate representation • Focus on the level of single cells • A biological system can be considered a state machine, where the change in the internal state of the system depends on both its current internal state and any external inputs.
The goal • Observe the state of a cell and how it changes under different circumstances, and from this derive a model of how these state changes are generated • The state of a cell: all the variables that determine its behavior
Example • A simple, 6-node regulatory network
Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook
The global gene expression pattern is the result of the collective behavior of individual regulatory pathways • Gene function depends on its cellular context; thus understanding the network as a whole is essential.
Boolean Networks • Each gene is considered as a binary variable (either ON or OFF), regulated by other genes through logical or Boolean functions. • Even with this simplification, the network behavior is already extremely rich.
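A minimal sketch of a synchronous Boolean network in Python. The three genes A, B, C and their rules (A' = NOT C, B' = A, C' = A AND B) are hypothetical, chosen only to show that even a tiny network produces nontrivial dynamics. Updating every gene at once from the previous state (synchronous updating) is an assumption of this sketch; asynchronous schemes are also used.

```python
# Hypothetical 3-gene Boolean network; a state is a tuple of 0/1 for (A, B, C).
RULES = (
    lambda s: 1 - s[2],      # A is ON when C is OFF
    lambda s: s[0],          # B copies A
    lambda s: s[0] & s[1],   # C needs both A and B
)

def step(state, rules=RULES):
    """Synchronous update: every gene reads the same previous state."""
    return tuple(rule(state) for rule in rules)

def trajectory(state, n_steps, rules=RULES):
    """Return the list of visited states, starting with the initial one."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1], rules))
    return states

if __name__ == "__main__":
    for s in trajectory((1, 0, 0), 5):
        print(s)
```

Starting from (1, 0, 0), this network passes through five distinct states and then returns to its starting pattern, i.e. it settles into a cycle rather than a fixed point.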
Boolean Networks (Cont.) • Cell differentiation corresponds to transitions from one global gene expression pattern to another.
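In Kauffman's interpretation of Boolean networks, the stable global patterns a network settles into (its attractors) are associated with cell types, so a differentiation step can be pictured as a perturbation that moves the network from one attractor to another. The sketch below enumerates the attractors of a hypothetical 2-gene network (A' = NOT B, B' = B AND NOT A) by following every one of the 2^N start states under synchronous updating; the network and the function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical 2-gene network (A, B): A' = NOT B, B' = B AND NOT A.
RULES = (
    lambda s: 1 - s[1],
    lambda s: s[1] & (1 - s[0]),
)

def step(state, rules=RULES):
    """Synchronous update of all genes."""
    return tuple(rule(state) for rule in rules)

def attractor_of(state, rules=RULES):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return frozenset(seen[seen.index(state):])

def all_attractors(n_genes, rules=RULES):
    """Attractors reached from every one of the 2**n_genes start states."""
    return {attractor_of(s, rules) for s in product((0, 1), repeat=n_genes)}

if __name__ == "__main__":
    print(all_attractors(2))
    # Switch B off in the stable pattern (0, 1) and see where the network settles.
    print(attractor_of((0, 0)))
```

This network has two stable patterns, (1, 0) and (0, 1). Switching gene B off in (0, 1) gives (0, 0), which flows to (1, 0): a transition from one global expression pattern to another.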
Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook
Scoring methods • Whether there has been a significant change in any one condition • Whether there has been a significant aggregate change over all conditions • Whether the fluctuation pattern shows high diversity, as measured by Shannon entropy
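The three scores can be sketched as simple functions of a gene's profile. This is a minimal illustration under stated assumptions: profiles are taken to be log-ratios against a reference sample (so 0 means no change), the aggregate score is assumed to be the sum of absolute log-ratios, and the entropy score assumes the profile is discretized into three equal-width bins spanning the gene's own range. Significance thresholds are left to the user, and the function names are hypothetical.

```python
import math

# Assumption: each profile holds log-ratios against a reference sample,
# so 0 means "no change". Significance thresholds are not chosen here.

def max_change(log_ratios):
    """Largest change in any single condition."""
    return max(abs(v) for v in log_ratios)

def aggregate_change(log_ratios):
    """Aggregate change over all conditions (sum of absolute log-ratios)."""
    return sum(abs(v) for v in log_ratios)

def expression_entropy(values, n_bins=3):
    """Shannon entropy (bits) after binning the profile's own range into n_bins levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a flat profile shows no fluctuation diversity
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)
```

A profile that spends equal time at low, medium and high levels reaches the maximum of log2(3) bits for three bins, while a profile stuck at one level scores 0.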
Guilt By Association • Select a gene • Determine its nearest neighbors in expression space within a certain user-defined distance cut-off
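A minimal sketch of the guilt-by-association lookup, assuming profiles are stored as equal-length lists keyed by gene name and that Euclidean distance is the metric; the function name, gene names and values are hypothetical.

```python
import math

def guilt_by_association(profiles, query, cutoff):
    """Genes whose profile lies within `cutoff` (Euclidean) of the query gene, nearest first."""
    q = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue  # a gene is not its own neighbor
        d = math.dist(q, profile)
        if d <= cutoff:
            hits.append((d, gene))
    return [gene for d, gene in sorted(hits)]

if __name__ == "__main__":
    profiles = {
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [1.0, 2.0, 3.5],
        "geneC": [1.0, 3.0, 3.0],
        "geneD": [5.0, 0.0, 1.0],
    }
    print(guilt_by_association(profiles, "geneA", 1.0))
```

The cutoff plays the role of the user-defined distance threshold: widening it trades specificity for a longer list of candidate co-regulated genes.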
Clustering • Extract groups of genes that are tightly co-expressed over a range of different experiments.
Caution • Different clustering methods can have very different results • It’s not yet clear which clustering methods are most useful for gene expression analysis.
Definition: Gene Expression Profile • The expression profile $e_j$ of a particular gene $j$ over an ordered list of $N$ samples ($k = 1, \dots, N$) is a vector of scaled expression values $v_{jk}$ • The expression profile is: • $e_j = (v_{j1}, v_{j2}, v_{j3}, \dots, v_{jN})$
Definition: Gene Expression Profile (Cont.) • The difference between two genes $p$ and $q$ may be estimated as an $N$-dimensional metric “distance” between $e_p$ and $e_q$. • Euclidean distance: • $d(e_p, e_q) = \sqrt{\sum_{k=1}^{N} (v_{pk} - v_{qk})^2}$
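The Euclidean distance between two profiles is the square root of the summed squared differences of their scaled values over the N samples. The sketch below writes that sum out explicitly and builds a pairwise distance matrix keyed by gene name; the function names and the dictionary layout are illustrative assumptions.

```python
import math

def euclidean(e_p, e_q):
    """d(e_p, e_q) = sqrt(sum_k (v_pk - v_qk)^2) for two profiles over the same N samples."""
    if len(e_p) != len(e_q):
        raise ValueError("profiles must cover the same N samples")
    return math.sqrt(sum((vp - vq) ** 2 for vp, vq in zip(e_p, e_q)))

def distance_matrix(profiles):
    """Pairwise distances between all gene profiles, keyed by (gene, gene)."""
    names = list(profiles)
    return {(p, q): euclidean(profiles[p], profiles[q]) for p in names for q in names}
```

Because the metric compares raw values sample by sample, the scaling applied to the expression values directly shapes which genes end up close together.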
Clustering algorithms • Non-hierarchical methods • Cluster N objects into K groups in an iterative process until certain goodness criteria are optimized • E.g. K-means
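A minimal K-means sketch on expression profiles, using Euclidean distance. The deterministic "farthest-first" initialization (start from the first profile, then repeatedly add the profile farthest from the chosen centers) is an assumption made here for reproducibility; random or k-means++ starts are common alternatives. The function name and example data are hypothetical.

```python
import math

def kmeans(profiles, k, max_iter=100):
    """Partition profiles into k groups; returns (labels, centers)."""
    # Farthest-first initialization (deterministic choice for this sketch).
    centers = [list(profiles[0])]
    while len(centers) < k:
        far = max(profiles, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in profiles]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

if __name__ == "__main__":
    data = [[1, 2, 3, 4], [1.1, 2.1, 2.9, 4.2], [4, 3, 2, 1], [4.2, 2.9, 2.1, 0.9]]
    print(kmeans(data, 2)[0])
```

The "goodness criterion" being optimized here is the within-cluster sum of squared distances, which the assignment and update steps each never increase.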
Clustering algorithms • Hierarchical methods • Return a hierarchy of nested clusters, where each cluster typically consists of the union of two or more smaller clusters. • Agglomerative methods • Start with single-object clusters and recursively merge them into larger clusters • Divisive methods • Start with the cluster containing all objects and recursively divide it into smaller clusters
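An agglomerative method can be sketched in a few lines. This version assumes average linkage (the distance between two clusters is the mean Euclidean distance over all cross-cluster pairs); single and complete linkage are common alternatives. It is a naive cubic-time illustration, and the function name is hypothetical. The returned merge history is the tree of nested clusters.

```python
import math
from itertools import combinations

def average_linkage(points):
    """Agglomerative clustering; returns merges as (cluster1, cluster2, distance)."""
    clusters = [(i,) for i in range(len(points))]  # start with single-object clusters

    def dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(math.dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [tuple(sorted(c1 + c2))]
    return merges

if __name__ == "__main__":
    for merge in average_linkage([(0, 0), (0, 1), (5, 5), (5, 7)]):
        print(merge)
```

Cutting the merge history at a chosen distance yields a flat partition, so the same run can be inspected at several levels of granularity.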
Other applications of co-expression clusters • Extraction of regulatory motifs • Genes in the same expression cluster often share biological functions • Inference of functional annotation • Functions of unknown genes may be hypothesized from genes with known function within the same cluster • As a molecular signature in distinguishing cell or tissue types, based on mRNA expression
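Functional annotation from clusters can be sketched as a majority vote: each unannotated gene is assigned the most common known function in its cluster. The voting rule, function name, gene names and function labels below are illustrative assumptions; the output is a hypothesis to test, not an annotation.

```python
from collections import Counter

def annotate_by_cluster(labels, known):
    """labels: gene -> cluster id; known: gene -> function for annotated genes.
    Returns gene -> hypothesized function for each unannotated gene
    (None when its cluster contains no annotated gene)."""
    votes = {}
    for gene, function in known.items():
        votes.setdefault(labels[gene], Counter())[function] += 1
    guesses = {}
    for gene, cluster in labels.items():
        if gene in known:
            continue
        counter = votes.get(cluster)
        guesses[gene] = counter.most_common(1)[0][0] if counter else None
    return guesses
```

Since a gene may belong to several functional groups, a single vote per cluster is a coarse hypothesis generator.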
Which clustering method to use? • There is no single best criterion for obtaining a partition, because no precise and workable definition of ‘cluster’ exists. • Clusters can have arbitrary shapes and sizes in a multidimensional pattern space.
Challenge in cluster analysis • A gene could be a member of several clusters, each reflecting a particular aspect of its function and control • Solutions • Clustering methods that partition genes into non-exclusive clusters • Using several clustering methods simultaneously
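One example of a non-exclusive method is fuzzy c-means, swapped in here as an illustration; the slides do not name a specific algorithm. Instead of a single label, each gene gets a membership weight in every cluster, and the weights sum to 1. The sketch assumes Euclidean distance, fuzzifier m = 2, and caller-supplied initial centers; the function names and toy data are hypothetical.

```python
import math

def memberships(point, centers, m):
    """Fuzzy membership of one point in each cluster (values sum to 1)."""
    d = [math.dist(point, c) for c in centers]
    zeros = [k for k, dk in enumerate(d) if dk == 0.0]
    if zeros:  # the point sits exactly on one or more centers
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[k] / dj) ** p for dj in d) for k in range(len(d))]

def fuzzy_c_means(points, centers, m=2.0, n_iter=100):
    """Alternate membership and center updates; m > 1 controls fuzziness."""
    centers = [list(c) for c in centers]
    for _ in range(n_iter):
        u = [memberships(x, centers, m) for x in points]
        for k in range(len(centers)):
            w = [ui[k] ** m for ui in u]  # membership-weighted mean for center k
            total = sum(w)
            centers[k] = [sum(wi * x[dim] for wi, x in zip(w, points)) / total
                          for dim in range(len(points[0]))]
    return [memberships(x, centers, m) for x in points], centers

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (6, 6), (6, 5), (5, 6), (3, 3)]
    u, _ = fuzzy_c_means(pts, [(0, 0), (6, 6)])
    print([round(ui[0], 2) for ui in u])
```

In this toy data the point (3, 3), halfway between the two groups, ends up with roughly equal membership in both clusters instead of being forced into one.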
Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook
Level of biochemical detail • Abstract: Boolean networks • Concrete: full biochemical interaction models with stochastic kinetics, as in Arkin et al. (1998)
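Stochastic kinetic models are commonly simulated with Gillespie's stochastic simulation algorithm, which draws the waiting time to the next reaction from an exponential distribution and then picks which reaction fires in proportion to its rate. The sketch applies it to the simplest hypothetical case: one gene product made at a constant rate k and degraded at rate gamma per molecule. It illustrates the technique only and is not a reconstruction of the Arkin et al. (1998) model; the function name and parameters are assumptions.

```python
import random

def gillespie_birth_death(k, gamma, t_end, n0=0, seed=0):
    """Simulate production (rate k) and degradation (rate gamma * n) of one species.
    Returns (final copy number, time-averaged copy number)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    area = 0.0  # integral of n(t) dt, used for the time average
    while True:
        a_prod = k
        a_deg = gamma * n
        a_total = a_prod + a_deg
        tau = rng.expovariate(a_total)  # waiting time to the next reaction
        if t + tau >= t_end:
            area += n * (t_end - t)
            break
        area += n * tau
        t += tau
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
    return n, area / t_end
```

For this birth-death process the long-run mean copy number is k / gamma, so with k = 20 and gamma = 1 the time average should hover near 20, while individual trajectories keep fluctuating around it.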
Forward and inverse modeling • Forward modeling approach • Inverse modeling, or reverse engineering • Given a certain amount of data, what can we deduce about the unknown underlying regulatory network? • Requires the use of a parametric model, whose parameters are then fit to the real-world data.
Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook
Goal of network inference • Construct a coarse-scale model of the network of regulatory interactions between the genes • It’s possible to reverse engineer a network from its activity profiles
Data requirements • To infer how a gene is regulated, we need to observe its expression under many different combinations of expression levels of its regulatory inputs • Use data from different sources • Deal with different data types
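The need to observe many input combinations can be illustrated with Boolean data. The sketch below is a simple exhaustive consistency search, not a specific published algorithm: for a target gene it looks for the smallest set of genes whose values at time t fully determine the target's value at t + 1. The example data come from a hypothetical 3-gene network (A' = NOT C, B' = A, C' = A AND B); all names are illustrative.

```python
from itertools import combinations, product

# Hypothetical 3-gene network used only to generate example data:
# A' = NOT C, B' = A, C' = A AND B.
def true_step(s):
    return (1 - s[2], s[0], s[0] & s[1])

def consistent(transitions, inputs, target):
    """True if the target's next state is a function of the chosen inputs."""
    table = {}
    for before, after in transitions:
        key = tuple(before[j] for j in inputs)
        if table.setdefault(key, after[target]) != after[target]:
            return False  # same input values, different outcomes
    return True

def infer_regulators(transitions, target, n_genes, max_k=2):
    """Smallest input set (up to max_k genes) that explains the target's behavior."""
    for k in range(max_k + 1):
        for inputs in combinations(range(n_genes), k):
            if consistent(transitions, inputs, target):
                return inputs
    return None

if __name__ == "__main__":
    # Observe the network from every one of the 2**3 input combinations.
    data = [(s, true_step(s)) for s in product((0, 1), repeat=3)]
    for gene in range(3):
        print(gene, infer_regulators(data, gene, 3))
```

With all eight combinations observed, the search recovers C as the input of A, A as the input of B, and {A, B} as the inputs of C. With only a few transitions, several input sets can be consistent and the first one found may be wrong.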
Estimates for network models • Consider a sparse network model of N genes, where each gene is affected by only K other genes on average, i.e. a sparsely connected, directed graph with N nodes and NK edges.
Estimates for network models (Cont.) • To specify the correct model, we need bits of information.
Correlation Metric Construction • Adam Arkin and John Ross • A method to reconstruct reaction networks from measured time series of the component chemical species. • The system is driven using inputs for some of the chemical species and the concentration of all the species is monitored over time.
Correlation Metric Construction (Cont.) • The time-lagged correlation matrix is calculated • From this a distance matrix is constructed based on the maximum correlation between any two chemical species • This distance matrix is then fed into a simple clustering algorithm to generate a tree of connections between the species • The results are mapped into a two-dimensional graph for visualization
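The first two steps of Correlation Metric Construction can be sketched as follows: compute the Pearson correlation between species x at time t and species y at time t + lag over a range of lags, take the maximum absolute value, and turn it into a distance. The specific conversion d = sqrt(2(1 - r_max)) is an assumption of this sketch, and the clustering and two-dimensional projection steps are omitted; the resulting matrix could be fed to any agglomerative routine. Lags are in sampling steps, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def lagged_corr(x, y, lag):
    """Correlation between x(t) and y(t + lag); negative lags swap the roles."""
    if lag < 0:
        return lagged_corr(y, x, -lag)
    return pearson(x[:len(x) - lag], y[lag:])

def cmc_distances(series, max_lag):
    """Distance from the maximum absolute lagged correlation (assumed form)."""
    names = list(series)
    dist = {}
    for a in names:
        for b in names:
            r = max(abs(lagged_corr(series[a], series[b], lag))
                    for lag in range(-max_lag, max_lag + 1))
            dist[a, b] = math.sqrt(2.0 * (1.0 - min(r, 1.0)))  # clamp rounding above 1
    return dist
```

Taking the maximum over lags lets a species that responds to another with a delay still appear close to it, which a zero-lag correlation would miss.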
Additive regulation models • Properties • The regulatory inputs are combined using a weighted sum • Can be used as a first-order approximation to the gene network
Additive regulation models (Cont.) • The change in each variable over time is given by a weighted sum of all other variables: $\frac{dx_i}{dt} = \sum_j w_{ij} x_j + b_i$ • $x_i$ is the level of the $i$-th variable • $b_i$ is a bias term indicating whether $i$ is expressed or not in the absence of regulatory inputs • $w_{ij}$ represents the influence of $j$ on the regulation of $i$
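A minimal sketch of the linear additive model dx_i/dt = sum_j w_ij x_j + b_i, integrated with the forward Euler method (a simple choice assumed here; any ODE solver would do). The 2-gene weights and biases in the example are hypothetical: gene 1 is expressed by default (b_1 = 1), gene 2 is activated by gene 1 (w_21 = 0.5), and both decay (w_11 = w_22 = -1).

```python
def simulate_additive(W, b, x0, dt=0.01, n_steps=2000):
    """Forward-Euler integration of dx_i/dt = sum_j W[i][j] * x_j + b[i]."""
    x = list(x0)
    n = len(x)
    for _ in range(n_steps):
        dx = [sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x

if __name__ == "__main__":
    W = [[-1.0, 0.0], [0.5, -1.0]]  # hypothetical weights w_ij
    b = [1.0, 0.0]                  # hypothetical biases b_i
    print(simulate_additive(W, b, [0.0, 0.0]))
```

Starting from zero, the levels settle near x_1 = 1 and x_2 = 0.5, the point where both right-hand sides vanish.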
Use of such models • We can infer regulatory interactions directly from the data by fitting these simple network models to large-scale gene expression data.
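Fitting can be sketched as ordinary least squares on the linear additive model dx_i/dt = sum_j w_ij x_j + b_i. The assumptions: time series sampled at a uniform step dt, derivatives approximated by forward differences, and one regression per gene on the regressors [x_1, ..., x_n, 1]. The data here are noise-free and synthetic, generated from a known W and b so the recovery can be checked; with real arrays there are often fewer time points than genes, and this plain fit would then need more data or extra constraints. All names are hypothetical.

```python
def simulate(W, b, x0, dt, steps):
    """Forward-Euler trajectory of the additive model (synthetic data)."""
    xs = [list(x0)]
    n = len(x0)
    for _ in range(steps):
        x = xs[-1]
        xs.append([x[i] + dt * (sum(W[i][j] * x[j] for j in range(n)) + b[i]) for i in range(n)])
    return xs

def solve(A, y):
    """Solve the square system A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit_additive(trajectories, dt):
    """Least-squares estimate of W and b from evenly sampled trajectories."""
    rows, derivs = [], []
    for xs in trajectories:
        for t in range(len(xs) - 1):
            rows.append(xs[t] + [1.0])  # regressors x_1..x_n plus a bias column
            derivs.append([(xs[t + 1][i] - xs[t][i]) / dt for i in range(len(xs[t]))])
    n = len(trajectories[0][0])
    p = n + 1
    XtX = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    W, b = [], []
    for i in range(n):  # one normal-equation solve per gene
        Xty = [sum(r[a] * d[i] for r, d in zip(rows, derivs)) for a in range(p)]
        coef = solve(XtX, Xty)
        W.append(coef[:n])
        b.append(coef[n])
    return W, b

if __name__ == "__main__":
    W_true, b_true = [[-1.0, 0.0], [0.8, -0.5]], [0.5, 0.1]
    starts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 3.0]]
    trajs = [simulate(W_true, b_true, x0, 0.1, 30) for x0 in starts]
    print(fit_additive(trajs, 0.1))
```

Several trajectories from different starting states are used because a single run that quickly settles to steady state carries little information about the individual weights.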
Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook
Conclusion • Conceptual foundations for understanding complex biological networks • Several practical methods for data analysis