
Acknowledgements

Bioinformatics Dealing with expression data Kristel Van Steen, PhD, ScD (kristel.vansteen@ulg.ac.be) Université de Liege - Institut Montefiore 2008-2009. Acknowledgements. Material based on:


Presentation Transcript


  1. Bioinformatics: Dealing with expression data. Kristel Van Steen, PhD, ScD (kristel.vansteen@ulg.ac.be), Université de Liege - Institut Montefiore, 2008-2009

  2. Acknowledgements. Material based on: • Slides from Patrik D’haeseleer, Shoudan Liang and Roland Somogyi (genetic network inference) • Slides from Steve Horvath and Jun Dong (co-expression networks) • Slides from Sargur Srihari (bagging and boosting)

  3. Class Outline • Genetic networks • A primer to co-expression network analysis • Bagging and boosting (as promised …) • Consensus microarray data analysis • Theory • Application

  4. Genetic networks

  5. Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering

  6. Genes encode proteins, some of which in turn regulate other genes. Goal: determine the structure of this intricate network of genetic regulatory interactions.

  7. Traditional approach: local • Examining and collecting data on a single gene, a single protein or a single reaction at a time • Alternative: functional genomics

  8. Functional Genomics • Specifically, functional genomics refers to the development and application of global experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics. • High-throughput, large-scale experimental methodologies combined with statistical and computational analysis of the results.

  9. Functional Genomics (Cont.) • We need to define the mapping from sequence space to functional space.

  10. Intermediate representation • Focus at the level of single cells • A biological system can be considered to be a state machine, where the change in internal state of the system depends on both its current internal state and any external inputs.

  11. The goal • Observe the state of a cell and how it changes under different circumstances, and from this derive a model of how these state changes are generated • The state of a cell • All those variables determining its behavior

  12. Example • A simple, 6-node regulatory network

  13. Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook

  14. The global gene expression pattern is the result of the collective behavior of individual regulatory pathways • Gene function depends on its cellular context; thus understanding the network as a whole is essential.

  15. Boolean Networks • Each gene is considered as a binary variable, either ON or OFF, regulated by other genes through logical or Boolean functions. • Even with this simplification, the network behavior is already extremely rich.
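
To make the Boolean abstraction concrete, here is a minimal sketch of a synchronous Boolean network. The three genes and their rules are hypothetical and chosen only for illustration; they do not come from the slides. Even this tiny network has two kinds of long-run behavior: a fixed point and a two-state cycle.

```python
# Hypothetical 3-gene network; the rules below are illustrative assumptions.
RULES = {
    "A": lambda s: s["C"],                 # A is activated by C
    "B": lambda s: s["A"] and not s["C"],  # B needs A ON and C OFF
    "C": lambda s: not s["B"],             # C is repressed by B
}

def step(state):
    """Synchronous update: every gene reads the same current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def attractor(state):
    """Iterate until a state repeats; return the cycle that was entered."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

cycle = attractor({"A": True, "B": False, "C": False})
fixed = attractor({"A": False, "B": False, "C": False})
print(len(cycle), fixed)
```

Starting from different initial states leads to different attractors, which is one way to read the later remark that cell differentiation corresponds to transitions between global expression patterns.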

  16. Boolean Networks (Cont.) • Cell differentiation corresponds to transitions from one global gene expression pattern to another.

  17. Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook

  18. Scoring methods • Whether there has been a significant change at any one condition • Whether there has been a significant aggregate change over all conditions • Whether the fluctuation pattern shows high diversity according to Shannon entropy
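
The entropy criterion can be sketched as follows. The slide does not say how expression values are discretized, so the equal-width binning and the default of three bins are assumptions made only for this example.

```python
import math
from collections import Counter

def shannon_entropy(values, n_bins=3):
    """Entropy (in bits) of a profile after equal-width binning (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a flat profile shows no diversity
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the maximum falls into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(bins).values())

flat = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
varied = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
print(shannon_entropy(flat), shannon_entropy(varied))
```

A gene whose values spread evenly over all bins reaches the maximum of log2(n_bins) bits, while a gene that never changes scores zero.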

  19. Guilt By Association • Select a gene • Determine its nearest neighbors in expression space within a certain user-defined distance cut-off
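
The two steps above can be sketched in a few lines. The gene names, profiles, and cut-off are made up for illustration, and Euclidean distance is assumed as the distance in expression space.

```python
import math

def neighbors(profiles, query, cutoff):
    """Genes whose profile lies within `cutoff` of the query gene, nearest first."""
    q = profiles[query]
    hits = [(math.dist(q, p), g) for g, p in profiles.items() if g != query]
    return [g for d, g in sorted(hits) if d <= cutoff]

# Hypothetical expression profiles over three conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [3.0, 1.0, 0.5],
    "geneD": [1.0, 2.5, 3.0],
}
print(neighbors(profiles, "geneA", cutoff=0.6))
```

The cut-off is the user-defined part: a larger value returns more candidate genes, at the cost of weaker associations.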

  20. Clustering • Extract groups of genes that are tightly co-expressed over a range of different experiments.

  21. Caution • Different clustering methods can have very different results • It’s not yet clear which clustering methods are most useful for gene expression analysis.

  22. Definition: Gene Expression Profile • An expression profile $e_j$ of an ordered list of N samples (k = 1 to N) for a particular gene j is a vector of scaled expression values $v_{jk}$ • The expression profile is: • $e_j = (v_{j1}, v_{j2}, v_{j3}, \dots, v_{jN})$

  23. Definition: Gene Expression Profile (Cont.) • A difference between two genes p and q may be estimated as an N-dimensional metric “distance” between $e_p$ and $e_q$. • Euclidean distance: $d_{pq} = \lVert e_p - e_q \rVert = \sqrt{\sum_{k=1}^{N} (v_{pk} - v_{qk})^2}$
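
As a small worked example of the Euclidean distance between two expression profiles, the sketch below sums squared differences over the N samples and takes the square root. The two profiles are invented for the example.

```python
import math

def euclidean(e_p, e_q):
    """Square root of the summed squared differences over all samples."""
    return math.sqrt(sum((v_p - v_q) ** 2 for v_p, v_q in zip(e_p, e_q)))

e_p = [1.0, 2.0, 2.0]
e_q = [1.0, 5.0, 6.0]
d = euclidean(e_p, e_q)  # sqrt(0 + 9 + 16)
print(d)
```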

  24. Clustering algorithms • Non-hierarchical methods • Cluster N objects into K groups in an iterative process until certain goodness criteria are optimized • E.g. K-means
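
A minimal K-means sketch (Lloyd's iteration) shows the iterative process mentioned above: assign each object to its nearest center, move each center to the mean of its members, and stop when nothing changes. The toy points and the choice of initial centers are assumptions for illustration; real analyses usually try several initializations.

```python
import math

def kmeans(points, centers, n_iter=20):
    """Alternate assignment and centroid update until the centers stop moving."""
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
```

The goodness criterion being optimized here is the within-cluster sum of squared distances to the centers.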

  25. Clustering algorithms • Hierarchical methods • Return a hierarchy of nested clusters, where each cluster typically consists of the union of two or more smaller clusters. • Agglomerative methods • Start with single object clusters and recursively merge them into larger clusters • Divisive methods • Start with the cluster containing all objects and recursively divide it into smaller clusters
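
The agglomerative strategy can be sketched as below. Average linkage (mean pairwise distance between two clusters) is an assumed choice, since the slide does not name a linkage rule, and the brute-force search is meant for tiny examples only.

```python
import math
from itertools import combinations

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly; returns clusters and merge history."""
    clusters = [[i] for i in range(len(points))]  # start: one object per cluster
    merges = []

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
        merges.append((a, b))
    return clusters, merges

points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
clusters, merges = agglomerate(points, n_clusters=2)
print(sorted(clusters))
```

The merge history is what a dendrogram draws: running down to a single cluster records the full nested hierarchy.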

  26. Other applications of co-expression clusters • Extraction of regulatory motifs • Genes in the same expression cluster share biological functions • Inference of functional annotation • Functions of unknown genes may be hypothesized from genes with known function within the same cluster • As a molecular signature in distinguishing cell or tissue types • mRNA expression

  27. Which clustering method to use? • There is no single best criterion for obtaining a partition because no precise and workable definition of ‘cluster’ exists. • Clusters can be of any arbitrary shapes and sizes in a multidimensional pattern space.

  28. Challenge in cluster analysis • A gene could be a member of several clusters, each reflecting a particular aspect of its function and control • Solutions • clustering methods that partition genes into non-exclusive clusters • Several clustering methods could be used simultaneously

  29. Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook

  30. Level of biochemical detail • Abstract • Boolean networks • Concrete • Full biochemical interaction models with stochastic kinetics in Arkin et al. (1998)

  31. Forward and inverse modeling • Forward modeling approach • Inverse modeling, or reverse engineering • Given an amount of data, what can we deduce about the unknown underlying regulatory network? • Requires the use of a parametric model, the parameters of which are then fit to the real-world data.

  32. Outline • Introduction • A conceptual approach to complex network dynamics • Inference of regulation through clustering of gene expression data • Modeling methodologies • Gene network inference: reverse engineering • Conclusions and Outlook

  33. Goal of network inference • Construct a coarse-scale model of the network of regulatory interactions between the genes • It’s possible to reverse engineer a network from its activity profiles

  34. Data requirements • To infer how a gene is regulated, we need to observe its expression under many different combinations of expression levels of its regulatory inputs • Use data from different sources • Deal with different data types

  35. Estimates for network models • A sparse network model of N genes, where each gene is affected by only K other genes on average, corresponds to a sparsely connected, directed graph with N nodes and NK edges.
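
The edge count can be checked with a small sketch. For simplicity it gives every gene exactly K regulators, whereas the slide only requires K on average; the sizes N = 100 and K = 3 are arbitrary.

```python
import random

def random_regulatory_graph(n_genes, k_inputs, seed=0):
    """Directed edges (regulator, target); each gene gets exactly k_inputs regulators."""
    rng = random.Random(seed)
    edges = []
    for target in range(n_genes):
        candidates = [g for g in range(n_genes) if g != target]
        for regulator in rng.sample(candidates, k_inputs):
            edges.append((regulator, target))
    return edges

n_genes, k_inputs = 100, 3
edges = random_regulatory_graph(n_genes, k_inputs)
possible = n_genes * (n_genes - 1)  # all directed edges without self-loops
print(len(edges), possible)
```

With these numbers the model has N·K = 300 edges out of 9900 possible directed edges, which is what makes the network sparse and the inference problem more tractable.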

  36. Co-expression network analysis

  37. Outline • Network and network concepts • Approximately factorizable networks • Gene Co-expression Network • Eigengene Factorizability, Eigengene Conformity • Eigengene-based network concepts • What can we learn from the geometric interpretation?

  38. Network = Adjacency Matrix • A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes whether/how a pair of nodes is connected. • A is a symmetric matrix with entries in [0,1] • For unweighted networks, entries are 1 or 0 depending on whether or not 2 nodes are adjacent (connected) • For weighted networks, the adjacency matrix reports the connection strength between node pairs • Our convention: diagonal elements of A are all 1.
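
The conventions listed above (symmetry, entries in [0,1], unit diagonal) can be sketched as a constructor and a validity check. The edge list and the weighted matrix are invented for illustration, and both function names are hypothetical.

```python
def adjacency_from_edges(n, edges):
    """Unweighted network: symmetric 0/1 matrix with the diagonal set to 1."""
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def is_valid_adjacency(A):
    """Check symmetry, entries in [0, 1], and a unit diagonal."""
    n = len(A)
    return all(
        0 <= A[i][j] <= 1 and A[i][j] == A[j][i] and A[i][i] == 1
        for i in range(n) for j in range(n)
    )

A = adjacency_from_edges(4, [(0, 1), (1, 2), (2, 3)])    # a path on 4 nodes
W = [[1, 0.8, 0.1], [0.8, 1, 0.3], [0.1, 0.3, 1]]        # a weighted network
print(is_valid_adjacency(A), is_valid_adjacency(W))
```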

  39. Motivational example I: Pair-wise relationships between genes across different mouse tissues and genders • Challenge: Develop simple descriptive measures that describe the patterns. • Solution: The following network concepts are useful: density, centralization, clustering coefficient, heterogeneity

  40. Motivational example (continued) Challenge: Find a simple measure for describing the relationship between gene significance and connectivity Solution: network concept called hub gene significance

  41. Backgrounds • Network concepts are also known as network statistics or network indices • Examples: connectivity (degree), clustering coefficient, topological overlap, etc • Network concepts underlie network language and systems biological modeling. • Dozens of potentially useful network concepts are known from graph theory.

  42. Review of some fundamental network concepts which are defined for all networks (not just co-expression networks)

  43. Connectivity • Node connectivity = row sum of the adjacency matrix • For unweighted networks = number of direct neighbors • For weighted networks = sum of connection strengths to other nodes

  44. Density • Density= mean adjacency • Highly related to mean connectivity
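
Connectivity and density can be sketched together. Because the diagonal of A is 1 by convention, the sketch assumes the row sum skips the diagonal so that connectivity counts connections to other nodes only; density is then the mean off-diagonal adjacency, which equals mean connectivity divided by n - 1.

```python
def connectivity(A):
    """Row sums of A, excluding the diagonal (self-connections)."""
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]

def density(A):
    """Mean off-diagonal adjacency = mean connectivity / (n - 1)."""
    n = len(A)
    return sum(connectivity(A)) / (n * (n - 1))

# Star network on 4 nodes: node 0 is connected to every other node.
star = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
k = connectivity(star)
print(k, density(star))
```

The identity in the docstring is why density and mean connectivity are described as highly related: they differ only by the constant factor n - 1.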

  45. Centralization • = 1 if the network has a star topology • = 0 if all nodes have the same connectivity • [Figure: two example networks. One has centralization 0 because all nodes have the same connectivity of 2; the other has centralization 1 because it has a star topology.]
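
The slide gives only the two extreme values, so the sketch below assumes a commonly used formula, centralization = n/(n-2) · (max(k)/(n-1) - density), and checks that it reproduces both extremes on a 5-node star and a 5-node ring. The helper names `star` and `ring` are hypothetical.

```python
def connectivity(A):
    """Row sums of A, excluding the diagonal."""
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]

def centralization(A):
    """Assumed formula: n/(n-2) * (max(k)/(n-1) - density)."""
    n = len(A)
    k = connectivity(A)
    dens = sum(k) / (n * (n - 1))
    return n / (n - 2) * (max(k) / (n - 1) - dens)

def star(n):
    """Node 0 is linked to all others; no other links."""
    return [[1 if i == j or i == 0 or j == 0 else 0 for j in range(n)] for i in range(n)]

def ring(n):
    """Every node is linked to its two neighbors on a cycle."""
    return [[1 if i == j or (i - j) % n in (1, n - 1) else 0 for j in range(n)]
            for i in range(n)]

print(centralization(star(5)), centralization(ring(5)))
```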

  46. Heterogeneity • Heterogeneity: coefficient of variation of the connectivity • Highly heterogeneous networks exhibit hubs
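
A minimal sketch of heterogeneity as the coefficient of variation of connectivity follows. Using the sample standard deviation (rather than the population one) is an assumption, as the slide does not specify it. A star network, which has one hub, scores high; a ring, where all nodes are alike, scores zero.

```python
import math
import statistics

def connectivity(A):
    """Row sums of A, excluding the diagonal."""
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]

def heterogeneity(A):
    """Coefficient of variation of connectivity (sample standard deviation assumed)."""
    k = connectivity(A)
    return statistics.stdev(k) / statistics.mean(k)

star = [[1 if i == j or i == 0 or j == 0 else 0 for j in range(5)] for i in range(5)]
ring = [[1 if i == j or (i - j) % 5 in (1, 4) else 0 for j in range(5)] for i in range(5)]
print(heterogeneity(star), heterogeneity(ring))
```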

  47. Clustering Coefficient • Measures the cliquishness of a particular node • “A node is cliquish if its neighbors know each other” • This generalizes directly to weighted networks (Zhang and Horvath 2005) • [Figure: two example networks. In the first, the clustering coefficient of the white node is 0; in the second, the clustering coefficient is 1.]
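
The sketch below uses the weighted form of the clustering coefficient, which reduces to the usual "fraction of connected neighbor pairs" for 0/1 adjacencies. The formula is written from the general definition rather than copied from the slide, and returning 0 for a node with fewer than two neighbors is an assumption.

```python
def clustering_coefficient(A, i):
    """sum_{j,l} a_ij a_jl a_li / ((sum_j a_ij)^2 - sum_j a_ij^2), with j, l != i and j != l."""
    n = len(A)
    others = [j for j in range(n) if j != i]
    num = sum(A[i][j] * A[j][l] * A[l][i] for j in others for l in others if l != j)
    den = sum(A[i][j] for j in others) ** 2 - sum(A[i][j] ** 2 for j in others)
    return num / den if den > 0 else 0.0

# Star: the neighbors of the center (node 0) are not connected to each other.
star = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
# Triangle: every pair of neighbors is connected.
triangle = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(clustering_coefficient(star, 0), clustering_coefficient(triangle, 0))
```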

  48. The topological overlap dissimilarity is used as input of hierarchical clustering • Generalized in Zhang and Horvath (2005) to the case of weighted networks • Generalized in Li and Horvath (2006) to multiple nodes • Generalized in Yip and Horvath (2007) to higher order interactions
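
The slide does not state the formula, so this sketch assumes the standard pairwise form: TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij) for i ≠ j, with the sum over nodes u other than i and j. The dissimilarity 1 - TOM_ij is what would be passed to hierarchical clustering.

```python
def topological_overlap(A):
    """Pairwise topological overlap matrix (assumed standard definition)."""
    n = len(A)
    k = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(A[i][u] * A[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + A[i][j]) / (min(k[i], k[j]) + 1 - A[i][j])
    return tom

# Path 0 - 1 - 2: nodes 0 and 2 are not adjacent but share neighbor 1.
path = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
tom = topological_overlap(path)
dissimilarity = [[1 - t for t in row] for row in tom]
print(tom[0][2], dissimilarity[0][2])
```

Two nodes that are not directly connected can still have a positive overlap when they share neighbors, which is why the measure is more robust than the raw adjacency as a clustering input.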

  49. Network Significance • Defined as average gene significance • We often refer to the network significance of a module network as module significance.

  50. Hub Gene Significance = slope of the regression line (intercept = 0) of gene significance on connectivity
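
Network significance and hub gene significance can be sketched together. For a regression through the origin of gene significance GS on connectivity K, the least-squares slope is Σ GS_i K_i / Σ K_i². Rescaling connectivity by its maximum is an assumption made here, and the gene significance values are invented.

```python
def network_significance(gs):
    """Average gene significance over all nodes."""
    return sum(gs) / len(gs)

def hub_gene_significance(gs, k):
    """Least-squares slope of GS on scaled connectivity, with intercept fixed at 0."""
    k_max = max(k)
    K = [ki / k_max for ki in k]  # assumed scaling: K_i = k_i / max(k)
    return sum(g * Ki for g, Ki in zip(gs, K)) / sum(Ki ** 2 for Ki in K)

k = [3, 1, 1, 1]            # connectivities of a 4-node star
gs = [0.9, 0.3, 0.3, 0.3]   # hypothetical gene significance values
print(network_significance(gs), hub_gene_significance(gs, k))
```

In this toy case gene significance is exactly proportional to scaled connectivity, so the slope recovers the proportionality constant 0.9: the hub is also the most significant gene.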
