Comparative Expression

Presentation Transcript


  1. Comparative Expression • Moran Yassour

  2. Goal • Build a multi-species gene-coexpression network • Find functions of unknown genes • Discover how the genes interact • Distinguish accidentally regulated genes from those that are physiologically important

  3. Construction of a gene-coexpression network • Evolutionarily diverse organisms with extensive microarray data: • Homo sapiens • Drosophila melanogaster • Caenorhabditis elegans • Saccharomyces cerevisiae • We first associated genes from one organism with their orthologous counterparts in the other organisms.

  4. Evolution 101 • Paralogs vs. Orthologs

  5. Evolution 101 • Paralogs vs. Orthologs

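  6. Constructing a metagene (MEG) • For each human, worm, fly and yeast gene, find its best BLAST hit in each of the other organisms. • Ignore non-reciprocal hits. • Identify connected components of the remaining reciprocal hits; each component is a metagene. • Using this method, we assigned each gene to at most a single metagene.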
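
The metagene construction on slide 6 can be sketched as below. This is a minimal illustration that assumes best BLAST hits are already computed and supplied as a dictionary; the gene tuples, the `best_hit` layout and the function names are hypothetical, and the original pipeline may apply extra filtering that the slide does not show. Union-find is used here simply as a convenient way to get connected components.

```python
from collections import defaultdict

# Hypothetical input layout: a gene is (organism, gene_id); best_hit maps
# (gene, other_organism) -> that gene's best BLAST hit in other_organism.
def reciprocal_edges(best_hit):
    """Keep only reciprocal best hits; non-reciprocal hits are ignored."""
    edges = set()
    for (gene, _other_org), hit in best_hit.items():
        if best_hit.get((hit, gene[0])) == gene:
            edges.add(frozenset((gene, hit)))
    return edges


def metagenes(best_hit):
    """Connected components of the reciprocal-hit graph (union-find).
    Each gene ends up in at most one metagene."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in (tuple(e) for e in reciprocal_edges(best_hit)):
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return list(groups.values())


# Toy example: human H1, worm W1 and fly F1 are reciprocal best hits;
# yeast Y1's best fly hit (F2) does not point back, so Y1 is left out.
example_hits = {
    (("human", "H1"), "worm"): ("worm", "W1"),
    (("worm", "W1"), "human"): ("human", "H1"),
    (("worm", "W1"), "fly"): ("fly", "F1"),
    (("fly", "F1"), "worm"): ("worm", "W1"),
    (("fly", "F1"), "yeast"): ("yeast", "Y1"),
    (("yeast", "Y1"), "fly"): ("fly", "F2"),
}
example_metagenes = metagenes(example_hits)
print(example_metagenes)
```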

  7. Some numbers • In total we have 6307 metagenes (6591 human genes, 5180 worm genes, 5802 fly genes and 2434 yeast genes). • We sought to identify pairs of metagenes that were not only coexpressed in one experiment in one organism but also correlated across diverse experiments in multiple organisms.

  8. Edges in the graph • Figure: for a pair of metagenes MEG1 and MEG2, the ranks of their coexpression in human, fly and worm are {2, 4, 2}. • Is this set of ranks significant (P-value < 0.05)? If so, draw an edge between MEG1 and MEG2.
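
Slide 8 asks whether the ranks {2, 4, 2} are significant at P < 0.05 but does not say how the P-value is computed. One standard option, shown here as an assumption rather than as the method used in the talk, is an order-statistics probability: turn each rank into a rank ratio (rank divided by the number of candidates, a hypothetical normalization) and compute the probability that independent Uniform(0, 1) ratios would be jointly at least this small.

```python
import math


def order_statistics_p(rank_ratios):
    """P-value for a set of rank ratios under the null that they are i.i.d.
    Uniform(0, 1). With r_1 <= ... <= r_N sorted ascending:
        V_0 = 1
        V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i
        Q   = N! * V_N
    Q is the probability that the sorted uniforms satisfy U_(j) <= r_j for all j.
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        r_k = r[n - k]  # r_{N-k+1} in the 1-based notation above
        v[k] = sum((-1) ** (i - 1) * v[k - i] / math.factorial(i) * r_k ** i
                   for i in range(1, k + 1))
    return math.factorial(n) * v[n]


# Ranks {2, 4, 2} from the slide, out of a hypothetical 5 candidates per organism.
ranks = [2, 4, 2]
p = order_statistics_p([rank / 5 for rank in ranks])
print(f"P = {p:.3f}; draw an edge: {p < 0.05}")  # P = 0.256; draw an edge: False
```

With only five candidates per organism no set of ranks could be very significant; in a real network the denominator would be the number of metagenes compared.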

  9. Statistical tests (1) – permuted metagenes • We constructed a network from a set of permuted metagenes (random collections of genes from each organism). • At P < 0.05, the real network contained 3.5 ± 0.03 times as many interactions as the random networks.
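
The permutation control on slide 9 can be sketched as follows, assuming (hypothetically) that each metagene is stored as a dictionary from organism to gene. Shuffling genes independently within each organism keeps every metagene's organism composition but breaks the orthology links; the network would then be rebuilt from these random metagenes and its interaction count compared with that of the real network.

```python
import random


def permute_metagenes(metagenes, seed=None):
    """Return random metagenes with the same organism composition.

    metagenes: list of dicts mapping organism -> gene (a simplifying,
    hypothetical representation). Genes are shuffled independently within
    each organism, so orthology between the members is destroyed.
    """
    rng = random.Random(seed)
    shuffled = [{} for _ in metagenes]
    organisms = sorted({org for mg in metagenes for org in mg})  # deterministic order
    for org in organisms:
        slots = [i for i, mg in enumerate(metagenes) if org in mg]
        genes = [metagenes[i][org] for i in slots]
        rng.shuffle(genes)
        for i, gene in zip(slots, genes):
            shuffled[i][org] = gene
    return shuffled


example = [
    {"human": "H1", "worm": "W1"},
    {"human": "H2", "fly": "F2"},
    {"human": "H3", "worm": "W3", "fly": "F3"},
]
random_metagenes = permute_metagenes(example, seed=0)
print(random_metagenes)
```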

  10. Statistical tests (2) – half the data • We split the microarray data into two halves and built a network from each half. • We then counted the fraction of interactions that were significant in one network (P < 0.05), given that they were significant in the other network at P < p, for various values of p. • At p = 0.05, 41% of the interactions significant in one network were also significant in the other.
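
The conditional count on slide 10 can be written as a small function. This is a sketch under assumptions: each half-data network is represented as a hypothetical dictionary from interaction to P-value, and interactions absent from the second network are treated as not significant.

```python
def conditional_reproducibility(p_first, p_second, p, alpha=0.05):
    """Fraction of interactions with P < alpha in the second network among
    those with P < p in the first network.

    p_first, p_second: dicts mapping an interaction (a pair of metagene ids)
    to its P-value in each half-data network. Interactions missing from
    p_second are treated as not significant (an assumption).
    """
    selected = [edge for edge, pv in p_first.items() if pv < p]
    if not selected:
        return float("nan")
    reproduced = sum(1 for edge in selected if p_second.get(edge, 1.0) < alpha)
    return reproduced / len(selected)


# Hypothetical P-values for three interactions in each half-data network.
half_a = {("M1", "M2"): 0.01, ("M1", "M3"): 0.03, ("M2", "M3"): 0.20}
half_b = {("M1", "M2"): 0.02, ("M1", "M3"): 0.40, ("M2", "M3"): 0.01}
for threshold in (0.05, 0.02):
    print(threshold, conditional_reproducibility(half_a, half_b, threshold))
```

Sweeping the threshold `p` traces out how reproducibility changes as the first network is filtered more strictly.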

  11. Statistical tests (3) – noise stability • We added increasing levels of Gaussian noise to the entire data set for each of the organisms. • Figure: negative log P-value of each interaction in the noisy data versus its negative log P-value in the real network.
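
The noise step on slide 11 amounts to adding independent Gaussian noise to every entry of an expression matrix and recomputing correlations. The sketch below assumes the matrix is a list of gene rows; the data and noise levels are hypothetical, and the full test would rebuild the network and compare P-values rather than a single correlation.

```python
import math
import random


def add_gaussian_noise(data, sigma, seed=None):
    """Return a copy of an expression matrix (list of gene rows) with
    independent N(0, sigma^2) noise added to every entry."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in data]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical, perfectly coexpressed genes; their correlation typically
# weakens as the noise level grows.
data = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]]
for sigma in (0.0, 0.5, 1.0, 2.0):
    noisy = add_gaussian_noise(data, sigma, seed=42)
    print(sigma, round(pearson(noisy[0], noisy[1]), 3))
```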

  12. Visualization • x-y plane – negative logarithm of P value • K-means clustering • z axis – density of genes in the region
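
Slide 12 names k-means clustering. A minimal Lloyd's-algorithm sketch on 2-D layout coordinates is shown below; treating each metagene as a point in the x-y plane is an assumption about how the clustering was applied, and the number of clusters `k` has to be chosen by the user.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the center to the mean of its members; keep it if the cluster is empty.
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Two hypothetical, well-separated groups of points in the layout plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```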

  13. Example – Component 5 • A total of 241 metagenes, 110 of which were previously known to be involved in the cell cycle. • The network contains 202 cell cycle metagenes in total. • P-value < 10^-85 • Of the 241 metagenes in the component: • 30 – regulating the cell cycle • 80 – terminal cell cycle functions • 131 – unknown
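
An enrichment P-value like the one on slide 13 is commonly computed with a hypergeometric tail test; that the slide used this exact test is an assumption. The sketch uses the slide's counts (241 metagenes in the component, 110 of them cell cycle, 202 cell cycle metagenes in the network). The network size is not given on the slide, so `NETWORK_SIZE` is a placeholder set to the 6307 total metagenes from slide 7, an upper bound; the printed value is illustrative only.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws), computed
    exactly with integer binomial coefficients."""
    hi = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, hi + 1))
    return tail / comb(population, draws)


# Placeholder: the network itself may contain fewer metagenes than this.
NETWORK_SIZE = 6307
p = hypergeom_sf(110, NETWORK_SIZE, 202, 241)
print(p)
```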

  14. Experimental validation (1) – expression data • Five metagenes with a significant number of links to known cell proliferation genes. • Their expression levels were measured in dividing pancreatic cancer cells and in nondividing normal cells.

  15. Experimental validation (2) – loss-of-function mutant • loss-of-function mutant phenotype for one of these genes (C. elegans gene ZK652.1) • RNA interference (RNAi) of ZK652.1 resulted in excess nuclei in the germ line, suggesting that the wild-type function of this gene is to suppress germline proliferation.

  16. Multi-species vs. single species (1) • For each gene in the five metagenes, we constructed an organism-specific neighborhood. • On average, the neighborhoods of these five genes were more than four times as enriched for cell proliferation and cell cycle genes in the multiple-species network as in the best single-species neighborhood.
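
The "four times more enriched" comparison on slide 16 can be read as a ratio of fold enrichments. The sketch assumes a simple definition (fraction of annotated genes in a neighborhood divided by a background fraction); the neighborhoods, gene names and the 5% background are hypothetical.

```python
def fold_enrichment(neighborhood, annotated, background_fraction):
    """Fraction of annotated genes in a neighborhood divided by the fraction
    expected from the background (e.g. the whole network)."""
    if not neighborhood:
        return 0.0
    observed = len(set(neighborhood) & annotated) / len(neighborhood)
    return observed / background_fraction


# Hypothetical neighborhoods of one gene, with 5% of all genes annotated.
annotated = {"g1", "g2", "g3", "g4", "g5", "g6"}
multi = ["g1", "g2", "g3", "g4", "x1", "x2", "x3", "x4"]   # 4 of 8 annotated
single = ["g5", "y1", "y2", "y3", "y4", "y5", "y6", "y7"]  # 1 of 8 annotated
ratio = fold_enrichment(multi, annotated, 0.05) / fold_enrichment(single, annotated, 0.05)
print(ratio)
```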

  17. Multi-species vs. single species (2) • The networks were compared on how well they • link together genes that were previously known to be involved in a single function (coverage) • while excluding genes not known to participate in that function (accuracy)
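
Coverage and accuracy on slide 17 are described only informally. The sketch below uses hypothetical gene-level definitions: coverage is the fraction of annotated genes linked to at least one other annotated gene, and accuracy is the fraction of links touching an annotated gene whose other endpoint is annotated too. The exact definitions in the original study may differ.

```python
def coverage_and_accuracy(links, function_genes):
    """links: iterable of (a, b) gene pairs; function_genes: set of genes
    annotated with one function. Returns (coverage, accuracy)."""
    linked = set()
    touching = within = 0
    for a, b in links:
        in_a, in_b = a in function_genes, b in function_genes
        if in_a or in_b:
            touching += 1          # link involves the function
            if in_a and in_b:
                within += 1        # link stays inside the function
                linked.update((a, b))
    coverage = len(linked) / len(function_genes) if function_genes else 0.0
    accuracy = within / touching if touching else 0.0
    return coverage, accuracy


# Hypothetical network: A-D share a function; X, Y, Z do not.
links = [("A", "B"), ("B", "C"), ("C", "X"), ("Y", "Z")]
cov, acc = coverage_and_accuracy(links, {"A", "B", "C", "D"})
print(cov, acc)
```

A network that links more annotated genes raises coverage; one that avoids spurious links to unannotated genes raises accuracy.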

  18. Huge data • The multiple-species network was built from more DNA microarray data (3182 arrays) than any single-species network. • A network constructed from only 979 DNA microarrays (as in the worm data set) gave similar results.

  19. Summary - Multi is good • We map only genes that have orthologs in other species, so the network focuses strongly on core, conserved biological processes. • Interactions in the multiple-species network imply a functional relationship based on evolutionary conservation. • Nice to have: analysis of the other components.

  20. Goal • Comparative study of large datasets of expression profiles from six evolutionarily distant organisms:

  21. Goal • Coexpression is often conserved. • Comparing the regulatory relationships between particular functional groups in the different organisms. • Comparing global topological properties of the transcription networks derived from the expression data, using a graph theoretical approach.

  22. Homologous gene with preserved function

  23. Coexpression conservation • Coexpressed groups: yeast transcription modules • For each yeast module, we constructed five “homologue modules”, one for each of the other organisms.

  24. Refining homologue modules • The signature algorithm identifies those homologues that are coexpressed under a subset of the experimental conditions. • Furthermore, it reveals additional genes that are not homologous to any of the original genes but display a similar expression pattern under those conditions.
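
The refinement on slide 24 can be illustrated with a simplified signature step: score conditions by the input genes, keep the strongly scoring conditions, then score every gene over those conditions. This is a sketch, not the published algorithm; the normalization, the thresholds `t_cond` and `t_gene`, and the toy matrix are assumptions. Repeating the step until the gene set stops changing gives the idea behind the iterative signature algorithm (ISA) named on slide 29.

```python
import statistics


def zscores(values):
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def signature(expr, input_genes, t_cond=1.0, t_gene=1.0):
    """One simplified signature step on a list of (assumed normalized) gene rows.
    Returns (module genes, kept conditions)."""
    n_cond = len(expr[0])
    # 1. Score each condition by the mean expression of the input genes.
    cond_scores = [statistics.fmean(expr[g][j] for g in input_genes) for j in range(n_cond)]
    cz = zscores(cond_scores)
    kept = [j for j in range(n_cond) if abs(cz[j]) > t_cond]
    if not kept:
        return set(), []
    # 2. Score every gene over the kept conditions, weighted by condition score.
    gene_scores = [sum(cond_scores[j] * row[j] for j in kept) / len(kept) for row in expr]
    gz = zscores(gene_scores)
    return {i for i, z in enumerate(gz) if z > t_gene}, kept


def iterate_signature(expr, seed_genes, max_iter=20, **thresholds):
    """Re-apply the signature step until the gene set stops changing."""
    genes = set(seed_genes)
    for _ in range(max_iter):
        new_genes, _ = signature(expr, genes, **thresholds)
        if new_genes == genes or not new_genes:
            break
        genes = new_genes
    return genes


# Toy data: genes 0 and 1 are "homologues" used as input; gene 2 is not a
# homologue but shares their pattern in conditions 0 and 1.
example_expr = [
    [3, 3, 0, 0, 0, 0], [3, 3, 0, 0, 0, 0], [3, 3, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1], [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1], [0, 0, 1, 0, 0, 1],
]
module, conditions = signature(example_expr, {0, 1})
print(module, conditions)
```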

  25. Correlation distribution • Distribution of the Z-scores for the average gene–gene correlation of all the “homologue modules”
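
A Z-score for a module's average gene-gene correlation, as plotted on slide 25, needs a null distribution. The sketch assumes the null is random gene sets of the same size drawn from the same data; the slide does not state which null was used, and the example data are synthetic.

```python
import itertools
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(expr, genes):
    """Average correlation over all unordered pairs of the given gene rows."""
    return statistics.fmean(pearson(expr[a], expr[b])
                            for a, b in itertools.combinations(genes, 2))


def module_zscore(expr, genes, n_random=200, seed=0):
    """Z-score of a module's average correlation against random gene sets
    of the same size (an assumed null model)."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(expr, genes)
    null = [mean_pairwise_correlation(expr, rng.sample(range(len(expr)), len(genes)))
            for _ in range(n_random)]
    return (observed - statistics.fmean(null)) / statistics.stdev(null)


# Synthetic data: genes 0-4 share one profile plus small noise,
# genes 5-19 are independent random profiles.
rng = random.Random(1)
profile = [rng.gauss(0.0, 1.0) for _ in range(10)]
example_expr = [[v + rng.gauss(0.0, 0.1) for v in profile] for _ in range(5)]
example_expr += [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(15)]
example_z = module_zscore(example_expr, [0, 1, 2, 3, 4])
print(example_z)
```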

  26. Higher-order regulatory structures

  27. Cell Cycle Experiments

  28. Subsets of the data • Correlations between the sets of conditions for randomly selected subsets of the data. • Although the data are sparse, the findings reflect real properties of the expression network.

  29. Decomposition of the expression data • Decomposition of the expression data into a set of transcription modules using the iterative signature algorithm (ISA) • Modules are colored according to the fraction of homologues they possess in the other organism (figure label: protein synthesis)

  30. Power-law connectivity distribution
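
A power-law connectivity distribution P(k) ~ k^-γ appears as a straight line on a log-log plot. The sketch computes P(k) from an edge list and fits the log-log slope by least squares; this is a rough estimator chosen for brevity (maximum-likelihood fits are usually preferred), and the example graphs are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """P(k): fraction of nodes with each connectivity k in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(dist):
    """Least-squares slope of log P(k) against log k; for P(k) ~ k^-gamma
    the slope estimates -gamma."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


star = [(0, 1), (0, 2), (0, 3)]
print(degree_distribution(star))
print(loglog_slope({k: k ** -2.0 for k in range(1, 11)}))
```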

  31. Connections & Connectivity • Connections between genes of similar connectivity are enhanced (red regions) • Connections between highly and weakly connected genes are suppressed (blue)

  32. Essentiality & Connectivity • The likelihood that a gene is essential increases with its connectivity.

  33. Homology & Connectivity • The highly connected genes are more likely to have homologues in the other organisms
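
Slides 32 and 33 both relate a gene property (essentiality, having homologues) to connectivity. A simple way to show such a trend is the fraction of flagged genes per connectivity bin; the bins, gene names and flags below are hypothetical.

```python
def fraction_by_connectivity(degree, flagged, bins):
    """For each half-open connectivity bin [lo, hi), the fraction of genes in
    the bin that carry a property (e.g. essential, or has homologues)."""
    result = []
    for lo, hi in bins:
        genes = [g for g, k in degree.items() if lo <= k < hi]
        frac = sum(g in flagged for g in genes) / len(genes) if genes else float("nan")
        result.append(((lo, hi), frac))
    return result


degree = {"a": 1, "b": 2, "c": 5, "d": 6}
essential = {"a", "c", "d"}
print(fraction_by_connectivity(degree, essential, [(0, 3), (3, 10)]))
```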

  34. Summary • Similarity at low resolution, differences at high resolution: • All expression networks share common topological properties (scale-free connectivity distribution, high degree of modularity). • The modular components of each transcription program as well as their higher-order organization appear to vary significantly between organisms and are likely to reflect organism-specific requirements.

  35. Future • Gene expression studies • Evolution studies

  36. Thank you …
