Comparative genomics

Presentation Transcript


  1. Comparative genomics Seminar series Fall 2006 Vera van Noort

  2. Announcements
     • Please ask questions!
     • Take an assignment sheet with you

  3. Contents
     • Genomics
     • Functional associations
       • Metabolic pathways
       • Transcription regulation
       • Signaling pathways
       • Protein complexes
       • Cellular processes
     • Comparative genomics
       • Comparing genomes
         • Gene Fusions
         • Gene Neighborhood conservation
         • Gene Presence/Absence
       • Comparing genomics data
         • Horizontal comparative genomics
           • Conserved Co-expression
           • Conserved Yeast-2-Hybrid
         • Vertical comparative genomics
           • Evidence from multiple datasources
           • Bayesian integration

  4. Sequencing of genes and genomes
     http://www.ncbi.nlm.nih.gov/genbank

  5. Genome sequence of E. coli
     ORIGIN
           1 agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc
          61 tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg
         121 tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac
         181 acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt
         241 aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg
         301 cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt
         361 acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc
         421 aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg
         481 gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa
         541 cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg
         601 caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt
         661 agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa
         721 atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc
         781 gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct
         841 gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca
         901 ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac
         961 tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac

  6. Complete genomes
     • What do we need them for?
     • What can we use them for?
     • How do genes make a complete cell?
     • Functions
     • Visualization

  7. For most genes in any genome we need function prediction
     • For many genes no function has been described
     • Even in a well-studied organism like E. coli only 43% have been characterized experimentally

  8. Protein function
     • Predicting protein function
     • Levels of description
     • Homology for determining molecular function
     • Other aspects of function?

  9. “Beyond” homology and molecular function
     Homology-based function prediction works very well, yet:
     • A large fraction of genes are poorly described (no homologs, or only uncharacterized homologs; this holds for ~60% of the human genes)
     • There are other aspects of function: functional associations, e.g. the target of a protein kinase or a transcriptional regulator; i.e. to understand the cell we need to know the interactions of the genes
     Thus: predicting associations

  10. There are many types of functional associations (AKA functional interactions, interactions, functional links, functional relations) in molecular biology
      [Figure: examples of functional associations: transcription regulation, signaling, protein complexes, cellular processes, metabolic pathways]

  11. Types of functional associations
      • Filling gaps in metabolic pathways

  12. Types of functional associations
      • Transcription regulation
      • Signaling pathways

  13. Types of functional associations
      • Protein complexes
      • Cellular process: “DNA repair”, “Apoptosis”

  14. Functionally associated proteins leave evolutionary traces of their relation in genomes

  15. Gene fusion
      • Two genes that are separate in one organism are fused into one polypeptide in another organism
      • A very reliable indicator of physical interaction

  16. How to detect gene fusions?
      • Compare predicted protein sequences with each other using homology searches
      • 1. Find orthologs, then match two complete orthologs to unmatched genes
           - requires an orthology definition
      • 2. Find two complete homologs matching your gene
           - requires more complicated rules
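
The second strategy above (two complete homologs matching one gene) can be sketched roughly as follows. This is only an illustration, not the procedure used in the seminar: the hit tuple layout, the function name `fusion_candidates`, and the thresholds `min_coverage` and `max_overlap` are all assumptions, and the hits are taken to come from any homology search that reports alignment coordinates.

```python
from collections import defaultdict
from itertools import combinations


def fusion_candidates(hits, min_coverage=0.8, max_overlap=10):
    """Return (composite, gene_a, gene_b) triples that look like gene fusions.

    hits: tuples (gene, gene_len, gene_start, gene_end,
                  composite, comp_start, comp_end), 1-based inclusive.
    Thresholds are illustrative assumptions, not published values.
    """
    # Keep only hits where (almost) the complete gene is aligned.
    complete = defaultdict(list)
    for gene, gene_len, g_start, g_end, composite, c_start, c_end in hits:
        coverage = (g_end - g_start + 1) / gene_len
        if coverage >= min_coverage:
            complete[composite].append((gene, c_start, c_end))

    candidates = []
    for composite, parts in complete.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(sorted(parts), 2):
            if a == b:
                continue
            # Two genes must match different regions of the composite protein.
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap <= max_overlap:
                candidates.append((composite, a, b))
    return sorted(candidates)


# Hypothetical hits: geneA and geneB each align completely to a different
# part of fusedAB; geneC aligns only partially and is ignored.
example_hits = [
    ("geneA", 250, 1, 250, "fusedAB", 5, 255),
    ("geneB", 200, 1, 195, "fusedAB", 260, 455),
    ("geneC", 300, 1, 100, "fusedAB", 5, 105),
]
print(fusion_candidates(example_hits))
```

Requiring near-complete coverage of both component genes is what separates a fusion from two proteins that merely share a domain with the composite.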

  17. Orthology
      • Common gene in last common ancestor
      • Problems
        • Duplication and loss
        • Horizontal gene transfer
      • Methods
        • Bidirectional Best Hit
        • Phylogenetic reconstructions
      [Figure: gene trees with speciation and duplication events relating genes A1, A2, B1 and B2 in species A and B; lost genes are marked with x]
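
The Bidirectional Best Hit method listed above can be sketched in a few lines. The score dictionaries and the names `best_hits` and `bidirectional_best_hits` are assumptions for illustration; in practice the scores would come from an all-against-all homology search between two genomes.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict {(query, subject): similarity_score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 180, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a3"): 90, ("b2", "a2"): 180, ("b2", "a1"): 150}
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy input a3 finds b1 as its best hit, but b1 prefers a1, so a3 gets no ortholog. This is the kind of case where duplication and loss can mislead the method.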

  18. Gene order evolves rapidly
      • Positional mapping of orthologs between two species
      • Conservation of gene neighbors
      • Conserved operons (transcriptional units of more than one gene)
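
As a rough sketch of how conserved gene neighbors could be found once orthologs are mapped, the snippet below compares adjacent gene pairs in two genomes. It assumes each genome is given as an ordered list of orthologous-group identifiers (hypothetical names such as `og1`), and it ignores strand, circular chromosomes and intervening genes, all of which a real analysis would need to handle.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered pairs of neighboring genes."""
    return {
        frozenset((left, right))
        for left, right in zip(gene_order, gene_order[1:])
        if left != right
    }


def conserved_neighbors(order_a, order_b):
    """Return gene pairs that are direct neighbors in both genomes."""
    shared = adjacent_pairs(order_a) & adjacent_pairs(order_b)
    return sorted(tuple(sorted(pair)) for pair in shared)


# Hypothetical gene orders, written as orthologous-group identifiers.
genome_a = ["og1", "og2", "og3", "og4", "og5"]
genome_b = ["og3", "og2", "og7", "og5", "og4"]
print(conserved_neighbors(genome_a, genome_b))
```

Because gene order evolves rapidly, neighbor pairs that survive in distantly related genomes are the informative ones.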

  19. Conserved gene neighborhood

  20. Comparison to associations in pathways: conservation implies a functional association

  21. Presence / absence of genes
      • L. innocua (non-pathogen)
      • L. monocytogenes (pathogen)

  22. Presence / absence of genes
      • Differences in gene content: differences in metabolic capacities?
      • Shared genes: shared metabolic capacities?
      • It does not make sense to have just one member of a pathway.

  23. Presence / absence of genes
      • Pathogenicity genes?
      • L. innocua (non-pathogen)
      • L. monocytogenes (pathogen)
      • Maybe not significant for one comparison, but maybe significant when generalized over many genomes.

  24. Phylogenetic profiles
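
A phylogenetic profile records, for each gene, in which genomes an ortholog is present (1) or absent (0). A minimal sketch of comparing such profiles is given below; the Hamming distance and the cutoff `max_distance` are assumptions chosen for simplicity, and the gene names and profiles are made up.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Count the genomes in which two presence/absence profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def similar_profiles(profiles, max_distance=1):
    """Return (gene1, gene2, distance) for genes that co-occur across genomes."""
    pairs = []
    for (gene1, p1), (gene2, p2) in combinations(sorted(profiles.items()), 2):
        distance = hamming(p1, p2)
        if distance <= max_distance:
            pairs.append((gene1, gene2, distance))
    return pairs


# Hypothetical profiles over six genomes (1 = ortholog present).
profiles = {
    "geneA": (1, 0, 1, 1, 0, 1),
    "geneB": (1, 0, 1, 1, 0, 1),
    "geneC": (0, 1, 0, 0, 1, 0),
    "geneD": (1, 0, 1, 0, 0, 1),
}
print(similar_profiles(profiles))
```

Genes present in every genome (or in none) match each other trivially, so in practice such uninformative profiles are usually filtered out or down-weighted.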

  25. Context methods for prediction of functional associations
      [Figure: fraction of gene pairs on the same KEGG map (Si) plotted against score, both axes from 0 to 1, for three methods: Fusion, Gene Order, Co-occurrence]
      • Benchmark with KEGG metabolic pathways
      • Integration into one score
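
The two steps on this slide, benchmarking against a reference and integrating several scores, can be illustrated as below. The integration formula 1 - Π(1 - S_i), which treats the individual scores as independent probabilities, is one common choice and is an assumption here; the slide does not state which formula was used. The pair identifiers and reference set are made up.

```python
from math import prod


def combine_scores(scores):
    """Combine per-method scores, assuming they are independent probabilities."""
    return 1.0 - prod(1.0 - s for s in scores)


def fraction_same_pathway(predictions, reference, threshold):
    """Fraction of predicted pairs scoring >= threshold that are in the reference.

    predictions: dict {frozenset({gene1, gene2}): score}
    reference:   set of frozensets, e.g. gene pairs sharing a pathway map
    Returns None when no prediction passes the threshold.
    """
    selected = [pair for pair, score in predictions.items() if score >= threshold]
    if not selected:
        return None
    confirmed = sum(1 for pair in selected if pair in reference)
    return confirmed / len(selected)


# Hypothetical predictions and reference.
predictions = {
    frozenset({"g1", "g2"}): 0.9,
    frozenset({"g1", "g3"}): 0.7,
    frozenset({"g2", "g4"}): 0.2,
}
reference = {frozenset({"g1", "g2"})}

print(combine_scores([0.6, 0.5]))
print(fraction_same_pathway(predictions, reference, threshold=0.5))
```

Evaluating `fraction_same_pathway` at a series of thresholds gives a calibration curve of the kind shown in the figure.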

  26. How can you use this?
      • STRING Database: http://string.embl.de//
        • No skills needed
      • Parse data yourself
        • Some programming skills needed
        • Orthologs
          • COG database: http://www.ncbi.nlm.nih.gov/COG/
        • Genomes
          • Genbank: http://www.ncbi.nlm.nih.gov/Genbank
          • Genome Atlas Database: http://www.cbs.dtu.dk/services

  27. Correlated expression
      • No operons in eukaryotes
      • Regulation by the same transcription factors
      • Similarity in expression patterns in high-throughput (HTP) expression data
      • High correlation between vectors
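
"High correlation between vectors" can be made concrete with the Pearson correlation between two expression profiles. The sketch below is written in plain Python; the expression values are invented, and the 0.6 cutoff is borrowed from the table later in the deck rather than being a recommended default.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat flat profiles as uncorrelated
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(expression, threshold=0.6):
    """Return (gene1, gene2, r) for gene pairs with correlation above threshold."""
    pairs = []
    for (g1, v1), (g2, v2) in combinations(sorted(expression.items()), 2):
        r = pearson(v1, v2)
        if r > threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression levels over four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
for g1, g2, r in coexpressed_pairs(expression):
    print(g1, g2, round(r, 3))
```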

  28. Correlated mRNA expression
      • Genes with correlated RNA expression often function in the same pathway.
      • Not reliable enough for function prediction.

  29. Orthology prediction

  30. Low but significant levels of conservation of co-expression (see Teichmann et al, TIBS 2002; Stuart et al., Science 2003)
      Columns: total # of pairs | # of pairs > 0.6 | observed fraction > 0.6 | expected fraction > 0.6 | observed/expected
      Gene-pairs with an orthologous gene-pair > 0.6
        Worm   | 18161  | 803   | 0.0442* | 0.00379 | 12
        Yeast  | 36548  | 1215  | 0.0332* | 0.00216 | 15
      Gene-pairs with a paralogous gene-pair > 0.6
        Worm   | 207214 | 29031 | 0.1401* | 0.00379 | 37
        Yeast  | 38253  | 2167  | 0.0566* | 0.00216 | 26
      van Noort et al, TIG, 2003
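
The observed fraction and the observed/expected ratio in this table follow directly from the counts. The short check below recomputes them from the numbers on the slide; the function name `enrichment` is only a label chosen here.

```python
def enrichment(total_pairs, pairs_above, expected_fraction):
    """Return (observed fraction, observed/expected ratio)."""
    observed = pairs_above / total_pairs
    return observed, observed / expected_fraction


# (total # of pairs, # of pairs > 0.6, expected fraction > 0.6), from the slide.
rows = {
    ("orthologous", "Worm"): (18161, 803, 0.00379),
    ("orthologous", "Yeast"): (36548, 1215, 0.00216),
    ("paralogous", "Worm"): (207214, 29031, 0.00379),
    ("paralogous", "Yeast"): (38253, 2167, 0.00216),
}

for (kind, species), (total, above, expected) in rows.items():
    observed, ratio = enrichment(total, above, expected)
    print(kind, species, round(observed, 4), round(ratio))
```

For the worm orthologous pairs, for example, 803 / 18161 ≈ 0.0442, which is about 12 times the expected fraction of 0.00379.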

  31. Is the low level of conservation of co-expression between S. cerevisiae and C. elegans (< 5%) “real”, reflecting evolution and species-specific interactions, or are we just comparing noisy datasets?
      Species-specific (idiosyncratic) coregulation: “Efficient expression of the Saccharomyces cerevisiae glycolytic gene ADH1 is dependent upon a cis-acting regulatory element UASRPG found initially in genes encoding ribosomal proteins.” Tornow and Santangelo, Gene, 1990

  32. Conservation of co-expression

  33. High level of conservation of co-regulation after speciation
      76%

  34. Conservation between orthologous pairs or paralogous pairs increases the likelihood of functional interaction
      van Noort et al, Trends Genet 2003

  35. RNase L inhibitor
      • Phylogenetic distribution: all archaea + all eukaryotes
      • Orthologous groups with that distribution:

  36. Conserved co-expression:
      Domain composition:

  37. Combined homology and conserved co-expression
      • A role for the RNase L inhibitor in rRNA processing
      • Predictions go directly to experimental groups (Ger Pruijn)

  38. The Yeast-2-hybrid technique

  39. Conservation of physical interactions
      • The overlap between yeast and fly is the same size as the overlap between two different yeast datasets

  40. Conservation of protein-protein interaction between species
      Columns: dataset | comparison | protein interactions, both proteins in the other dataset | conserved interactions | fraction conserved interactions | average fraction conserved interactions
        Ito / Uetz  | Yeast vs. Yeast | 858 / 697 | 201 | 23.4% / 28.8% | 26.1%
        Ito / Giot  | Yeast vs. Fly   | 229 / 394 | 45  | 19.6% / 11.4% | 15.5%
        Uetz / Giot | Yeast vs. Fly   | 120 / 168 | 33  | 27.5% / 19.6% | 23.5%
      Physical interaction is reasonably well conserved between species (compared to the “conservation” within a species)
      Huynen et al, TIG, 2004
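
Each row of this table reports the conserved interactions as a fraction of each dataset in turn, followed by the average of the two fractions. The sketch below reproduces that arithmetic for the Ito / Uetz row; the function name is an assumption, and recomputed values for other rows may differ from the slide in the last digit because of rounding.

```python
def conserved_fractions(n_first, n_second, n_conserved):
    """Return the conserved fraction for each dataset and their average."""
    fraction_first = n_conserved / n_first
    fraction_second = n_conserved / n_second
    return fraction_first, fraction_second, (fraction_first + fraction_second) / 2


# Ito / Uetz (yeast vs. yeast): 858 and 697 comparable interactions, 201 shared.
first, second, average = conserved_fractions(858, 697, 201)
print(f"{first:.1%} / {second:.1%}, average {average:.1%}")
```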

  41. Conservation of protein-protein interaction measured by yeast-2-hybrid increases the likelihood of interaction
      Comparison of Giot (Fly) and Ito (Yeast), Uetz (Yeast) y-2-h interactions

  42. A “new”, conserved interaction: GTPase XAB1/CG3704 with hypothetical GTPase YOR262/CG10222
      • XAB1 interacts with the DNA repair protein XPA1, inferred to be required for XPA1’s import into the nucleus.
      • The fraction of hypothetical proteins in conserved Y2H interactions is relatively low
      Hypotheticals:
        In conserved interactions | 13    | 5%
        In complete genome        | ~1600 | 27%

  43. Two types of comparative genomics
      • Horizontal comparative genomics (HGT)
        • Comparing orthologs
        • Comparing genomics data between species
      • Vertical comparative genomics
        • Comparing genomics data within the same species
        • Integration of scores
        • Bayesian methods

  44. Performance of genomic context compared to high-throughput interaction data
      [Figure: accuracy (fraction of data confirmed by reference set) and coverage (fraction of reference set covered by data) for purified complexes (TAP), purified complexes (HMS-PCI), genomic context, mRNA co-expression, synthetic lethality, yeast two-hybrid, and combined evidence (two methods, three methods); raw data, filtered data and parameter choices are marked]
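
The two axes of this figure are defined on the slide as the fraction of data confirmed by the reference set (accuracy) and the fraction of the reference set covered by the data (coverage). A small sketch of both measures follows, with made-up interaction pairs; the "combined evidence" set is modeled here simply as the intersection of two datasets, which is an assumption.

```python
def accuracy_and_coverage(predicted, reference):
    """Return (accuracy, coverage) of a predicted set against a reference set."""
    confirmed = predicted & reference
    accuracy = len(confirmed) / len(predicted) if predicted else 0.0
    coverage = len(confirmed) / len(reference) if reference else 0.0
    return accuracy, coverage


# Hypothetical interaction pairs: p* are in the reference, x* are not.
reference_set = {"p1", "p2", "p3", "p4"}
two_hybrid = {"p1", "p2", "x1", "x2"}
coexpression = {"p1", "p3", "x3", "x4"}
both_methods = two_hybrid & coexpression

for name, data in [("two-hybrid", two_hybrid),
                   ("co-expression", coexpression),
                   ("both methods", both_methods)]:
    print(name, accuracy_and_coverage(data, reference_set))
```

In this toy example, requiring support from both methods raises accuracy while lowering coverage, which is the trade-off the figure is about.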

  45. Trusted co-regulated gene pairs have similar functions

  46. Conservation between different datasets: the Bayesian approach
      • Applying a threshold gives sets of ‘interactions’ of gene pairs
      • Interactions present in all datasets
      • Interactions present in specific combinations of datasets
      [Figure: overlap between two sets, Co-expressed mRNA 1 (Set 1) and Co-expressed mRNA 2 (Set 2); a gene pair can be - -, - +, + - or + + across the two sets]

  47. Conservation between different datasets: the Bayesian approach
      • Red circles, historical: + ? ? ?, + + ? ?, + + + ?, + + + +
      • Blue diamonds, consistent: + - - -, + + - -, + + + -, + + + +
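
One way to read the approach on these two slides is: for every combination of presence (+) and absence (-) across datasets, estimate how likely a gene pair with that pattern is to be a true functional pair, using a trusted reference. The sketch below does this by simple counting. It is an illustration under that assumption, not the exact method from the slides, and all pair names are invented.

```python
from collections import defaultdict


def likelihood_by_pattern(datasets, reference, candidate_pairs):
    """Estimate P(pair is in reference | presence pattern across datasets).

    datasets:        list of sets of gene pairs (one set per thresholded dataset)
    reference:       set of trusted gene pairs
    candidate_pairs: all gene pairs under consideration
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [in reference, total]
    for pair in candidate_pairs:
        pattern = "".join("+" if pair in data else "-" for data in datasets)
        counts[pattern][1] += 1
        if pair in reference:
            counts[pattern][0] += 1
    return {pattern: hits / total for pattern, (hits, total) in counts.items()}


# Hypothetical data: two thresholded co-expression sets and a reference.
set1 = {"p1", "p2", "p3", "p4"}
set2 = {"p1", "p2", "p5", "p6"}
trusted = {"p1", "p2", "p3", "p5"}
candidates = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]

print(likelihood_by_pattern([set1, set2], trusted, candidates))
```

Pairs seen in both sets ("++") get the highest estimate here, matching the idea that agreement between datasets increases the likelihood of a real association.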

  48. Function prediction in P. falciparum
      • PFI0895c
        • Homologous to subunit 5 of translation initiation factor 3 (eIF-3 epsilon), which interacts with the ribosome.
        • Correlated expression with ribosomal proteins L27, L21e and Sa.
        • Annotation of PFI0895c as eIF-3 epsilon most likely.
      • PFI0555c
        • Co-expressed with two proteins that are involved in protein degradation:
          • the aspartic proteinase and drug target PF14_0075 (plasmepsin IV), and
          • the ornithine aminotransferase MAL6P1.91
        • This suggests a role for PFI0555c in protein degradation.
        • Protein degradation is important for host-parasite interaction.

  49. Summary
      • Wealth of data to be explored
        • Genomes
        • Genomics data
      • Comparisons within and between species
        • Study of evolution
        • Prediction of gene functions
