Genome projects and model organisms
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Genome projects
• Completed genomes:
  • Eubacteria (inc. Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Synechocystis PCC6803)
  • Archaea (inc. Methanococcus jannaschii, Methanobacterium thermoautotrophicum)
  • Eukarya:
    • Saccharomyces cerevisiae
    • Caenorhabditis elegans
    • Homo sapiens
    • Arabidopsis thaliana
• Partially sequenced genomes, e.g. Drosophila melanogaster, Fugu rubripes, Oryza sativa
Relationships between model organisms
[Figure: relationships between model organisms. Taxa shown: H. sapiens, C. elegans, D. melanogaster, S. cerevisiae, A. thaliana, Methanococcus, Archaeoglobus, Synechocystis PCC6803, B. subtilis, M. genitalium, M. pneumoniae, B. burgdorferi, H. influenzae, E. coli]
Eubacterial genomes: Bacillus subtilis
• Genome 4,214,810 bp:
  • 4100+ protein sequences
  • Average gene 890 bp
  • Density 1 gene / 1028 bp
  • 89% of total genome is protein-coding
• Protein-coding genes:
  • 53% single copy
  • 47% in paralogous gene families:
    • Mostly involved in transport
    • Family members lie close together on the chromosome, i.e. have evolved through tandem duplication of single genes
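
The density figure on this slide can be checked from the genome size and gene count. The following is a minimal sketch that uses only the numbers quoted above; the function and variable names are illustrative. Because the slide gives the gene count as "4100+", the coding fraction computed here is a lower-bound estimate and comes out somewhat below the 89% quoted.

```python
# Figures quoted on the slide for Bacillus subtilis.
GENOME_BP = 4_214_810
GENE_COUNT = 4_100        # slide says "4100+", so this is a lower bound
AVERAGE_GENE_BP = 890


def bp_per_gene(genome_bp: int, gene_count: int) -> float:
    """Average amount of chromosome per gene, in bp."""
    return genome_bp / gene_count


def coding_fraction(genome_bp: int, gene_count: int, average_gene_bp: int) -> float:
    """Fraction of the genome covered by genes of the given average length."""
    return gene_count * average_gene_bp / genome_bp


density = bp_per_gene(GENOME_BP, GENE_COUNT)
fraction = coding_fraction(GENOME_BP, GENE_COUNT, AVERAGE_GENE_BP)

print(f"1 gene / {density:.0f} bp")
print(f"estimated coding fraction (lower bound): {fraction:.1%}")
```

With 4100 genes the density reproduces the slide's 1 gene / 1028 bp.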
Eubacterial genomes: Bacillus subtilis
• On the basis of homology with genes of known function, 58% of B. subtilis genes could be assigned to functional categories
• The B. subtilis genome contains remnants of 10 prophages, suggesting that horizontal transfer has played a significant role in the evolution of the genome
• Orthologous counterparts in other bacteria:
  • ~1000 genes (24%) have counterparts in E. coli (Gram -ve)
    • More significantly, ~100 operons are conserved as well
  • ~800 genes (20%) have orthologues in Synechocystis PCC6803 (cyanobacterium)
Eubacterial genomes: Mycoplasmas
• Obligate parasites
• Thought to be derived from Gram +ve bacteria similar to B. subtilis
  • 312 genes of M. genitalium (66%) have homologues in Gram +ve bacteria
• Parasitic lifestyle has led to a dramatic reduction in genome size and content
• Smallest-known genome in a self-replicating organism
Eubacterial genomes: Mycoplasmas
• M. genitalium genome:
  • Circular chromosome of 580,070 bp
  • Only 470 predicted genes, covering DNA replication, transcription and translation, DNA repair, cellular transport and energy metabolism
  • Coding regions comprise ~88% of the genome:
    • Similar to H. influenzae (85%)
    • Suggests that genome reduction has been due to loss of genes and not reduction in gene size or increase in gene density
• M. pneumoniae genome:
  • Larger than M. genitalium (816 kbp)
  • All M. genitalium genes found in M. pneumoniae
  • M. genitalium is not simply a truncated M. pneumoniae: there is evidence of genome rearrangements
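
The gene-loss argument above can be illustrated with a little arithmetic. This is a rough sketch using only figures quoted in these slides; the variable names are illustrative, and the ~88% coding fraction is approximate, so the results are estimates. If the M. genitalium genome had shrunk by shortening or packing its genes, the average gene length and the spacing per gene should be well below those of a free-living bacterium such as B. subtilis (890 bp average gene, 1 gene / 1028 bp).

```python
# Figures quoted on the slides for Mycoplasma genitalium.
GENOME_BP = 580_070
GENE_COUNT = 470
CODING_FRACTION = 0.88    # "~88%" on the slide, so approximate

# Figures quoted earlier for Bacillus subtilis, used as the comparison.
B_SUBTILIS_AVERAGE_GENE_BP = 890
B_SUBTILIS_BP_PER_GENE = 1028

# Chromosome per gene, and estimated average length of a coding region.
mg_bp_per_gene = GENOME_BP / GENE_COUNT
mg_average_gene_bp = CODING_FRACTION * GENOME_BP / GENE_COUNT

print(f"M. genitalium: 1 gene / {mg_bp_per_gene:.0f} bp, "
      f"average coding region ~{mg_average_gene_bp:.0f} bp")
print(f"B. subtilis:   1 gene / {B_SUBTILIS_BP_PER_GENE} bp, "
      f"average gene {B_SUBTILIS_AVERAGE_GENE_BP} bp")
```

On these numbers the M. genitalium genes are neither shorter nor more tightly packed than those of B. subtilis, which is consistent with reduction by loss of whole genes.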
Eubacterial genomes: E. coli
• 4288 protein-coding genes:
  • Average ORF 317 amino acids
  • Very compact: average distance between genes 118 bp
• Numerous paralogous gene families: 38–45% of genes have arisen through duplication
• Homologues:
  • H. influenzae (1130 of 1703)
  • Synechocystis (675 of 3168)
  • M. jannaschii (231 of 1738)
  • S. cerevisiae (254 of 5885)
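
The homologue counts above are easier to compare as proportions. The sketch below assumes that "x of y" means x of the other organism's y genes have a homologue in E. coli; that reading is an interpretation of the slide, and the dictionary name is illustrative.

```python
# (genes with an E. coli homologue, total genes), as quoted on the slide.
# Assumption: the second number is the gene count of the organism named.
HOMOLOGUE_COUNTS = {
    "H. influenzae": (1130, 1703),
    "Synechocystis": (675, 3168),
    "M. jannaschii": (231, 1738),
    "S. cerevisiae": (254, 5885),
}

# Proportion of each organism's genes that have an E. coli homologue.
fractions = {
    organism: shared / total
    for organism, (shared, total) in HOMOLOGUE_COUNTS.items()
}

# Print from most to least similar gene complement.
for organism, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{organism:<15} {fraction:6.1%}")
```

The proportion falls steadily from the closely related eubacterium (about two thirds) through the cyanobacterium and the archaeon to yeast (under 5%).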
The minimum genome and redundancy
• Minimum set of genes required for survival:
  • Replication and transcription
  • Translation (rRNA, ribosomal proteins, tRNAs etc.)
  • Transport proteins to derive nutrients
  • ATP synthesis
• Entire pathways eliminated in Mycoplasma:
  • Amino acid biosynthesis (1 gene vs. 68 in H. influenzae)
  • Metabolism (44 genes vs. 228 in H. influenzae)
• Comparison of M. genitalium and H. influenzae has identified a minimum set of 256 genes
Archaeal genomes: M. jannaschii
• Requires no organic nutrients for growth: has all biochemical pathways to use inorganic constituents
• Only 38% of genes could be assigned a known function
• Genes for translation, transcription and DNA replication similar to eukaryote genes:
  • DNA polymerase
  • Ribosomal proteins
  • Translation initiation factors
Fungal genomes: S. cerevisiae
• First completely sequenced eukaryote genome
• Very compact genome:
  • Short intergenic regions
  • Scarcity of introns
  • Lack of repetitive sequences
• Strong evidence of duplication:
  • Chromosome segments
  • Single genes
• Redundancy: non-essential genes provide selective advantage
Plant genomes: Arabidopsis thaliana
• Contains 25,498 genes from 11,000 families
• Cross-phylum matches:
  • Vertebrates 12%
  • Bacteria / Archaea 10%
  • Fungi 8%
• 60% of ESTs have no match in non-plant databases
• Evolution involved whole-genome duplication followed by gene loss and extensive local gene duplications
Invertebrate genomes: C. elegans
• Genome less compact than yeast:
  • One gene every 7143 bp (2155 bp in yeast)
  • Due mainly to introns in protein-coding genes
• Much more compact than humans (one gene every 50,000 bp)
• Compactness due mainly to polycistronic arrangement:
  • Trans-splicing
  • Co-expression and co-regulation
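
The spacings quoted above can be turned into genes per megabase to make the comparison more direct. This is a small sketch using only the three figures on the slide; the names are illustrative.

```python
# Average spacing between genes, in bp per gene, as quoted on the slide.
BP_PER_GENE = {
    "S. cerevisiae": 2_155,
    "C. elegans": 7_143,
    "H. sapiens": 50_000,
}

# Convert spacing into gene density: genes per megabase (1 Mb = 1,000,000 bp).
genes_per_mb = {
    organism: 1_000_000 / spacing for organism, spacing in BP_PER_GENE.items()
}

for organism, density in genes_per_mb.items():
    print(f"{organism:<14} ~{density:.0f} genes per Mb")

# How many times sparser each genome is than the one before it.
worm_vs_yeast = BP_PER_GENE["C. elegans"] / BP_PER_GENE["S. cerevisiae"]
human_vs_worm = BP_PER_GENE["H. sapiens"] / BP_PER_GENE["C. elegans"]
print(f"C. elegans is ~{worm_vs_yeast:.1f}x sparser than yeast; "
      f"human is ~{human_vs_worm:.1f}x sparser than C. elegans")
```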
Vertebrate genomes: Fugu rubripes
• Pufferfish genome (400 Mb) is only four times larger than that of C. elegans and 7.5 times smaller than the human genome
• Homologous genes in Fugu and mammals show conserved synteny:
  • Same exon-intron organisation
  • Introns much smaller
• Useful for identifying conserved essential elements in vertebrate genomes
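
The size comparison above implies approximate genome sizes for the other two organisms. The sketch below derives them, assuming that "four times larger" means four times the size of the C. elegans genome and "7.5 times smaller" means the human genome is 7.5 times the size of the Fugu genome; the variable names are illustrative.

```python
# Genome size quoted on the slide, in megabases.
FUGU_MB = 400

# Ratios quoted on the slide (see the assumptions in the text above).
FUGU_TO_C_ELEGANS = 4.0
HUMAN_TO_FUGU = 7.5

# Genome sizes implied by those ratios.
c_elegans_mb = FUGU_MB / FUGU_TO_C_ELEGANS
human_mb = FUGU_MB * HUMAN_TO_FUGU

print(f"C. elegans: ~{c_elegans_mb:.0f} Mb")
print(f"Fugu:        {FUGU_MB} Mb")
print(f"Human:      ~{human_mb:.0f} Mb")
```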
The genome of the cenancestor
• Availability of complete genome sequences from the three domains of life creates an opportunity for the reconstruction of the complete genome of the common ancestor:
  • Of the minimal bacterial set (256 genes), 143 have orthologues in yeast (eukaryote)
  • Universal translation apparatus suggests that the cenancestor had a fully developed translation system
  • Extreme differences in DNA replication apparatus
• Many fundamental metabolic processes are carried out by similar proteins in Archaea and eubacteria:
  • Suggests a universal, autotrophic ancestor
  • Not all central metabolism is universal (methanogenesis, photosynthesis etc.)