
Rob Edwards phage.sdsu/~rob Fellowship for Interpretation of Genomes,

SIO, San Diego, May 2006. What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics. Rob Edwards http://phage.sdsu.edu/~rob Fellowship for Interpretation of Genomes, San Diego State University, Burnham Institute for Medical Research,


Presentation Transcript


  1. SIO, San Diego, May 2006 What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics Rob Edwards http://phage.sdsu.edu/~rob Fellowship for Interpretation of Genomes, San Diego State University, Burnham Institute for Medical Research, IMEC, LLC

  2. Outline • Sequencing statistics scare skeptics • The SEED database • Some simply stunning Subsystems • Mysterious missing methionine metabolism • Marine metabolism mined from metagenomics • Fabulous four-five-four for facile functional findings • Marine phage most puzzling

  3. The Players • FIG: Fellowship for Interpretation of Genomes • NMPDR: Natl. Microbial Pathogen Data Resource • BRC: NIH Bioinformatics Resource Centers • SEED: The SEED database.

  4. How Many Genomes Have Been Sequenced?

  5. How Many Genomes Have Been Sequenced?

  6. How Many Genomes Have Been Sequenced?

  7. How Many Genomes Have Been Sequenced?

  8. [Figure: number of complete genomes (axis 0–5,000) by year, 1996–2008] When will the 1,000th microbial genome be sequenced?

  9. Outline • Sequencing statistics scare skeptics • The SEED database • Some simply stunning Subsystems • Mysterious missing methionine metabolism • Marine metabolism mined from metagenomics • Fabulous four-five-four for facile functional findings • Marine phage most puzzling

  10. The SEED database developed by FIG http://theseed.uchicago.edu/FIG/index.cgi • Current version: • 580 Bacteria (342 complete) • 38 Archaea (26 complete) • 562 Eukarya (29 complete) • 1335 Viruses • 2 Environmental Genomes

  11. The problem: How do you generate consistent annotations for 1,000 genomes?

  12. Basic biology lacI lacZ lacY lacA

  13. Different types of clustering [Figure: gene clusters compared at < 80% identity]

  14. Occurrence of clustering in different genomes [Figure: fraction of genes in clusters (left axis, 0–1) and number of genomes (right axis, 0–120) for the average and for each phylum: Aquificae, Firmicutes, Chloroflexi, Chlamydiae, Deinococcus-Thermus, Spirochaetes, Thermotogae, Bacteroidetes, Cyanobacteria, Actinobacteria, Proteobacteria. Series: clusters of genes with maximum 80% identity; genes in subsystems in clusters; total number of genomes in group.]
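The clustering criterion in slides 13 and 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not FIG's algorithm: genes carry a hypothetical homolog `family` label, "close" means neighbouring on the same contig and strand with a gap of at most a hypothetical `MAX_GAP` of 300 bp, and the 80% identity cap is approximated by one identity value per pair of genomes. A pair of families counts as clustered only if it is seen together in at least two genomes that are less than 80% identical, so near-identical strains cannot inflate the count.

```python
from itertools import combinations

# Assumption: maximum intergenic gap (bp) for two genes to count as "close".
MAX_GAP = 300


def close_pairs(genes, max_gap=MAX_GAP):
    """Return unordered family pairs whose genes are neighbours on one contig/strand.

    `genes` maps a gene id to (contig, start, end, strand, family); the
    `family` label (a homolog group) is a hypothetical input.
    """
    by_strand = {}
    for contig, start, end, strand, family in genes.values():
        by_strand.setdefault((contig, strand), []).append((start, end, family))
    pairs = set()
    for features in by_strand.values():
        features.sort()  # order genes by start coordinate
        for (_, end1, fam1), (start2, _, fam2) in zip(features, features[1:]):
            if start2 - end1 <= max_gap and fam1 != fam2:
                pairs.add(frozenset((fam1, fam2)))
    return pairs


def conserved_pairs(genomes, identity, max_identity=80.0):
    """Keep family pairs that are close in two genomes below `max_identity` % identity.

    `identity` maps frozenset({genome_a, genome_b}) to a percent identity;
    one value per genome pair is a simplifying assumption.
    """
    seen = {}
    for name, genes in genomes.items():
        for pair in close_pairs(genes):
            seen.setdefault(pair, set()).add(name)
    conserved = set()
    for pair, names in seen.items():
        for a, b in combinations(sorted(names), 2):
            if identity[frozenset((a, b))] < max_identity:
                conserved.add(pair)
                break
    return conserved


# Hypothetical toy genomes: X and Y are neighbours in both, Z only in gA.
genomes = {
    "gA": {"a1": ("c1", 100, 900, "+", "X"),
           "a2": ("c1", 1000, 1800, "+", "Y"),
           "a3": ("c1", 1900, 2500, "+", "Z")},
    "gB": {"b1": ("c1", 50, 800, "+", "X"),
           "b2": ("c1", 850, 1500, "+", "Y"),
           "b3": ("c2", 10, 500, "-", "Z")},
}
print(conserved_pairs(genomes, {frozenset(("gA", "gB")): 60.0}))
```

Raising the toy identity to 95% makes the same X–Y pair disappear, which is the point of the cap.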

  15. Outline • Sequencing statistics scare skeptics • The SEED database • Some simply stunning Subsystems • Mysterious missing methionine metabolism • Marine metabolism mined from metagenomics • Fabulous four-five-four for facile functional findings • Marine phage most puzzling

  16. The Subsystems Approach to Annotation • Subsystem is a generalization of “pathway” • collection of functional roles jointly involved in a biological process or complex • Functional Role is the abstract biological function of a gene product • atomic, or user-defined, examples: • 6-phosphofructokinase (EC 2.7.1.11) • LSU ribosomal protein L31p • Streptococcal virulence factors • Does not contain “putative”, “thermostable”, etc • Populated subsystem is complete spreadsheet of functions and roles

  17. Subsystems developed based on • Wet lab • Chromosomal context • Metabolic context • Phylogenetic context • Microarray data • Proteomics data • …

  18. Example Subsystem: Histidine Degradation • Conversion of histidine to glutamate • Functional roles defined in table • Inclusion in subsystem is only by functional role • Controlled vocabulary …

  19. Subsystem Spreadsheet
  Columns (functional roles): HutH, HutU, HutI, GluF, HutG, NfoD, ForI; then Organism and Variant. Genes per organism as listed on the slide:
  Bacteroides thetaiotaomicron: Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0 (variant 1)
  Desulfotalea psychrophila: gi51246205, gi51246204, gi51246203, gi51246202 (variant 1)
  Halobacterium sp.: Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7 (variant 2)
  Deinococcus radiodurans: Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04 (variant 2)
  Bacillus subtilis: P10944, P25503, P42084, P42068 (variant 2)
  Caulobacter crescentus: P58082, Q9A9M1, P58079, Q9A9M0, Q9A9L9 (variant 3)
  Pseudomonas putida: Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3 (variant 3)
  Xanthomonas campestris: Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5 (variant 3)
  Listeria monocytogenes: no genes listed (variant -1)
  • Column headers taken from table of functional roles • Rows are selected genomes or organisms • Cells are populated with specific, annotated genes • Functional variants defined by the annotated roles • Variant code -1 indicates subsystem is not functional • Clustering shown by color
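A variant code can be read as a rule over which roles a genome has. The sketch below assumes, for illustration only, that every variant is the universal HutH/HutU/HutI subset plus one alternative route to glutamate. The grouping of GluF, HutG, NfoD and ForI into variants 1 to 3 is a hypothetical reading, not taken from the slide, and the gene ids are placeholders.

```python
# Universal subset of roles (slide 21: "Universal subset has three roles").
UNIVERSAL = {"HutH", "HutU", "HutI"}

# Hypothetical mapping of variant codes to alternative routes; the real
# assignments come from the subsystem diagram, not from this sketch.
VARIANT_ROUTES = {
    1: {"GluF"},
    2: {"HutG"},
    3: {"NfoD", "ForI"},
}


def assign_variant(row):
    """Return a variant code for one spreadsheet row (role -> gene id or None).

    -1 means the subsystem is judged non-functional in that genome.
    """
    present = {role for role, gene in row.items() if gene}
    if not UNIVERSAL <= present:
        return -1
    # Routes are checked in numeric order, so the lowest matching code wins.
    for code in sorted(VARIANT_ROUTES):
        if VARIANT_ROUTES[code] <= present:
            return code
    return -1


row = {"HutH": "gene_1", "HutU": "gene_2", "HutI": "gene_3",
       "GluF": None, "HutG": "gene_4", "NfoD": None, "ForI": None}
print(assign_variant(row))  # 2
```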

  20. Subsystem Spreadsheet: “The Populated Subsystem” (same spreadsheet as slide 19)

  21. Subsystem Diagram • Three functional variants • Universal subset has three roles, followed by three alternative paths from IV to VI • No ForI known experimentally

  22. Subsystem Spreadsheet (same spreadsheet as slide 19) • Prediction from subsystems confirmed experimentally

  23. Outline • Sequencing statistics scare skeptics • The SEED database • Some simply stunning Subsystems • Mysterious missing methionine metabolism • Marine metabolism mined from metagenomics • Fabulous four-five-four for facile functional findings • Marine phage most puzzling

  24. How do bacteria make methionine? [Diagram: acquire homoserine; convert cysteine to cystathionine; convert cystathionine to homocysteine; convert homocysteine to methionine; alternatively, sulfur and acetylhomoserine sulfhydrylase; or acquire Met]

  25. ? Missing genes ?
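One way to frame the "missing genes" question is to list, for each route in the methionine diagram, which steps have no annotated gene in a genome that (by assumption) makes its own methionine rather than acquiring it. The route labels, the step strings, and picking the route with the fewest gaps are hypothetical choices for this sketch; the slide only poses the question.

```python
# Steps follow the slide's wording; grouping them into two routes is an
# assumption for illustration. "Acquire Met" (uptake) is deliberately left out.
ROUTES = {
    "via cystathionine": [
        "acquire homoserine",
        "convert cysteine to cystathionine",
        "convert cystathionine to homocysteine",
        "convert homocysteine to methionine",
    ],
    "via sulfhydrylase": [
        "acquire homoserine",
        "acetylhomoserine sulfhydrylase",
        "convert homocysteine to methionine",
    ],
}


def missing_steps(annotated):
    """Map each route to the steps that have no annotated gene in the genome."""
    return {route: [step for step in steps if step not in annotated]
            for route, steps in ROUTES.items()}


def best_candidate(annotated):
    """Return the route needing the fewest missing genes, and those gaps."""
    gaps = missing_steps(annotated)
    route = min(gaps, key=lambda r: len(gaps[r]))
    return route, gaps[route]


# Hypothetical genome: only the homoserine and final methionine steps annotated.
annotated = {"acquire homoserine", "convert homocysteine to methionine"}
print(best_candidate(annotated))
# ('via sulfhydrylase', ['acetylhomoserine sulfhydrylase'])
```

The returned gaps are candidates to search for, not proof that a gene is absent.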

  26. Cyanoseed: http://cyanoseed.theFIG.info

  27. Marineseed: http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine

  28. Subsystems are not just for gene clusters • genome context (virulence islands, prophages, conserved gene clusters) • virulence mechanism • enzymatic activity • cellular localization • predicted or measured co-regulation • common phenotype • combinations of criteria

  29. How much progress has been made? • 541 subsystems encoded • 80–85% of the genes in core machinery are contained in subsystems • 30–35% of genes in NMPDR organism genomes are contained in subsystems • 20–30% of genes in other genomes are contained in subsystems

  30. Outline • Sequencing statistics scare skeptics • The SEED database • Some simply stunning Subsystems • Mysterious missing methionine metabolism • Marine metabolism mined from metagenomics • Fabulous four-five-four for facile functional findings • Marine phage most puzzling

  31. Metagenomics • Sample (200 liters water or 5–500 g fresh fecal matter) • Concentrate and purify viruses • Epifluorescent microscopy • Extract nucleic acids (DNA/RNA) • LASL • Sequence • Breitbart et al., multiple papers

  32. Control datasets for metagenome comparisons [Figure: number of proteins in different datasets]

  33. Subsystems per million CDS
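Normalising to "per million CDS" lets datasets of very different sizes be compared on one scale. A minimal sketch, assuming the denominator is the total number of CDS in each dataset (the slide does not say whether CDS outside any subsystem are counted); the subsystem names in the example are placeholders.

```python
from collections import Counter


def per_million_cds(subsystem_hits, total_cds):
    """Normalise subsystem hit counts to hits per million CDS in the dataset.

    `subsystem_hits` holds one subsystem name per CDS assigned to a subsystem;
    `total_cds` counts all CDS in the dataset (assumption).
    """
    counts = Counter(subsystem_hits)
    return {name: n * 1_000_000 / total_cds for name, n in counts.items()}


print(per_million_cds(["Histidine Degradation", "Histidine Degradation",
                       "Iron acquisition"], 4_000))
# {'Histidine Degradation': 500.0, 'Iron acquisition': 250.0}
```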

  34. Determination of Statistical Differences Between Metagenomes • Take 10,000 proteins from sample 1 • Count frequency of each subsystem • Repeat 20,000 times • Repeat for sample 2 • Combine both samples • Sample 10,000 proteins 20,000 times • Build 95% CI • Compare medians from samples 1 and 2 with 95% CI Rodriguez-Brito (2006). BMC Bioinformatics
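The resampling steps on this slide can be sketched in Python. This is an illustrative reading, not the published implementation from Rodriguez-Brito (2006): it assumes sampling with replacement, treats each protein as labelled with exactly one subsystem, and calls a subsystem different when either sample's median count falls outside the 95% interval built from the pooled samples. The function names are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def bootstrap_counts(labels, sample_size, reps, rng):
    """Resample `sample_size` labels `reps` times; return per-subsystem count lists."""
    subsystems = set(labels)
    counts = {s: [] for s in subsystems}
    for _ in range(reps):
        drawn = Counter(rng.choices(labels, k=sample_size))  # with replacement
        for s in subsystems:
            counts[s].append(drawn[s])
    return counts


def compare_metagenomes(sample1, sample2, sample_size=10_000, reps=20_000, seed=0):
    """Flag subsystems whose median count in either sample lies outside the pooled 95% CI."""
    rng = random.Random(seed)
    dist1 = bootstrap_counts(sample1, sample_size, reps, rng)
    dist2 = bootstrap_counts(sample2, sample_size, reps, rng)
    pooled = bootstrap_counts(sample1 + sample2, sample_size, reps, rng)
    results = {}
    for s, values in pooled.items():
        values = sorted(values)
        low = values[int(0.025 * (reps - 1))]   # lower 2.5% bound
        high = values[int(0.975 * (reps - 1))]  # upper 97.5% bound
        m1 = median(dist1.get(s, [0]))  # subsystem absent from sample 1 -> 0
        m2 = median(dist2.get(s, [0]))
        differs = not (low <= m1 <= high and low <= m2 <= high)
        results[s] = {"median1": m1, "median2": m2,
                      "ci95": (low, high), "differs": differs}
    return results


# Toy run with hypothetical labels; the slide's 10,000 x 20,000 setting is
# slow in pure Python.
s1 = ["A"] * 90 + ["B"] * 10
s2 = ["A"] * 10 + ["B"] * 90
print(compare_metagenomes(s1, s2, sample_size=100, reps=200)["A"]["differs"])  # True
```

Pooling both samples builds the interval expected if the two metagenomes came from the same population, which is why a median outside it suggests a real difference.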

  35. Sampling Sargasso and “SEED” metagenomes

  36. Comparison of all Subsystems [Figure: subsystems more abundant in Sargasso vs. more abundant in SEED]

  37. Is serine being used as an osmolyte? • Few trehalose, proline, or sucrose biosynthesis genes • Serine is the most abundant amino acid in the ocean (Suttle, Keil) • Serine is a more effective osmoprotectant than glycine betaine (Yancey)

  38. Outline • Sequencing statistics scare skeptics • The SEED database • Some simply stunning Subsystems • Mysterious missing methionine metabolism • Marine metabolism mined from metagenomics • Fabulous four-five-four for facile functional findings • Marine phage most puzzling

  39. “So 2004”: 454 Metagenomics • Sample (200 liters water or 5–500 g fresh fecal matter) • Concentrate and purify viruses • Epifluorescent microscopy • Extract nucleic acids (DNA/RNA) • LASL • Sequence • Breitbart et al., multiple papers

  40. 454 Sequence Data (only from Rohwer Lab, in one year) • 42 libraries • 22 microbial, 20 phage • 1,028,563,420 bp total • 33% of the human genome • 95% of all complete and partial bacterial genomes • 10% of community sequencing of JGI per year • 9,933,184 sequences • Average 236,511 per library • Average read length 103.5 bp • Average read length has not increased in 12 months
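The derived figures on this slide follow from the two totals. A quick check, assuming a haploid human genome of about 3.1 Gbp (a value not given on the slide):

```python
TOTAL_BP = 1_028_563_420      # total bases sequenced (from the slide)
N_SEQUENCES = 9_933_184       # total reads (from the slide)
HUMAN_GENOME_BP = 3.1e9       # assumption: approximate haploid human genome size

mean_read_length = TOTAL_BP / N_SEQUENCES
human_fraction = TOTAL_BP / HUMAN_GENOME_BP

print(f"mean read length: {mean_read_length:.1f} bp")       # mean read length: 103.5 bp
print(f"fraction of a human genome: {human_fraction:.0%}")  # fraction of a human genome: 33%
```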

  41. The Soudan Mine, Minnesota [Photo labels: Red Stuff (oxidized); Black Stuff (reduced)]

  42. Red and Black Samples Are Different • Cloned and 454-sequenced 16S are indistinguishable [Figure labels: Black stuff; Red; Cloned]

  43. There are different amounts of metabolism in each environment

  44. There are different amounts of substrates in each environment [Figure: Red Stuff vs. Black Stuff]

  45. But are the differences significant? • Sample 10,000 proteins from site 1 • Count frequency of each “subsystem” • Repeat 20,000 times • Repeat for sample 2 • Combine both samples • Sample 10,000 proteins 20,000 times • Build 95% CI • Compare medians from sites 1 and 2 with 95% CI Rodriguez-Brito (2006). BMC Bioinformatics

  46. Subsystem differences & metabolism: Iron acquisition • Black Stuff: siderophore enterobactin biosynthesis; ferric enterobactin transport; ABC transporter ferrichrome; ABC transporter heme • Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈]) • Red stuff: ferric iron (goethite [FeO(OH)])

  47. Nitrification differentiates the samples Edwards (2006) BMC Genomics

  48. The challenge is explaining the differences between samples • Red sample: Arg, Trp, His; ubiquinone; FA oxidation; chemotaxis, flagella; methylglyoxal metabolism • Black sample: Ile, Leu, Val; siderophores; glycerolipids; NiFe hydrogenase; phenylpropionate degradation

  49. We can cheaply compare the important biochemistry happening in different environments. We don’t care which organisms are doing the metabolism, but we know which organisms are there.
