
Rob Edwards phage.sdsu/~rob San Diego State University

SGM Meeting, Warwick, April 2006. Challenges for metagenomic data analysis and lessons from viral metagenomes [What would you do if sequencing were free?]. Rob Edwards http://phage.sdsu.edu/~rob San Diego State University Fellowship for Interpretation of Genomes. Outline.


Presentation Transcript


  1. SGM Meeting, Warwick, April 2006 • Challenges for metagenomic data analysis and lessons from viral metagenomes [What would you do if sequencing were free?] • Rob Edwards • http://phage.sdsu.edu/~rob • San Diego State University • Fellowship for Interpretation of Genomes

  2. Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?

  3. This is all 454 sequence data • 21 libraries • 10 microbial, 11 phage • 597,340,328 bp total • 20% of the human genome • 50% of all complete and partial microbial genomes • 5,769,035 sequences • Average 274,716 per library • Average read length 103.5 bp • Av. read length has not increased in 7 months • Cost 0.04¢ per bp
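The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide, plus one assumption that is not in the talk: a human genome size of roughly 3.0 billion bp. The variable names are illustrative.

```python
# Numbers quoted on the slide.
TOTAL_BP = 597_340_328
TOTAL_READS = 5_769_035
LIBRARIES = 21
COST_CENTS_PER_BP = 0.04

# Assumption (not from the slide): approximate human genome size in bp.
HUMAN_GENOME_BP = 3.0e9

reads_per_library = TOTAL_READS / LIBRARIES
mean_read_length = TOTAL_BP / TOTAL_READS
human_fraction = TOTAL_BP / HUMAN_GENOME_BP
total_cost_dollars = TOTAL_BP * COST_CENTS_PER_BP / 100  # cents to dollars

print(f"Reads per library: {reads_per_library:,.0f}")
print(f"Mean read length:  {mean_read_length:.1f} bp")
print(f"Human genome frac: {human_fraction:.0%}")
print(f"Implied cost:      ${total_cost_dollars:,.0f}")
```

Under these inputs the per-library average and the mean read length match the slide (274,716 reads and 103.5 bp), and the quoted per-base price implies a total of roughly $239,000 for all 21 libraries.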

  4. Sequencing is cheap and easy. Bioinformatics is neither.

  5. The Soudan Mine, Minnesota • Red Stuff: Oxidized • Black Stuff: Reduced

  6. Red and Black Samples Are Different • Cloned and 454 sequenced 16S are indistinguishable [Figure labels: Black stuff; Cloned; Red; Red]

  7. There are different amounts of metabolism in each environment

  8. There are different amounts of substrates in each environment [Figure labels: Red Stuff; Black Stuff]

  9. But are the differences significant? • Sample 10,000 proteins from site 1 • Count frequency of each “subsystem” • Repeat 20,000 times • Repeat for sample 2 • Combine both samples • Sample 10,000 proteins 20,000 times • Build 95% CI • Compare medians from sites 1 and 2 with 95% CI • Reference: Rodriguez-Brito (2006). BMC Bioinformatics
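The resampling steps on this slide can be sketched in Python. This is one reading of the bullet points, not the published implementation from Rodriguez-Brito (2006): it assumes each protein is represented by its subsystem label, that sampling is done with replacement, and that the 95% CI is a simple percentile interval on the pooled resamples. The function names (`resample_counts`, `percentile_interval`, `compare_sites`) are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(labels, sample_size, repeats, rng):
    """Draw `repeats` samples with replacement; record each subsystem's count per draw."""
    subsystems = sorted(set(labels))
    counts = {s: [] for s in subsystems}
    for _ in range(repeats):
        tally = Counter(rng.choices(labels, k=sample_size))
        for s in subsystems:
            counts[s].append(tally.get(s, 0))
    return counts


def percentile_interval(values, lower=0.025, upper=0.975):
    """Percentile interval (95% by default) from the sorted resampled values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


def compare_sites(site1, site2, sample_size=10_000, repeats=20_000, seed=0):
    """Flag subsystems whose per-site median falls outside the pooled 95% interval."""
    rng = random.Random(seed)
    counts1 = resample_counts(site1, sample_size, repeats, rng)
    counts2 = resample_counts(site2, sample_size, repeats, rng)
    pooled = resample_counts(site1 + site2, sample_size, repeats, rng)
    report = {}
    for subsystem, values in pooled.items():
        low, high = percentile_interval(values)
        m1 = median(counts1.get(subsystem, [0]))
        m2 = median(counts2.get(subsystem, [0]))
        differs = not (low <= m1 <= high) or not (low <= m2 <= high)
        report[subsystem] = {"median1": m1, "median2": m2, "ci": (low, high), "differs": differs}
    return report


# Toy data (made up): subsystem labels for proteins at two sites.
site_red = ["iron acquisition"] * 200 + ["other"] * 800
site_black = ["iron acquisition"] * 800 + ["other"] * 200
demo = compare_sites(site_red, site_black, sample_size=500, repeats=200, seed=1)
```

The defaults mirror the slide (10,000 proteins, 20,000 repeats); the toy call uses much smaller values so it runs quickly. Pooling the two sites gives the distribution expected if the samples were interchangeable, so a site median outside that interval suggests a real difference for that subsystem.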

  10. Subsystem differences & metabolism: Iron acquisition • Black Stuff • Siderophore enterobactin biosynthesis • ferric enterobactin transport • ABC transporter ferrichrome • ABC transporter heme • Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈]) • Red stuff: ferric iron (goethite [FeO(OH)])

  11. Nitrification differentiates the samples • Reference: Edwards (2006) BMC Genomics

  12. The challenge is explaining the differences between samples • Red Sample: Arg, Trp, His; Ubiquinone; FA oxidation; Chemotaxis, Flagella; Methylglyoxal metabolism • Black Sample: Ile, Leu, Val; Siderophores; Glycerolipids; NiFe hydrogenase; Phenylpropionate degradation

  13. We can cheaply compare the important biochemistry happening in different environments • We don’t care which organisms are doing the metabolism but we know what organisms are there

  14. Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?

  15. Why Phages? • Phages are viruses that infect bacteria • 10:1 ratio of phages:bacteria • 10³¹ phages on the planet • Specific interactions (probably) • one virus : one host • Small genome size • Higher coverage • Horizontal gene transfer • 10²⁵-10²⁸ bp DNA per year in the oceans • Can’t do fosmids

  16. Phages In The World’s Oceans • ARC: 56 samples, 16 sites, 1 year • BBC: 85 samples, 38 sites, 8 years • SAR: 1 sample, 1 site, 1 year • GOM: 41 samples, 13 sites, 5 years • LI: 4 sites, 1 year

  17. Most Marine Phage Sequences are Novel

  18. ssDNA -like T4-like T7-like Phages are specific to environments Phage Proteomic Tree v. 5 (Edwards, Rohwer) Thanks: Mya Breitbart

  19. Marine Single-Stranded DNA Viruses • 6% of SAR sequences ssDNA phage (Chlamydia-like Microviridae) • 40% viral particles in SAR are ssDNA phage • Several full-genome sequences were recovered via de novo assembly of these fragments • Confirmed by PCR and sequencing

  20. SAR Aligned Against the Chlamydia phi 4 • 12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome [Figure tracks: Individual sequence reads; Coverage; Concatenated hits; Chlamydia phi 4 genome]
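The "Coverage" and "Concatenated hits" tracks named on this slide can be illustrated with a small sketch. It assumes hit coordinates are already available as 0-based, end-exclusive (start, end) intervals on the reference genome; the interval values below are made up, and the function names are hypothetical rather than taken from the talk or from any BLAST tool.

```python
def coverage_profile(genome_length, hits):
    """Per-base depth from (start, end) hit intervals (0-based, end-exclusive)."""
    delta = [0] * (genome_length + 1)
    for start, end in hits:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            delta[start] += 1   # depth rises where a hit begins
            delta[end] -= 1     # and falls just past where it ends
    depth, running = [], 0
    for change in delta[:genome_length]:
        running += change
        depth.append(running)
    return depth


def concatenated_hits(depth):
    """Merge covered positions into contiguous (start, end) regions."""
    regions, start = [], None
    for position, value in enumerate(depth):
        if value > 0 and start is None:
            start = position
        elif value == 0 and start is not None:
            regions.append((start, position))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Hypothetical hits on a 10 bp toy genome; the last one runs off the end and is clipped.
toy_depth = coverage_profile(10, [(0, 4), (2, 6), (8, 12)])
toy_regions = concatenated_hits(toy_depth)
print(toy_depth)
print(toy_regions)
```

With thousands of short fragments on a genome of a few kilobases, a profile like this shows quickly whether hits cover the whole genome or pile up on a few conserved genes.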

  21. Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?

  22. Phages, Reefs, and Human Disturbance

  23. Phages, Reefs, and Human Disturbance • The Northern Line Islands Expedition, 2005 [Map labels: Kingman; Palmyra; Washington; Fanning; Christmas]

  24. More photosynthesis at Kingman. No people at Kingman. More pathogens at Christmas. More people at Christmas. • Christmas to Kingman Bias in No. Phage Hosts • Negative numbers mean relatively more phage hosts at Kingman

  25. Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?

  26. Phages enrich for important genes • Rios Mesquites Stromatolites • No photosynthesis genes in phages • Pozas Azules Stromatolites • 5 different photosynthesis genes in phages

  27. RNR is the most successful reaction in evolution

  28. Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?

  29. Computational Challenges • Sequence annotations and analysis • What is there? • What is it doing? • How is it doing it? • Gene predictions in unknowns • Lutz Krause (Bielefeld) • Sequence comparisons • BLAST • Other ways to rapidly compare short sequences • What happens when everyone is using 454 sequencing?

  30. Sequence data from 21 libraries • 600 million bp • 6 million sequences • Each BLASTX search takes 1,000 CPU hours • 21 libraries = 21,000 CPU hours or 2.4 CPU years • Users want • repeat runs, • TBLASTX, • more analysis • more data • more, more, more, more
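The compute estimate on this slide is simple arithmetic, sketched below. The 1,000 CPU hours is read here as the cost per library, which is what the slide's 21,000-hour total implies. The cluster size of 100 cores is a hypothetical value added only to show how the CPU total might translate into wall-clock time; it is not from the talk.

```python
HOURS_PER_YEAR = 24 * 365


def blast_budget(libraries, cpu_hours_per_library, cores):
    """Return total CPU hours, CPU years, and ideal wall-clock days on `cores` cores."""
    total_hours = libraries * cpu_hours_per_library
    cpu_years = total_hours / HOURS_PER_YEAR
    wall_days = total_hours / cores / 24  # assumes perfect parallel scaling
    return total_hours, cpu_years, wall_days


# 21 libraries at 1,000 CPU hours each (from the slide); 100 cores is an assumption.
total_hours, cpu_years, wall_days = blast_budget(21, 1_000, cores=100)
print(f"{total_hours:,} CPU hours = {cpu_years:.1f} CPU years")
print(f"About {wall_days:.2f} days on 100 cores, if scaling were ideal")
```

Each repeat run or additional search type (for example TBLASTX) adds to this total, which is the point of the slide's "more, more, more, more".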

  31. SDSU Forest Rohwer Beltran Rodriguez-Brito USF Mya Breitbart Rohwer Lab Linda Wegley Florent Angly Matt Haynes Stromatolites Janet Seifert (Rice University) Valeria Souza (UNAM, Mexico) ANL Rick Stevens Bob Olsen CI Support FIG Veronika Vonstein Ross Overbeek Annotators Also at SDSU Anca Segall Stanley Maloy Math Guys@SDSU Peter Salamon Joe Mahaffy James Nulton Ben Felts David Bangor Steve Rayhawk Jennifer Mueller UBC Curtis Suttle Amy Chan MIT: Ed DeLong
