SGM Meeting, Warwick, April 2006. Challenges for metagenomic data analysis and lessons from viral metagenomes [What would you do if sequencing were free?]. Rob Edwards, http://phage.sdsu.edu/~rob, San Diego State University; Fellowship for Interpretation of Genomes.
Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?
This is all 454 sequence data • 21 libraries • 10 microbial, 11 phage • 597,340,328 bp total • Equivalent to 20% of the human genome • Equivalent to 50% of all complete and partial microbial genomes • 5,769,035 sequences • Average 274,716 per library • Average read length 103.5 bp • Average read length has not increased in 7 months • Cost 0.04¢ per bp
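The per-library and per-read figures on this slide follow directly from the totals. A minimal sketch that re-derives them, assuming a ~3 Gb haploid human genome for the comparison (that genome size is an assumption, not a number from the slide):

```python
# Hypothetical helper: re-derive the summary figures on this slide from its totals.
TOTAL_BP = 597_340_328           # total bases sequenced
TOTAL_SEQUENCES = 5_769_035      # total reads
N_LIBRARIES = 21                 # 10 microbial + 11 phage libraries
COST_CENTS_PER_BP = 0.04         # quoted 454 cost, in US cents per base
HUMAN_GENOME_BP = 3_000_000_000  # assumption: ~3 Gb haploid human genome


def dataset_summary() -> dict:
    """Return derived statistics for the 21-library 454 dataset."""
    return {
        "reads_per_library": TOTAL_SEQUENCES / N_LIBRARIES,
        "mean_read_length_bp": TOTAL_BP / TOTAL_SEQUENCES,
        "fraction_of_human_genome": TOTAL_BP / HUMAN_GENOME_BP,
        # cents per bp times bp gives cents; divide by 100 for dollars
        "total_cost_usd": TOTAL_BP * COST_CENTS_PER_BP / 100,
    }


if __name__ == "__main__":
    for name, value in dataset_summary().items():
        print(f"{name}: {value:,.2f}")
```

This reproduces the ~274,716 reads per library and ~103.5 bp mean read length, and at the quoted rate the whole dataset works out to roughly $239,000.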
Sequencing is cheap and easy. Bioinformatics is neither.
The Soudan Mine, Minnesota [Photo labels: Red stuff (oxidized); Black stuff (reduced)]
Red and Black Samples Are Different. Cloned and 454-sequenced 16S sequences are indistinguishable. [Figure: 16S comparison of the black stuff and red stuff, cloned vs. 454 sequenced]
There are different amounts of metabolism in each environment
There are different amounts of substrates in each environment [Figure: Red Stuff vs. Black Stuff]
But are the differences significant? • Sample 10,000 proteins from site 1 • Count the frequency of each “subsystem” • Repeat 20,000 times • Repeat for site 2 • Combine both samples • Sample 10,000 proteins from the combined set 20,000 times • Build a 95% CI • Compare the medians from sites 1 and 2 with the 95% CI. Rodriguez-Brito (2006). BMC Bioinformatics
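The resampling test in these steps can be sketched in plain Python. This is a minimal illustration, not the published implementation: it assumes each protein is represented only by its subsystem label, and it flags a subsystem when either site's median falls outside the pooled 95% interval, which is our reading of the last step. The function names are hypothetical.

```python
import random
import statistics
from collections import Counter


def resample_frequencies(proteins, subsystems, n_sample, n_repeats, rng):
    """Draw n_sample proteins with replacement n_repeats times and record,
    for every subsystem, its relative frequency in each draw."""
    freqs = {s: [] for s in subsystems}
    for _ in range(n_repeats):
        counts = Counter(rng.choices(proteins, k=n_sample))
        for s in subsystems:
            freqs[s].append(counts[s] / n_sample)
    return freqs


def compare_sites(site1, site2, n_sample=10_000, n_repeats=20_000, seed=0):
    """Compare per-site median subsystem frequencies with the 95% interval
    obtained by resampling the pooled proteins of both sites."""
    rng = random.Random(seed)
    subsystems = sorted(set(site1) | set(site2))
    f1 = resample_frequencies(site1, subsystems, n_sample, n_repeats, rng)
    f2 = resample_frequencies(site2, subsystems, n_sample, n_repeats, rng)
    pooled = resample_frequencies(site1 + site2, subsystems, n_sample, n_repeats, rng)

    results = {}
    for s in subsystems:
        null = sorted(pooled[s])
        lo = null[int(0.025 * (n_repeats - 1))]  # 2.5th percentile
        hi = null[int(0.975 * (n_repeats - 1))]  # 97.5th percentile
        m1, m2 = statistics.median(f1[s]), statistics.median(f2[s])
        significant = not (lo <= m1 <= hi) or not (lo <= m2 <= hi)
        results[s] = {"median1": m1, "median2": m2,
                      "ci95": (lo, hi), "significant": significant}
    return results
```

The defaults mirror the 10,000 proteins and 20,000 repeats on the slide, which is slow in pure Python; for a quick experiment, something like `compare_sites(site1, site2, n_sample=200, n_repeats=500)` on lists of subsystem labels is enough to see the behaviour.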
Subsystem differences & metabolism: iron acquisition
Black Stuff subsystems: siderophore enterobactin biosynthesis; ferric enterobactin transport; ABC transporter ferrichrome; ABC transporter heme
Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈])
Red stuff: ferric iron (goethite [FeO(OH)])
Nitrification differentiates the samples. Edwards (2006) BMC Genomics
Red sample: Arg, Trp, His; ubiquinone; FA oxidation; chemotaxis, flagella; methylglyoxal metabolism
Black sample: Ile, Leu, Val; siderophores; glycerolipids; NiFe hydrogenase; phenylpropionate degradation
The challenge is explaining the differences between samples.
We can cheaply compare the important biochemistry happening in different environments. We don’t care which organisms are doing the metabolism, but we know what organisms are there.
Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?
Why Phages? • Phages are viruses that infect bacteria • 10:1 ratio of phages:bacteria • 10³¹ phages on the planet • Specific interactions (probably) • one virus : one host • Small genome size • Higher coverage • Horizontal gene transfer • 10²⁵-10²⁸ bp DNA per year in the oceans • Can’t do fosmids
Phages in the World’s Oceans
ARC: 56 samples, 16 sites, 1 year
BBC: 85 samples, 38 sites, 8 years
SAR: 1 sample, 1 site, 1 year
GOM: 41 samples, 13 sites, 5 years
LI: 4 sites, 1 year
Phages are specific to environments [Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer), with clades labeled ssDNA, T4-like, T7-like] Thanks: Mya Breitbart
Marine Single-Stranded DNA Viruses • 6% of SAR sequences are from ssDNA phages (Chlamydia-like Microviridae) • 40% of viral particles in SAR are ssDNA phages • Several full-genome sequences were recovered by de novo assembly of these fragments • Confirmed by PCR and sequencing
SAR Aligned Against Chlamydia phi 4 • 12,297 sequence fragments hit the ~4.5 kb Chlamydia phi 4 genome using TBLASTX [Figure tracks: individual sequence reads; coverage; concatenated hits; Chlamydia phi 4 genome]
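A coverage track like the one in this figure can be built by stacking the TBLASTX hits along the reference genome. The helper below is a hypothetical sketch, not the tool used for the figure; it assumes each hit is given as BLAST tabular-style subject coordinates (1-based, inclusive, with sstart > send for minus-strand hits).

```python
def coverage_from_hits(genome_length, hits):
    """Per-base coverage of a reference genome from (sstart, send) hit coordinates.

    Assumes BLAST tabular-style subject coordinates: 1-based and inclusive,
    with sstart > send for hits on the minus strand.
    """
    diff = [0] * (genome_length + 1)  # difference array: +1 at start, -1 past end
    for sstart, send in hits:
        lo, hi = min(sstart, send) - 1, max(sstart, send)  # to 0-based half-open
        lo, hi = max(lo, 0), min(hi, genome_length)        # clip to the genome
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    coverage, depth = [], 0
    for i in range(genome_length):
        depth += diff[i]  # running sum turns the difference array into depths
        coverage.append(depth)
    return coverage


def fraction_covered(coverage):
    """Fraction of reference positions hit by at least one fragment."""
    return sum(1 for d in coverage if d > 0) / len(coverage)
```

The difference-array approach touches each hit once and each base once, so it stays cheap even with tens of thousands of fragments over a ~4.5 kb genome.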
Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?
Phages, Reefs, and Human Disturbance: The Northern Line Islands Expedition, 2005 [Map: Kingman, Palmyra, Washington, Fanning, Christmas]
More photosynthesis at Kingman. No people at Kingman. More pathogens at Christmas. More people at Christmas. [Figure: Christmas-to-Kingman bias in number of phage hosts; negative numbers mean relatively more phage hosts at Kingman]
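The slide does not give the formula behind the bias score. One metric that matches its sign convention (negative means relatively more at Kingman) is a log ratio of each host's relative abundance at the two islands. The sketch below is an assumption, not the calculation behind the figure; the function name, the input format (host name mapped to the number of phage sequences assigned to it), and the pseudocount used to avoid log(0) are all hypothetical.

```python
import math


def host_bias(christmas, kingman, pseudocount=1.0):
    """Hypothetical bias score: log2 of each host's relative abundance at
    Christmas divided by its relative abundance at Kingman.

    `christmas` and `kingman` map host name -> count of phage sequences
    assigned to that host. Negative values mean relatively more at Kingman.
    """
    hosts = sorted(set(christmas) | set(kingman))
    # the pseudocount is added to every host so absent hosts do not give log(0)
    c_total = sum(christmas.values()) + pseudocount * len(hosts)
    k_total = sum(kingman.values()) + pseudocount * len(hosts)
    bias = {}
    for h in hosts:
        c_frac = (christmas.get(h, 0) + pseudocount) / c_total
        k_frac = (kingman.get(h, 0) + pseudocount) / k_total
        bias[h] = math.log2(c_frac / k_frac)
    return bias
```

Normalizing by each island's total before taking the ratio keeps the score about relative host composition, so a deeper-sequenced library at one island does not shift every host in the same direction.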
Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?
Phages enrich for important genes • Rios Mesquites Stromatolites • No photosynthesis genes in phages • Pozas Azules Stromatolites • 5 different photosynthesis genes in phages
Outline • The envy is not mine • A tour around the world, thanks to phage • People suck • What is the most successful gene in evolution? • Is there a Future?
Computational Challenges • Sequence annotations and analysis • What is there? • What is it doing? • How is it doing it? • Gene predictions in unknowns • Lutz Krause (Bielefeld) • Sequence comparisons • BLAST • Other ways to rapidly compare short sequences • What happens when everyone is using 454 sequencing?
Sequence data from 21 libraries: 600 million bp, 6 million sequences • Each library’s BLASTX search takes 1,000 CPU hours • 21 libraries = 21,000 CPU hours, or 2.4 CPU years • Users want • repeat runs • TBLASTX • more analysis • more data • more, more, more, more
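The compute estimate on this slide is simple multiplication, and it is worth making the wall-clock side explicit. A minimal sketch, assuming a 365.25-day year and perfect parallelism across CPUs (BLAST queries are independent, but real clusters add scheduling and I/O overhead); the function name and the 100-CPU figure in the usage note are hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours


def blast_compute(n_libraries, cpu_hours_per_library=1_000, n_cpus=1):
    """Return (total CPU hours, CPU years, wall-clock hours on n_cpus),
    assuming the searches split perfectly across CPUs."""
    cpu_hours = n_libraries * cpu_hours_per_library
    return cpu_hours, cpu_hours / HOURS_PER_YEAR, cpu_hours / n_cpus


if __name__ == "__main__":
    hours, years, wall = blast_compute(21)
    print(f"{hours:,} CPU hours = {years:.1f} CPU years")
```

With 21 libraries this gives 21,000 CPU hours, about 2.4 CPU years, matching the slide; spread over a hypothetical 100 CPUs, `blast_compute(21, n_cpus=100)` drops the wall-clock time to 210 hours, before any repeat runs or TBLASTX searches are added.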
SDSU: Forest Rohwer, Beltran Rodriguez-Brito
USF: Mya Breitbart
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
Stromatolites: Janet Seifert (Rice University), Valeria Souza (UNAM, Mexico)
ANL: Rick Stevens, Bob Olsen, CI Support
FIG: Veronika Vonstein, Ross Overbeek, Annotators
Also at SDSU: Anca Segall, Stanley Maloy
Math Guys@SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller
UBC: Curtis Suttle, Amy Chan
MIT: Ed DeLong