The Sorcerer II Global Ocean Sampling expedition
Background • Microorganisms in the world oceans: what do we know? • Play an important role in the marine ecosystem and global biogeochemical cycles. • 10 million species? • How can second generation sequencing techniques contribute?
Craig Venter • Human genome • Ocean sampling (GOS) • Synthetic biology • Modified microorganisms
Global Ocean Sampling • The expedition’s goal is to evaluate the microbial diversity in the world’s oceans using the tools and techniques developed to sequence the human and other organisms’ genomes. • They want to increase the knowledge about microbial diversity and expect that this will help them understand how ecosystems function and discover new genes of ecological and evolutionary importance.
Sampling • 200-400 liters of water every 200 miles • Filtering
Methods • Total DNA was extracted from the 0.1-0.8 µm size fraction • Random insert clone libraries • End-sequencing of 44 000-420 000 clones per sample (Sanger sequencing) • 6 gigabases (billion bases) sequenced
Development of new tools • Fragment recruitment analyses for performing and visualizing comparative genomic analysis when a reference sequence is available. • New assembly techniques that use metadata to produce assemblies for uncultivated microbial taxa. • A whole metagenome comparison tool to compare entire samples at an arbitrary degree of genetic divergence.
Assembly • Primary assembly: Celera Assembler • Pairs of mated reads were tested for overlap; overlapping pairs were merged into a single pseudo-read • Overlap cut-off of 98 % to construct unitigs • Resulting assembly was fragmented • Second assembly: 94 % cut-off • Series of assemblies at various stringencies for subsets of the GOS data
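The mate-merging step above can be sketched roughly as follows. This is a minimal illustration, not the Celera Assembler's actual procedure: the function names, the ungapped suffix/prefix comparison, and the `min_overlap` and `min_identity` values are all assumptions chosen for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_mates(fwd, rev, min_overlap=20, min_identity=0.98):
    """Merge two mated end reads into one pseudo-read if they overlap.

    fwd and rev are read from opposite ends of the same clone, so rev is
    reverse-complemented first. Overlaps are ungapped suffix/prefix matches,
    tried from longest to shortest (a simplification). Both thresholds are
    hypothetical placeholders, not values taken from the slides.
    Returns the merged sequence, or None if no acceptable overlap exists.
    """
    rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail, head = fwd[-olen:], rc[:olen]
        matches = sum(x == y for x, y in zip(tail, head))
        if matches / olen >= min_identity:
            return fwd + rc[olen:]
    return None
```

A pair that merges is treated as a single longer read in later steps; a pair that returns `None` stays as two reads linked only by their mate relationship.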
Fragment recruitment • GOS dataset compared with the genomes of sequenced microbes (NCBI): 584 reference genomes • BLAST, with alignments down to 55 % identity • 70 % of the reads aligned to one or more genomes • Many alignments had large gaps and low identity • Recruited reads under stringent criteria: 30 % of the reads
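A minimal sketch of the recruitment filter, assuming the alignments are available in BLAST's standard 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The slides do not spell out the "stringent criteria", so the `min_coverage` parameter below is a hypothetical stand-in, and the function name and sample data are invented for illustration.

```python
def recruit(blast_lines, read_lengths, min_identity=55.0, min_coverage=0.0):
    """Pick the best reference genome for each read from tabular BLAST hits.

    blast_lines  : iterable of tab-separated lines in the 12-column format
    read_lengths : dict mapping read id -> read length in bases
    min_identity : lowest percent identity accepted (55 % as on the slide)
    min_coverage : fraction of the read that must align (assumed criterion)
    Returns a dict: read id -> (genome id, percent identity) of the best hit.
    """
    recruited = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        read, genome = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        coverage = (abs(qend - qstart) + 1) / read_lengths[read]
        if pident < min_identity or coverage < min_coverage:
            continue
        best = recruited.get(read)
        if best is None or pident > best[1]:
            recruited[read] = (genome, pident)
    return recruited


# Invented example data: four reads, three of which have at least one hit.
hits = [
    "r1\tgA\t92.0\t800\t64\t0\t1\t800\t1000\t1799\t0.0\t900",
    "r1\tgB\t60.0\t400\t160\t0\t1\t400\t50\t449\t1e-20\t200",
    "r2\tgA\t57.0\t300\t129\t0\t101\t400\t5000\t5299\t1e-10\t100",
    "r3\tgB\t50.0\t700\t350\t0\t1\t700\t10\t709\t1e-5\t60",
]
lengths = {"r1": 800, "r2": 800, "r3": 700, "r4": 750}
loose = recruit(hits, lengths)
strict = recruit(hits, lengths, min_coverage=0.9)
```

Dividing `len(loose)` or `len(strict)` by `len(lengths)` gives the fraction of reads recruited under each setting, the same kind of figure as the 70 % and 30 % quoted above.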
Fragment recruitment • Any genome structure variation large enough to prevent recruitment can be detected, because it will be associated with missing mates. • Depending on the type of rearrangement, other recruitment metadata categories will appear near the rearrangement’s endpoints → possible to distinguish among deletions, translocations, inversions and inverted translocations from the recruitment plots.
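To make the idea of recruitment metadata concrete, here is a hedged sketch that labels a mate pair from the placement of its two reads on a reference. Only "missing mates" is named on the slide; every other category label, the `Placement` type, and the insert-size bounds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Placement:
    start: int   # leftmost reference coordinate covered by the read
    end: int     # rightmost reference coordinate covered by the read
    strand: str  # "+" or "-"


def classify_pair(a: Optional[Placement], b: Optional[Placement],
                  min_insert: int = 1000, max_insert: int = 10000) -> str:
    """Label a mate pair by how its reads were recruited (None = not recruited).

    The insert-size bounds and all labels except "missing_mate" are
    hypothetical; real libraries have their own expected insert sizes.
    """
    if a is None and b is None:
        return "unrecruited"
    if a is None or b is None:
        return "missing_mate"
    left, right = sorted((a, b), key=lambda p: p.start)
    if left.strand == right.strand:
        return "same_orientation"
    if left.strand == "-":
        return "outward_facing"
    span = right.end - left.start
    if span < min_insert:
        return "too_close"
    if span > max_insert:
        return "too_far"
    return "good"
```

Plotting each recruited read coloured by such a label is what lets clusters of unusual categories stand out near a rearrangement's endpoints.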
Extreme assembly of uncultivated populations • Assemblies for abundant, uncultivated microbial genera • Assembly approach that resolves conflict: "Extreme assembly" • Does not use mate-pairing data; produces contigs • Assembly artefacts • Alternative to an unguided assembly: start from seed fragments that can be identified as belonging to a particular taxonomic group.
Fragment recruitment plots • Investigate variation within a group of related organisms • Repeatedly seeding extreme assembly with fragments mated to a SAR11-like 16S sequence.
Sample comparisons • A method that assesses the genetic similarity between two samples and potentially makes use of all portions of the genome, not just the 16S rRNA region. • Assembly independent • Estimates the fraction of sequence from one sample that could be considered to be present in the other sample. • Whole-metagenome similarities were computed for all pairs of samples.
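One way to picture this estimate, as a sketch only: suppose each read in a sample has already been compared against the other sample and its best percent identity recorded (or `None` when nothing matched). The averaging of the two directions into one symmetric score, the function names, and the example numbers are assumptions, not the published formula.

```python
def fraction_shared(best_identity, min_identity):
    """Fraction of reads whose best hit in the other sample meets the cut-off.

    best_identity: list of percent identities, None where a read had no hit.
    """
    if not best_identity:
        return 0.0
    shared = sum(1 for p in best_identity if p is not None and p >= min_identity)
    return shared / len(best_identity)


def sample_similarity(a_vs_b, b_vs_a, min_identity=90.0):
    """Symmetric similarity: mean of the two directional shared fractions.

    min_identity sets the degree of divergence tolerated; taking the mean
    of both directions is an assumption made for this sketch.
    """
    return 0.5 * (fraction_shared(a_vs_b, min_identity)
                  + fraction_shared(b_vs_a, min_identity))


# Invented example: four reads from sample A, two reads from sample B.
a_vs_b = [99.0, 92.0, 71.0, None]
b_vs_a = [95.0, 60.0]
strict_score = sample_similarity(a_vs_b, b_vs_a, min_identity=90.0)
relaxed_score = sample_similarity(a_vs_b, b_vs_a, min_identity=70.0)
```

Lowering `min_identity` admits more divergent matches, so the same pair of samples looks more similar; scoring all pairs this way yields a similarity matrix that can be clustered.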
Variations in gene abundance • Differences in gene content between samples • Can identify functions that reflect the lifestyles of the community in the context of its local environment. • Binning of genes into functional categories using TIGRFAM hidden Markov models. • Genes predominantly found in a single sample. • Differences between temperate and tropical samples • Differences between samples with very similar taxonomy
CAMERA • Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis • http://camera.calit2.net/ • A need for a systematic way to explore the structure and function of ocean ecosystems, and their impact on global carbon processing and climate. – Bridge the gap between the rates of collecting data and interpreting it. • Monitoring microbial communities in the ocean and their response to environmental changes.
Metadata • CAMERA will integrate sequence data with all available metadata • Allow researchers to derive correlations between ecology and environmental conditions that may favour one community structure or another. • Future: metadata from satellites and weather stations can be used to help interpret how these factors affect microbial processes as well as community composition.
New generation Bioinformatics tools • Combine bioinformatics tools with large-scale compute resources