CAMERA: A Metagenomics Resource for Marine Microbial Ecology • July 27, 2007 • Paul Gilna, UCSD/Calit2 • Saul A. Kravitz, J. Craig Venter Institute
Acknowledgements • UCSD/Calit2: Larry Smarr, PI; Paul Gilna, Executive Director; Phil Papadopoulos, Technical Lead; Weizhong Li • JCVI: Marv Frazier, co-PI; Leonid Kagan, Architect; Jennifer Wortman, Bioinformatics; Rekha Seshadri, Outreach and Training; Doug Rusch, Shibu Yooseph, Aaron Halpern, Granger Sutton • UC Davis: Jonathan Eisen, co-investigator • Gordon and Betty Moore Foundation: David Kingsbury and Mary Maxon
Outline • New Discipline of Metagenomics • Global Ocean Sampling Expedition • Challenges of Metagenomic Data • CAMERA Features • CAMERA Usage to Date • Cyberinfrastructure
Genomics vs Metagenomics • Genomics – ‘Old School’ • Study of an organism's genome • Genome sequence determined using shotgun sequencing and assembly • ~1300 microbes sequenced, first in 1995 • DNA usually obtained from pure cultures • Metagenomics • Application of genome sequencing methods to environmental samples (no culturing) • Environmental shotgun sequencing is the most widely used approach
Metagenomic Questions • Within an environment • What biological functions are present (absent)? • What organisms are present (absent)? • Compare data from (dis)similar environments • What are the fundamental rules of microbial ecology? • Search for novel proteins and protein families
Metagenomics Applications • Marine Ecology and Microbiology • Alternative Energy and Industrial • Hypersaline ponds, Oceans • Termite Metabolism • Medical Applications • Microbial Ecology of Human body cavities and fluids • Agricultural • Disease Vector Metabolism (Glassy-Winged Sharpshooter) • Soil Ecology • Environmental Remediation • DOE: Acid Mine Drainage, Chemical and Radioactive Waste
Metadata • Metagenomics: Genomics + Metadata • Environmental Metadata • Time and location (lat, long, depth) of sample collection; correlate with remote sensing data • Physico-chemical properties (e.g. temperature, salinity) • [Figure: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site, 22 February 2003]
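The metadata fields named on this slide could be carried alongside each sample in a small record type. The sketch below is only an illustration under stated assumptions: `SampleMetadata`, its field names, its units, and the `in_bounding_box` helper are hypothetical and are not CAMERA's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical record; field names and units are assumptions."""
    sample_id: str
    collected_at: datetime                  # time of sample collection
    latitude: float                         # decimal degrees, north positive
    longitude: float                        # decimal degrees, east positive
    depth_m: float                          # depth below the surface, metres
    temperature_c: Optional[float] = None   # physico-chemical properties
    salinity_psu: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject coordinates that cannot be a real sampling location.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.depth_m < 0:
            raise ValueError("depth must be non-negative")


def in_bounding_box(sample, south, north, west, east):
    """True if the sample lies inside a lat/long box (no antimeridian wrap)."""
    return (south <= sample.latitude <= north
            and west <= sample.longitude <= east)
```

A bounding-box test of this kind is one simple way to pair samples with a remote sensing grid covering the same area and date.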
JCVI Global Ocean Sampling Expedition: Largest Metagenomic Study to Date
Global Ocean Sampling (GOS) • 178 Total Sampling Locations • Phase 1: 41 samples, 7.7M reads, >6M proteins • Diverse Environments: open ocean, estuary, embayment, upwelling, fringing reef, atoll, warm seep, mangrove, fresh water, biofilms, sediments, soils
GOS Protein Analysis: Yooseph et al. (PLoS 2007) • Novel clustering process • Sequence similarity based • Predict proteins and group into related clusters • Include GOS and all known proteins • Findings • GOS proteins cover nearly all existing prokaryotic families • GOS expands diversity of known protein families • 1700 large novel clusters with no homology to known protein families • Higher than expected proportion of novel clusters are viral • No saturation in the rate of novel protein family discovery
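To make "sequence similarity based" grouping concrete, here is a toy greedy clustering sketch. It is not the procedure of Yooseph et al.: the k-mer Jaccard similarity, the 0.5 threshold, and the function names are all assumptions chosen to keep the example short and dependency-free.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared k-mers divided by all distinct k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Toy stand-in for similarity-based clustering (assumed scheme).

    seqs maps an identifier to an amino-acid string. Sequences are visited
    longest first; each joins the first cluster whose representative is
    similar enough, otherwise it starts a new cluster as its representative.
    """
    clusters = []  # list of (representative k-mer set, member ids)
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        profile = kmers(seqs[name], k)
        for rep_profile, members in clusters:
            if jaccard(profile, rep_profile) >= threshold:
                members.append(name)
                break
        else:
            # No existing representative was close enough.
            clusters.append((profile, [name]))
    return [members for _, members in clusters]
```

Comparing each sequence only to cluster representatives, rather than to every other sequence, is the design choice that keeps this kind of scheme tractable on large collections.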
[Figure: Added Diversity. Panels: Rubisco homologs; UVDE homologs. Legend: GOS eukaryotes, GOS prokaryotes, GOS viral, known eukaryotes, known prokaryotes, known viral. Labeled organisms: H. marismortui, D. radiodurans, D. psychrophila, B. halodurans, T. thermophilus, B. anthracis]
Fragment Recruitment Viewer: Rusch et al., PLoS 3/2007 • [Figure: fragment recruitment plot. x-axis: Reference Genome Coordinates. y-axis: Percent Identity (ticks at 100%, 55%, 50%). Annotations: sequence absent from most strains (phage/other lateral transfer?); “core” genome, ~75% identical; ribosomal operon]
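The plot described here places each recruited read by its position on the reference genome and its percent identity. The following is a minimal sketch of how such points might be derived from alignment hits; the `Hit` tuple, the best-hit rule, the 50% cutoff, and the windowed counts are assumptions for illustration, not the method of Rusch et al.

```python
from collections import namedtuple

# Hypothetical hit record: one alignment of a read to the reference genome.
Hit = namedtuple("Hit", "read_id ref_start ref_end percent_identity")


def recruit(hits, min_identity=50.0):
    """Keep each read's best hit at or above min_identity (assumed rule)."""
    best = {}
    for h in hits:
        if h.percent_identity < min_identity:
            continue
        cur = best.get(h.read_id)
        if cur is None or h.percent_identity > cur.percent_identity:
            best[h.read_id] = h
    return sorted(best.values(), key=lambda h: (h.ref_start, h.read_id))


def plot_points(recruited):
    """(x, y) = (alignment midpoint on the reference, percent identity)."""
    return [((h.ref_start + h.ref_end) / 2, h.percent_identity)
            for h in recruited]


def window_counts(recruited, genome_length, window):
    """Recruited reads per fixed-size window along the reference."""
    counts = [0] * ((genome_length + window - 1) // window)
    for h in recruited:
        idx = int((h.ref_start + h.ref_end) / 2) // window
        counts[min(idx, len(counts) - 1)] += 1
    return counts
```

Windows with few or no recruited reads are candidates for reference regions that are absent from the sampled population, which is the kind of gap the slide's annotation points to.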
Why CAMERA? • Public repositories not focused on environmental metagenomics • Sargasso Sea data underutilized by community • Millions of dollars invested in sequencing and analysis, but data accessible only to the bioinformatics elite • Release of GOS dataset in March 2007 • Comply with Convention on Biological Diversity
CAMERA – http://camera.calit2.net • “Convenient acronym for cumbersome name…” (Henry Nicholls, PLoS Biology) • Mission • Enable Research in Marine Microbiology • CAMERA Partners:
Challenges • Enormous datasets with high gene density • large compute resources required • 2 orders of magnitude jump • Fragmentary data • inadequate bioinformatics tools for assembly, annotation, analysis, visualization • Metadata standards non-existent • metadata absent from databases • Lack of standards impedes collection of datasets • Diversity of User Sophistication and Needs
CAMERA Services • Maintain searchable sequence collections • ALL metagenomic sequence reads, assemblies • Non-identical amino acid collection (extended NRAA) • Viral, Fungal, pico-Eukaryotes, Microbial • CAMERA protein clusters • Metagenomics data easily downloadable • Interactive and Batch Search Facility • Scalable parallel implementations of BLAST • Integrated with associated metadata
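As an illustration of what a non-identical amino acid collection involves, the sketch below collapses exact-duplicate sequences while remembering every source identifier. It assumes "non-identical" means exact-match deduplication after simple normalization; how CAMERA actually builds its extended NRAA set is not stated in the slides, and `non_identical` is a hypothetical helper.

```python
def non_identical(records):
    """Collapse exact-duplicate protein sequences (assumed definition).

    records: iterable of (identifier, amino-acid sequence) pairs.
    Returns a dict mapping each distinct sequence to the list of
    identifiers that share it, in first-seen order.
    """
    collection = {}
    for identifier, seq in records:
        # Normalize case and drop a trailing stop symbol before comparing.
        normalized = seq.strip().upper().rstrip("*")
        collection.setdefault(normalized, []).append(identifier)
    return collection
```

Keeping the identifier list per sequence means a search hit against the collapsed collection can still be traced back to every original read or database entry.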
Distinctive Feature Set in Progress • Graphical Tools for Visualizing Diversity • Based on Rusch et al • Fragment recruitment viewer • CAMERA Protein Clusters • Based on Yooseph et al • Incremental version implemented in 2007 • Annotation • Break through quadratic complexity via clusters • Phyletic Classification • Overviews of sequence collections
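One reading of "quadratic complexity" is the cost of comparing every sequence with every other one. The arithmetic sketch below contrasts that with comparing each sequence against one representative per cluster. The 6 million figure echoes the ">6M proteins" reported for GOS Phase 1; the cluster count of 100,000 is a hypothetical value chosen only to show the scale of the difference.

```python
def all_vs_all_pairs(n):
    """Unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def vs_representatives(n, clusters):
    """Comparisons if each sequence meets one representative per cluster."""
    return n * clusters


if __name__ == "__main__":
    n = 6_000_000          # order of magnitude of GOS Phase 1 proteins
    clusters = 100_000     # hypothetical cluster count, for illustration
    print(all_vs_all_pairs(n))              # 17999997000000
    print(vs_representatives(n, clusters))  # 600000000000
```

Under these assumed numbers the representative-based approach needs about 30 times fewer comparisons, and the gap widens as the collection grows faster than the number of clusters.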
Fragment Recruitment Viewer: Metagenomic Sequence vs Reference Sequence • Highlight and Select with Associated Metadata • View large datasets • AJAX interface based on Doug Rusch’s viewer