
CAMERA: A Metagenomics Resource for Marine Microbial Ecology

July 27, 2007. Paul Gilna, UCSD/Calit2; Saul A. Kravitz, J. Craig Venter Institute.

Presentation Transcript


  1. CAMERA: A Metagenomics Resource for Marine Microbial Ecology • July 27, 2007 • Paul Gilna, UCSD/Calit2 • Saul A. Kravitz, J. Craig Venter Institute

  2. Acknowledgements • UCSD/Calit2 • Larry Smarr, PI; Paul Gilna, Executive Director • Phil Papadopoulos, Technical Lead • Weizhong Li • JCVI • Marv Frazier, co-PI • Leonid Kagan, Architect; Jennifer Wortman, Bioinformatics • Rekha Seshadri, Outreach and Training • Doug Rusch, Shibu Yooseph, Aaron Halpern, Granger Sutton • UC Davis • Jonathan Eisen, co-investigator • Gordon and Betty Moore Foundation • David Kingsbury and Mary Maxon

  3. Outline • New Discipline of Metagenomics • Global Ocean Sampling Expedition • Challenges of Metagenomic Data • CAMERA Features • CAMERA Usage to Date • Cyberinfrastructure

  4. Genomics vs Metagenomics • Genomics – ‘Old School’ • Study of an organism's genome • Genome sequence determined using shotgun sequencing and assembly • ~1300 microbes sequenced, first in 1995 • DNA usually obtained from pure cultures • Metagenomics • Application of genome sequencing methods to environmental samples (no culturing) • Environmental shotgun sequencing is the most widely used approach

  5. Metagenomic Questions • Within an environment • What biological functions are present (absent)? • What organisms are present (absent)? • Compare data from (dis)similar environments • What are the fundamental rules of microbial ecology? • Search for novel proteins and protein families

  6. Metagenomics Applications • Marine Ecology and Microbiology • Alternative Energy and Industrial • Hypersaline ponds, Oceans • Termite Metabolism • Medical Applications • Microbial Ecology of human body cavities and fluids • Agricultural • Disease Vector Metabolism (Glassy-winged Sharpshooter) • Soil Ecology • Environmental Remediation • DOE: Acid Mine Drainage, Chemical and Radioactive Waste

  7. Metadata • Metagenomics = Genomics + Metadata • Environmental Metadata • Time and location (lat, long, depth) of sample collection • Correlate with remote sensing data • Physico-chemical properties (e.g. temperature, salinity) • [Figure: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003]
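The slide frames metagenomics as genomics plus metadata. As a minimal sketch of what that means for data handling, the Python below keeps each sample's environmental context in a record and links every read to it, so sequences can be selected by environment. The schema (`SampleMetadata`, `Read`, their field names and units) and the helper `reads_matching` are hypothetical illustrations, not CAMERA's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SampleMetadata:
    """Environmental context for one sample (hypothetical schema)."""
    sample_id: str
    collected_at: datetime
    latitude: float       # decimal degrees, north positive
    longitude: float      # decimal degrees, east positive
    depth_m: float        # sampling depth in metres
    temperature_c: float  # water temperature in degrees Celsius
    salinity_psu: float   # practical salinity units


@dataclass(frozen=True)
class Read:
    read_id: str
    sequence: str
    sample_id: str  # key linking the read to its SampleMetadata


def reads_matching(reads, samples, predicate):
    """Return the reads whose sample metadata satisfies `predicate`."""
    by_id = {s.sample_id: s for s in samples}
    return [r for r in reads if predicate(by_id[r.sample_id])]
```

For example, `reads_matching(reads, samples, lambda s: s.depth_m < 10)` would select reads from near-surface samples only; the same pattern extends to salinity, temperature, or collection date.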

  8. JCVI Global Ocean Sampling Expedition: Largest Metagenomic Study to Date

  9. Global Ocean Sampling (GOS) • 178 total sampling locations • Phase 1: 41 samples, 7.7M reads, >6M proteins • Diverse environments: open ocean, estuary, embayment, upwelling, fringing reef, atoll, warm seep, mangrove, fresh water, biofilms, sediments, soils

  10. GOS Protein Analysis: Yooseph et al. (PLoS 2007) • Novel clustering process • Sequence similarity based • Predict proteins and group into related clusters • Include GOS and all known proteins • Findings • GOS proteins cover nearly all existing prokaryotic families • GOS expands diversity of known protein families • 1700 large novel clusters with no homology to known protein families • Higher than expected proportion of novel clusters are viral • No saturation in the rate of novel protein family discovery
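Yooseph et al. describe a multi-stage clustering process, and the slide gives no algorithmic detail. To illustrate only the general idea of similarity-based grouping, the sketch below assumes pairwise hits (for example from an all-against-all BLAST run) are already available as `(id_a, id_b, identity)` tuples and forms single-linkage clusters as connected components using union-find. The function name and the identity threshold are hypothetical; this is not the published GOS method.

```python
def cluster_by_similarity(ids, hits, min_identity):
    """Single-linkage clustering: connected components of the graph whose
    edges are pairwise hits with identity >= min_identity."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= min_identity:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Sorted output makes results deterministic and easy to compare.
    return sorted(sorted(members) for members in clusters.values())
```

Single linkage is chosen here only because it is the simplest similarity-based grouping; it can chain distantly related sequences together, which is one reason production pipelines add further filtering stages.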

  11. Added Diversity [Figure, panels: Rubisco homologs; UVDE homologs. GOS eukaryotic, prokaryotic and viral sequences shown alongside known eukaryotic, prokaryotic and viral sequences; labeled organisms include H. marismortui, D. radiodurans, D. psychrophila, B. halodurans, T. thermophilus, B. anthracis]

  12. Rate of Protein Discovery
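One way to picture the rate of discovery is a collector's (rarefaction-style) curve: add sequences in random order and count how many distinct clusters have been seen so far. A curve that keeps climbing instead of flattening corresponds to the finding that novel family discovery has not saturated. The sketch assumes every sequence already carries a cluster label; the function name and its `step` and `seed` parameters are hypothetical.

```python
import random


def discovery_curve(cluster_labels, step, seed=0):
    """Count distinct clusters seen as sequences are added in random order.

    cluster_labels: one cluster label per sequence.
    Returns (n_sequences, n_distinct_clusters) pairs every `step`
    sequences, plus a final point for the full set.
    """
    order = list(cluster_labels)
    random.Random(seed).shuffle(order)  # reproducible random sampling order
    seen = set()
    curve = []
    for n, label in enumerate(order, start=1):
        seen.add(label)
        if n % step == 0 or n == len(order):
            curve.append((n, len(seen)))
    return curve
```

In practice such curves are usually averaged over several random orders (different `seed` values) to smooth out the effect of any single ordering.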

  13. Fragment Recruitment Viewer: Rusch et al., PLoS, 3/2007 [Figure: reads plotted by reference genome coordinates (x-axis) and percent identity, 50% to 100% (y-axis); callouts: “core” genome, ~75% identical; ribosomal operon; sequence absent from most strains (phage/other lateral transfer?)]
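A fragment recruitment plot places each metagenomic read at the reference coordinate where it aligns and at the height of its percent identity. Assuming alignments are already computed and given as two equal-length gapped strings (`-` marking gaps), the sketch below derives those plot points. Percent identity here is identical columns divided by alignment length, which is only one of several conventions; the record layout and the function names are hypothetical.

```python
def percent_identity(aligned_read, aligned_ref):
    """Percent identity of a pairwise alignment given as two equal-length
    strings with '-' for gaps: identical non-gap columns / all columns."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")
    if not aligned_read:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_read, aligned_ref)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_read)


def recruitment_points(alignments):
    """Map (read_id, ref_start, aligned_read, aligned_ref) records to the
    (reference coordinate, percent identity) points a recruitment plot draws."""
    return [(start, percent_identity(read, ref))
            for _, start, read, ref in alignments]
```

Plotting these points for all reads recruited to one genome gives the banded picture on the slide: a dense band where the sample's population closely matches the reference, and gaps where the reference has sequence absent from most strains.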

  14. Why CAMERA? • Public repositories not focused on environmental metagenomics • Sargasso Sea data underutilized by the community • Millions of dollars invested in sequencing and analysis, but results accessible only to a bioinformatics elite • Release of GOS dataset in March 2007 • Comply with the Convention on Biological Diversity

  15. CAMERA – http://camera.calit2.net • “Convenient acronym for cumbersome name…” • Henry Nichols, PLoS Biology • Mission • Enable Research in Marine Microbiology • CAMERA Partners:

  16. Challenges • Enormous datasets with high gene density • large compute resources required • 2 orders of magnitude jump • Fragmentary data • inadequate bioinformatics tools for assembly, annotation, analysis, visualization • Metadata standards non-existent • metadata absent from databases • Lack of standards impedes collection of datasets • Diversity of User Sophistication and Needs

  17. CAMERA Services • Maintain searchable sequence collections • ALL metagenomic sequence reads, assemblies • Non-identical amino acid collection (extended NRAA) • Viral, Fungal, pico-Eukaryotes, Microbial • CAMERA protein clusters • Metagenomics data easily downloadable • Interactive and Batch Search Facility • Scalable parallel implementations of BLAST • Integrated with associated metadata
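A non-identical (non-redundant) amino acid collection keeps each distinct sequence once while remembering every identifier that carried it. A minimal sketch, assuming exact string identity after case and whitespace normalization is the redundancy criterion (real pipelines may key on checksums or apply additional rules); the function name is hypothetical.

```python
def build_nonredundant(proteins):
    """Collapse identical amino acid sequences (exact matches only).

    proteins: iterable of (protein_id, sequence) pairs.
    Returns a list of (sequence, [ids sharing it]) in first-seen order.
    """
    members = {}  # dicts keep insertion order in Python 3.7+
    for pid, seq in proteins:
        key = seq.strip().upper()  # ignore case and surrounding whitespace
        members.setdefault(key, []).append(pid)
    return list(members.items())
```

Searching the collapsed collection instead of the raw one means a BLAST query against it returns each distinct sequence once, while the identifier lists still let results be traced back to every source sample and its metadata.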

  18. Distinctive Feature Set in Progress • Graphical Tools for Visualizing Diversity • Based on Rusch et al • Fragment recruitment viewer • CAMERA Protein Clusters • Based on Yooseph et al • Incremental version implemented in 2007 • Annotation • Break through quadratic complexity via clusters • Phyletic Classification • Overviews of sequence collections
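The slide only names the idea of using clusters to break quadratic cost. One common approach, assumed here for illustration, is to compare each incoming sequence against a single representative per cluster instead of against every stored sequence, so the per-sequence cost grows with the number of clusters rather than the number of sequences. Here k-mer Jaccard similarity stands in for a real alignment score such as BLAST, and the function names, `k`, and `threshold` are hypothetical.

```python
def kmers(seq, k):
    """Set of overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_incrementally(new_seqs, reps, threshold=0.5, k=3):
    """Assign each (seq_id, sequence) to the most similar cluster
    representative in `reps` (dict: cluster_id -> sequence), or open a new
    cluster when no representative reaches `threshold`. `reps` is updated
    in place; returns dict seq_id -> cluster_id."""
    assignment = {}
    for sid, seq in new_seqs:
        query = kmers(seq, k)
        best, best_score = None, threshold
        # Only representatives are compared, not every stored sequence.
        for cid, rep in reps.items():
            score = jaccard(query, kmers(rep, k))
            if score >= best_score:
                best, best_score = cid, score
        if best is None:
            # No representative is close enough: this sequence seeds a cluster.
            n = len(reps)
            while f"C{n}" in reps:
                n += 1
            best = f"C{n}"
            reps[best] = seq
        assignment[sid] = best
    return assignment
```

Any annotation stored per cluster can then be looked up for each newly assigned sequence rather than recomputed, which is where the savings for annotation come from.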

  19. Fragment Recruitment Viewer: Metagenomic Sequence vs. Reference Sequence • Highlight and select with associated metadata • View large datasets • AJAX interface • Based on Doug Rusch’s viewer
