Microbial Genomics. Nikos Kyrpides, Genome Biology Program (GBP), DOE Joint Genome Institute
4319 Genome Projects http://www.genomesonline.org/
UNDERSTANDING vs INFORMATION: Do we really need more sequencing?
Genome projects 2000: 71 bacterial genomes. Genome projects 2008: 2289 bacterial genomes. [Chart label: 11%] Not much has changed…
Many gaps: poor sequence coverage is mainly due to a lack of isolates, but many gaps have representatives that have not yet been sequenced
99% of microorganisms are not cultured with present methods. Culturable vs. unculturable: the uncultured majority
Major Transitions in the 2nd Decade in Genomics. Transition 1: Genomes → Metagenomes
Genomics → Metagenomics. Acid Mine Drainage: Tyson et al. Nature. 2004 Mar 4;428(6978):37-43. Sargasso Sea: Venter et al. Science. 2004 Apr 2;304(5667):66-74.
Species Complexity. Binning? [Figure: Soil, Sargasso Sea, Termite Hindgut, Human Gut, and Acid Mine Drainage placed along a species-complexity axis marked 1, 10, 100, 1000, 10000]
Major Transitions in the 2nd Decade in Genomics. Transition 1: Genomes → Metagenomes. Transition 2: Individual Genome Projects → Large Scale Projects
Reference Genomes: Individual Genome Projects → Large Scale Projects
GEBA (2007): 255 Genome Projects, ~60 Finished. Coordination Group: Jonathan Eisen, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides
Novel Cellulolytic Enzymes
[Figure labels: cellulose, non-reducing end, reducing end; endoglucanases; processive exoglucanase (reducing end); processive exoglucanase (non-reducing end); processive endoglucanase; cellotetraose; cellopentaose; cellobiose; β-glucosidase; Glc]
* Only 1 other enzyme of this family is known
Complete cellulase systems of different compositions found in 4 genomes:
• Xylanimonas cellulosilytica (known cellulose degrader, proteomics data in multiple conditions to come)
• Cellulomonas flavigena (known cellulose degrader, proteomics data in multiple conditions to come)
• Catenulispora acidiphila (not known to degrade cellulose, collaborators are doing experiments)
• Streptosporangium roseum (not known to degrade cellulose)
Major Transitions in the 2nd Decade in Genomics. Transition 1: Genomes → Metagenomes. Transition 2: Individual Genome Projects → Large Scale Projects. Transition 3: Populations → Single Cells
GREAT CHALLENGES (P. Chain et al. Science, 2009)
Human Gut: ~1000 Microbes, 3 Million Genes. GREAT CHALLENGES: Metagenomics – Environmental Genomics
ENGINEERING CONCEPTUAL SOLUTIONS. Where do we go from here?
GREAT CHALLENGES: DATA PROCESSING, DATA COMPARISON, DATA BROWSING
• Need better ways
  • to represent multiple genomes
  • to store and present data
  • to compute similarities
  • to represent an organism
Transition 4: 1000s of Genomes → Pangenomes
Pangenomes [figure labels]: Prochlorococcus marinus (10), Listeria monocytogenes (17), Staphylococcus aureus (15)
14765 / 2733 = 5.4; 10434 / 5820 = 1.8
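The slide gives only the two quotients, without labels. As a rough illustration of how such a ratio could arise, the sketch below assumes (this is not stated on the slide) that each quotient compares the size of a species pangenome with the size of its core genome. The function name, strain names, and gene-family identifiers are all made up for the example.

```python
from functools import reduce

def pangenome_stats(strains):
    """Return (pangenome size, core size, pangenome/core ratio).

    `strains` maps a strain name to the set of gene families found in it.
    Assumes at least one strain is given.
    """
    gene_sets = list(strains.values())
    pangenome = set().union(*gene_sets)          # families seen in any strain
    core = reduce(set.intersection, gene_sets)   # families shared by every strain
    ratio = len(pangenome) / len(core) if core else float("inf")
    return len(pangenome), len(core), ratio

# Hypothetical toy data: three strains, six gene families in total.
strains = {
    "strain_A": {"f1", "f2", "f3", "f4"},
    "strain_B": {"f1", "f2", "f5"},
    "strain_C": {"f1", "f2", "f3", "f6"},
}
pan_size, core_size, ratio = pangenome_stats(strains)
print(pan_size, core_size, ratio)  # 6 2 3.0

# The arithmetic shown on the slide, rounded to one decimal place.
print(round(14765 / 2733, 1), round(10434 / 5820, 1))  # 5.4 1.8
```

Under that assumed reading, a larger ratio would mean that more of a species' gene repertoire lies outside the set shared by all sequenced strains.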
PARADIGM SHIFT. [Timeline: 1960-1990, 1990-2010, 2010-2020; labels: 16S RNA, Genomes, Pangenomes]
Genomic Standards Consortium (Dawn Field)
Metadata
• Habitat
• DNA Source
• Isolation
• Phenotype
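The four metadata categories above can be made concrete with a small record type that reports which fields are still empty. This is only an illustrative sketch: the class name, field names, and example values are hypothetical and do not reproduce any official GSC checklist.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class GenomeMetadata:
    """Hypothetical record holding the four categories named on the slide."""
    habitat: Optional[str] = None
    dna_source: Optional[str] = None
    isolation: Optional[str] = None
    phenotype: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the fields that are unset or empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# Made-up example: two of the four categories are filled in.
record = GenomeMetadata(habitat="marine sediment", dna_source="isolate culture")
print(record.missing())  # ['isolation', 'phenotype']
```

A check like this could flag incomplete submissions before they are stored, which is the kind of consistency a shared standard is meant to provide.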
GSC-8 | DOE-JGI | Sept 9-11, 2009 http://gensc.org/ Genomic Standards Consortium
[Chart labels: Type Strains: 27%; Type Strains: 20%; Culture Collections: 35%, 53%] http://www.genomesonline.org/
http://standardsingenomics.org/ (July 20, 2009). SIGS (Standards in Genomic Sciences) is an open-access, standards-supportive publication that seeks to rapidly disseminate concise genome and metagenome reports in compliance with MIGS/MIMS standards. SIGS also seeks to present detailed standard operating procedures, meeting reports, reviews and commentaries, data policies, white papers, and other gray literature that is relevant to genome sciences but absent from the scholarly literature. (George Garrity)
Genomic Standards Consortium
Data Processing
• Sequencing
• Finishing
• Assembly
• Gene Finding
Metadata
• Habitat
• DNA Source
• Isolation
• Phenotype
Patrick Chain
Conclusions
• Microbial diversity remains largely unexplored
• The vast majority of currently ongoing genome projects do not cover novel ground
• To understand an organism we need to sequence a reasonable number of closely related strains
• We need Standards
Global Genome Census for Microbes
CENSUS TARGETS
• Sequence at least one representative from every characterized microbial genus
• Sequence at least one representative from every characterized microbial species
• Sequence a sufficient number of strains to characterize each species and generate the species pangenome
• Understand the effects of geographic distribution on species dynamics
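Progress toward the first two targets could be tracked with a simple gap report over a catalog of characterized taxa. The sketch below is a minimal illustration under assumed inputs: the catalog layout (genus, species, number of sequenced strains), the function name, and the placeholder taxa are hypothetical and do not correspond to any real database.

```python
from collections import defaultdict

def census_gaps(catalog):
    """Find genera and species with no sequenced representative.

    `catalog` is an iterable of (genus, species, n_sequenced_strains) tuples.
    Returns (genera with zero sequenced strains, species with zero sequenced strains).
    """
    strains_per_genus = defaultdict(int)
    species_gaps = []
    for genus, species, n_sequenced in catalog:
        strains_per_genus[genus] += n_sequenced
        if n_sequenced == 0:
            species_gaps.append(f"{genus} {species}")
    genus_gaps = sorted(g for g, total in strains_per_genus.items() if total == 0)
    return genus_gaps, sorted(species_gaps)

# Hypothetical catalog with placeholder names.
catalog = [
    ("Genus_a", "species_1", 3),
    ("Genus_a", "species_2", 0),
    ("Genus_b", "species_1", 0),
]
genus_gaps, species_gaps = census_gaps(catalog)
print(genus_gaps)    # ['Genus_b']
print(species_gaps)  # ['Genus_a species_2', 'Genus_b species_1']
```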
KEY PARTNERS
• GSC
• CULTURE COLLECTION CENTERS
  • DSMZ [Hans-Peter Klenk]
• REPRESENTATIVES FROM GRAND CHALLENGE PROJECTS
  • GEBA [Phil Hugenholtz, Jonathan Eisen]
  • TERRAGENOME [Janet Jansson]
  • HMP [George Weinstock, Karen Nelson]
• KEY CORE PARTICIPANTS
  • Large Sequencing Centers
  • Country members
GSC Biocomputing Consortium (Folker Meyer): Building a roadmap for a scalable and sustainable computing MetaInfrastructure for the metagenomics community. Innovation through collaboration
MEGA: Microbial Environmental Genomics Administration