Phenbank George M. Garrity Microbiology and Molecular Genetics & Bergey’s Manual Trust Michigan State University
GenBank • DDBJ/EMBL • Limited data types • Universally applicable across all taxa • Cumulative • Controlled vocabulary • Links to primary literature • Suite of robust tools • Strong public support • Large user base • Funding • Data curation weak
Taxonomic data sources • Unlimited data types • Some broadly applicable across all taxa, most are not • Some are cumulative, many are comparative • Numerous taxon-specific vocabularies • Few links to primary literature or original data sets • Tools of variable quality, most are “one-off” • Limited public support • User bases vary with economic importance • Funding poor to nonexistent • Data curation variable
Goals of presentation • Phenbank doesn’t exist (yet) • What might it provide to the community? • Share some thoughts on what it might take to create such a resource • Will borrow heavily from the Field and Hughes commentary in Microbiology • Technical issues • What data and information currently exist • What’s on the horizon • Sociological issues • Importance of the primary literature • Data provenance and other hurdles • Data curation • Self-supporting vs public funding
Four synergistic projects at MSU • Compilation and publication of the principal monographic work in prokaryotic biology • The major source of curated 16S rRNA sequences and on-line tools used in building prokaryotic phylogenies and identifying cultivated and yet-to-be-cultivated prokaryotes • Visualization tools for exploratory data analysis of large sequence data sets, a taxonomic atlas of the prokaryotes, and a repository of vetted 16S sequences • Semantic resolution services for life sciences using digital object identifiers • Phenbank?
Bergey’s Manual of Systematic Bacteriology • Print publication • Five volumes (approximately 6000 pages when completed) • Compilation with contributions by over 800 authors to date • Focus • Predominantly types of validly published named taxa • Organization • Nomenclatural taxonomy following 16S rRNA gene tree • Genus treatments • Etymology, defining publication(s), fourteen major categories (variable) plus sections on enrichment and isolation, maintenance, procedures for special testing, differential features, taxonomic comments, and lists of validly published species, effectively published (invalid), and other organisms, and species incertae sedis • Species lists • Etymology, defining publication(s), key characteristics including culture collection accession numbers, GenBank accessions, and key differential characteristics • Higher taxon treatments • Etymology, defining publication(s), common characteristics and membership
A peek inside the Manual • The Manual is produced electronically • Custom SGML DTD (Lyons, Garrity and Usdin) • 600 elements in context • Formatting done using FOSI • Content in unconstrained English • Manuscript -> tagged instance -> print • Manuscript -> HTML • Key features reported at genus level • Constant content • Enrichment/isolation, maintenance, special methods, taxonomic comments, species lists (valid, invalid, species incertae sedis) • Variable content • Antimicrobial sensitivity, cell morphology, cell wall composition, cultural characteristics, ecology, fine structure, genetics, growth, metabolism, mutants, pathogenesis, physiology, serological reactivity • Extensive linked bibliography, figures, tables
Seeing the gene can fake you out… Ken Nealson, Microbial Environmental Genomics Workshop I
However, • The Manual is not a substitute for the primary literature • Contains information and summarized interpretations of the literature by experts • Does not currently cover • Yet-to-be-cultivated taxa (with the exception of Candidatus species) • Environmental sequences • Communities or mixed cultures • Invalidly named and unnamed taxa • With the exception of sequence identifiers, does not provide direct access to raw data • Is static, so changes in taxonomic view cannot be readily conveyed* • So, the Manual provides a good foundation for Phenbank, but much is still missing.
Orphan taxa
Streptomyces 544
Clostridium 179
Bacillus 167
Pseudomonas 161
Lactobacillus 136
Mycoplasma 120
Mycobacterium 119
Corynebacterium 96
Streptococcus 95
Vibrio 74
10 – 73 species: 136 genera
5 – 9 species: 163 genera
2 – 4 species: 368 genera
Orphans: 651 genera
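The counts on this slide can be turned into a share of orphan genera. The sketch below is only an illustration under two assumptions that the slide does not state outright: that the ten named genera are the only ones with more than 73 species, and that an "orphan" is a genus containing a single species.

```python
# Species counts for the ten largest genera, copied from the slide.
LARGEST_GENERA = {
    "Streptomyces": 544, "Clostridium": 179, "Bacillus": 167,
    "Pseudomonas": 161, "Lactobacillus": 136, "Mycoplasma": 120,
    "Mycobacterium": 119, "Corynebacterium": 96, "Streptococcus": 95,
    "Vibrio": 74,
}

# Number of genera in each remaining size class, copied from the slide.
GENERA_BY_SIZE_CLASS = {
    "10-73 species": 136,
    "5-9 species": 163,
    "2-4 species": 368,
    "orphans": 651,  # assumed here to mean single-species genera
}


def total_genera(largest: dict, size_classes: dict) -> int:
    """Count every genus once: the named genera plus all size classes."""
    return len(largest) + sum(size_classes.values())


def orphan_share(largest: dict, size_classes: dict) -> float:
    """Fraction of all genera that fall in the orphan class."""
    return size_classes["orphans"] / total_genera(largest, size_classes)


if __name__ == "__main__":
    n = total_genera(LARGEST_GENERA, GENERA_BY_SIZE_CLASS)
    share = orphan_share(LARGEST_GENERA, GENERA_BY_SIZE_CLASS)
    print(f"{n} genera, orphan share {share:.0%}")
```

Under those assumptions, roughly half of the genera are orphans, which is the point the slide title appears to make.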
Species frequency by phylum
Proteobacteria 2542 2
Actinobacteria 1890 4
Firmicutes 1653 3
Bacteroidetes 362 5
Euryarchaeota 247 1
Spirochaetes 102 5
Cyanobacteria 82 1
Crenarchaeota 51 1
Fusobacteria 42 5
Deinococcus-Thermus 29 1
Thermotogae 26 1
Chlorobi 21 1
Aquificae 20 1
Chlamydiae 17 5
Chloroflexi 14 1
Verrucomicrobia 12 5
Planctomycetes 12 5
Deferribacteres 9 5
Nitrospira 8 1
Thermodesulfobacteria 6 1
Fibrobacteres 3 5
Acidobacteria 3 5
Thermomicrobia 2 1
Dictyoglomi 2 5
Gemmatimonadetes 1 5
Chrysiogenetes 1 1
Wouldn’t it be nice if… • End-user’s perspective • Biological names were really useful • Would link to… • Relevant literature • Sequences • Other phenotypic data • Sources of strains in Biological Resource Centers • Ancillary materials • Patents • Laws and regulations • Regardless of where the data resides • Without having to know anything about • Synonymies • Orthographic variants • Misapplications of the name
Categories and properties of identifiers • A label that identifies an entity • A single unambiguous string • A formal standard or industry convention • Arbitrary • Consistent syntax • Denotes and distinguishes separate members of a class of entities • Establishes a 1:1 correspondence between labels and members • Enumeration • The number or label is simply a string A numbering scheme Adapted from: Paskin, N. (2005) The DOI Handbook Edition 4.2.0
Categories and properties of identifiers • A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure. • Actionable identifiers • URI (URN and URL) • ISBN numbers as UPC/EAN identifiers • Does not mandate a method of creating labels • Does not create a managed environment An infrastructure specification
Categories and properties of identifiers • Includes • Unique identifiers • A formalized infrastructure • Management policies • Examples • UPC/EAN barcodes and RFID tags • Digital object identifiers A system for implementing labels
What’s a DOI? • DOI syntax • Can include any existing identifier, formal or informal, of any entity • Opaque Numbering • Resolve from DOI to data • Initially a location (URL; not persistent) • May be to multiple data including multiple locations, metadata, services • Extensible • Based on the Handle System • Implements the URI/URN concept • Granularity, scalability, administration, and security Resolution
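The two halves of this slide, numbering and resolution, can be sketched in a few lines. This is a minimal illustration, assuming the standard DOI name form of a prefix beginning with "10." and a suffix separated from it by the first slash. The prefix 10.9999 and the idea of wrapping a culture collection number as the suffix are hypothetical examples, and https://doi.org/ is used as the resolver proxy, which the slides do not name.

```python
from typing import NamedTuple
from urllib.parse import quote


class Doi(NamedTuple):
    prefix: str  # registrant part, always starts with "10."
    suffix: str  # opaque string; may wrap any existing identifier


def parse_doi(text: str) -> Doi:
    """Split a DOI name into prefix and suffix at the first slash."""
    name = text.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    prefix, sep, suffix = name.partition("/")
    if not sep or not suffix or not prefix.startswith("10.") or len(prefix) <= 3:
        raise ValueError(f"not a DOI name: {text!r}")
    return Doi(prefix, suffix)


def resolver_url(doi: Doi, proxy: str = "https://doi.org/") -> str:
    """Build an actionable URL; the suffix is percent-encoded, slashes kept."""
    return proxy + doi.prefix + "/" + quote(doi.suffix, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI that wraps an existing strain identifier as its suffix.
    doi = parse_doi("doi:10.9999/ATCC 27126")
    print(doi.prefix, doi.suffix)
    print(resolver_url(doi))
```

The code never interprets the suffix, which mirrors the "opaque" property on the slide: what a DOI points to is decided at resolution time by the registry rather than being read from the string.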
Authority+ Name+ Taxon Species+ Strain+ Sequence+
GenBank DDBJ EMBL others Collections BRC Literature Governing bodies Authority+ Name+ Taxon Species+ Strain+ Sequence+
Taxon Priority Proposal Source+ Validity Literature Governing bodies STM Synonymy Legal Exemplar req. General Authority+ Databases Name+ Public Private Species+ Strain+ Sequence+ direct Source+ GenBank DDBJ EMBL others Source+ Collections BRC indirect phenotypic phenotypic “omics” BRC
A properly formed species: Name+ Species+ Strain+ Sequence+
Candidatus or exemplar lost: Name+ Species+ Sequence+
Environmental sequence: Sequence+
Old type strain, not yet sequenced: Name+ Species+ Strain+
Misidentified taxon: “Name”+ Strain* Sequence+
Old type, exemplar based on drawing or description: Name+ Species+
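One reading of this slide is that each case is defined by which of the four components (name, species, strain, sequence) is present. The sketch below encodes that reading as a lookup table. The `Record` class and `classify` function are hypothetical names introduced for illustration, and the mapping is inferred from the flattened diagram rather than stated by the author.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Which components are available for a taxon record (hypothetical model)."""
    name: bool = False
    species: bool = False
    strain: bool = False
    sequence: bool = False


# Keys are (name, species, strain, sequence); labels follow the slide text.
CASES = {
    (True, True, True, True): "properly formed species",
    (True, True, False, True): "Candidatus or exemplar lost",
    (False, False, False, True): "environmental sequence",
    (True, True, True, False): "old type strain, not yet sequenced",
    (True, False, True, True): "misidentified taxon",
    (True, True, False, False): "old type, exemplar based on drawing or description",
}


def classify(record: Record) -> str:
    """Return the slide's label for a record, or a fallback if none applies."""
    key = (record.name, record.species, record.strain, record.sequence)
    return CASES.get(key, "not covered by the slide")


if __name__ == "__main__":
    print(classify(Record(name=True, species=True, strain=True, sequence=True)))
    print(classify(Record(sequence=True)))
```

A lookup table keeps the six cases explicit, so any combination the slide does not mention falls through to the fallback label instead of being silently forced into one of the named cases.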
Differing opinions… Name+ Name+ Name+ Strain+ Strain+ Taxon Taxon Taxon Species+ Sequence+ Sequence+ Strain+ Sequence+ Homotypic synonymy Heterotypic synonymy
Species+ Species+ Species+ Van Landschoot and De Ley 1984 Marinomonas communis Alteromonas communis Oceanospirillum commune Basonym Synonym Bauman et al. 1972 emend. Yi et al. 2004 Marinomonas Alteromonas communis 107 ATCC 27126 DSM 6062 Y18228 Alteromonas Bowditch et al. 1984 Oceanospirillum commune Alteromonas communis Marinomonas communis 107 ATCC 27126 DSM 6062 Y18228 Paired 16S sequence, other data Oceanospirillum Type strain 107 ATCC 27126 DSM 6062 Y18228
Species+ Species+ Species+ Simidu et al., 1990 emend. Nozue et al. 1992 MacDonell and Colwell 1986 Shewanella algae Shewanella alga (corrig.) Shewanella putrifaciens Alteromonas putrifaciens Shewanella OK-1 ACAM 541 ATCC 51192AF005249 IAM 14159 U91546 Hammer 95 ATCC 8071 X82133 DSM 6067 ICPB 352 LMG 2268 NCIB 1047 OK-1 ACAM 541 ATCC 51192 AF005249 IAM 14159 U91546 ICP1 U85903 ACAM 591 U85903 DSM 12253 Shewanella Bowman et al. 1997 Shewanella frigidmarina Shewanella ICP1U85903 ACAM 591U85903 DSM 12253Z
Species+ Species+ Species+ Romanenko et al. 1995 Gauthier et al. 1977 Alteromonas citrea Alteromonas fulginea Alteromonas fulginea Alteromonas citrea Alteromonas Alteromonas ATCC 29719 DSM 6058 NCIMB 188 X82137 CIP 105339 AF529062 KMM 216 AF082563 CIP 105339 AF529062 KMM 216 AF082563 (Gauthier 1977) Gauthier et al. 1995 emend. Ivanova et al. 1998 Pseudoalteromonas citrea Alteromonas citrea Pseudoalteromonas ATCC 29719 DSM 6058 NCIMB 188 X82137 CIP 105339 AF529062 KMM 216 AF082563
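The difference between homotypic and heterotypic synonymy in the preceding examples can be expressed as a small data model. This is a sketch under stated assumptions: the names and strain numbers are read from the slides, one strain number per name is used as a stand-in for the full list of equivalent accessions, and `TYPE_STRAIN`, `UNITED_BY_OPINION` and `relation` are hypothetical names. Names that share a type strain are homotypic synonyms. Names with different type strains that a taxonomic opinion places in one taxon are heterotypic synonyms.

```python
from collections import defaultdict

# Nomenclatural facts: each name is tied to one type strain (from the slides).
TYPE_STRAIN = {
    "Alteromonas communis": "ATCC 27126",
    "Oceanospirillum commune": "ATCC 27126",
    "Marinomonas communis": "ATCC 27126",
    "Alteromonas citrea": "ATCC 29719",
    "Pseudoalteromonas citrea": "ATCC 29719",
    "Alteromonas fuliginea": "KMM 216",
}

# Taxonomic opinions: sets of names that one published view treats as a
# single taxon. Only the grouping drawn on the slide is modelled here.
UNITED_BY_OPINION = [
    frozenset({"Pseudoalteromonas citrea", "Alteromonas citrea",
               "Alteromonas fuliginea"}),
]


def homotypic_groups() -> dict:
    """Group names by shared type strain."""
    groups = defaultdict(set)
    for name, strain in TYPE_STRAIN.items():
        groups[strain].add(name)
    return dict(groups)


def relation(a: str, b: str) -> str:
    """Classify the relationship between two names."""
    if TYPE_STRAIN[a] == TYPE_STRAIN[b]:
        return "homotypic"
    if any(a in taxon and b in taxon for taxon in UNITED_BY_OPINION):
        return "heterotypic"
    return "distinct"


if __name__ == "__main__":
    print(relation("Alteromonas communis", "Marinomonas communis"))
    print(relation("Alteromonas fuliginea", "Pseudoalteromonas citrea"))
```

Keeping the type-strain table separate from the opinion list reflects the distinction drawn in the slides: homotypic synonymy follows from fixed nomenclatural facts, while heterotypic synonymy depends on which taxonomic opinion is adopted and can therefore change.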
1972 Alteromonas macleodii(T) communis vaga
1972 1973 Alteromonas macleodii(T) communis vaga haloplanktis
1972 1973 1976 Alteromonas macleodii(T) communis vaga haloplanktis rubra
1972 1973 1976 1977 Alteromonas macleodii(T) communis vaga haloplanktis rubra citrea
1972 1973 1976 1977 1978 Alteromonas macleodii(T) communis vaga haloplanktis rubra citrea esperjiana undina
1972 1973 1976 1977 1978 1979 Alteromonas macleodii(T) communis vaga haloplanktis rubra citrea esperjiana undina aurantia
1972 1973 1976 1977 1978 1979 1981 Alteromonas macleodii(T) communis vaga haloplanktis rubra citrea esperjiana undina aurantia putrifaciens hanedai
1972 1973 1976 1977 1978 1979 1981 1982 Alteromonas macleodii(T) communis vaga haloplanktis rubra citrea esperjiana undina aurantia putrifaciens hanedai luteoviolaceae
Oceanospirillum Marinomonas linum(T) communis(T) japonicum minutium biejerinckii maris maris maris williamsae hiroshimense multiglobiferum pelagicum pusillum jannaschii kreigii 1972 1973 1976 1977 1978 1979 1981 1982 1984 Alteromonas macleodii(T) vaga communis vaga haloplanktis rubra citrea esperjiana undina aurantia putrifaciens hanedai luteoviolaceae commune vagum • Nomenclatural issues • Homotypic synonymy • Priority • Rule 37(a) 1 • Data issues • One-to-many relationship • Taxonomic issue • Which one is right?
Shewanella putrifaciens(T) 1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 Oceanospirillum Marinomonas Alteromonas linum(T) communis(T) macleodii(T) japonicum vaga communis benthica minutium hanedai vaga biejerinckii haloplanktis maris maris rubra citrea maris williamsae esperjiana undina hiroshimense aurantia multiglobiferum putrifaciens pelagicum hanedai pusillum luteoviolaceae commune jannaschii kreigii vagum
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 Oceanospirillum Marinomonas Alteromonas Shewanella linum(T) communis(T) putrifaciens(T) macleodii(T) japonicum vaga communis benthica minutium hanedai vaga biejerinckii haloplanktis maris maris rubra citrea maris williamsae esperjiana undina hiroshimense aurantia multiglobiferum putrifaciens pelagicum hanedai pusillum luteoviolaceae commune denitrificans jannaschii kreigii vagum
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 Oceanospirillum Marinomonas Alteromonas Shewanella linum(T) communis(T) putrifaciens(T) macleodii(T) japonicum vaga communis benthica minutium hanedai vaga biejerinckii haloplanktis maris maris rubra citrea maris williamsae esperjiana undina hiroshimense aurantia multiglobiferum putrifaciens pelagicum hanedai pusillum luteoviolaceae commune denitrificans jannaschii colwelliana kreigii vagum
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 Oceanospirillum Marinomonas Alteromonas Shewanella linum(T) communis(T) putrifaciens(T) macleodii(T) japonicum vaga communis benthica minutium hanedai vaga biejerinckii colwelliana haloplanktis maris maris rubra citrea maris williamsae esperjiana undina hiroshimense aurantia multiglobiferum putrifaciens pelagicum hanedai pusillum luteoviolaceae commune denitrificans jannaschii colwelliana kreigii tetradonis vagum biejerinckii pelagicum maris hiroshimense
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 Oceanospirillum Marinomonas Alteromonas Shewanella linum(T) communis(T) putrifaciens(T) macleodii(T) japonicum vaga communis benthica minutium hanedai vaga biejerinckii colwelliana haloplanktis maris maris algae rubra citrea maris williamsae esperjiana undina hiroshimense aurantia multiglobiferum putrifaciens pelagicum hanedai pusillum luteoviolaceae commune denitrificans jannaschii colwelliana kreigii tetradonis vagum atlantica biejerinckii pelagicum carageenovora maris hiroshimense • Nomenclatural issue • Non-type strains
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 Oceanospirillum Marinomonas Alteromonas Shewanella linum(T) communis(T) putrifaciens(T) macleodii(T) japonicum vaga communis benthica minutium hanedai vaga biejerinckii colwelliana haloplanktis maris maris algae rubra citrea maris williamsae esperjiana undina hiroshimense aurantia multiglobiferum putrifaciens pelagicum hanedai pusillum luteoviolaceae commune denitrificans jannaschii colwelliana kreigii tetradonis vagum atlantica biejerinckii pelagicum carageenovora distincta maris hiroshimense fuliginea • Nomenclatural issues • Heterotypic synonymy • Data issue • Many-to-many relationship • Taxonomic issue • Which one is right?
Pseudoalteromonas haloplanktis haloplanktis(T) nigrifaciens pisicida 1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 Oceanospirillum Marinomonas Alteromonas Shewanella linum(T) communis(T) putrifaciens(T) macleodii(T) japonicum vaga communis benthica haloplanktis tetradonis minutium hanedai vaga biejerinckii colwelliana haloplanktis atlantica maris maris algae rubra aurantia citrea maris williamsae carrageenovora esperjiana citrea undina hiroshimense esperjiana aurantia multiglobiferum luteoviolacea putrifaciens pelagicum hanedai pusillum luteoviolaceae commune rubra denitrificans jannaschii undina colwelliana kreigii tetradonis vagum atlantica biejerinckii pelagicum carageenovora distincta maris hiroshimense fuliginea
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 1997 Oceanospirillum Marinomonas Alteromonas Shewanella Pseudoalteromonas linum(T) communis(T) putrifaciens(T) haloplanktis haloplanktis(T) macleodii(T) japonicum vaga communis benthica haloplanktis tetradonis minutium hanedai vaga biejerinckii colwelliana haloplanktis atlantica maris maris algae rubra aurantia citrea maris williamsae carrageenovora esperjiana citrea undina hiroshimense esperjiana aurantia multiglobiferum luteoviolacea putrifaciens pelagicum nigrifaciens hanedai pusillum pisicida luteoviolaceae commune rubra denitrificans jannaschii undina colwelliana kreigii antartica tetradonis vagum atlantica biejerinckii pelagicum carageenovora distincta maris hiroshimense fulginea elyakoviii
woodyii amazonensis oneidensis pealeana violacea 1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 1997 2000 Oceanospirillum Marinomonas Alteromonas Shewanella Pseudoalteromonas linum(T) communis(T) putrifaciens(T) haloplanktis haloplanktis(T) macleodii(T) japonicum vaga communis benthica haloplanktis tetradonis minutium mediterannea hanedai vaga biejerinckii colwelliana haloplanktis atlantica maris maris algae rubra aurantia citrea fridgidimarina maris williamsae carrageenovora esperjiana geldimarina citrea undina hiroshimense esperjiana aurantia multiglobiferum luteoviolacea putrifaciens baltica pelagicum nigrifaciens hanedai pusillum pisicida luteoviolaceae commune rubra denitrificans jannaschii undina colwelliana kreigii antartica tetradonis vagum bacteriolytica atlantica biejerinckii pelagicum prydzensis carageenovora tunicata distincta maris hiroshimense distincta fulginea elyakovii elyakoviii peptidolytica
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 1997 2000 2001 Oceanospirillum Marinomonas Alteromonas Shewanella Pseudoalteromonas linum(T) communis(T) putrifaciens(T) haloplanktis haloplanktis(T) macleodii(T) japonicum vaga communis benthica haloplanktis tetradonis minutium mediterannea hanedai vaga biejerinckii colwelliana haloplanktis atlantica maris maris algae rubra aurantia citrea fridgidimarina maris williamsae carrageenovora esperjiana geldimarina citrea undina woodyii hiroshimense esperjiana aurantia amazonensis multiglobiferum luteoviolacea putrifaciens baltica pelagicum nigrifaciens hanedai oneidensis pusillum pisicida luteoviolaceae pealeana commune rubra denitrificans violacea jannaschii undina colwelliana japonica kreigii antartica tetradonis vagum bacteriolytica atlantica biejerinckii pelagicum prydzensis carageenovora tunicata distincta maris hiroshimense distincta fulginea elyakovii elyakoviii peptidolytica tetrodonis
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 1997 2000 2001 2002 Oceanospirillum Marinomonas Alteromonas Shewanella Pseudoalteromonas linum(T) communis(T) putrifaciens(T) haloplanktis haloplanktis(T) macleodii(T) japonicum vaga communis benthica haloplanktis tetradonis minutium mediterannea hanedai vaga biejerinckii colwelliana haloplanktis atlantica maris maris algae rubra aurantia citrea fridgidimarina maris williamsae carrageenovora esperjiana geldimarina citrea undina woodyii hiroshimense esperjiana aurantia amazonensis multiglobiferum luteoviolacea putrifaciens baltica pelagicum nigrifaciens hanedai oneidensis pusillum pisicida luteoviolaceae pealeana commune rubra denitrificans violacea jannaschii undina colwelliana japonica kreigii antartica tetradonis denitrificans vagum bacteriolytica atlantica livingstonensis biejerinckii pelagicum prydzensis carageenovora alleyanna tunicata distincta maris hiroshimense distincta fuliginea elyakovii elyakoviii peptidolytica tetrodonis
1972 1973 1976 1977 1978 1979 1981 1982 1984 1986 1987 1988 1990 1992 1995 1997 2000 2001 2002 2004 Oceanospirillum Marinomonas Alteromonas Shewanella Pseudoalteromonas linum(T) communis(T) putrifaciens(T) haloplanktis haloplanktis(T) macleodii(T) japonicum vaga communis benthica haloplanktis tetradonis minutium mediterannea hanedai vaga biejerinckii primoryensis colwelliana haloplanktis atlantica maris maris algae rubra aurantia citrea fridgidimarina maris williamsae carrageenovora esperjiana geldimarina citrea undina woodyii hiroshimense esperjiana aurantia amazonensis multiglobiferum luteoviolacea putrifaciens baltica pelagicum nigrifaciens hanedai oneidensis pusillum pisicida luteoviolaceae pealeana commune rubra denitrificans violacea jannaschii undina colwelliana japonica kreigii antartica tetradonis denitrificans vagum bacteriolytica atlantica livingstonensis biejerinckii pelagicum prydzensis carageenovora alleyanna tunicata distincta mariniintestina maris hiroshimense distincta fulginea saire elyakovii elyakoviii schlegeliana peptidolytica gaetbuli stellipolaris tetrodonis 5 others litorea 12 others
November 2004 May 2004 Gammaproteobacteria Alteromonadales Colwelliaceae Idiomarinaceae Alteromonadaceae Colwelliaceae Alteromonas Idiomarina Aestuariibacter Thalassomonas Alishewanella Ferrimonadaceae Colwellia Psychromonadaceae Ferrimonas Ferrimonas Psychromonas Glaciecola Idiomarina Pseudoalteromonadaceae Marinobacter 1 family 16 genera -> 8 families 12 genera 1 unclassified -> 7 unclassified Which is correct? Which is supported by the data? Incertae sedis Pseudoalteromonas Marinobacterium Agarivorans Algicola Microbulbifer Alishewanella Moritella Marinobacter Shewanellaceae Pseudoalteromonas Marinobacterium Shewanella Psychromonas Microbulbifer Shewanella Salinomonas Moritellaceae Thalassomonas Teredinibacter Moritella Incertae sedis Teredinibacter