The Future of Microalgal Taxonomy
Anne Thessen, athessen@mbl.edu
David Patterson, dpatterson@mbl.edu
(Data Conservancy, Life Sciences)
Scientist’s Dream: “Computer, what is the trajectory of the planet Seti Alpha 5?”
Taxonomist’s Dream: “How many algal species can be found on this planet?”
Taxonomist’s Dream: “What species is this?”
Setting the stage for a ‘big new biology’
• BIG = data-centric (like particle physics and astronomy)
• Characterized by data sharing via a virtual pool
• New = new skill sets, tools, and cyberinfrastructure to exploit the data pool
• Data-driven discovery as a new means of understanding
• GenBank as a model within the Life Sciences
Small science: a small number of providers hold lots of data, while a large number of providers each hold small amounts of data.
Names: Limulus polyphemus, Kiwa hirsuta, Trypanosoma brucei, Aa paleacea, Homo sapiens, Pieris rapae, Kingia australis, Pieris japonica, Osedax frankpressi
Many names for one taxon: Didymosphenia geminata is also recorded as Gomphonema vulgare, Gomphonema geminatum, Echinella geminata, Didimosphenia geminata, Didymo, and Rock snot.
Reconciliation Group: Didymosphenia geminata, Didimosphenia geminata, Didymo, Rock snot, Echinella geminata, Gomphonema geminatum, Gomphonema vulgare
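The reconciliation step can be sketched in Python. This is a minimal sketch, assuming a hand-built lookup table (the hypothetical `RECONCILIATION_GROUP`) that maps every variant on the slide to one accepted name, with Python's standard `difflib` used to catch misspellings not yet listed. The choice of accepted name and the 0.85 similarity cutoff are illustrative assumptions, not the Global Names Architecture's actual method.

```python
import difflib

# Hypothetical reconciliation group built from the names on the slide:
# every variant (scientific, misspelled, vernacular) points to one accepted name.
RECONCILIATION_GROUP = {
    "Didymosphenia geminata": "Didymosphenia geminata",
    "Didimosphenia geminata": "Didymosphenia geminata",  # misspelling
    "Echinella geminata": "Didymosphenia geminata",
    "Gomphonema geminatum": "Didymosphenia geminata",
    "Gomphonema vulgare": "Didymosphenia geminata",
    "Didymo": "Didymosphenia geminata",                   # vernacular
    "Rock snot": "Didymosphenia geminata",                # vernacular
}

def reconcile(name, group=RECONCILIATION_GROUP, cutoff=0.85):
    """Return the accepted name for `name`, or None if it matches nothing.

    An exact, case-insensitive match is tried first; failing that,
    difflib's fuzzy matching catches close spelling variants.
    """
    lookup = {variant.lower(): accepted for variant, accepted in group.items()}
    key = name.strip().lower()
    if key in lookup:
        return lookup[key]
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None
```

For example, `reconcile("Rock snot")` and `reconcile("Didimosphenia geminata")` both return `"Didymosphenia geminata"`. Keeping the table explicit lets experts curate it, while the fuzzy fallback only proposes matches for names the table has not seen.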
One name for many taxa
• Cyclophora Castracane 1878 (e.g., Cyclophora tenuis). Contextual data: diatom, chloroplast, frustule, benthic, marine
• Cyclophora Hübner 1822 (e.g., Cyclophora porata). Contextual data: food, moth, wings, exoskeleton, caterpillar
Disambiguate by authority, species, and contextual data.
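The three disambiguation cues on the slide (authority, species, contextual data) can be tried in that order. This is a minimal sketch, assuming a hypothetical `Taxon` record whose epithets and context vocabularies are copied from the slide; a real system would draw on far richer vocabularies and return candidates for expert review rather than a single answer.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Taxon:
    genus: str
    authority: str
    group: str
    epithets: frozenset       # species known to belong to this genus
    context_terms: frozenset  # words that tend to co-occur with this genus

# The two homonymous genera from the slide (hypothetical record layout).
CYCLOPHORA = [
    Taxon("Cyclophora", "Castracane 1878", "diatom", frozenset({"tenuis"}),
          frozenset({"diatom", "chloroplast", "frustule", "benthic", "marine"})),
    Taxon("Cyclophora", "Hübner 1822", "moth", frozenset({"porata"}),
          frozenset({"food", "moth", "wings", "exoskeleton", "caterpillar"})),
]

def disambiguate(name, candidates, authority=None, text=""):
    """Return the single candidate meant by `name`, or None if still ambiguous."""
    # 1. Authority: an explicit author and year settles the question.
    if authority:
        hits = [t for t in candidates if t.authority == authority]
        if len(hits) == 1:
            return hits[0]
    # 2. Species: a known epithet belongs to only one of the homonyms.
    parts = name.split()
    if len(parts) > 1:
        hits = [t for t in candidates if parts[1] in t.epithets]
        if len(hits) == 1:
            return hits[0]
    # 3. Context: count words shared with each candidate's vocabulary.
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = [len(words & t.context_terms) for t in candidates]
    best = max(scores, default=0)
    if best == 0 or scores.count(best) > 1:
        return None  # no evidence, or a tie: leave it for a human expert
    return candidates[scores.index(best)]
```

With these records, `disambiguate("Cyclophora tenuis", CYCLOPHORA)` resolves to the diatom genus by species alone, while a bare "Cyclophora" in a passage about caterpillars falls through to the contextual check.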
Global Names Architecture (diagram, top to bottom: data and service consumers; consumer services; GNA, with experts; provider services; data and service providers)
Managing names to manage biodiversity data
• All names (scientific, vernacular, surrogate)
• For all organisms
• Many names for one species: reconciled
• One name for many species: disambiguated
• Global Names Architecture: a virtual layer that uses names services to link together distributed data
• Globalnames.org
• Micro*scope (microscope.mbl.edu) and Encyclopedia of Life (eol.org)
Names-based cyberinfrastructure
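The idea of names services acting as a virtual layer over distributed data can be illustrated with a toy join. This is a minimal sketch, assuming two hypothetical providers whose records use different names for the same organism, and a hypothetical `NAMES_INDEX` dictionary standing in for a names service; the record fields, sites, and counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records held by two independent providers; each uses its own
# name for the organism.
PROVIDER_A = [{"name": "Didymosphenia geminata", "site": "River 1", "count": 40}]
PROVIDER_B = [{"name": "Didymo", "site": "River 2", "count": 12},
              {"name": "Pieris rapae", "site": "Garden", "count": 3}]

# Hypothetical names index standing in for a names service:
# maps any known name string to an accepted name.
NAMES_INDEX = {
    "Didymosphenia geminata": "Didymosphenia geminata",
    "Didymo": "Didymosphenia geminata",
    "Rock snot": "Didymosphenia geminata",
    "Pieris rapae": "Pieris rapae",
}

def link_by_name(providers, index):
    """Group records from all providers under the accepted name they resolve to.

    Records whose name the index does not know are collected under None,
    so they can be sent for curation instead of being silently dropped.
    """
    linked = defaultdict(list)
    for provider_id, records in providers.items():
        for record in records:
            accepted = index.get(record["name"])
            linked[accepted].append({**record, "provider": provider_id})
    return dict(linked)
```

Calling `link_by_name({"A": PROVIDER_A, "B": PROVIDER_B}, NAMES_INDEX)` gathers both Didymosphenia records under one key even though neither provider changed its data; the linking lives entirely in the names layer.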
Legacy Data: Biology has a narrative tradition, and its literature is too much for a human to process. Can we get a machine to do the work? NLP!
Legacy Data: Use NLP/machine learning to extract names and characters (Hong Cui)
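The slide describes NLP and machine learning; the sketch below swaps in a much cruder rule-based extractor, only to show the shape of the output. It assumes hypothetical hand-made lists of known names, character terms, and negation cues, and emits statements in the taxon:character:state form used in these slides. It is not Hong Cui's method.

```python
import re

# Hypothetical, hand-made vocabularies; a machine-learning system would
# learn name and character patterns instead of relying on fixed lists.
KNOWN_NAMES = {"Spirogyra", "Chlamydomonas"}
CHARACTER_TERMS = {"chloroplasts", "flagella", "pyrenoids"}
NEGATION_CUES = {"lack", "lacks", "without", "no"}

def extract_statements(text):
    """Return taxon:character:state strings found in `text`.

    Each sentence yields one statement per known character term, attributed
    to the first known name in that sentence. A negation cue anywhere earlier
    in the sentence marks the character absent (a crude rule: "lacks X but
    has Y" would wrongly mark Y absent).
    """
    statements = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z]+", sentence)
        taxa = [t for t in tokens if t in KNOWN_NAMES]
        if not taxa:
            continue
        negated = False
        for token in tokens:
            word = token.lower()
            if word in NEGATION_CUES:
                negated = True
            elif word in CHARACTER_TERMS:
                state = "absent" if negated else "present"
                statements.append(f"{taxa[0]}:{word}:{state}")
    return statements
```

For instance, the two-sentence text "Spirogyra has spiral chloroplasts. Cells of Spirogyra lack flagella." yields one statement that chloroplasts are present and one that flagella are absent.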
Legacy Data: Spirogyra:chloroplasts:present
Legacy Data: Spirogyra:chloroplasts:present:attribution
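A statement in this colon-separated form can be held as a small record. This is a minimal sketch, assuming a hypothetical `CharacterStatement` class in which attribution is an optional fourth field; splitting at most three times lets the attribution itself contain colons, as a URL would. The example URL in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CharacterStatement:
    taxon: str
    character: str
    state: str
    attribution: Optional[str] = None  # source of the statement, if known

    @classmethod
    def parse(cls, line):
        """Parse 'taxon:character:state' or 'taxon:character:state:attribution'."""
        # maxsplit=3 keeps any colons inside the attribution (e.g. a URL) intact.
        parts = line.strip().split(":", 3)
        if len(parts) not in (3, 4):
            raise ValueError(f"expected 3 or 4 colon-separated fields: {line!r}")
        return cls(*parts)

    def __str__(self):
        """Serialize back to the colon-separated form."""
        fields = [self.taxon, self.character, self.state]
        if self.attribution is not None:
            fields.append(self.attribution)
        return ":".join(fields)
```

For example, `CharacterStatement.parse("Spirogyra:chloroplasts:present:https://example.org/p12")` keeps `https://example.org/p12` whole as the attribution, and `str()` turns the record back into the same line.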
Coffee Ontology: “is a coffee drink”
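An ontology of this kind is built from is-a relations, and a question such as "is a latte a coffee drink?" is answered by following those relations transitively. This is a minimal sketch with hypothetical terms standing in for the slide's example; a classification of species, genera, and families can be navigated the same way.

```python
# Hypothetical is-a edges: each term maps to its direct parent terms.
IS_A_EDGES = {
    "latte": {"espresso drink"},
    "cappuccino": {"espresso drink"},
    "espresso drink": {"coffee drink"},
    "drip coffee": {"coffee drink"},
    "coffee drink": {"beverage"},
}

def ancestors(term, edges=IS_A_EDGES):
    """Return every term reachable from `term` by following is-a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_a(term, category, edges=IS_A_EDGES):
    """True if `term` is, directly or indirectly, a kind of `category`."""
    return category in ancestors(term, edges)
```

Here `is_a("latte", "coffee drink")` is true even though no edge links them directly; the `seen` set also keeps the walk from looping if the edges ever contain a cycle.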
Future Data: Triple Store
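A triple store keeps every fact as a (subject, predicate, object) triple and answers pattern queries over them. This is a minimal in-memory sketch, assuming a hypothetical `TripleStore` class; predicate names such as `chloroplasts` and `isA` are illustrative. Semantic-web systems store RDF and query it with SPARQL rather than with anything like this class.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

class TripleStore:
    """A minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Store one fact; adding the same triple twice has no effect."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples in sorted order; None acts as a wildcard."""
        for s, p, o in sorted(self._triples):
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                yield (s, p, o)
```

After `store.add("Spirogyra", "chloroplasts", "present")`, the query `store.match(predicate="chloroplasts", obj="present")` finds every taxon recorded with chloroplasts, regardless of which source contributed the fact.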
The New Workforce
• Informatics/computing training
• Modified workflows
• Importance of data management and preservation
In Summary
• Big New Biology is coming, and taxonomy can benefit from being a part of it
• Existing data can be made machine-readable using information extraction algorithms
• Existing workflows can be modified to capture data close to the source
• Data can be shared using the semantic web
Acknowledgments: Dima Mozzherin, David Shorthouse, Sayeed Choudhury, Pete DeVries