
Biodiversity Informatics and the Biodiversity Literature


ronnie

Presentation Transcript


  1. Biodiversity Informatics and the Biodiversity Literature

  2. Overview • Progress over the last decade • Organism occurrence data • Taxonomic databases • The next challenge • Describing diversity

  3. Organism Occurrence Data • Tools and standards created in biodiversity informatics enable data to be aggregated from around the world. • The Global Biodiversity Information Facility (GBIF) is the largest aggregator of organism occurrence data. • Diagram: collection databases at institutions → data.GBIF.org → end user anywhere • Institutions: CAS, California Academy of Sciences (San Francisco); USNM, National Museum of Natural History (Washington); FMNH, Field Museum of Natural History (Chicago); NHM, The Natural History Museum (London); MNHN, Muséum national d'Histoire naturelle (Paris)

  4. Organism occurrence data

  5. Distribution models

  6. Remaining challenges with occurrence data • Lots of digitization still to do • Taxonomic identifications need to be updated • Georeferencing still needs to be done Relationship to literature: • Specimens and observations are primary data • Literature contains both reports of primary data and summarized data • Large-scale digitization efforts in museums might (will) swamp the content in literature

  7. Taxonomic Databases • Nomenclator • Checklist: valid / accepted taxa (plus synonyms) • Catalog of uses in taxonomic works • Index: all unique name-strings mapped to valid names/concepts • Diagram labels: >20M; increasing density of names in relevant corpus
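
The "index" layer on this slide, where every unique name-string is mapped to a valid name, can be illustrated with a very small lookup table. This is only a sketch under stated assumptions: the two synonyms listed for the zebrafish are illustrative entries, and `normalize`, `build_index`, and `resolve` are hypothetical helper names, not part of any real taxonomic database.

```python
# Hypothetical miniature name index: each name-string maps to one accepted name.
ACCEPTED = {"Danio rerio"}

# Illustrative synonyms only; a real index would hold millions of name-strings.
SYNONYMS = {
    "Brachydanio rerio": "Danio rerio",
    "Cyprinus rerio": "Danio rerio",
}


def normalize(name_string):
    """Collapse runs of whitespace so trivially different strings match."""
    return " ".join(name_string.split())


def build_index(accepted, synonyms):
    """Map every known name-string (accepted or synonym) to its accepted name."""
    index = {normalize(name): name for name in accepted}
    for synonym, valid_name in synonyms.items():
        index[normalize(synonym)] = valid_name
    return index


def resolve(index, name_string):
    """Return the accepted name for a name-string, or None if it is unknown."""
    return index.get(normalize(name_string))


INDEX = build_index(ACCEPTED, SYNONYMS)
```

Calling `resolve(INDEX, "Brachydanio  rerio")` returns the accepted name even with the stray double space, while an unknown string yields `None`, which is the point at which reconciliation across databases would have to take over.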

  8. Emergent consensus • Philosophical/methodological debates • Species concepts • Biological • Evolutionary • Phylogenetic • Taxonomic definitions • Circumscription • Synonymized types • Set of specimens identified by taxon author • Tree or lineage-based definition

  9. Anchor name-usage to publication metadata; actual publication; enable validation • Diagram labels: Citation (publication metadata); Name Usage; Name; begin; end
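
One way to read this slide is as a small record type: a name usage points at a citation and at a span within the publication, so that anyone holding the text can check the usage. The sketch below assumes that "begin" and "end" are character offsets into the publication text, which the slide does not state. The citation values and the helper names `locate` and `validate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Publication metadata that a name usage is anchored to."""
    authors: str
    year: int
    title: str


@dataclass(frozen=True)
class NameUsage:
    """A name as it appears in one publication, with its position in the text."""
    name: str
    citation: Citation
    begin: int  # assumed: character offset where the name starts
    end: int    # assumed: character offset just past the end of the name


def locate(name, citation, publication_text):
    """Build a NameUsage for the first occurrence of name in the text."""
    begin = publication_text.find(name)
    if begin == -1:
        raise ValueError(f"{name!r} does not occur in the publication text")
    return NameUsage(name, citation, begin, begin + len(name))


def validate(usage, publication_text):
    """Check that the anchored span really contains the recorded name."""
    return publication_text[usage.begin:usage.end] == usage.name


# Placeholder publication, for illustration only.
TEXT = "The zebrafish, Danio rerio, is a model organism."
CITATION = Citation("Example Author", 2000, "An illustrative work")
USAGE = locate("Danio rerio", CITATION, TEXT)
```

Because the record carries both the citation and the span, `validate(USAGE, TEXT)` can be rerun against any equivalent copy of the publication, which is the validation step the slide asks for.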

  10. Remaining challenges with taxonomic data • Taxa are concepts created in literature • Physical instances of the same published work are “equivalent” • Develop shared logical identifiers • Reconciliation across “authoritative” databases; fewer “same as” records

  11. Recap • Taxonomic names are key to • Information retrieval • Information summary and grouping • Publication metadata are critical to anchoring taxonomic concepts, and • Providing the semantic touchstones for collaboration (critical) • Occurrence data gives us species distributions • Direct relationship to literature is small • But taxonomy is critical to integrating occurrence data, so the literature is still fundamental

  12. What’s next?

  13. What other classes of information remain in the literature? …that could be extracted and structured to be really useful?

  14. Genetic/Genomic data Genetic and genomic data? …are not communicated or stored in the literature

  15. A Model Organism • Danio rerio, the zebrafish

  16. Understanding the origins of species through structured descriptions of diversity • Morphological Diversity: Phenotype A, Phenotype B • Genomic Diversity: Genotype A, Genotype B • Diagram labels: development; mutation; evolution

  17. Morphological variation across species difficult to find and synthesize

  18. Information retrieval from free-text is difficult

  19. Not computable across studies (Lundberg and Akama 2005)

  20. What is an ontology? • A set of well-defined terms and the logical relationships that hold between them • Represents knowledge of a discipline

  21. Teleost Anatomy Ontology terms and relationships • Terms: ventral hyoid arch; pharyngeal arch cartilage; replacement bone; basihyal cartilage; basihyal element; basihyal bone • Relationships: is_a; part_of; develops_from
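
The terms and relationship types on this slide can be held as subject-relation-object triples and queried by walking the graph. The sketch below is illustrative only: the transcript lost the diagram's arrows, so the specific edges in `TRIPLES` are an assumed reconstruction rather than the actual Teleost Anatomy Ontology content, and `build_graph` and `ancestors` are hypothetical helper names.

```python
from collections import defaultdict

# Assumed edges; the slide's diagram layout is not recoverable from the transcript.
TRIPLES = [
    ("basihyal cartilage", "is_a", "pharyngeal arch cartilage"),
    ("basihyal cartilage", "is_a", "basihyal element"),
    ("basihyal bone", "is_a", "basihyal element"),
    ("basihyal bone", "is_a", "replacement bone"),
    ("basihyal bone", "develops_from", "basihyal cartilage"),
    ("basihyal element", "part_of", "ventral hyoid arch"),
]


def build_graph(triples):
    """Index triples as graph[subject][relation] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        graph[subject][relation].add(obj)
    return graph


def ancestors(graph, term, relation="is_a"):
    """Collect every term reachable from term by following one relation type."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, {}).get(relation, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(TRIPLES)
```

With these assumed edges, `ancestors(GRAPH, "basihyal bone")` collects the more general classes the term belongs to, and passing `relation="develops_from"` follows developmental links instead. Following typed relationships in this way is what makes descriptions computable across studies, in contrast to free text.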

  22. Ontologies quickly become large and complex; guiding philosophy required • The Teleost Anatomy Ontology contains 3,039 terms, with >600 skeletal terms (Dahdul et al., 2010, Systematic Biology)

  23. Translational medicine • Translation from model organisms to humans (Fig. 1, Washington et al., 2010)

  24. Phenoscape II & Research Coordination Network (RCN) • Extended to include other model organisms and taxonomic groups, e.g.: • Amphibian Anatomy Ontology (AAO) – Blackburn, CAS • Hymenoptera Anatomy Ontology (HAO) – Deans, NCSU • Plant Ontology – Huala, Stanford • NLP and term extraction (Hong Cui, Univ of Arizona)

  25. What’s next? • Description of biological phenomena • Determining how best to do this will take time • Top-down design, guided by functional demonstration • Bottom-up curation of existing descriptions into structured knowledge through iteration
