
Taxonomic databases: The SEEK and VegBank experience

Taxonomic databases: The SEEK and VegBank experience. R.K. Peet, The University of North Carolina; Ecological Society of America Vegetation Panel; The SEEK development team. Biodiversity informatics depends on accurate and precise taxonomy.


Presentation Transcript


  1. Taxonomic databases: The SEEK and VegBank experience • R.K. Peet • The University of North Carolina • Ecological Society of America Vegetation Panel • The SEEK development team

  2. Biodiversity informatics depends on accurate and precise taxonomy • Accurate identification and labelling of organisms is a critical part of collecting, recording and reporting biological data. • Increasingly, research in biodiversity and ecology is based on the integration (and re-use) of multiple datasets.

  3. What was a minor annoyance for a few tens of records becomes intractable when looking at a million records. • Some data types, such as organism identifications, are inherently more complex to define, with the consequence that few standards have been adopted.

  4. Biodiversity data structure • Locality • Observation/Collection Event (Observation database) • Specimen or Object (Occurrence database) • Bio-Taxon (Taxonomic database) • Observation or Community Type (Observation type database)

  5. VegBank • The ESA Vegetation Panel is developing VegBank as a public archive for vegetation plot observations (http://vegbank.org). • VegBank is expected to function for vegetation plot data in a manner analogous to GenBank. • Primary data will be deposited for reference, novel synthesis, and reanalysis. • The database architecture is generalizable to most types of species co-occurrence data.

  6. www.vegbank.org

  7. What is SEEK? • Science Environment for Ecological Knowledge • Multidisciplinary project to create: • Scientific-workflow system (Kepler): design, reuse, and execute scientific analyses • Distributed data network (EcoGrid): environmental, ecological, and systematics data • KR & Semantic Mediation: discover, integrate, and compose hard-to-relate data and services via ontologies • Taxonomic concept services: resolve taxon ambiguities • Collaborators (the SEEK team): NCEAS, UNM, SDSC/UCSD, U Kansas, Vermont, Napier, ASU, UNC

  8. SEEK High-Level Approach • User’s taxonomic concept + quality measure • Semantic Mediation System: concept matching/expansion/…; weighted concepts; return list of data sets • Name/Concept Repository • EML repository: Ecological Metadata Language (EML), with taxon coverage containing the collector’s taxonomic concept(s) • Taxonomy transfer schema (TML) • Taxonomic concept providers: Concept Provider 1 (e.g. Fishbase), Concept Provider 2 (e.g. ITIS), Concept Provider 3 (e.g. Prometheus) • Ecological data set providers: ecological data sets

  9. Taxonomic database challenge: Standardizing organisms and communities • The problem: Integration of data potentially representing different times, places, investigators and taxonomic standards. • The traditional solution: A standard list of organisms / communities.

  10. Standard lists are available for taxa • Representative examples for higher plants in North America / US: • USDA Plants http://plants.usda.gov • ITIS http://www.itis.usda.gov • NatureServe http://www.natureserve.org • BONAP • Flora North America • These are intended to be checklists wherein the taxa recognized perfectly partition all plants. • The lists can be dynamic.

  11. Three concepts of subalpine fir Splitting one species into two illustrates the ambiguity often associated with scientific names. Abies bifolia Abies lasiocarpa Abies lasiocarpa sec. Little sec. USDA PLANTS sec. Flora North America

  12. One concept of Abies lasiocarpa • USDA Plants & ITIS: Abies lasiocarpa (var. lasiocarpa, var. arizonica)

  13. A narrow concept of Abies lasiocarpa • Flora North America: Abies lasiocarpa, Abies bifolia • Partnership with USDA Plants to provide plant concepts for data integration

  14. Andropogon virginicus complex in the Carolinas 9 elemental units; 17 base concepts

  15. Standardized taxon lists fail to allow dataset integration • The reasons include: • Taxonomic concepts are not defined (just lists). • Relationships among concepts are not defined. • The user cannot reconstruct the database as viewed at an arbitrary time in the past. • Multiple party perspectives on taxonomic concepts and names cannot be supported or reconciled.

  16. Taxonomic theory • A taxon concept represents a unique combination of a name and a reference. • Report as: name sec. reference. • [Diagram: Name, Concept, Reference]
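The name-plus-reference pairing described on this slide can be sketched as a small immutable record. This is only an illustration, not the VegBank or SEEK schema: the class name `TaxonConcept`, its fields, and the `label` method are hypothetical, and the reference string "USDA PLANTS" is used here without a date because the slides give none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical record: a concept is a name as used in a reference."""
    name: str        # scientific name as it appears in the reference
    reference: str   # the publication that circumscribes the taxon

    def label(self) -> str:
        # Render in the "name sec. reference" reporting form.
        return f"{self.name} sec. {self.reference}"


# Same name, different references: two distinct concepts.
fna = TaxonConcept("Abies lasiocarpa", "Flora North America 1997")
usda = TaxonConcept("Abies lasiocarpa", "USDA PLANTS")

print(fna.label())
print(fna == usda)
```

Because equality is defined over both fields, two records that share a name but cite different references never compare equal, which is the point of reporting by concept rather than by name alone.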

  17. A usage represents an association of a concept with a name. • [Diagram: Name, Usage, Concept] • The name used in defining the concept need not be the same name used in your work. • e.g. Carya alba = Carya tomentosa sec. Gleason & Cronquist 1991. • Usage can be used to apply multiple name systems to a concept.
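One way to picture a usage is as a link record that attaches a name to an existing concept. The sketch below is an assumption-laden illustration: the classes `Concept` and `Usage` and the helper `names_for` are hypothetical, and it reads the slide's example as "the name Carya alba is applied to the concept Carya tomentosa sec. Gleason & Cronquist 1991".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str        # name used by the defining reference
    reference: str


@dataclass(frozen=True)
class Usage:
    name: str        # name someone applies to the concept
    concept: Concept


def names_for(concept, usages):
    """Return every name that has been attached to a concept, sorted."""
    return sorted({u.name for u in usages if u.concept == concept})


carya = Concept("Carya tomentosa", "Gleason & Cronquist 1991")
usages = [
    Usage("Carya tomentosa", carya),  # the defining name
    Usage("Carya alba", carya),       # a different name for the same concept
]

print(names_for(carya, usages))
```

Keeping names in a separate usage table means a second name system can be added without touching the concept records themselves.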

  18. Relationships among concepts allow comparisons and conversions • Congruent, equal (=) • Includes (>) • Included in (<) • Overlaps (><) • Disjunct (|) • and others …
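If each concept's circumscription is written as a set of elemental units (the idea behind the "9 elemental units; 17 base concepts" slide), the five relationship symbols correspond to ordinary set comparisons. The following is a minimal sketch under that assumption; the function `relate` and the unit labels `u1`..`u4` are hypothetical and not taken from the talk.

```python
def relate(a, b):
    """Return the relationship symbol between two circumscriptions.

    a and b are sets of elemental-unit labels (an assumed representation).
    """
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "="    # congruent
    if a > b:
        return ">"    # a includes b
    if a < b:
        return "<"    # a is included in b
    if a & b:
        return "><"   # partial overlap
    return "|"        # disjunct


broad = {"u1", "u2", "u3"}
narrow = {"u1", "u2"}
other = {"u3", "u4"}

print(relate(broad, narrow))
print(relate(narrow, broad))
print(relate(narrow, other))
print(relate(broad, other))
```

In practice relationships are often asserted by a taxonomist rather than computed, since elemental units are rarely enumerated; the set view is mainly useful for checking that asserted relationships are mutually consistent.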

  19. High-elevation fir trees of western US • Distribution: AZ, NM, CO, WY, MT, AB, eBC, wBC, WA, OR • USDA & ITIS: Abies lasiocarpa (var. arizonica, var. lasiocarpa) • Flora North America: Abies bifolia, Abies lasiocarpa • A. lasiocarpa sec. USDA > A. lasiocarpa sec. FNA • A. lasiocarpa sec. USDA > A. bifolia sec. FNA • A. lasiocarpa v. lasiocarpa sec. USDA > A. lasiocarpa sec. FNA • A. lasiocarpa v. lasiocarpa sec. USDA | A. bifolia sec. FNA • A. lasiocarpa v. arizonica sec. USDA < A. bifolia sec. FNA
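The relationships listed on this slide can be stored as triples and queried to see which relabellings lose no information. The sketch below copies the five assertions as given; the rule that a record may be safely relabelled only to a concept that equals or includes its own, and the names `RELATIONS`, `INVERSE`, and `safe_targets`, are assumptions made for illustration.

```python
# (concept A, relationship, concept B), as listed on the slide.
RELATIONS = [
    ("A. lasiocarpa sec. USDA", ">", "A. lasiocarpa sec. FNA"),
    ("A. lasiocarpa sec. USDA", ">", "A. bifolia sec. FNA"),
    ("A. lasiocarpa var. lasiocarpa sec. USDA", ">", "A. lasiocarpa sec. FNA"),
    ("A. lasiocarpa var. lasiocarpa sec. USDA", "|", "A. bifolia sec. FNA"),
    ("A. lasiocarpa var. arizonica sec. USDA", "<", "A. bifolia sec. FNA"),
]

# Reading a triple from B's side flips the direction of inclusion.
INVERSE = {"=": "=", ">": "<", "<": ">", "><": "><", "|": "|"}


def safe_targets(source):
    """Concepts that equal or include `source` (assumed conversion rule)."""
    targets = set()
    for a, rel, b in RELATIONS:
        if a == source and rel in ("=", "<"):
            targets.add(b)
        if b == source and INVERSE[rel] in ("=", "<"):
            targets.add(a)
    return sorted(targets)


print(safe_targets("A. lasiocarpa var. arizonica sec. USDA"))
print(safe_targets("A. lasiocarpa sec. USDA"))
```

Under this rule a record identified as var. arizonica sec. USDA can be carried over to A. bifolia sec. FNA, whereas a record identified only as A. lasiocarpa sec. USDA has no safe target in the FNA treatment because it spans both FNA species.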

  20. Party Perspective • The Party Perspective on a Concept includes: • Status – Standard, Nonstandard, Undetermined • Correlation with other concepts – Equal, Greater, Lesser, Overlap, Undetermined. • Start & Stop dates.

  21. Intended functionality • Organisms are labeled by reference to concept (name-reference combination). • Party perspectives on concepts and names can be dynamic, but remain perfectly archived. • User can select which party perspective to follow, and at which date. • Different name systems are supported. • Enhanced stability in recognized concepts by separating name assignment and rank from concept.
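The "select a party perspective at a given date" behaviour can be sketched with perspective records that carry start and stop dates and are never overwritten. Everything named here is hypothetical: the `Perspective` class, the `status_on` function, the party "Example Party", and the dates are invented for illustration and do not describe any real party's history.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Perspective:
    party: str
    concept: str
    status: str                   # Standard, Nonstandard, or Undetermined
    start: date
    stop: Optional[date] = None   # None means the perspective is still current


def status_on(perspectives, party, concept, when):
    """Return the party's status for a concept as of a given date."""
    for p in perspectives:
        in_force = p.start <= when and (p.stop is None or when < p.stop)
        if p.party == party and p.concept == concept and in_force:
            return p.status
    return "Undetermined"


concept = "Abies lasiocarpa sec. Flora North America 1997"
history = [
    # Invented dates: the old perspective is closed, not deleted.
    Perspective("Example Party", concept, "Standard", date(2000, 1, 1), date(2005, 1, 1)),
    Perspective("Example Party", concept, "Nonstandard", date(2005, 1, 1)),
]

print(status_on(history, "Example Party", concept, date(2003, 6, 1)))
print(status_on(history, "Example Party", concept, date(2006, 1, 1)))
```

Closing a perspective with a stop date instead of editing it is what lets a user reconstruct the database as it was viewed at an arbitrary time in the past.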

  22. Best practice: Report taxa by reference to concepts. When reporting the identity of organisms in publications, data, or on specimens, provide the full scientific name of each kind of organism and the reference that provided the taxonomic concept. e.g., Abies lasiocarpa sec. Flora North America 1997.

  23. Best practice: Choose high-quality concepts • Reference high-quality sources for taxon concepts such as a major compendium that provides its own defined concepts, or a source that references the concepts of others. • Avoid checklists as they typically lack true taxonomic descriptions or circumscriptions.

  24. SEEK & GBIF are working to provide standards for concept data • Several data models incorporate taxon concepts. The IOPI, VegBank, and Taxonomer models are optimized for different uses. • SEEK, GBIF, and TDWG developed TCS, which was adopted by TDWG in August 2005 and is being implemented by GBIF and SEEK.

  25. Concepts and identifications are distinct. • A name in a publication could be either a concept or an identification. • An annotation is an identification. • Identifications should include linkage to at least one concept, but need not be limited to a single concept.

  26. Documenting identifications • Relationships added for identification: • = indicates identification • ~ (or aff.) indicates similarity • ≡ indicates identity, or defined as • Example of complex identification: < Potentilla sec. Cronquist 1991 + ~ Potentilla simplex sec. Cronquist 1991 + ~ Potentilla canadensis sec. Cronquist 1991
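A complex identification like the Potentilla example can be held as a list of qualified links to concepts, one link per concept. The sketch below is illustrative only: the tuple layout (relationship symbol, name, reference) and the `render` function are hypothetical, not a proposed exchange format.

```python
def render(identification):
    """Join (relationship, name, reference) links into one line."""
    return " + ".join(
        f"{rel} {name} sec. {ref}" for rel, name, ref in identification
    )


# "Within Potentilla, similar to P. simplex and to P. canadensis."
specimen_id = [
    ("<", "Potentilla", "Cronquist 1991"),
    ("~", "Potentilla simplex", "Cronquist 1991"),
    ("~", "Potentilla canadensis", "Cronquist 1991"),
]

print(render(specimen_id))
```

Storing each link separately keeps the identification tied to at least one concept while allowing it to point at several, as the previous slide recommends.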

  27. Fuzzy logic qualification 1 = Absolutely wrong 2 = Understandable but wrong 3 = Reasonable or acceptable 4 = Good answer 5 = Absolutely correct

  28. Biodiversity informatics depends on standards and connectivity • Names (Linnean Core) • Taxonomic concepts (TCS) • Publications (Alexandrian core, etc.) • Observations (proposed TDWG standard) • Identifications (proposed EML extension) • GUIDs (under development by GBIF)

  29. Tools to develop and map concepts • Taxonomists need mapping and visualization tools for relating concepts of various authors. SEEK is building prototypes for review and possible adoption. • Aggregators need tools for mapping relationships among concepts. • Users need tools for entering legacy concepts. Several are in development.

  30. Concept mapper

  31. Demonstration Projects Concept relationships of Southeastern US plants treated in different floras. Based on > 50,000 mapped concepts

  32. Distributed information systems - and the way ahead • Step 1: Adoption of minimum standards and best practices by high-quality journals, funding agencies, and professional organizations.

  33. Publishers, curators and data managers need to tag taxon interpretations with concepts • Precedent exists with tagging literature citations and GenBank accessions. • Presses are linking scientific names in many e-journals to ITIS (e.g. Evolution, Ecology).

  34. The way ahead Step 2: Creation, availability, and maintenance of databases that document core sets of taxonomic concepts and the relationships of these concepts to each other.

  35. True concept-based checklists • Equivalent of ITIS but with concept documentation and including how other concepts map onto the concepts accepted by the party. • Several are operative or in development including EuroMed, IOPI-GPC, Biotics, VegBank. Concept documentation planned for ITIS/USDA.

  36. Registration system and standard identifiers for names, references, and concepts • Essential for data exchange • GBIF is hosting a set of international workshops to design the GUID infrastructure.

  37. The way ahead • Step 3: Development and provision of tools to facilitate mark-up of data and manuscripts with taxonomic concepts • Step 4: Demonstration projects
