
Taxonomic and Nomenclature Data



Presentation Transcript


  1. Data Quality: Taxonomic and Nomenclature Data
     A. D. Chapman
     Data Quality Training, SABIF, June 2012

  2. Data Validation
     Two key sources of error are:
     • Taxonomic names
     • Georeferences (latitudes and longitudes)
     Methods for identifying error are documented in material available via the GBIF web site, http://www.gbif.org

  3. Taxonomic Data
     Consists of (not all are always present):
     • Name (scientific, common, hierarchy, rank)
     • Nomenclatural status (synonym, accepted, typification)
     • Reference (author, place and date of publication)
     • Determination (by whom and when the record was identified)
     • Type specimen citation
     • Quality fields (accuracy of determination, qualifiers)

  4. Taxonomic Quality
     The capacity of an institution to produce high quality taxonomic products is influenced by:
     • the level of training and experience of staff,
     • the level of access to technical literature, reference and voucher collections and taxonomic specialists,
     • the possession of appropriate laboratory equipment and facilities, and
     • access to the internet and the resources available there.
     (after Stribling et al. 2003)

  5. Determining Quality
     • Not always easy
     • Seldom carried out
     • Use of Determinavit slips
     • Qualifiers (aff., cf., s.str., s.lat., ?)
     • Documentation?

  6. Documenting Taxonomic Data Quality
     • Several methods exist for documenting taxonomic verification; none is completely satisfactory
     • Herbarium Information Standards and Protocols for the Interchange of Data (HISPID)
     • Australian National Fish Collection (1993)
     • Several others restricted to one or two institutions
     • Proposal: record four levels of information
       • Who determined the specimen and when
       • What was used (type specimen, local flora, monograph, etc.)
       • Level of expertise of the determiner
       • What confidence the determiner had in the determination

  7. Documenting Quality - 2
     From: Herbarium Information Standards and Protocols for the Interchange of Data (HISPID)
     0  The name of the record has not been checked by any authority
     1  The name of the record determined by comparison with other named plants/animals
     2  The name of the record determined by a taxonomist or by other competent persons using collections and/or library and/or documented living material
     3  The name of the plant determined by taxonomist engaged in systematic revision of the group
     4  The record is part of the type gathering
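
     A minimal sketch of how the five verification levels above might be held as a controlled vocabulary, so that a stored code can be validated and expanded. The names `VERIFICATION_LEVELS` and `describe_level` are hypothetical and are not part of HISPID; the descriptions are paraphrased from the slide.

```python
# Verification codes 0-4, paraphrased from the slide text.
# The dictionary and function names are illustrative assumptions.
VERIFICATION_LEVELS = {
    0: "name not checked by any authority",
    1: "name determined by comparison with other named plants/animals",
    2: "name determined by a taxonomist or other competent person",
    3: "name determined by a taxonomist revising the group",
    4: "record is part of the type gathering",
}


def describe_level(code):
    """Return the description for a verification code, or None if invalid."""
    try:
        return VERIFICATION_LEVELS[int(code)]
    except (KeyError, ValueError, TypeError):
        # Unknown number, non-numeric text, or a missing value.
        return None


print(describe_level("4"))
print(describe_level(7))
```

     Returning `None` for an unrecognised code lets a cleaning pass flag the record for review rather than guess a level.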

  8. Documenting Quality - 3
     From: Australian National Fish Collection (in use since 1993)
     Level 1: Highly reliable identification
       Specimen identified by
       (a) an internationally recognised authority of the group, or
       (b) a specialist that is presently studying or has reviewed the group in the Australian region.
     Level 2: Identification made with high degree of confidence at all levels
       Specimen identified by a trained identifier who had prior knowledge of the group in the Australian region or used available literature to identify the specimen.
     Level 3: Identification made with high confidence to genus but less so to species
       Specimen identified by
       (a) a trained identifier who was confident of its generic placement but did not substantiate their species identification using the literature, or
       (b) a trained identifier who used the literature but still could not make a positive identification to species, or
       (c) an untrained identifier who used most of the available literature to make the identification.
     Level 4: Identification made with limited confidence
       Specimen identified by
       (a) a trained identifier who was confident of its family placement but unsure of generic or species identifications (no literature used apart from illustrations), or
       (b) an untrained identifier who had/used limited literature to make the identification.
     Level 5: Identification superficial
       Specimen identified by
       (a) a trained identifier who is uncertain of the family placement of the species (cataloguing identification only),
       (b) an untrained identifier using, at best, figures in a guide, or
       (c) where the status and expertise of the identifier is unknown.

  9. Taxon Verification Status - proposed
     Name of determiner:
     Date of determination:
     Source of determination: (e.g. compared with holotype, used national flora)
     • identified by World expert in the taxon with high certainty
     • identified by World expert in the taxon with reasonable certainty
     • identified by World expert in the taxon with some doubt
     • identified by regional expert in the taxon with high certainty
     • identified by regional expert in the taxon with reasonable certainty
     • identified by regional expert in the taxon with some doubt
     • identified by non-expert in the taxon with high certainty
     • identified by non-expert in the taxon with reasonable certainty
     • identified by non-expert in the taxon with some doubt
     • identified by the collector with high certainty
     • identified by the collector with reasonable certainty
     • identified by the collector with some doubt
     From: Chapman (2005) Principles of Data Quality. GBIF
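
     The twelve statuses above are the combinations of four determiner categories and three certainty levels. A short sketch of that structure follows; the names `DETERMINERS`, `CERTAINTIES` and `STATUSES` are illustrative assumptions, not taken from Chapman (2005).

```python
from itertools import product

# Four determiner categories and three certainty levels, as on the slide.
DETERMINERS = (
    "World expert in the taxon",
    "regional expert in the taxon",
    "non-expert in the taxon",
    "the collector",
)
CERTAINTIES = ("high certainty", "reasonable certainty", "some doubt")

# Every determiner/certainty pair, in the same order as the slide.
STATUSES = [
    f"identified by {who} with {how}"
    for who, how in product(DETERMINERS, CERTAINTIES)
]

for status in STATUSES:
    print(status)
```

     Storing the two components separately, rather than one free-text status, keeps each field atomic and makes it possible to filter records by expertise or by certainty alone.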

  10. Error checking
      • Missing Data Values
        • empty fields where values should occur (e.g. if a species epithet is present, then a generic name MUST be present)
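
      The rule in the example above can be sketched as a simple check. This assumes records are dictionaries with hypothetical `genus` and `species_epithet` keys; real databases will use their own field names.

```python
def find_missing_genus(records):
    """Return records that have a species epithet but no generic name."""
    flagged = []
    for rec in records:
        # Treat None and whitespace-only strings as empty.
        genus = (rec.get("genus") or "").strip()
        epithet = (rec.get("species_epithet") or "").strip()
        if epithet and not genus:
            flagged.append(rec)
    return flagged


# Illustrative records only.
records = [
    {"id": 1, "genus": "Eucalyptus", "species_epithet": "globulus"},
    {"id": 2, "genus": "", "species_epithet": "globulus"},
    {"id": 3, "genus": "Eucalyptus", "species_epithet": None},
]
flagged_ids = [rec["id"] for rec in find_missing_genus(records)]
print(flagged_ids)
```

      Record 3 is not flagged: a genus without a species epithet is a legitimate genus-level identification, whereas an epithet without a genus is not.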

  11. Error checking
      • Incorrect Data Values
        • typographic errors,
        • transposition of key strokes,
        • data entered in the wrong place (e.g. a species epithet present in a generic name field)
      Can often be identified using Soundex/Phonex techniques
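
      As an illustration of the Soundex approach mentioned above, here is a minimal sketch of the classic four-character Soundex code, used to group names that may be spelling variants of one another. The helper `possible_variants` is a hypothetical name, Phonex is not covered, and the misspelling in the sample list is invented for the example.

```python
from collections import defaultdict

# Letter-to-digit table for classic Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so repeats across a vowel count
    return (result + "000")[:4]


def possible_variants(names):
    """Group distinct names sharing a Soundex code (candidate misspellings)."""
    groups = defaultdict(set)
    for name in names:
        groups[soundex(name)].add(name)
    return {code: sorted(group) for code, group in groups.items()
            if len(group) > 1}


names = ["Eucalyptus", "Eucaliptus", "Acacia", "Eucalyptus"]
print(possible_variants(names))
```

      Names that share a code are only candidates for review. Soundex groups many genuinely different names together, so a person or a reference name list still has to decide which spelling is correct.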

  12. Error checking
      • Nonatomic Data Values
        • More than one fact entered into a single field (e.g. a species binomial or trinomial present in a single field)
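
      A rough sketch of detecting a nonatomic name field and splitting it into separate parts. It is a naive heuristic under stated assumptions: tokens are separated by whitespace, epithets are lowercase, and anything after the first non-lowercase token (typically an author string) is ignored. The function names, field names and rank-marker list are illustrative, and lowercase author particles such as "de" would be misparsed.

```python
# Rank markers skipped when splitting (an illustrative, incomplete list).
RANK_MARKERS = {"subsp.", "ssp.", "var.", "f."}


def is_nonatomic(value):
    """True if a single-name field holds more than one whitespace token."""
    return len(value.split()) > 1


def split_name(full_name):
    """Split a binomial or trinomial string into separate name parts."""
    tokens = full_name.split()
    genus = tokens[0] if tokens else None
    epithets = []
    for tok in tokens[1:]:
        if tok in RANK_MARKERS:
            continue
        if not tok.islower():
            break  # probably an author string; stop collecting epithets
        epithets.append(tok)
    epithets = (epithets + [None, None])[:2]
    return {
        "genus": genus,
        "species_epithet": epithets[0],
        "infraspecific_epithet": epithets[1],
    }


print(is_nonatomic("Eucalyptus globulus"))
print(split_name("Eucalyptus globulus subsp. bicostata"))
```

      Automatic splitting of this kind is best treated as a proposal for review, since hybrids, qualifiers and unusual author strings will not follow the assumed pattern.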

  13. Error checking
      • Domain Schizophrenia
        • Fields used for purposes for which they weren’t intended
      Good reference: Dalcin, E.C. 2004. Data Quality Concepts and Techniques Applied to Taxonomic Databases. Thesis for the degree of Doctor of Philosophy, School of Biological Sciences, Faculty of Medicine, Health and Life Sciences, University of Southampton. November 2004. 266 pp.

  14. HSJRP CRIA Data Cleaning
      http://splink.cria.org.br/dc

  15. CRIA Data Cleaning

  16. CRIA Data Cleaning

  17. CRIA Data Cleaning

  18. CRIA Data Cleaning

  19. CRIA Data Cleaning - Statistics: IAL-Aves

  20. GBIF Data Cleaning Demo Interface (no longer operating)

  21. GBIF Data Cleaning Demo Interface
      http://www.secretariat.gbif.net/datatester/index.jsp

  22. Quality Control Checks on Flickr
      • Flickr is an image site on the internet
        • www.flickr.com
        • being used by the Encyclopedia of Life project (EOL) to link images
      • On the site Paul Morris has developed a Quality Control system to identify errors in the machine tagging of records. At this stage it doesn’t test names, but does check the formation of the tags. Such a system could be extended.
      • The project is a contribution to the Filtered Push project, NSF:DBI #0646266
      • Let’s look at how it works
        • http://www.flickr.com/groups/encyclopedia_of_life/

  23. Other projects
      • A number of projects are looking at new methods for quality checking. These include:
        • Atlas of Living Australia (http://www.ala.org.au/)
        • TDWG (www.tdwg.org)
        • Global Names Architecture (http://www.globalnames.org/)
        • GBIF – ECAT (http://www.gbif.org/informatics/name-services/)

  24. Questions?
