CAAB and taxon management at CSIRO Marine Research
Tony Rees, Divisional Data Centre, CSIRO Marine Research, Hobart
http://www.marine.csiro.au/caab/
Linking taxonomic resources
[Diagram: the user searches a master query system by scientific or other name. The master system sends a db query to each of several databases held by different organisations (Organisation 1, 2, 3, ...), such as db A (e.g. field surveys), db B (e.g. specimen collection) and db C (e.g. supporting information), each holding records for Taxon 1, Taxon 2, etc.]
Available for any taxon:
- scientific name (may or may not be consistent across databases)
- internal ID in the database (may or may not be common across databases)
- possibly one or more external IDs
Name-based system
• Names may vary across databases (different opinions, old data, typographic errors, variant spellings, authorities present or absent, subgenera present or absent, etc.)
• Names may change in future
[Diagram: the master query system holds an equivalence list (name 1a = name 1b = name 1c, etc.) and sends a db query containing all possible names to db A (e.g. field surveys), db B (e.g. specimen collection) and db C (e.g. supporting information), each held by a different organisation. A variant used in one database but missing from the list (e.g. a possible name 1d) is not found. This approach carries a high maintenance overhead at the master database level.]
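The failure mode in the name-based diagram can be sketched in a few lines of Python. This is a hypothetical illustration only: the database contents, record labels and the `VARIANTS` equivalence list are invented, and a real master system would query remote databases rather than in-memory dictionaries.

```python
# Hypothetical in-memory stand-ins for three organisations' databases.
# Each is keyed by whatever name string that organisation happens to use.
DATABASES = {
    "db A (field surveys)": {"Chlamys asperrima": ["survey record 17"]},
    "db B (specimen coll.)": {"Mimachlamys asperrima": ["specimen 2044"]},
    "db C (supporting info)": {"Chlamys asperrimum": ["fact sheet"]},
}

# Hypothetical master equivalence list ("name 1a = name 1b = ..."), which
# must be maintained centrally and is never guaranteed to be complete.
VARIANTS = {
    "Mimachlamys asperrima": {"Mimachlamys asperrima", "Chlamys asperrima"},
}


def name_based_query(name):
    """Send every known variant of `name` to every database; collect hits."""
    variants = sorted(VARIANTS.get(name, {name}))
    hits = {}
    for db_name, records in DATABASES.items():
        hits[db_name] = [rec for v in variants for rec in records.get(v, [])]
    return hits


if __name__ == "__main__":
    for db_name, found in name_based_query("Mimachlamys asperrima").items():
        print(db_name, found or "not found")
```

Here db C uses "Chlamys asperrimum", which is not on the master list, so its record is missed, just like "name 1d" in the diagram. The only fix is to add yet another variant centrally, which is the maintenance overhead the slide describes.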
Example (doughboy scallop)
- Chlamys asperrimum
- Chlamys asperrima
- Chlamys (Mimachlamys) asperrimum
- Chlamys (Mimachlamys) asperrima
- Mimachlamys asperrimum
- Mimachlamys asperrima
This is not really a synonymy, just a partial list of "potential variants"; a full list would include versions with and without authors, any other synonyms, "near" matches (possible typographic errors), etc.
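The six variants above come from combining independent choices (genus, optional subgenus, epithet ending), which is why such lists grow quickly. A minimal sketch, assuming a hypothetical helper `potential_variants` that only models those three choices (no authors, synonyms or typographic near matches):

```python
from itertools import product


def potential_variants(genera, subgenus, epithets):
    """Enumerate name strings a database might use for one taxon.

    genera:   candidate genus names
    subgenus: optional subgenus, inserted in parentheses after any other genus
    epithets: alternative spellings/endings of the species epithet
    """
    stems = []
    for genus in genera:
        stems.append(genus)
        if subgenus and subgenus != genus:
            stems.append(f"{genus} ({subgenus})")
    # Every stem combined with every epithet form: the list grows multiplicatively.
    return [f"{stem} {epithet}" for stem, epithet in product(stems, epithets)]


for name in potential_variants(["Chlamys", "Mimachlamys"], "Mimachlamys",
                               ["asperrimum", "asperrima"]):
    print(name)
```

This reproduces the six names in the order listed on the slide. Each further dimension (author present or absent, additional synonyms) multiplies the list again, and typographic near matches cannot be enumerated this way at all.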
External ID-based system
[Diagram: each database (db A, e.g. field surveys; db B, e.g. specimen collection; db C, e.g. supporting information; each held by a different organisation) tags its records with a shared external ID (Taxon 1 = ID1, Taxon 2 = ID2, etc.). The master query system holds only name 1 = ID1, etc., and sends each database a db query containing a single ID, giving a low maintenance overhead at the master database level.]
• Data searching is name independent (user agencies can follow their own wishes regarding consistency, formats, timing of updates, etc.)
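For contrast with the name-based case, the ID-based query can be sketched the same way. Again the databases, local names and the "ID1" value are hypothetical; the point is that the master system resolves the name once and every database is then queried by a single ID, whatever name it uses locally.

```python
# Hypothetical databases: each organisation keeps its own local name,
# but tags every record with the shared external taxon ID.
DATABASES = {
    "db A (field surveys)": [("Chlamys asperrima", "ID1", "survey record 17")],
    "db B (specimen coll.)": [("Mimachlamys asperrima", "ID1", "specimen 2044")],
    "db C (supporting info)": [("Chlamys asperrimum", "ID1", "fact sheet")],
}

# The master system only needs one name -> ID mapping per taxon.
NAME_TO_ID = {"Mimachlamys asperrima": "ID1"}


def id_based_query(name):
    """Resolve `name` to an external ID once, then query every database by ID."""
    taxon_id = NAME_TO_ID.get(name)
    if taxon_id is None:
        return {}
    return {
        db_name: [rec for _local_name, tid, rec in rows if tid == taxon_id]
        for db_name, rows in DATABASES.items()
    }


if __name__ == "__main__":
    for db_name, found in id_based_query("Mimachlamys asperrima").items():
        print(db_name, found)
```

All three records are found, including the one filed under "Chlamys asperrimum", without the master system knowing that spelling exists.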
Essential/desirable properties of external taxon identifiers
• Essential
  - ability to cover all taxonomic groups of interest
  - ability to cope with the numbers of taxa potentially required
  - translation system (codes:names) readily accessible
  - codes can be created in a realistic time frame for the taxa needed
• Desirable
  - systematic/meaningful approach to code allocation (cf. the telephone numbering system), understandable to humans
  - not too many digits
  - codes are stable (preferably NOT dependent on genus/species name)
  - taxon names are reliable (i.e. content is subject to ongoing QC and maintenance as needed)
  - compatibility/interoperability with emerging global standards
CMR’s “CAAB” system
• able to cover all taxonomic groups
• up to 999,999 codes (optionally 3 million) per “major category” (phylum or similar)
• web interface for access to codes and names
• local (i.e. Australian) control of content (rapid data addition possible)
• systematic/meaningful approach to code allocation (category number, family number and species/taxon number)
• not too many digits (2 digits for category and 6 for family+species)
• codes are stable (not dependent on genus/species name)
• taxon name maintenance can be devolved to relevant specialists
• cross-mapping to ITIS and other codes incorporated in the current database structure
• possible candidate or model for a national system?
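The slide gives the code layout as 2 digits for the category plus 6 for family and species. A minimal sketch of building and splitting such a code follows; the 3 + 3 split of the six family/species digits and the space after the category are assumptions for illustration, and the helper names `format_caab` and `parse_caab` are hypothetical.

```python
def format_caab(category: int, family: int, taxon: int) -> str:
    """Build a CAAB-style code: 2-digit category, then 6 digits for family+taxon.

    ASSUMPTION: the six family+taxon digits are split 3 + 3, and a space
    separates the category from the rest, for display.
    """
    if not (0 <= category <= 99 and 0 <= family <= 999 and 0 <= taxon <= 999):
        raise ValueError("code component out of range")
    return f"{category:02d} {family:03d}{taxon:03d}"


def parse_caab(code: str) -> tuple[int, int, int]:
    """Split a CAAB-style code back into (category, family, taxon) numbers."""
    digits = code.replace(" ", "")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"expected 8 digits, got {code!r}")
    return int(digits[:2]), int(digits[2:5]), int(digits[5:])


code = format_caab(12, 34, 56)       # hypothetical component numbers
print(code)                          # 12 034056
print(parse_caab(code))              # (12, 34, 56)
```

Because nothing in the code is derived from the genus or species name, a name change only alters the name attached to an existing code, which is what makes the codes stable.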
Other CAAB features
• searchable by scientific or common name, including synonyms/variants as entered in the database (useful as an entry point)
• comments fields available for external display and/or admin use
• holds custom links for querying databases at CMR via the web
• holds online links to other information resources
• CAAB administration and data entry can be carried out remotely by relevant persons (using web access tools and user/domain authentication)
• special sections of CAAB are available to deal with family-level groups, other species groupings as needed, and informal/agency-designated taxa
ITIS codes: another option
• Pluses
  - possibly a global standard in the future
  - able to cover all taxonomic groups
  - no limit to the number of taxa which can be covered
  - web interface for access to codes and names
  - not too many digits (typically 5 or 6)
• Minuses
  - codes are not meaningful (just a number, allocated semi-randomly)
  - codes are fixed to the genus/species name; however, cross-mapping is maintained within the database for synonyms, where these are held on the system
  - would need to investigate how locally supplied content could be added to the master system in a realistic time frame
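The "fixed to genus/species name, with synonym cross-mapping" point can be illustrated as follows. This is a hypothetical sketch, not ITIS data or the ITIS API: the numbers and the pairing of names are invented, and it models only a single synonym-to-accepted link per name.

```python
# Hypothetical TSN-style table (numbers and name pairing invented for
# illustration): each name string gets its own number, and a synonym
# record points at the number of the accepted name.
RECORDS = {
    100001: {"name": "Mimachlamys asperrima", "accepted": None},
    100002: {"name": "Chlamys asperrima", "accepted": 100001},
}
NAME_INDEX = {rec["name"]: tsn for tsn, rec in RECORDS.items()}


def accepted_tsn(name):
    """Look up a name's own number, then follow any synonym link."""
    tsn = NAME_INDEX.get(name)
    if tsn is None:
        return None  # name not held on the system
    accepted = RECORDS[tsn]["accepted"]
    return tsn if accepted is None else accepted


print(accepted_tsn("Chlamys asperrima"))  # 100001
```

A name the master system does not hold (for example a locally used variant) resolves to nothing, which is why the slide raises the question of getting local content added in a realistic time frame.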
Topics arising for consideration for the “Australian virtual museum”
• Should names or (name-independent) taxon IDs be used for database linkages?
• If names, is a master list achievable in real time? Who will undertake the continuous updating required? What performance implications may arise?
• If taxon IDs, what is the best route, e.g.:
  - ITIS (with upgraded Australian content)?
  - CAAB or another existing Australian system (extended)?
  - a new system designed from the ground up?
• For any of the above, what is the best way to manage and resource the process? What time frame is realistic?
Sample scientific name search (NB: also searches synonyms as held in the database)
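A search that also checks stored synonyms might look like the sketch below. The `Taxon` record layout, the code value and the case-insensitive substring matching are assumptions for illustration, not a description of how the CAAB search is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    code: str                                   # hypothetical CAAB-style code
    name: str                                   # current scientific name
    synonyms: list[str] = field(default_factory=list)


def search(taxa, query):
    """Case-insensitive substring match on scientific names and stored synonyms."""
    q = query.casefold()
    return [
        t for t in taxa
        if q in t.name.casefold() or any(q in s.casefold() for s in t.synonyms)
    ]


TAXA = [
    Taxon("12 034056", "Mimachlamys asperrima",
          ["Chlamys asperrima", "Chlamys asperrimum"]),
]
print([t.code for t in search(TAXA, "chlamys asperrimum")])
```

Only synonyms actually held in the database are matched; a spelling that nobody has entered will still return nothing.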
Sample scientific name search (go to demo if available)