This article explores the construction of a demonstration database of taxonomic concept data, with a focus on the Carya genus. It evaluates the effectiveness of the VegBank model and presents different party perspectives on concepts and names. The goal is to accurately capture information without personal judgments.
Carya -- Lessons from constructing a demonstration database of taxonomic concept data • Robert K. Peet • University of North Carolina
Carya example • Genus in the Juglandaceae • Roughly 22 currently recognized taxa • 15 North American species (including 1 with 3 varieties) • 4 Asian species
Carya example • First goal was to provide a test dataset for SEEK containing all the features the VegBank, NatureServe, and FGDC design groups thought important. • Second goal was to evaluate the effectiveness of the VegBank model • Five party perspectives are presented showing different types of database implementation and functionality that can be expected in SEEK. • A basic principle is that the information content is accurately captured without adding personal judgements.
Intended functionality (VegBank model) • Organisms are labeled by reference to a concept (name-reference combination), • Party perspectives on concepts and names can be dynamic, but remain perfectly archived, • User can select which party perspective to follow, • Different name systems are supported, • Enhanced stability in recognized concepts by separating name assignment and rank from concept.
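A minimal sketch of the first point, a concept as a name-reference combination. The class and field names are assumptions for illustration, not the VegBank schema, and the name string is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A concept identified by a name as used in a particular reference."""
    name: str       # name string as it appears in the reference
    reference: str  # the publication or list in which the name is used


# Hypothetical name string; the two references are ones named in this talk.
in_fna = Concept("Carya exemplum", "Flora North America")
in_itis = Concept("Carya exemplum", "ITIS2002")

# The same name in two references gives two distinct concepts.
print(in_fna == in_itis)  # False
```

Keeping rank and name assignment out of this pair is what lets a concept stay stable while names and ranks change around it.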
Party Perspective • The Party Perspective on a concept includes: • Status – Standard, Nonstandard, Undetermined • Correlation with other concepts – • Multiple relationships: e.g. Equal, Greater, Lesser, Overlap, Undetermined… • Start & Stop dates for tracking changes
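One way to picture start and stop dates is as a validity interval on each status record, so that an earlier perspective stays archived after it is superseded. The sketch below assumes hypothetical record and field names and hypothetical parties and dates; it is not the VegBank implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ConceptStatus:
    party: str
    concept: str                 # hypothetical concept identifier
    status: str                  # "standard", "nonstandard", or "undetermined"
    start: date
    stop: Optional[date] = None  # None means the status is still in force


def status_on(records, party, concept, when):
    """Return the status a party assigned to a concept on a given day, or None."""
    for r in records:
        if r.party == party and r.concept == concept:
            if r.start <= when and (r.stop is None or when < r.stop):
                return r.status
    return None


# Hypothetical history: a party accepts a concept, then later rejects it.
records = [
    ConceptStatus("Party A", "concept-1", "standard", date(1994, 1, 1), date(1999, 1, 1)),
    ConceptStatus("Party A", "concept-1", "nonstandard", date(1999, 1, 1)),
]

print(status_on(records, "Party A", "concept-1", date(1995, 6, 1)))  # standard
print(status_on(records, "Party A", "concept-1", date(2000, 1, 1)))  # nonstandard
```

Old records are closed with a stop date rather than overwritten, so a user can still ask what a party's view was on any past date.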
Data relationships • VegBank taxonomic data model • [Diagram; labels in extraction order: Name, Concept, Usage, Start/Stop, NameStatus, Name system, Correlation, Party, Status, Start/Stop, ConceptStatus, Level, Parent, Reference]
Parties/perspectives – 1&2 Flora North America, Flora of China • Traditional (print-like) publications • Large scope, well accepted • Sources and synonymy documented • Typical of publications SEEK will need to incorporate
Parties/perspectives – 1&2 Flora North America, Flora of China • Typical of how to treat a monograph: • Record all accepted taxa as standard concepts. • Record all original descriptions as nonstandard concepts with full reference. Correlation = synonym. • Record all listed synonyms as nonstandard concepts with the publication being the original for the name unless otherwise stated. Correlation = synonym. • Some monographs (not FNA) make frequent reference to how taxa relate to taxa in other monographs, which could also be added.
Parties/perspectives - 3 PLANTS/ITIS 2002 • Naked list of accepted taxa with synonyms • Versions not yet well documented • Common names and codes • Typical SEEK initial population source
Parties/perspectives - 3 PLANTS/ITIS 2002 • Each accepted name is treated as a standard concept with the reference being ITIS2002. • Each standard concept is given, as a nonstandard synonym, the concept from the original publication of the name, which I look up from various sources such as IPNI. • Each synonymized name is treated as the concept in the original publication of the name, and the correlation is recorded as synonym. • Common names and USDA codes are treated as alternative name systems.
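The treatment of a naked list can be sketched as a small loading routine. The function name, the tuple layout, and the placeholder names and publications are assumptions for illustration; only the rules themselves come from the slide.

```python
def load_checklist(checklist, list_reference, original_publication):
    """Turn a naked list into concept and correlation records.

    checklist: {accepted_name: [synonym_name, ...]}
    list_reference: reference for the list itself, e.g. "ITIS2002"
    original_publication: {name: reference of the name's original publication}
    """
    concepts = []      # (name, reference, status)
    correlations = []  # (nonstandard concept, "S", standard concept)
    for accepted, synonyms in checklist.items():
        # Each accepted name becomes a standard concept in the list's reference.
        standard = (accepted, list_reference)
        concepts.append((*standard, "standard"))
        # The accepted name's original publication and every synonymized name
        # become nonstandard concepts correlated as synonyms (S).
        for name in [accepted, *synonyms]:
            original = (name, original_publication[name])
            concepts.append((*original, "nonstandard"))
            correlations.append((original, "S", standard))
    return concepts, correlations


# Hypothetical placeholder data.
concepts, correlations = load_checklist(
    {"Name A": ["Name B"]},
    "ITIS2002",
    {"Name A": "Original publication of A", "Name B": "Original publication of B"},
)
print(len(concepts), len(correlations))  # 3 2
```

Common names and USDA codes are left out of this sketch; in the model they would be usages of the same concepts under other name systems.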
Parties/perspectives - 4 BONAP 1980, 1994, 1999, 2004 • Naked list with synonyms • Multiple editions • Source of ITIS & USDA PLANTS • Four editions available (with 2004 in draft and incomplete)
Parties/perspectives - 4 BONAP 1980, 1994, 1999, 2004 • Generally same as with PLANTS/ITIS • Each year treated as a separate publication with its own concepts (because there were changes between years in the overall treatment). • Names defined by character strings, so multiple versions of “same name” and concept occur. • No between-year synonymy.
Parties/perspectives - 5 Peet’s Global Synthesis • Span 1980-2004 as an evolving perspective, changing accepted concepts only when necessary. • Provide appropriate, high-quality, reference-based concepts. • Document sources of all names and correlations of all concepts in current or recent usage. • Interpret, document, and expand BONAP/ITIS perspective while incorporating correlations with the FNA and F. China concepts. • Priority for FNA & F. China concepts • Correlations include full diversity of operators
Some Statistics • Currently 22 taxa • 270 names with references • > 500 total names – various name systems & spellings • 118 references • 50 reference parties • ~ 300 concepts • ~ 470 concept status assignments • ~ 735 usage assignments • Many correlations
Correlation types 1 • = – Equal: The concepts are equivalent • ~ – Similar: A weaker, less certain form of “equal”. • > – Greater than: The rejected concept contains all of but is larger than the accepted concept. In the case of a splitting event, a no-longer accepted concept is now > each of two or more accepted concepts. • < – Less than: The rejected concept contains some of but is smaller than the accepted concept. In the case of a lumping event, a no-longer accepted concept is now < the accepted concept. • ≠ – Not equal: These concepts are not equal, but contain at least some specimens in common. [relatively uncommon; for complex situations]
Correlation types 2 • V – Overlapping: These concepts contain some specimens in common but each contains some specimens that the other does not. [equivalent to simultaneous ><] • D – Disjunct: The concepts contain no specimens in common [intended to confirm that these are entirely different; generally not required but can be used for clarification] • U – Undetermined: The party has not evaluated the relationship [equivalent to a null record, but more explicit]
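The operators =, >, <, V and D are defined in terms of shared specimens, so they can be illustrated with plain set comparisons. This is a sketch under the assumption that each concept can be listed as a set of specimen identifiers, which is rarely possible in practice; the function name and specimen labels are hypothetical.

```python
def correlate(rejected, accepted):
    """Return the correlation symbol relating a rejected concept to an accepted
    one, with each concept given as a collection of specimen identifiers."""
    rejected, accepted = set(rejected), set(accepted)
    if rejected == accepted:
        return "="  # equal
    if rejected > accepted:
        return ">"  # rejected contains all of accepted and more
    if rejected < accepted:
        return "<"  # rejected is wholly contained in accepted
    if rejected & accepted:
        return "V"  # overlapping: each has specimens the other lacks
    return "D"      # disjunct: no specimens in common


# A splitting event: the old concept is > each of the two new concepts.
old = {"s1", "s2", "s3", "s4"}
new_a, new_b = {"s1", "s2"}, {"s3", "s4"}
print(correlate(old, new_a), correlate(old, new_b))  # > >
print(correlate(new_a, new_b))                       # D
print(correlate({"s1", "s2"}, {"s2", "s3"}))         # V
```

The remaining operators record what a party knows or asserts rather than a set relationship, so they cannot be derived this way.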
Correlation types 3 • ≡ – Exactly the same: These are the same concept with the differences resulting from some slight difference in the rendering of the reference or the name. • S – Synonyms: The rejected concept is listed as a synonym of the accepted concept, but no further information is provided as to the relationship between the concepts [could be either a concept or usage difference] • ? – Ambiguous: This record refers to one possible resolution of an irreconcilable ambiguity as to which accepted concept should be applied. [This applies to mapping opinions of a party in a reference by a third party; includes pro parte synonymy.] • N – Nomenclatural: Concept of the author of the accepted concept (for recombinations; another form of equal).
Correlation types 4 • T – Type: Concept of the author of the type description • C – Constituents: The rejected concept differs from the accepted one only in that the constituents are different, such as a new species having been added to a genus. This is intended to imply very similar, which would not be the case where a concept was created by lumping (<), or splitting (>), or accreting a concept from a different location in the hierarchy.
Lessons - 1 • Rank is perhaps best treated as an attribute of name. • Need to be able to track multiple hierarchies. • Best place for tracking parent-child relationships is unclear. Perhaps treat as a form of correlation. • Usage and correlation need reference and note fields.
Lessons - 2 5. We currently define a plant concept based on a text string as it occurs in a reference – but there are multiple ways to render the authorities associated with a name, and multiple spellings of even the Latin based on gender and diacritical marks. Perhaps we are inflating the number of names far too much and should add a mechanism for treating the string as different from the name.
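One possible mechanism for separating the string from the name is to keep every rendered string but group strings under a normalized key. The sketch below handles only diacritics, case, and spacing; the function names and the example strings are hypothetical, and variation in authorities or Latin gender endings would need rules not attempted here.

```python
import unicodedata
from collections import defaultdict


def name_key(rendered):
    """Reduce a rendered name string to a comparison key: strip diacritical
    marks, collapse runs of whitespace, and ignore case."""
    decomposed = unicodedata.normalize("NFKD", rendered)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.split()).casefold()


def group_strings(strings):
    """Group rendered strings that reduce to the same key."""
    groups = defaultdict(set)
    for s in strings:
        groups[name_key(s)].add(s)
    return dict(groups)


# Hypothetical renderings of one name.
groups = group_strings(["Carya exämpla", "Carya  exampla", "carya exampla"])
print(len(groups))  # 1
```

The original strings remain available for each reference, while counts of names are taken over keys rather than strings.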
Lessons - 3 • We define a plant concept based on a text string as it occurs in a reference, but sometimes references are explicit about pages and plates, with the result that a reference like Flora North America could occur many times in the reference table. Alternatives for assigning GUIDs include (A) ignore location in the reference, (B) support multiple reference instances that contain much of the same information, and (C) employ the microreference of the proposed data exchange standard.
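The difference between alternatives (A) and (C) can be shown by whether the location within a reference takes part in the concept's identity. This is only a sketch: the `Citation` class, its `microreference` field, and the page values are assumptions, and the actual form of the microreference in the proposed exchange standard is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    reference: str                        # one row in the reference table
    microreference: Optional[str] = None  # hypothetical: page or plate within it


def concept_key(name, citation, use_location):
    """Identity of a concept under option (A) when use_location is False,
    or under option (C) when use_location is True."""
    location = citation.microreference if use_location else None
    return (name, citation.reference, location)


# Hypothetical citations of the same reference at two locations.
first = Citation("Flora North America", "p. 1")
second = Citation("Flora North America", "p. 2")

# (A) ignores location, so both citations identify the same concept.
print(concept_key("Name A", first, False) == concept_key("Name A", second, False))  # True
# (C) keeps one reference row but distinguishes concepts by microreference.
print(concept_key("Name A", first, True) == concept_key("Name A", second, True))    # False
```

Alternative (B) would instead store each location as its own reference row, which is what causes a single work to occur many times in the reference table.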