The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores. P. Bryan Heidorn heidorn@email.arizona.edu Steven Chong stevenchong@email.arizona.edu. University of Arizona, School of Information Resources and Library Science
The Ingest and Uses of Specimen Label Data into Semantic Knowledge Stores P. Bryan Heidorn heidorn@email.arizona.edu Steven Chong stevenchong@email.arizona.edu University of Arizona, School of Information Resources and Library Science Semantics for Biodiversity Symposium – TDWG 2013 Annual Conference Florence, Italy October 30, 2013
Anyone can say anything about anything Museum Labels Semantic Information Store Heidorn and Chong - TDWG 2013 http://vector-magz.com/fashion/glass-slipper-clip-art-item-3/ http://www.maskworld.com/english/products/make-up/--/monster-hands--640/ogre-feet--SP-3310-black
iDigBio Hackathon
• Coincided with the 2013 iConference meeting in Dallas/Fort Worth, Texas
• 28 participants from a variety of backgrounds and institutions
• Goal: develop new tools to parse OCR output from specimen labels into Darwin Core. Results were compared against human-parsed gold and silver files
• Three datasets
  • Easy: 10,000 images of lichens, bryophytes and climate change TCN, and lichen and bryophyte packet labels. Little or no handwriting present
  • Medium: 5,000 BRIT Herbarium and NYBG Herbarium specimen sheets. Some handwriting present
  • Hard: several thousand images of entomology specimens from the Essig Museum and CalBug
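As a rough illustration of the hackathon task, the sketch below pulls a few Darwin Core terms out of OCR text with regular expressions and scores the result against a hand-parsed record. The patterns, the sample label, and the precision/recall scoring are all assumptions made for illustration; the slides do not describe the participants' actual tools or the scoring method used.

```python
import re

# Hypothetical patterns: real label layouts vary widely, so these are
# illustrative only and not taken from any hackathon submission.
PATTERNS = {
    "recordedBy": re.compile(
        r"(?:Coll\.|Collector:?|Leg\.)\s*(?P<value>[^\n;]+)", re.IGNORECASE
    ),
    "verbatimEventDate": re.compile(
        r"(?P<value>\d{1,2}\s+[A-Z][a-z]{2,8}\.?\s+\d{4})"
    ),
    "catalogNumber": re.compile(
        r"(?:No\.|Accession)\s*(?P<value>\d+)", re.IGNORECASE
    ),
}


def parse_label(ocr_text):
    """Return a dict of Darwin Core term -> value for every pattern that matches."""
    record = {}
    for term, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            record[term] = match.group("value").strip()
    return record


def score(parsed, gold):
    """Field-level precision and recall against a human-parsed record (assumed metric)."""
    correct = sum(1 for term, value in parsed.items() if gold.get(term) == value)
    precision = correct / len(parsed) if parsed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented sample label, not drawn from the hackathon datasets.
sample = "Cladonia rangiferina\nColl. J. Smith\n12 June 1987\nNo. 4521"
print(parse_label(sample))
```

Scoring per field rather than per label makes partial successes visible, which matters when handwriting or layout defeats only some of the patterns.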
iDigBio Hackathon
Mapping to Darwin Core
Namespaces and Modeling
• CIDOC CRM
  • International Council of Museums – International Committee for Documentation’s Conceptual Reference Model
  • http://www.cidoc-crm.org/index.html
  • Semantic framework for cultural heritage information
  • Being harmonized with Functional Requirements for Bibliographic Records (FRBR) into the FRBRoo ontology. Goal: “to facilitate the integration, mediation, and interchange of bibliographic and museum information”
• Relation Ontology
  • Biology-specific relations
  • http://obofoundry.org/ro/
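One way to keep terms from these vocabularies apart is to write them as prefixed names and expand them to full URIs on demand. The sketch below is a minimal illustration; the prefix labels `crm`, `dwc`, and `ro` and the namespace URIs chosen for them are assumptions (commonly used ones), not something the slides specify.

```python
# Assumed prefix-to-namespace table; verify the URIs against each
# vocabulary's own documentation before relying on them.
NAMESPACES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "ro": "http://www.obofoundry.org/ro/ro.owl#",
}


def expand(curie):
    """Expand a prefixed name such as 'dwc:catalogNumber' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return NAMESPACES[prefix] + local


print(expand("dwc:catalogNumber"))
print(expand("ro:derives_from"))
```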
Some digitization support from Semantics
• Copy (some) metadata from duplicates
• Update taxonomy
• Link to literature for types
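The first point, copying some metadata from duplicates, can be sketched as follows. The choice of which Darwin Core terms describe the shared collecting event (and are therefore safe to copy) is an assumption for illustration, as are the sample records; institution-specific terms such as `catalogNumber` are deliberately left alone.

```python
# Assumed set of terms that describe the collecting event shared by duplicates.
SHARED_EVENT_TERMS = ("recordedBy", "recordNumber", "eventDate", "locality", "country")


def copy_from_duplicate(target, source, terms=SHARED_EVENT_TERMS):
    """Fill empty event-level fields in `target` from a duplicate's record.

    Existing values in `target` are never overwritten, and terms outside
    `terms` (for example catalogNumber) are never copied.
    """
    merged = dict(target)
    for term in terms:
        if not merged.get(term) and source.get(term):
            merged[term] = source[term]
    return merged


# Invented records for two herbarium sheets from the same gathering.
transcribed = {
    "catalogNumber": "NY-0001",
    "recordedBy": "J. Smith",
    "recordNumber": "312",
    "eventDate": "1987-06-12",
    "locality": "Near the river",
}
untranscribed = {"catalogNumber": "FTG-0042", "recordedBy": "J. Smith", "recordNumber": "312"}

print(copy_from_duplicate(untranscribed, transcribed))
```

Copying only into empty fields keeps any value a curator has already entered, so the operation stays safe to repeat.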
Semantics of Duplicates and Citation
• Multiple types of Specimen Duplicates
  • Different inferences licensed by type
• Multiple types of Citation
  • Different inferences licensed by type
Duplicate Specimens = Fairchild Tropical Garden NYBG
CIDOC CRM overlaying DwC
[Diagram labels]
• Instances: Museum 1, Museum 2, Specimen 2
• Darwin Core elements: Collection Event, Catalog Number, Scientific Name, Locality, Date, Collector
• CIDOC CRM classes: E78: Collection; E20: Biological Object; E8: Acquisition; E41: Appellation (instance); E42: Identifier (class); E50: Date; E53: Place; E39: Actor
• CIDOC CRM properties: P1: is identified by; P7: took place at; P12’: was present at; P14: carried out by; P52: has current owner; P78’: is identified by
Population Duplicate
[Diagram labels]
• http://www.ncbi.nlm.nih.gov/genbank/ (Sequence?)
• Darwin Core elements: dwc:basisOfRecord; dwc:Collection Event; dwc:Identification (two instances); TaxonomicName; TaxonID
• CIDOC CRM classes: E20: Biological Object (two instances); E8: Acquisition; E17: Type Assignment
• CIDOC CRM properties: P12: was present at; P41: Classified
Valid Inferences
• The specimens/individuals/samples(?) come from the same population
• The DNA will be the same at the species level but not at the individual level
• If you sequence the DNA of one, the DNA of the other will be similar
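The population-level inference can be sketched over plain subject/predicate/object triples: two specimens that were present at the same collection event are treated as population duplicates. The predicate string `crm:P12i_was_present_at`, the identifiers, and the dictionary of licensed inferences are hypothetical names chosen for this illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical predicate label for "was present at".
WAS_PRESENT_AT = "crm:P12i_was_present_at"

# What a population-level duplicate does and does not license, per the slide.
POPULATION_INFERENCES = {
    "same_population": True,
    "dna_same_at_species_level": True,
    "dna_same_at_individual_level": False,
}


def population_duplicates(triples):
    """Return sorted pairs of specimens that share a collection event."""
    by_event = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == WAS_PRESENT_AT:
            by_event[obj].add(subject)
    pairs = set()
    for specimens in by_event.values():
        pairs.update(combinations(sorted(specimens), 2))
    return sorted(pairs)


# Invented triples for illustration.
triples = [
    ("specimen:FTG-0042", WAS_PRESENT_AT, "event:smith-312"),
    ("specimen:NY-0001", WAS_PRESENT_AT, "event:smith-312"),
    ("specimen:NY-0002", WAS_PRESENT_AT, "event:jones-7"),
]

for pair in population_duplicates(triples):
    print(pair, POPULATION_INFERENCES)
```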
Individual Duplicate (Material Sample)
[Diagram labels]
• http://www.ncbi.nlm.nih.gov/genbank/ (Sequence?)
• Darwin Core elements: dwc:basisOfRecord; dwc:Collection Event; dwc:Identification (two instances); TaxonomicName; TaxonID
• CIDOC CRM classes: E20: Biological Object (three instances); E8: Acquisition; E17: Type Assignment
• CIDOC CRM properties: P12: was present at; P24: Transfer?; P41: Classified
• Relation Ontology: ro:derives_from (two links)
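For individual duplicates, the relevant link is `ro:derives_from` rather than a shared collection event. A minimal sketch, assuming each sample derives from at most one parent and that following the chain to its root identifies the source individual; the mapping and sample names are invented for illustration.

```python
def root_of(sample, derives_from):
    """Follow derives_from links (child -> parent) until no parent remains."""
    seen = set()
    while sample in derives_from and sample not in seen:
        seen.add(sample)  # guard against accidental cycles in the data
        sample = derives_from[sample]
    return sample


def individual_duplicates(a, b, derives_from):
    """True if both samples trace back to the same source individual."""
    return root_of(a, derives_from) == root_of(b, derives_from)


# Invented child -> parent links.
derives_from = {
    "sheet_NY": "plant_1",
    "sheet_FTG": "plant_1",
    "leaf_tissue": "sheet_NY",
    "dna_extract": "leaf_tissue",
}

print(individual_duplicates("dna_extract", "sheet_FTG", derives_from))
```

Unlike the population case, a positive result here supports the stronger expectation that sequences from the two samples match at the individual level.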
DOIs and Publications http://biodiversitylibrary.org/page/2381678#page/363/mode/1up
Specimen Literature Relationships
See: DwC:associatedReferences
[Diagram labels]
• Examples: Checklist of Plants of Algonquin Park; TaxonDescription; lectotype; Scientific Name; http://www.ncbi.nlm.nih.gov/genbank/ (Sequence?)
• Darwin Core elements: dwc:Identification (two instances)
• CIDOC CRM classes: E20: Biological Object (three instances); E41: Appellation; E75: Conceptual Object Appellation (DOI, ISBN needed); E55: Type (Quercus alba L.); E83: Type Creation; E17: Type Assignment
• CIDOC CRM properties: P41: Classified; P135: Created Type; P136: was based on; P137: Exemplifies; P149: is identified by
Anyone can say anything about anything Museum Labels Semantic Information Store http://www.maskworld.com/english/products/make-up/--/monster-hands--640/ogre-feet--SP-3310-black http://www.clipartguide.com/_pages/0511-0810-0502-2909.html http://vector-magz.com/fashion/glass-slipper-clip-art-item-3/
Acknowledgments
• National Science Foundation
• BiSciCol Collaborators http://biscicol.blogspot.com/
• iDigBio Hackathon Participants
Thank You! heidorn@email.arizona.edu stevenchong@email.arizona.edu