Digitised collections: Toward a digital strategy for the NHM, London Vince Smith Workshop 3, pro-iBiosphere, Berlin 23 May 2013
Digital Ambition: NHM Science Strategy 2013-2017 A New Voyage of Discovery • Three Focal Areas: 1. Scientific discovery 2. Scientific infrastructure 3. Scientific engagement • Five Challenges: 1. The Digital NHM 2. Origins, evolution & futures 3. Biodiversity discovery 4. Natural resources & hazards 5. Science, society & skills • Resources & funding • Measuring success • Scientific impact: 1,000 papers in leading journals • Digital access: 20M specimens available digitally • Engagement: 1M face-to-face engagements • Collections: Globally important collections • Diagnostic tools: Diagnostic tools for key groups • Deep time: Timeline of key transitions • Science & society: Articulate the role of science • UK network: Act as a national museum • Earth sciences: Earth Sciences Centre • Funding: £10M for Five Challenge Areas
Overview • Existing digital content, sources & formats • Research data • Collections data • Making collections data digital • Priorities • Protocols & pathfinder activities • Crowdsourcing transcription • Aggregation & delivery • The NHM data portal • Data visualisation, data sub-portals • Identifiers, links & interoperability • DataCite DOIs • Third party aggregators • Portal APIs, download & analytical functions • Timeline & constraints • Data policies • Next steps Digitisation activities Data portal
1. Existing digital content NHM Research Outputs One Month of NHM Science group papers • 49 papers, 45 available online • (4 print only or behind paywalls) • 9 had supplementary data files • 39 papers with tables, charts & other data • >1000 sequences • 826 figures • 76 tables • 1 genome • No collective view of these data (37 journals) • No consistent way of citing NHM data • No consistent mechanism to access data • Effectively invisible at the institutional level Data via Carolyn Lowry, e-mail, 13th Feb. 2013
1. Existing digital content NHM Collections Outputs: data • Huge investment in NHM collection management system • ≠ Imaging • Most research projects need spatio-temporal records • Different requirements for different purposes
1. Existing digital content NHM Collections Outputs: images • Many, many imaging projects (highly fragmented) • Circa 40 TB for major collections (excluding library) • 120,000 images in KE EMu (many others not in KE!) • Circa 250,000 via NHM Photo unit (limited metadata)
1. Existing digital content Current data formats • Darwin Core Archive (DwCA) & extensions (collections) • Circa 2020 fields mapped to 50 fields to generate archive • Images mainly JPG & TIFF • Metadata using EML & Genesis II standard • Research data files in a wide array of formats (blob files)
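The field mapping mentioned on this slide can be illustrated with a minimal sketch. The internal field names on the left of the mapping and the sample record are made up for illustration; only the Darwin Core terms on the right are real. A complete Darwin Core Archive also needs a meta.xml descriptor and a metadata document, which this sketch leaves out.

```python
import csv
import io

# Hypothetical internal field names (left) mapped to real Darwin Core terms (right).
FIELD_MAP = {
    "RegNumber": "catalogNumber",
    "TaxonName": "scientificName",
    "Collector": "recordedBy",
    "CollectionDate": "eventDate",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
    "Country": "country",
}


def to_darwin_core(record):
    """Keep only the mapped fields and rename them to Darwin Core terms."""
    return {dwc: record[src] for src, dwc in FIELD_MAP.items() if src in record}


def write_core_csv(records):
    """Serialise records as the CSV core table of an archive, returned as text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Unmapped source fields are dropped; missing ones are left blank.
        writer.writerow(to_darwin_core(record))
    return buffer.getvalue()


# Made-up record: one unmapped field, several mapped fields missing.
sample = {
    "RegNumber": "EX-0001",
    "TaxonName": "Pieris rapae",
    "Lat": "51.50",
    "Long": "-0.18",
    "InternalNote": "not exported",
}
print(write_core_csv([sample]))
```

Reducing a large internal schema to a small shared vocabulary in this way is what lets aggregators read the export without knowing the collection management system behind it.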
2. Making collections data digital Digitisation Priorities • Priorities linked to science strategic priorities • Disease, sustainability, crop wild relatives, pests etc. • Tiered approach, different needs for different collections • Low hanging fruit (2D objects e.g. herb. sheets & slides) • Linked to strategic collaborations & financial opportunities • e.g. RBG Kew, RBG Edinburgh, Nat. Mus. Wales, Hunterian etc. • Priorities dictate order – we plan to do it all (eventually)!
2. Making collections data digital Digitisation Protocols • Exercise to develop digitisation protocols across collections • Slides, spirit, herbarium sheets, pinned, multispecimen/drawer • Protocols mapped to high level collections descriptions • Workflow software supporting rapid digitisation (to KE & DAMS) • Pathfinder activities for less well understood projects • Entomological dry material (30 M specimens) • iCollections (specimen-by-specimen) approach • SatScan (drawer level multi-specimen) approach
2. Making collections data digital iCollections Initiative • Specimen-by-specimen, traditional, dedicated 6 person team • Digitising British Isles Lepidoptera collection • ~500,000 specimens, 5,000 drawers • Re-curation & specimen imaging • Complete label information including georeferencing • For use in Climate Change initiative
2. Making collections data digital iCollections Initiative • 4-6 people over 3 years, work broken into small tasks by teams • Average imaging rate 163 specimens per person per day • Averaging >3 min per specimen (prep., imaging & databasing) • >£1/specimen • BUT: 6,800 person years for the entire collection
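The scaling problem on this slide comes down to simple arithmetic, sketched below. The 163 specimens per person per day rate and the specimen counts (about 500,000 British Isles Lepidoptera, 30 M entomological dry specimens) come from the slides; the 220 working days per year is an assumption added here. The slides do not state the inputs behind the 6,800 person-year figure, so this sketch shows only the form of the calculation and does not try to reproduce that number.

```python
# Assumption, not from the slides: working days available per person per year.
WORKING_DAYS_PER_YEAR = 220


def person_years(specimens, rate_per_person_day=163,
                 working_days=WORKING_DAYS_PER_YEAR):
    """Estimate effort in person-years at a fixed specimen-by-specimen rate."""
    person_days = specimens / rate_per_person_day
    return person_days / working_days


# British Isles Lepidoptera collection (~500,000 specimens).
lepidoptera = person_years(500_000)

# Entomological dry material (30 M specimens).
entomology_dry = person_years(30_000_000)

print(f"Lepidoptera: {lepidoptera:.1f} person-years")
print(f"Entomological dry material: {entomology_dry:.0f} person-years")
```

Under these assumptions the Lepidoptera estimate comes out at roughly 14 person-years, which falls inside the 12 to 18 person-years implied by 4-6 people over 3 years. Applying the same rate to tens of millions of specimens gives hundreds of person-years, which is the motivation for drawer-level approaches.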
2. Making collections data digital SatScan Initiative • Drawer level digitisation, segmented down to specimens • Very fast imaging, no specimen handling, just one view • No label information, but some data extracted from drawer • Specimens retrospectively cropped & annotated
2. Making collections data digital SatScan Initiative • Dedicated specimen-level rapid annotation software
2. Making collections data digital Crowdsourcing & Transcription • We have a massive transcription problem • Experiments via Notes-from-Nature (a Zooniverse project) • Transcribing the NHM ornithological accession registers • Wikimedian in Residence (Wikisource transcription) • 4-month project, including specimen label transcription
3. Aggregation & Delivery NHM Data Portal • data.nhm.ac.uk • A focus for deposition and discovery of major NHM data sets • Promote innovation through re-use of museum data • Open Access, at a dedicated subdomain of the NHM website • Started Jan. 2013 (3 years), consultation throughout 2012 Functional components of the data portal
3. Aggregation & Delivery NHM Data Portal: Registry • Dataset registry, for dataset discovery, modeled on data.gov.uk • Uses CKAN, an open-source data portal software platform Results Search Browse & search criteria Datasets matching criteria Advanced display options Individual dataset
3. Aggregation & Delivery NHM Data Portal: Registry • Dataset metadata discovery License Name Authors Tags Download Metadata about the dataset Technical Info. (extracted from data file) Geographic scope Developer tools “Social”
3. Aggregation & Delivery NHM Data Portal: Dataset upload • Simple datasets upload workflow for non-collections data 1. Name the dataset 2. Upload / link the data file 3. Describe the data file 4. Theme & tag 5. Add additional resources 6. Temporal coverage 7. Geographic coverage 8. Save & finish
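One way to picture the eight-step upload workflow is as a record that is filled in step by step. The sketch below is illustrative only: the class, its field names and the choice of which steps are required are assumptions made here, not the portal's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DatasetRecord:
    """Hypothetical record mirroring the upload steps listed on the slide."""
    name: str = ""                                   # 1. Name the dataset
    data_file: str = ""                              # 2. Upload / link the data file
    description: str = ""                            # 3. Describe the data file
    tags: List[str] = field(default_factory=list)    # 4. Theme & tag
    additional_resources: List[str] = field(default_factory=list)  # 5.
    temporal_coverage: Optional[Tuple[str, str]] = None             # 6.
    geographic_coverage: Optional[str] = None                       # 7.

    def missing_steps(self):
        """Return the steps that still need input before saving (step 8)."""
        # Assumption: additional resources are optional, everything else is required.
        checks = [
            ("name", bool(self.name)),
            ("data file", bool(self.data_file)),
            ("description", bool(self.description)),
            ("theme & tags", bool(self.tags)),
            ("temporal coverage", self.temporal_coverage is not None),
            ("geographic coverage", self.geographic_coverage is not None),
        ]
        return [label for label, done in checks if not done]


# Made-up example values.
draft = DatasetRecord(name="Example survey data", data_file="survey.csv")
print(draft.missing_steps())
```

Keeping the checks in one method makes it easy to show a depositor what remains before the final save step.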
3. Aggregation & Delivery NHM Data Portal: Data visualisation • Dedicated interface to visualise & explore major datasets • Focused on collections data, based on Canadensys.net, uses CartoDB Toggle map, table & stats views Search, download & display options No. records No. Georef. records Applied filters Zoomable map
3. Aggregation & Delivery NHM Data Portal: Data visualisation Collections views Specimen record views Tables Statistical summary Full record Summary preview Data field mappings Download
4. Identifiers, links & interoperability NHM Data Portal & DataCite • Using DataCite DOIs in the data portal • datasets (2014) & specimens (2015) • Unique, persistent and resolvable identifiers • Easy to cite, alias existing specimen identifiers • Conform to minimum DataCite requirements • Landing page, min. metadata standard, fee, min. 10 yr. contract, DOI (pre)fixes • Breaks us out of the biodiversity data silo
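A DOI is a prefix and a suffix joined by a slash, and it resolves through a public resolver to a landing page. The sketch below shows how existing identifiers could be aliased to DOIs. The prefix is a placeholder and not the NHM's, the local identifiers are made up, and the alias table is a hypothetical in-memory structure; registering DOIs with DataCite is a separate step not shown here.

```python
from urllib.parse import quote

# Placeholder prefix for illustration only; the real prefix is not given in the slides.
PLACEHOLDER_PREFIX = "10.5072"


def make_doi(local_id, prefix=PLACEHOLDER_PREFIX):
    """Build a DOI string of the form prefix/suffix from a local identifier."""
    return f"{prefix}/{local_id}"


def resolver_url(doi):
    """Return the URL at which the public DOI resolver redirects to the landing page."""
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical alias table: existing identifier -> DOI.
existing_ids = ["dataset-0001", "specimen-0042"]
aliases = {local_id: make_doi(local_id) for local_id in existing_ids}

for local_id, doi in aliases.items():
    print(local_id, "->", resolver_url(doi))
```

Because the suffix can reuse the existing identifier, current catalogue numbers stay meaningful while gaining a citable, resolvable form.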
4. Identifiers, links & interoperability Data Aggregation, APIs & download • Content within the NHM data portal will be highly accessible • Collections harvestable (e.g. by GBIF as a DwCA) • Download DwCAs on any search facet • Wide set of APIs available for datasets (part of CKAN) • Sub-portals (selected content, themed by topic) • e.g. Virtual Herbarium, NHM Science initiatives, geographic regions • Analytical interface planned for 2015 (but not specified)
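Since the registry is built on CKAN, dataset searches could be scripted against CKAN's standard action API. The sketch below uses the `package_search` action, which is part of CKAN; the assumptions are that the portal exposes this API at the address given in the slides, that it is served over HTTPS, and that the search term is just an example. The helper function names are made up here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Portal address from the slides; the https scheme and API availability are assumed.
BASE_URL = "https://data.nhm.ac.uk"


def package_search_url(query, rows=10, base_url=BASE_URL):
    """Build the URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Extract dataset titles from a decoded package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    return [d.get("title", d.get("name")) for d in response["result"]["results"]]


def search(query, rows=10):
    """Run the search over the network and return matching dataset titles."""
    with urlopen(package_search_url(query, rows)) as resp:
        return dataset_titles(json.load(resp))


# Building the URL needs no network access.
print(package_search_url("lepidoptera", rows=5))
```

Calling `search("lepidoptera")` would perform the request; it is left uncalled here so the example runs offline.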
5. Timeline & constraints Data Policies & Next Steps • Data portal will be “open-by-default” • Ambiguity in what this means & top down schizophrenia • Conflicting mandates on open access & revenue opportunities • Lots of guidance available, will use to form a common policy • A cross institutional policy would be useful (but challenging)
5. Timeline & constraints Data Policies & Next Steps NHM Data portal timeline • Next 6 months • More documentation (PID and Tech Spec) • Consultation and advocacy (internal and external) • Data mapping from KE EMu and software testing • Development • website wireframe design • drafting data visualisation subcontract • Construction of private alpha release • Timeline figure, milestones: Jan 2013 Project start • Jan 2014 Private alpha • Jan 2015 Stable public beta • Jan 2016 Full release & sub-portals • Timeline figure, phases: Requirements & dataset discovery • Internal feedback, data visualisation & DOIs • Subportals & analytical tools
5. Timeline & constraints Data Policies & Next Steps NHM digitisation timeline • Next 6 months • Initial conclusions from path-finding digitisation activities • Initial grant funding bids developed • Advocacy, outreach & development of a digitisation “programme” • Investigate possibilities for gallery development • Develop crowdsourcing strategy • Timeline figure (Jan 2013 to 2018): Path-finding & Programme development • Major funding applications & a new gallery? • Digitisie… Digitisie… Digitisie… • 20 Million!! • Milestone labels: Project start, Private alpha, Stable public beta
2. Making collections data digital Digitisation Priorities • Priorities linked to science strategic priorities • Disease, sustainability, crop wild relatives, pests etc. • Tiered approach, different needs for different collections Nick Poole, UK Collections Trust