Digitised collections: Toward a digital strategy for the NHM, London

Vince Smith. Workshop 3, pro-iBiosphere, Berlin, 23 May 2013.

Presentation Transcript


  1. Digitised collections: Toward a digital strategy for the NHM, London. Vince Smith. Workshop 3, pro-iBiosphere, Berlin, 23 May 2013

  2. Digital Ambition: NHM Science Strategy 2013-2017. A New Voyage of Discovery. Three Focal Areas: 1. Scientific discovery; 2. Scientific infrastructure; 3. Scientific engagement. Five Challenges: 1. The Digital NHM; 2. Origins, evolution & futures; 3. Biodiversity discovery; 4. Natural resources & hazards; 5. Science, society & skills. Resources & funding. Measuring success.

  3. data.nhm.ac.uk/globe/

  4. Digital Ambition: NHM Science Strategy 2013-2017. A New Voyage of Discovery. Three Focal Areas: 1. Scientific discovery; 2. Scientific infrastructure; 3. Scientific engagement. Five Challenges: 1. The Digital NHM; 2. Origins, evolution & futures; 3. Biodiversity discovery; 4. Natural resources & hazards; 5. Science, society & skills. Resources & funding. Measuring success • Scientific impact: 1,000 papers in leading journals • Digital access: 20M specimens available digitally • Engagement: 1M face-to-face engagements • Collections: globally important collections • Diagnostic tools: diagnostic tools for key groups • Deep time: timeline of key transitions • Science & society: articulate the role of science • UK network: act as a national museum • Earth sciences: Earth Sciences Centre • Funding: £10M for Five Challenge Areas

  5. Overview • Existing digital content, sources & formats (research data; collections data) • Making collections data digital (priorities; protocols & pathfinder activities; crowdsourcing transcription) • Aggregation & delivery (the NHM data portal; data visualisation, data sub-portals) • Identifiers, links & interoperability (DataCite DOIs; third party aggregators; portal APIs, download & analytical functions) • Timeline & constraints (data policies; next steps). Side labels on the slide: Digitisation activities; Data portal

  6. 1. Existing digital content NHM Research Outputs One Month of NHM Science group papers • 49 papers, 45 available online • (4 print only or behind pay walls) • 9 had supplementary data files • 39 papers with tables, charts & other data • >1000 sequences • 826 figures • 76 tables • 1 genome • No collective view of these data (37 journals) • No consistent way of citing NHM data • No consistent mechanism to access data • Effectively invisible at the institutional level Data via Carolyn Lowry e-mail, 13th Feb. 2013

  7. 1. Existing digital content NHM Collections Outputs: data • Huge investment in NHM collection management system • ≠ Imaging • Most research projects need spatio-temporal records • Different requirements for different purposes

  8. 1. Existing digital content NHM Collections Outputs: images • Many, many imaging projects (highly fragmented) • Circa 40 TB for major collections (excluding library) • 120,000 images in KE EMu (many others not in KE!) • Circa 250,000 via NHM Photo unit (limited metadata)

  9. 1. Existing digital content Current data formats • Darwin Core Archive (DwCA) & extensions (collections) • Circa 2020 fields mapped to 50 fields to generate archive • Images mainly JPG & TIFF • Metadata using EML & Genesis II standard • Research data files in a wide array of formats (blob files)
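The field mapping mentioned on slide 9 can be pictured with a minimal sketch. The Darwin Core term names on the right-hand side are real terms from the standard; the internal field names on the left are hypothetical stand-ins (not actual KE EMu column names), and the sample record is invented purely for illustration.

```python
# Hypothetical internal field names (left) mapped to Darwin Core terms (right).
FIELD_MAP = {
    "reg_number": "catalogNumber",
    "taxon_name": "scientificName",
    "collector": "recordedBy",
    "date_collected": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "country_name": "country",
}


def to_darwin_core(record: dict) -> dict:
    """Keep only mapped fields, renamed to Darwin Core terms; drop empty values."""
    return {
        FIELD_MAP[key]: value
        for key, value in record.items()
        if key in FIELD_MAP and value not in (None, "")
    }


# Invented sample record: "shelf_code" has no mapping and "collector" is empty,
# so neither appears in the output.
sample = {
    "reg_number": "EXAMPLE-0001",
    "taxon_name": "Pieris brassicae",
    "lat": 51.4967,
    "lon": -0.1764,
    "shelf_code": "A12",
    "collector": "",
}
dwc = to_darwin_core(sample)
```

A real archive would also need the DwCA descriptor and metadata files; this sketch covers only the many-to-few field reduction.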

  10. 2. Making collections data digital Digitisation Priorities • Priorities linked to science strategic priorities • Disease, sustainability, crop wild relatives, pests etc. • Tiered approach, different needs for different collections • Low hanging fruit (2D objects e.g. herb. sheets & slides)

  11. 2. Making collections data digital Digitisation Priorities • Priorities linked to science strategic priorities • Disease, sustainability, crop wild relatives, pests etc. • Tiered approach, different needs for different collections • Low hanging fruit (2D objects e.g. herb. sheets & slides) • Linked to strategic collaborations & financial opportunities • e.g. RBG Kew, RBG Edinburgh, Nat. Mus. Wales, Hunterian etc. • Priorities dictate order – we plan to do it all (eventually)!

  12. 2. Making collections data digital Digitisation Protocols • Exercise to develop digitisation protocols across collections • Slides, spirit, herbarium sheets, pinned, multispecimen/drawer • Protocols mapped to high level collections descriptions • Workflow software supporting rapid digitisation (to KE & DAMS)

  13. 2. Making collections data digital Digitisation Protocols • Exercise to develop digitisation protocols across collections • Slides, spirit, herbarium sheets, pinned, multispecimen/drawer • Protocols mapped to high level collections descriptions • Workflow software supporting rapid digitisation (to KE & DAMS) • Pathfinder activities for less well understood projects • Entomological dry material (30 M specimens) • iCollections (specimen-by-specimen) approach • SatScan (drawer level multi-specimen) approach

  14. 2. Making collections data digital iCollections Initiative • Specimen-by-specimen, traditional, dedicated 6 person team • Digitising British Isles Lepidoptera collection • ~500,000 specimens, 5,000 drawers • Re-curation & specimen imaging • Complete label information including georeferencing • For use in Climate Change initiative

  15. 2. Making collections data digital iCollections Initiative • 4-6 people over 3 years, work broken into small tasks by teams • Average imaging rate 163 specimens per person per day • Averaging >3 min per specimen (prep., imaging & databasing) • >£1/specimen • BUT: 6,800 person-years for the entire collection
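As a rough consistency check on the iCollections figures, the quoted imaging rate can be turned into an effort estimate for the ~500,000 British Isles Lepidoptera specimens. The sketch below assumes 220 working days per person per year, which is not stated in the slides; the function and constant names are illustrative only.

```python
SPECIMENS = 500_000          # British Isles Lepidoptera collection (slide 14)
RATE_PER_PERSON_DAY = 163    # average imaging rate (slide 15)
WORKING_DAYS_PER_YEAR = 220  # ASSUMPTION: not stated in the slides


def person_years(specimens: int, rate: float, days_per_year: int) -> float:
    """Effort in person-years to digitise `specimens` at `rate` per person per day."""
    return specimens / rate / days_per_year


effort = person_years(SPECIMENS, RATE_PER_PERSON_DAY, WORKING_DAYS_PER_YEAR)
print(f"{effort:.1f} person-years")  # prints "13.9 person-years"
```

Under that assumption the estimate falls inside the 12 to 18 person-years implied by 4-6 people working for 3 years. The same function can be applied to larger collections, but the result depends heavily on the assumed specimen count and working year.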

  16. 2. Making collections data digital SatScan Initiative • Drawer level digitisation, segmented down to specimens • Very fast imaging, no specimen handling, just one view • No label information, but some data extracted from drawer • Specimens retrospectively cropped & annotated
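The "segmented down to specimens" and "retrospectively cropped" steps can be sketched as cropping sub-images out of one drawer image. This is a toy illustration, not the SatScan software: the image is a plain nested list, and the bounding boxes are assumed to come from an earlier segmentation step that is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def crop_specimens(drawer: Image, boxes: List[Box]) -> List[Image]:
    """Cut one image per specimen out of a single drawer-level image."""
    return [crop(drawer, box) for box in boxes]


# Toy 4x6 "drawer": non-zero pixels mark two specimens.
drawer = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]
boxes = [(0, 1, 2, 3), (1, 4, 3, 6)]  # ASSUMED output of a segmentation step
specimens = crop_specimens(drawer, boxes)
```

Because the drawer image is captured once, the crops can be regenerated later if the segmentation improves, which is what makes retrospective annotation possible.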

  18. 2. Making collections data digital SatScan Initiative • Dedicated specimen-level rapid annotation software

  19. 2. Making collections data digital Crowdsourcing & Transcription • We have a massive transcription problem • Experiments via Notes-from-Nature (a Zooniverse project) • Transcribing the NHM ornithological accession registers • Wikimedian in Residence (Wikisource transcription) • 4-month project, including specimen label transcription

  20. 3. Aggregation & Delivery NHM Data Portal • data.nhm.ac.uk • A focus for deposition and discovery of major NHM data sets • Promote innovation through re-use of museum data • Open Access, at a dedicated subdomain of the NHM website • Started Jan. 2013 (3 years), consultation throughout 2012 (Figure: Functional components of the data portal)

  21. 3. Aggregation & Delivery NHM Data Portal: Registry • Dataset registry, for dataset discovery, modelled on data.gov.uk • Uses CKAN, an open-source data portal software platform (Screenshot labels: Search; Browse & search criteria; Results; Datasets matching criteria; Advanced display options; Individual dataset)

  22. 3. Aggregation & Delivery NHM Data Portal: Registry • Dataset metadata discovery (Screenshot labels: License; Name; Authors; Tags; Download; Metadata about the dataset; Technical Info. (extracted from data file); Geographic scope; Developer tools; “Social”)

  23. 3. Aggregation & Delivery NHM Data Portal: Dataset upload • Simple dataset upload workflow for non-collections data: 1. Name the dataset; 2. Upload / link the data file; 3. Describe the data file; 4. Theme & tag; 5. Add additional resources; 6. Temporal coverage; 7. Geographic coverage; 8. Save & finish
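The eight upload steps map fairly naturally onto the fields of a CKAN dataset ("package"). The sketch below builds such a record as a plain dictionary. The `name`, `title`, `notes`, `tags`, `resources` and `extras` keys are standard CKAN dataset fields; storing temporal and geographic coverage as free-form extras, and the helper names, are assumptions for illustration rather than the portal's actual schema.

```python
import json
import re


def slugify(title: str) -> str:
    """Lowercase the title and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")


def build_dataset(title, resource_url, description, tags, temporal, spatial):
    """Assemble a CKAN-style dataset record from the upload form values."""
    return {
        "name": slugify(title),                     # step 1: name the dataset
        "title": title,
        "notes": description,                       # step 3: describe
        "tags": [{"name": tag} for tag in tags],    # step 4: theme & tag
        "resources": [                              # steps 2 and 5: files/links
            {"url": resource_url, "description": description}
        ],
        # ASSUMPTION: coverage kept as free-form extras (steps 6 and 7).
        "extras": [
            {"key": "temporal_coverage", "value": temporal},
            {"key": "geographic_coverage", "value": spatial},
        ],
    }


# Placeholder values, not a real NHM dataset.
dataset = build_dataset(
    title="Example Lepidoptera Records 2013",
    resource_url="https://example.org/data/lepidoptera.csv",
    description="Placeholder dataset used to illustrate the workflow.",
    tags=["lepidoptera", "british-isles"],
    temporal="1900/2000",
    spatial="British Isles",
)
payload = json.dumps(dataset)  # step 8: this body would be sent to the portal
```

Keeping the record as a plain dictionary means it can be validated or previewed before anything is submitted.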

  24. 3. Aggregation & Delivery NHM Data Portal: Data visualisation • Dedicated interface to visualise & explore major datasets • Focused on collections data, based on Canadensys.net, uses CartoDB (Screenshot labels: Toggle map, table & stats views; Search, download & display options; No. records; No. georef. records; Applied filters; Zoomable map)

  25. 3. Aggregation & Delivery NHM Data Portal: Data visualisation (Screenshot labels: Collections views; Specimen record views; Tables; Statistical summary; Full record; Summary preview; Data field mappings; Download)

  26. 4. Identifiers, links & interoperability NHM Data Portal & DataCite • Using DataCite DOIs in the data portal • datasets (2014) & specimens (2015) • Unique, persistent and resolvable identifiers • Easy to cite, alias existing specimen identifiers • Conform to minimum DataCite requirements • Landing page, min. metadata standard, fee, min. 10 yr. contract, DOI (pre)fixes • Breaks us out of the biodiversity data silo
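What "resolvable" and "alias existing specimen identifiers" could look like in practice is sketched below: a DOI is checked for basic shape, turned into a resolver URL, and looked up from an existing catalogue number. The DOI prefix, suffixes and catalogue number are placeholders, not identifiers issued to the NHM, and the pattern is a simple shape check rather than a full validator.

```python
import re
from urllib.parse import quote

# Basic shape check: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


# HYPOTHETICAL alias table: existing specimen identifier -> DOI.
ALIASES = {"EXAMPLE-0001": "10.5072/example-specimen-0001"}


def resolve_specimen(catalogue_number: str) -> str:
    """Look up the DOI for an existing specimen identifier and return its URL."""
    return doi_url(ALIASES[catalogue_number])


landing_page = resolve_specimen("EXAMPLE-0001")
```

The alias table keeps existing catalogue numbers usable while giving each record a citable, persistent link.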

  27. 4. Identifiers, links & interoperability Data Aggregation, APIs & download • Content within the NHM data portal will be highly accessible • Collections harvestable (e.g. by GBIF as a DwCA) • Download DwCAs on any search facet • Wide set of APIs available for datasets (part of CKAN) • Sub-portals (selected content, themed by topic) • e.g. Virtual Herbarium, NHM Science initiatives, geographic regions • Analytical interface planned for 2015 (but not specified)
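Since the dataset APIs come from CKAN, a dataset search could be sketched with CKAN's `package_search` action. The code below only builds the request URL and parses a response, so it runs without network access. The host is taken from the slides, but whether the API is exposed at this path on the portal is an assumption, and the canned response is invented to show the standard CKAN response shape.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://data.nhm.ac.uk"  # host from the slides; API path is ASSUMED


def search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{BASE_URL}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> list:
    """Extract dataset titles from a CKAN package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Invented response in the standard CKAN shape, for illustration only.
canned = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "example", "title": "Example dataset"}]},
})
url = search_url("lepidoptera", rows=5)
titles = dataset_titles(canned)
```

In real use the URL would be fetched with an HTTP client and the body passed to `dataset_titles`; that step is left out here to keep the example offline.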

  28. 5. Timeline & constraints Data Policies & Next Steps • Data portal will be “open-by-default” • Ambiguity in what this means & top down schizophrenia • Conflicting mandates on open access & revenue opportunities • Lots of guidance available, will use to form a common policy • A cross institutional policy would be useful (but challenging)

  29. 5. Timeline & constraints Data Policies & Next Steps NHM Data portal timeline • Next 6 months • More documentation (PID and Tech Spec) • Consultation and advocacy (internal and external) • Data mapping from KE EMu and software testing • Development • website wireframe design • drafting data visualisation subcontract • Construction of private alpha release (Timeline figure labels. Phases: Requirements & dataset discovery; Internal feedback, data visualisation & DOIs; Subportals & analytical tools. Dates: Jan 2013; Jan 2014; Jan 2015; Jan 2016. Milestones: Project start; Private alpha; Stable public beta; Full release & sub-portals)

  30. 5. Timeline & constraints Data Policies & Next Steps NHM digitisation timeline • Next 6 months • Initial conclusions from path-finding digitisation activities • Initial grant funding bids developed • Advocacy, outreach & development of a digitisation “programme” • Investigate possibilities for gallery development • Develop crowdsourcing strategy (Timeline figure labels. Phases: Path-finding & Programme development; Major funding applications & a new gallery?; Digitisie… Digitisie… Digitisie…. Dates: Jan 2013; 2014; 2015; 2016; 2017; 2018. Target: 20 Million!! Milestones: Project start; Private alpha; Stable public beta)

  31. QUESTIONS

  32. 2. Making collections data digital Digitisation Priorities • Priorities linked to science strategic priorities • Disease, sustainability, crop wild relatives, pests etc.

  33. 2. Making collections data digital Digitisation Priorities • Priorities linked to science strategic priorities • Disease, sustainability, crop wild relatives, pests etc. • Tiered approach, different needs for different collections Nick Poole, UK Collections Trust
