Explore existing data curation options for digital repositories to enhance data preservation and reuse. Learn about institutional and specialized data repositories and key models guiding data lifecycle management. Gain insights on capability maturity models for scientific data management and digital preservation. Identify best practices for data curation and preservation to ensure long-term usability.
2015 DLF eResearch Network Webinar 5: Digital Repositories: Existing Data Curation Options September 23, 2015
Today’s Webinar • Guest Speaker • Q & A • Housekeeping • Tweet about it! #eresearchnetwork @CLIRDLF
Nancy McGovern Head of Curation and Preservation Services, MIT Libraries
Theme: Digital Repositories: Existing Data Curation Options; Institutional and Specialized Data Repositories Topic: Nancy Y McGovern, MIT Libraries E-research webinar – 23 Sept 2015
I2S2 Idealised Scientific Research Activity Lifecycle Model (2011)
Research360 Institutional Research Lifecycle (2011) • six stages • high-level summary of the I2S2 model • resembles the UKDA Lifecycle model, except: • additional planning stage at the start • no processing stage
UK Data Archive Data Lifecycle (2012?) 1. Creating data 2. Processing data 3. Analysing data 4. Preserving data 5. Giving access to data 6. Re-using data
ANDS Data Sharing Verbs (2009) Create Store Describe Identify Register Discover Access Exploit
DataONE Data Lifecycle (2012) 1. Plan 2. Collect 3. Assure 4. Describe 5. Preserve 6. Discover 7. Integrate 8. Analyze
Capability Maturity Model for Scientific Data Management (2011) • Data acquisition, processing and quality assurance [audit] • Goal: Reliably capture and describe scientific data in a way that facilitates preservation and reuse • Data description and representation • Goal: Create quality metadata for data discovery, preservation, and provenance functions. • Data dissemination • Goal: Design and implement interfaces for users to obtain and interact with data • Repository services/preservation [preserve?] • Goal: Preserve collected data for long-term use
Capability Maturity Model for Scientific Data Management (2) G2.1 The organization establishes policies for planning and performing the process G2.2 A data management plan is established and maintained G2.3 Resources are provided G2.4 Responsibility is assigned G2.5 People are trained G2.6 Work products are controlled G2.7 Stakeholders are identified [collaboration] G2.8 The process is monitored and controlled [assess] G2.9 Adherence to process standards is assessed and noncompliance addressed [enforce]
Digital Preservation Management (Maturity) Model (2003) • Acknowledge • Act • Consolidate • Institutionalize • Externalize
PAIMAS (2005) • Preliminary Phase: define the information to be archived (Preliminary Agreement) • Formal Definition Phase: develop agreement on data to be delivered, schedule, complementary elements (Dictionary, Formal model, Submission Agreement) • Transfer Phase: actual transfer of the objects (Transferred object files) • Validation Phase: validate the transferred objects (Anomalies, Validation agreement) • OAIS 2012: “the more specific Producer-Archive Interface Specification (PAIS) is under preparation” • AIP Creation
DPOE Life Cycle Stages • Identify - what digital content do we have? [DCM: overviews] • Select - what portion will be preserved? [DCM: reviews] • Store - what issues are there for long-term storage? [CPS project] • Protect - steps needed to protect digital content? [DCM: reviews] • Manage - provisions for long-term management? [DCM: all] • Provide - considerations for long-term access? [DCM: reviews]
NDSA Levels of Digital Preservation • Columns (levels): One: Protect Your Data; Two: Know Your Data; Three: Monitor Your Data; Four: Repair Your Data • Rows (functional areas): Storage and Geographic Location; File Fixity and Data Integrity; Information Security; Metadata; File Formats
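The "File Fixity and Data Integrity" row is the most directly scriptable part of the matrix. As a rough illustration only (not taken from the webinar or from the NDSA document), the sketch below records SHA-256 checksums for a directory and later reports files that are missing, new, or changed. The manifest layout and the function names `build_manifest`, `check_fixity`, and `demo` are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum.

    The dict-based manifest is a hypothetical layout chosen for this sketch.
    """
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root against a stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "new": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }


def demo() -> dict[str, list[str]]:
    """Create two files, record fixity, then alter one and delete the other."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.csv").write_text("id,value\n1,2\n")
        (root / "b.txt").write_text("readme")
        manifest = build_manifest(root)
        (root / "a.csv").write_text("id,value\n1,3\n")  # simulate silent change
        (root / "b.txt").unlink()  # simulate loss
        return check_fixity(root, manifest)


if __name__ == "__main__":
    print(demo())
```

In practice the manifest would be stored apart from the content it describes and the check run on a schedule. Detecting a problem is only half of the picture: repairing a damaged file requires an intact copy held elsewhere, which is where the storage and geographic location row comes in.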
Data Scope • What data services are needed? • What is in scope for data services? Who’s asking? • Will we host/preserve data? Under what circumstances? • What support for DMPs? Follow up? • Research data + archives and special collections? • What data (studies, versions) should be retained? • Library data as secondary data?
Data Repository Options Manage and/or • Dryad • DSpace • Figshare • Fedora (SIdora) • Dataverse … Deposit • PubMed Central • Domain repositories …
your questions • Are there US institutions of higher education that have achieved a high level of maturity? • …In practice, does an IHE -- in which there are a variety of active research environments -- choose one model, overlay a number of these approaches or work simultaneously with different models? • … how does an institution determine how long to keep data? • …For those doing data management, are criteria such as initial, managed, and defined levels of maturity used in practice? • As we begin to more fully explore RDM services, is there a starting point where many libraries begin? …
your questions DPM work: • it fits into the last part of a larger data management cycle (several …) • it is a standalone "lifecycle" (see the DPOE Curriculum) that helps organizations deal with pre-existing data, collected without the guidance / instruction of a formal data management plan… a. Changes in technology have outpaced preservation strategies, and probably will continue to do so for the foreseeable near future... Will there ever be a point at which preservation and access are "baked in" to new technology development? What would it take to get there? …out of the control (to a large extent) of libraries and information professionals …can we play a role in terms of informing those researchers and/or software developers?
your questions b. Looking at the number of life cycle models in Ball's article, it feels to me that we will not only need a "Data Curation Profile" for research data, but that we will need a questionnaire or something similar for choosing a data life cycle model to use for data curation. Are data life cycle models being envisioned with the end data in mind (which may include things like methods or protocols, and not just outputs), or with a discipline in mind, or with a funding agency in mind? Or all three to a varying extent? b1. ...for example, NIH is developing The Commons (https://datascience.nih.gov/commons) ... Where in the "circular lifecycle" model does the change need to be interjected? ...
THANK YOU! • Webinar 6: October 14, 1:00-2:30 PM EST – “Services Evaluation: Planning and Conducting Evaluations of Your Own” • Consultations • DLF Forum (Oct. 26-28) • In-person meeting 10/28 • Google group • CLIR Connect