
The DCC Curation Lifecycle Model

The DCC Curation Lifecycle Model. Sarah Higgins, Ross Harvey, with graphics advice from Chris Blackall.

Presentation Transcript


  1. The DCC Curation Lifecycle Model Sarah Higgins Ross Harvey with graphics advice from Chris Blackall

  2. The Curation Lifecycle The DCC Curation Lifecycle Model provides a graphical high level overview of the stages required for successful curation and preservation of data from initial conceptualisation or receipt. The model can be used to plan activities within an organisation or consortium to ensure all necessary stages are undertaken, each in the correct sequence.

  3. Using the DCC Curation Lifecycle Model The model enables: • mapping of granular functionality • definition of roles and responsibilities • building frameworks of standards and technologies to implement • identification of additional steps required • identification of actions which are not required • ensuring adequate documentation of processes and policies

  4. Data (Digital Objects or Databases) Data, any information in binary digital form, is at the centre of the Curation Lifecycle. This includes: • simple digital objects • complex digital objects • databases

  5. Data (Digital Objects or Databases) • simple digital objects: discrete digital items, such as textual files, images or sound files, along with their related identifiers and metadata • complex digital objects: discrete digital objects, made by combining a number of other digital objects, such as websites • databases: structured collections of records or data stored in a computer system

  6. Full Lifecycle Actions Description and Representation Information Assign administrative, descriptive, technical, structural and preservation metadata, using appropriate standards, to ensure adequate description and control over the long-term. Collect and assign representation information required to understand and render both the digital material and the associated metadata.
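As a rough illustration of the five metadata categories named on this slide, the sketch below keeps one dictionary per category and reports which categories are still empty. The `MetadataRecord` class, its field names and the sample values are hypothetical and are not part of the DCC model or of any metadata standard.

```python
from dataclasses import dataclass, field

# The five metadata categories named on the slide.
CATEGORIES = ("administrative", "descriptive", "technical", "structural", "preservation")


@dataclass
class MetadataRecord:
    """Hypothetical container: one dict of key/value pairs per metadata category."""
    identifier: str
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)

    def missing_categories(self) -> list[str]:
        """Return the categories that have no entries yet, in slide order."""
        return [c for c in CATEGORIES if not self.metadata.get(c)]


record = MetadataRecord(
    identifier="example-object-001",  # hypothetical identifier
    metadata={
        "descriptive": {"title": "Survey responses"},
        "technical": {"format": "text/csv"},
    },
)
print(record.missing_categories())  # ['administrative', 'structural', 'preservation']
```

A real implementation would normally follow an established metadata standard rather than free-form dictionaries; the sketch only shows the idea of checking coverage across categories.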

  7. Full Lifecycle Actions Description Information (Metadata) • persistently identifies data and maintains reliable links to them • clearly describes what they are • clearly identifies technical information needed to use data • identifies who is responsible for their management and preservation • describes what can be done to them • describes what is needed to represent them at the required level of fidelity • records their history and documents their authenticity • allows users to understand their context and relationship to other objects.

  8. Full Lifecycle Actions Representation Information • Structure Information: describes the format and data structure concepts to be applied to the bitstream, which result in more meaningful values like characters or number of pixels. • Semantic Information: this is needed on top of the structure information. If the digital object is interpreted by the structure information as a sequence of text characters, the semantic information should include details of which language is being expressed. • Other Representation Information: includes information about relevant software, hardware and storage media, encryption or compression algorithms, and printed documentation.
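The distinction between structure and semantic information can be sketched with a short text example. Here the character encoding plays the role of structure information and a language tag plays the role of semantic information. The `RepresentationInfo` class and `render` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInfo:
    """Hypothetical record pairing structure and semantic information."""
    encoding: str  # structure information: how bytes map to characters
    language: str  # semantic information: which language the text expresses


def render(bitstream: bytes, info: RepresentationInfo) -> str:
    """Interpret the bitstream as text using the recorded structure information."""
    return bitstream.decode(info.encoding)


bitstream = "café".encode("utf-8")  # the stored bytes

correct = RepresentationInfo(encoding="utf-8", language="fr")
wrong = RepresentationInfo(encoding="latin-1", language="fr")

print(render(bitstream, correct))  # café
print(render(bitstream, wrong))    # cafÃ©
```

The bit stream is identical in both calls. Only the recorded structure information decides whether the characters come out as intended, which is why it has to be kept alongside the data.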

  9. Full Lifecycle Actions Preservation Planning Plan for preservation throughout the curation lifecycle of digital material. This would include plans for management and administration of all curation lifecycle actions.

  10. Full Lifecycle Actions Preservation Planning – ensure future data access Digital preservation: • is a set of managed activities • aims at ensuring the bit stream is maintained • aims at ensuring that data are accessible • is concerned with maintaining bit streams and ensuring accessibility for a definable period of time

  11. Full Lifecycle Actions Preservation Planning – ensure longevity, integrity, accessibility • longevity: as long as required, longer than the original access system • integrity: copy data to a reliable digital storage system; ongoing management (data security, backups, error checking); refresh data and maintain multiple copies of the bit stream; ensure you have preservation action rights • accessibility: assign persistent identifiers; add sufficient metadata and representation information; choose a limited range of open file formats; monitor technical developments; retain and manage the original bit stream
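The integrity points above (multiple copies, error checking) could be sketched as a periodic audit that compares every stored copy against a checksum recorded earlier. The function names, the copy names and the choice of SHA-256 are assumptions made for this example, and the copies are held in memory rather than read from real storage.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: dict[str, bytes], expected_digest: str) -> list[str]:
    """Return the names of copies whose digest no longer matches the recorded one."""
    return sorted(
        name for name, data in copies.items()
        if sha256_hex(data) != expected_digest
    )


original = b"observation,value\n1,42\n"
recorded = sha256_hex(original)  # fixity value recorded when the data was stored

copies = {
    "site-a": original,
    "site-b": original,
    "site-c": b"observation,value\n1,43\n",  # simulated corruption of one copy
}
print(audit_copies(copies, recorded))  # ['site-c']
```

A copy that fails the audit can then be refreshed from one of the copies that still matches the recorded digest.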

  12. Full Lifecycle Actions Community Watch and Participation Maintain a watch on appropriate community activities, and participate in the development of shared standards, tools and suitable software.

  13. Full Lifecycle Actions Community Watch and Participation – benefits of collaboration • access to a wider range of expertise • access to tools and systems that might otherwise be unavailable • encouragement for other stakeholders to take preservation seriously • shared influence on R&D of standards and practices • attraction of resources and other support for well-coordinated programmes at a regional, national or sectoral level • shared influence on agreements with producers • increased coverage of preserved materials • better planning to reduce wasted effort • shared development costs • shared learning opportunities (UNESCO, Guidelines for the Preservation of Digital Heritage, 2003)

  14. Full Lifecycle Actions Curate and Preserve Be aware of, and undertake, management and administrative actions planned to promote curation and preservation throughout the curation lifecycle.

  15. Full Lifecycle Actions Curate and Preserve – the need for digital curation and preservation • immense quantities of data are being generated • the quantities are increasing • the scientific, scholarly and research communities increasingly rely on networked computing • data are at risk from: technological obsolescence; fragility; lack of understanding or application of good practice; insufficient resources; inappropriate organisational infrastructure

  16. Full Lifecycle Actions Curate and Preserve – plan for digital curation and preservation • Digital curation techniques address the problems outlined, including: maintenance of data; adding value to data for current and future use • Be aware of management and administrative actions needed to promote curation and preservation throughout the lifecycle • Undertake management and administrative actions needed to promote curation and preservation throughout the lifecycle

  17. Sequential Actions Conceptualise Conceive and plan the creation of data, including capture method and storage options.

  18. Sequential Actions Conceptualise – plan with digital curation in mind • develop robust workflow, processes and documentation • choose appropriate, existing open standards for interoperability • capture and store data in curation-friendly file formats (open source) • record sufficient information during data capture to assist with ongoing use • scrupulously identify files • store data on appropriate media • identify a safe place for storage (e.g. a trusted archive) and make sure that archive will take your data • identify access methods • identify legal framework

  19. Sequential Actions Create or Receive Create data including administrative, descriptive, structural and technical metadata. Preservation metadata may also be added at the time of creation. Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and if required assign appropriate metadata.

  20. Sequential Actions Create or Receive – ensure data are curation ready • of high quality • well structured • adequately documented • interoperable • authentic (it is what it claims to be) • accurate (it hasn’t been tampered with) • renderable (it can be used in the ways for which it was intended, or viewed as originally intended) • in a form that best ensures its longevity

  21. Sequential Actions Appraise and Select Evaluate data and select for long-term curation and preservation. Adhere to documented guidance, policies or legal requirements.

  22. Sequential Actions Appraise and Select – develop robust policies How long do we want to keep the data? • in terms of changes of technology • in terms of an organisation’s business requirements • in terms of user requirements (e.g. as evidence to verify conclusions derived from research). How long do we need to keep the data? • assess benefits and risks of keeping/not keeping data • what are the consequences of not keeping the data? • how much would it cost to recreate it in the future? • is it even possible to recreate it in the future?

  23. Occasional Actions Dispose Dispose of data which has not been selected for long-term curation and preservation, in accordance with documented policies, guidance or legal requirements. Typically data may be transferred to another archive, repository, data centre or other custodian. In some instances data is destroyed. The data’s nature may, for legal reasons, necessitate secure destruction.

  24. Occasional Actions Dispose – transfer or destruction? • transfer: if no longer relevant for business function but useful to someone else; for safe keeping (institutional archive); for greater accessibility (more widely accessible data archive) • secure destruction (prevent re-use or reconstruction): sensitive data no longer relevant for business function

  25. Sequential Actions Ingest Transfer data to an archive, repository, data centre or other custodian. Adhere to documented guidance, policies or legal requirements.

  26. Sequential Actions Ingest – prepare data for access and long-term storage • assign a persistent identifier • check that the data does not contain spyware or other malware • create fixity values (e.g. digital signature, hash value, checksum) for integrity checking • confirm technical details (e.g. file format, MIME type) • associate with description and representation information
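Several of the ingest steps above can be sketched with the Python standard library. In this example a random UUID stands in for a persistent identifier (real services typically use schemes such as DOIs, Handles or ARKs), SHA-256 serves as the fixity value, and the MIME type is guessed from the file name only. The `IngestRecord` class and `ingest` function are hypothetical, and malware scanning is deliberately left out.

```python
import hashlib
import mimetypes
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngestRecord:
    """Hypothetical record created when an object is ingested."""
    identifier: str            # stand-in for a persistent identifier
    filename: str
    sha256: str                # fixity value for later integrity checks
    mime_type: Optional[str]   # None when the type cannot be guessed


def ingest(filename: str, content: bytes) -> IngestRecord:
    """Build an ingest record for one file held in memory."""
    # guess_type looks only at the file extension, not at the content.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return IngestRecord(
        identifier=f"urn:uuid:{uuid.uuid4()}",
        filename=filename,
        sha256=hashlib.sha256(content).hexdigest(),
        mime_type=mime_type,
    )


record = ingest("notes.txt", b"field notes, day 1\n")
print(record.mime_type)  # text/plain
print(record.sha256 == hashlib.sha256(b"field notes, day 1\n").hexdigest())  # True
```

Guessing from the extension is only a first check. Confirming the format properly would need a tool that inspects the content itself.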

  27. Sequential Actions Preservation Action Undertake actions to ensure long-term preservation and retention of the authoritative nature of data. Preservation actions should ensure that data remains authentic, reliable and usable while maintaining its integrity. Actions include data cleaning, validation, assigning preservation metadata, assigning representation information and ensuring acceptable data structures or file formats.

  28. Sequential Actions Preservation Action – specific necessary actions • keep the original data bit stream as well as any ‘preservation version’ for future proofing • clean and validate data, to ensure they can be managed and re-used over time • add or extract high quality preservation metadata and representation information to increase potential for discovery, re-use and preservation • ensure acceptable data structures or file formats (e.g. non-proprietary, well-documented) to increase the chance of future recoverability • apply good data management practices • implement secure storage and institutional or organisational continuity (Based on Lord, P and Macdonald, A, eScience Curation Report, 2003)

  29. Sequential Actions Preservation Action – implement preservation methods • Migration – transform formats as technologies change • Emulation – keep original data and application software and create programs to emulate their behaviour on contemporary architectures • Formal descriptions – encode behaviours of the original application, at creation, in a format understood by a Universal Virtual Computer (a platform-independent layer between hardware and software) to allow reconstitution in original form • Digital archaeology – future recovery on an as-needed or exploratory basis • Computer museums – archive whole systems: hardware and software (Based on Lord, P and Macdonald, A, eScience Curation Report, 2003)

  30. Sequential Actions Preservation Action – automate with tools • identifying data (where it is located, what formats it is in): format validation, format registries, obsolescence tools • describing data (automated metadata creation): technical metadata extraction, conversion to XML schema • manipulating data (data management, data storage, repositories): normalising and encapsulation tools • preserving data (migration): web archiving tools, emulation tools, preservation metadata extraction tools • data registration (ingest) • documentation of commonly used terms and concepts: thesauri, word lists, ontologies • rights management and access control

  31. Occasional Actions Reappraise Return data which fails validation procedures for further appraisal and reselection.

  32. Occasional Actions Migrate Migrate data to a different format. This may be done to accord with the storage environment or to ensure the data’s immunity from hardware or software obsolescence.

  33. Occasional Actions Migrate – for preservation storage • File formats for long-term preservation should be: non-proprietary, open source and well documented • This facilitates: curation, future access, reuse and future migrations Examples: • JPEG – digital image thumbnails • TIFF – high quality digital images • PDF/A-1 – documents, with look and feel (ISO 19005-1, Document management – electronic document file formats for long-term preservation) • HTML – web pages • XML – data or text
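A migration into one of the formats listed above might look like the following sketch, which converts CSV text to a simple XML document using only the standard library. The element names (`records`, `record`, `field`) and the function name are invented for this example, and the input is assumed to be well formed, with every row matching the header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Migrate CSV text to a simple XML document; the input is left untouched."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # Column names go in an attribute, so they need not be valid XML tag names.
            field = ET.SubElement(record, "field", name=column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


original = "site,reading\nA,12.5\nB,7.1\n"
migrated = csv_to_xml(original)
print(migrated)
```

The function returns a new string and never modifies `original`, in line with the advice elsewhere in the presentation to retain the original bit stream alongside any migrated version.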

  34. Sequential Actions Store Store the data in a secure manner adhering to relevant standards.

  35. Sequential Actions Store – ensure access and continuity • storage facilities should: ensure secure and reliable storage over time; meet the requirements of relevant standards; provide access for use and reuse • storage administration should: be committed to continued maintenance of digital objects; ensure adequate and appropriate finance and staffing; negotiate the requisite contractual and legal rights; fulfil legal responsibilities; develop an effective and efficient policy framework; develop a strategic program for preservation planning and action

  36. Sequential Actions Access, Use and Reuse Ensure that data is accessible to both designated users and re-users, on a day-to-day basis. This may be in the form of publicly available published information. Robust access controls and authentication procedures may be applicable.

  37. Sequential Actions Access, Use and Reuse – ensure access and continuity • ensure data can be discovered by applying standards, such as metadata standards • allow interoperability • ensure legal permissions allow data to be used and reused • ensure legal restrictions on the use and reuse of data are adhered to • provide tools for collaboration • ensure access controls and authentication procedures restrict access to authorised users

  38. Sequential Actions Transform Create new data from the original, for example: • by migration into a different format • by creating a subset, by selection or query, to create newly derived results, perhaps for publication
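Creating a subset by query can be sketched with an in-memory SQLite database standing in for the curated data. The table name, columns and values are made up for the example.

```python
import sqlite3

# An in-memory database stands in for the curated data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("A", 2006, 12.5), ("A", 2007, 13.1), ("B", 2007, 7.1)],
)

# Selection by query: derive a new dataset without altering the original table.
subset = conn.execute(
    "SELECT site, value FROM readings WHERE year = ? ORDER BY site", (2007,)
).fetchall()
conn.close()

print(subset)  # [('A', 13.1), ('B', 7.1)]
```

The derived rows are new data in their own right and would need their own description and fixity information if they are to be curated in turn.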

  39. Sequential Actions Transform – new uses for curated data • verification of post-analysis results • the basis of further experiments • cumulative analysis • foundation for new research, science, knowledge and discovery • reliable extension of research The curation lifecycle begins again: new data created by the Transform action is input into the Create or Receive action.
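The order of the sequential actions, and the loop from Transform back to Create or Receive, can be sketched as a small lookup. The list follows the order in which the sequential actions appear in this presentation. The occasional actions (Dispose, Reappraise, Migrate) are left out, and `next_action` is a hypothetical helper name.

```python
SEQUENTIAL_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]


def next_action(current: str) -> str:
    """Return the sequential action that follows `current`."""
    if current == "Transform":
        # New data produced by Transform re-enters at Create or Receive.
        return "Create or Receive"
    index = SEQUENTIAL_ACTIONS.index(current)
    return SEQUENTIAL_ACTIONS[index + 1]


print(next_action("Store"))      # Access, Use and Reuse
print(next_action("Transform"))  # Create or Receive
```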
