Digital Preservation Tools for Repository Managers. A practical course in five parts presented by the KeepIt project in association with. Module 4, Putting storage, format management and preservation planning in the repository University of Southampton, 18-19 March 2010
Twitter hashtag #dprc (digital preservation repository course)
Course structure • Module 1. Organisational issues Scoping, selection, assessment, institutional parameters (19 January) • Module 2. Costs Lifecycle costs for managing digital objects, based on the LIFE approach, and institutional costs (5 February) • Module 3. Description Describing content for preservation: provenance, significant properties and preservation metadata (2 March) • Module 4. Preservation workflow tools available in EPrints for format management, risk assessment and storage, and linked to the Plato planning tool from Planets (TODAY) • Module 5. Trust (by others) of the repository’s approach to preservation; trust (by the repository) of the tools and services it chooses (30 March)
Tools this module • EPrints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton • Plato, preservation planning tool from the Planets project, Andreas Rauber and Hannes Kulovits, TU Wien
Steve Jobs launches Apple iPad “75 million people already own iPod Touches and iPhones. That's all people who already know how to use the iPad.” Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/
Some revision from KeepIt Module 3 • Preservation workflow
Preservation workflow: Check, Analyse, Action
Check
• Format identification, versioning
• File validation
• Virus check
• Bit checking and checksum calculation
• Tools, e.g. DROID, JHOVE, FITS
Analyse
• Preservation planning
• Characterisation: significant properties and technical characteristics, provenance, format, risk factors
• Risk analysis
• Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois), KB
Action
• Migration
• Emulation
• Storage selection
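The "bit checking and checksum calculation" step in the Check stage can be illustrated with a short sketch. This is not how DROID, JHOVE, FITS or EPrints do it; it is a minimal example using only the Python standard library, and the function names `file_checksum` and `fixity_ok`, the choice of SHA-256 and the sample file are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum, algorithm="sha256"):
    """Compare a freshly computed checksum with the one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum


# Hypothetical usage: record a checksum at deposit, then re-check later.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"deposited content")
    recorded = file_checksum(sample)        # stored alongside the object
    unchanged = fixity_ok(sample, recorded)  # nothing has changed yet
    sample.write_bytes(b"deposited c0ntent")  # simulate a changed byte
    changed = fixity_ok(sample, recorded)
```

A check like this only says that the bits differ from what was recorded; deciding what to do about it belongs to the Analyse and Action stages.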
Format risks
1000 Ubiquity: degree of adoption of the format
1001 Support: number of tools available which can access the format
1002 Disclosure: extent to which the format documentation is publicly disclosed
1003 Document Quality: completeness of the available documentation
1004 Stability: speed and backwards-compatibility of version change
1005 Ease of identification: ease with which the format can be identified
1006 Ease of validation: ease with which the format can be validated
1007 Lossiness: does the format use lossy compression
1008 Intellectual property rights: whether or not the format is encumbered by IPR
1009 Complexity: degree of content or behavioural complexity supported
From PRONOM documentation (The National Archives), July 2008
A group task on format risks
• Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)
• By working through the (surviving) list of format risks, select a winner (or a draw) between your chosen formats for each risk category (1 point for a win)
• Total the scores to find an overall winning format
• Suggest one reason why the winning format using this method may not be the one you would choose for your repository
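The scoring method in this task can be sketched in a few lines. The example below is an illustration only: the category names follow the PRONOM list, but the function `compare_formats`, the rating scale (higher means lower risk) and the ratings for "Format A" and "Format B" are invented placeholders, not assessments of any real format. Categories that are not rated for both formats are skipped, which is one way to read the "(surviving)" list.

```python
RISK_CATEGORIES = [
    "Ubiquity", "Support", "Disclosure", "Document Quality", "Stability",
    "Ease of identification", "Ease of validation", "Lossiness",
    "Intellectual property rights", "Complexity",
]


def compare_formats(name_a, ratings_a, name_b, ratings_b):
    """Award 1 point per category to the better-rated format; draws score 0."""
    points = {name_a: 0, name_b: 0}
    for risk in RISK_CATEGORIES:
        if risk not in ratings_a or risk not in ratings_b:
            continue  # category was not assessed for both formats
        if ratings_a[risk] > ratings_b[risk]:
            points[name_a] += 1
        elif ratings_b[risk] > ratings_a[risk]:
            points[name_b] += 1
    if points[name_a] == points[name_b]:
        winner = None  # overall draw
    else:
        winner = max(points, key=points.get)
    return points, winner


# Placeholder ratings (assumption: higher number = lower risk).
format_a = {"Ubiquity": 3, "Support": 3, "Disclosure": 1,
            "Document Quality": 1, "Stability": 2, "Lossiness": 2}
format_b = {"Ubiquity": 2, "Support": 2, "Disclosure": 3,
            "Document Quality": 2, "Stability": 2, "Lossiness": 3,
            "Complexity": 1}

points, winner = compare_formats("Format A", format_a, "Format B", format_b)
```

Every category counts for exactly one point here, whatever its importance to a particular repository. That equal weighting is one possible answer to the last question in the task.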
Some revision from KeepIt Module 3 • Preservation workflow • Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.
Some revision from KeepIt Module 3 • Significant properties
InSPECT SP Assessment Framework • Builds on Gero’s Function-Behaviour-Structure (FBS) framework • FBS developed to assist engineers/designers to create & redesign artefacts Three categories: • Function: The design intention or purpose that is performed. • Behaviour: The epistemological outcome derived from the function & structure obtained by the stakeholder • Structure: The structural elements of the Object that enable the stakeholder to perform the behaviour. • Artefact construction is the product of designated function. • Behaviour is the result of interaction between Function & Structure
Exercise overview • Analyse the content of an email • Analyse structure of email message • Determine purpose that each technical property performs • Consider how email will be used by stakeholders • Identify set of expected behaviours • Classify set of behaviours into functions for recording
Determine expected behaviours • What activities would a user – any type of stakeholder – perform when using an email? • Draw upon list of property descriptions performed in the previous step, formal standards and specifications, or other information sources. Task 2: Identify the type of actions that a user would be able to perform using the email (Groups. 15 mins). • E.g. Establish name of person who sent email • E.g. May want to confirm that email originated from stated source.
1.3 cont. Categories of properties Five high-level categories • Content e.g. character count • Context e.g. date of creation • Rendering e.g. bit depth • Structure e.g. e-mail attachments • Behaviour e.g. hyperlinks
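One way to record properties against these five categories is sketched below. The category names and example properties come from the slide; the `Category` enum, the `Property` class and the `group_by_category` helper are hypothetical names introduced for illustration and are not part of the InSPECT framework or of any tool named in this course.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CONTENT = "Content"
    CONTEXT = "Context"
    RENDERING = "Rendering"
    STRUCTURE = "Structure"
    BEHAVIOUR = "Behaviour"


@dataclass(frozen=True)
class Property:
    name: str
    category: Category


def group_by_category(properties):
    """Collect property names under their high-level category."""
    grouped = defaultdict(list)
    for prop in properties:
        grouped[prop.category].append(prop.name)
    return dict(grouped)


# The example properties listed on the slide.
EXAMPLES = [
    Property("character count", Category.CONTENT),
    Property("date of creation", Category.CONTEXT),
    Property("bit depth", Category.RENDERING),
    Property("e-mail attachments", Category.STRUCTURE),
    Property("hyperlinks", Category.BEHAVIOUR),
]

grouped = group_by_category(EXAMPLES)
```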
• Identify Stakeholders • Creator – view, annotate • Researcher corresponds during research with colleagues, peers, administrators etc. • Recipient – reuses content • Student wants to understand research lifecycles by studying real-world practice • Custodian – evidential chain • Maintains permanent email record for externally-funded projects, alongside data and eprint outputs
Some revision from KeepIt Module 3 • Significant properties • We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and classifying the functions of formatted emails • We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist
Some revision from KeepIt Module 3 • Documentation • We looked at two means to document these characteristics, and the changes over time • Broad and established (PREMIS) • Focussed, and work-in-progress (Open Provenance Model)
Some revision from KeepIt Module 3 • Provenance in action: transmission and recording
Provenance: a numbers game • Transmission: recording vs word-of-mouth • Identifying what is significant about the information to be transmitted • Can be self-correcting!
Some revision from KeepIt Module 3 • Provenance in action: transmission and recording • Through a simple game we learned that if we don’t recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as you started with
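The point of the game can be illustrated with a small sketch: if every stage of transmission leaves a record, the stage at which the information changed can be found afterwards. This is not PREMIS or the Open Provenance Model; `TransmissionEvent`, `record_event` and `first_change` are hypothetical names, and the agents and messages are invented for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TransmissionEvent:
    agent: str    # who passed the message on
    message: str  # what they passed on
    digest: str   # checksum of the message at this stage


def record_event(agent, message):
    """Record one stage of transmission together with a checksum."""
    digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
    return TransmissionEvent(agent, message, digest)


def first_change(events):
    """Return the first event whose digest differs from the previous stage, or None."""
    for previous, current in zip(events, events[1:]):
        if current.digest != previous.digest:
            return current
    return None


# Invented chain: the second relay garbles the time.
chain = [
    record_event("originator", "Meet at 10:30 in room 4"),
    record_event("relay 1", "Meet at 10:30 in room 4"),
    record_event("relay 2", "Meet at 10:13 in room 4"),
]

changed_at = first_change(chain)
```

Without the per-stage record, only the first and last messages could be compared, and the difference could not be traced to a particular stage.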