Developing data management expertise at King’s College London Experience of the PEKin project

Presentation Transcript


  1. Developing data management expertise at King’s College London: Experience of the PEKin project. Gareth Knight, Centre for e-Research (CeRch); Lindsay Ould, Archives & Information Management (AIM)

  2. Overview • Aims & objectives of PEKin project • Project methodology • Findings on current state of data management • Action taken to address issues • Further work to be performed • Lessons learnt • Potential for reuse of project deliverables

  3. What is PEKin? • Title: Preservation Exemplar at King’s (PEKin) • Funder: JISC, Preservation strand of Information Environment 09-11 • Time period: 1 April 09 – 31 October 10 • Project partners: • Centre for e-Research (CeRch) • Archives & Information Management (AIM) • Based at King’s College London

  4. What is a Digital Record? “Recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity.” International Council on Archives (ICA)

  5. New archiving challenges • Changing state of digital information: • Changing notion of what constitutes a record of business: • Core business: student information, committees, estates, etc. • Increasingly research outputs (data, papers) – funder requirements • Changing composition: • Born digital content (static and dynamic resources) • Hybrid (paper+digital), digital only • Lifecycles: • Creation process: Create, revise, publish 1st version, revise, publish 2nd version. Repeat. • Access lifecycle: Technology dependencies (hardware & software) • Implications: • Archival process: Archive at earlier stage? Capture using different technologies? • Data value: Can we be sure that everything has business value?

  6. Methodology • Evaluate existing information management procedures and working practices at institutional level and revise accordingly • What remains viable? • Elements that require revision • Gaps and omissions • Determine the data management needs that data producers and systems managers in academic units/professional services encounter and determine most effective approach to address requirements • Implement a technical system capable of curating and preserving digital records of long-term archival value.

  7. Review of existing frameworks • Reviewed DAF, DRAMBORA & DIRKS, etc. All required further refinement to apply to our own situation. • DAF: Data Asset/Audit Framework • + Useful for gathering detailed information on data assets located in departments • + Useful for analysing data management practices • - Time-consuming to perform • - Does not provide a method of evaluating problems & developing a mitigation strategy • DRAMBORA • + Provides formal structure for identifying, describing & evaluating risks & developing a strategy to mitigate or avoid them • + Well-defined list of risk categories and factors • - Intended for OAIS-like environments rather than less formalised research ‘systems’ • - Focus upon OAIS workflow rather than data creation lifecycle

  8. Integrating frameworks • DAF & DRAMBORA are broadly similar, but some work needed: • Normalised terminology and definitions & adopted some archival terminology • Activity classification: Activities are placed in different categories in DRAMBORA & DAF • “Light touch” approach: establish balance between DRAMBORA system-level & DAF asset-level analysis • High-level analysis of data assets using DAF • Omitted various DRAMBORA risk categories unrelated to data management • Adopted e-Research lifecycle model • Stages were tied in with distinct project outputs

  9. Audit Framework

  10. Administrative Case Studies • Business departments & content types examined: • Committee: Council, Academic Board and sub-committees • Estates: Project & operational records • Student: Records held outside the Student system (SITS) • Archival value digital records • Mapped to current College paper holdings

  11. Research Case Studies • Research groups/projects/departments examined: • Environmental Research Group (ERG) • Environment Monitoring group • Environment Modelling group • Twins Early Development Study (TEDS) • Regional Information Collection Centre (RICC) • Period of change – since April 2010, IT provided centrally with storage provision review underway • Archives have previously accepted pioneering research data • Acquisition policy is now under review for born digital/digitised records

  12. Administrative study findings • Opportunity to redefine collections • All areas required digital records management support before archives could be identified • Quality control varied between records • Duplication with paper and born-digital versions retained • Lack of ownership of born digital records by administrative staff

  13. Research study findings • Challenge to identify data sets of archival value • TEDS & ERG funded dedicated data management roles including back-up & information security processes • However, majority of research groups do not have equivalent support, placing data at risk • Funding bids lacked formal data management plans to provide assurance or influence further funding • Continuing preservation of data not considered with focus on current work

  14. Comparison of research & admin data management • Individual researchers & administrative staff lack understanding of risk and use personal data approach • Understanding of digital environment is still outside their comfort zone - hybrid duplicated collections • High risk when staff (Principal Investigators or Administrators) leave • No point of contact for advice or support

  15. Risk Assessment of research data management • Multiple risks identified • Active data management was good - recommendations made for best practice • Mitigation • Content versioning system • Store multiple versions of each data file • Implement integrity monitoring • Data management plan to document approach
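The “integrity monitoring” mitigation on this slide can be illustrated with a minimal Python sketch. It is not the project's tooling: the function names, the choice of SHA-256 and the manifest layout are assumptions made for illustration only. The idea is to record a checksum for every file in a data directory, then rebuild the manifest later and report anything missing, added or changed.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that disappeared, appeared, or changed content."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }
```

A research group could store the first manifest alongside its data management plan and re-run the comparison on a schedule; a non-empty "changed" or "missing" list would prompt restoring the affected file from an earlier stored version.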

  16. Risk Assessment of administrative data management • More risks identified than with research data • Lack of business owner for data sets • ISS provide storage & systems management but little data management expertise • ISS Data Management role now in place • Move to digital capture will address risks • Risk mitigation as for research records

  17. Actions taken by project • Institution-level Policies • Work with departments to address data management risks • Documentation • Implementation of KCL Digital Archive

  18. 1. Institution-level Policies • Update of existing policies: • Acquisition policy: Refinements to existing acquisition policies • Retention Policy: Appraisal criteria for records of value • Information Management: Appraisal criteria and advisory material • Develop new policies: • Preservation Policy: content preservation strategy for institutional data of short and long-term value • See http://www.kcl.ac.uk/iss/igc/tools/staff.html for guidance currently available

  19. 2. Liaise with data creators & managers • Enable management to gain a better understanding of data assets within their department/group and the potential risk factors that may limit data usage. • Work with data producers & systems managers to address data management issues that they identified as a concern, e.g. versioning • Make data producers & management aware of risk factors that exist and make recommendations for actions that may help to avoid or mitigate issues. • Make them aware of support available within College & other departments/groups/projects that are working to resolve common issues.

  20. 3. Documentation • Self-help documentation to help data creators & managers to: • Understand data management issues & key concepts • Take practical steps to diagnose and address DM issues and find people to contact • Data Management ‘workbook’: • Creating your data: Issues to consider prior to and in the early stages of development to ensure data is fit for purpose & usable over time. • Organising your data: Methods for structuring & documenting data to enable it to be used & understood • Maintaining access and use of data: Approaches that may be adopted to ensure continued access & use of data. • Appraising your data: Recommendations for applying archival principles • Content Type Reports: • Short pragmatic reports tailored to specific content types (raster images, audio, e-mail, documents) • To be published on KCL web site in near future

  21. 4. KCL Digital Archive • Implemented Alfresco ECM (Community Edition) to manage college data of long-term archival value • Standards compliance • OAIS RM, U.S. Department of Defense 5015.2-STD, ISO 15489, TRAC when in full service • Bitstream preservation: • fixity creation/verification, online + offline storage • Information Content Preservation: • Format conversion, event logging – audit trail • Access: • Limited to archive reading room, catalogue descriptive MD to common standard

  22. Rules-based approach to data management • jBPM synchronous or asynchronous workflows • Content model compliance • Conforms to defined structure & object types • Fixity generation • All: MD5, SHA-1, CRC • Format identification • All: file(1), DROID • Technical metadata extraction • Format specific: JHOVE, MP3Info, others • Conversion to preservation & dissemination derivative • Parameters for each format & MD criteria (e.g. OpenOffice, ImageMagick) • Record action results as PREMIS Event • Close collection to prevent further update • Obsolescence monitoring? • Risk assessment based upon future development of PRONOM/UDFR • Manual activity for future date?
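The fixity-generation rule and the recording of its result as a PREMIS Event could look roughly like the Python sketch below. It is an illustration, not the archive's jBPM workflow: the slide says only “CRC”, so CRC-32 is an assumption, and the event is a plain dictionary whose keys borrow PREMIS semantic-unit names rather than a schema-valid PREMIS record. The function names and the object identifier are hypothetical.

```python
import hashlib
import uuid
import zlib
from datetime import datetime, timezone
from typing import Dict, Optional


def generate_fixity(data: bytes) -> Dict[str, str]:
    """Compute the three digests named on the slide (CRC assumed to be CRC-32)."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA-1": hashlib.sha1(data).hexdigest(),
        "CRC32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


def fixity_event(object_id: str, data: bytes,
                 expected: Optional[Dict[str, str]] = None) -> dict:
    """Return a simplified, PREMIS-like event describing the rule's outcome.

    With no expected digests, the event records digest creation; otherwise it
    records a fixity check and whether the digests still match.
    """
    digests = generate_fixity(data)
    if expected is None:
        event_type, outcome = "message digest calculation", "success"
    else:
        event_type = "fixity check"
        outcome = "success" if digests == expected else "failure"
    return {
        "eventIdentifier": str(uuid.uuid4()),
        "eventType": event_type,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": digests,
        "eventOutcome": outcome,
        "linkingObjectIdentifier": object_id,  # hypothetical identifier scheme
    }
```

Calling `fixity_event("obj-1", payload)` at ingest and `fixity_event("obj-1", payload, expected=stored_digests)` at each later check would build up the kind of audit trail the archive slide describes.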

  23. Future Plans • Embedding approach into archives & wider institution • Identify research management needs at early stage (funding proposal, active/semi-active use) rather than end • Skills audit & needs assessment • Support & training for data management staff • College Storage strategy • Increased availability of College storage

  24. Lessons Learnt • Better understanding of data ‘ecosystem’ in college – data lifecycle, infrastructure • Progress made with identifying & addressing data management support – need to ‘scale up’ to college as a whole. • Need to manage semi-current records, in addition to active and archival records • Requirements for storage • Raised profile for Archives & CeRch • Need for cross-disciplinary approach to managing data – combination of expertise & shared language

  25. What may be used by other projects?

  26. Thank you. Any questions?