DAF methodology & Glasgow Uni scoping study. Sarah Jones, DCC, University of Glasgow, s.jones@hatii.arts.gla.ac.uk
Background to DAF project
“JISC should develop a Data Audit Framework to enable all universities and colleges to carry out an audit of departmental data collections, awareness, policies and practice for data curation and preservation”
Liz Lyon, Dealing with Data: Roles, Rights, Responsibilities and Relationships (2007)
The methodology: http://www.data-audit.eu/DAF_Methodology.pdf
Stage 1: planning
Objective: determine what you want to find out and prepare the work in advance
Process:
- Define scope / expected outcomes
- Research the organisational context
- Set up survey, interviews, meetings…
Stage 2: identifying data
Objective: create an inventory to understand the scale of the data
Process: engage researchers to:
- Identify key data assets
- Classify data to restrict the scope of the audit
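The Stage 2 inventory can be pictured as a simple register of assets, each tagged with a classification that decides whether it goes forward to the in-depth Stage 3 assessment. This is a hypothetical sketch: the field names, the three priority levels and the example entries are illustrative assumptions, not the categories or record format defined in the DAF methodology document.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical classification levels; substitute the scheme your audit uses.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class DataAsset:
    # Hypothetical register fields, chosen for illustration only.
    name: str
    owner: str
    size_gb: float
    priority: Priority


def total_size_gb(register: list[DataAsset]) -> float:
    """Rough measure of the scale of data held across the whole register."""
    return sum(asset.size_gb for asset in register)


def in_scope(register: list[DataAsset], minimum: Priority = Priority.HIGH) -> list[DataAsset]:
    """Restrict the audit scope to assets at or above the chosen priority."""
    return [asset for asset in register if asset.priority >= minimum]


# Made-up example entries, for illustration only.
register = [
    DataAsset("Field survey photographs", "Group A", 120.0, Priority.HIGH),
    DataAsset("Interview transcripts", "Group B", 4.5, Priority.MEDIUM),
    DataAsset("Draft slide decks", "Group A", 0.5, Priority.LOW),
]

print(total_size_gb(register))               # 125.0
print([a.name for a in in_scope(register)])  # ['Field survey photographs']
```

Keeping the classification as an ordered enum means the scope can be widened or narrowed with a single threshold, e.g. `in_scope(register, Priority.MEDIUM)` if the purpose of the audit calls for a broader Stage 3 assessment.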
Stage 3: assessing data management
Objective: identify weaknesses in data management and potential risks
Process:
- In-depth assessment of the most crucial assets, given the purpose of the audit
- Discussion of the data lifecycle to assess how data are managed
Stage 4: recommendations
Objective: recommend changes to improve data management
Process:
- Collate audit results
- Analyse the data
- Suggest changes to mitigate weaknesses
DAF pilot implementations
• Early test cases: GeoSciences; Archaeology; Mechanical Engineering; Humanities
• University of Edinburgh: Physiology; Divinity; History; Brain Imaging; Astronomy
• University College London: Archaeology; Scandinavian Studies; Physics & Astronomy; Life & Medical Sciences
• Imperial College London: Chemical Engineering; Physics; Business School
• King’s College London: Geography; Psychiatry; Environmental Research; Biomedical and Health Sciences
• DataShare examples: Cardiac group; Dept of International Development; Social Sciences
Workshop on next steps for DAF
• Many of the pilots found the process of gathering information on data management more valuable than the asset register itself. The DAF approach was felt to be useful for defining requirements to improve data management. (JISC-funded RDMI projects)
• A suggestion was made to enhance DAF with practical examples and guidance from the pilot studies. (Implementation Guide)
• Align the DAF process with other data management planning tools. (IDMP project between AIDA, DAF, DRAMBORA, LIFE)
GU scoping studies
• Digital Preservation Advisory Board (DPAB) established at GU in 2008
• Keen to identify the scale of digital preservation needs across the University
• Scoping studies ran in 2009 in:
  - Archaeology
  - Chemistry
  - Corporate Communications
  - Court Office
  - English Language
  - Electronics and Electrical Engineering
  - Evolutionary Ecology and Biology
  - MRC Social and Public Health Sciences Unit
Methodology
• Semi-structured interviews
  - interview framework sent in advance
  - some background research done before each interview, e.g. reading the staff profile
  - recorded (with permission), then transcribed and sent for comments
• Spoke with heads of department (HoDs), researchers, teaching, admin and support staff
• Reviewed preliminary findings and widened the scope
  - added more PhD students and early career researchers (ECRs), as most researchers we’d spoken to were senior
  - added Corporate Communications for a ‘web’ perspective
• Spoke to additional key people at the University, e.g. William Nixon, repository manager; James Currall, security expert
Interview framework
• what digital material is being created
• how this is being created and maintained
• any issues that have been encountered
• plans for the long term, e.g. preservation, reuse
• requirements for support and services
http://www.gla.ac.uk/media/media_126658_en.pdf
What did we find?
Pockets of good practice… but a lot of confusion and need for support
• It makes a huge difference if somebody can come and talk through problems and solutions with you. A personal contact like the RDOs is helpful.
• To connect data with documentation, we name files using a code number which is the person’s initials, the lab book number in Roman numerals and then the experiment number.
• We produced documentation workflows on how to take material from the DAT machines, how to transfer these into computer files, guidelines on transcription and anonymisation, and making derivatives. It’s all very well documented, which means there is consistency across the team, which is vitally important.
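The file-naming convention described by one interviewee (initials, lab book number in Roman numerals, experiment number) is mechanical enough to generate rather than type by hand. A minimal sketch, assuming a hyphen separator and three-digit zero-padded experiment numbers; neither detail comes from the interview, and the function names `to_roman` and `data_file_code` are hypothetical.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to upper-case Roman numerals."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    # Greedily take the largest value that still fits.
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def data_file_code(initials: str, lab_book: int, experiment: int, sep: str = "-") -> str:
    """Compose initials + lab book (Roman numerals) + experiment number.

    Assumptions, not from the source: the separator and the
    three-digit zero-padding of the experiment number.
    """
    return f"{initials.upper()}{sep}{to_roman(lab_book)}{sep}{experiment:03d}"


print(data_file_code("sj", 4, 12))  # SJ-IV-012
```

Zero-padding keeps files in experiment order within one lab book when listed alphabetically; Roman numerals do not sort numerically as text, so sorting across lab books would still need the numeric value.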
Procedures for creation & management
• The network has always been the bane of everyone’s lives to find stuff on: you end up opening umpteen files to see if it’s the one you’re after.
• They had major problems last year moving from ArcGIS 9.1 to 9.3: everything stopped working as they’d changed the geo-database format. It was not straightforward to fix…
• The volume of data produced makes maintenance a bit like drinking from a fire hose.
• The licence is very expensive and if it weren’t renewed it wouldn’t be possible to continue to access the data.
• The paper records system hasn’t transferred easily to the digital.
• Digital images are a classic case in point, as many still have the numerical ordering and cryptic letter sequence auto-generated by the camera.
Storage and backup
• Research groups tend to run their own little fiefdoms. The correlation seems to be: the more computers they have, the less IT expertise there is.
• Insufficient backup space is a recurring problem, but it’s not really a lack of space; it’s more an issue of not being able to control what people store on their hard drives.
• People bring in sticks with 4GB of data on them that simply no longer work, and nothing can be done to retrieve it.
• Large and reliable storage is expensive. You need this for home directories, but things that are to be archived or backed up could be punted out of the way to cheaper storage.
• If they throw some money at the problem they can install another networked drive and the problem goes away for a while.
Selection / long-term preservation
• It’s one thing to keep something going, but are people still able to use it in the same way?
• If the website comes to an end, the data could still be preserved, but you lose the richness of being able to search it, see it on a map, or have them synchronised.
• It’s like giving your baby away.
• How do you decide what can be deleted? I’m not confident to make that decision. Probably only one tenth of what’s currently held should be retained.
• Archiving is to allow someone else to reuse it.
• If I know the code will be public, I’ll pay more attention to properly annotating it with comments so other people can understand it.
What next…
• DPAB continues to address this at senior management level
• JISC-funded Incremental project (part of the Managing Research Data (MRD) programme)
• Ensuring researchers can find guidance and support when needed
• Making data training and guidance more understandable to researchers
• Offering tailored support and partnering
http://www.lib.cam.ac.uk/preservation/incremental/index.html