Which One’s Which? Understanding Versioning in Repositories 22nd April 2008
This afternoon’s programme
• 13.30 – VIF and Versioning
• 14.30 – Refreshment break
• 14.50 – Breakouts
  • 1. Metadata
  • 2. Strategy and Advocacy
• 15.50 – Refreshment break
• 16.10 – Software and Versioning
• 17.30 – Reception
The VIF project
• Funded by JISC’s Repositories and Preservation Programme from July 2007 to May 2008
• Ran in 3 stages:
  • a user requirements exercise
  • development of a framework with input from an Expert Group and comments from a Review Group
  • a dissemination phase to promote the recommendations and guidance and raise awareness of the issue of versioning
• The framework is web-based; it highlights the issues associated with versioning and gives guidance for people involved in:
  • repository management
  • software development
  • creation of content
This session covers
• An overview of the research and the problem
• The Framework: www.lse.ac.uk/library/vif
VIF User Requirements Exercise
• Surveys:
  • Drew on experience of VERSIONS Project
  • Created with BOS software
  • Split into 2 surveys: Information Professionals and Academics
• Interviews:
  • A number of informal background-gathering interviews
  • A few structured formal interviews with specific audiences such as an archivist and a records manager
• Follow-up dataset questions:
  • Very small questionnaire sent to targeted individuals
  • Identified from the surveys and from DataShare project
The Surveys
• Timing:
  • First draft completed by late July
  • Piloted in August
  • Survey ran from 30th August to 15th October 2007
  • Follow-on survey ended mid November
• Incentives used – Amazon vouchers & iPod Nano
• Promotion via:
  • JISC lists
  • Personal e-mails/telephone calls
  • Internal newsletters
• Approx. 150 responses (plus approx. 60 incomplete responses not used in the analysis)
Respondent Profile
• Academics Survey:
  • 50 responses
  • Mainly Lecturers/Professors
  • Mainly UK, some US, Australian and European
• Professionals Survey:
  • 100 responses
  • Mainly Library Repository people
  • Mainly UK
Survey results 1 – current attitudes
• Only 5% of academics and 6.5% of Information Professionals surveyed found it easy to identify versions of digital objects within institutional repositories. The situation becomes even worse across multiple repositories (1.8% and 1.1% respectively)
• Academics are broadly happy (66%) with how they identify versions on their own computer etc.
• There is strong feeling amongst Academics that repositories should only include the ‘finished’ version of a work. Free text boxes were often used to make this point even when this was not the question being asked
Survey results 2 – Current situation
• Although text documents are the most popular type of material created by academics and stored within repositories, both Information Professionals and Academics anticipate a substantial rise in the use of different types of digital objects, e.g. audio and video files
• Approximately a third of Information Professionals involved with repositories stated that they either have no system currently in place or ‘don’t know’ how they deal with versioning at present
• Information Professionals have little influence prior to ingest/deposit
Survey results 3 – Proposed versioning solutions
• No one silver-bullet solution
• Many of the potential solutions to the issue of versioning covered by the survey received strong support from both groups of respondents, but numerous problems were captured by free text responses
• The only solution with any claim to broad support was the use of date stamps – but again with many warnings
Survey recommendations
• Versioning is relevant to all types of digital objects – any framework should therefore be deliberately broad
• Premium placed on ‘final’ versions of digital objects - 91.6% of total respondents thought that being able to clearly identify the ‘finished’ version of an object was ‘essential’ or ‘important’
• Academics appear largely disengaged from the problem of versioning (and are largely happy with the way they deal with their own work) – any advice given should be simple and flexible and avoid top-down enforcement
• Cross-repository versioning is a problem and should be dealt with in the framework
What is a version anyway?
• Are a pre-publication text document and the published journal article versions of each other?
• Are a digitised 18th century map of Hertfordshire and a present day map of the same place versions of each other?
• Are audio recordings of the same piece of music played by different orchestras at different times and in different places versions of each other?
• Are a video of a conference session, a photo taken there, the presentation given and the original article that led to the session in any way versions of each other?
Question 1 – just iterations or outputs as well?
• There are different levels of versioning iterations:
  • minor changes (a revision)
  • significant changes (a landmark version, e.g. peer reviewed, published etc)
  • formatting or stylistic changes (e.g. typesetting or font)
  • change of file format (creating a digital variant)
• But one research project can generate many outputs describing the same idea or work
• It is possible to call both outputs and iterations ‘versions’
Question 2 – version object vs. version relationship
• What links the objects in the examples together? Author? Idea? Time?
• We will go into a good potential model later – FRBR
• Understanding requires recognition that there is a difference between a single ‘version’ and a ‘version relationship’:
• Example 1: A researcher
  • Version - I want to cite the latest version, is this one it?
  • Version relationship - I’m writing about the development of an academic’s work over the past 10 years. His outputs are numerous and include diagrams, conference presentations and articles. When and in what order were they produced?
• Example 2: A repository manager
  • Version - Does this wind speed dataset contain the latest collection of data?
  • Version relationship - There are 2 datasets measuring ‘wind speed’ taken from exactly the same place. They were recorded by different people for different purposes. Should they be linked? If so, how?
VIF’s assumptions:
• A version is identifiable; the change between versions is describable and understood by either human or machine
• The understanding of what a version is relates to:
  • either its content (i.e. a digital variant) or its format (i.e. a digital copy)
  • either an iteration or output
  • both the object itself and its relationship to other objects
• Some versions can be perceived to be more relevant/appropriate, authorised and/or authentic than others by either author or reader. But only the end user can determine which version is most relevant for them and why
• Clarity about versions should help an end user understand which is the ‘best version’ for their purposes
Developing a definition and the framework:
The framework has been developed:
• recognising a fairly broad audience of interest groups and levels of knowledge
• to provide user-driven advice to support repositories
• to provide best practice which will be spread via a ground-up approach
• recognising that certain things like ‘final version’ are critical to identify, but maintaining an agnostic stance about relative importance of objects – ‘fit for purpose’ rather than ‘best or most relevant’
We therefore needed:
• a deliberately wide understanding of what constituted a version for all involved
• to encompass anything that anyone might consider to be a version
• to find ways to make versioning information transparent to the end user
VIF’s definition:
• A ‘version’ is a digital object (in whatever format) that exists in time and place and has a context that can be described by the relationship it has to other objects
• A ‘version relationship’ is an understanding or expression of how two or more objects relate to each other
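To make the two halves of this definition concrete, here is a minimal Python sketch. It is not part of the VIF framework: the class names, field names and the `describe` helper are all hypothetical, chosen only to mirror the wording above (an object in a format, at a time, in a place, plus a separate record of how two objects relate). For simplicity it models a relationship between exactly two objects, although the definition allows more.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    """A digital object that exists in time and place (field names are hypothetical)."""
    identifier: str      # e.g. a repository record number
    title: str
    file_format: str     # "in whatever format"
    last_changed: date   # time
    held_at: str         # place: which repository holds this copy


@dataclass(frozen=True)
class VersionRelationship:
    """An expression of how two objects relate to each other."""
    earlier: Version
    later: Version
    description: str     # free-text account of what changed


def describe(rel: VersionRelationship) -> str:
    """Render the relationship as a sentence an end user could read."""
    return (
        f"{rel.later.identifier} ({rel.later.last_changed.isoformat()}) follows "
        f"{rel.earlier.identifier} ({rel.earlier.last_changed.isoformat()}): "
        f"{rel.description}"
    )


# Illustrative data only; these records do not exist in any repository.
draft = Version("demo-1", "Example paper", "doc", date(2007, 8, 30), "Repository A")
published = Version("demo-2", "Example paper", "pdf", date(2008, 4, 22), "Repository A")
link = VersionRelationship(draft, published, "peer reviewed and typeset")
print(describe(link))
```

Keeping the relationship as its own record, rather than a field on the object, reflects the distinction drawn earlier between asking "is this the latest version?" and asking "how do these objects relate?".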
The essential information for all within the framework
Now for the framework itself:
Making information about version transparent
• VIF has identified the pieces of information that give clues about version status
• We then worked out how to make this information available to the people who use repositories
Essential Versioning Information • There are five pieces of information that when some or all are present will allow someone to understand what version they have:
Embedding the information
• The repository metadata itself is frequently bypassed; if the object does not contain the information within it, it can become impossible for an end user to ascertain version status
• Metadata is not evident when:
  • Access to the object is through a direct link
  • Access occurs via a search engine like Google
  • Cross-repository search services are used. They often deal with inconsistent metadata by harvesting as much information as possible and then reproducing it in their standard format
Object Solutions
• Filename – should be unambiguous, preferably uniform. Could include repository no., for example.
• Coversheet - http://eprints.lse.ac.uk/2631/
• ID tags / Properties – dates, version no. etc. can be stored in these easily at the creation stage or by repositories.
• Watermark: http://arxiv.org/PS_cache/astro-ph/pdf/0701/0701001v2.pdf
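As an illustration of the filename suggestion, the sketch below builds a uniform filename from a repository number, title, version number and date last changed. The pattern itself is an assumption made for this example, not a convention recommended by VIF, and `build_filename` is a hypothetical helper.

```python
import re
from datetime import date


def build_filename(repo_no: int, title: str, version_no: int,
                   last_changed: date, extension: str) -> str:
    """Build an unambiguous, uniform filename (pattern is an assumption)."""
    # Lower-case the title and collapse anything that is not a letter or
    # digit into single hyphens, so the name is safe on most file systems.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{repo_no}_{slug}_v{version_no}_{last_changed.isoformat()}.{extension}"


# Illustrative values only; 1234 is not a real repository record.
print(build_filename(1234, "Which One's Which?", 2, date(2008, 4, 22), "pdf"))
# prints: 1234_which-one-s-which_v2_2008-04-22.pdf
```

Because the version information travels in the name of the file, it survives when the object is reached by a direct link or a search engine and the repository record is never seen.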
Overview of framework recommendations
Repository Management:
• Formulate wider strategy; set and promote clear policies
• Use object solutions and get version information at ingest
• Include version information in metadata
Software Development:
• Make systems cope with and link more than one version
• Look at a FRBRised structure to establish version relationships
• Support richer metadata using DC application profiles
Recommendations for Content Creators:
• State the author, title and date last changed
• Keep track of which versions are available and where
• See VERSIONS Toolkit for more information
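For readers unfamiliar with FRBR, the following Python sketch shows roughly what a "FRBRised" structure could look like. It uses three of the four FRBR Group 1 entities (Work, Expression, Manifestation; Item is omitted for brevity). The field names, the sample data and the `list_versions` helper are hypothetical and are not taken from the framework or from any repository software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """A physical embodiment of an expression, e.g. one file in one place."""
    file_format: str
    location: str


@dataclass
class Expression:
    """One realisation of the work, e.g. a draft or the published text."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual creation that all versions share."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def list_versions(work: Work) -> List[str]:
    """Flatten the hierarchy into one line per available file."""
    return [
        f"{work.title} | {expr.label} | {man.file_format} | {man.location}"
        for expr in work.expressions
        for man in expr.manifestations
    ]


# Illustrative data only.
paper = Work("Example paper", [
    Expression("accepted manuscript", [Manifestation("pdf", "Repository A")]),
    Expression("published article", [
        Manifestation("pdf", "Repository A"),
        Manifestation("html", "Publisher site"),
    ]),
])
for line in list_versions(paper):
    print(line)
```

Grouping every file under a shared work is one way a system could "cope with and link more than one version": the version relationship is implied by where each file sits in the hierarchy.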
The Framework
• Go look at the framework - full details, explanations, pros and cons, guidance and recommendations contained within
• All on the web, will be available as PDF in May
• It’s not finished and can still be improved!
• www.lse.ac.uk/library/vif
Survey recommendations - revisited
• Versioning is relevant to all types of digital objects – any framework should therefore be deliberately broad
• Premium placed on ‘final’ versions of digital objects - 91.6% of total respondents thought that being able to clearly identify the ‘finished’ version of an object was ‘essential’ or ‘important’
• Academics appear largely disengaged from the problem of versioning (and are largely happy with the way they deal with their own work) – any advice given should be simple and flexible and avoid top-down enforcement
• Cross-repository versioning is a problem and should be dealt with in the framework
• Project Director: Frances Shipsey, LSE Library, f.m.shipsey@lse.ac.uk
• Project Manager: Jenny Brace, LSE Library, j.e.brace@lse.ac.uk
• Project and Communications Officer: Dave Puplett, LSE Library, d.puplett@lse.ac.uk
• Project Officer: Paul Cave, University of Leeds, p.l.cave@leeds.ac.uk
• Project Officer: Catherine Jones, Science and Technology Facilities Council, c.m.jones@rl.ac.uk