A DATACITE CASE STUDY FROM THE UK DATA ARCHIVE. TOM ENSOM, UK DATA SERVICE, UK DATA ARCHIVE, UNIVERSITY OF ESSEX. C4D WORKSHOP, JULY 2013, LONDON.
WHO WE ARE • Established in 1968: 46 years of selecting, curating, preserving and providing access to social science data • 6,000 datasets in the collection • Over 25,000 registered users • Data and data support services for higher and further education, for research, teaching and learning • Registered to ISO 27001 (information security standard) since June 2010
OUR SERVICES • The UK Data Archive is itself a department of the University of Essex • Distributed service established on 1 January 2003 as the Economic and Social Data Service (ESDS) • New five-year UK Data Service from 2012
WHAT WE DO • Research & development, innovation • Promoting best practice in data curation • Raising standards in data security and awareness of ethical/legal issues • Raising standards in data management • Acting as a data management hub • Providing guidance to ESRC researchers and anyone else who asks
WE SUPPORT RESEARCHERS • Popular training materials • Managing and Sharing Guide • Training Resources • Website: http://data-archive.ac.uk/create-manage • Bespoke training events • Large and small scale workshops
ENGAGEMENT WITH RDM COMMUNITY • Recently completed JISC Managing Research Data project with University of Essex • Cross support service, departmental engagement • Piloted an RDM infrastructure • http://www.data-archive.ac.uk/create-manage/projects/rd-essex • Outputs of value to RDM community: • Metadata profile for institutional data repositories • Research data plugin for EPrints
WHY CITE DATA? It’s a vital part of a rigorous research process: • Acknowledges the researcher’s sources • Gives data creators, authors and data curators proper credit when their work is reused • Facilitates data resource discovery and access • Helps track the use and impact of data collections
OUR APPROACH TO CITATION • Required by our user agreement (End User Licence) for many years
OUR APPROACH TO CITATION • Should include enough information to ensure the exact version can be located “University of Essex. Institute for Social and Economic Research and National Centre for Social Research, Understanding Society: Wave 1, 2009-2010 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], November 2011. SN: 6614.” • No widely agreed standard citation format yet! • Version information crucial
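Because there is no agreed format, one way to keep citations consistent, and to make sure the edition is never left out, is to generate every citation from the same structured fields. The sketch below assumes the fields and layout of the Understanding Society example above; `CitationRecord` and `format_citation` are hypothetical names, not part of any UKDA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationRecord:
    """Hypothetical fields that pin down one edition of a data collection."""
    authors: str        # depositors / principal investigators
    title: str
    edition: str        # e.g. "2nd"; version information is crucial
    place: str
    distributor: str
    release: str        # release month and year
    study_number: int   # UKDA study number (SN)


def format_citation(rec: CitationRecord) -> str:
    """Assemble a citation in the layout of the Understanding Society example."""
    return (
        f"{rec.authors}, {rec.title} [computer file]. "
        f"{rec.edition} Edition. "
        f"{rec.place}: {rec.distributor} [distributor], {rec.release}. "
        f"SN: {rec.study_number}."
    )


understanding_society = CitationRecord(
    authors=("University of Essex. Institute for Social and Economic Research "
             "and National Centre for Social Research"),
    title="Understanding Society: Wave 1, 2009-2010",
    edition="2nd",
    place="Colchester, Essex",
    distributor="UK Data Archive",
    release="November 2011",
    study_number=6614,
)

print(format_citation(understanding_society))
```

Keeping the edition as a required field means a citation cannot be produced without stating which version was used.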
PERSISTENT IDENTIFIERS • Persistent Identifiers (PIDs) • A string identifying a clearly defined digital object • Persistence must mean enduring • Identifiers must be unique • PIDs have been attached to scientific publications for some time • Next logical step: data • Also being applied to other entities, e.g. people via the ORCID system
CHANGES TO DATA • Our ‘data collections’ are not discrete digital objects • Approx. 15% of UKDA data collections are altered within the first year after publication • Versioning: we need to distinguish between major and minor changes to a data collection • Integrate processes with: • Digital preservation activities • Current ingest infrastructure / workflows
MINOR CHANGES – LOW IMPACT • Publication reference added • Correction of spelling in variable labels • Small changes in variable labels • Removal of (erroneously supplied) admin variables • Correction of spelling in metadata • Minor changes in documentation • New index (keyword) terms • Additional documentation added (non-fundamental) • Change in access conditions
MAJOR CHANGES – HIGH IMPACT • Adding new ‘waves’ in a data series • New variable added • New labels/value codes added • Weighting variables reconstructed • Wrong data supplied (e.g., March not April) • Mis-coded data (e.g., Don’t know/Refused mix-up) • Change in format (file migration) • Significant changes in documentation • Change in access conditions
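One way to apply the two lists is to classify each recorded change and treat the whole update as high impact if any single change is. This is a minimal sketch: the short change labels are hypothetical paraphrases of the slide bullets, and because "Change in access conditions" appears on both slides, the sketch deliberately refuses to classify it and leaves that decision to a curator.

```python
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical short labels paraphrasing the MINOR CHANGES slide.
LOW_IMPACT = {
    "publication reference added",
    "variable label spelling corrected",
    "small variable label change",
    "erroneous admin variables removed",
    "metadata spelling corrected",
    "minor documentation change",
    "new index terms",
    "non-fundamental documentation added",
}

# Hypothetical short labels paraphrasing the MAJOR CHANGES slide.
HIGH_IMPACT = {
    "new wave added",
    "new variable added",
    "new labels or value codes",
    "weighting variables reconstructed",
    "wrong data supplied",
    "mis-coded data",
    "format change",
    "significant documentation change",
}
# "change in access conditions" is on both slides, so it is deliberately
# left out of both sets and is rejected below for a curator to decide.


def classify(changes: list[str]) -> Impact:
    """HIGH if any change is high impact, else LOW; unknown labels raise."""
    unknown = [c for c in changes if c not in LOW_IMPACT and c not in HIGH_IMPACT]
    if unknown:
        raise ValueError(f"needs curator review: {unknown}")
    return Impact.HIGH if any(c in HIGH_IMPACT for c in changes) else Impact.LOW


print(classify(["new index terms", "metadata spelling corrected"]))  # Impact.LOW
print(classify(["new index terms", "new wave added"]))               # Impact.HIGH
```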
DATACITE DOIs • 2011: we started working with the British Library and DataCite to develop a permanent, reliable method of citing our data collections • DataCite • Founded by organisations from six countries • Established a citation format for research data, including a DOI • Works with data publishers, e.g. established data centres and institutional repositories
WHY DATACITE? Not the only choice, but right for us: • The DOI framework is an international, persistent standard for identifying digital objects • Familiar within the research data domain • Centralised resolution service • Metadata registry (and thus a de facto standard) • Discovery link-up • API that allows the minting process to be automated (manual minting is also possible)
DOI FORMAT 10.5255/UKDA-SN-1-1 • 10.5255: unique archive identifier (DOI prefix) • UKDA: readable archive identifier • SN: resource identifier type • 1: resource identifier • 1: resource version
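A DOI in this format can be split back into its five labelled parts. The sketch below assumes every DOI follows exactly the `prefix/ARCHIVE-TYPE-ID-VERSION` pattern shown on the slide, with uppercase labels; `UkdaDoi`, `parse_doi` and `format_doi` are hypothetical names.

```python
import re
from typing import NamedTuple


class UkdaDoi(NamedTuple):
    prefix: str        # unique archive identifier (DOI prefix), e.g. "10.5255"
    archive: str       # readable archive identifier, e.g. "UKDA"
    id_type: str       # resource identifier type, e.g. "SN"
    resource_id: int   # resource identifier (study number)
    version: int       # resource version (major version)


# Assumes uppercase labels as on the slide; DOIs themselves are case-insensitive.
_PATTERN = re.compile(r"(10\.\d+)/([A-Z]+)-([A-Z]+)-(\d+)-(\d+)")


def parse_doi(doi: str) -> UkdaDoi:
    match = _PATTERN.fullmatch(doi)
    if match is None:
        raise ValueError(f"not in the UKDA-SN pattern: {doi!r}")
    prefix, archive, id_type, resource_id, version = match.groups()
    return UkdaDoi(prefix, archive, id_type, int(resource_id), int(version))


def format_doi(d: UkdaDoi) -> str:
    return f"{d.prefix}/{d.archive}-{d.id_type}-{d.resource_id}-{d.version}"


print(parse_doi("10.5255/UKDA-SN-1-1"))
# UkdaDoi(prefix='10.5255', archive='UKDA', id_type='SN', resource_id=1, version=1)
```

Keeping the version as the last numeric field is what makes the structured DOI easy to increment.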
DOI VERSIONING • Low impact change: 10.5255/UKDA-SN-1-1 remains 10.5255/UKDA-SN-1-1; increments the minor version (internal only) • High impact change: 10.5255/UKDA-SN-1-1 becomes 10.5255/UKDA-SN-1-2; increments the major version and mints a new DOI
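The versioning rule can be read as a small state transition: only the major version appears in the DOI, so a low impact change leaves the DOI untouched while a high impact change produces a new one. In this sketch, `Version` and `apply_change` are hypothetical names, and resetting the internal minor counter to 0 on a major change is an assumption.

```python
from dataclasses import dataclass, replace

PREFIX = "10.5255/UKDA-SN"  # taken from the DOI FORMAT slide


@dataclass(frozen=True)
class Version:
    study_number: int
    major: int   # appears in the DOI
    minor: int   # internal only, never exposed in the DOI

    @property
    def doi(self) -> str:
        return f"{PREFIX}-{self.study_number}-{self.major}"


def apply_change(v: Version, high_impact: bool) -> Version:
    if high_impact:
        # New major version, hence a new DOI; restarting minor at 0 is an assumption.
        return replace(v, major=v.major + 1, minor=0)
    # Low impact: same DOI, internal minor version increments.
    return replace(v, minor=v.minor + 1)


v1 = Version(study_number=1, major=1, minor=0)
after_low = apply_change(v1, high_impact=False)
after_high = apply_change(after_low, high_impact=True)
print(after_low.doi, after_low.minor)    # 10.5255/UKDA-SN-1-1 1
print(after_high.doi, after_high.minor)  # 10.5255/UKDA-SN-1-2 0
```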
CREATING A NEW DOI • Minimal DataCite metadata inc. requested DOI pushed to DataCite metadata store via API • DataCite API sends back an approval • Flagged behind the scenes • New data collection ‘ingested’ • Structured DOI ‘created’ • New change log • New citation file
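The "minimal DataCite metadata" pushed in the first step can be built as a small XML record. This sketch assumes the five properties that were mandatory in DataCite Metadata Schema 3 (identifier, creator, title, publisher, publication year) and the schema-3 kernel namespace; later schema versions add further mandatory properties, such as resource type, so check the version you target. It only builds the XML and does not call the DataCite API, and the record values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: DataCite Metadata Schema 3 kernel namespace.
NS = "http://datacite.org/schema/kernel-3"
ET.register_namespace("", NS)


def _add(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a namespaced child element with optional text and attributes."""
    child = ET.SubElement(parent, f"{{{NS}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child


def minimal_metadata(doi: str, creators: list[str], title: str,
                     publisher: str, year: int) -> str:
    """Serialise the five schema-3 mandatory properties as an XML string."""
    root = ET.Element(f"{{{NS}}}resource")
    _add(root, "identifier", doi, identifierType="DOI")
    creators_el = _add(root, "creators")
    for name in creators:
        _add(_add(creators_el, "creator"), "creatorName", name)
    _add(_add(root, "titles"), "title", title)
    _add(root, "publisher", publisher)
    _add(root, "publicationYear", str(year))
    return ET.tostring(root, encoding="unicode")


record = minimal_metadata(
    "10.5255/UKDA-SN-1-1",
    ["Hypothetical Depositor"],
    "Hypothetical Survey, Wave 1",
    "UK Data Archive",
    2013,
)
print(record)
```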
UPDATING A DOI – HIGH IMPACT • Minimal DataCite metadata inc. requested DOI pushed to DataCite metadata store via API • DataCite API sends back an approval • Flagged behind the scenes • High impact change to data collection • Incremental DOI version ‘created’ • Update change log • New citation file
UPDATING A DOI – LOW IMPACT • Minimal DataCite metadata pushed to DataCite metadata store via API • DataCite API sends back an approval • Flagged behind the scenes • Low impact change to data collection • Update change log
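The three workflows differ mainly in which DOI is requested and which outputs are produced afterwards. The sketch below models them in memory, assuming a hypothetical `push` callable that stands in for the DataCite metadata-store request and returns `True` on approval; it makes no network calls, and the change-log and citation-file strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the DataCite metadata-store request:
# takes the DOI the metadata refers to, returns True if approved.
PushFn = Callable[[str], bool]
PREFIX = "10.5255/UKDA-SN"


@dataclass
class Collection:
    study_number: int
    major: int = 0                       # 0 means no DOI minted yet
    change_log: list[str] = field(default_factory=list)
    citation_files: list[str] = field(default_factory=list)

    def doi_for(self, major: int) -> str:
        return f"{PREFIX}-{self.study_number}-{major}"


def _approved(push: PushFn, doi: str) -> None:
    if not push(doi):
        raise RuntimeError(f"DataCite did not approve metadata for {doi}")


def _require_doi(c: Collection) -> None:
    if c.major == 0:
        raise ValueError("collection has no DOI yet; call create first")


def create(c: Collection, push: PushFn) -> None:
    """New collection: request version 1, new change log, new citation file."""
    doi = c.doi_for(1)
    _approved(push, doi)
    c.major = 1
    c.change_log = [f"{doi}: first release"]
    c.citation_files.append(f"citation for {doi}")


def high_impact(c: Collection, push: PushFn, note: str) -> None:
    """Request the next major version, update change log, new citation file."""
    _require_doi(c)
    doi = c.doi_for(c.major + 1)
    _approved(push, doi)
    c.major += 1
    c.change_log.append(f"{doi}: {note}")
    c.citation_files.append(f"citation for {doi}")


def low_impact(c: Collection, push: PushFn, note: str) -> None:
    """Re-push metadata for the existing DOI and update the change log only."""
    _require_doi(c)
    doi = c.doi_for(c.major)
    _approved(push, doi)
    c.change_log.append(f"{doi} (minor): {note}")


approve_all: PushFn = lambda doi: True  # hypothetical stub, no network call
sn1 = Collection(study_number=1)
create(sn1, approve_all)
low_impact(sn1, approve_all, "metadata spelling corrected")
high_impact(sn1, approve_all, "new wave added")
print(sn1.change_log)
```

Only the high impact path produces a new DOI and a new citation file; the low impact path touches the change log alone, matching the slides.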
THE END RESULT… • Jump page (= change log) linking the versions: • DOI: SN-####-3: SN#### Survey, Waves 1-15; instance-specific data and metadata (current) • DOI: SN-####-2: SN#### Survey, Waves 1-14; instance-specific data and metadata • DOI: SN-####-1: SN#### Survey, Waves 1-13; instance-specific data and metadata
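A jump page like this can be rendered from the ordered list of versions, newest first, with the latest one marked as current. This is a sketch with a hypothetical study number (1) and a hypothetical `render_jump_page` helper; the slides do not specify the actual page layout.

```python
def render_jump_page(title: str, versions: list[tuple[str, str]]) -> str:
    """Render versions, given oldest first as (doi, description), newest first."""
    if not versions:
        raise ValueError("a jump page needs at least one version")
    lines = [f"Jump page: {title}"]
    for i, (doi, description) in enumerate(reversed(versions)):
        marker = " (current)" if i == 0 else ""
        lines.append(f"DOI: {doi}  {description}{marker}")
    return "\n".join(lines)


versions = [
    ("10.5255/UKDA-SN-1-1", "Waves 1-13"),
    ("10.5255/UKDA-SN-1-2", "Waves 1-14"),
    ("10.5255/UKDA-SN-1-3", "Waves 1-15"),
]
page = render_jump_page("SN 1 Survey", versions)
print(page)
```

Every earlier DOI stays on the page, so a citation to an older version still resolves to a description of what it contained.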
CHALLENGES FOR THE FUTURE • Citing parts (fragments) of data collections: • single files • subsets of quantitative data files • extracts of textual data • Still uncertainty over where exactly research data should go: institutional repository (IR), subject-specific repository or data journal? • Who should be minting DOIs? • Avoid assigning multiple identifiers to an object
ACKNOWLEDGEMENTS Thanks to the following UKDA/UKDS staff for their assistance in putting this together: • Matthew Woollard • Louise Corti • John Payne • Matthew Brumpton • Sharon Bolton
CONTACT TOM ENSOM, UK DATA ARCHIVE, UNIVERSITY OF ESSEX, WIVENHOE PARK, COLCHESTER, ESSEX CO4 3SQ • T: +44 (0)1206 872974 • E: tensom@essex.ac.uk • www.data-archive.ac.uk