Attitudes and aspirations in a diverse world

This presentation explores attitudes and aspirations regarding scientific repositories, discussing curation and preservation issues, research data, metadata, ownership and support, and the challenges of handling large datasets.

Presentation Transcript


  1. Attitudes and aspirations in a diverse world: the Project StORe perspective on scientific repositories
     Graham Pryor, 22nd November 2006
     Digital Data Curation in Practice: 2nd International Digital Curation Conference, Glasgow, 21-22 November 2006

  2. StORe Guide
     • What’s in StORe?
     • Curation and preservation issues
     • Attitudes and aspirations
     • Research data
     • Repositories
     • Metadata
     • Ownership and support
     • Too huge to handle?

  4. Digital Data Curation
     Definition: the actions needed to maintain digital data and other digital materials over their entire life-cycle and indefinitely for current and future generations of users. These actions include not only the processes of digital archiving and preservation but also all of the processes that are essential to good data creation and management, as well as the capacity to add value to data to generate new sources of information and knowledge.

  5. What’s in StORe? – Aims 1
     Attach new value to the intellectual products of academic research by providing two-way links between source and output repositories.

  6. What’s in StORe? – Aims 2
     • Surveys to identify workflows and norms, problems and desirable enhancements to source/output repositories
     • A generic technical specification for functional enhancements to source and output repositories
     • Pilot middleware that demonstrates a bi-directional link
     • Independent evaluations of the pilot middleware and recommendations for future development as a generic platform for linking repositories

  7. What’s in StORe? – Survey
     • Dual deposit of data and publications already an accepted concept
     • International strategies for data deposit and data preservation
     • Genuine desire to contribute to the wealth of knowledge
     • Awareness of the critical need to assign and maintain appropriate metadata

  8. What’s in StORe? – Survey
     Cultural and organisational barriers to deposit of research data in repositories:
     • Inherent culture of self-sufficiency in the generation and organisation of data
     • Limited inclination towards voluntary deposit in open access source repositories
     • Institutional output repositories not on the agenda of most researchers

  9. Research Data
     Features of source data:
     • Often large and complex
     • Can be impenetrable without local tools
     • May seem ambiguous to project outsiders
     • Are frequently held on standalone equipment
     • Commonly comprise several data formats
     From the StORe survey:
     • Physics: raw data sets as large as petabytes (10^15 bytes) may be generated or analysed using software written within the project
     • Biosciences: need to describe how data were produced, including the laboratory conditions and methodology; 70% of bioscience source data are not networked
     • Chemists: data stored in numerous sub-folders (spectra, images, etc.) describing one process

  10. Research Data: chemistry data sets, links between complex clusters
      “…it would have to be everything associated with that compound. There is no point having an NMR without a picture of what it is. Then it’s useful to have a synthesis scenario and say oh that could fit with that but I want proof and then that really is a paper. You know you can waste a lot of time trying to follow what people have done before that isn’t properly published and never have worth. It’s not always, but is it worth the risk of wasting too much of your time?”

  11. Research Data: Physics data types (chart; values not recoverable from the transcript)
      Categories: telemetry; video; topographical data; remote sensing; geophysical data; synthetic data; raw data; photographs; statistical data; drawings, plots; images; databases; text-based files; spectra; derived data; instrument data; other.

  12. Research Data: Archaeology file types (bar chart; values not recoverable from the transcript)
      Categories: word processed files (Word/.doc); portable document format (.pdf); image files (.jpg, .tif, .bmp, .gif); spreadsheets (Excel/.xls); database files (Access, MySQL); tables/catalogues; statistical software; rich text files (.rtf); plain text (.txt); Hypertext mark-up language (HTML); Extensible mark-up language (XML); CAD/GIS; other.

  15. Repositories
      • Source repository development is discipline-led
      • Large number of established services: we suggested Archaeology Data Service, Brookhaven National Laboratory, CERN, GenBank, National Crystallography Service, NERC Data Centres, Protein Structures Database, SuperCOSMOS, UK Data Archive and UniProt, to which 99 others were added
      • Some international strategies, mandates and dual deposit arrangements:
        - Astronomy (Virtual Observatory)
        - Biosciences (sequence data)
        - Chemistry (Crystallographic Data Centre)

  18. Repositories
      • Low awareness of repositories: 65% of the chemists surveyed had not used a repository and were not familiar with the idea of open access repositories
      • Low volume of repository use: many social scientists did not associate repositories with their research agenda; repositories are only one of many potential data sources/archives used by researchers

  20. Repositories
      • Low awareness of repositories
      • Low volume of repository use
      • Low rate of source data deposit
      • Output repositories:
        - publisher repositories preferred over institutional ones
        - Google-type searching preferred

  21. Metadata
      • All disciplines: an awareness of the importance of appropriate metadata
      • Improvements to source repositories? Better metadata ranked highest
      • Metadata assignment considered challenging, both intellectually and in the demands on one’s time
      Yet…
      • Evidence of a lack of standard structures
      • Metadata assignment often almost an afterthought
      • One third of StORe respondents believed no metadata were being assigned

  22. Metadata assignment (bar chart by discipline: Archaeology, Astronomy, Biosciences; values not recoverable from the transcript)
      Who assigns metadata: not known; library staff; individual researchers; research support staff; research team (collective); repository administrator/automatic; no formal metadata used.

  23. Metadata
      • Where researchers are familiar with metadata, they possess an in-depth knowledge of its use, applications and functions
      • Automatic assignment of metadata (or assignment by a process that relieves the depositor of the task) is preferred
      • Quote from a theoretical chemistry interview: “Well, there’s lots of different types of metadata. There is metadata for discovery, there is metadata for semantics, there is metadata for intellectual property and so on and so forth. They are all important. If I find some piece of information and it’s not on open access then I can’t use it. If I find some piece of metadata and it’s in a language that my machine does not understand and there is no metadata, then it is uninterpretable, I cannot use it. If I am particularly concerned about the quality of data I need provenance metadata. So there are different needs for different people...”

  24. Metadata
      • Need for improved and universal standards acknowledged
      • A clear link identified between the condition of the metadata used and the level of support from information specialists
      • Recognition of the need for different metadata for different phases of the research lifecycle (raw, processed, published data and beyond) and to assist cross-discipline interpretation

  25. Ownership & Support
      • Working culture: self-reliance and a constant pressure to deliver
      • Qualified enthusiasm for deposit in source repositories: producer or consumer?
      • Anxiety over predatory access and IPR
      • Storage methods: protectionism?
      • Specialist support: less a case of being unavailable than of not being sought

  26. Too Huge to Handle?
      • One of the aspects that the [Chemistry] interviewees commented upon was that there should be a wider organisational/institutional requirement that supports and manages the repositories, whether source, output or institutional.
      • “…sustainability depends on a business model. And it’s a major problem that confronts everybody at the moment in aggregating data, whether it be raw data, processed data, metadata, primary publications, abstracts, things like that”

  28. Too Huge to Handle?
      • Embedding of data management expertise within domains
        - Expensive?
        - Interventionist?
        - Too large and too difficult?
      • “Poor investment decisions can have major implications on how much information can be preserved, and how effectively” (Chris Rusbridge, http://www.ariadne.ac.uk/issue46/)

  29. Too Huge to Handle?
      • MRC £1 million data sharing and preservation initiative (http://www.mrc.ac.uk/strategy-data_sharing.htm):
        - initial focus on 4 to 6 unique datasets of long-term value
        - engage community support; longer-term business plan
      • Virtual Observatory: “Exploit information management and curation experience in the university libraries and build on long-term institutional commitments to preservation” (Bob Hanisch, http://www.arl.org/sparc/meetings/ala06/HanischPPT.pdf)

  30. END
      http://jiscstore.jot.com/SurveyPhase