
Data Repositories and JISC Repository Landscape. Mahendra Mahey, Repositories Research Officer, Repositories Research Team, UKOLN. GRADE Project Meeting (all partners), Edinburgh, 30 October 2006.


Presentation Transcript


  1. Data Repositories and JISC Repository Landscape. Mahendra Mahey, Repositories Research Officer, Repositories Research Team, UKOLN. GRADE Project Meeting (all partners), Edinburgh, 30 October 2006. This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0

  2. Data Repositories Landscape: Disconnected landscape. [Diagram labels: Institutions; Data Centre; Data Centre; ?]

  3. JISC Funds
     • Data Centres
       • MIMAS*
       • AHDS*
       • UK Data Archive*
       • EDINA
     * Also receive funding from Research Councils UK

  4. JISC Information Environment Architecture (Idealised). Technical Infrastructure for Services (Andy Powell, 2005)

  5. Institutional Repositories Holding Research Data
     • Very few around the world are doing this, and are they up to the job?
       • Versioning
       • Authentication at individual asset level
     • Other methods are being used: informal, ad hoc, with lots of data slipping through the net
     • Do repositories offer a better way to do this? Different data types lead to problems with existing software
     • Data cluster projects
       • eBank
       • SPECTRa
       • GRADE
       • CLADDIER
       • ARROW – DART
     • The idea of linking papers to the underlying data of experiments and research is very appealing: stORe project and Open Access!
     • Institutional repositories can handle some data (orphaned) but not all; there is still a role for data centres

  6. Data Centres
     • Have been storing data for years and predate the trendy ‘r’ word; they are the experts
     • They can teach institutions many lessons
     • A lot of mystery and suspicion between data centres and institutions; communication and dialogue are needed between the two, and across disciplines
     • Time and money saving?
     • Data centres argue that being subject-specific is a good thing; rationalising?
     • Storing and curation have become a science in their own right, e.g. bioinformatics
     • Offer
       • Databases
       • Web access
       • Tools to explore the information
       • Systems to capture the information
       • Service centres
     • Custodianship, acquisition and ownership
     • Depend on the goodwill of the community
     • Add value, service and organisation; require lots of money to continue

  7. Data Centre Infrastructure Can be Complex! [Diagram labels: Reactome; EMBL-Bank (DNA sequences); UniProt (protein sequences); EnsEMBL (genome annotation); Array-Express (microarray expression data); EMSD (macromolecular structure data); IntAct (protein interactions)]

  8. Institutional and Data Centre practice exist. [Diagram labels: Data analysis, transformation, mining, modelling; Presentation services / portals; Data discovery, linking, citation; Publishers: peer-review journals, conference proceedings, etc.; Aggregator services; Publication; Laboratory repository; Deposit; Validation; Institutional data repositories; Search, harvest]

  9. [Diagram labels: Data Centres; Data Cluster DRP Projects; Data Cluster Meetings (GRADE, R4L, SPECTRa, CLADDIER, stORe, eBank); Road Map Required; Briefing Paper; Interviews and Surveys; Workshop; Road Map for Digital Repository / Preservation Projects Focusing on Data; 06/09 Call]

  10. UKOLN - Data Repositories Research (Consultancy)
      • To define how institutions (collectively and individually) and scientific data centres can together effectively achieve:
        • Preservation
        • Access – Managed and Open
        • Reuse – Data Citation, Data Mining and Reinterpretation
      • To identify the mechanisms, business processes and good practice by which these functions can be achieved
      • To facilitate dialogue between data centres, institutions and other key players and to define a collaborative way forward
      (Dr Liz Lyon)

  11. Identifying and defining inter-relationships • Socio-cultural, organisational, legal • Technical interoperability • Roles & responsibilities. [Diagram labels: Access; Preservation; Re-use.] See briefing paper produced for workshop.

  12. Socio-cultural, organisational, political and legal issues • Highly diverse in awareness, practice and skills • Need to understand the full spectrum of research practice, workflows and associated data flows, both within and between disciplines/sub-disciplines

  13. Hierarchy of Drivers • Level 0: deliver project. • Level 1: meet ‘good scientific practice’. • Level 2: support own science. • Level 3: employer’s requirements. • Level 4: funder’s requirements. • Level 5: public policy requirements. Slide from Mark Thorley: NERC

  14. RC UK - Funding Body

  15. Socio-legal conclusions
      • Use a questionnaire and send it to data centres; disciplines will differ
      • Promote use and interoperability through metadata standards. Resource discovery standards should be promoted and developed by learned societies (membership arms) and subject communities, discipline by discipline (not by data curators): bottom up rather than top down.
      • Education: recognise the very wide range of understanding among disciplines of the value of data curation centres, IRs and archives. Need to go out and promote why they exist and why they should be used. Focus on the community.
      • Each research council should have a written ‘meaty’ data policy, disseminated and policed.
      • Legal issues: the JISC legal centre is valuable, but there is a lack of clarity and guidance on the law, where law exists, regarding use of digital objects, IP, etc. Need clarity of law and guidance on how best to interpret it: straightforward answers to straightforward questions. Model licences for use, interpretation, confidentiality and disclosure.
      • Academics and data centres need to be told the differences between data banks, data centres, etc. and IRs. IRs have not had enough institutional buy-in yet.
      • JISC could investigate why subject repositories are more successful than IRs. JISC policy should reflect what is happening on the ground.
      • JISC should help sell IRs better

  16. Technical Interoperability • Federation models • interoperability and inter-relationships between repositories

  17. Open Access
      • Good thing, but…
      • Are the tools up to the job?
        • OAI-PMH
        • Dublin Core
        • Use METS as packaging standard; momentum building?
      • Papers, not data
      • For data, do these map to the other metadata schemas that have been developed? Extensions to DC?
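
Slide 17 names OAI-PMH and Dublin Core as the tools in question. As a rough illustration of what harvesting involves, the sketch below builds OAI-PMH `ListRecords` request URLs in Python. The verb and its arguments (`metadataPrefix`, `set`, `from`, `until`, `resumptionToken`) come from the OAI-PMH 2.0 protocol; the endpoint `https://repository.example.org/oai` and the set name `datasets` are hypothetical, and nothing here is taken from the presentation itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, until_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0 a resumptionToken is an exclusive argument, so when one
    is supplied no other argument is sent alongside the verb.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
        if from_date is not None:
            params["from"] = from_date
        if until_date is not None:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # "datasets" is a made-up set name.
    print(list_records_url(BASE_URL, set_spec="datasets", from_date="2006-01-01"))
```

The default prefix `oai_dc` (unqualified Dublin Core) is the one format every OAI-PMH repository must support, which is part of why the slide asks whether richer, data-oriented schemas can be mapped onto it.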

  18. Federation • Monolithic solutions fail • Aggregation of institutional repositories is essential (Data Centre’s view)

  19. Technical
      • Need to define what is meant by the semantics of structured data, and publish guidelines at the levels of metadata, classification, subject areas, factual names and agreed conventions layered on top, e.g. identifiers.
      • Application profiles: who should be the keeper of those definitions (e.g. registries), and who funds and owns them?
      • Scientists concentrate on narrow areas, but the connections are to other, wider areas
      • Time series data are different: how are they discovered and used? It is more difficult to define discovery metadata for time series. Data might not be logically the same.
      • Data curation responsibility at institutional level or data centre: data curation requires specialisms, and data centres could feed this expertise back to institutions. A flow of expertise from data centres to institutions is needed.
      • Invitations to work in a data centre for a week: happening in Australia
      • A mixed economy of organisational responsibility is inevitable: some federation will be there
      • How to express quality: role for provenance and audit as a means to express quality; also ranking and annotation
      • Curation of data is of more interest to scientists than interoperability as a means of marketing/selling it.

  20. Roles, Rights & Responsibilities • ‘Scientist’: Creation and use of data. • ‘Data centre’: Curation of and access to data. • ‘User’: Use of 3rd party data. • ‘Funder’: Set / react to public policy drivers. • ‘Publisher’: Maintain integrity of the scientific record. From Mark Thorley: NERC

  21. Roles & Responsibilities
      • Individual scientists should deposit data of an acceptable quality, using domain standards
      • Re-users should acknowledge where data came from and, if it is appropriate, improve the quality of the data.
      • Institutions should have policies that mandate data deposit in an appropriate place, not necessarily an IR.
      • Publishers/journals/editors should mandate open deposit of data.
      • Curators collect, describe and connect data; idea of a community proxy role: define standards for the domain, working in and with the scientists
      • Funders should enforce their data deposit policies where possible
      • Funders should recognise the emerging need for new infrastructure and provide appropriate funding for this infrastructure and for the resulting actions
      • Users and funders should feed back views on the data stored to the data centre manager
      • Click-use licence: says that if you enhance the data you must give it back, but how does the data centre police that policy? Versioning is an issue here.
      • Value of “good enough” versus “completely comprehensive” descriptions (Graham C)
      • Who is responsible for ownership of the data to make changes? If there are multiple versions, the last one is not necessarily the best
      • Competitive views: risk of sabotage of other groups’ work is possible.
      • Who checks provenance of anything new? Curators?

  22. Small Science vs Big Science “Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.” ‘Lost in a Sea of Science Data’ S.Carlson, The Chronicle of Higher Education (23/06/2006)

  23. Dataset publishing
      • Re-examine the concept of Dataset Publishing (Callahan, Johnson, and Shelley 1996)
        • analogous to publishing papers
        • rewards for publishing datasets (e.g. promotion, RAE)
        • procedures (e.g. standards to use, peer review) and resources to manage procedures
      • Should minimise time and effort required
        • need tools to assist in creation, maintenance and dissemination of dataset descriptions
      • Means of ‘putting’ into a public/community
        • ‘Deposit’ and ‘Share’ are too cosy
        • to ‘publicate’, to issue
      • Terms of access and use
        • Open?
        • Privilege of membership
        • Payment of money
      (Taken from Peter Burnhill)

  24. Spatial is Special
      • Why?
      • Geo research data are not deposited; lots of data slipping through the net, not falling under RC remit; data being lost or shared informally. May be a case for a national repository?
      • Fears about legality of resources, e.g. OS data; researchers really want to share in a big way
      • Should data be deposited in data centres?
      • Academics not comfortable about sharing on a larger scale?
      • IRs not geared up to handle data?
      • DSpace does not allow editing of metadata
      • Problem with the ISO standard used for geo data (ISO 19115) and DC
      • Mapping done, further work needed; from wing mirror to Smart Car?
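
The difficulty of mapping ISO 19115 to Dublin Core can be made concrete with a small crosswalk sketch in Python. The keys on the ISO side (`title`, `abstract`, `topic_category`, `bounding_box`, `spatial_resolution` and so on) are simplified, hypothetical stand-ins for the much deeper ISO 19115 element paths, and the mapping is illustrative only; it is not the mapping the slide refers to. The bounding box is flattened using the component names of the DCMI Box encoding (`northlimit`, `southlimit`, `eastlimit`, `westlimit`).

```python
# Illustrative, simplified crosswalk: hypothetical ISO 19115-style keys
# mapped to unqualified Dublin Core element names.
CROSSWALK = {
    "title": "title",
    "abstract": "description",
    "topic_category": "subject",
    "responsible_party": "creator",
    "reference_date": "date",
    "language": "language",
}


def bounding_box_to_coverage(box):
    """Flatten a bounding box dict into a DCMI Box-style string."""
    return "; ".join(
        f"{name}limit={box[name]}" for name in ("north", "south", "east", "west")
    )


def iso_to_dc(record):
    """Return (dc_record, unmapped_keys) for a simplified ISO-style record."""
    dc = {}
    unmapped = []
    for key, value in record.items():
        if key in CROSSWALK:
            dc[CROSSWALK[key]] = value
        elif key == "bounding_box":
            dc["coverage"] = bounding_box_to_coverage(value)
        else:
            # No Dublin Core home in this sketch: the detail is lost.
            unmapped.append(key)
    return dc, unmapped


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = {
        "title": "Example survey grid",
        "abstract": "Gridded sample data.",
        "bounding_box": {"north": 56.0, "south": 55.9, "east": -3.1, "west": -3.3},
        "spatial_resolution": "10 m",
    }
    dc, lost = iso_to_dc(example)
    print(dc)
    print("Not mapped:", lost)
```

In this sketch, structured values are flattened into strings and fields such as spatial resolution have no obvious unqualified Dublin Core element, which is the kind of loss that "further work needed" may be pointing at.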

  25. Responsibility of Data Providers • Responsibility of publicly funded research to share data • ‘Free our Data’ Guardian work • INSPIRE work

  26. GRADE’s input • Important that GRADE inputs into this work as it will set direction of research and focus on GEOSPATIAL DATA Repository work • Interviews held with Rebecca and David

  27. [Diagram labels: Data Centres; Data Cluster DRP Projects; Data Cluster Meetings (GRADE, R4L, SPECTRa, CLADDIER, stORe, eBank); Road Map Required; Briefing Paper; Interviews and Surveys; Workshop; Road Map for Digital Repository / Preservation Projects Focusing on Data; 06/09 Call]

  28. We need your input! l.lyon@ukoln.ac.uk m.mahey@ukoln.ac.uk
