Digital Curation Centre
a centre of expertise in data curation and preservation
Digital Curation Centre: tools and services under development
David Giaretta, Associate Director (Development)
Funders:
Organisation (diagram: partner institutions)
• Partners: UKOLN; U of Edinburgh; U of Glasgow; CCLRC
• Surrounding groups: curation organisations, e.g. DPC; communities of practice: users; Collaborative Associates Network of Data Organisations; research collaborators; testbeds & tools; industry; standards bodies
Organisation (diagram: functions)
• Functions: community support & outreach; service definition & delivery; management & admin support; research; development co-ordination
• Surrounding groups: curation organisations, e.g. DPC; communities of practice: users; Collaborative Associates Network of Data Organisations; research collaborators; testbeds & tools; industry; standards bodies
Collaborations (diagram)
• Group labels: International Collaborations; Research Institutes; HEIs & FE; Standards Bodies; Research Councils
• Organisations named: CMS-Bristol; NASA; NARA; CNES; ESA; RLG; BNSC; BODC; BADC; NIEeS; Cambridge; Leicester; Jodrell Bank; DPC; ESO; RG; IVOA; SDSC; Kyoto; USC; CDS; Council for Museums, Archives & Libraries; Caltech; JHU; CSIRO; RDN; OCLC; RI; EDG; GridPP; EGEE; UNC; So’ton; MIMAS; NLA; CEH; OAI; NOF; NCS; ILRT; NEODC; WT-CFG; IC; Maastricht; Oxford; AHDS; Microsoft; IBM; Oracle; BT; STK; Durham; Innogen; Dutch NA; Swiss NA; Urbino; Data Archive; Capri; NTUA; INRIA; HUJ; UPC; Max-Planck; LDC; Salzburg; NHS; ACM; Roslin; IBM Almaden; MRC HGU; EBI; TU Vienna; IASSIST; UPenn; GSK; CCLRC; UKOLN; DELOS; DLI (US); NeSC; UofE; UofG
Overview
• Developing tools and services that will be needed in the short to medium term
• Integrating tools from many sources
• The tools will become new DCC services and will also be usable separately by other projects
• Strongly OAIS-based
• Support automated processing and interoperability
Representation Information vs Format
• Format = Structure
• Format alone omits important information, e.g.
  • Language, terminology
  • Encryption
• To stand a chance of using the information, we need to know more than just its Format
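The point of this slide can be sketched as a small data model. This is a minimal sketch only: the class name, field names and the `gaps` check are hypothetical and are not taken from the DCC tools or the OAIS text. It shows a record in which the format (structure) is just one part, and the language, terminology and encryption details are recorded next to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepresentationInfo:
    """Hypothetical record: field names are illustrative assumptions."""
    structure: str                                             # the "format", e.g. "CSV"
    semantics: Dict[str, str] = field(default_factory=dict)    # language, terminology
    other: Dict[str, str] = field(default_factory=dict)        # e.g. encryption details

    def gaps(self) -> List[str]:
        """List the semantic items (assumed required here) that are missing."""
        required = {"language", "terminology"}
        return sorted(required - set(self.semantics))


# Knowing only the format leaves gaps.
format_only = RepresentationInfo(structure="CSV")

# A fuller description adds meaning and any encryption details.
described = RepresentationInfo(
    structure="CSV",
    semantics={"language": "en", "terminology": "column names follow a project glossary"},
    other={"encryption": "none"},
)
```

Here `format_only.gaps()` reports what a future user would still need to find out, while `described.gaps()` is empty.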
Layered Model from OAIS
• More easily applicable to science data
Representation Information - High Level View
Example of use of Representation Information Labelling
Registry/Repository
• Interface and protocols – JAXR “standard”
  • freebXML implementation
• Many access methods
  • URL
  • Web Services
  • API
  • etc.
• Findability
  • Persistent IDs
  • What can we rely on?
• Labels (to support automated processing)
• Initial service this summer
• Hope to work with PRONOM 4 & GDFR
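How persistent IDs and labels could support automated processing can be sketched with an in-memory stand-in for the registry. Everything here is an assumption made for illustration: the class, the `example:ri/0001` identifier scheme and the `rep_info_label` key are hypothetical, and the sketch does not use JAXR or freebXML.

```python
class RepInfoRegistry:
    """In-memory stand-in for a Representation Information registry.

    Hypothetical sketch: not the DCC registry, and not a JAXR client.
    """

    def __init__(self):
        self._records = {}

    def register(self, persistent_id, rep_info):
        # A persistent ID must never be reassigned to different content.
        if persistent_id in self._records:
            raise ValueError(f"{persistent_id} is already registered")
        self._records[persistent_id] = rep_info

    def resolve(self, persistent_id):
        # Raises KeyError if the ID is unknown.
        return self._records[persistent_id]


def describe(data_object, registry):
    """Follow the label carried by a data object to its Representation Information."""
    return registry.resolve(data_object["rep_info_label"])


registry = RepInfoRegistry()
registry.register("example:ri/0001", {"structure": "FITS", "language": "en"})

# The data object carries only a label; software looks up the rest.
observation = {"name": "obs-42.fits", "rep_info_label": "example:ri/0001"}
rep_info = describe(observation, registry)
```

Because the data object stores only the label, the description held in the registry can be extended later without touching the archived data.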
Registry/Repository
• Trusted repository of Representation Information
• Authenticity of information
• Access control
• Certificates/digests (are they trustable over the long term?)
• Extensibility
• Distributed
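The digest idea can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: SHA-256 and canonical JSON serialisation are choices made here for illustration, not something the slides specify, and the function names are hypothetical. It does not answer the slide's question about whether a digest remains trustable over the long term.

```python
import hashlib
import hmac
import json


def rep_info_digest(record):
    """SHA-256 digest of a record, serialised deterministically (assumed scheme)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unaltered(record, expected_digest):
    """True if the record still matches the digest stored when it was deposited."""
    return hmac.compare_digest(rep_info_digest(record), expected_digest)


record = {"structure": "FITS", "language": "en"}
stored_digest = rep_info_digest(record)

# Any later change to the record changes the digest.
tampered = dict(record, language="fr")
```

`is_unaltered(record, stored_digest)` holds for the original record and fails for `tampered`; sorting the keys makes the digest independent of key order.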
Certification
• RLG task force preparing draft standard
• Based on OAIS (plus TDR)
• Expect this to become an ISO standard
• Tool:
  • Checklist and reports
  • …
• Awaiting release of draft (in May)
Archival Information Package
• METS
• XFDU Packaging
• Expect tools to be available by end of year
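As a rough illustration of what a packaging tool has to gather, the sketch below builds a plain dictionary manifest. It is not METS or XFDU, and the function name, key names and label value are hypothetical. It assumes only the OAIS idea that a package combines content (data plus a pointer to its Representation Information) with Preservation Description Information.

```python
import hashlib


def build_aip_manifest(data_files, rep_info_label, pdi):
    """Assemble a simple manifest dict (illustrative layout, not METS/XFDU).

    data_files: mapping of file name to bytes
    rep_info_label: identifier of the Representation Information (assumed scheme)
    pdi: mapping holding preservation description entries, passed through as given
    """
    data_objects = [
        {
            "name": name,
            "size": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
        }
        for name, payload in sorted(data_files.items())
    ]
    return {
        "content_information": {
            "data_objects": data_objects,
            "representation_information": rep_info_label,
        },
        "preservation_description_information": dict(pdi),
    }


manifest = build_aip_manifest(
    {"b.dat": b"xyz", "a.dat": b""},
    rep_info_label="example:ri/0001",
    pdi={"provenance": "deposited by example project"},
)
```

A real tool would serialise this information into METS or XFDU rather than a Python dictionary.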
Preservation Description Information
• Will be working with PREMIS on tools
DCC Development Roadmap for next 6-12 months
• Registry
  • Complete Phase 1
  • Include links to TNA/PRONOM
  • Hand over to Services group
  • Start Phase 2 – aim for “Trusted Repository” status
• Representation Information
  • Data descriptions of science data using EAST (http://east.cnes.fr) & others
  • Import other Structure description tools and Data Dictionary tools
  • Develop mapping to data object level
  • Work with other projects, e.g. Emulation, Processing
• Certification
  • Draft certification
  • Checklist
  • Proposed standard
• Additional Tools
  • Metadata extraction tool set
  • Ingest tool (based on PAIMAS standard)
  • Testbeds, e.g. large-scale data management tools
Research
• To draw together the various functions of curation, from the traditional archival functions to the maintenance and publication of evolving knowledge as seen in scientific databases.
• To identify, through direct research collaboration and through interaction with the service arm of DCC, the key projects in which research is needed.
• To conduct research in areas already identified by the partners as crucial to digital curation.
• To institute two-way conduits between research and service in which practical issues can be drawn to the attention of researchers and the products of research can be tested in practice.
Current research priorities
• Data integration and publication
• Performance and optimisation
• Annotation
• Appraisal and long-term preservation
• Socio-economic and legal context: rights, responsibilities and viability
• Cost-benefit analysis of the data curation process
• Security: safe and effective data analysis environments
• Automation of metadata extraction
• Visitors Programme and Seminar Series
Summary
• Developing and integrating OAIS-based tools
• Reviewing other related tools
• See http://www.dcc.ac.uk
• A Development web site (http://dev.dcc.rl.ac.uk) with a Wiki and an associated open email list has also been set up
• Aim to encourage the widest possible collaboration with other projects
• In the medium to long term, expect tools from DCC Research activities, e.g. Annotation