Data Archiving and Networked Services
DANS & Database Archiving - MIXED - SDFP
René van Horik, Program Manager
OPF Hackathon, Copenhagen, 7-9 February 2012
Outline
• DANS – Data Archiving & Networked Services
• Data Archiving @ DANS
• MIXED: a DANS software project
• SDFP Specification (Standard Data Format for Preservation)
• Hackspirations
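1. DANS
• DANS stands for “Data Archiving & Networked Services”
• Mission: durable access to research data
• Project-oriented
• Approx. 40 staff
• DANS is an institute of KNAW (Royal Netherlands Academy of Arts and Sciences) and NWO (Netherlands Organisation for Scientific Research)
• http://www.dans.knaw.nl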
2. Data Archiving @ DANS
• EASY: the Trusted Digital Repository of DANS -> http://easy.dans.knaw.nl
• Data Seal of Approval -> http://www.datasealofapproval.org
• 25,000 datasets / more than 1,000,000 files / 10 data archivists
• DANS guarantees long-term accessibility of files in its “Preferred File Formats”
• Audit & certification of the DANS digital archiving infrastructure
• APARSEN Network of Excellence (EU project)
3. MIXED project
• MIXED = Migration to Intermediate XML for Electronic Data
• Implementation of the “Smart Migration” strategy
• MIXED software
Smart Migration strategy
• On ingest, specific kinds of data formats (such as spreadsheets and databases) are converted to an intermediate, generic format expressed in XML. On dissemination, the file is converted from this generic format into a current format of choice.
• Assumption: XML is a durable file format
• Smart migration can be considered a combination of normalisation (at ingest) and migration (at dissemination)
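The ingest/dissemination round trip can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MIXED implementation: the source format is assumed to be CSV, and the element names `table`, `columns`, `column`, `row` and `cell` are hypothetical placeholders rather than SDFP schema elements.

```python
import csv
import io
import xml.etree.ElementTree as ET


def ingest_csv_to_xml(csv_text: str) -> str:
    """Ingest: normalise a CSV table into a generic intermediate XML form."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    table = ET.Element("table")  # hypothetical element names throughout
    columns = ET.SubElement(table, "columns")
    for name in header:
        ET.SubElement(columns, "column", name=name)
    for record in reader:
        row = ET.SubElement(table, "row")
        for value in record:
            ET.SubElement(row, "cell").text = value
    return ET.tostring(table, encoding="unicode")


def disseminate_xml_to_csv(xml_text: str) -> str:
    """Dissemination: convert the intermediate XML into a current format (CSV)."""
    table = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col.get("name") for col in table.find("columns")])
    for row in table.findall("row"):
        # Empty cells serialise as <cell />, whose .text is None.
        writer.writerow([cell.text or "" for cell in row.findall("cell")])
    return out.getvalue()


if __name__ == "__main__":
    source = 'id,city\n1,Amsterdam\n2,"The Hague, NL"\n'
    intermediate = ingest_csv_to_xml(source)
    print(intermediate)
    print(disseminate_xml_to_csv(intermediate) == source)  # True
```

Because the intermediate form keeps both column names and cell values, another dissemination converter (for example to a spreadsheet format) could be added without touching the ingest step.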
MIXED Software
• Generic framework with conversion plug-ins
• Several interfaces possible: web console, command-line tool, web service, …
• Open source libraries (see below)
• Building block in a preservation workflow
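A generic framework with conversion plug-ins could be organised around a registry keyed by source format, so that every interface (web console, command-line tool, web service) calls the same lookup. The sketch below is hypothetical: `ConverterPlugin`, `PluginRegistry`, the format label `"text/plain"` and the toy plug-in are illustrative names, not the MIXED API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type
from xml.sax.saxutils import escape


class ConverterPlugin(ABC):
    """Hypothetical plug-in interface: one plug-in per source format."""

    source_format: str = ""  # format label handled by this plug-in

    @abstractmethod
    def to_intermediate(self, data: bytes) -> str:
        """Convert the raw bytes of a source file into intermediate XML text."""


class PluginRegistry:
    """Framework core: dispatches a conversion to the registered plug-in."""

    def __init__(self) -> None:
        self._plugins: Dict[str, ConverterPlugin] = {}

    def register(self, plugin_cls: Type[ConverterPlugin]) -> Type[ConverterPlugin]:
        # Usable as a class decorator; instantiates and stores the plug-in.
        self._plugins[plugin_cls.source_format] = plugin_cls()
        return plugin_cls

    def formats(self) -> List[str]:
        return sorted(self._plugins)

    def convert(self, source_format: str, data: bytes) -> str:
        plugin = self._plugins.get(source_format)
        if plugin is None:
            raise ValueError(f"no plug-in registered for {source_format!r}")
        return plugin.to_intermediate(data)


registry = PluginRegistry()


@registry.register
class PlainTextPlugin(ConverterPlugin):
    """Toy plug-in: wraps each line of a UTF-8 text file in a <line> element."""

    source_format = "text/plain"

    def to_intermediate(self, data: bytes) -> str:
        lines = data.decode("utf-8").splitlines()
        body = "".join(f"<line>{escape(line)}</line>" for line in lines)
        return f"<document>{body}</document>"
```

Keeping the registry separate from the interfaces means that adding support for a new format is a matter of registering one more plug-in class.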
Libraries for reading and writing obsolete binary file formats (open source)
• dBase: http://dans-dbf-lib.sourceforge.net/
• DataPerfect: http://dans-dp-lib.sourceforge.net/
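To give a flavour of what reading an obsolete binary format involves, the sketch below parses the fixed header and field descriptors of a dBase III `.dbf` file with Python's `struct` module. It is an independent illustration of the commonly documented dBase III layout, not the API of the DANS dBase library. The two-digit year is assumed to count from 1900, a convention not every writer followed. Record data, memo files and later dBase variants are not handled.

```python
import struct
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DbfField:
    name: str
    type: str  # e.g. "C" (character), "N" (numeric), "D" (date)
    length: int
    decimals: int


@dataclass
class DbfHeader:
    version: int
    last_update: date
    record_count: int
    header_length: int
    record_length: int
    fields: List[DbfField]


def parse_dbf_header(data: bytes) -> DbfHeader:
    """Parse the 32-byte file header and the field descriptor array."""
    # Bytes 0-11: version, YY MM DD, record count (uint32 LE),
    # header length (uint16 LE), record length (uint16 LE).
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from(
        "<BBBBIHH", data, 0
    )
    fields = []
    offset = 32  # 32-byte field descriptors follow the fixed header
    while offset < len(data) and data[offset] != 0x0D:  # 0x0D ends the array
        # Name (11 bytes, NUL-padded), type (1 byte), 4 reserved bytes,
        # field length, decimal count; the remaining 14 bytes are ignored.
        raw_name, raw_type, length, decimals = struct.unpack_from(
            "<11sc4xBB", data, offset
        )
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        fields.append(DbfField(name, raw_type.decode("ascii"), length, decimals))
        offset += 32
    return DbfHeader(
        version, date(1900 + yy, mm, dd), n_records, header_len, record_len, fields
    )
```

The header length and record length are what a reader needs to locate the fixed-width records that follow the descriptors; each record starts with a one-byte deletion flag.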
4. SDFP Part 1: “Standard Data Format for Preservation”
• Defines the features of the intermediate XML data format, which acts as a “wrapper”
• Contains sets of XML schemas for various significant data kinds and builds on existing XML representations of file formats (e.g. SIARD for relational databases, ODF for office documents)
• MIXED concentrates on tabular data (spreadsheets and databases)
• New data kinds can be added
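The “wrapper” idea can be sketched as a generic envelope that records the data kind and some metadata, and carries a data-kind-specific payload such as a SIARD- or ODF-derived XML fragment. All element and attribute names here (`wrapper`, `dataKind`, `metadata`, `sourceFormat`, `content`) and the `DATA_KINDS` set are hypothetical, not taken from the SDFP specification.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical registry of supported data kinds; MIXED focuses on tabular data.
DATA_KINDS = {"spreadsheet", "database"}


def wrap(data_kind: str, payload: ET.Element, source_format: str) -> ET.Element:
    """Wrap a data-kind-specific XML payload in a generic envelope."""
    if data_kind not in DATA_KINDS:
        raise ValueError(f"unknown data kind: {data_kind!r}")
    envelope = ET.Element("wrapper", dataKind=data_kind)
    metadata = ET.SubElement(envelope, "metadata")
    ET.SubElement(metadata, "sourceFormat").text = source_format
    ET.SubElement(envelope, "content").append(payload)
    return envelope


def unwrap(envelope: ET.Element) -> Tuple[str, ET.Element]:
    """Return the data kind and the payload element of an envelope."""
    return envelope.get("dataKind"), envelope.find("content")[0]
```

In this sketch, supporting a new data kind amounts to adding it to `DATA_KINDS` (and, in a real specification, defining a schema for its payload).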
SDFP Part 2
• Formalising SDFP during the MIXED project raised a lot of discussion
• To what extent do we have to replicate existing standards (SIARD / ODF)?
• What about provenance metadata?
• Who is our “designated community”?
• We need a specification that can serve as a basis for the development of services such as MIXED (but also other services)
• => Data Dictionary: metadata for the SDFP data format
The SDFP data format
• The SDFP data format is optimized for representing the content and structure of a number of data kinds in a durable way
• Data kind: a type of file format whose structure is optimized for specific functions
• An SDFP Data Dictionary is available. It is designed to facilitate interoperability between systems, services, and application software, supporting the long-term management of, and continuing access to, data kinds
5. Hackspirations
• Working with the MIXED plug-ins / libraries
• Working with the SDFP data dictionary
• Suggestions
Reference
René van Horik and Dirk Roorda, “Migration to Intermediate XML for Electronic Data (MIXED): Repository of Durable File Format Conversions”, The International Journal of Digital Curation, Vol. 6, Issue 2, 2011. (http://www.ijdc.net)
Thank you for your attention.
For more information, please contact Rene.van.horik@dans.knaw.nl