
Data Archiving and Networked Services

DANS & Database Archiving: MIXED and SDFP. René van Horik, Program Manager. OPF Hackathon, Copenhagen, 7-9 February 2012.



  1. Data Archiving and Networked Services • DANS & Database Archiving: MIXED, SDFP • René van Horik, Program Manager • OPF Hackathon, Copenhagen, 7-9 February 2012

  2. Outline • DANS – Data Archiving & Networked Services • Data Archiving @ DANS • MIXED: a DANS Software project • SDFP Specification (Standard Data Format for Preservation) • Hackspirations

  3. 1. DANS • DANS => “Data Archiving & Networked Services” • Mission: Durable access to research data • Project oriented • Ca. 40 people • DANS is an institute of KNAW and NWO • http://www.dans.knaw.nl

  4. 2. Data Archiving @ DANS • EASY = Trusted Digital Repository of DANS -> http://easy.dans.knaw.nl • Data Seal of Approval -> http://www.datasealofapproval.org • 25,000 datasets / > 1,000,000 files / 10 data archivists • DANS guarantees accessibility over time of “Preferred File Formats” • Audit & Certification of DANS digital archiving infrastructure • APARSEN Network of Excellence (EU project)

  5. 3. MIXED-project • MIXED = Migration to Intermediate XML for Electronic Data • Implementation of “Smart Migration” strategy • MIXED Software

  6. Smart Migration strategy • Conversion upon ingest of specific kinds of data formats (such as spreadsheets and databases) to an intermediate generic format expressed in the XML data format. Upon dissemination the file is converted from this generic format into a current format of choice. • Assumption: XML is a durable file format • Smart migration can be considered as a combination of normalisation and migration
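
The ingest and dissemination steps described on this slide can be sketched as a round trip. This is only a minimal illustration in Python, assuming a small in-memory table as the source and CSV as the "current format of choice". The element names (`table`, `columns`, `column`, `row`, `cell`) and the function names are hypothetical; they are not taken from SDFP or from the MIXED software.

```python
import csv
import io
import xml.etree.ElementTree as ET


def table_to_xml(header, rows):
    """Ingest step: wrap tabular data in a generic XML structure.

    The element names are hypothetical; this is not the SDFP schema.
    """
    root = ET.Element("table")
    columns = ET.SubElement(root, "columns")
    for name in header:
        ET.SubElement(columns, "column", name=name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for value in row:
            ET.SubElement(row_el, "cell").text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_csv(xml_text):
    """Dissemination step: convert the intermediate XML to CSV."""
    root = ET.fromstring(xml_text)
    header = [col.get("name") for col in root.find("columns")]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row_el in root.findall("row"):
        # An empty cell parses back as None, so map it to an empty string.
        writer.writerow([cell.text or "" for cell in row_el.findall("cell")])
    return out.getvalue()


# Archive once in the intermediate format, convert again on request.
archived = table_to_xml(["id", "name"], [["1", "Ada"], ["2", "Grace"]])
disseminated = xml_to_csv(archived)
```

Only the intermediate XML is kept in the archive; supporting a new output format later means writing a new dissemination converter, without touching the archived files.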

  7. (Smart Migration) Snapshot

  8. (Smart migration) Timeline

  9. MIXED Software • Generic framework with conversion plug-ins • Several interfaces possible: web console (see below), command line tool, web service, … • Open source libraries (see below) • Building block in preservation workflow
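
The idea of a generic framework with conversion plug-ins can be sketched as a small registry that dispatches on the source format. This is a hedged illustration only: the registry, the `register` decorator, the `convert` function and the toy `txt` plug-in are all hypothetical names and do not reflect the actual MIXED code base.

```python
from typing import Callable, Dict
from xml.sax.saxutils import escape

# A plug-in takes the raw bytes of a file and returns intermediate XML.
Converter = Callable[[bytes], str]

# Hypothetical registry: source format name -> conversion plug-in.
PLUGINS: Dict[str, Converter] = {}


def register(source_format: str) -> Callable[[Converter], Converter]:
    """Decorator that adds a conversion plug-in to the registry."""
    def decorator(func: Converter) -> Converter:
        PLUGINS[source_format] = func
        return func
    return decorator


@register("txt")
def text_to_xml(data: bytes) -> str:
    """Toy plug-in: wrap each line of a text file in a <line> element."""
    lines = data.decode("utf-8").splitlines()
    body = "".join(f"<line>{escape(line)}</line>" for line in lines)
    return f"<text>{body}</text>"


def convert(source_format: str, data: bytes) -> str:
    """Framework entry point: look up the plug-in and run it."""
    try:
        plugin = PLUGINS[source_format]
    except KeyError:
        raise ValueError(f"no plug-in registered for {source_format!r}") from None
    return plugin(data)
```

With this shape, a web console, a command line tool, or a web service could all call the same `convert` entry point, and support for a new format is added by registering one more plug-in.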

  10. SIARD standard used!

  11. Libraries for reading and writing obsolete binary file formats • dBase http://dans-dbf-lib.sourceforge.net/ • DataPerfect http://dans-dp-lib.sourceforge.net/ (Open Source)
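
To give a feel for what reading such a binary format involves, here is a minimal Python sketch that decodes the first 12 bytes of a dBase file header, following the commonly documented dBase III layout (version byte, last-update date, record count, header length, record length). It does not use or mirror the API of the DANS libraries listed above; `parse_dbf_header` and the sample values are illustrative assumptions.

```python
import struct
from datetime import date


def parse_dbf_header(header: bytes) -> dict:
    """Decode the fixed leading fields of a dBase III style header.

    Layout (little-endian): version (1 byte), last update as YY MM DD
    (3 bytes, year stored as years since 1900), number of records
    (uint32), header length (uint16), record length (uint16).
    """
    if len(header) < 12:
        raise ValueError("a dBase header needs at least 12 bytes")
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack(
        "<4BIHH", header[:12]
    )
    return {
        "version": version,
        "last_update": date(1900 + yy, mm, dd),
        "records": n_records,
        "header_length": header_len,
        "record_length": record_len,
    }


# Synthetic header bytes for illustration; not taken from a real file.
sample = struct.pack("<4BIHH", 0x03, 112, 2, 7, 5, 97, 20)
info = parse_dbf_header(sample)
```

A full reader would go on to parse the field descriptors that follow these bytes and then the fixed-width records themselves, which is the part the libraries take care of.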

  12. 4. SDFP part 1 “Standard Data Format for Preservation” • Defines the features of the intermediate XML data format. “Wrapper” • Contains sets of XML schemas for various significant data kinds and builds on existing XML representations of file formats (e.g. SIARD / ODF) • MIXED: concentrates on tabular data (spreadsheets and databases) • New data kinds can be added

  13. SDFP Part 2 • Formalisation of SDFP during the MIXED project raised a lot of discussion • To what extent do we have to replicate existing standards (SIARD / ODF)? • What about the provenance metadata? • Who is our “designated community”? • We need a specification that can serve as a basis for the development of services such as MIXED (but also for other services) • => Data Dictionary – Metadata for the SDFP Data Format

  14. The SDFP data format • The SDFP data format is optimized for representing the content and structure of a number of data kinds in a durable way • Data kind: type of file format that has a structure optimized for specific functions • SDFP Data Dictionary is available -> Designed to facilitate interoperability between systems, services, and application software to support long-term management of and continuing access to data kinds

  15. SDFP Data Dictionary

  16. SDFP Groups of Data Elements

  17. 5. Hackspirations • Working with the MIXED plugins / Libraries • Working with the SDFP data dictionary • Suggestions

  18. Reference René van Horik and Dirk Roorda, Migration to Intermediate XML for Electronic Data (MIXED): Repository of Durable File Format Conversions, in: The International Journal of Digital Curation, Issue 2, volume 6, 2011. (http://www.ijdc.net)

  19. Thank you for your attention For more information please contact Rene.van.horik@dans.knaw.nl
