1 / 21

Addressing the Challenge of Global Infrastructure Development for Data Curation

Explore solutions for funding, preservation, validation, and infrastructure in digital curation. Overcome obstacles through cost-sharing and commonalities. Address complexities in data management for future generations.

shull
Download Presentation

Addressing the Challenge of Global Infrastructure Development for Data Curation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Addressing the Challenge of Global Infrastructure Development for Data Curation David Giaretta

  2. Infrastructures for curation • Social / Legal / Financial / Organisational • Agreements / Trust / Standards • Costs/ Benefits/ Rewards • Technical components

  3. What is needed? MONEY

  4. Disincentives for curation: cost Budget available Future generations do NOT: • - Vote • - Pay taxes Money If cost of preserving old information increases… Time Need to show that costs will be contained

  5. Digital Preservation • Need to preserve information & knowledge – not just “the bits” • Documents, videos are rendered – simple? • Data – must be processed – in new ways - harder • Need to manage knowledge to keep archives alive through time • Preservation is a process, not a one-time event • Preservation is expensive – costs need to be shared • The alternative is money – endless supplies of money • Open Archival Information Systems Reference Model (ISO 14721) provides a general conceptual framework and terminology • (http://public.ccsds.org/publications/archive/650x0b1.pdf) • OPEN process – not just “Open Archives” Need more than just formats Preservation as the cornerstone of duration

  6. Things change/disappear How can we ensure that the information trapped in the “bits” remains understandable despite all these changes? • Software • Hardware • Environment • E.g. Network links to related information • People • What is “common knowledge” How can a digital curator even be aware of these changes?

  7. Validation Live a long time • How can we judge any proposed solution? • Proposed validation metrics: • Theoretic underpinning • Testbed scenarios addressing real issues • Accelerated lifetime tests • Hardware and Software • Environment • People • Improved “trustability”/”certifiability” Evidence - not proof

  8. Infrastructure • No organisation can do everything that is required for digital preservation forever • Need to share the cost/effort • Need to identify commonalities • None will be a perfect fit for all purposes

  9. Obstacles to Sharing the Effort • Not invented here • Differences terminology/ language • The “Rendered Object” vs Data divide • Lack of interoperability • Cannot force a “master plan” on everyone • Different timescales • Reluctance about long-term funding commitments • Fragmentation of effort • Made worse by incentives for branding

  10. Preservation Planning P R O D U C E R C O N S U M E R Descriptive Info. Data Management Descriptive Info. queries result sets Ingest Access orders SIP DIP AIP AIP Archival Storage Administration MANAGEMENT SIP = Submission Information Package AIP = Archival Information Package DIP = Dissemination Information Package OAIS Functional Entities

  11. Information Object 1+ interpreted interpreted using Data Representation 1+ using Object Information Physical Digital Object Object 1+ Bit Sequence Information Model & Representation Information The Information Model is key Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)

  12. Data… Level 2 GOME Satellite instrument data

  13. Unfamiliar information • Preservation • Digitally encoded information which must be usable and understandable • Unfamiliar because of separation in time • E-Science/GRID/CyberInfrastructure for data • Digitally encoded information which must be usable and understandable • Unfamiliar because of separation in discipline or location – even if created yesterday Support automated usage where possible

  14. E-Infrastructures • Doing preservation “right” produces gains in usage • Preservation useful for E-science/GRID • Underpins publication and re-use aspects of curation • Enables multi-disciplinary usage • Virtualisation allows one to deal with unfamiliar objects more easily • E-Infrastructures need components to support preservation – and gain interoperability • Additional types of Finding Aids also useful Double payback • Applicable in “GRID” context • usability now as well as later

  15. Commonalities – bit level • Storage • Networks

  16. Non-common aspects • Budgets • Decisions of what to preserve • Cost benefit analyses • Fancier access methods • Specific domain software • National legal aspects But can share best practice

  17. ….Commonalities • Persistent Identifier system • Registries of Representation Information – to allow understandability • More than formats • Tools & services for creating and preserving the preservation artefacts • Provenance, DRM, Access Control, Rep Info i(Structure, Semantics, software etc etc) • Trustability: Audit and Certification • Cost modelling • With appropriate domain parameterisation • Virtualisation to assist use in disciplinary software • Some aspects of access tools

  18. Rep Info CASPAR information flow architecture Virtualisation

  19. Alliance for Permanent Access • Part of strategy from Task Force for Permanent Access to the Records of Science • With Research programme outline • Aim to “align” digital curation activities in the members • Members of Alliance: • The European Science Foundation • European Space Agency • CERN • Max Planck Gesellschaft • Centre National d'Etudes Spatiales • Science and Technology Facilities Council • The British Library • Koninklijke Bibliotheek • Deutsche Nationalbibliothek • Joint Information Systems Committee • International Association of Scientific, Technical and Medical Publishers • National Archives of Sweden • Centre Informatique National de l'Enseignement Superieur • Digital Preservation Coalition • NESTOR • Perennisation des Informations Numeriques • Netherlands National Coalition for Digital Preservation

  20. Common Infrastructure elements • Preservability and persistence of the infrastructure in some sense • Preservation understandability and usability • Persistence of preservation artefacts • Ensure Persistent Identifier system is persistent • Just one system if possible • Virtualisation techniques for interoperability and persistence • Infrastructure should share what can be common e.g. Representation Information • Work with industry – e.g. more intelligent storage options • International Accreditation and Certification system • Brokerage systems to recognise finite lifetimes of organisations/projects • Prepare to hand over holdings in a way which is preseravable • Common discussion forum • international., non-badged, persistent • Fight fragmentation • Align existing local infrastructures Stimulate the Market PARSE.Insight will refine these ideas

  21. END

More Related