BNSC Report Fall 2007 David Giaretta
CASPAR Consortium Integrated project Total spend 16MEuro http://www.casparpreserves.eu
…CASPAR • Strongly based on OAIS • Passed 1st year EU review
CASPAR Aims • Produce tools and techniques to support digital preservation and make it easier to share the cost • must be relatively easy to use • must have a low “buy-in” in terms of effort required for adoption • must avoid requiring wholesale change of everyone else’s systems • must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project • must be “preservable” • must be open: open source, open standards • Cannot do everything • Working closely with other projects
Validation • How can we judge any proposed solution? • CASPAR validation metrics: • Theoretic underpinning • Testbed scenarios addressing real issues • No “hand-waving” – use what is there now • Accelerated lifetime tests • Hardware and Software • Environment • People • Improved “trustability”/“certifiability” [Callouts: Live a long time; Evidence - not proof]
CASPAR information flow architecture [Diagram labels: Rep Info, Virtualisation]
Infrastructure elements [Diagram labels: Orchestration, Gap Manager, RegRep, Data Curator, User, RepInfo toolkit, Data Source, Application, Repository, Registry]
Preservation Aware Storage and Preservation DataStores • Preservation Aware Storage - The storage component of a digital preservation system that has built-in support for both bit preservation and logical preservation. • Preservation DataStores (PDS) is a new OAIS-based preservation-aware storage. It offloads functionality to the storage layer in order to: • Decrease the probability of data loss • Simplify the applications • Provide improved performance and robustness • Utilize locality properties • Compute data-intensive functions internally, e.g. fixity • Provide better support for links among objects
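As an illustration of the kind of data-intensive function that could be computed inside the storage layer, here is a minimal fixity-check sketch. It is not taken from the PDS prototype: the function names `compute_fixity` and `verify_fixity`, the choice of SHA-256, and the chunk size are all assumptions made for this example.

```python
import hashlib
import io


def compute_fixity(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.new(algorithm)
    # Read fixed-size chunks so large objects never need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, expected_digest, algorithm="sha256"):
    """Return True if the stream still matches the digest recorded at ingest."""
    return compute_fixity(stream, algorithm) == expected_digest


# Hypothetical usage: record a digest at ingest, then re-check it later.
recorded = compute_fixity(io.BytesIO(b"example archival content"))
intact = verify_fixity(io.BytesIO(b"example archival content"), recorded)
damaged = verify_fixity(io.BytesIO(b"example archival c0ntent"), recorded)
```

Running the check next to the data means only the short digest, not the whole object, has to cross the network, which is one reading of the "utilize locality properties" point above.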
PDS Architecture • Layered approach • Prototype based on open standards • OAIS, XAM, OSD • Generic gradual mapping from logical to physical object • Independent of physical storage • Independent of stored data type • Scalable [Diagram labels: Applications, Preservation Web Services, AIP, Preservation DataStore, Ingest, Access, Administration, …, Preservation Engine Layer, XAM Layer, Object/File Layer backend]
PDS Architecture (detailed view) [Diagram labels: Applications, Preservation Web Services, Preservation WSDL, AIP, Ingest, Access, Administration, …, Preservation DataStore, Preservation Engine Layer, Preservation Engine, RepInfo Mgr, PDI Mgr, Migration Mgr, Placement Mgr, XAM Layer, XAM API, XAM Library, VIM API, XAM to FS, XAM to OSD, WAS CE, posix I/O, sockets, web service, File System, HL OSD, Object Store, Security, Admin, Object Layer backend]
Preservation DataStores • Preservation DataStores are OAIS-based preservation-aware storage • The API covers different options for ingest and access, configures policies, and enables updates of AIPs and PDS code • The prototype mainly implements ingest and access using web services • References • “Towards OAIS-Based Preservation Aware Storage - A White Paper”. • http://www.haifa.il.ibm.com/projects/storage/datastores/public.html • “The Need for Preservation Aware Storage - A Position Paper”. • ACM SIGOPS Operating Systems Review, Special Issue on File and Storage Systems, Volume 41, Issue 1 (Jan 2007), pp 19-23. • “Preservation DataStores: Architecture for Preservation Aware Storage”, to appear in 24th IEEE Conference on Mass Storage Systems and Technologies (MSST), 2007. • Web site - http://www.haifa.il.ibm.com/projects/storage/datastores/index.html
Virtualisation - building up data types… [Diagram labels: Data Value, Vector, Image, 3-D data, Spectrum, Earth Observation image, Astronomical image, Time Series]
Content dependent components • Representation Information tools • Structure • EAST • DRB • DFDL • Virtualisation assistant • Semantics • RDF editors • RDFSuite • Terminology capture • Software • UVC • Hardware emulators • Trust, Authenticity & Provenance tools • Certification assistant • PREMIS • Packaging tools • XFDU toolkit • Use existing tools where applicable • Develop new tools as needed and as resources allow
OAIS Information Model • Capture in UML diagrams • Add “obvious” methods • get/set for sub-components, e.g. we know an AIP has PDI, so we need get/setPDI • Add “best guess” methods • Iterators over contents • May need to change
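The "obvious" and "best guess" methods above can be sketched roughly as follows. This is an illustrative outline only, not the CASPAR UML model: the class `AIPCollection`, the method spellings `get_pdi`/`set_pdi`, and the four PDI field names are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class PDI:
    """Preservation Description Information (field names are illustrative)."""
    reference: str = ""
    provenance: str = ""
    context: str = ""
    fixity: str = ""


class AIP:
    """Archival Information Package holding content plus its PDI."""

    def __init__(self, content: bytes, pdi: PDI) -> None:
        self._content = content
        self._pdi = pdi

    # "Obvious" methods: an AIP has PDI, so it needs get/set for it.
    def get_pdi(self) -> PDI:
        return self._pdi

    def set_pdi(self, pdi: PDI) -> None:
        self._pdi = pdi

    def get_content(self) -> bytes:
        return self._content


class AIPCollection:
    """Hypothetical container, used only to show a "best guess" iterator."""

    def __init__(self) -> None:
        self._members: List[AIP] = []

    def add(self, aip: AIP) -> None:
        self._members.append(aip)

    def __iter__(self) -> Iterator[AIP]:
        # Iterate over the contained packages in insertion order.
        return iter(self._members)


# Hypothetical usage: walk a collection and read each package's provenance.
collection = AIPCollection()
collection.add(AIP(b"spectrum data", PDI(provenance="instrument A")))
collection.add(AIP(b"image data", PDI(provenance="instrument B")))
collection_provenance = [aip.get_pdi().provenance for aip in collection]
```

Keeping the accessors explicit mirrors the get/set naming on the slide; in idiomatic Python these would more likely be plain attributes or properties.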
Summary • The Conceptual Model is based on OAIS and works out some implications • It suggests areas of research • Intelligibility • Structure • Virtualisation • Authenticity • It leads into the Architecture, which is • Broadly applicable • Useful not just for preservation but also for interoperability • Note - Registry/Repository of Representation Information • http://registry.casparpreserves.eu • http://registry.dcc.ac.uk
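To make the role of a Registry of Representation Information more concrete, here is a small sketch of the OAIS idea that Representation Information may itself need further Representation Information, with the recursion stopping at what the Designated Community already knows. Everything in it is hypothetical: the in-memory `REGISTRY` dictionary, its entries, and the `find_gaps` function stand in for a real registry service and are not the CASPAR registry API.

```python
from typing import Dict, List, Set

# Hypothetical registry: RepInfo identifier -> identifiers of the further
# RepInfo needed to understand it. Entries are invented for illustration.
REGISTRY: Dict[str, List[str]] = {
    "fits-image": ["fits-standard"],
    "fits-standard": ["ascii", "ieee754"],
    "ascii": [],
    "ieee754": [],
}


def find_gaps(rep_info_id: str, known: Set[str],
              registry: Dict[str, List[str]]) -> List[str]:
    """Return the RepInfo ids a community still needs in order to use rep_info_id.

    The walk does not descend past anything in `known`, which models the
    knowledge base of the Designated Community.
    """
    gaps: List[str] = []
    seen: Set[str] = set()
    stack = [rep_info_id]
    while stack:
        current = stack.pop()
        if current in seen or current in known:
            continue
        seen.add(current)
        gaps.append(current)
        stack.extend(registry.get(current, []))
    return sorted(gaps)


# A community that already understands ASCII still lacks three items.
gaps_for_novice = find_gaps("fits-image", {"ascii"}, REGISTRY)
# A community that already knows the format standard only lacks the top entry.
gaps_for_expert = find_gaps("fits-image", {"fits-standard"}, REGISTRY)
```

The same object therefore produces different gaps for different communities, which is why the knowledge base is passed in rather than fixed in the registry.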
Digital Curation Centre • DCC Development closely linked to CASPAR • Other linked JISC funded projects: • SCARP • Significant properties of software • …may be others
The need for Trustable Repositories • Task Force on Archiving of Digital Information (1996) declared, • “a critical component of digital archiving infrastructure is the existence of a sufficient number of trusted organizations capable of storing, migrating, and providing access to digital collections.” • “a process of certification for digital archives is needed to create an overall climate of trust about the prospects of preserving digital information.” • A recurring request in many subsequent studies and workshops
Trusted Digital Repositories • Invited group, hosted by Research Library Group (RLG) • Concerned with organisational and financial issues • Trusted Digital Repositories: Attributes and Responsibilities (TDR) • http://www.rlg.org/legacy/longterm/repositories.pdf
Critique of TRAC • Closed process • Single review of draft document • Many changes based on unpublished “test audits” • Underplays “understandability” • Important for data • Assumed not to be important for “documents” • Simple list – • Do ALL boxes have to be ticked? • What does a “tick” mean anyway? • Link to other standards • ISO 17799/27001 for security (overlap with TRAC section C) • ISO 9000 – say what you do and do what you say • but impractical to demand multiple independent audits
ISO process status • New group set up with the primary aim of producing an ISO standard • Repository Audit and Certification (RAC) • OPEN process • Wiki open to all • Mailing list open to all • Virtual meetings normally every week • See http://wiki.digitalrepositoryauditandcertification.org • Into ISO via CCSDS – same route as OAIS • Some organisational/procedural changes in CCSDS • Currently a Birds of a Feather (BoF) group • To demonstrate adequate support for the work • Subsequently should become a Working Group • Documents agreed by the WG will then be reviewed by CCSDS and more broadly via international ISO review process
Current status • Reviewing and comparing • TRAC • NESTOR • DCC documents • Do we need another ISO standard? • Could we simply add to existing standards, e.g. ISO 27001? • The view is that ISO 27001 CANNOT be modified adequately • Its view of information is too limited • Started drafting a straw man document • Taking TRAC and adding concepts from other docs
Key Issues • How to get from a checklist to an international accreditation/ certification system? • Evidence – short term • Evidence – long term • The real crunch! • Quantification • The marking system • Levels of audit? • External review • Internal maturity
The Market • Transparency • Trustable? • certified by whom? • to what level? • what evidence? • for what Designated Community • relevant/sensible? • What cost?
Links • RAC group Wiki: • http://wiki.digitalrepositoryauditandcertification.org • TRAC document • http://www.crl.edu/PDF/trac.pdf • Digital Curation Centre • http://www.dcc.ac.uk • CASPAR project • EU project on digital preservation – Science, Culture and Arts data • Infrastructure, tools and detailed case studies – what does one need to actually “understand” the data? • http://www.casparpreserves.eu
Alliance for Permanent Access • Members: • Science and Technology Facilities Council • Koninklijke Bibliotheek • Deutsche Nationalbibliothek • Max Planck Gesellschaft • International Association of Scientific, Technical and Medical Publishers • European Space Agency, ESRIN • Fernuniversität in Hagen • European Organization for Nuclear Research • Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts • European Science Foundation • Centre National d’Etudes Spatiales • Centre Informatique National de l’Enseignement Supérieur • UK Joint Information Systems Committee • British Library • National Archives of Sweden
Alliance status • First stage – fairly informal sign-up • Preparing for Conference in Nov • More formal framework next year
PARSE bid • Consortium is a sub-group of the Alliance • EU bid • Aims at E-Infrastructure for Preservation • Roadmap • Survey of what is in place and planned • Gap Analysis • Impact Analysis tool
Other opportunities • NSF solicitation, entitled Sustainable Digital Data Preservation and Access Network Partners (DataNet) • http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.pdf • informational meeting for prospective Principal Investigators will be held 10 am to noon, Tuesday, November 6, 2007, Room 595 NSF Stafford II building, Arlington, Virginia. • www.nsf.gov/dir/index.jsp?org=OCI