SCIDIP-ES services and toolkits David Giaretta
Preserving digitally encoded information
• Ensure that digitally encoded information is understandable and usable over the long term
• Long term could start at just a few years
• Something must be done because things become “unfamiliar” over time
• But the same techniques enable use of data which is “unfamiliar” right now
Open Archival Information System (OAIS) Reference Model (ISO 14721), 2002, updated 2011
The OAIS Reference Model:
• is concerned with the Long Term preservation of information
• provides vital concepts that are necessary to preserve digitally encoded information
• provides testable mandatory responsibilities
• provides useful vocabulary and checklists
• is widely used in the design and description of archives and libraries
• forms the basis of a number of follow-on standards which are being developed.
Representation Information: the information that maps a Data Object into more meaningful concepts. Examples include software, ontologies, formal data descriptions, human-readable documentation, web pages ... Representation Information is itself Information, and hence there is a network, a kind of recursion. This recursion stops when it matches the Designated Community’s Knowledge Base.
AIP: a set of information that has, in principle, all the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object.
OAIS CONFORMANCE: mandatory responsibilities
• Negotiate for and accept appropriate information from information Producers.
• Obtain sufficient control of the information provided to the level needed to ensure Long Term Preservation.
• Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided, thereby defining its Knowledge Base.
• Ensure that the information to be preserved is Independently Understandable to the Designated Community. In particular, the Designated Community should be able to understand the information without needing special resources such as the assistance of the experts who produced the information.
• Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, including the demise of the archive, ensuring that it is never deleted unless allowed as part of an approved strategy. There should be no ad-hoc deletions.
• Make the preserved information available to the Designated Community and enable the information to be disseminated as copies of, or as traceable to, the original submitted Data Objects with evidence supporting its Authenticity.
OAIS Information Model: key concepts needed for conformance
Long Term Preservation: the act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term.
“Open Archival Information System (OAIS), now adopted as the ‘de facto’ standard for building digital archives” (NSF: Cyberinfrastructure Vision for 21st Century Discovery)
Available free from http://www.ccsds.org; for more information see http://www.alliancepermanentaccess.org/membership/member-resources/oais
OAIS Functional Model: useful terminology
Information Model: Representation Information
• The Information Model is key
• Recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)
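The recursion can be pictured as a walk over a graph of Representation Information. The sketch below is a minimal illustration under stated assumptions, not CASPAR or SCIDIP-ES code: the registry contents, the identifiers (such as `fits-format` or `calibration-ontology`) and the function name `required_repinfo` are all hypothetical. Each RepInfo entry lists the further RepInfo needed to interpret it; the walk stops at anything already in the Designated Community's Knowledge Base and reports anything that is neither known nor registered as a gap.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical registry: each item maps to the RepInfo needed to interpret it.
REGISTRY: Dict[str, List[str]] = {
    "dataset-42": ["fits-format", "instrument-docs"],
    "fits-format": ["fits-standard-text"],
    "fits-standard-text": ["english", "ieee-754"],
    "instrument-docs": ["english", "calibration-ontology"],
}


def required_repinfo(item: str, knowledge_base: Set[str],
                     registry: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Walk the RepInfo network starting from `item`.

    Returns (needed, gaps): `needed` is RepInfo to preserve alongside the data;
    `gaps` is RepInfo that is neither in the Knowledge Base nor in the registry.
    """
    needed: Set[str] = set()
    gaps: Set[str] = set()
    stack = list(registry.get(item, []))
    while stack:
        current = stack.pop()
        if current in knowledge_base or current in needed or current in gaps:
            continue  # recursion ends at the Knowledge Base, or item already visited
        if current not in registry:
            gaps.add(current)  # nobody has described this RepInfo yet
            continue
        needed.add(current)
        stack.extend(registry[current])  # RepInfo is itself Information: recurse
    return needed, gaps
```

With a Knowledge Base of `{"english", "ieee-754"}`, the walk returns `fits-format`, `fits-standard-text` and `instrument-docs` as needed RepInfo and reports `calibration-ontology` as a gap. Shrinking the Knowledge Base to `{"english"}` also surfaces `ieee-754` as a gap, which mirrors the point that the Knowledge Base changes over time.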
Figure: OAIS Information Model. An Archival Information Package is delimited by Packaging Information and described by a Package Description. It contains Content Information (a Data Object, either a Physical Object or a Digital Object made of one or more Bits, interpreted using Representation Information: Structure Information, Semantic Information and Other Representation Information, itself interpreted using further Representation Information) and Preservation Description Information (Reference, Provenance, Context, Fixity and Access Rights Information).
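A rough data-structure reading of the Information Model may help. The class and field names below are hypothetical simplifications of the OAIS terms (not a schema taken from the standard), and SHA-256 is an assumed choice of fixity algorithm. The point it shows is that Content Information pairs the Data Object with its Representation Information, while Preservation Description Information carries Reference, Provenance, Context, Fixity and Access Rights Information.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentInformation:
    data_object: bytes               # the bits being preserved
    representation_info: List[str]   # identifiers of RepInfo (structure, semantics, other)


@dataclass
class PreservationDescriptionInformation:
    reference: str                   # e.g. a persistent identifier (hypothetical format)
    provenance: List[str] = field(default_factory=list)
    context: str = ""
    fixity: str = ""                 # hex digest of the Data Object (SHA-256 assumed)
    access_rights: str = ""


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    package_description: str = ""    # what finding aids would search

    def seal(self) -> None:
        """Record Fixity Information when the AIP is created."""
        self.pdi.fixity = hashlib.sha256(self.content.data_object).hexdigest()

    def fixity_ok(self) -> bool:
        """Check that the Data Object bits have not changed since sealing."""
        return hashlib.sha256(self.content.data_object).hexdigest() == self.pdi.fixity
```

A fixity check of this kind only shows that the bits are unchanged; in OAIS terms it is one part of the evidence for Authenticity, alongside Provenance and Context Information.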
PARSE.Insight: indication of the distribution of researchers’ responses
• Researchers: 1/3 Europe, 1/3 USA, 1/3 rest of world
• Overall: 44% Europe, 33% USA, 23% rest of world
• Incomplete sample of respondents
Sharing of data (R) How open is your data?
Sharing of data (R) Which constraints do you see in making data open?
Threats to preservation
• The ones we trust to look after the digital holdings may let us down.
• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
• Loss of ability to identify the location of data.
• Access and use restrictions (e.g. Digital Rights Management) may not be respected in the future.
• Evidence may be lost because the origin and authenticity of the data may be uncertain.
• Lack of sustainable hardware, software or support of computer environment may make the information inaccessible.
• Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.
Threats to preservation (R): the ones we trust to look after the digital holdings may let us down; the current custodian of the data may cease to exist; loss of ability to identify the location of data; access and use restrictions may not be respected in the future; evidence may be lost; lack of sustainable hardware/software; users may be unable to understand or use the data
Threats to preservation (R) Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.
CASPAR
• Infrastructure to support preservation of all types of digitally encoded information
• Supports maintenance of Representation Information Networks
• Simple, re-implementable interfaces
• No single point of failure
• Decentralised
• Heterogeneous
• Asynchronous
CASPAR in brief
• Prototyped discipline-independent infrastructure components
• Carried out fundamental research based on and contributing to OAIS
• Developed toolkits for Representation Information, Authenticity, Digital Rights etc.
• Provided a substantial collection of evidence, validated by the designated communities, supporting their effectiveness for digital preservation by:
  • accelerated lifetime tests using changes in hardware, software, environment and knowledge base of designated communities
  • using many types of digitally encoded information: data and documents from science (STFC, ESA), cultural heritage (UNESCO) and contemporary performing arts (CIANT, INA, IRCAM, Univ Leeds)
Toolkits to create all components of AIPs
Test scenarios vs threats to digital preservation
For more information see http://www.alliancepermanentaccess.org/current-projects/caspar and http://www.casparpreserves.eu
Figure: preservation infrastructure components. Shown: Persistent ID resolver, RepInfo Registry, Authenticity tools, Processing Context, Certification, Orchestration/Brokering, Knowledge Gap Manager; translators, thesauri, cross-references and discipline repositories; storage, compute resources, local authentication and authorisation, resource registries, process ID, scheduler, Shibboleth; interconnects, gateways, management and network hardware (WAN, LAN, routers, switches, cables); used by repositories, users and automated systems.
FUTURE
• Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved
• Non-maintainability of essential hardware, software or support environment may make the information inaccessible
• The chain of evidence may be lost and there may be a lack of certainty of provenance or authenticity
• Access and use restrictions may not be respected in the future
• Loss of ability to identify the location of data
• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
• The ones we trust to look after the digital holdings may let us down
Preservation Infrastructure
• Services which are not centralised, with no single point of failure
• Supplements for existing archives to improve their ability to preserve their holdings
• Do not replace everything: small additions
• Leading to a better certification result
• Simple services which can be maintained into the future
SCIDIP-ES
Figure: archives and user applications supported by a domain-independent e-infrastructure. Services shown: Storage Service, RepInfo Registry Service, Gap Identification Service, Orchestration Service, Persistent ID i/f Service. External elements: External Access/Use Services, Cloud Storage, External PI services, ISO Certification Organisation. Toolkits: Preservation Strategy Toolkit, Finding Aid Toolkit, Process Virtualisation Toolkit, Certification Toolkit. The infrastructure counters the threats identified by PARSE.Insight, is based on the CASPAR prototypes, and will help archives with certification.
SCIDIP-ES in brief
• Upgrade CASPAR prototype components into scalable, robust e-infrastructure components to support digital preservation of all types of digital objects
  • decentralised, heterogeneous, asynchronous, no single point of failure
  • persistent, simple, re-implementable interfaces
• Critical mass of users:
  • Earth science as initial focus
  • other disciplines via APA
APARSEN will produce a common vision to allow a coherent approach.
Digital preservation research is needed to create the tools that produce the “metadata” used by the e-infrastructure and user applications. Tools may be domain dependent. They must include the Rep. Info. Network of the metadata.
SCIence Data Infrastructure for Preservation, with a focus on Earth Science. Led by ESA. Currently in negotiation with the EU. For more information see http://www.alliancepermanentaccess.org/current-projects/scidip-es
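A Persistent ID interface matters because references must survive a change of custodian: the identifier stays fixed while the location it resolves to is updated. The following is a minimal in-memory sketch under that assumption only. The class `PidResolver`, the identifier format and the example.org URLs are hypothetical, and it does not model any real PID system.

```python
from typing import Dict, List


class PidResolver:
    """Hypothetical resolver: persistent identifier -> history of locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, pid: str, location: str) -> None:
        """Mint a new identifier; identifiers are never reassigned."""
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = [location]

    def move(self, pid: str, new_location: str) -> None:
        """Record a hand-over, e.g. when the current custodian ceases to exist."""
        self._locations[pid].append(new_location)

    def resolve(self, pid: str) -> str:
        """Return the current location of the object."""
        return self._locations[pid][-1]

    def history(self, pid: str) -> List[str]:
        """Return every location the object has had, oldest first."""
        return list(self._locations[pid])
```

For example, registering `pid:example/0001` at `https://archive-a.example.org/ds1` and later moving it to `https://archive-b.example.org/ds1` leaves the identifier unchanged while `resolve` returns the new location and `history` keeps both. A single in-memory table like this is itself a single point of failure, which the decentralised design described above is intended to avoid; a real service would be replicated.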
• RepInfo toolkit, Packager and Registry: to create and store Representation Information. In addition, the Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate. The Registry and Orchestration Manager exchange information about the obsolescence of hardware and software, amongst other changes. The Representation Information will include such things as software source code and emulators.
• Authenticity toolkit: allows one to capture evidence from many sources which may be used to judge Authenticity.
• Digital Rights and Access Rights tools: allow one to virtualise and preserve the DRM and Access Rights information which exists at the time the Content Information is submitted for preservation.
• Persistent Identifier system: allows objects to be located over time.
• Orchestration Manager: will, amongst other things, allow the exchange of information about datasets which need to be passed from one curator to another.
• Audit and Certification standard: the standard to which CASPAR has contributed will allow a certification process to be set up.
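The exchange of obsolescence information between the Registry and the Orchestration Manager can be pictured as publish/subscribe. The sketch assumes that model; the class `ObsolescenceNotifier`, the RepInfo identifiers and the archive names are hypothetical and do not describe the actual CASPAR or SCIDIP-ES interfaces. Archives subscribe to the RepInfo they depend on, and a reported change (for example, a viewer that no longer runs) is forwarded to every subscriber.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A callback receives (repinfo_id, description_of_change).
Callback = Callable[[str, str], None]


class ObsolescenceNotifier:
    """Hypothetical orchestration hub: archives subscribe to the RepInfo they rely on."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, repinfo_id: str, callback: Callback) -> None:
        """Ask to be told about changes affecting `repinfo_id`."""
        self._subscribers[repinfo_id].append(callback)

    def report(self, repinfo_id: str, change: str) -> int:
        """Announce a change (e.g. software obsolescence); return how many were notified."""
        callbacks = self._subscribers.get(repinfo_id, [])
        for callback in callbacks:
            callback(repinfo_id, change)
        return len(callbacks)
```

Two archives subscribed to `viewer-software-v2` would both be notified when its obsolescence is reported, while a report about RepInfo nobody depends on reaches no one. Decoupling reporters from subscribers in this way fits the asynchronous, decentralised style described for the infrastructure, although a production system would need persistent, distributed delivery rather than in-process callbacks.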
AIP (Archival Information Package)
Q5: Please explain by means of a graphic a potential distribution of the SCIDIP-ES infrastructure with respect to geographical locations (for example for storage), and with a mapping to the OAIS model.
Figure: components shown: Registry, Packaging, Finding Aids, Gap Manager, Authenticity/Annotation, Orchestration, RepInfo Toolbox, Data Stores, DAMS, DRM.
Summary: SCIDIP-ES services and toolkits
• Demonstrated demand for these services
• Demonstrated effectiveness across domains
• Maintainable