IT Research Challenges in Digital Preservation Andreas Rauber Department of Software Technology and Interactive Systems Vienna University of Technology http://www.ifs.tuwien.ac.at/~andi
Overview • Why do we need Digital Preservation? • Digital Preservation Projects in Europe • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions
Why do we need Digital Preservation? • Digital objects require a specific environment to be accessible: • Files need specific programs • Programs need specific operating systems (and versions) • Operating systems need specific hardware components • The SW/HW environment is not stable: • Files can no longer be opened • Embedded objects are no longer accessible/linked • Programs won't run • Information in digital form is lost (usually total loss, no gradual degradation) • Digital Preservation aims at keeping digital objects authentically usable and accessible over long time periods.
Strategies for Digital Preservation • Strategies (grouped according to the Companion Document to the UNESCO Charter, http://unesdoc.unesco.org/images/0013/001300/130071e.pdf) • Investment strategies: • Standardization, Data extraction, Encapsulation, Format limitations • Short-term approaches: • Museum, Backwards-compatibility, Version-migration, Reengineering • Medium- / long-term approaches: • Migration, Viewer, Emulation • Alternative approaches: • Non-digital approaches, Data archaeology • No single optimal solution for all objects
Migration • Transformation into a different format, continuous or on-demand (Viewer) • Advantages: • Widespread adoption • Possibility to compare with the un-migrated object • Immediately accessible • Drawbacks: • Unintended changes, specifically over a sequence of migrations • Cannot be used for all objects • Requires continuous action to migrate
Emulation • Emulation of hardware or software (operating system, applications) • Advantages: • Concept of emulation is widely used • Numerous emulators are available • Potentially complete preservation of functionality • Object is rendered identically • Drawbacks: • Requires detailed documentation of the system • Requires knowledge in the future of how to operate today's systems • Complex technology • Emulators must be emulated or migrated themselves • Emulators are potentially erroneous/incomplete
Digital Preservation • Affects all domains • Cultural heritage • eGovernment • Primary data: sensor data, experiment data • Industry: production processes, workflows, monitoring • Medical, insurance/banking • Society: photos, communications • Test: • Trying to repeat / verify "old" experiments • Problems with • Data management: original test data, parameters, preprocessing,… • Code: compilability, changes in libraries/functionality • Interpretability of results, know-how
Digital Preservation • Is a complex task • Requires a precise understanding of the objects, their intellectual characteristics, the way they were created and used, and how they will most likely be used in the future • Requires a continuous commitment to preserve objects to avoid the "digital dark ages" • Requires a solid, trusted infrastructure and workflows to ensure digital objects are not lost • Is essential to keep electronic publications, research data, … accessible • Will become more complex as digital objects become more complex
Overview • Digital Preservation Projects in Europe (large number; a small selection is provided below) • DPE: Digital Preservation Europe, EU, FP6 • Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval • Planets: Preservation and Long-term Access through Networked Services • Shaman: Sustaining Heritage Access through Multivalent Archiving • LIWA: Living Web Archives • Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions
DPE • FP6 Coordinating Action • http://www.digitalpreservationeurope.eu
What is DPE? (Vision) • FP6 Coordinating Action • DigitalPreservationEurope (DPE) intends to create a coherent platform for proactive cooperation, collaboration, exchange and dissemination of research results and experience in the preservation of digital objects • Digital Preservation: ensuring long-term accessibility of digital objects • Mitigating the risk of a "digital dark age" • http://www.digitalpreservationeurope.eu
Objectives • Two macro objectives: • to foster collaboration and synergies among ongoing projects and existing initiatives across the ERA [repositories and audit and certification tools] • to raise awareness of digital preservation challenges among different user communities [differing levels of awareness of the subject and its strategic significance]
DPE Activities • Range of activities to foster research and take-up in digital preservation: • Research Roadmap • Digital Preservation Challenge • Researcher and Practitioner Exchange • DPE Videos
Preservation Research Roadmap • The Roadmap aims at contributing to the planning of our future R&D in Digital Preservation by means of different actions: • Analysing the state of the art in Digital Preservation research and already existing research agendas on a global level • Researching the needs and demands from the point of view of the Digital Preservation user communities and their leading experts • Researching the needs and demands of future markets for technology and service providers
DPE Recommended Research (Research Roadmap) • Restoration • Conservation • Collection and repository management • Preservation as risk management • Preserving the interpretability and functionality of digital objects • Collection cohesion and interoperability • Automation in preservation • Preserving the context • Storage technologies
DPE Challenge • Promotion of innovation in DP • Targeted at students • Main goal: provide access to digital objects and make them usable • Open to participants world-wide • Submission deadline: May 30 2008 • http://www.digitalpreservationeurope.eu/challenge • Assessment of submissions by an international panel of experts in the field • Different tasks, e.g. • Access Data in a Legacy Client-Server System • Proprietary File Format • Preservation of Multimedia Art
Raising Awareness of DP Issues • Experts & practitioners: briefing papers, seminars • General public: little awareness, although everybody is affected • DPE Videos: a series of short cartoons highlighting DP issues, aimed at non-experts, trying to communicate the challenges in a simple style • Videos available on YouTube: http://www.youtube.com/user/wepreserve
Overview • Digital Preservation Projects in Europe (large number; a small selection is provided below) • DPE: Digital Preservation Europe, EU, FP6 • Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval • Planets: Preservation and Long-term Access through Networked Services • Shaman: Sustaining Heritage Access through Multivalent Archiving • LIWA: Living Web Archives • Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions
CASPAR • How can digital data still be used and understood in the future when systems, software, and everyday knowledge continue to change? This is the CASPAR challenge. • The CASPAR project is mainly based on the OAIS standard, ISO 14721:2003 • Its architecture is defined for • Managing key concepts of the OAIS reference model • Supporting the main functionality identified in the OAIS functional model • CASPAR aims to define and implement interfaces and functionally independent components
Preservation Issue 1 • Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved • How to guarantee digital information may be accessed and understood in the future? • How to guarantee retrieval of Archival Information? • How to guarantee intelligibility of digital information within heterogeneous Designated Communities? • Non-maintainability of essential hardware, software or support environment may make the information inaccessible • How to guarantee preservation actors are informed about change events? • How to guarantee appropriate actions are undertaken to preserve Archival Information against change events?
Preservation Issue 3 • The chain of evidence may be lost and there may be a lack of certainty about provenance or authenticity • How to guarantee adequate integrity and identity for any Archival Information? • Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future • How to guarantee adequately secure access, with the proper rights, to any resource and functionality within an Archive? • The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future • How to guarantee proper information package management within an Archive? • How to guarantee long-term preservation maintenance of any information package?
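One common way to approach the integrity question above is a fixity check: record a cryptographic digest when an object enters the archive and recompute it later. The following is a minimal sketch of that general idea, not CASPAR's mechanism; the helper names `fixity_digest` and `verify_fixity` and the sample bytes are assumptions made for illustration.

```python
import hashlib

# Hypothetical helpers: neither name comes from CASPAR or the OAIS standard.
def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the object's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Check whether the bytes still match the digest recorded at ingest."""
    return fixity_digest(data) == recorded_digest

# Illustrative content only; a real archive would read the stored file.
stored = b"archival information package"
recorded = fixity_digest(stored)                     # recorded at ingest
intact = verify_fixity(stored, recorded)             # unchanged bytes
tampered = verify_fixity(stored + b"!", recorded)    # one byte appended
```

A digest of this kind only detects change to the bit stream; it says nothing about provenance or about whether the object can still be rendered, which the other questions on this slide address.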
Overview • Digital Preservation Projects in Europe (large number; a small selection is provided below) • DPE: Digital Preservation Europe, EU, FP6 • Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval • Planets: Preservation and Long-term Access through Networked Services • Shaman: Sustaining Heritage Access through Multivalent Archiving • LIWA: Living Web Archives • Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions
The Planets project • 4-year research and technology development project co-funded by the European Union • Addresses core digital preservation challenges • Started June 2006 with €15m budget • Coordinated by the British Library • 16 partners: • national libraries and archives • leading technology companies • research universities • Builds on strong digital archiving and preservation programmes
Planets partners • The British Library • National Library, Netherlands • Austrian National Library • State and University Library, Denmark • Royal Library, Denmark • National Archives, UK • Swiss Federal Archives • National Archives, Netherlands
Planets partners • Tessella Plc • IBM Netherlands • Microsoft Research • Austrian Research Centers GmbH • HATII at University of Glasgow • University of Freiburg • Vienna University of Technology • University of Cologne
The Planets team • All Staff Meeting, February 2007
Planets Architecture (diagram) • Preservation Planning Services • Preservation Action Services • Characterisation Services • Test Bed: evaluation and validation services • Interoperability Framework • Context elements: Digital Content, Organisational Context, External Context, Technical Environment
Preservation Action • Transform content: • Pluggable infrastructure for third-party migration tools • Transform environment: • Dioscuri: modular emulation of the full hardware/software environment • Universal Virtual Computer (UVC): provides a layered, durable approach to emulation • Preservation Action Tools registry: • XML language for describing preservation action tools
Preservation Characterisation • Characterisation framework: • Unifies tools for identifying file formats and extracting object properties • Characterisation registry: • Based on the file format registry PRONOM • eXtensible Characterisation Languages (XCL): • Family of XML languages for characterising digital objects • Comparator verifies the effects of preservation actions
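The characterise-then-compare idea on this slide can be illustrated with a small sketch: identify a format from its leading bytes, extract a few properties, and report which properties changed after a preservation action. This is not the Planets characterisation framework, PRONOM, or XCL; the signature table, the helper names `identify_format` and `compare_properties`, and the property values are assumptions chosen for illustration.

```python
# Hypothetical sketch of signature-based identification plus a comparator.
# The small signature table is illustrative and is not taken from PRONOM.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}

def identify_format(data: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"

def compare_properties(before: dict, after: dict) -> dict:
    """Return {property: (old, new)} for every property whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }

# Made-up property sets for an animated GIF migrated to a single PNG image.
original = {"format": identify_format(b"GIF89a..."),
            "width": 640, "height": 480, "frames": 12}
migrated = {"format": identify_format(b"\x89PNG\r\n\x1a\n..."),
            "width": 640, "height": 480, "frames": 1}
changes = compare_properties(original, migrated)
```

In this made-up case the comparator reports the expected format change and also the loss of animation frames, which is the kind of unintended effect a comparison step is meant to surface.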
Infrastructure and Testbed • Interoperability Framework provides a common basis: • JBoss Application Server • Logging, security services • Registry services • User management and single sign-on • Planets Testbed: • Controlled environment for the execution of experiments • Accumulated experience base collected in a registry
Preservation planning • Collection profiling services • Technology watch services • Risk assessment of digital objects • Preservation planning methodology • Tool support: Plato, the Planning Tool
Preservation planning • Evaluating preservation strategies: • A variety of solutions and tools exist • Each strategy has unique strengths and weaknesses • Requirements vary across settings • The decision on which solution to adopt is complex • Documentation and accountability are essential • Preservation planning assists in decision making: • Evaluation of strategies on representative sample content according to specific requirements
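Evaluating strategies against specific requirements can be illustrated as a weighted comparison. The sketch below is a generic weighted-sum ranking and is not presented as Plato's method; the function name `rank_strategies`, the criteria, the weights and all scores are invented for illustration.

```python
# Hypothetical sketch: rank candidate strategies by weighted requirement scores.
# Criteria, weights and scores are invented; they are not measured results.
def rank_strategies(weights: dict, scores: dict) -> list:
    """Return (strategy, utility) pairs sorted from best to worst."""
    total_weight = sum(weights.values())
    ranked = []
    for name, per_criterion in scores.items():
        weighted = sum(weights[c] * per_criterion[c] for c in weights)
        ranked.append((name, round(weighted / total_weight, 2)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Integer weights express the relative importance of each requirement.
weights = {"content_preserved": 5, "cost": 2, "automation": 3}

# Scores on a 1 (poor) to 5 (good) scale, as if observed on sample content.
scores = {
    "migration": {"content_preserved": 4, "cost": 3, "automation": 5},
    "emulation": {"content_preserved": 5, "cost": 2, "automation": 2},
    "no_action": {"content_preserved": 1, "cost": 5, "automation": 5},
}
ranking = rank_strategies(weights, scores)
```

Keeping the weights and per-criterion scores explicit also serves the documentation and accountability point: the inputs that produced a recommendation remain available for later review.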
Overview • Digital Preservation Projects in Europe (large number; a small selection is provided below) • DPE: Digital Preservation Europe, EU, FP6 • Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval • Planets: Preservation and Long-term Access through Networked Services • Shaman: Sustaining Heritage Access through Multivalent Archiving • LIWA: Living Web Archives • Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions
Securing Communication with the Future • Research & Development Project in Digital Preservation
SHAMAN Objectives • SHAMAN will establish an Open Distributed Resource Management Infrastructure Framework enabling Grid-based Resource Integration, that is firmly grounded in a conceptual and technical reference architecture. • SHAMAN will develop and integrate technologies to support Contextual and Multivalent Archival and Preservation Processes to enable proper preservation management and policies. • SHAMAN will support Managing of Future Requirements by safeguarding Interoperability with Future Environments based on evidence gathered through the characterisation of digital objects, their (metadata) context and their preservation environment, resulting in the evolution of preservation policies.
SHAMAN Outputs • SHAMAN will deliver a next-generation Digital Preservation framework with three prototypical applications: • scientific publishing in libraries and documents in governmental archives • digital objects used in industrial design and engineering • data resources used in e-Science applications
SHAMAN Consortium • SHAMAN Collaborators (partner logos omitted)
Overview • Digital Preservation Projects in Europe (large number; a small selection is provided below) • DPE: Digital Preservation Europe, EU, FP6 • Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval • Planets: Preservation and Long-term Access through Networked Services • Shaman: Sustaining Heritage Access through Multivalent Archiving • LIWA: Living Web Archives • Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions
LIWA: Living Web Archives • FP7 project funded by the European Commission • Started in Feb 2008 • Partners: EA, L3S, Max Planck, Hungarian Academy of Sciences, Hanzo Archives, libraries and archives
Approach Example: Semantic Evolution Detection • Time-specific term contexts: • Leningrad@1970 (Soviet Union, Hermitage, Moscow, Neva River, Baltic Sea,…) • Saint Petersburg@2009 (Russia, Hermitage, Moscow, Neva River, Baltic Sea,…) • Across-time semantic similarity compares term contexts and shows high similarity between Leningrad@1970 and Saint Petersburg@2009 • Term coherence analyzes term contexts and shows that Saint Petersburg@2009 and Hermitage@2009 are commonly used together
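The across-time comparison on this slide can be illustrated with a simple set-overlap measure. The sketch below uses Jaccard similarity over the context terms listed above; the slide does not say which similarity measure the project uses, so the choice of Jaccard and the function name `jaccard` are assumptions, and the contexts contain only the terms the slide spells out (its trailing "…" is left unfilled).

```python
# Hypothetical sketch: compare time-specific term contexts by set overlap.
# Jaccard similarity is an assumed stand-in for the project's own measure.
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Only the context terms listed on the slide are used here.
contexts = {
    ("Leningrad", 1970): {
        "Soviet Union", "Hermitage", "Moscow", "Neva River", "Baltic Sea",
    },
    ("Saint Petersburg", 2009): {
        "Russia", "Hermitage", "Moscow", "Neva River", "Baltic Sea",
    },
}

similarity = jaccard(
    contexts[("Leningrad", 1970)],
    contexts[("Saint Petersburg", 2009)],
)
```

With these two contexts, four of the six distinct terms are shared, so the overlap is 4/6. A real system would presumably weight terms by frequency rather than treat them as an unweighted set.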
Approach • Good query reformulations contain terms that are similar to the original query terms and that are commonly used together • Examples: • Saint Petersburg Museum → Leningrad Museum ✔ • Leningrad Cowboys → Saint Petersburg Cowboys ✖ • iPod Hearing Damage → Walkman Hearing Damage ✔ • disabled / handicapped / special needs
Overview • Digital Preservation Projects in Europe (large number; a small selection is provided below) • DPE: Digital Preservation Europe, EU, FP6 • Caspar: Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval • Planets: Preservation and Long-term Access through Networked Services • Shaman: Sustaining Heritage Access through Multivalent Archiving • LIWA: Living Web Archives • Keep-it: Kultur, eCrystals, EdShare (and NECTAR) - Preserve It • IT-oriented Challenges in Digital Preservation • Some Digital Preservation Research at TUWIEN • Conclusions