Explore the robust technologies developed at the University of Maryland for automated ingestion and long-term preservation of digital information. Recent accomplishments include the ACE software, the FOCUS registry, and enhancements to the PAWN platform. These tools enable monitoring, auditing, and preservation services on diverse collections. Discover the integrity auditing service, the FOCUS format curation service, and the challenges and surprises encountered during the project.
Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information
Principal Investigator: Joseph JaJa
Lead Programmers: Mike Smorul and Mike McGann
Graduate Students: Sang Song and Muluwork Geremew
Institute for Advanced Computer Studies
University of Maryland, College Park
Research Objectives
• Development of tools and technologies for:
  • Automated Distributed Ingestion – flexible platform for Producer-Archive Interactions.
  • Management of Preservation Processes – Monitoring, Integrity Auditing, and Preservation Services.
• Evaluation and demonstration of tools on widely different collections.
Recent Major Accomplishments
• ACE (Auditing Control Environment): a policy-driven software environment to continually verify the integrity of an archive’s holdings.
• FOCUS: a scalable and secure registry for persistent information and services applied to formats.
• Substantial enhancements to PAWN (Producer-Archive Workflow Network), the software platform for producer-archive interactions.
ACE – Overview
[Diagram: Client, ACE-IMS, ACE-AM, and 3rd Party Auditor; flow labels: obj, Hash(obj), Integrity Token]
Basic Ideas
• Integrity auditing service that can interoperate with any archiving architecture.
• Active (periodic) and user-triggered auditing.
• Time-stamped certificates that enable verification of an object's integrity throughout its lifetime – an auditable record of every transformation.
• Cost-effective, scalable, and based on rigorous techniques.
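
The idea of a time-stamped integrity token can be illustrated with a minimal sketch. This is not ACE's actual protocol or API: the names `IntegrityToken`, `issue_token`, and `audit` are hypothetical, the choice of SHA-256 is an assumption, and a real service would have the token certified by an independent party rather than created locally.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IntegrityToken:
    """Hypothetical record binding an object's hash to a point in time."""
    object_id: str
    digest: str      # hex-encoded hash of the object's bytes
    issued_at: str   # ISO 8601 timestamp (UTC)


def digest_of(data: bytes) -> str:
    # SHA-256 is an assumption; the slides do not name a hash function.
    return hashlib.sha256(data).hexdigest()


def issue_token(object_id: str, data: bytes) -> IntegrityToken:
    """Record the object's hash together with the time it was registered."""
    now = datetime.now(timezone.utc).isoformat()
    return IntegrityToken(object_id, digest_of(data), now)


def audit(token: IntegrityToken, data: bytes) -> bool:
    """Return True if the object's current bytes still match the token."""
    return digest_of(data) == token.digest


if __name__ == "__main__":
    token = issue_token("obj-1", b"archived content")
    print(audit(token, b"archived content"))   # unchanged object
    print(audit(token, b"archived c0ntent"))   # altered object
```

A periodic audit would call `audit` on a schedule for every stored object, while a user-triggered audit would call it on demand; both rely on the same comparison against the stored token.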
FOCUS: FOrmat CUration Service
• Maintains persistent information on digital formats, services, and applications to access and manipulate them.
• Accessible either:
  • Directly through LDAP
  • Or indirectly through SOAP (Web Services)
[Diagram: Web Service Agent, Format Registry; connection labels: SOAP, LDAP]
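
The kind of information such a registry holds can be sketched as an in-memory lookup table. This is only an illustration under stated assumptions: `FormatRecord`, `FormatRegistry`, and the sample entry are hypothetical, and the sketch models neither the LDAP nor the SOAP access path nor the real FOCUS schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FormatRecord:
    """Hypothetical registry entry for one digital format."""
    format_id: str
    description: str
    services: List[str] = field(default_factory=list)  # tools that read or convert it


class FormatRegistry:
    """Hypothetical in-memory stand-in for a persistent format registry."""

    def __init__(self) -> None:
        self._records: Dict[str, FormatRecord] = {}

    def register(self, record: FormatRecord) -> None:
        self._records[record.format_id] = record

    def lookup(self, format_id: str) -> Optional[FormatRecord]:
        return self._records.get(format_id)

    def services_for(self, format_id: str) -> List[str]:
        """List the services known for a format, or an empty list if unknown."""
        record = self.lookup(format_id)
        return list(record.services) if record else []


if __name__ == "__main__":
    registry = FormatRegistry()
    # Sample entry invented for illustration only.
    registry.register(FormatRecord("example/tiff", "Raster image", ["viewer", "tiff-to-png"]))
    print(registry.services_for("example/tiff"))
    print(registry.services_for("example/unknown"))
```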
Answer to Question #1
• Biggest surprise: none, but a number of small surprises, such as:
  • OAIS may be too general to provide a useful framework?
  • The push and pull models differ significantly for automated ingestion.
  • It is not at all clear which communities will be able to handle or afford wide-area distributed infrastructure.
Answer to Question #2
• What have you done that you never thought you would?
  • Confuse my graduate students! Trying to explain:
    • Authenticity of an archive’s holdings (the object is what it claims to be!)
    • Ensuring access to data after hundreds of years without having any idea how the technology will evolve over the next ten or twenty years!
Answer to Question #3
• How has the area of your project changed? A lot and not much:
  • Hardware (processor and storage) is changing very quickly, as expected.
  • Web technologies are more mature and more widely used, as expected.
  • Grid technologies did not progress as much as expected!
  • Very little work has been done on preservation services.
Conclusion
• Three major pieces of software: ACE, FOCUS, and PAWN.
• Interoperable with any archiving architecture.
• Scalable, secure, and platform independent.
• Continued development of preservation services.