Towards smart storage for repository preservation services. Steve Hitchcock, David Tarrant, Adrian Brown (1), Ben O’Steen (2), Neil Jefferies (2) and Leslie Carr. Preserv 2 Project, School of Electronics and Computer Science, University of Southampton; (1) The National Archives, Kew; (2) Oxford University Library Services. Presented at iPRES 2008: The Fifth International Conference on Preservation of Digital Objects, London, 29-30 September 2008
Three-stage strategy for keeping your data safe • Ability to move data freely, easily and instantly: OAI, ORE, Atom • Reliable, trusted large-scale storage: open storage • Risk profiling, invoking a range of selectable services: smart storage
About institutional repositories (IRs) • IRs are set up by institutions of higher education and research to manage and disseminate their digital intellectual outputs. • IRs are a special type of Web site, typically based on repository software that presents a database of records pointing to the deposited objects. • The Preserv 2 project is investigating the provision of preservation services for IRs. IRs in flux • Uncertainty over target content (published papers, theses, research data, teaching materials), policy, rights, and even the locus of content and responsibility for long-term management. • OAI-ORE (Object Reuse and Exchange) effectively frees the data from being captive to repository software. • Commercial repository services, ranging from software-specific services to digital library services or more general 'cloud' or network storage services. Photo: Flickr/cpikas
IRs are: • Open source repository software • Open access content • Open archives, using OAI-PMH to share data with, e.g., discovery services • Open repositories, using OAI-ORE to enable the easy movement of data between different types of repository software Photo: Flickr/Rightee
A new ‘open’: how open storage supports preservation services • Open storage: large-scale storage devices based on open source software • Open storage averts the need for a repository layer to access first-class objects, that is, objects that can be addressed directly • In turn, these digital objects can be distributed and/or replicated over many open storage platforms • This makes it possible to select storage with built-in preservation support • Resilient storage platforms may be viable for preservation services aimed at multiple repositories • E.g. Sun Microsystems STK5800 (codenamed Honeycomb) • Google Repository
Smart storage • Smart storage combines an underlying passive storage approach with the intelligence provided through services. • The key to realising smart storage is to enable the services to communicate and share information with the digital content sources they may be acting on. This is done through machine-level application programming interfaces (APIs) and protocols.
APIs, interfaces and the Web architecture • Major services on the Web deploy their own simple, but different, APIs, e.g. Google Maps • Within the repository community: SWORD (Simple Web-service Offering Repository Deposit) • Open storage platforms such as Sun's STK5800 and the Amazon Simple Storage Service (S3) expose APIs of their own • To take advantage of open storage, repositories have to be able to talk to these services through their APIs.
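One way to picture a repository talking to several storage platforms through their APIs is a thin adapter layer. The sketch below is illustrative only: `ObjectStore`, `InMemoryStore` and `replicate` are hypothetical names, and the in-memory class stands in for a real backend (such as an S3 or STK5800 client) whose actual API is not shown here.

```python
from typing import Dict, Iterable, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface a storage adapter would expose."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for illustration; a real adapter would wrap a platform API."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def replicate(key: str, source: ObjectStore, targets: Iterable[ObjectStore]) -> None:
    """Copy one directly addressable object from source to every target store."""
    data = source.get(key)
    for target in targets:
        target.put(key, data)
```

Because the repository code depends only on `put` and `get`, a new storage platform can be added by writing one more adapter rather than changing the repository.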
Smart storage example: format services • Preservation methods affecting formats can be classified in three stages (‘seamless flow’): • Format identification and characterization (which format?) • Preservation planning and technology watch (format risk and implications) • Preservation action, migration, etc. (what to do with the format) • Format-based services tend to be ad hoc processes for which some tools are available • E.g. PRONOM-DROID from The National Archives (UK) • PRONOM is an online registry of technical information, such as file format signatures • DROID is a downloadable file format identification tool that applies these signatures • These and other tools could be used in a more coordinated manner.
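The identification stage can be illustrated with a much-simplified signature match. This is not DROID and does not use PRONOM data: the `SIGNATURES` table and `identify_format` function are hypothetical, and the table holds only a few widely documented leading byte sequences.

```python
from typing import Optional

# Hypothetical, hand-written signature table for illustration only.
# These leading byte sequences are widely documented; they are not taken from PRONOM.
SIGNATURES = [
    ("PDF", b"%PDF-"),
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("GIF", b"GIF87a"),
    ("GIF", b"GIF89a"),
    ("ZIP", b"PK\x03\x04"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches the start of data, else None."""
    for name, magic in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

Real signatures can also sit at offsets or at the end of a file, so a production tool needs a richer matching model than this prefix check.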
Smart storage DROID: scheduling/history • Scheduling interface controls when a DROID classification needs to be performed. • Preserv 2 has developed a scheduling service that uses the Darwin Calendar Server and iCalendar format. • Provides a powerful scheduling service with many clients already available - Apple iCal, Mozilla Sunbird, and others - that can read and interpret the files so that past and future events can be reviewed.
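As a rough sketch of what a scheduled classification could look like in iCalendar format, the function below builds a single event as text. The function name, the `PRODID` value, the `SUMMARY` text and the use of the `URL` property to carry the target object are assumptions for illustration; the slides do not describe how Preserv 2 encodes its events.

```python
from datetime import datetime, timezone


def droid_event(uid: str, created: datetime, start: datetime, target_url: str) -> str:
    """Build a one-event iCalendar document (hypothetical layout).

    Both datetimes must be timezone-aware; they are written in UTC.
    No text escaping is applied, so values must not contain commas,
    semicolons or newlines.
    """
    fmt = "%Y%m%dT%H%M%SZ"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//droid-scheduler-sketch//EN",  # placeholder identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{created.astimezone(timezone.utc).strftime(fmt)}",
        f"DTSTART:{start.astimezone(timezone.utc).strftime(fmt)}",
        "SUMMARY:DROID classification",
        f"URL:{target_url}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

A file produced this way can be opened in ordinary calendar clients, which is what makes past and future classification events easy to review.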
Smart storage DROID: OAI-PMH interface • An OAI-PMH interface to open storage discovers the latest objects to have been deposited and which are ready for format classification. • Could also be performed by simpler RSS or Atom-based methods. • The interface has since been expanded to allow export of OAI-ORE resource maps in both RDF and Atom formats.
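Discovering recently deposited objects over OAI-PMH might look roughly like the following. The `ListIdentifiers` verb, the `from` and `metadataPrefix` arguments and the response namespace come from the OAI-PMH 2.0 protocol; the function names and the base URL in the usage note are hypothetical, and fetching the response over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, since: str) -> str:
    """Build a ListIdentifiers request for records changed on or after `since`."""
    query = urlencode(
        {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc", "from": since}
    )
    return f"{base_url}?{query}"


def parse_headers(xml_text: str) -> List[Tuple[str, str]]:
    """Return (identifier, datestamp) pairs, skipping records marked as deleted."""
    root = ET.fromstring(xml_text)
    results = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue
        identifier = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", default="", namespaces=OAI_NS)
        results.append((identifier, datestamp))
    return results
```

For example, `list_identifiers_url("http://example.org/oai", "2008-09-01")` yields a request for everything deposited since that date; each returned pair can then be queued for format classification. Large result sets are paged with resumption tokens, which this sketch does not handle.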
Smart storage DROID: implementation [Diagram. Components: calendar clients (e.g. iCal, Outlook, Sunbird), calendar server, DROID scheduler, DROID-OAI harvester, DROID, open storage, repository (OAI-PMH), history store (stores results of DROID events), web server (HTTP), user interface, machine interface (API). Labelled flows: schedule event; is event done?; url, date; messaging (Atom?); get results of event. Legend: implemented / to be implemented.]
Risk profiling • The scheduler will invoke actions based on the results of scanning by DROID, allied to decision-making tools that use intelligence from planning and technology watch tools, such as PRONOM, the Plato preservation planning tool from the EC-funded Planets project, and others. Photo: Flickr/yourbartender
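The decision step can be sketched as a lookup from an identified format to an action. Everything here is invented for illustration: the risk scores, the threshold, the action names and the `choose_action` function are hypothetical and do not reflect PRONOM or Plato data or their interfaces.

```python
from typing import Dict

# Hypothetical risk scores (higher means greater preservation risk).
# A real service would draw these from planning and technology watch tools.
RISK_BY_FORMAT: Dict[str, int] = {
    "PDF": 1,
    "GIF": 2,
    "LegacyWordProcessor": 5,
}


def choose_action(format_name: str, threshold: int = 3) -> str:
    """Map an identified format to a scheduler action (illustrative policy only)."""
    risk = RISK_BY_FORMAT.get(format_name)
    if risk is None:
        return "review"  # unknown format: refer to a person
    if risk >= threshold:
        return "migrate"  # high risk: schedule a preservation action
    return "monitor"  # low risk: re-check at the next scheduled scan
```

The scheduler would then create an event for whatever action comes back, so the policy can change without touching the scanning or storage code.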
Summary: smart storage in the storage scheme How smart storage addresses current storage issues – see full paper
Storage can become smarter • Openness in its various forms, including the ability to move data freely and easily, needs to be supplemented by decision-making that can be automated based on the supplied intelligence and information. • In this way, open storage can become ‘smarter’. http://preserv.eprints.org/ Thanks to