MetaArchive Distributed Digital Preservation Workshop

Join us for a comprehensive overview of distributed digital preservation networks, content management, costs, organizational agreements, and digital preservation basics, shared through informative sessions at the Robert W. Woodruff Library, Emory University.

devonk

Presentation Transcript


  1. MetaArchive Distributed Digital Preservation Workshop. Wednesday, May 30, 2007. Robert W. Woodruff Library, Emory University, Atlanta, Georgia

  2. Day One Overview
     • 8:30 AM - 9:00 AM: Light Breakfast and Welcome
     • 9:00 AM - 10:30 AM: Session 1. Overview of Distributed Digital Preservation Networks, M. Halbert
     • 10:30 AM - 10:45 AM: Break
     • 10:45 AM - 12:15 PM: Session 2. Content Management, C. Jannik and G. MacMillan
     • 12:15 PM - 1:15 PM: Lunch
     • 1:15 PM - 2:45 PM: Session 3. Costs and Operational Considerations, M. Halbert and K. Skinner
     • 2:45 PM - 3:00 PM: Break
     • 3:00 PM - 4:30 PM: Session 4. Organizational Agreements, D. Buttler and K. Skinner
     • 4:30 PM - 4:45 PM: Wrap Up
     DDP Workshop -Session 1

  3. Purposes of this Workshop • Foster discussion concerning distributed digital preservation strategies • Share information and perspectives acquired in the course of the MetaArchive NDIIPP project • Provide information and training for institutions seeking to build or join distributed digital preservation networks based on the LOCKSS software.

  4. Introductions – Who We All Are • Please introduce yourself • Say where you are from • Mention any particular things that you hope to get out of this workshop, and any other expectations you may have • Identify any particular topics you hope we will spend time discussing

  5. Learning Objectives for this Session • Review day one workshop sessions • Overview of some digital preservation basics • Reasons to establish or join a network • Models of network organization • Defining partner/member responsibilities • Overview of MetaArchive and LOCKSS

  6. Overview of Some Digital Preservation Basics

  7. The New Field of Digital Preservation Cultural heritage organizations are rapidly expanding their digitization programs in an effort to provide better access to collections. As these digitization efforts go forward, and as an increasing number of born-digital acquisitions are made, there are concomitant needs for preservation of these materials. • The DigCCurr 2007 Conference was hosted in April 2007 by the School of Information and Library Science at the University of North Carolina at Chapel Hill in an explicit effort to define the new field of Digital Curation. • The Consultative Committee for Space Data Systems has of necessity created many working standards for preservation of digital information. One of the most notable is the Reference Model for an Open Archival Information System (OAIS), which provides a broad vocabulary for discussing digital archive systems and processes. • The National Digital Information Infrastructure and Preservation Program (NDIIPP) is the congressionally chartered national program to digitally preserve our national heritage. • The Digital Preservation Management Workshop hosted by Cornell University from 2003-2006 was an effort to collate and share relevant best practices and documentation from a large number of emerging projects and efforts related to digital preservation. • In the UK, groups such as the Digital Curation Centre and the Digital Preservation Coalition have been formed to “foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.”

  8. The Data Loss Problem

  9. The Data Loss Problem (cont.)

  10. The Data Loss Problem (cont.)

  11. The Data Loss Problem (cont.)

  12. The Data Loss Problem (cont.) From the NDIIPP website on the importance of digital preservation (http://www.digitalpreservation.gov/importance/):

  13. National Digital Information Infrastructure and Preservation Program (NDIIPP) Commentary • Technology has so altered our world that most of what we now create begins life in a digital format. • The artifacts that tell the stories of our lives no longer reside in a trunk in the attic, but on personal computers or Web sites, in e-mails or on digital photo and film cards. • The flip side to the ease with which we are able to create digital content is the complexity of preservation and long-term retrieval of this content. • We must contend with issues relating to hardware and software compatibility; long-term storage; organization of files for ease of search and retrieval; media quality; disaster recovery; and integrity of original data.

  14. Making Our Digital Heritage a Top Priority When we consider the ways in which the American story has been conveyed to the nation, we think of items such as the Declaration of Independence, Depression-era photographs, television transmission of the lunar landing and audio of Martin Luther King's "I Have a Dream" speech. Each of these is physically preserved and maintained according to the properties of the physical media on which it was created. Yet, how will we preserve these essential pieces of our heritage? • Web sites as they existed in the days following Sept. 11, 2001, or Hurricane Katrina? • What about Web sites developed during the national elections? • Executive correspondence generated via e-mail? • Web sites dedicated to political, social and economic analyses? • Data generated via geographical information systems, rather than physical maps? • Digitally recorded music or video recordings? • Web sites that feature personal information such as videos or photographs? • Social networking sites? • Should these be at a greater risk of loss, simply because they are not tangible? • The content of digital archives at cultural heritage institutions, created with scarce resources in a time of great change?

  15. The Gap in Digital Preservation Programs • 66% of cultural heritage institutions (academic libraries, archives, art museums, public libraries, and other similar kinds of institutions) report that no one is responsible for digital preservation activities • 30% of all archives have been backed up one time or not at all Source: 2005 NEDCC Survey by Bishoff and Clareson

  16. Reasons to Establish or Join a DDP Network

  17. Backups versus Digital Preservation What differentiates a schedule for data backups from a digital preservation program? • Backups are tactical measures. Backups are typically stored in a single location (often nearby or collocated with the servers backed up) and are performed only periodically. Backups are designed to address short-term data loss via minimal investment of money and staff time resources. Backups are better than nothing, but not a comprehensive solution to the problem of preserving information over time. • Digital preservation is strategic. A digital preservation program entails a geographically dispersed set of secure caches of critical information. A true digital preservation program will require multi-institutional collaboration and at least some ongoing investment to realistically address the issues involved in preserving information over time.

  18. What is Digital Preservation? • Digital Preservation refers to the management of digital information over time. • Unlike the preservation of paper or microfilm, the preservation of digital information demands ongoing attention. This constant input of effort, time, and money to handle rapid technological and organisational advance is considered the main stumbling block for preserving digital information beyond a couple of years. • Digital preservation can therefore be seen as the set of processes and activities that ensure the continued access to information and all kinds of records, scientific and cultural heritage existing in digital formats. Source: http://en.wikipedia.org/wiki/Digital_preservation

  19. Secure and Distributed Cache Networks Why are the characteristics of geographic distribution and security so important? This strategy maximizes survivability of content in both individual and collective terms: • Security reduces the likelihood that any single cache will be compromised. • Distribution reduces the likelihood that the loss of any single cache will lead to a loss of the preserved content. By creating a collaborative network for secure and distributed preservation, a group can also work together on more complex issues such as format migration.
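
The survivability argument on this slide can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes every cache fails independently and with the same probability over some period, in which case the content is lost only if all caches fail together. The function name `loss_probability` and the 5% failure figure are hypothetical and do not come from the workshop, MetaArchive, or LOCKSS.

```python
def loss_probability(per_cache_failure: float, cache_count: int) -> float:
    """Chance that every cache is lost, ASSUMING independent, identical failures."""
    if not 0.0 <= per_cache_failure <= 1.0:
        raise ValueError("per_cache_failure must be between 0 and 1")
    if cache_count < 1:
        raise ValueError("cache_count must be at least 1")
    # Content survives as long as one cache survives, so total loss
    # requires all caches to fail: p * p * ... * p = p ** n.
    return per_cache_failure ** cache_count


if __name__ == "__main__":
    # Hypothetical 5% chance of losing any one cache during the period.
    for caches in (1, 2, 3, 6):
        print(caches, loss_probability(0.05, caches))
```

The independence assumption is the reason geographic distribution matters: caches in the same building or city share risks such as fire, flood, or power failure, so their failures are correlated and the real loss probability is higher than this simple product suggests.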

  20. Case Study from the Chirographic (Handwritten) Era: The Nag Hammâdi Library • Collection of early Coptic texts discovered near the town of Nag Hammâdi in 1945 • Had been buried in the 4th Century CE when censored • Only extant copies of core early Gnostic scholarship • Survived 15 centuries because they were part of a secure, distributed chirographic network

  21. Shared Archiving Fails without a Pre-coordinated Digital Preservation Network in Place The NDIIPP Archive Ingest and Handling Test (AIHT): • Designed to document methods for preserving digital cultural materials, and to identify areas that require further research • Participants tested five different preservation systems • Encountered many unexpected incompatibilities between the different systems • Realization that much of the cost in preserving digital material is in coordinating the organizational and institutional imperatives of preservation, and not the technological costs of storage space

  22. Both Technical Networking and Organizational Networking are Required • A single cultural heritage organization is unlikely to have the capability to operate several geographically dispersed and securely maintained servers • Collaboration between institutions on technological solutions is essential • Similarly, inter-institutional agreements must be put in place or there will be no commitment to act in concert over time "The increased number and diversity of those concerned with digital preservation, coupled with the current general scarcity of resources for preservation infrastructure, suggests that new collaborative relationships that cross institutional and sector boundaries could provide important and promising ways to deal with the data preservation challenge. These collaborations could potentially help spread the burden of preservation, create economies of scale needed to support it, and mitigate the risks of data loss." Source: The Need for Formalized Trust in Digital Repository Collaborative Infrastructure, NSF/JISC Repositories Workshop (April 16, 2007)

  23. Defining Partner/Member Responsibilities

  24. Institutional and Consortial Roles • Preservation Sites are entities responsible for the ongoing activity of preserving digital content. At a minimum, every preservation site must include responsible staff and a node server of the relevant preservation network. Preservation sites collectively comprise a preservation network. • Development Sites are responsible for technical development of the computer systems that enable the preservation network. Obviously, development sites may also be preservation sites and/or contributing sites. • A Preservation Network is composed of all preservation sites that work together to preserve at-risk digital content. • Contributing (Content) Sites are institutions that need to preserve digital content, and therefore decide to contribute digital content into the preservation network. The preservation network acts for the common good to preserve the at-risk content submitted by the contributing sites. Contributing sites may also be preservation sites.

  25. Individual Roles • Selectors are staff that identify and prioritize content to be preserved. They will most often be knowledgeable concerning the content of an institution’s digital archives, and may have been the same individuals that originally created or acquired the archives. • System Administrators are staff members that maintain individual preservation node servers of the relevant preservation network. • Data Wranglers are programmers and other technically adept workers that prepare local digital archives for ingestion into a preservation network. • Program Managers are leaders that accept responsibility for coordinating the activities of a digital preservation network. NOTE: All of the above roles may overlap in creative ways!

  26. Models of Network Organization Different Ways of Creating or Joining Digital Preservation Networks

  27. Dedicated Network Create a Dedicated Preservation Network: • Provides the greatest organizational control • You can set up the rules for the network • Requires greatest up-front investment to implement

  28. Strategic Alliance Build onto an Existing Preservation Network: • Takes advantage of previous investments by others • Requires understanding the rules of existing network and abiding by them • Still requires capital investment in infrastructure

  29. Piggyback Ride Arrange Contribution Strategy to an Existing Preservation Network: • No capital investment in infrastructure required • Maximum advantage from previous investments by others • Requires abiding by rules of existing network • Requires convincing the existing network to preserve your stuff; will likely entail fees

  30. Network Security Factors What level of security and control over access to your data do you need? • Do you have sensitive assets that require access controls? If so, you may need a dedicated network in which you control access to the preservation nodes, or at least be able to join a network which provides such access assurances. • Do you have some flexibility in adapting to other infrastructures and security policies? If so, it may be simplest to join and build your preservation nodes onto an existing network. The requirements may be readily acceptable. • Do you have relaxed or no security/access expectations? If so, you may simply want to piggyback off an existing network and depend on their good graces.

  31. Decisions on Degrees of Security More security and access assurances drive up the required costs of a preservation network: • Extra costs may very well be justified! The entire point of a preservation network is long-term security for your digital content. • Strategic alliances can make a lot of sense. They leverage your resources, but still give you ownership of a portion of the infrastructure. • If you have no infrastructural capacity, and little or no funding, a piggyback ride is better than nothing!

  32. Overview of MetaArchive and LOCKSS

  33. MetaArchive A dedicated preservation network for digital archives established under the auspices of and with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP): • Based on LOCKSS technology, but a separate network with high-capacity nodes • Highly distributed geographically across multiple states • Node servers are very secure, with a variety of extra security hardening measures added to each preservation node • Memoranda of Understanding between participating sites concerning commitment to maintain each other’s data security and network integrity • Motivation to preserve partners' digital archives is based on signed agreements and commitment to the preservation network • Available for others to join, either to build onto or to piggyback on • Active development community, committed to ongoing exploration of distributed preservation technologies, digital curation tools, and format migration methods • Fee structure to join as members or to piggyback on

  34. LOCKSS A dedicated preservation network for online journals, established with funding from the Mellon Foundation and new funding from NDIIPP: • The pioneering leader in distributed digital preservation • Very highly distributed geographically across the world, with hundreds of sites • Available for others to join, either to build onto or to piggyback on • Fee structure for membership • No signed agreements between sites; individual nodes may preserve content or withdraw at will • Motivation to preserve content is based on interest by members in long-term access to online journal content to which they subscribe • Active development community, with new initiatives with publishers (CLOCKSS) and many other technical advancement directions

  35. Q&A Discussion
