
NDIIPP Web Preservation Partnerships and Strategies

Learn about NDIIPP's web preservation strategies, partnerships, and challenges in preserving born-digital and "at-risk" web content. Discover the collaborative initiatives and collection strategies shaping the future of web archiving.



Presentation Transcript


  1. LC Perspective: Preservation Partnerships. Martha Anderson, Program Officer, NDIIPP, Office of Strategic Initiatives, Library of Congress. April 2005

  2. Born Digital “At-Risk” Web Sites http://www.loc.gov/minerva/collect/elec2000 http://www.loc.gov/minerva/collect/sept11

  3. NDIIPP Strategic Direction: Take actions that are • Catalytic: Invest in existing strengths • Collaborative: Engage partners in areas of mutual interest and expertise • Iterative: Learn by doing • Strategic: Broad spectrum of balanced short-term & long-term investments

  4. Web of projects (diagram labels): NARA, GPO, LC Web Projects, UIUC, IA, Preservation Partners, IIPC, AIHT, NDIIPP, CDL, States Initiative

  5. Library of Congress Web Archiving Strategy • Collaborate with partners working on the same preservation issues • Develop collection strategies to leverage available resources • Learn by doing

  6. Collaborate with partners working on the same preservation issues • Membership in the International Internet Preservation Consortium (IIPC) • Cooperative projects with NDIIPP Preservation Partners: California Digital Library; University of Illinois at Urbana-Champaign • Technical information sharing with other US government agencies: Government Printing Office; National Archives and Records Administration

  7. Develop collection strategies to leverage available resources • Collect thematically, both by crawling and by acquiring collections gathered by others. Learn by doing • Case studies and regular collection of theme-based collections • Participate in tools development with IIPC • Archive Ingest & Handling Test

  8. Challenges of collecting from the Web • Characteristics of the resource: dynamic, deep, linked • Intellectual property laws and regulations • Tension of preservation vs. access goals • Degree of alignment with current collection policies for other media • Curation strategy • Tools for identification and selection • Tools for collection, curation, and archiving of large web collections

  9. Average Web Collection • Begins with a theme or event • Usually does not include commercial sites • Starts with a list of about 200 URLs • Is crawled by a vendor • Yields about 1 TB of data per month • Is crawled once a week
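
The figures on slide 9 imply a rough per-crawl volume. The following is a back-of-the-envelope sketch, not part of the original presentation; it assumes weekly crawls averaged over a year (52 / 12 per month) and decimal units (1 TB = 1000 GB), and the function names are made up for illustration.

```python
# Figures quoted on the slide; everything else is an illustrative assumption.
SEED_URLS = 200             # "a list of about 200 URLs"
TB_PER_MONTH = 1.0          # "about 1 TB of data per month"
CRAWLS_PER_MONTH = 52 / 12  # assumption: one crawl per week, averaged over a year


def gb_per_crawl(tb_per_month=TB_PER_MONTH, crawls_per_month=CRAWLS_PER_MONTH):
    """Approximate size of a single weekly crawl in GB (1 TB = 1000 GB)."""
    return tb_per_month * 1000 / crawls_per_month


def gb_per_seed_per_crawl(seed_urls=SEED_URLS):
    """Approximate data gathered per seed URL in one crawl, in GB."""
    return gb_per_crawl() / seed_urls


if __name__ == "__main__":
    print(f"Per crawl: about {gb_per_crawl():.0f} GB")
    print(f"Per seed URL per crawl: about {gb_per_seed_per_crawl():.2f} GB")
```

Under these assumptions a single crawl is on the order of 230 GB, or a little over 1 GB per seed URL, which suggests that each seed expands into a large number of linked pages.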

  10. Web Collections to date at LC • Event-based: US National Elections (2000, 2002, 2004); War in Iraq; September 11 • Public Policy Topics: Health Care; Legislative Branch; Terrorism • 26 TB in total

  11. Archive Ingest & Handling Test • AIHT is a first test of the proposed NDIIPP preservation architecture. • The test is conducted with a common data set: the George Mason University 9/11 Archive. • Phase I tests ingest and data handling in local systems. • Phase II tests export and import between institutions. • Phase III explores format migration.

  12. GMU 9/11 Archive • Participants exchange archive • Participants demonstrate capabilities

  13. Participants • Old Dominion University, Department of Computer Science • Stanford University Libraries & Academic Information Resources • The Johns Hopkins University, Sheridan Libraries • Harvard University Library

  14. George Mason University 9/11 Archive: Breakdown by File Types • 57,450+ files • 12 GB • Originally stored in a Linux environment
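
The chart behind slide 14 did not survive in the transcript. As a hedged illustration of how a breakdown by file type could be produced for an archive like this one, the sketch below counts files by extension. The sample paths and function names are hypothetical and do not come from the GMU archive or from any AIHT participant's tooling.

```python
from collections import Counter
from pathlib import PurePosixPath


def breakdown_by_extension(paths):
    """Count files per lower-cased extension; files without one go under '(none)'."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts


def average_file_size_kb(total_bytes, file_count):
    """Mean file size in KB (1 KB = 1000 bytes)."""
    return total_bytes / file_count / 1000


if __name__ == "__main__":
    # Hypothetical paths, used only to show the shape of the output.
    sample = ["stories/0001.html", "images/flag.JPG", "images/sky.jpg", "README"]
    for ext, n in breakdown_by_extension(sample).most_common():
        print(f"{ext:8} {n}")

    # Slide figures: 57,450+ files and 12 GB (taken here as 12 * 10**9 bytes).
    print(f"Average size: about {average_file_size_kb(12e9, 57_450):.0f} KB")
```

With the slide's totals, and assuming decimal gigabytes, the average file comes out at roughly 209 KB; because the slide gives the file count as "57,450+", the true average is somewhat lower.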

  15. Goals of AIHT • Gain practical experience with multiple institutions • Document transfer and ingest processes for multiple systems • Determine next set of tasks for developing interfaces between layers and institutions

  16. Status of AIHT • All phases completed. • Imports focused on technical assessment of the archive and on developing tools to examine it • Exports included METS and MPEG-21 DID objects • Migrations included transforms to JPEG 2000 and TIFF, and some exploration of HTML to XML and AVI to MPG • Full report expected by early summer.

  17. For more information… • NDIIPP Technical Architecture version 0.2 http://www.digitalpreservation.gov • International Internet Preservation Consortium http://netpreserve.org/about/index.php • MINERVA: Mapping the INternet Electronic Resources Virtual Archive http://www.loc.gov/minerva/

  18. Martha Anderson, NDIIPP Program Officer, Office of Strategic Initiatives, The Library of Congress, Washington, DC. mande@loc.gov
