
The Library of Congress Cooperative Web Archiving Project

November 4, 2009. Abbie Grotke, Library of Congress; Grant Harris, Library of Congress; Jennifer Long, Georgetown University.



Presentation Transcript


  1. The Library of Congress Cooperative Web Archiving Project • November 4, 2009 • Abbie Grotke, Library of Congress • Grant Harris, Library of Congress • Jennifer Long, Georgetown University

  2. Agenda • LC’s Web archiving program • Overview of the Cooperative Project • Featured Partner: Georgetown University • Lessons Learned

  3. Library of Congress Web Archives: loc.gov/lcwa

  4. LC Collections: over 130 TB • US National Elections: 2000, 2002, 2004, 2006, 2008 • Iraq War 2003 • September 11 2001 & September 11 Remembrance 2002 • Olympics 2002 • Congress: 106th, 107th, 108th, 109th, 110th • Supreme Court Nominations • Legal Blawgs • Papal Transition • Overseas Operations: Indian and Indonesian Elections • Case Studies: health care, terrorism, visual image content, organizational Web sites, Crisis in Darfur, “single site” • http://www.loc.gov/webarchiving/projects.html

  5. Organizational Structure • WEB ARCHIVING TEAM: In the Office of Strategic Initiatives (OSI). We are project managers and technical staff focused on capture, tools, and permissions. • CURATORS/RECOMMENDING OFFICERS: In Library Services, the Congressional Research Service, and the Law Library. They pick the collections and the URLs to archive, and research whom to contact for permission. • INFORMATION TECHNOLOGY OFFICE and TECHNICAL ARCHITECTURE TEAM: Also in OSI. They support Wayback and Web Curator Tool development, repository development, and data transfers. Contractors are also used in this area. • BIBLIOGRAPHIC ACCESS: MODS records are created in Library Services; staff of the Network Development & MARC Standards Office and of Acquisitions & Bibliographic Access do the cataloging.

  6. Collaborations and Partnerships • Early collections: Election 00 and 02, September 11 • End of Term Project • Hurricane Katrina Archive • IIPC: upcoming Olympics Collection • NDIIPP Partners • K-12 Web Archiving • Cooperative Archive-It projects

  7. Problem • Web content that will be important for future research is disappearing before it can be collected • Identification of sites, and review of captured sites, is labor-intensive; LC staff are stretched thin • Outside institutions may not have resources/budgets for collecting web sites

  8. Cooperative Archive-It Project Concept • Enlist Library Services subject experts to identify international and national high-value collecting areas, with a focus on foreign countries experiencing volatile political situations • Enlist Library Services subject experts to identify scholarly centers, or partner institutions, with recognized expertise in the collecting areas, to assist in the collection and preservation of important at-risk materials • Prioritize collecting areas/centers of expertise (7 priority areas selected)

  9. Goals • To enable institutions outside the Library to gain experience creating Web site collections • To extend the network of NDIIPP partners working to identify and collect high-value, at-risk Web materials • To develop subject-area collections that could become part of the Library’s collections in the future, and • To broaden the understanding of issues related to the development of curated collections of Web content.

  10. Library of Congress agreed to: • Establish and fund an Archive-It account for the partner for up to one year (with possible extension); • Provide support as needed; • Provide subject matter expertise as requested by the partner; • Invite partner institutions to at least one conference at the Library (if funding is available); • Maintain a second copy of the harvested content.

  11. Each Center Was Asked To: • Identify high-risk, high-value web sites for their area, and use Archive-It to harvest the sites; • Document their selection criteria and provide them to the Library; • Document issues, lessons learned, etc. related to their web collecting; • Participate in a conference with Library experts and other participants (if scheduled).

  12. Featured Partner: Georgetown University. Belarus, Moldova, Ukraine Collection • Proposed by LC Curator: Grant Harris • Aim: the web capture of fragile websites from Belarus, Moldova, and Ukraine, to include selected government websites, opposition parties, ethnic and religious groups, elections, and security issues.

  13. Lessons Learned • Finding good partners was KEY - partners should be committed and really “get” the concept of web archiving and archiving primary source materials • Crawling ALL of Twitter – not so good. • Confusion over LC’s own web archiving program vs. this project

  14. Lessons Learned • Collaborative collection building is a good thing • New partnerships formed • New ways for our curators to get engaged with web archiving • LC might not have been able to archive some of the collected content on our own (permissions, staff time, etc.)

  15. Next Steps • Three partners collecting (at least) for another year: ELO, Georgetown, and Stanford • Focus on description and access: George Washington University/Russian Elections • Future: Data transfer to LC

  16. For more information • LC Web Archiving: http://www.loc.gov/webarchiving/ • LCWA: http://loc.gov/lcwa/ • National Digital Information Infrastructure and Preservation Program: http://www.digitalpreservation.gov/ • Georgetown’s Archive-It collections: http://archive-it.org/public/partner?id=168

  17. Questions? • Abbie Grotke abgr@loc.gov • Grant Harris grha@loc.gov • Jennifer Long longj@georgetown.edu
