
VRC: Preservation Risk Management for Web Resources

This project aims to develop a model that research libraries can use to identify, respond to, and mitigate risks to the integrity and longevity of Web resources. The model proceeds through six stages: identification, analysis, appraisal, strategy, detection, and response. The VRC Toolkit surveys and evaluates tools for server-level monitoring, Web crawling, and site management.

Presentation Transcript


  1. VRC: Preservation Risk Management for Web Resources
     Nancy Y. McGovern, ECURE 2004

  2. VRC Funding
     • Part of a 4(5)-year NSF-funded project
       – supported by the Digital Libraries Initiative, Phase 2 (Grant No. IIS-9905955, the Prism Project)
     • Also partially funded by a grant from The Andrew W. Mellon Foundation
       – Political Communications Web Archiving: http://www.crl.edu/content/PolitWeb.htm
     • For updates: http://irisresearch.library.cornell.edu/VRC/

  3. Current Team
     • Anne R. Kenney, Advisor
     • Nancy Y. McGovern, Project Manager
     • Richard Entlich, Sr. Researcher
     • William R. Kehoe, Technology Coordinator
     • Ellie Buckley, Digital Research Specialist
     • Erica Olsen (recent)
     • Carl Lagoze, CIS PI

  4. Research Scope
     • See "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism" by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette, D-Lib Magazine, January 2002: http://www.dlib.org/dlib/january02/kenney/01kenney.html

  5. Virtual… because VRC develops models that represent the essential features of selected Web sites, enabling ongoing monitoring over time to identify, respond to, and mitigate potential risks to site integrity and longevity.

  6. Remote… because VRC is intended for cultural heritage institutions interested in the longevity of Web resources that reside on remote servers, not owned or managed by the monitoring institution.

  7. Control… because, at the most proactive end of the VRC approach, a monitoring organization may act to protect another organization's resources, by agreement or implicit consent, through notification and/or action.

  8. Purpose
     • Develop a model for research libraries (adaptable to other contexts)
     • Support a spectrum from passive monitoring to active capture
     • Provide lifecycle support, from selection to capture
     • Understand the nature of Web resources
     • Promulgate good practice

  9. Types of Web Resources
     Two types of initiatives for monitoring and/or capture of:
     • Web-based publications [Web site as a means]
     • All of (or a subset of) a Web site, consisting of pages within a boundary defined by a URL (or a portion of one) [Web site as an end] (VRC)
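
A site "within a boundary defined by a URL" can be tested mechanically. The sketch below is a hypothetical helper, not part of the VRC Toolkit: it assumes the boundary is a URL prefix (scheme, host, and optional path) and matches whole path segments, so a boundary of /docs does not accidentally include /docs-old.

```python
from urllib.parse import urlsplit


def within_boundary(url: str, boundary: str) -> bool:
    """Hypothetical check: does url fall inside the site boundary given as a URL prefix?"""
    u, b = urlsplit(url), urlsplit(boundary)
    # Scheme and host must match exactly (case-insensitive).
    if (u.scheme.lower(), u.netloc.lower()) != (b.scheme.lower(), b.netloc.lower()):
        return False
    prefix = b.path.rstrip("/")  # "" means the whole host is in scope
    path = u.path or "/"
    # Match on whole path segments only.
    return prefix == "" or path == prefix or path.startswith(prefix + "/")
```

For example, with the boundary http://www.example.org/docs/, the page http://www.example.org/docs/a.html is in scope, while http://www.example.org/docs-old/x is not.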

  10. Nature of Risks
     Two perspectives on Web-based risk:
     • potential liability of an institution based on the content of its Web site, or of a Web site for which it is responsible
     • potential threats to the integrity and longevity of a Web resource (VRC)

  11. Types of Risks
     Include:
     • technological obsolescence
     • security weaknesses and breaches
     • human error in developing/maintaining sites
     • organizational issues; benign neglect
     • power and technology failures
     • inadequate backup and secondary systems

  12. Risk Factors
     • Organizational context
     • Combination of indicators
     • Monitoring (change/loss over time)
     • Triggers (events, organizational, upgrades)
     • Degradation of site management indicators

  13. VRC Stages
     • Identification
     • Analysis
     • Appraisal
     • Strategy
     • Detection
     • Response

  14. Human – Tool Scenario
     1. Identification
        • Human: identify Web resources of interest
        • Tool: verify list, expand list
     2. Analysis
        • Tool: crawl sites, generate characterizations
        • Human: accept/revise characterizations
     3. Appraisal
        • Human: define/review attributes of value
        • Tool: support appraisal, capture results

  15. Human – Tool Scenario (continued)
     4. Strategy
        • Human: develop/review strategies
        • Tool: plot appraisals, compile strategies
     5. Detection
        • Human: define risk parameters
        • Tool: identify/assess risks; propose responses
     6. Response
        • Tool: propose risk responses based on rules; respond automatically for some risk categories
        • Human: monitor automated responses; select a response from the recommended actions
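
The Response stage divides work between rules the tool applies on its own and recommendations a human must act on. A minimal sketch of that split follows, assuming a simple rule table keyed by risk category; the category names and actions are hypothetical (loosely drawn from slide 11), not the project's actual rule set.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Rule:
    action: str      # recommended response (hypothetical wording)
    automatic: bool  # True: the tool may act without waiting for a human


# Hypothetical rule table; a real deployment would define its own categories.
RULES: Dict[str, Rule] = {
    "server_vulnerability": Rule("capture site snapshot now", automatic=True),
    "content_loss": Rule("capture site snapshot now", automatic=True),
    "benign_neglect": Rule("notify site owner", automatic=False),
}


def respond(risk: str) -> Tuple[str, str]:
    """Return (who acts, action) for a detected risk category."""
    rule = RULES.get(risk)
    if rule is None:
        # Unknown risks always go to a human for review.
        return ("human", "review unclassified risk")
    return ("tool" if rule.automatic else "human", rule.action)
```

Routing unknown categories to a person keeps the human in the loop, matching the slide's point that humans monitor automated responses rather than the reverse.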

  16. Contextual Layers

  17. Server-level Monitoring
     • Potential multi-site impact
     • Server vulnerabilities put site content at risk of deletion or modification
     • Patches and new versions of Microsoft IIS and Apache HTTP Server are released frequently
     • Apache HTTP Server 1.3 security updates:
       – to version 1.3.26 on June 18, 2002
       – to version 1.3.27 on October 3, 2002
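
One way a remote monitor could flag a server that is behind on these updates is to read the version advertised in the HTTP `Server` response header. The sketch below assumes that header has already been fetched and treats 1.3.27 as the minimum safe 1.3 release, per the slide. Many servers hide or shorten this header (for example, Apache's `ServerTokens` directive), so the function returns `None` rather than guessing when no usable version is present.

```python
import re
from typing import Optional, Tuple

# Assumption: latest Apache 1.3 security release known to the monitor (from the slide).
MIN_SAFE_APACHE_13 = (1, 3, 27)


def parse_apache_version(server_header: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a header such as 'Apache/1.3.26 (Unix)'."""
    m = re.search(r"Apache/(\d+)\.(\d+)\.(\d+)", server_header)
    return tuple(int(g) for g in m.groups()) if m else None


def apache_needs_update(server_header: str) -> Optional[bool]:
    """True/False for Apache 1.3.x; None when the header gives no usable 1.3 version."""
    version = parse_apache_version(server_header)
    if version is None or version[:2] != (1, 3):
        return None
    return version < MIN_SAFE_APACHE_13
```

Because one server may host many sites, a single `True` here could raise the risk level for every monitored site on that host.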

  18. Server-level Monitoring

  19. VRC Toolkit
     • Identify tools for each stage (adopt, adapt, define, devise)
     • Leverage existing tools; apply them to longevity
     • Analyze steps, both automated and manual
     • Formalize the protocol
     • Provide a framework to map existing tools and plug gaps with new developments

  20. VRC Toolkit
     Development steps:
     • extensive literature review
     • development of tool categories
     • definition of categories and test protocols
     • survey existing tools for evaluation
     • select representative tools for testing
     • highlight findings in category summaries

  21. Web Crawling
     • traversing Web sites via links
     • a capability common to most tools, but with different purposes and results
     • the VRC Toolkit needs more than just Web crawlers
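
Traversing a site via links is, at its core, a breadth-first search over pages. The sketch below is illustrative, not a VRC tool. It assumes a caller-supplied `fetch(url)` that returns HTML or `None` on failure, and an `in_scope(url)` test for the site boundary, which keeps it runnable offline. Recording failed fetches shows how the same traversal also serves as a simple link checker.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, fetch, in_scope, max_pages=500):
    """Breadth-first crawl from start; returns (pages: url -> HTML, broken: failed URLs)."""
    pages, broken = {}, []
    seen, queue = {start}, deque([start])
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # could not retrieve: report as a broken link
            broken.append(url)
            continue
        pages[url] = body
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            target = urldefrag(urljoin(url, href))[0]
            if target not in seen and in_scope(target):
                seen.add(target)
                queue.append(target)
    return pages, broken
```

Usage with an in-memory "site": `crawl("http://ex.org/", site.get, lambda u: u.startswith("http://ex.org/"))`, where `site` is a dict of URL to HTML, so any URL missing from the dict counts as broken.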

  22. Tool Categories
     • Link checkers
     • Web site monitors
     • Web crawlers
     • Site management
     • Change management
     • Site mapping (includes visualization)

  23. OAIS Issues
     • Pre-Ingest: selection options
     • Ingest: capture
       – vs. monitoring
       – targets, level, and frequency
     • Archival Storage: formats
     • Access: site(s) vs. page(s)
     • AIP: metadata issues

  24. Management Issues
     • frequency of capture, determined by:
       – nature of sites/pages
       – events: technological, organizational
       – resources
     • well-informed crawling
     • valuable vs. archival
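
Capture frequency can respond to how a site actually behaves. The following is a hypothetical adaptive schedule, not a VRC rule: a triggering event forces the shortest interval, an observed change halves the interval, and no change doubles it. Fixed bounds stand in for resource limits, and the 1- and 90-day defaults are arbitrary assumptions.

```python
def next_interval(current_days: float, changed: bool, event: bool = False,
                  min_days: float = 1.0, max_days: float = 90.0) -> float:
    """Hypothetical adaptive revisit interval (in days) for one site."""
    if event:
        # e.g. a server upgrade or organizational change: check again as soon as allowed
        return min_days
    proposed = current_days / 2 if changed else current_days * 2
    # Clamp to the resource-driven bounds.
    return max(min_days, min(max_days, proposed))
```

Halving and doubling converge quickly toward a site's real rate of change, while the bounds keep a static site from dropping out of monitoring entirely.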

  25. Mandate
     • to fully document the site by capturing all changes to the pages/sites
     • to capture significant changes to pages/sites
     • to record periodic versions of the site
     • to capture a one-time copy of pages/sites
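
Each mandate implies a different rule for deciding when to store a new copy of a page. A minimal sketch, assuming page text is compared in memory: "significant" change is approximated here with `difflib.SequenceMatcher` similarity below a hypothetical threshold of 0.9, which is only one of many possible measures.

```python
import difflib
from enum import Enum
from typing import Optional


class Mandate(Enum):
    ALL_CHANGES = "all"
    SIGNIFICANT_CHANGES = "significant"
    PERIODIC = "periodic"
    ONE_TIME = "once"


def should_capture(mandate: Mandate, previous: Optional[str], current: str,
                   threshold: float = 0.9, period_due: bool = False) -> bool:
    """Decide whether to store a new copy; previous is None if never captured."""
    if previous is None:
        return True  # every mandate needs at least one copy
    if mandate is Mandate.ONE_TIME:
        return False
    if mandate is Mandate.PERIODIC:
        return period_due
    changed = previous != current
    if mandate is Mandate.ALL_CHANGES:
        return changed
    # SIGNIFICANT_CHANGES: hypothetical similarity cut-off
    similarity = difflib.SequenceMatcher(None, previous, current).ratio()
    return changed and similarity < threshold
```

A production monitor would likely store digests or normalized text rather than whole pages, and would need to ignore trivial changes such as dates or counters before measuring similarity.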

  26. Current Activities
     • VRC Preservation Risk Management Program:
       – Map stages to tool requirements
       – Apply to potential organizational scenarios
       – Enable risk/response scenario development
     • Toolkit:
       – Revise and populate tool inventory
       – VRC Control Site

  27. Future Projects
     • Develop an approach for building a human sexuality collection: capturing Weblogs and other Internet communications
     • State government Web site case study
     • Demonstrators for toolkit scenarios

  28. For Discussion
     What would the VRC approach have to address to be of interest, value, and/or potential impact for archivists and records managers?
