Issues in Human Rights Web Archiving Robert Wolven Columbia University Libraries Human Rights Archives and Documentation, CHRDR Conference 4-6 October 2007
Libraries have a mission to build, organize, and preserve coherent collections for research • There’s a great deal of human rights-related content on the web • Much of it is not currently collected by libraries • Something should be done about that
A great deal of content exists only online • There’s a high risk that some will disappear • Libraries and archives are custodians of our cultural heritage • Libraries and archives should lead in preserving “at risk” content
Web Archiving as Preservation • Small footprint in organization • The Hoover effect • Haphazard library collections • Ineffective access
Much of it is not … collected • Refugees International • 40 documents on web site • 0 in Columbia collections • 10 listed in OCLC • 1 held by more than 2 libraries • No library holds more than 3
Web Archiving Issues • Ways and Means • Selection policies • Permissions – and Obligations • Organization and Integration • Presentation and Uses • Sharing the Costs and Benefits • Organizational Transformation
Center for Research Libraries’ Political Communications Web Archive Project • Project website: http://www.crl.edu/content/PolitWeb.htm • Final report: http://www.crl.edu/PDF/PCWAFinalReport.pdf
Web Archiving Tools • Archive-It (Internet Archive) • http://www.archive-it.org • PANDAS (National Library of Australia) • http://pandora.nla.gov.au/pandas.html • OCLC Digital Archive • http://www.oclc.org/digitalarchive/default.htm
“I only want to download text/html and nothing else. Can I do it?” • You can … add a filter that excludes all filters that end in other than 'html|htm', etc., or, if you want to instead look at document mimetypes, you can Add a ContentTypeRegExpFilter filter as a midfetch filter to the http fetcher. • [From Heritrix FAQ http://crawler.archive.org/faq.html#user-heritrix]
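The two options in the FAQ answer (filter on the URL ending, or filter on the document's MIME type) can be illustrated outside the crawler. The following is a minimal Python sketch of the matching logic only; it is not Heritrix configuration, and the function names and example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: illustrative patterns, not taken from Heritrix itself.
HTML_URL = re.compile(r"\.html?$", re.IGNORECASE)       # path ends in .html or .htm
HTML_MIME = re.compile(r"^text/html\b", re.IGNORECASE)  # Content-Type starts with text/html


def keep_by_url(url: str) -> bool:
    """Keep a URL only if its path ends in .html or .htm (query string ignored)."""
    return bool(HTML_URL.search(urlsplit(url).path))


def keep_by_content_type(content_type: str) -> bool:
    """Keep a response only if its Content-Type header declares text/html."""
    return bool(HTML_MIME.match(content_type.strip()))


if __name__ == "__main__":
    for url in ("http://example.org/report.html",
                "http://example.org/report.pdf",
                "http://example.org/"):
        print(url, keep_by_url(url))
    print(keep_by_content_type("text/html; charset=UTF-8"))
```

Note that the URL test rejects a bare address such as http://example.org/ even though it normally serves HTML, which is one reason to check the MIME type during the fetch instead.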
Policy Technology • Full web site or selected content? • Preserve relationships, “look and feel”? • All file types? • How often?
“Corrigendum Changes have been made to the report on "Muslims in the European Union – Discrimination and Islamophobia" after it was printed. Following pages are replacing the Annex page in the EN and FR version of the report. PDF” From the European Union Agency for Fundamental Rights website
Selection by Type of Agency • Governmental • International • Academic • Educational
Selection by Focus • Global, regional, local • Ethnicity, religion, gender, age • Legal, medical, economic • Crisis-driven
Selection by Content • Fixed documents: • Case studies • Position papers • Topical reports • Press releases • Bulletins, newsletters • Activity reports
Selection by Content • Non-textual (image, sound, video) • Ephemeral, dynamic content • Redundant (?) content: • Languages, formats • Republished or unique?
Rights and Obligations • Permissions: ask or assume • Rights: • Dark archive • Closed archive • Conditional exposure • Obligations: • Parallel (mirror) access • Free, reliable access • Perpetual access
Organization and Integration … or, now that we have it, how do we know what we’ve got? How do other agencies know what’s been done? How do researchers find it?
“From 1 March the European Monitoring Centre on Racism and Xenophobia (EUMC) became the EU Agency for Fundamental Rights (FRA). The content on the website is being gradually transformed to reflect the scope, activities and products of the new Agency.”
Integrating Access • Through Authority Control • Through controlled vocabulary • Through series
Integrating Collections • With print – in the catalog • With archives – in finding aids • With digital collections – in …
Use • Internal organization and navigation • Indexing • Analytical tools • Citation: pedigree and persistent links
Sharing Costs and Benefits • Centralized Collaboration • Distributed Collaboration • Disclosure (at what level of detail) • Exposure (to the web; OAI-PMH)
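As one illustration of "exposure" through OAI-PMH, a harvester asks a repository for records with a plain HTTP GET request. The sketch below only builds such a request URL from standard OAI-PMH arguments (verb, metadataPrefix, set, from); the base URL and set name are hypothetical, and no real archive's endpoint is implied.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    print(list_records_url("https://archive.example.org/oai",
                           set_spec="human-rights",
                           from_date="2007-10-01"))
```

The level of detail disclosed then depends on what the repository puts in each record, for example one Dublin Core record per archived site versus one per document.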
Transformative Action • Concept of “collecting” • Modes of selection • Bridging communities of practice