WEB ARCHIVING IN THE BRITISH LIBRARY • John Tuck, Head of British Collections • February 2004
BRITISH LIBRARY: CONTEXT • Created by British Library Act 1972. • National Library of the United Kingdom. • Origins from 1753. • One of world’s greatest research libraries. • 160 million collection items.
BRITISH LIBRARY: COLLECTION DEVELOPMENT • Building as completely as possible the UK national published archive - current and retrospective gap filling; print and electronic. • Collecting research-level English-language material published world-wide in the humanities, social sciences, and STM (science, technology and medicine). • Buying foreign-language material selectively. • Material acquired through: legal deposit, voluntary deposit from publishers, purchase, donation, exchange.
LEGISLATION • Legal Deposit Libraries Act 2003: enabling legislation. • VDEP: Voluntary Deposit of Electronic Publications.
DOMAIN.UK • Six-month experiment to select and capture 100 UK web-sites, 2001. • Audit change, loss, links, etc. • Determine next steps.
DOMAIN.UK: Why? • Short-lived nature/changing content of many web-sites. • Loss of information. • Increasing reference to web-sites in research/scholarship.
DOMAIN.UK: Voluntary/Rights Cleared Approach • Voluntary. • Requiring explicit agreement of website publishers to take part in pilot. • No public access.
DOMAIN.UK: Selection • Websites of historical or cultural significance. • Cross-section of Dewey Decimal Classification.
DOMAIN.UK: Process • E-mail selected sites for approval and to check whether already archived. • Measure sites for links, size, change, etc. • Frequency of visits: every three weeks or more in some cases. • Supported by those sites approached. • Report recommended scaling up.
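The "measure sites for links, size, change" step can be illustrated with a minimal sketch. This is not the tooling used in the Domain.uk pilot, which the slides do not describe; the names `LinkCounter`, `measure_capture` and `has_changed` are hypothetical, and it assumes each capture is available as an HTML string. Change is detected here by comparing content digests, which is only one of several possible measures.

```python
import hashlib
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def measure_capture(html: str) -> dict:
    """Return size, link count and a content digest for one captured page."""
    parser = LinkCounter()
    parser.feed(html)
    data = html.encode("utf-8")
    return {
        "size_bytes": len(data),
        "link_count": len(parser.links),
        "digest": hashlib.sha256(data).hexdigest(),
    }


def has_changed(earlier: dict, later: dict) -> bool:
    """Two captures differ if their content digests differ."""
    return earlier["digest"] != later["digest"]


# Illustrative data only: two visits to the same page, three weeks apart.
first_visit = measure_capture('<html><a href="a.html">A</a></html>')
second_visit = measure_capture(
    '<html><a href="a.html">A</a><a href="b.html">B</a></html>'
)
print(has_changed(first_visit, second_visit))
```

A digest comparison flags any byte-level difference, so a real audit would probably need to ignore trivial changes such as timestamps or rotating banners.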
BRITISH LIBRARY WEB ARCHIVING PROGRAMME • Building on Domain.uk. • BL to play leading role in collecting UK web presence in partnership with other institutions nationally and internationally. • Selective approach.
BRITISH LIBRARY WEB ARCHIVING PROGRAMME contd. • Co-ordinate a snapshot of entire UK web presence at occasional intervals. • Achieve more regular capture of limited and well-defined range of sites. • Sites judged to be research-level, whether in terms of stated intentions of sites themselves or of potential to be primary resources for research.
WEB ARCHIVING PROGRAMME • Comprises a series of complementary projects and activities. • Operates entirely on a voluntary, rights-cleared basis pending secondary legal deposit legislation. • Aims to embed web archiving within the BL's overall collection development policy. • Aims to provide the infrastructure to collect, preserve and make accessible web-site material alongside material in other formats.
WEB ARCHIVING PROGRAMME STRANDS • Four main strands: • Definition of collection development policy. • UK Web Archiving Consortium. • International Internet Preservation Consortium. • Internet Archive: incunabula of the internet.
COLLECTION DEVELOPMENT • Appointment of Curator, Web Archiving. • Extension of policy defined for Domain.uk. • Sites of national, historical and cultural significance. • Research level now/in the future.
UK WEB ARCHIVING CONSORTIUM • Two-year project. • Six partners: BL (lead); National Library of Scotland, National Library of Wales, National Archives, Joint Information Systems Committee, Wellcome Library. • Plan to use PANDAS software developed by National Library of Australia. • Rights to use individual sites to be cleared with rights-holders.
UK WEB ARCHIVING CONSORTIUM contd. • Procurement exercise in progress to recruit supplier to host service. • Intention to let contract in April 2004 and to be operational in summer 2004. • Sites to be made accessible to users. • Each partner to collect up to 500 sites per year, i.e. 6,000 during project.
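The 6,000 figure follows from the numbers already given for the consortium: six partners, up to 500 sites each per year, over a two-year project. A short check of that arithmetic, with constant and function names chosen here purely for illustration:

```python
# Figures taken from the consortium slides; names are illustrative only.
PARTNERS = 6                      # BL, NLS, NLW, National Archives, JISC, Wellcome Library
SITES_PER_PARTNER_PER_YEAR = 500  # upper limit per partner
PROJECT_YEARS = 2                 # two-year project


def project_ceiling(partners: int, sites_per_year: int, years: int) -> int:
    """Maximum number of sites collected across all partners over the project."""
    return partners * sites_per_year * years


total = project_ceiling(PARTNERS, SITES_PER_PARTNER_PER_YEAR, PROJECT_YEARS)
print(total)  # 6000
```

Because 500 is a ceiling per partner, 6,000 is the maximum for the project rather than a forecast.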
INTERNATIONAL INTERNET PRESERVATION CONSORTIUM • Project involving national libraries. • Led by Bibliothèque nationale de France. • Also includes BL, Library of Congress, Library and Archives Canada, Nordic countries, Italy, Australia, Internet Archive.
INTERNATIONAL INTERNET PRESERVATION CONSORTIUM contd. • Aims to develop automated web-crawler mechanism. • Open-source tools to search web at regular intervals matching agreed collection development policies. • Working groups on: access tools, content management, deep web, framework, metrics and test-beds, researcher requirements. • Developmental at this stage.
INTERNET ARCHIVE • Collecting and saving sites since 1997. • Wayback Machine. • Legal, technical and procurement issues.
SOME CHALLENGES • Defining UK. • Rapid technology change. • Third party rights (not always subject to UK law). • Libel/defamation issues. • Software issues / which platform? • Validity of a snapshot.
SOME CHALLENGES contd. • Formats for archiving. • Metadata standards. • Archiving ‘look and feel’. • Authenticity.