How to Face the Challenges of Web Archiving? The experiences of a small library on the edge
Chloe Martin, Internet Memory; Catherine Ryan, National Library of Ireland
LIBER 2012 - 1

Context: National Library of Ireland
• Beginnings: Established by the Dublin Science and Art Museum Act, 1877
• Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”
• The Digital Record: Born Digital Programme established in 2010, covering web archiving
• Web Archive Projects: 2 pilot projects in 2011

Context: Internet Memory
• European Archive / Internet Memory Foundation
  • Established in 2004 in Amsterdam (offices also in Paris)
  • Mission: to preserve Web content as a new medium for current and future generations
  • Actions: awareness-raising, partnerships, R&D
  • Open Access Collections: UK National Archives & Parliament, PRONI, CERN and the National Library of Ireland
• Internet Memory Research
  • Spin-off of IM established in June 2011 in Paris
  • Missions: to operate large-scale or selective crawls and to develop new technologies (crawl, access, processing and extraction)

Web Archiving Project: Project Origins – National Library of Ireland
• Building a 21st Century Library:
  • Born Digital
  • Digitisation
  • Single Integrated Catalogue
  • Digital Repository
  • OSCAIL, the Digital Library Programme

Web Archiving Project: Project Origins – National Library of Ireland
• Born Digital Materials:
  • A natural progression for NLI’s strong political, cultural and historical collections
  • How best to approach this in a time of unprecedented financial difficulty?
  • Born Digital Programme established to examine requirements and produce a policy document for the next steps

Web Archiving Project: Project Origins – National Library of Ireland
• The Hand of History:
  • Snap General Election
  • Five weeks

Web Archiving Project: Project Origins – National Library of Ireland
• Just do it
• How?

Web Archiving Project: Project Origins – National Library of Ireland
• Collaborative Partnership:
  • A partner that suited our requirements and had experience with others in the cultural sector
• Requirements:
  • Technical skills: the NLI had them in-house, but those staff were working on other projects, so we needed a partner to supply them
  • Leverage NLI’s strong curatorial experience, esp. in politics
  • Fast!

Web Archiving Project: Project Origins – National Library of Ireland
• Project phases:
  • Project scoping and contract
  • Site selection
  • Permissions gathering
  • QA (look and feel)
  • Publication and promotion

Site Selection and Permissions – National Library of Ireland
• Selection Criteria:
  • Website presence
  • Technical reasons
  • Cut-off date
  • Women candidates
• Permissions:
  • All sites contacted and provided with a brief
  • Pressurised but necessary phase

Scope of Projects – National Library of Ireland
• General Election:
  • Crawl: 200 snapshots
  • Scope: 100 seeds
  • Frequency: 2 times
  • Date: Feb. 2011
• Presidential Election:
  • Crawl: 80 snapshots
  • Scope: 70 seeds
  • Frequency: 3 times
  • Date: Oct.-Nov. 2011

Crawl – Internet Memory
• Seeds validation:
  • URLs, duplication, redirection, external links, dynamic websites
• Scope parameters:
  • Domain, host and path; social Web content; frequency; robots.txt exclusion; politeness
• Specific incidents: technical changes made on the fly
  • Modification of scope; pending crawls; adjustment of politeness settings
• Improvements for the second crawl

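The crawl slide names the scope parameters without showing how they act on a discovered URL. As a rough illustration only, assuming a Python crawler rather than whatever tooling Internet Memory actually used, the domain/host/path scope modes, robots.txt exclusion and a politeness delay can be sketched with the standard library's `urllib.parse` and `urllib.robotparser`. The function names, the `archive-bot` user agent and the example hosts are hypothetical, and the `domain` mode naively strips a leading `www.` instead of consulting a public suffix list.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

AGENT = "archive-bot"  # hypothetical crawler user agent


def in_scope(url: str, seed: str, mode: str = "host") -> bool:
    """Decide whether a discovered URL belongs to a seed's crawl scope."""
    u, s = urlsplit(url), urlsplit(seed)
    host, seed_host = u.hostname or "", s.hostname or ""
    if mode == "domain":
        # Naive approximation: drop a leading "www." and accept the
        # remaining domain plus any of its subdomains.
        base = seed_host[4:] if seed_host.startswith("www.") else seed_host
        return host == base or host.endswith("." + base)
    if mode == "host":
        # Exactly the seed's host, any path.
        return host == seed_host
    if mode == "path":
        # The seed's host, restricted to URLs under the seed's path prefix.
        return host == seed_host and u.path.startswith(s.path)
    raise ValueError(f"unknown scope mode: {mode!r}")


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched for a host."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def should_fetch(url: str, seed: str, mode: str, robots: RobotFileParser,
                 obey_robots: bool = True) -> bool:
    """Apply the scope check first, then (optionally) robots.txt exclusion."""
    if not in_scope(url, seed, mode):
        return False
    return not obey_robots or robots.can_fetch(AGENT, url)


def politeness_delay(robots: RobotFileParser, default: float = 1.0) -> float:
    """Seconds to wait between requests to one host: Crawl-delay if set."""
    delay = robots.crawl_delay(AGENT)
    return float(delay) if delay is not None else default
```

For example, with a robots.txt of `User-agent: *`, `Disallow: /private/` and `Crawl-delay: 2`, `should_fetch` accepts `http://www.example.org/manifesto.html` for the seed `http://www.example.org/` in `host` mode, rejects anything under `/private/` unless `obey_robots=False`, and `politeness_delay` returns `2.0`. Adjusting politeness "on the fly", as the slide describes, would amount to changing that delay for a host mid-crawl.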
Quality Assurance (QA) – National Library of Ireland
• Manual QA
• Jira software
• IM: technical QA
• NLI: ‘look and feel’ QA
• Multiple browsers
• Communication with site owners (building relationships and promotion)

Quality Assurance (QA) – Internet Memory
• Why?
• How?
• Manual and visual method: homepage + 2
• Resolution of issues
• Temporal coherence

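The QA slide lists temporal coherence without defining it. In web archiving the term usually refers to whether a page and the resources it embeds (images, stylesheets, scripts) were captured close enough in time that the replayed page shows a state that actually existed. A minimal sketch of one way to flag suspect captures follows, assuming each resource's capture timestamp is available in the 14-digit `YYYYMMDDhhmmss` form and using an arbitrary 24-hour threshold; the function name and threshold are hypothetical and not Internet Memory's QA procedure.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# 14-digit capture timestamp (YYYYMMDDhhmmss), the form used in
# Wayback-style archive URLs.
TS_FORMAT = "%Y%m%d%H%M%S"


def incoherent_resources(page_ts: str, resources: Dict[str, str],
                         max_skew: timedelta = timedelta(hours=24)
                         ) -> List[Tuple[str, timedelta]]:
    """Return (url, skew) for embedded resources captured too far from the page."""
    page_time = datetime.strptime(page_ts, TS_FORMAT)
    flagged = []
    for url, ts in sorted(resources.items()):
        # Absolute distance between resource capture and page capture.
        skew = abs(datetime.strptime(ts, TS_FORMAT) - page_time)
        if skew > max_skew:
            flagged.append((url, skew))
    return flagged
```

A page captured on 20 February 2011 whose banner image was only captured on 1 March would be flagged, while a logo captured half an hour after the page would not. Such a check only narrows down candidates; the manual, visual review described on the slide would still decide whether the replayed page is acceptable.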
Access – National Library of Ireland
• Available to the public
• Full-text search
• IM website: search by keyword or URL
• NLI catalogue: keyword search via a widget developed by the NLI IS team and IM
• Future: access through NLI’s own interfaces; open issue of integrating results

Publication and Promotion – National Library of Ireland
• NLI social media initiative (Twitter and blog)
• Project participants
• Print media (esp. technology coverage)
• And IM!
• Usage figures have increased, but the real value will become more apparent in 5-10 years

Usage Statistics of the Web Archive – National Library of Ireland
• 21/09/2011: Official launch of NLI Web archives (Tweets)
• 26/10/2011: Blog post on nli.ie/blog and article in thejournal.ie
• 25/11/2011: Article on irishtimes.com
• 20/01/2012: Article on irishtimes.com
• 17/03/2012: Post on soundofthearchives.wordpress.com
• 04/05/2012: Article on irisheconomy.ie

Advantages of Web Archiving – National Library of Ireland
• Web archiving:
  • New opportunities for delivering materials to users
  • Meets existing users’ expectations that content be online
  • Reaches new audiences

Advantages of Web Archiving – National Library of Ireland
• Political web archives, e.g. the Irish General Election:
  • Researchers can compare online content pre- and post-election
  • Facilitates research into how ‘online’ this election was
  • Helps assess the impact of technological developments on campaign communications
  • Provides a record of campaign information

Benefits of Working Together – National Library of Ireland
• Pilot project for a long-term activity:
  • Allowed us to enter a new collecting area despite a lack of technical expertise
  • Facilitated collection of important material that no one else was collecting
  • Collected material quickly
  • Leveraged curatorial skills
  • Gained new technical skills

Benefits of Working Together – Internet Memory
• To support the development of Web archiving initiatives
• To operate rapid deployment of Web archives
• To address new challenges in this area:
  • Social media content
  • QA
  • Automation

Conclusion
• General Election:
  • 18,495,771 URLs
  • 1.14 TB
  • 10,405 ARCs
• Presidential Election:
  • 7,333,399 URLs
  • 278.10 GB
  • 2,513 ARCs
• View the NLI collections at:
  • http://www.nli.ie/en/udlist/digital-collections.aspx
• View the Web archive blog entry at:
  • http://www.nli.ie/blog/index.php/2011/10/26/general-election-2011-web-archiving/
• View Internet Memory Collections at:
  • http://collections.europarchive.org/
To be continued…

Questions? Thanks for your attention!
Catherine Ryan, National Library of Ireland: http://www.nli.ie, cryan@nli.ie, @NLIreland
Chloe Martin, Internet Memory: http://internetmemory.org, chloe@internetmemory.net, @InternetMemory