
How to Face the Challenges of Web Archiving?

The experiences of a small library on the edge. Chloe Martin, Internet Memory; Catherine Ryan, National Library of Ireland.


Presentation Transcript


  1. How to Face the Challenges of Web Archiving? The experiences of a small library on the edge. Chloe Martin, Internet Memory; Catherine Ryan, National Library of Ireland. LIBER 2012 - 1

  2. Context: National Library of Ireland • Beginnings: Established by the Dublin Science and Art Museum Act, 1877 • Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”. • The Digital Record: Born Digital Programme established in 2010, covering web archiving. • Web Archive Projects: 2 pilot projects in 2011

  3. Context: Internet Memory • European Archive / Internet Memory Foundation • Established in 2004 in Amsterdam (offices also in Paris) • Mission: to preserve Web content as a new medium for current and future generations • Actions: Awareness-raising, partnerships, R&D • Open Access Collections: UK National Archives & Parliament, PRONI, CERN and The National Library of Ireland • Internet Memory Research • Spin-off of IM established in June 2011 in Paris • Missions: to operate large-scale or selective crawls & develop new technologies (crawl, access, processing and extraction)

  4. Web Archiving Project: Project Origins National Library of Ireland • Building a 21st Century Library: • Born Digital • Digitisation • Single Integrated Catalogue • Digital Repository • OSCAIL, the Digital Library Programme

  5. Web Archiving Project: Project Origins National Library of Ireland • Born Digital Materials: • Natural progression for NLI’s strong political, cultural and historical collections • How best to approach this in a time of unprecedented financial difficulty? • Born Digital Programme established to examine requirements and produce a policy document for the next steps

  6. Web Archiving Project: Project Origins National Library of Ireland • The Hand of History: • Snap General Election • Five Weeks

  7. Web Archiving Project: Project Origins National Library of Ireland • Just do it

  8. Web Archiving Project: Project Origins National Library of Ireland • Just do it • How?

  9. Web Archiving Project: Project Origins National Library of Ireland • Collaborative Partnership: • Partner that suited our requirements and that had experience with others in the cultural sector • Requirements: • Technical skills: present in the NLI but committed to other projects, so the partner needed to supply them • Leverage NLI’s own strong curatorial experience, esp. in politics • Fast!

  10. Web Archiving Project: Project Origins National Library of Ireland • Project phases: • Project scoping and contract • Site selection • Permissions gathering • QA (look and feel) • Publication and promotion

  11. Site Selection and Permissions National Library of Ireland • Selection Criteria: • Website presence • Technical reasons • Cut-off date • Women candidates • Permissions: • All sites contacted and provided with a brief • Pressurised but necessary phase

  12. Scope of projects National Library of Ireland • General Election: • Crawl: 200 snapshots • Scope: 100 seeds • Frequency: 2 times • Date: Feb. 2011 • Presidential Election: • Crawl: 80 snapshots • Scope: 70 seeds • Frequency: 3 times • Date: Oct-Nov. 2011

  13. Crawl Internet Memory • Seeds Validation: • URLs, Duplication, Redirection, External links, Dynamic websites • Scope Parameters: • Domain, host and path; Social Web content; Frequency; Robots.txt file exclusions; Politeness • Specific incidents → technical changes on the fly • Modification of scope; Pending crawls; Adaptation of the politeness • Improvement of second crawl
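
The seed validation and scope parameters listed on this slide can be illustrated with a minimal Python sketch. It is not Internet Memory's actual tooling: the function names (`normalize_seed`, `validate_seeds`, `in_scope`), the normalization rules, and the `example.org` URLs are all assumptions made for illustration, and the domain-level check uses a naive "last two labels" guess that would not handle suffixes such as `.co.uk`.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_seed(url: str) -> str:
    """Lower-case the host, drop the fragment, and default the path to '/'."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a crawlable seed: {url!r}")
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def validate_seeds(urls):
    """Return (unique normalized seeds, rejected inputs), preserving input order."""
    seen, seeds, rejected = set(), [], []
    for url in urls:
        try:
            seed = normalize_seed(url)
        except ValueError:
            rejected.append(url)
            continue
        if seed not in seen:  # duplication check
            seen.add(seed)
            seeds.append(seed)
    return seeds, rejected


def in_scope(url: str, seed: str, level: str = "host") -> bool:
    """Decide whether url falls inside the crawl scope of seed.

    level is one of 'domain', 'host' or 'path' (hypothetical names for the
    three scope parameters mentioned on the slide).
    """
    u, s = urlsplit(url), urlsplit(seed)
    u_host, s_host = (u.hostname or ""), (s.hostname or "")
    if level == "domain":
        # Naive guess at the registered domain: last two labels of the seed host.
        domain = ".".join(s_host.split(".")[-2:])
        return u_host == domain or u_host.endswith("." + domain)
    if level == "host":
        return u_host == s_host
    if level == "path":
        return u_host == s_host and u.path.startswith(s.path)
    raise ValueError(f"unknown scope level: {level}")
```

For example, `validate_seeds(["HTTP://Example.ORG", "http://example.org/#top", "example.org"])` keeps one normalized seed and rejects the scheme-less entry. Redirection checks, robots.txt handling and politeness delays need network access and crawler configuration, so they are left out of this sketch.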

  14. Quality Assurance (QA) National Library of Ireland • Manual QA • Jira software • IM – Technical QA • NLI – ‘Look and Feel’ QA • Multiple browsers • Communication with site owners (building relationships and promotion)

  15. Quality Assurance (QA) Internet Memory • Why? • How? • Manual and visual method: homepage + 2 • Resolution of issues • Temporal Coherence

  16. Access National Library of Ireland • Available to the public • Full text search • IM website – search by keyword, URL • NLI catalogue – keyword via widget developed by NLI IS team and IM • Future – access through NLI’s own interfaces, issue of integrating results

  17. Publication and Promotion National Library of Ireland • NLI social media initiative (Twitter and blog) • Project participants • Print media (esp. in area of technology) • And IM! • Usage figures have increased, but the real value will be more apparent in 5-10 years

  18. Usage Statistics of Web Archive National Library of Ireland • 21/09/2011: Official launch of NLI Web archives (Tweets) • 26/10/2011: Blog post on nli.ie/blog and article in thejournal.ie • 25/11/2011: Article on irishtimes.com • 20/01/2012: Article on irishtimes.com • 17/03/2012: Post on soundofthearchives.wordpress.com • 04/05/2012: Article on irisheconomy.ie

  19. Advantages of Web Archiving National Library of Ireland • Web archiving: • New opportunities for delivery of materials to users • Work with existing users’ expectations that content be online • Reach new audiences

  20. Advantages of Web Archiving National Library of Ireland • Political web archives; Irish General Election: • Researchers can compare online content pre- and post-election • Facilitates research into how ‘online’ this election was • Assess impact of technological developments in campaign communications • Record of campaign information

  21. Benefits of Working Together National Library of Ireland • Pilot project for a long-term activity: • Allowed us to enter a new collecting area despite lack of tech expertise • Facilitated collection of important material that no one else was collecting • Collect material quickly • Leverage curatorial skills • Gained new technical skills

  22. Benefits of Working Together Internet Memory • To support the development of Web archiving initiatives • To operate rapid deployment of Web archives • To address new challenges in this area: • Social media content • QA • Automation

  23. Conclusion • General Election: • 18,495,771 URLs • 1.14 TB • 10,405 ARCs • Presidential Election: • 7,333,399 URLs • 278.10 GB • 2,513 ARCs • View the NLI collections at: • http://www.nli.ie/en/udlist/digital-collections.aspx • View the Web archive blog entry at: • http://www.nli.ie/blog/index.php/2011/10/26/general-election-2011-web-archiving/ • View Internet Memory Collections at: • http://collections.europarchive.org/ To be continued…
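
As a rough sanity check on the figures above, the average ARC file size and the number of URLs per ARC can be derived from the totals. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which the slide does not specify, and the helper names are made up for illustration.

```python
TB = 1e12  # assumption: decimal terabyte
GB = 1e9   # assumption: decimal gigabyte


def average_arc_size_mb(total_bytes: float, arc_count: int) -> float:
    """Average size of one ARC file, in decimal megabytes."""
    return total_bytes / arc_count / 1e6


def urls_per_arc(url_count: int, arc_count: int) -> float:
    """Average number of captured URLs stored in one ARC file."""
    return url_count / arc_count


# Figures taken from the Conclusion slide.
general_mb = average_arc_size_mb(1.14 * TB, 10_405)
presidential_mb = average_arc_size_mb(278.10 * GB, 2_513)

print(f"General Election:      {general_mb:.1f} MB per ARC, "
      f"{urls_per_arc(18_495_771, 10_405):.0f} URLs per ARC")
print(f"Presidential Election: {presidential_mb:.1f} MB per ARC, "
      f"{urls_per_arc(7_333_399, 2_513):.0f} URLs per ARC")
```

Under these assumptions both crawls average roughly 110 MB per ARC file, which suggests the two collections were written with a similar file-size setting.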

  24. Questions? Thanks for your attention! Catherine Ryan, National Library of Ireland, http://www.nli.ie, cryan@nli.ie, @NLIreland. Chloe Martin, Internet Memory, http://internetmemory.org, chloe@internetmemory.net, @InternetMemory.
