IIPC GA Curator Tools Fair, May 2014
WEB CURATOR TOOL
Nicola Bingham, Web Archivist

Introduction
• Jointly developed by the British Library (BL) and the National Library of New Zealand (NLNZ) in 2006 under the auspices of the IIPC
• WCT manages the selective web harvesting process
• Designed for use in libraries by non-technical users
• Open source
• Uses the Heritrix web crawler

What it does and doesn’t do
• Appraisal and selection: choosing websites for capture
  • Subject specialists, curators, external agencies
  • BL uses a selection permission tool plugged into WCT
• Metadata/Description
  • Basic Dublin Core metadata
  • Titles, description, subject and collection tagging
• Scoping and Data Capture
  • Scheduling
  • Crawl parameters, e.g. path depth, size of download
• QA and Analysis
  • Heritrix log files
  • Browse tools
  • Recommendations based on indicators

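The metadata and crawl-parameter bullets above can be pictured as a single record per Target. The following is a minimal sketch only: the class names, field names, default limits, and the `in_scope` check are all hypothetical illustrations and are not taken from WCT's actual data model or from Heritrix's configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count the non-empty path segments of a URL."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


@dataclass
class CrawlProfile:
    # Hypothetical limits; the default values are arbitrary placeholders.
    max_path_depth: int = 20
    max_download_bytes: int = 500 * 1024 * 1024


@dataclass
class Target:
    # Hypothetical record combining descriptive metadata and crawl scope.
    title: str
    seed_urls: List[str]
    description: str = ""
    subjects: List[str] = field(default_factory=list)
    collections: List[str] = field(default_factory=list)
    profile: CrawlProfile = field(default_factory=CrawlProfile)

    def dublin_core(self) -> Dict[str, List[str]]:
        """Expose the descriptive fields under basic Dublin Core element names."""
        return {
            "dc:title": [self.title],
            "dc:description": [self.description] if self.description else [],
            "dc:subject": list(self.subjects),
        }

    def in_scope(self, url: str, bytes_downloaded: int) -> bool:
        """True if the URL is shallow enough and the size budget is not used up."""
        return (
            path_depth(url) <= self.profile.max_path_depth
            and bytes_downloaded < self.profile.max_download_bytes
        )
```

For example, a Target whose profile sets `max_path_depth=1` would treat `http://example.org/a/b` as out of scope, while the default profile would accept it.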
What it does and doesn’t do (continued)
• Storage and Organisation
  • WARC files created in WCT
  • Passed out of WCT for indexing and long-term storage
• Access/Use/Reuse
  • Wayback is plugged in as the access tool
  • Harvested sites can be viewed within the tool
• Risk Management
  • Harvest Authorisation module, rights metadata
  • Records the outcome of publisher communications
  • Controls the display of Targets

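To make the "WARC files" bullet more concrete, here is a rough sketch of what a single WARC record looks like on disk: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. It assumes the WARC/1.0 layout and uses only the Python standard library; the function names are hypothetical, and this is not how WCT or Heritrix write their files (which are normally also gzip-compressed per record).

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialise one uncompressed 'resource' record in WARC/1.0 layout."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record ends with two CRLFs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def parse_warc_record(record: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split one uncompressed record into (version, header fields, content block)."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    length = int(fields["Content-Length"])
    return lines[0], fields, rest[:length]
```

Reading the block by `Content-Length` rather than by searching for a terminator is what lets the payload itself contain blank lines.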
Development
• Latest version, 1.6.1, available now
  • UI: new features and improvements (×17), including:
    • Date pickers for date fields
    • Scheduling heat map
    • Harvest optimisation
  • Bug fixes (×11)
  • Development-related changes, e.g. no longer need to install an Apache Tomcat server, database, etc.
• NLNZ budgeted NZD 50,000 for 2014-15
• Open development process up to all WCT users
• WCT pages: http://webcurator.sourceforge.net/
• Wiki: http://sourceforge.net/projects/webcurator/ (code, support, mailing lists, bug tracker)

Thank you.
UK Web Archive
http://www.webarchive.org.uk
http://britishlibrary.typepad.co.uk/webarchive/
@UKWebArchive
Nicola.Bingham@bl.uk