Preparing the 1911 census for digitisation Dr Anna Bülow, Head of Preservation 1 October 2011, Celebrating the Census
1911 Census (RG 14) • Census of England and Wales of 2 April 1911 • 34,998 volumes • Arranged according to geographical district • Approximately 8 million schedules • 530 x 315 mm (bigger than A3) • Written on both sides: official address on one side, details of the people at that address on the other side • Enumerators’ Summary Books (RG 78) (2,015 pieces)
Background: how to go about scanning the 1911 census? • Single supplier contract • consortium possible, but 1 lead supplier • 2 contracts • 1 supplier to scan, 1 supplier for online development and maintenance • Total in-house development • TNA to develop and manage all contracts for scanning, online service and support • Service management contract • TNA to let contract for development and management
Contracting a supplier • OJEU notice (Official Journal of the European Union) • Tender throughout Europe • Conditions of performance (amongst others) • scanning must cause the absolute minimum of damage • records must be kept safe and secure at all times • Competitive dialogue • Contract awarded to ScotlandOnline, later BrightSolid • scanning subcontracted • transcription subcontracted
Condition - appearance • Extremely consistent • Standard volume: • 4 holes along the edge of the spine • schedules held in place through 2 long green tags, with 2 bows on top • soft linen spine • hard cover • belt riveted to cover
Condition - damage • 1911 census was accessioned in 1966 • Closed volumes were stored off-site • Not boxed • Water damage and subsequent mould growth • Unclear when damage occurred • Damage had to be dealt with to • ensure optimal image quality • minimise risk during handling • prevent health & safety risks
Surveying • 4 staff surveyed between 19-23 July 2004 • Statistical sample: confidence level of 95% • 403 volumes (every 87th volume) • Focus on ease of scanning, image quality, and Health & Safety (mould)
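Taking every 87th volume is systematic sampling. A minimal sketch, assuming the volumes are numbered 1 to 34,998 in storage order and that the sample started at volume 1 (the starting point is not recorded on the slides), shows that an interval of 87 yields the 403 volumes surveyed:

```python
TOTAL_VOLUMES = 34_998   # RG 14 volumes
INTERVAL = 87            # "every 87th volume"


def systematic_sample(population_size: int, interval: int, start: int = 1) -> list[int]:
    """Return volume numbers start, start + interval, ... up to population_size.

    The default start of 1 is an assumption, not taken from the survey.
    """
    return list(range(start, population_size + 1, interval))


sample = systematic_sample(TOTAL_VOLUMES, INTERVAL)
print(len(sample))             # 403
print(sample[:3], sample[-1])  # [1, 88, 175] 34975
```

With a random start between 1 and 87, the sample would contain 403 volumes for starts 1 to 24 and 402 for starts 25 to 87.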
Survey results • Damage distributed throughout the entire series • Typical damage: tears, folds, curled edges • 7% mould damage • < 2% severe damage (521 volumes) • 2 volumes missing
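Sample percentages translate into estimated volume counts for the whole series. The sketch below assumes a normal-approximation confidence interval with a finite population correction; the slides do not say which method the survey used. It puts the 7% mould figure at roughly 2,450 volumes, with a margin of about ±2.5 percentage points (roughly 1,600 to 3,300 volumes). The 521 severely damaged volumes are consistent with 6 of the 403 sampled volumes (6/403 × 34,998 ≈ 521).

```python
import math

N = 34_998   # volumes in RG 14
n = 403      # volumes surveyed
Z_95 = 1.96  # normal critical value for 95% confidence


def estimate_count(p_hat: float, n: int, N: int, z: float = Z_95) -> tuple[float, float, float]:
    """Scale a sample proportion to the whole series.

    Returns (point estimate, lower bound, upper bound) in volumes, using a
    normal approximation with a finite population correction (an assumption).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    margin = z * se * fpc
    return p_hat * N, max(0.0, p_hat - margin) * N, (p_hat + margin) * N


point, low, high = estimate_count(0.07, n, N)
print(f"mould: ~{point:,.0f} volumes (95% CI {low:,.0f} to {high:,.0f})")
```

Because the 7% figure is itself rounded, the interval is indicative only.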
Labels • Original labels falling off the spines • Re-label all 34,998 volumes • Identify badly damaged volumes for preparation through Collection Care
Damage – folds • Corners • Across the schedules • Obscure information • To be dealt with by scanning team
Damage – minor tears • Usually along the outer edges • Where tears were smaller than 5 cm, scanning team would deal with them
Damage – major tears • Minimise risk of schedules ripping apart during scanning • Where schedules were in more than one piece, carried out through Collection Care • Where schedule was still together, it was put in polyester envelopes by scanning operator
Damage – crumpled edges • Sleeved by scanning team unless heavily damaged
Damage – mould • Presents health risks • Trained scanning team to recognise and report • Always cleaned through Collection Care within fume cabinet
Damage – stuck pages • Due to previous water damage • In a few cases whole volumes stuck together • Separating the schedules was unavoidable • Most time-consuming work in terms of preparation
Damage – ‘castor oil goo’ • 2 volumes with black ‘goo’ • All pages stuck together • Pages separated and sleeved • Sleeves remained after scanning
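The damage slides amount to a set of routing rules deciding who deals with each problem. A minimal sketch encoding them follows; the damage labels and parameter names are hypothetical, and two routes are assumptions the slides do not state: heavily crumpled edges and stuck or "goo"-affected pages are sent to Collection Care.

```python
from enum import Enum


class Route(Enum):
    SCANNING_TEAM = "handled by scanning team"
    SLEEVE = "sleeved in polyester by scanning team"
    COLLECTION_CARE = "Collection Care"
    FUME_CABINET = "Collection Care, cleaned in fume cabinet"


def route_damage(kind: str, *, tear_length_cm: float = 0.0,
                 in_pieces: bool = False, heavy: bool = False) -> Route:
    """Route one damaged schedule (hypothetical labels for damage kinds)."""
    if kind == "mould":
        return Route.FUME_CABINET          # health risk: always Collection Care
    if kind == "fold":
        return Route.SCANNING_TEAM
    if kind == "tear":
        if in_pieces:
            return Route.COLLECTION_CARE   # schedule in more than one piece
        if tear_length_cm < 5:
            return Route.SCANNING_TEAM     # minor tear, under 5 cm
        return Route.SLEEVE                # major tear, schedule still together
    if kind == "crumpled_edge":
        # Assumption: heavily damaged edges go to Collection Care
        return Route.COLLECTION_CARE if heavy else Route.SLEEVE
    if kind in ("stuck_pages", "goo"):
        # Assumption: separation is done by Collection Care
        return Route.COLLECTION_CARE
    raise ValueError(f"unknown damage kind: {kind!r}")


print(route_damage("tear", tear_length_cm=8).value)
```

A tear of exactly 5 cm is treated as major here, since the rule on the slides covers tears "smaller than 5 cm".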
Other issues – metal fastenings • Metal fastenings getting rusty • Difficult to remove as corroded metal would break • Taken out in order to separate sheets
Other issues – inserts • Some loose inserts within volumes • Some fastened inserts: adhered, pinned, tagged,… • Different size from schedules • Ensure correct association and sequence
Other issues – institutional booklets • Bound like schedules • Booklets meant that sheets became double the size • Spines were cut
Other issues – belts • Belts had sharp buckles • Complete removal considered • Schedules were not retagged and bound • Held together by cotton tapes
Other issues – binding • 2 options: • re-tag and bind • cotton tapes and box • Horizontal storage after digitisation
Dealing with damage – pilot studies • 2 pilot studies • First study involved 7 volumes resulting in inconclusive figures • Second pilot study • involved 3 conservators • for 20 weeks • from November 2005 • Just over 200 volumes were prepared during that time resulting in satisfactory figures on • total time estimates • cost estimates • space requirements
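The second pilot gives a throughput figure that can be scaled to other workloads. A minimal sketch, assuming preparation time scales linearly with volumes and staff and that the pilot volumes were representative; "just over 200" is rounded down to 200, and the 1,000-volume, 6-person scenario is purely hypothetical:

```python
def estimate_weeks(volumes_to_prepare: int, staff: int,
                   pilot_volumes: int = 200, pilot_staff: int = 3,
                   pilot_weeks: int = 20) -> float:
    """Extrapolate preparation time linearly from the pilot study.

    Pilot throughput is pilot_volumes / (pilot_staff * pilot_weeks)
    volumes per person-week; the calculation is rearranged to divide once.
    """
    return volumes_to_prepare * pilot_staff * pilot_weeks / (pilot_volumes * staff)


print(estimate_weeks(1_000, 6))  # 50.0 weeks for the hypothetical scenario
```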
Dealing with damage • Focus on • cost • speed • image quality
Image quality • Balance between: image quality / speed of capture / speed of downloading • 24-bit colour uncompressed TIFF, 300 dpi
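The cost of that choice can be estimated from the schedule size. A minimal sketch, assuming each image covers exactly one 530 x 315 mm schedule side with no margin and ignoring TIFF headers, puts the raw pixel data at about 70 MB per side; the function name is illustrative:

```python
MM_PER_INCH = 25.4


def uncompressed_tiff_bytes(width_mm: float, height_mm: float,
                            dpi: int = 300, bits_per_pixel: int = 24) -> int:
    """Approximate raw pixel data of one uncompressed image (headers ignored)."""
    width_px = round(width_mm / MM_PER_INCH * dpi)    # 530 mm -> 6260 px
    height_px = round(height_mm / MM_PER_INCH * dpi)  # 315 mm -> 3720 px
    return width_px * height_px * bits_per_pixel // 8


size = uncompressed_tiff_bytes(530, 315)
print(f"{size / 1e6:.0f} MB per side")  # 70 MB per side
```

Multiplied across millions of images, figures of this order explain why image quality had to be weighed against capture and download speed.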
Scanning equipment • AGFA S655 with modifications to accommodate historic documents • semi-automated sheet feed • straight path rather than drum • tray at back of scanner to collect scanned schedules
Space requirement • Six times as much space as the document itself, to accommodate: • document: un-scanned material, scanned material, box • equipment: computer, scanner
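Applied to a single schedule, the rule of thumb gives roughly one square metre per scanning station. This sketch assumes "size of the document" means the flat area of one 530 x 315 mm schedule and excludes operator and circulation space:

```python
SCHEDULE_W_M, SCHEDULE_H_M = 0.530, 0.315  # schedule size from the slides, in metres
SPACE_FACTOR = 6                            # "six times as much space" (assumed to mean area)

document_area = SCHEDULE_W_M * SCHEDULE_H_M
station_area = SPACE_FACTOR * document_area
print(f"{document_area:.3f} square metres per schedule, "
      f"about {station_area:.1f} square metres per station")
```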
Scanning operation • Within one of the Kew repositories • secure • fast in terms of production • easy to monitor • Some shelving was removed to accommodate the operation • Space adjusted to accommodate IT requirements (sockets, cables, etc.)
Scanning operation • Scanning was sub-contracted to third party • 5 scanning stations for schedules • 1 scanning station for book covers • Space for pre-preparation • Space for post-preparation • Scanning took place 12 hours a day (Monday – Friday) • 2 shifts
Scanning order • How long does it take to prepare? • How is the damage distributed? • The most damaged volumes took between 1 and 4 hours each, averaging around 2 hours per volume
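These per-volume figures give a preparation budget for any batch of badly damaged volumes. As an illustration only, the sketch applies them to the survey's 521 severely damaged volumes, assuming that category matches the "most damaged volumes" here:

```python
def preparation_hours(volumes: int, low_h: float = 1, high_h: float = 4,
                      mean_h: float = 2) -> tuple[float, float, float]:
    """Return (minimum, expected, maximum) preparation hours for a batch."""
    return volumes * low_h, volumes * mean_h, volumes * high_h


print(preparation_hours(521))  # (521, 1042, 2084)
```

Knowing where this workload falls in the series is what makes the scanning order matter: preparation has to stay ahead of the scanners.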
Scanning order • Scanning as stored • starts with London, Surrey, Kent,… • London was the most badly damaged • Scanning according to population size • starts with Lancashire, London, Yorkshire,… • best for phased release • Scanning according to ease • starts with Nottinghamshire, Gloucestershire, Worcestershire,… • maximise available preparation time • Final decision • scan in order
Scanning speed • Target rate of 40,000 images per day • ca. 1,000 images an hour per scanner • Scanners allowed for scanning both recto and verso simultaneously • Book covers were scanned separately
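The target can be checked against the station and shift figures. A minimal sketch, assuming images are counted per side, the book-cover station is excluded, and the rounded total of 18 million images applies: five scanners at 1,000 images an hour over a 12-hour day give 60,000 images, so the 40,000 target is about two thirds of nominal capacity, and 18 million images take 450 working days (90 five-day weeks, roughly 21 months).

```python
import math

SCANNERS = 5                 # schedule scanning stations
IMAGES_PER_HOUR = 1_000      # per scanner, approximate
HOURS_PER_DAY = 12           # two shifts, Monday to Friday
TARGET_PER_DAY = 40_000
TOTAL_IMAGES = 18_000_000    # final statistics, rounded

capacity = SCANNERS * IMAGES_PER_HOUR * HOURS_PER_DAY    # nominal images per day
utilisation = TARGET_PER_DAY / capacity                   # share of nominal capacity
working_days = math.ceil(TOTAL_IMAGES / TARGET_PER_DAY)   # days at target rate
weeks = working_days / 5                                  # five-day weeks
print(capacity, f"{utilisation:.0%}", working_days, weeks)  # 60000 67% 450 90.0
```

This is broadly in line with the 22 months of scanning on the timeline.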
Working with scanning company • Census was scanned through Advanced Data Services (ADS) • Working together before scanning to agree on • TNA security requirements (closed documents) • scanning equipment • lay-out of work space • workflow • scanning speed • preparation of volumes before and after scanning • Working together during scanning • document handling training • flagging up of problem documents
Timeline • July 2004: survey of 1911 census • November 2005: preparation for scanning started • June 2007: preparation for scanning finished (20 months) • July 2007: scanning started • April 2009: scanning finished (22 months) • 13 January 2009: online service launched with majority of English counties • March–April 2009: further English counties added • June 2009: Welsh counties added • 18 June 2009: launch complete • 3 January 2012: full, un-redacted release
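The durations in brackets match an inclusive count of calendar months: November 2005 and June 2007 are 19 months apart, but the work spans 20 calendar months, and July 2007 to April 2009 spans 22. A small sketch of that count (the function name is illustrative):

```python
def months_inclusive(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Count calendar months from start to end, including both end months.

    Dates are (year, month) tuples.
    """
    (y1, m1), (y2, m2) = start, end
    return (y2 - y1) * 12 + (m2 - m1) + 1


print(months_inclusive((2005, 11), (2007, 6)))  # 20 (preparation)
print(months_inclusive((2007, 7), (2009, 4)))   # 22 (scanning)
```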
Final statistics • Total number of volumes prepared: 2,136 (6.1%), of which 1,108 had damage codes • Total number of pages cleaned, separated, flattened or repaired: 53,128 • Total number of schedules sleeved: 14,282 • Time taken for preparation: 20 months, of which 5 months for the pilot • Time spent on preparation through Collection Care: 255 days • Time spent on preparation through agency staff: 231 days • Total number of images: 18 million • Total number of people involved: more than 350, of which 280 transcribed the census
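The percentage quoted for prepared volumes follows directly from the series total, and the same arithmetic shows that just over half of the prepared volumes carried damage codes:

```python
TOTAL_VOLUMES = 34_998
prepared = 2_136
with_damage_codes = 1_108

print(f"{prepared / TOTAL_VOLUMES:.1%} of volumes prepared")                       # 6.1%
print(f"{with_damage_codes / prepared:.0%} of prepared volumes had damage codes")  # 52%
```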
Acknowledgements • Too many individuals to list, but in particular our commercial partners: • BrightSolid (www.brightsolid.com) • Advanced Data Services (www.ads.uk.com) • Data Capture (www.datacapture.com)