Learn about a comprehensive workflow system for managing large-scale digitization projects, including image capture, editing, OCR, quality control, archiving, and online serving. Explore how the system streamlines processes and enhances image accessibility.
Image Workflow Processes
Elspeth Haston, Robert Cubey, Martin Pullan & David J Harris
Large scale digitisation programmes are becoming more common, resulting in:
• Large numbers of files – potentially nearly 3,000,000 for Edinburgh herbarium (E)
• High quality images
• Large file size – c. 150 MB each
• Images captured with minimal data records
These images need to be managed and made available, and the scale is too large for completely manual processes.
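To give a sense of the scale implied by these figures, the short sketch below multiplies the two numbers quoted on the slide. The function name is hypothetical and the result uses decimal units (1 TB = 1,000,000 MB); it is a rough estimate only, since c. 150 MB is an approximate per-file size.

```python
def estimated_storage_tb(n_images, mb_per_image):
    """Return the total storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return n_images * mb_per_image / 1_000_000


# Figures quoted above: nearly 3,000,000 files at c. 150 MB each.
total_tb = estimated_storage_tb(3_000_000, 150)
print(f"Approximate total: {total_tb:.0f} TB")
```

On these assumptions the full collection would need roughly 450 TB for the master files alone.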
Image workflow being developed at RBGE, incorporating:
• image capture
• automated image processing
• metadata recording
• optical character recognition
• quality control
• image streaming online
• archiving
[Workflow diagram: Capture Image, Edit Image, Save Image, Dropbox, Image polling & metadata capture, OCR, QC, Create jpg & zoomify, Save tiff & raw, Serve Online, Archive. Legend: System, User]
Capture Image
• Image captured using digital camera or scanner
Edit Image
• Image edited in Leaf Capture software and/or Adobe Photoshop
Save Image
• Images saved into folders
• Batches consisting of ¼ day’s work are checked for quality prior to being transferred
Dropbox
• A series of dropbox folders is used to facilitate parallel processing
• An internal folder structure contains the equipment and operator names, which form part of the metadata
Image polling & metadata capture
• The image management system polls the dropbox folders
• Any new image files are registered in a MySQL database and the metadata (equipment, operator, date, etc.) are recorded
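The polling step can be illustrated with a minimal sketch. It is not the RBGE implementation: the folder layout `<equipment>/<operator>/<file>`, the table and column names, and the list of image suffixes are all assumptions made for illustration, and Python's built-in `sqlite3` stands in for MySQL so that the example runs without a database server.

```python
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff"}  # assumption: which files count as images


def init_db(conn):
    """Create the (hypothetical) registration table if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               path TEXT PRIMARY KEY,
               equipment TEXT,
               operator TEXT,
               registered_at TEXT
           )"""
    )


def poll_dropbox(root, conn):
    """Register images under root/<equipment>/<operator>/ that are not yet known.

    Returns the relative paths that were newly registered on this pass.
    """
    new_files = []
    for path in sorted(root.glob("*/*/*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        rel = path.relative_to(root)
        equipment, operator = rel.parts[0], rel.parts[1]
        cur = conn.execute(
            "INSERT OR IGNORE INTO images VALUES (?, ?, ?, ?)",
            (rel.as_posix(), equipment, operator,
             datetime.now(timezone.utc).isoformat()),
        )
        if cur.rowcount == 1:  # 0 means the path was already registered
            new_files.append(rel.as_posix())
    conn.commit()
    return new_files


# Demo with a temporary dropbox folder and made-up names.
conn = sqlite3.connect(":memory:")
init_db(conn)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    folder = root / "scanner1" / "operatorA"
    folder.mkdir(parents=True)
    (folder / "E00000001.tif").write_bytes(b"placeholder")
    first_pass = poll_dropbox(root, conn)
    second_pass = poll_dropbox(root, conn)

print(first_pass)
print(second_pass)
```

Because the path is the primary key, a second poll of the same folder registers nothing new, which is the property a repeated polling loop relies on.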
Additional modular components
OCR
• A copy of the image is processed using ABBYY Optical Character Recognition (OCR) software
• The text is recorded in the MySQL database to facilitate searching
• A pdf is available to help users carry out additional data entry from the image
QC
• We are developing a quality control checking process
• It provides an interface for a user to open images and record a quality assessment
• It enables correction and appending or overwriting as appropriate
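As a rough illustration of how recorded OCR text can support searching, the sketch below stores text against an image identifier and runs a substring query. It does not call the ABBYY software; the OCR output is represented by made-up sample strings, the table, column, and function names are hypothetical, and `sqlite3` again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row of OCR text per image identifier.
conn.execute("CREATE TABLE ocr_text (barcode TEXT PRIMARY KEY, text TEXT)")


def record_ocr(conn, barcode, text):
    """Store (or replace) the OCR text for one image."""
    conn.execute(
        "INSERT OR REPLACE INTO ocr_text (barcode, text) VALUES (?, ?)",
        (barcode, text),
    )
    conn.commit()


def search_ocr(conn, term):
    """Return identifiers whose OCR text contains the term (substring match)."""
    rows = conn.execute(
        "SELECT barcode FROM ocr_text WHERE text LIKE ? ORDER BY barcode",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


# Made-up label text, for illustration only.
record_ocr(conn, "E00000001", "Flora of Bhutan. Collected near Thimphu.")
record_ocr(conn, "E00000002", "Flora of Nepal. Collected near Pokhara.")

print(search_ocr(conn, "nepal"))
print(search_ocr(conn, "Flora of"))
```

SQLite's `LIKE` is case-insensitive for ASCII text by default, so the lowercase query still finds the capitalised label. A production system would more likely use a full-text index than a substring scan.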
Create jpg & zoomify
• The image management system creates a jpg and a zoomify version of the image files
Save tiff & raw
• The tiff and the raw files are saved into a zip folder
Serve Online
• The zoomify files are served online, enabling users to zoom in and examine the specimen in detail
Archive
• The zip folders comprising the tiff and the raw file are then archived onto tape and external hard drives
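The "Save tiff & raw" step can be sketched with Python's standard `zipfile` module. This is an illustration under stated assumptions, not the system's actual code: the function name, the file names, the `.raw` extension, and the choice to name the zip after the tiff are all hypothetical. Creating the jpg and zoomify derivatives is not shown.

```python
import tempfile
import zipfile
from pathlib import Path


def archive_masters(tiff_path, raw_path, archive_dir):
    """Bundle one tiff and its raw file into a zip and check its integrity."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    zip_path = archive_dir / f"{tiff_path.stem}.zip"
    # Default ZIP_STORED (no compression) is used here as a neutral choice.
    with zipfile.ZipFile(zip_path, "w") as zf:
        for src in (tiff_path, raw_path):
            zf.write(src, arcname=src.name)
    # testzip() returns the first corrupt member name, or None if all are fine.
    with zipfile.ZipFile(zip_path) as zf:
        bad_member = zf.testzip()
    if bad_member is not None:
        raise RuntimeError(f"Corrupt archive member: {bad_member}")
    return zip_path


# Demo with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    tiff = work / "E00000001.tif"
    raw = work / "E00000001.raw"
    tiff.write_bytes(b"tiff bytes")
    raw.write_bytes(b"raw bytes")
    zip_path = archive_masters(tiff, raw, work / "archive")
    zip_name = zip_path.name
    with zipfile.ZipFile(zip_path) as zf:
        archived_names = sorted(zf.namelist())

print(zip_name, archived_names)
```

Verifying the zip before the originals are removed from working storage is a cautious design choice, since the archived copy becomes the only source for any later reprocessing.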
Image polling & metadata capture
• The location of each file is also recorded in the MySQL database
The image workflow system at RBGE has now processed over 130,000 images.
• The modular system has flexibility, but each new module may require access to the archived tiff files, and some level of reprocessing may be necessary
• It has proved unfeasible to maintain the tiff and raw files on a server
• During the development of the workflow, backlogs built up, which can have a large impact on image management and on the curation of the collections
The workflow is enabling us to manage the images effectively:
• The system helps with the integration of digitisation and curation in the herbarium
• Requests for images and data are easily managed, and users will shortly be able to download images and data directly
• The modular element will allow us to incorporate a georeferencing tool
• The workflow is allowing us to manage several large digitisation projects in an integrated system