This article discusses the significant milestones achieved in digitization, with a focus on Arabic content, at the BA Digital Lab. The Digital Laboratory is the cornerstone for premium quality digitization and is well-equipped for various types of media. The lab plays a crucial role in digitizing Bibliotheca Alexandrina's valuable collections. With a workforce of 120 staff members distributed over several teams, the lab operates 7 days a week in 2 shifts per day, working on multiple collections simultaneously. A workflow management system is essential to control and track the digitization process, and the Digital Assets Factory (DAF) serves as a digitization workflow management system. The Digital Assets Repository (DAR) is developed to maintain the institution's digital collections, providing public access to the digitized materials. The article also highlights capacity building efforts, such as workshops and collaborations with external organizations, to share the BA's technical expertise in digitization.
Capacity Building: Passing on the Experience • World Digital Library Arab Peninsula Regional Group meeting • Dr. Noha Adly
Reaching significant milestones in digitization With a focus on Arabic content
The Digital Laboratory: The Cornerstone for Premium Quality Digitization
Digital Laboratory • Digitizing various media, including slides in multiple formats, negatives, books, manuscripts, pictures and maps • Digitizing Bibliotheca Alexandrina’s valuable collections • Many of the Library’s projects are highly dependent on the digital laboratory
Digital Lab Manpower • 120 staff members • Distributed over several teams • Working 7 days/week, 2 shifts/day • Working on many collections simultaneously • A workflow and a workflow management system are essential to control and track the process
What is a Workflow? • A workflow is a well-defined sequence of operations, declared as the work of a [resource]*, during which documents, information or tasks are passed from one resource to another for action • It follows defined procedural rules • It has an estimated time • It can be documented • It can be learned *Resource: a person, a simple or complex mechanism, a group of persons, an organization of staff, or machines
Basic Digitization Workflow • Digitization Phase (“Scanning”): the hard copy is converted into a raw digital image • Processing Phase: the raw digital image is enhanced to achieve better image quality and better OCR accuracy • OCR Phase: the text corresponding to the processed image contents is extracted
Basic Digitization Workflow For each phase, we need to: • Define the specs of the output (Quality) • Set the procedure of work to guarantee quality • Calculate the required time • Whenever possible try to Automate tasks • Set Benchmarks to monitor the progress
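The fixed order of the three phases can be illustrated with a small sketch. This is not DAF code: the `Job` class, its methods and the phase identifiers are hypothetical names chosen for illustration, and the only thing taken from the slides is the scanning, processing, OCR sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Phase order as described on the slides; the identifiers are illustrative.
PHASES = ["scanning", "processing", "ocr"]


@dataclass
class Job:
    """Hypothetical record of one book moving through the workflow."""
    book_id: str
    completed: List[str] = field(default_factory=list)

    def next_phase(self) -> Optional[str]:
        """Return the next phase to run, or None when all phases are done."""
        if len(self.completed) < len(PHASES):
            return PHASES[len(self.completed)]
        return None

    def complete(self, phase: str) -> None:
        """Mark a phase as done, refusing phases that are out of order."""
        expected = self.next_phase()
        if phase != expected:
            raise ValueError(f"expected phase {expected!r}, got {phase!r}")
        self.completed.append(phase)
```

Enforcing the order in one place is one way to guarantee that OCR never runs on an image that has not been processed, which is the quality concern the slide raises.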
Why a Workflow Management System? • Automation of task handling • Progress tracking • Process management • Flexibility
Digital Assets Factory (DAF), the digitization workflow management system: 1. Automation of task handling
2. Progress tracking: workflow tracking, pending items, late jobs, employees’ rates, building customized reports
3. Process management: roles (permissions), job types, general settings, phases, employee accounts, workstations, collections
4. Flexibility
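As a rough illustration of the progress-tracking views named above (pending items and late jobs), the sketch below filters a list of job records. The `TrackedJob` fields and both helper functions are assumptions made for this example; the slides do not describe DAF's actual data model or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TrackedJob:
    """Hypothetical job record; the field names are not taken from DAF."""
    job_id: str
    phase: str
    due: date
    finished: Optional[date] = None


def pending_items(jobs: List[TrackedJob]) -> List[TrackedJob]:
    """Jobs that have not been finished yet."""
    return [job for job in jobs if job.finished is None]


def late_jobs(jobs: List[TrackedJob], today: date) -> List[TrackedJob]:
    """Pending jobs whose due date has already passed."""
    return [job for job in pending_items(jobs) if job.due < today]
```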
Targeted monthly production rate ≈ 5,000 books/month (1,800,000 pages) • How to reach the target?
Daily Rates (single shift) • Scanning: ≈ 3,000 pages/person • Processing: ≈ 3,000 pages/person • Latin OCR: ≈ 4,000 pages/person • Arabic OCR: ≈ 2,100 pages/person
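The target and the per-person rates above allow a back-of-the-envelope staffing estimate. The sketch below is only an illustration: it assumes 30 working days per month (the lab works 7 days a week) and 2 shifts per day, and it treats each OCR line as if the entire volume were in that one language, because the slides do not give the Latin/Arabic mix.

```python
TARGET_PAGES_PER_MONTH = 1_800_000
TARGET_BOOKS_PER_MONTH = 5_000
WORKING_DAYS_PER_MONTH = 30  # assumption: 7 days/week, roughly 30 days/month
SHIFTS_PER_DAY = 2

# Pages per person per single shift, as listed on the slide.
RATES = {
    "scanning": 3_000,
    "processing": 3_000,
    "latin_ocr": 4_000,
    "arabic_ocr": 2_100,
}


def staff_per_shift(pages_per_month: int, rate: int) -> int:
    """People needed in each shift for one phase, rounded up."""
    pages_one_person_month = WORKING_DAYS_PER_MONTH * SHIFTS_PER_DAY * rate
    return -(-pages_per_month // pages_one_person_month)  # integer ceiling


pages_per_book = TARGET_PAGES_PER_MONTH // TARGET_BOOKS_PER_MONTH
pages_per_shift = TARGET_PAGES_PER_MONTH // (WORKING_DAYS_PER_MONTH * SHIFTS_PER_DAY)
staffing = {phase: staff_per_shift(TARGET_PAGES_PER_MONTH, rate)
            for phase, rate in RATES.items()}

print(pages_per_book)   # 360 pages per book on average
print(pages_per_shift)  # 30000 pages per shift
print(staffing)
```

Under these assumptions each shift needs about 10 people for scanning, 10 for processing, and between 8 (all Latin) and 15 (all Arabic) for OCR.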
Monitoring • Rate/user (monitored during the shift) • User rate & Rate/shift report
Reporting • Weekly production • Monthly production
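The monitoring and reporting views above amount to grouping the same production log in different ways. The following is a minimal sketch under assumed names: the log format (date, user, pages) and the three functions are hypothetical and are not taken from DAF.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical log entry: (day, user, pages produced in that user's shift).
LogEntry = Tuple[date, str, int]


def rate_per_user(log: List[LogEntry], day: date) -> Dict[str, int]:
    """Pages produced by each user on a given day."""
    totals: Dict[str, int] = defaultdict(int)
    for entry_day, user, pages in log:
        if entry_day == day:
            totals[user] += pages
    return dict(totals)


def weekly_production(log: List[LogEntry]) -> Dict[Tuple[int, int], int]:
    """Total pages keyed by (ISO year, ISO week number)."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for entry_day, _user, pages in log:
        iso = entry_day.isocalendar()
        totals[(iso[0], iso[1])] += pages
    return dict(totals)


def monthly_production(log: List[LogEntry]) -> Dict[Tuple[int, int], int]:
    """Total pages keyed by (year, month)."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for entry_day, _user, pages in log:
        totals[(entry_day.year, entry_day.month)] += pages
    return dict(totals)
```

Comparing the monthly total with the 1,800,000-page target then shows whether production is on track.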
BA’s digital collections are maintained within the institution’s Digital Assets Repository (DAR)
Digital Assets Repository • Developed to facilitate the creation, use and management of the digital library collections • A repository for all types of digital material, including slides in multiple formats, negatives, books, manuscripts, pictures and maps, audio and video, thus preserving and archiving the digital media • Provides public access to digitized collections through web-based search and browsing facilities
Digital Assets Repository • DAR’s core consists of 4 fundamental modules: • The Digital Assets Factory (DAF), http://wiki.bibalex.org/DAFWiki: responsible for the complete automation of the digitization cycle; developed using open source tools • The Digital Assets Metadata (DAM): keeps a unique and intact version of the digital assets’ metadata; helps ensure that cataloging, indexing, browsing, searching and retrieval are done efficiently; in the latest version, uses Fedora to manage the metadata; based on METS/MODS standards • The Digital Assets Keeper (DAK): a repository for the digital assets that are either produced by DAF or introduced directly into the repository • The Digital Assets Publishers (DAP): components that publish and display the digital assets stored in DAK, such as book viewers and search engines
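To give a feel for what MODS-based descriptive metadata looks like, the sketch below builds a very small MODS record with Python's standard library. It is not the schema or code used by DAM: the function name and the choice of fields (a title and a language code) are assumptions made for illustration, and a real record would carry many more elements.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace
ET.register_namespace("mods", MODS_NS)


def minimal_mods(title: str, language_code: str) -> ET.Element:
    """Build a tiny MODS record holding only a title and a language code."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    language = ET.SubElement(mods, f"{{{MODS_NS}}}language")
    term = ET.SubElement(
        language,
        f"{{{MODS_NS}}}languageTerm",
        {"type": "code", "authority": "iso639-2b"},
    )
    term.text = language_code
    return mods


# Hypothetical title used only to show the call; "ara" is the ISO 639-2 code for Arabic.
record = minimal_mods("Sample Arabic manuscript", "ara")
xml_text = ET.tostring(record, encoding="unicode")
```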
Imparting Capacity Building Sharing the BA’s technical expertise with external organizations
ISIS has conducted capacity building workshops: • Yale University (December 2007), Arabic and Middle Eastern Electronic Library • Municipal Administration Modernization (MAM) program in Syria (March 2009) • Kuwait Institute for Science and Research “KISR” (January 2010)
Capacity Building Scope • Passing on the experience of building an institutional repository to maintain the production of high-quality digital assets in terms of digitizing, processing, OCRing, encoding, archiving and publishing, based on well-known standards.
The capacity building program • General tour overviewing BA/ICT facilities (Digital Library, Internet Archive, VISTA, HPC, system infrastructure design, etc.)
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Digital image parameters • Compression formats • Digitization workflow and phases
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Enhancing image and text quality • Producing images that yield good OCR results
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Quality assurance
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Quality assurance • Digital Assets Factory (DAF) • Automation of the digitization workflow • DAF key features • Job life cycle
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Quality assurance • Digital Assets Factory (DAF) • OCR • Analyzing the input and classifying it by font • Automating the OCR procedure
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Quality assurance • Digital Assets Factory (DAF) • OCR • Online storage
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Quality assurance • Digital Assets Factory (DAF) • OCR • Online storage • Library services • VTLS, including its different modules • LIS servers and DB maintenance • OPAC and WEBAC customization • In-house developed systems
The capacity building program • General tour overviewing BA/ICT facilities • Digitization process • Hands-on scanning and image processing • Quality assurance • Digital Assets Factory (DAF) • OCR • Online storage • Library services • Multimedia delivery framework