Digitization Workflow Management System for Massive Digitization Projects. The 2nd International Conference on Universal Digital Library 2006 (ICUDL 2006). Mohamed Yakout, Noha Adly, Magdy Nagi. mohamed.yakout@bibalex.org, noha.adly@bibalex.org, magdy.nagi@bibalex.org
Digitization Workflow Management System for Massive Digitization Projects
The 2nd International Conference on Universal Digital Library 2006 (ICUDL 2006)
Mohamed Yakout (mohamed.yakout@bibalex.org), Noha Adly (noha.adly@bibalex.org), Magdy Nagi (magdy.nagi@bibalex.org)
Bibliotheca Alexandrina
November 19, 2006
Goals
• Automate, track, and manage the digitization workflow.
• Flexibility in defining digitization workflow Phases.
• Support dynamic evolution and deviations, with history tracking.
• Flexible integration with the LIS and the Library Digital Repository.
• Accept external, partially digitized Jobs and start them at the proper Phase within the digitization workflow.
• Simultaneous management of multiple projects with diverse materials (books, journals, manuscripts, audio, video, slides, etc.).
Related Work
• Manual workflow management using several software packages (MS Excel, MS SharePoint, MS Project)
• Simple workflow tracking systems with limited capabilities
• Software that integrates several digitization activities (digital capturing, image processing, OCRing, …) in one package
  • DOCWorks from CCS
  • BookRestorer from i2s
  • OUPS
• Limitations:
  • Tightly coupled to certain tools; other tools cannot easily be integrated.
  • No resource management (e.g. workstations and users).
  • Lack of project and collection management.
  • Manual file handling between the storage server and clients.
  • Workflow exceptions, dynamic evolution and deviations can be handled only through manual intervention.
System Data Model
• The object being digitized
  • Book for Naguib Mahfouz
  • Photos for an event
  • Map for Alexandria
  • Music sheet for Omar Khayrat
• All types of materials in the system
  • Book
  • Manuscripts
  • Map
  • Journals
  • Audio
  • Video
• A task that should be applied within the digitization process
  • Scanning
  • Processing
  • OCRing
  • Encoding
  • Publishing
  • Zipping for archiving
• The system users, with several roles
  • Digital lab operators
  • Shift operators
  • Administrator
• A logical grouping of Jobs
  • Nasser
  • AlexMed
  • AMEEL
• The computer used to perform the Phase
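The entities described above can be sketched as plain data classes. This is only an illustration: the class names (Job, JobType, Phase, User, Project, Workstation) are inferred from terms used elsewhere in the slides, and every field name is an assumption rather than the system's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobType:
    """A kind of material, e.g. Book, Map, Audio (name is an assumption)."""
    name: str


@dataclass
class Phase:
    """A task applied during digitization, e.g. Scanning or OCRing."""
    name: str


@dataclass
class User:
    """A system user; role is e.g. 'Shift operator' or 'Administrator'."""
    name: str
    role: str


@dataclass
class Workstation:
    """The computer used to perform a Phase (hostname is hypothetical)."""
    hostname: str


@dataclass
class Project:
    """A logical grouping of Jobs, e.g. Nasser, AlexMed, AMEEL."""
    name: str


@dataclass
class Job:
    """The object being digitized."""
    title: str
    job_type: JobType
    project: Project
    current_phase: Optional[Phase] = None


job = Job(
    title="Book for Naguib Mahfouz",
    job_type=JobType("Book"),
    project=Project("Nasser"),
    current_phase=Phase("Scanning"),
)
```

Keeping the Job as the only class that points at the others mirrors the slides, where the Job Type, Project and Phase are all described in relation to the Job.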
System Handlers
• XML Phases Definition Handler
  • Pre-Phase and Post-Phase
  • Physical section
  • Database section
  • Reflection Call

<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restircted">
        <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
      . .
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restircted">
        <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
        <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
    <Database>
      <Field Name="Font" DisplayName="Font Family: " />
      <Field Name="LrnPage" DisplayName="Learn Page : "/>
      . .
    </Database>
    <ReflectionCall Method="packageName.doSomething" />
  </PostPhase>
</Phase>
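As a rough illustration of how such a Phase definition could be consumed, the sketch below parses the "Book Arabic OCR" definition and checks a folder listing against the File rules of the Physical section. The interpretation of Count ("+" as one or more, a number as an exact count) is an assumption, the ". ." elisions from the slide are simply left out, and the helper names are hypothetical; the slides do not show the real handler.

```python
import xml.etree.ElementTree as ET

PHASE_XML = """
<Phase Name="Book Arabic OCR">
  <PrePhase>
    <Physical Mode="UnRestricted">
      <Folder Name="OTIFF" Create="false" ToDestination="false" NewName="OTIFF" Mode="Restircted">
        <File Name="OriginalFiles" Type="tif" Count="+" ToDestination="false" Compare=""/>
      </Folder>
    </Physical>
  </PrePhase>
  <PostPhase>
    <Physical Mode="UnRestricted">
      <Folder Name="TXT" Create="false" ToDestination="true" NewName="TXT" Mode="Restircted">
        <File Name="" Type="frf" Count="1" ToDestination="true" Compare=""/>
        <File Name="" Type="art" Count="1" ToDestination="true" Compare=""/>
      </Folder>
    </Physical>
    <Database>
      <Field Name="Font" DisplayName="Font Family: "/>
      <Field Name="LrnPage" DisplayName="Learn Page : "/>
    </Database>
    <ReflectionCall Method="packageName.doSomething"/>
  </PostPhase>
</Phase>
"""


def count_ok(rule: str, actual: int) -> bool:
    """Assumed semantics: '+' means one or more, a number means exactly that many."""
    return actual >= 1 if rule == "+" else actual == int(rule)


def check_section(section: ET.Element, files_by_folder: dict) -> list:
    """Return readable violations for one PrePhase or PostPhase section."""
    problems = []
    for folder in section.findall("./Physical/Folder"):
        names = files_by_folder.get(folder.get("Name"), [])
        for rule in folder.findall("File"):
            ext = "." + rule.get("Type")
            actual = sum(1 for n in names if n.lower().endswith(ext))
            if not count_ok(rule.get("Count"), actual):
                problems.append(
                    f"{folder.get('Name')}: expected {rule.get('Count')} "
                    f"*{ext} file(s), found {actual}"
                )
    return problems


phase = ET.fromstring(PHASE_XML)
# Before the Phase starts: the OTIFF folder must hold at least one .tif file.
pre_problems = check_section(phase.find("PrePhase"), {"OTIFF": ["p001.tif", "p002.tif"]})
# After the Phase: the TXT folder must hold one .frf and one .art file.
post_problems = check_section(phase.find("PostPhase"), {"TXT": ["book.frf"]})
# Fields the operator would be asked to fill in after the Phase.
db_fields = [f.get("Name") for f in phase.findall("./PostPhase/Database/Field")]
```

Here the Pre-Phase check passes, while the Post-Phase check reports the missing .art file, which is the kind of file-level feedback a declarative definition makes possible without hard-coding rules per tool.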
System Modules
• Check-In
  • Plug-in based, for integration.
  • Creates the Job in the system.
  • Assigns the Job to any Phase.
• Check-Out
  • Uses the Java Reflection Call section of the XML Phases Definition.
  • Ingests the Job’s digital objects into the repository.
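The Reflection Call idea, naming a method in the Phase definition and resolving it at run time, can be illustrated with a small Python analogue. The system itself uses Java reflection; the function call_by_name below is hypothetical, and json.dumps merely stands in for a real check-out handler such as the packageName.doSomething placeholder in the definition.

```python
import importlib
from typing import Any


def call_by_name(qualified_name: str, *args: Any) -> Any:
    """Resolve 'package.module.function' at run time and call it."""
    module_name, _, func_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {qualified_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)(*args)


# Stand-in for a check-out handler: any importable callable works.
result = call_by_name("json.dumps", {"job": "0001", "status": "ingested"})
```

Because the target is just a string in the definition, a new repository can be supported by writing a new handler and changing the configured name, without touching the workflow engine.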
System Modules
• Phases Manager
  • Request a new Job.
  • Download the Job’s folders and files.
  • Submit the Job back to the system to continue with other Phases.
  • Reject a Job, specifying the reasons and recommending another Phase.
  • Redirect a Job away from the default Phase Sequence.
  • Provide file-level information to help solve problems.
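The submit, reject and redirect operations listed above can be sketched as transitions on a Job that records its own history. This is a minimal sketch under assumptions: the Phase Sequence shown, the class name TrackedJob and the shape of the history records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed default Phase Sequence for illustration only.
PHASE_SEQUENCE = ["Scanning", "Processing", "OCRing", "Publishing"]


@dataclass
class TrackedJob:
    name: str
    phase: str
    # Each entry: (action, from_phase, to_phase, note)
    history: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def _move(self, action: str, target: str, note: str = "") -> None:
        self.history.append((action, self.phase, target, note))
        self.phase = target

    def submit(self) -> None:
        """Advance to the next Phase in the default sequence."""
        i = PHASE_SEQUENCE.index(self.phase)
        if i + 1 >= len(PHASE_SEQUENCE):
            raise ValueError(f"{self.phase} is the last Phase in the sequence")
        self._move("submit", PHASE_SEQUENCE[i + 1])

    def reject(self, recommended_phase: str, reason: str) -> None:
        """Send the Job to a recommended Phase and record why."""
        self._move("reject", recommended_phase, reason)

    def redirect(self, target: str) -> None:
        """Deviate from the default sequence to an arbitrary Phase."""
        self._move("redirect", target)


job = TrackedJob("Book for Naguib Mahfouz", "Scanning")
job.submit()                                # Scanning -> Processing
job.submit()                                # Processing -> OCRing
job.reject("Processing", "skewed pages")    # OCRing -> Processing, with a reason
job.redirect("Publishing")                  # skip ahead, outside the default order
```

Recording every transition, including rejections and redirects, is what allows deviations to be tracked rather than handled by manual intervention.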
System Modules (Contd)
• Reporting
  • Workflow Tracking
  • Pending Items
  • Late Jobs
  • Operators’ rates
  • Build Customized Report
• Archiving
  • On different media of different sizes, and on online storage
• Administration
Quality Assurance
• Supported at two different stages:
  • QA information is maintained at the file level as a Job moves from one Phase to another.
  • A QA Phase is defined in the Digitization Phase Sequence as the last Phase before Archiving.
Achieving Flexibility Using DWMS
• The Phase Sequence defined for a Job Type is a guide rather than a prescription.
• A Phase may or may not be in the Phase Sequence; the operator can assign the Job to any of these Phases.
• Jobs can be forwarded dynamically to another Phase in the Phase Sequence.
• Changes to the Phase Sequence affect both current and new Jobs in the system, leading to natural process evolution.
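One way to read "a guide rather than a prescription" is that the next Phase is only suggested, and is looked up from the shared Phase Sequence at the moment it is needed. The sketch below assumes exactly that; the sequence contents and the function suggested_next are hypothetical, not taken from DWMS.

```python
from typing import Dict, List, Optional

# One shared, editable Phase Sequence per Job Type (contents are assumed).
phase_sequences: Dict[str, List[str]] = {
    "Book": ["Scanning", "Processing", "OCRing", "Archiving"],
}


def suggested_next(job_type: str, current_phase: str) -> Optional[str]:
    """Suggest the next Phase; None means the operator must choose."""
    seq = phase_sequences[job_type]
    if current_phase not in seq:
        return None  # the Phase is outside the sequence
    i = seq.index(current_phase)
    return seq[i + 1] if i + 1 < len(seq) else None


before = suggested_next("Book", "OCRing")

# An administrator adds a QA Phase before Archiving while Jobs are in flight.
phase_sequences["Book"].insert(3, "QA")
after = suggested_next("Book", "OCRing")

# A Phase that is not part of the sequence yields no suggestion.
outside = suggested_next("Book", "Zipping for archiving")
```

Because the sequence is consulted at submit time instead of being copied into each Job, a Job already at OCRing picks up the new QA Phase automatically, which matches the claim that changes affect both current and new Jobs.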
Future Work
• Check-out plug-in for Fedora.
• Check-in plug-ins to support various metadata standards and formats (MODS, DC, VAR, etc.).
• Enhance the software interface with graphical tools to help design and follow the digitization process.
Thank You mohamed.yakout@bibalex.org