DOCLIB Overview A developer perspective

DOCLIB OverviewA developer perspective Guangyu Zhu and David Doermann Language and Media Processing Lab University of Maryland

Agenda • DOCLIB Background • Inside DOCLIB • Architecture Overview • DOCLIB Development Overview • Development Environment/Processes • DOCLIB Add-on Structure • Coding Conventions • DOCLIB Resources • LAMP DOCLIB Project page • DOCLIB Wiki • DOCLIB online demo • DoclibCommandLine

DOCLIB Objectives To develop a single image processing library that supports typical document processing needs and multiple platforms • Efficient Technology Transfer • Software compatibility • Balance of academia, governemnt, and industry needs • A common framework for document processing • Scalability • Rapid prototyping of newmethods • Simple algorithm comparison • Robustness and Stability • High quality standards • Platform-independence • Accommodating frequently changing requirements

DOCLIB Components

DLImage DLDoc. DOCLIB Architecture DocLib´s two essential pillars DLImage: • Image Processing Functions DLDocument: • Container of Document Metadata for different processing task e.g. • image rotation • image deskewing • image conversions • cc calculation • shape drawing e.g. • XML input/output • page segmentation • text line extraction • logo detection • page layout analysis

DLImage Design

DLImageFactor Design DLTiffImage DLJpegImge DLImageFactory DLBaseImage Registers DLPPMImage : : DLPPImage • Design Factors • Image Type objects are static/singleton objects created on startup • DLImageFactory is a static/singleton object • Image Type objects registers itself with the DLImageFactory during startup • DLImageFactory keeps a list of supported Image objects as each image type calls the register function • Additional image types can be plugged into DOCLIB without modifying existing DOCLIB code

Supported Image Processing Functions 1. Open TIFF, JPEG, GIF, PPM, PNG, BMP, PBM(P1 and P4) 1a. Memory based read TIFF 1b. Memory based read GIF 1c. Memory based read PNG 1d. Memory based read BMP 1e. Memory based read PPM 1f. Memory based read JPEG 1g. Memory based read PBM 1h. Supports multiple page TIFF file 2. Save TIFF, JPEG, GIF, PPM, and PNG images to disk 2a. Memory based write TIFF 2b. Memory based write BMP 3. Calculate connected components 4. Save individual components of an image to disk 5. Resize an image 6. Rotate an image (large image takes long time) 7. Copy an image 8. Extract sub-image 9. Flip image 10. Contour image 11. Reverse image 12. Paste Image 13. Set Pixel of an image 14. Convert images to and from BIT(1 bit, black/white), BYTE(8 bits, gray), COLOR (24 bits) 15. Draw shapes within images a. Line (color) b. box (color) 16. Dilate image 17. Erode image 18. Sharpen image 19. Blur image 20. Mask image 21. Convert to YCrCb 22. YCrCbBinarization 23. 256Color quantized 24. PercentThresholdBinarization percent (i.e. 90 or .9) 25. ThresholdBinarization rThresh gTresh bThresh 26. Color2Gray_global 27. Gray2Binary_global thresh 28. Binary2Color 29. Gray2Color 30. Binary2Gray 31. loadDocFile - Load an image as a document. Flips bits if black pixels are more than white pixels. 32. Histgram for 1 and 8 bit images 33. Projection 34. Skeletonize 35. deskew 36. Loading an unknown image from memory or from file (Unknown image type must be one of the supported images) 37. Skeletonize 38. DLDocument 39. Centroid Calculation

DLDoc. DLDocument Design DLDocument: DLPage 1: DLPage 1: DLPage 2: DLPage N: … DLZone 1: Zone 2: Zone M: Zone 1: ... Zone 1_1: Zone 1_2: Zone 1_3: DLZone class allows users to subclass their own objects: e.g. DLSegment, DLLine, DLCharacter, DLTable etc.

DOCLIB Development Overview • STL C++ code without platform specific dependencies, such as MFC etc • developed under Microsoft´s .net environment (mainly Visual Studio .NET 2003, i.e. v7.1) • compiles and runs under Windows and Linux • accessible via cvs server by core developers • consistent coding and documentation conventions • support add-on applications using core DOCLIB as the foundatation layer

Software Development Processes • Source Control • Bug Tracking • Software Integrity • Documentation Standards • Distribution Portal • Software development documentation under /doc and /ReleaseDoc directories of DOCLIB release

Accept Bug/Feature Implement CVS Update Unit Test Debug N Y External Failure? Bug? Submit Bug Y N Run Regression Test N Pass? Resolve Bug Check into CVS Y Software Development Process

DLImage DLDoc. DOCLIB Add-on Concept Add-on 2 Add-on 1 DOCLIB Core Add-on 3 Add-on 5 Add-on 4

DOCLIB Add-on Structure

DOCLIB Coding Conventions • Function names should start with lowercase“dl” • void dlGetName() • Class names should start with uppercase “DL” • class DLImage { } • Each function will be documented to conform to Doxygen standards

DOCLIB Coding Conventions • Run-time exceptions should be thrown using standards defined in DLException.h • No creation of interim files during processing. All data manipulation in memory

Generate documentation using Doxygen • Load doxygen config file under the/doc directory of the core DOCLIB release and customize for your add-on

DOCLIB Resources • LAMP DOCLIB Project page • http://lampsrv01.umiacs.umd.edu/projdb/project.php?id=55 • Include citation to DOCLIB as appropriate • DOCLIB Wiki (access controlled) • https://wiki.umiacs.umd.edu/lamp/index.php?title=DOCLIB • DOCLIB online demo • http://lampsrv01.umiacs.umd.edu/doclib/upload.php • DOCLIBCommandLine • A utility shipped with core DOCLIB that supports command line, console, and script mode image processing • Its source code is intent to demo the key DOCLIB image processing features

DOCLIB Overview A developer perspective