
Intelligent Document Processing - what’s so intelligent about it

IDP is a term that's gaining ground in enterprises thanks to its promise of an Nx boost in productivity and an improvement in overall accuracy, resulting in significant savings in both time and resources. But how exactly does one define IDP? Is it closely related to automation and its cool cousin RPA? Does it aid or replace human resources? How does it enhance productivity?


Presentation Transcript


  1. Intelligent Document Processing - What's so Intelligent about it?

Let's answer all of the above by beginning with plain vanilla document processing. If you were to ask yourself which parts of your business deal with documents, you would be hard pressed to identify areas that DO NOT deal with them! But there are verticals in every enterprise that handle a much larger volume, and that is where tools that improve processing speed and accuracy are sought after.

IDP pipeline

Once the functional pipeline has been identified and frozen, we begin with our IDP pipeline. The pipeline comprises classifiers, pre- and post-processors, deep learning models to identify and extract text (both handwritten and machine generated), and rules engines that apply business rules to the extracted information.

Pre-processing

The first step, which is in fact a series of models, deals with image pre-processing and de-noising. Noise comes in many forms - backgrounds, logos, watermarks, shadows etc. - and it must be removed in order for the OCR to do a thorough job. Based on the use case, we build specific models to deal with certain types of noise, but at a broad level all of these models sit in our library so that they can be recalled with zero changes to handle similar tasks for our customers. Given below is a typical example of how we convert a hard-to-process image into something that our engines can easily handle:
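The de-noising step described above can be sketched as a chain of simple image transforms. The step names and the toy threshold-and-speck logic below are illustrative stand-ins for the trained de-noising models, not the actual pipeline; a real system would operate on scans via an image library.

```python
# A minimal sketch of a composable pre-processing pipeline.
# Each step takes a grayscale image (2D list of 0-255 ints) and
# returns a cleaned copy; steps are chained in order.

def binarize(image, threshold=128):
    """Map pixels to pure black/white so faint background noise drops out."""
    return [[255 if px >= threshold else 0 for px in row] for row in image]

def remove_specks(image, min_run=2):
    """Flip isolated dark pixels (horizontal runs shorter than min_run) to white."""
    cleaned = []
    for row in image:
        out = row[:]
        i = 0
        while i < len(row):
            if row[i] == 0:
                j = i
                while j < len(row) and row[j] == 0:
                    j += 1
                if j - i < min_run:        # too short to be real ink
                    for k in range(i, j):
                        out[k] = 255
                i = j
            else:
                i += 1
        cleaned.append(out)
    return cleaned

def preprocess(image, steps):
    for step in steps:
        image = step(image)
    return image

# One noisy scan line: faint grey background (200s), a lone speck, real ink.
scan = [[200, 200, 0, 200, 0, 0, 0, 200]]
clean = preprocess(scan, [binarize, remove_specks])
```

Because each step has the same signature, use-case-specific models can be slotted in or recalled from a shared library, mirroring the reuse described above.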

  2. As mentioned earlier, many of our models can be reused across different verticals, allowing us to define a robust and scalable pre-processing pipeline that can be depicted schematically as below and fine-tuned for narrower use cases based on the data. Let's walk through the basics of how we ensure every step adds to the final accuracy our customers desire. We have different models finely tuned to the tasks they have been trained for, so it's crucial for us to identify the type of content present in the document. Though this is the first step in our pipeline, in a few cases we intervene in the process even before the image data is generated! This can be done via multiple methods, but the idea is to control the source quality so that as much noise as possible is removed from the process, which results in improved image quality and hence better outputs.
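The classify-then-dispatch flow can be sketched as a small model registry. The document types, keyword rule, and handler names below are hypothetical placeholders; a real system would route on a trained classifier's output rather than keyword matching.

```python
# A minimal sketch of "classify the document, then dispatch to the
# specialised model" (all names here are illustrative).

MODEL_REGISTRY = {}

def register(doc_type):
    """Decorator that files a handler in the shared model library."""
    def wrap(fn):
        MODEL_REGISTRY[doc_type] = fn
        return fn
    return wrap

@register("invoice")
def handle_invoice(doc):
    return f"invoice model extracting line items from {doc['name']}"

@register("receipt")
def handle_receipt(doc):
    return f"receipt model extracting totals from {doc['name']}"

def classify(doc):
    """Stand-in classifier: a keyword rule where a real pipeline
    would use a trained content-type model."""
    return "invoice" if "invoice" in doc["text"].lower() else "receipt"

def process(doc):
    # Basic sanity check before routing, as in the pipeline description.
    if not doc.get("text"):
        raise ValueError("empty document")
    return MODEL_REGISTRY[classify(doc)](doc)

result = process({"name": "scan_001", "text": "INVOICE #42 ..."})
```

The registry pattern is what makes the "recalled with zero changes" reuse possible: a new vertical only registers its own handler.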

  3. User/pipeline input undergoes basic sanity checks and is fed to the classifier, which decides which model needs to take up the rest of the process. Once the document is sent to the appropriate model, it undergoes suitable pre-processing and enhancements so that the object detection model can detect all relevant areas of interest. Once all these areas have been identified, we put them through a few more steps to ensure the right data is extracted. For example, in a receipt we would ideally have no use for the company's logo or its tagline; we ignore such data points since they can at times interfere with data extraction.

Auto Templatization

Of course, just a bunch of text is of no use to anybody, and the last model in our pipeline uses triangulation, neighbourhood information and other advanced features to pinpoint the category of the text being extracted. For example, in an invoice, all listed items typically have separate headers like "product code/type", "unit price", "quantity" etc. which appear close to each other but might not be named uniformly across formats.
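The neighbourhood idea can be sketched as nearest-header matching over bounding-box coordinates: each extracted value is tagged with the canonical field of the header closest to it. The alias table and coordinates below are made up for illustration and are much simpler than the triangulation described above.

```python
# A minimal sketch of neighbourhood-based field labelling
# (header synonyms and box coordinates are illustrative).
import math

# Header variants seen across formats, normalised to one canonical field.
HEADER_ALIASES = {
    "unit price": "unit_price",
    "price/unit": "unit_price",
    "qty": "quantity",
    "quantity": "quantity",
}

def nearest_header(value_box, header_boxes):
    """Pick the header whose position is closest to the extracted value."""
    vx, vy = value_box["x"], value_box["y"]
    return min(header_boxes,
               key=lambda h: math.hypot(h["x"] - vx, h["y"] - vy))

# Two column headers and one value detected below the "qty" column.
headers = [{"text": "qty",        "x": 100, "y": 50},
           {"text": "unit price", "x": 300, "y": 50}]
value = {"text": "4", "x": 105, "y": 120}

field = HEADER_ALIASES[nearest_header(value, headers)["text"]]
```

Normalising "qty" and "quantity" to one field name is what lets the same extraction logic work across invoice formats that label their columns differently.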
