1. Problem
• Many archived two-sided manuscript documents suffer from bleed-through;
• Bleed-through can be effectively removed offline using image-processing algorithms;
• A remotely located researcher may want to access both original and corrected versions of a document;
• We want to avoid sending the document twice, since both versions are very similar.
Figure panels: Recto, Verso
3. Algorithm Details
Registration
• We assume that the continuous recto and verso image coordinate frames are related by a six-parameter affine transformation.
• We search for the parameter vector that gives the best match, in the least-squares sense, between the recto and the transformed flipped verso.
• Applying the best-matching transformation to the flipped verso yields the registered verso image.
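One way to write the registration step down, as a sketch only: the symbols below (r for the recto, v for the flipped verso, a_1 to a_6 for the affine parameters) are assumed notation for illustration and do not come from the slides.

```latex
\begin{aligned}
\begin{pmatrix} x' \\ y' \end{pmatrix}
  &= \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}
     \begin{pmatrix} x \\ y \end{pmatrix}
   + \begin{pmatrix} a_5 \\ a_6 \end{pmatrix}, \\
\hat{\mathbf{a}}
  &= \arg\min_{\mathbf{a}} \sum_{(x,y)}
     \bigl[\, r(x,y) - v\bigl(x'(\mathbf{a}),\, y'(\mathbf{a})\bigr) \bigr]^2, \\
v_{\mathrm{reg}}(x,y)
  &= v\bigl(x'(\hat{\mathbf{a}}),\, y'(\hat{\mathbf{a}})\bigr).
\end{aligned}
```

The first line is the six-parameter affine map between coordinate frames, the second is the least-squares match criterion, and the third is the registered verso obtained with the best parameters.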
4. Joint Compression
• Based on existing standards.
• The original, uncorrected image is compressed with a standard efficient compression scheme such as JPEG or JPEG 2000.
• The segmentation map is compressed using an efficient bilevel compression scheme such as JBIG or JBIG2.
• Additional information for inpainting is transmitted as side information.
Figure labels: 4.6 Mbit, 131 kbit
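The idea of sending one image plus a small correction layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' codec: `zlib` is used as a lossless stand-in for both the image coder (JPEG / JPEG 2000) and the bilevel coder (JBIG / JBIG2), the bilevel map is assumed to mark bleed-through pixels with 1, and the side information is assumed to be a single fixed fill value. The names `encode`, `decode`, `pack_mask` and `unpack_mask` are hypothetical.

```python
import zlib


def pack_mask(mask):
    """Pack a bilevel mask (rows of 0/1) into bytes, 8 pixels per byte."""
    bits = [b for row in mask for b in row]
    bits += [0] * (-len(bits) % 8)  # pad to a whole number of bytes
    return bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def unpack_mask(data, height, width):
    """Inverse of pack_mask: recover rows of 0/1 from packed bytes."""
    bits = [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def encode(image, mask, fill_value):
    """Server side: one copy of the image, the bilevel map, and side info."""
    height, width = len(image), len(image[0])
    return {
        "size": (height, width),
        # zlib is a stand-in for JPEG / JPEG 2000 (assumption).
        "image": zlib.compress(bytes(p for row in image for p in row)),
        # zlib is a stand-in for JBIG / JBIG2 (assumption).
        "mask": zlib.compress(pack_mask(mask)),
        # Side information for inpainting: a fixed fill value (assumption).
        "fill": fill_value,
    }


def decode(payload):
    """Client side: rebuild both the original and the corrected image."""
    height, width = payload["size"]
    flat = zlib.decompress(payload["image"])
    original = [list(flat[r * width:(r + 1) * width]) for r in range(height)]
    mask = unpack_mask(zlib.decompress(payload["mask"]), height, width)
    corrected = [
        [payload["fill"] if m else p for p, m in zip(prow, mrow)]
        for prow, mrow in zip(original, mask)
    ]
    return original, corrected
```

The client receives the image data once and derives the corrected version locally from the map and the side information, which is the point of the joint scheme.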
2. Bleed-through Removal
Model
• We assume the existence of underlying recto and verso images without bleed-through. These consist of the background with the writing superimposed.
• These ideal recto and verso images are combined in some way to produce the observed recto and verso images corrupted with bleed-through (see above).
• In general, the scanned recto and verso images (with bleed-through) will not be aligned.
Figure caption: Recto and flipped verso images superimposed
Segmentation
• We segment each side of the document into the four regions R1-R4. However, it is most important to correctly identify region R2, ‘bleed-through only’. If we miss some parts of R2, bleed-through will remain. If the label R2 is incorrectly assigned to some parts of R1, ‘foreground only’, or R4, ‘foreground and bleed-through’, then parts of the desired writing will be erased.
• We first identify points that can be considered to definitely be background (R3), because they are lighter than a certain threshold.
• We then identify points that can be considered to be foreground (R1), because they are darker than the corresponding points on the other side.
• Of the remaining points, those whose correlation between the two sides exceeds a correlation threshold are deemed to be bleed-through (R2). The rest are assigned to R4.
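The three-stage labelling above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the slides do not say how the correlation is computed, so a Pearson correlation over a small square window is assumed here, and the threshold values, the window radius and the function names are all hypothetical. Images are lists of rows of grey levels (0 = black, 255 = white), and `verso` is assumed to be already flipped and registered to `recto`.

```python
from statistics import mean

R1, R2, R3, R4 = 1, 2, 3, 4  # foreground, bleed-through, background, overlap


def local_correlation(a, b, y, x, radius=1):
    """Pearson correlation of a and b over a square window centred on (y, x).

    The window-based measure is an assumption; the source only says
    'correlation between the two sides'.
    """
    height, width = len(a), len(a[0])
    pa, pb = [], []
    for j in range(max(0, y - radius), min(height, y + radius + 1)):
        for i in range(max(0, x - radius), min(width, x + radius + 1)):
            pa.append(a[j][i])
            pb.append(b[j][i])
    ma, mb = mean(pa), mean(pb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(pa, pb))
    va = sum((p - ma) ** 2 for p in pa)
    vb = sum((q - mb) ** 2 for q in pb)
    if va == 0 or vb == 0:
        return 0.0  # flat window: treat as uncorrelated
    return cov / (va * vb) ** 0.5


def segment(recto, verso, bg_threshold=200, corr_threshold=0.5, radius=1):
    """Label every recto pixel as R1, R2, R3 or R4."""
    labels = []
    for y, row in enumerate(recto):
        out = []
        for x, r in enumerate(row):
            if r > bg_threshold:            # lighter than threshold: background
                out.append(R3)
            elif r < verso[y][x]:           # darker than the other side: foreground
                out.append(R1)
            elif local_correlation(recto, verso, y, x, radius) > corr_threshold:
                out.append(R2)              # follows the other side: bleed-through
            else:
                out.append(R4)              # foreground and bleed-through overlap
        labels.append(out)
    return labels
```

Labelling the verso works the same way with the two arguments swapped.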
Figure panels: Original with bleed-through; With bleed-through removal
Algorithm
• Registration: Alignment of recto and flipped verso
• Segmentation: Four regions
  • R1: Foreground only
  • R2: Bleed-through only
  • R3: Background
  • R4: Foreground and bleed-through overlap
• Inpainting: Region R2 filled in with estimate of background
Figure captions: Recto and flipped verso images, superimposed after registration; Illustration of four types of regions; Inpainting applied to circled region
Inpainting
• Points labelled R2 ‘bleed-through’ are replaced by suitable nearby points from the background region R3. In the initial work, a fixed value was used.
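A minimal sketch of this replacement step follows. The slides do not define what makes a nearby point 'suitable', so this example assumes the mean of the R3 pixels in the smallest square window that contains any; passing `fixed_value` reproduces the simpler fixed-value variant mentioned above. The function name, the label codes (2 for R2, 3 for R3) and `max_radius` are hypothetical.

```python
def inpaint(image, labels, fixed_value=None, max_radius=5, r2=2, r3=3):
    """Replace R2 (bleed-through) pixels with a background estimate.

    image and labels are lists of rows of equal size. If fixed_value is
    given, every R2 pixel is set to it. Otherwise each R2 pixel takes the
    mean of the R3 pixels in the smallest surrounding window that has any
    (an assumed reading of 'suitable nearby points'). Pixels with no R3
    neighbour within max_radius are left unchanged.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # do not modify the input image
    for y in range(height):
        for x in range(width):
            if labels[y][x] != r2:
                continue
            if fixed_value is not None:
                out[y][x] = fixed_value
                continue
            for radius in range(1, max_radius + 1):
                nearby = [
                    image[j][i]
                    for j in range(max(0, y - radius), min(height, y + radius + 1))
                    for i in range(max(0, x - radius), min(width, x + radius + 1))
                    if labels[j][i] == r3
                ]
                if nearby:
                    out[y][x] = round(sum(nearby) / len(nearby))
                    break
    return out


image = [[240, 244], [150, 248]]
labels = [[3, 3], [2, 3]]
print(inpaint(image, labels))                   # [[240, 244], [244, 248]]
print(inpaint(image, labels, fixed_value=255))  # [[240, 244], [255, 248]]
```

Reading source values from `image` rather than `out` keeps already-inpainted pixels from influencing their neighbours.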
5. Conclusion
• Bleed-through can be effectively removed by jointly processing the recto and verso sides of the document.
• More complex bleed-through removal algorithms can be used at the server side, with the result transmitted to the remote user.
• It is not necessary to separately transmit original and corrected versions to a user who wishes to see both.
• All elements can be incorporated into JPEG 2000.
• More work needs to be done on the segmentation and inpainting aspects of the algorithm.