Background removal in degraded documents
Chen Yan & Graham Leedham
School of Computer Engineering, Nanyang Technological University
Decompose Threshold Algorithm
• Input: image (256-level greyscale)
• Decompose image into 4 regions
• For each region:
  • Feature vector extraction
  • Region classification
  • Any appropriate threshold method for thresholding the region?
    • Yes: thresholding
    • No: divide region into 4 sub-regions and repeat for each sub-region
• Output: binary image
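The recursive flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `choose_method` is a hypothetical callback standing in for feature extraction plus classification (it returns a thresholding function, or `None` when no method is judged appropriate), `min_size` is an assumed stopping rule, and `mean_threshold` is an assumed fallback for regions that can no longer be split. The image is assumed to be at least 2 x 2 pixels.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]            # rows of grey levels, 0 (black) .. 255 (white)
Box = Tuple[int, int, int, int]    # top, left, bottom, right (bottom/right exclusive)
Thresholder = Callable[[Image, Box, Image], None]


def split4(box: Box) -> List[Box]:
    """Split a box into its four quadrants."""
    top, left, bottom, right = box
    mid_y, mid_x = (top + bottom) // 2, (left + right) // 2
    return [
        (top, left, mid_y, mid_x), (top, mid_x, mid_y, right),
        (mid_y, left, bottom, mid_x), (mid_y, mid_x, bottom, right),
    ]


def mean_threshold(image: Image, box: Box, out: Image) -> None:
    """Assumed fallback: binarise a region at its own mean grey level."""
    top, left, bottom, right = box
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    mean = sum(pixels) / len(pixels)
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 0 if image[y][x] < mean else 255


def decompose_threshold(
    image: Image,
    choose_method: Callable[[Image, Box], Optional[Thresholder]],
    min_size: int = 4,
) -> Image:
    """Threshold each region with the chosen method, subdividing when none fits."""
    height, width = len(image), len(image[0])
    out = [[255] * width for _ in range(height)]

    def process(box: Box) -> None:
        top, left, bottom, right = box
        method = choose_method(image, box)
        # Only split while both halves would still be at least min_size wide/tall.
        can_split = (bottom - top) >= 2 * min_size and (right - left) >= 2 * min_size
        if method is None and can_split:
            for sub in split4(box):
                process(sub)
        else:
            (method or mean_threshold)(image, box, out)

    # The image is first decomposed into 4 regions, then each is processed.
    for region in split4((0, 0, height, width)):
        process(region)
    return out
```

Passing the thresholding function back from `choose_method` keeps the recursion independent of which methods are available, so the per-class methods named later in the slides can be plugged in without changing the decomposition.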
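[Figure: decomposition of a double-sided noise image containing dark background, ghosting noise, and bright background. Regions are numbered 1 to 4, sub-regions of region 3 are numbered 31 to 34, and sub-regions of region 32 are numbered 321 to 324.]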
Feature Extraction & Region Classification
Features:
• Word Direction Based Edge Strength (WDES)
• Word Direction Based Variance (WDVAR)
• Mean-Gradient Value (MGV)
Three classes of region:
• Thick strokes, high variation, lots of noise
• Faint strokes, medium variation, some noise
• No strokes, low variation, small noise
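To make the classification step concrete, here is a rough Python sketch. The slide does not define WDES, WDVAR, or MGV, so the code substitutes two generic stand-ins (plain grey-level variance and the mean absolute difference between neighbouring pixels); they are not the authors' features. The cut-off values and the label "background" for the no-stroke class are also assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grey levels, 0 .. 255


def variance(region: Image) -> float:
    """Grey-level variance of a region (stand-in for a variation feature)."""
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def mean_gradient(region: Image) -> float:
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    total, count = 0, 0
    for y, row in enumerate(region):
        for x, p in enumerate(row):
            if x + 1 < len(row):
                total += abs(row[x + 1] - p)
                count += 1
            if y + 1 < len(region):
                total += abs(region[y + 1][x] - p)
                count += 1
    return total / count if count else 0.0


def classify(
    region: Image,
    low: Tuple[float, float] = (100.0, 5.0),      # assumed (variance, gradient) cut-offs
    high: Tuple[float, float] = (2000.0, 30.0),   # assumed (variance, gradient) cut-offs
) -> str:
    """Assign a region to one of three classes using the stand-in features."""
    var, grad = variance(region), mean_gradient(region)
    if var >= high[0] and grad >= high[1]:
        return "heavy"        # thick strokes, high variation
    if var >= low[0] and grad >= low[1]:
        return "faint"        # faint strokes, medium variation
    return "background"       # no strokes, low variation
```

In practice the cut-offs would have to be tuned, or replaced by a trained classifier, on labelled sample regions.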
Applying Threshold Methods
• Different threshold methods are applied to the three classes of sub-images.
• Heavy Stroke Class
  • Heavy strokes only: Bernsen’s Method
  • Heavy and faint strokes: Improved Niblack’s Method
• Faint Stroke Class
  • Noise removal and enhancement
  • Yanowitz & Bruckstein’s Method
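For readers unfamiliar with the two local methods named for the heavy stroke class, the sketch below shows their textbook forms in plain Python. It makes several assumptions: the slide does not say what the "improved" Niblack variant changes, so the classic rule (threshold = local mean + k x local standard deviation) is shown instead, and the window radius, `min_contrast`, and `k` values are illustrative defaults rather than the authors' settings.

```python
from math import sqrt
from typing import List

Image = List[List[int]]  # rows of grey levels, 0 (black) .. 255 (white)


def window(image: Image, y: int, x: int, radius: int) -> List[int]:
    """Pixels in the square neighbourhood of (y, x), clipped at the image border."""
    height, width = len(image), len(image[0])
    return [
        image[j][i]
        for j in range(max(0, y - radius), min(height, y + radius + 1))
        for i in range(max(0, x - radius), min(width, x + radius + 1))
    ]


def bernsen(image: Image, radius: int = 1, min_contrast: int = 15) -> Image:
    """Bernsen: threshold each pixel at the mid-range of its neighbourhood."""
    out = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            win = window(image, y, x, radius)
            lo, hi = min(win), max(win)
            mid = (lo + hi) / 2
            if hi - lo < min_contrast:
                # Low contrast: treat the whole window as a single class.
                row.append(255 if mid >= 128 else 0)
            else:
                row.append(255 if image[y][x] >= mid else 0)
        out.append(row)
    return out


def niblack(image: Image, radius: int = 1, k: float = -0.2) -> Image:
    """Classic Niblack: threshold = local mean + k * local standard deviation."""
    out = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            win = window(image, y, x, radius)
            mean = sum(win) / len(win)
            std = sqrt(sum((p - mean) ** 2 for p in win) / len(win))
            row.append(255 if image[y][x] >= mean + k * std else 0)
        out.append(row)
    return out
```

Both functions return a new image with 0 for stroke pixels and 255 for background, so either can be applied to a cropped region and pasted back into the full binary output.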
Experiment Result 1
[Figure: original image alongside the outputs of Otsu’s Algorithm, Yanowitz/Bruckstein’s Algorithm, QIR Algorithm, Bernsen’s Method, Eikvil/Taxt/Moen’s Method, Improved Niblack’s Method, and the Proposed Decompose Threshold Method.]
Experiment Result 2
[Figure: original image alongside the outputs of Otsu’s Algorithm, Yanowitz/Bruckstein’s Algorithm, QIR Algorithm, Bernsen’s Method, Eikvil/Taxt/Moen’s Method, Improved Niblack’s Method, and the Proposed Decompose Threshold Method. Annotations: "Strong Noise Removed", "Retain Stroke Details".]
Conclusion
• No existing method produces good results consistently across a wide range of degraded historical documents, which have different characteristics in different areas.
• The proposed approach is a locally adaptive analysis method that uses local feature vectors to find the best approach for thresholding each local area.
• In future, this technique could be applied to other difficult document images, such as cheques and newspaper images.