Optical Character Recognition Tool Bijay Dahal {2008/BCT/509} Kabindra Shrestha {2008/BCT/516} Raj Kumar Shrestha {2008/BCT/527}
Objectives • To convert alphanumeric characters in an image into plain text. • To gain a general understanding of image processing.
Overview • Takes an image as input. • Converts it into plain text. • Recognizes alphanumeric characters only. • Lets the user edit and save the recognized text. Figure: loaded image; converted text (editable).
System Architecture Get Image → Binarization → Thinning → Line Segmentation → Character Segmentation → Feature Extraction → Matrix Matching → Save Text (diagram labels: Bold, Thin)
Methodology/Algorithms • Otsu Binarization Algorithm • Hilditch Skeletonization Algorithm (Thinning)
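The Otsu step named above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the image has already been reduced to an array of 8-bit gray levels, and the class and method names (`OtsuSketch`, `otsuThreshold`, `binarize`) are hypothetical. It picks the threshold that maximizes the between-class variance of the gray-level histogram, then marks pixels at or below that threshold as ink (assuming dark text on a light background).

```java
final class OtsuSketch {
    /** Returns the gray level (0-255) that maximizes between-class variance. */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int g : gray) hist[g & 0xFF]++;

        long total = gray.length;
        double sumAll = 0;
        for (int t = 0; t < 256; t++) sumAll += (double) t * hist[t];

        long wLow = 0;        // number of pixels with level <= t
        double sumLow = 0;    // sum of their gray levels
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wLow += hist[t];
            sumLow += (double) t * hist[t];
            if (wLow == 0) continue;      // no pixels in the low class yet
            long wHigh = total - wLow;
            if (wHigh == 0) break;        // every pixel is in the low class
            double diff = sumLow / wLow - (sumAll - sumLow) / wHigh;
            double between = (double) wLow * wHigh * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Assumption: dark text on a light background, so level <= threshold is ink. */
    static boolean[] binarize(int[] gray, int threshold) {
        boolean[] ink = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) {
            ink[i] = (gray[i] & 0xFF) <= threshold;
        }
        return ink;
    }
}
```

For a light-on-dark image the comparison in `binarize` would need to be inverted.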
Algorithms (contd…) • Generic Segmentation
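The slide does not say how the segmentation works, so the following is only one common way to do line and character segmentation on a binarized image, offered as an assumption rather than as the project's method. It uses projection profiles: a text line is a run of consecutive rows that contain ink, and a character is a run of consecutive columns that contain ink within a line. The names `ProjectionSegmenter`, `segmentLines`, and `segmentChars` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectionSegmenter {
    /** Finds maximal runs of true values; each result is {start, endExclusive}. */
    static List<int[]> runs(boolean[] occupied) {
        List<int[]> out = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= occupied.length; i++) {
            boolean on = i < occupied.length && occupied[i];
            if (on && start < 0) {
                start = i;                       // a run begins
            } else if (!on && start >= 0) {
                out.add(new int[] {start, i});   // a run ends
                start = -1;
            }
        }
        return out;
    }

    /** Text lines: runs of rows that contain at least one ink pixel. */
    static List<int[]> segmentLines(boolean[][] img) {
        boolean[] rowHasInk = new boolean[img.length];
        for (int y = 0; y < img.length; y++) {
            for (boolean px : img[y]) {
                if (px) {
                    rowHasInk[y] = true;
                    break;
                }
            }
        }
        return runs(rowHasInk);
    }

    /** Characters inside rows [top, bottom): runs of columns that contain ink. */
    static List<int[]> segmentChars(boolean[][] img, int top, int bottom) {
        int width = img.length == 0 ? 0 : img[0].length;
        boolean[] colHasInk = new boolean[width];
        for (int y = top; y < bottom; y++) {
            for (int x = 0; x < width; x++) {
                if (img[y][x]) colHasInk[x] = true;
            }
        }
        return runs(colHasInk);
    }
}
```

This approach needs a blank row between lines and a blank column between characters, so touching characters and inclined text would not be split correctly.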
(contd…) • Feature Extraction (zoning) • Based on zones: 5 horizontal and 5 vertical zones => 25 features • Based on upper and lower profiles: 10 vertical zones => 20 features • Based on left and right profiles: 10 horizontal zones => 20 features • Total number of features: 25 + 20 + 20 = 65
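The 65-feature layout above can be sketched as follows. The slide gives only the zone counts, so the feature definitions here are assumptions: each of the 25 grid zones contributes its ink density, "vertical zones" are read as column strips and "horizontal zones" as row strips, and each profile feature is the mean normalized distance from the glyph edge to the first ink pixel within a strip. The class name `ZoningFeatures` and its methods are hypothetical.

```java
final class ZoningFeatures {
    static final int GRID = 5, STRIPS = 10;

    /** g[y][x] is true for ink. Returns 25 + 20 + 20 = 65 features. */
    static double[] extract(boolean[][] g) {
        int h = g.length, w = g[0].length;
        double[] f = new double[GRID * GRID + 4 * STRIPS];
        int k = 0;

        // 25 zone features: fraction of ink pixels in each cell of a 5x5 grid.
        for (int zy = 0; zy < GRID; zy++) {
            for (int zx = 0; zx < GRID; zx++) {
                int y0 = zy * h / GRID, y1 = (zy + 1) * h / GRID;
                int x0 = zx * w / GRID, x1 = (zx + 1) * w / GRID;
                int ink = 0;
                for (int y = y0; y < y1; y++)
                    for (int x = x0; x < x1; x++)
                        if (g[y][x]) ink++;
                int area = (y1 - y0) * (x1 - x0);
                f[k++] = area == 0 ? 0 : (double) ink / area;
            }
        }

        // Per-column depth of the first ink pixel from the top and bottom edges.
        double[] upper = new double[w], lower = new double[w];
        for (int x = 0; x < w; x++) {
            int t = 0;
            while (t < h && !g[t][x]) t++;
            int b = 0;
            while (b < h && !g[h - 1 - b][x]) b++;
            upper[x] = (double) t / h;
            lower[x] = (double) b / h;
        }

        // Per-row depth of the first ink pixel from the left and right edges.
        double[] left = new double[h], right = new double[h];
        for (int y = 0; y < h; y++) {
            int l = 0;
            while (l < w && !g[y][l]) l++;
            int r = 0;
            while (r < w && !g[y][w - 1 - r]) r++;
            left[y] = (double) l / w;
            right[y] = (double) r / w;
        }

        k = appendStripMeans(upper, f, k);   // features 25..34
        k = appendStripMeans(lower, f, k);   // features 35..44
        k = appendStripMeans(left, f, k);    // features 45..54
        appendStripMeans(right, f, k);       // features 55..64
        return f;
    }

    /** Averages depth values over 10 equal strips and appends them to f. */
    static int appendStripMeans(double[] depth, double[] f, int k) {
        int n = depth.length;
        for (int s = 0; s < STRIPS; s++) {
            int a = s * n / STRIPS, b = (s + 1) * n / STRIPS;
            double sum = 0;
            for (int i = a; i < b; i++) sum += depth[i];
            f[k++] = b > a ? sum / (b - a) : 0;
        }
        return k;
    }
}
```

Glyphs smaller than 10 pixels in either dimension leave some strips empty, so in practice each character would likely be scaled to a fixed size before extraction.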
Schedule • Off days: exam time (25 days), Dashain holidays (15 days), Tihar holidays (3 days)
Challenges/Problems Faced • Choosing the correct algorithm. • Algorithms were hard to implement. • Once implemented, the output was not accurate. • Accuracy of matrix matching.
Conclusion • Text in an image is converted to a text file. • Uses the simplest algorithms; accuracy is about 40%-60%.
Limitations • Can’t recognize text in a noisy image. • Can’t detect inclined text in an image. • Matrix matching is slow. • Bad thinning and noise make some text unrecognizable.
Future Enhancements • Scanner image input. • Recognize PDF and other image formats. • Nepali / Devanagari font support. • Different fonts. • Output in PDF or Word file format. • Skew correction and noise reduction. • Handwriting. • Neural network.