Development of an OCR System
Nathan Harmata
TJHSST Computer Systems Lab, 2007-2008
What is OCR?
- Optical Character Recognition: converting images of text into machine-readable characters
- Can be font-based or handwriting-based
Goals of My Project
- Generic recognition for Latin-based fonts
- System built from scratch
- Proper handling of most formatting
Transformations
- Each character image is transformed into a set of attributes, which together form its Character Model
Transformations
- Sector Vector: the image is parsed into parts that pass the vertical line test, and each part is then transformed into a collection of line segments
- Gap Vector: gaps, if any, are found on the four sides of the image
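The vertical line test and the Gap Vector above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the binary-image representation (lists of 0/1 rows), the helper names, and the rule for what counts as a "gap" (ink receding from an edge in the middle by at least `min_depth` pixels more than at both ends) are all assumptions. Converting each part into line segments is not shown.

```python
# Sketch only: the image representation and the gap rule are assumptions,
# not the project's actual code.
Image = list[list[int]]  # rows of pixels, 1 = ink, 0 = background


def column_runs(image: Image, x: int) -> int:
    """Count contiguous vertical runs of ink in column x."""
    runs, prev = 0, 0
    for row in image:
        if row[x] == 1 and prev == 0:
            runs += 1  # a new run of ink starts here
        prev = row[x]
    return runs


def passes_vertical_line_test(image: Image) -> bool:
    """True if every vertical line crosses the ink at most once."""
    width = len(image[0]) if image else 0
    return all(column_runs(image, x) <= 1 for x in range(width))


def _has_gap(lines: list[list[int]], min_depth: int) -> bool:
    """Hypothetical rule: a side has a gap if the ink recedes from that edge
    in the middle by at least min_depth pixels more than at both ends."""
    # Distance from the edge to the first ink pixel along each inked line.
    depths = [line.index(1) for line in lines if 1 in line]
    if len(depths) < 3:
        return False
    ends = max(depths[0], depths[-1])
    return max(depths[1:-1]) - ends >= min_depth


def gap_vector(image: Image, min_depth: int = 2) -> dict[str, bool]:
    """Check each of the four sides for a gap, scanning inward from that edge."""
    rows = [list(r) for r in image]
    cols = [list(c) for c in zip(*image)]
    return {
        "left": _has_gap(rows, min_depth),
        "right": _has_gap([r[::-1] for r in rows], min_depth),
        "top": _has_gap(cols, min_depth),
        "bottom": _has_gap([c[::-1] for c in cols], min_depth),
    }


# A 5x5 letter "C": open on the right-hand side.
C = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]
print(passes_vertical_line_test(C))      # False: columns 1-4 are crossed twice
print(passes_vertical_line_test(C[2:]))  # True: the lower arm alone passes
print(gap_vector(C))  # {'left': False, 'right': True, 'top': False, 'bottom': False}
```

A "C" fails the vertical line test as a whole, which is why the Sector Vector step first splits a character into parts that individually pass it.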
Transformations
- Pixel Concentration Vector: which sides, if any, have a higher concentration of pixels
Character Recognition
- GCDD (Generic Character Definition Database): for every character, the average of its Character Models across many different fonts
[Figure: example Character Model with PixelConcentrationVector = balanced, balanced; SectorVector = 4, 3; GapVector = 0]
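Building the GCDD by averaging, and then matching an unknown character against it, might look like the sketch below. It assumes each Character Model can be flattened into a fixed-length numeric tuple (that encoding is hypothetical), and it uses nearest-neighbor matching by Euclidean distance, which is an assumed rule rather than one stated in the slides.

```python
import math
from collections import defaultdict

# Hypothetical encoding: each Character Model is flattened to a tuple of numbers
# (for example gap flags, concentration labels mapped to -1/0/1, sector counts).
FeatureVector = tuple[float, ...]


def build_gcdd(samples: list[tuple[str, FeatureVector]]) -> dict[str, FeatureVector]:
    """Average the feature vectors of each character over all font samples."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for char, features in samples:
        acc = sums.setdefault(char, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[char] += 1
    return {char: tuple(v / counts[char] for v in acc) for char, acc in sums.items()}


def recognize(features: FeatureVector, gcdd: dict[str, FeatureVector]) -> str:
    """Assumed matching rule: pick the character whose average is nearest (Euclidean)."""
    return min(gcdd, key=lambda char: math.dist(features, gcdd[char]))


samples = [
    ("C", (0.0, 1.0, 3.0)),  # "C" as seen in one font
    ("C", (0.0, 1.0, 5.0)),  # "C" as seen in another font
    ("O", (0.0, 0.0, 2.0)),
    ("O", (0.0, 0.0, 4.0)),
]
gcdd = build_gcdd(samples)
print(gcdd["C"])                          # (0.0, 1.0, 4.0)
print(recognize((0.0, 0.9, 4.2), gcdd))   # C
```

Averaging across fonts is what makes the database "generic": a new font only needs to land closer to the right average than to any other.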
Character Recognition
- For a single character: [diagram not preserved]
- For words, dictionary and grammar references are used
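The dictionary reference for words could work along the lines sketched here: if the raw recognized string is not a dictionary word, replace it with the closest word by Levenshtein edit distance. The use of edit distance and the alphabetical tie-break are assumptions, and the grammar reference is not modeled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def correct_word(raw: str, dictionary: set[str]) -> str:
    """Replace a raw OCR string with the closest dictionary word (ties broken alphabetically)."""
    if raw in dictionary:
        return raw
    return min(dictionary, key=lambda word: (edit_distance(raw, word), word))


words = {"class", "glass", "mass"}
print(edit_distance("kitten", "sitting"))  # 3
print(correct_word("c1ass", words))        # class
```

This kind of correction helps most when single-character errors are isolated, such as a misread "l" becoming "1".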
Results
- Mediocre word recognition
- Doesn't handle formatting well
- Doesn't handle small letters well
- Fairly accurate single-character recognition (93.7%)