
Development of an OCR System

Development of an OCR System. Nathan Harmata, TJHSST Computer Systems Lab, 2007-2008. What is OCR? Optical Character Recognition, either font-based or handwriting-based. Goals of my project: generic recognition for Latin-based fonts, a system built from scratch, and proper handling of most formatting.





Presentation Transcript


  1. Development of an OCR System. Nathan Harmata, TJHSST Computer Systems Lab, 2007-2008

  2. What is OCR? Optical Character Recognition, either font-based or handwriting-based.

  3. Goals of My Project. Generic recognition for Latin-based fonts; a system built from scratch; proper handling of most formatting.

  4. Overview of Idocrase System

  5. Image Processing

  6. Transformations. Attribute Character Model

  7. Transformations. Sector Vector: the image is parsed into parts that pass the vertical line test, and each part is then transformed into a collection of line segments. Gap Vector: gaps, if any, are found on the four sides of the image.
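The slide does not define either test precisely. Below is a minimal sketch, assuming a glyph is a binary image stored as a list of rows (1 = ink). It reads "passes the vertical line test" as "every column crosses the ink in at most one contiguous run". The gap rule is also an assumption: a side has a gap if some ink-bearing row or column, seen from that side, starts deeper than `threshold` times the glyph size. All function names are hypothetical and not taken from Idocrase.

```python
# Hypothetical representation: a glyph is a list of rows, 1 = ink, 0 = background.
Image = list[list[int]]

def runs_in_column(img: Image, col: int) -> int:
    """Count contiguous runs of ink pixels in one column."""
    runs, prev = 0, 0
    for row in img:
        if row[col] and not prev:
            runs += 1
        prev = row[col]
    return runs

def passes_vertical_line_test(img: Image) -> bool:
    """Assumed reading: every vertical line crosses the ink at most once."""
    return all(runs_in_column(img, c) <= 1 for c in range(len(img[0])))

def gap_vector(img: Image, threshold: float = 0.5) -> dict[str, bool]:
    """Flag a side as having a gap if, for some row/column that contains ink,
    the first ink pixel seen from that side is deeper than threshold * size.
    Both the rule and the 0.5 default are illustrative assumptions."""
    height, width = len(img), len(img[0])
    cols = [[img[r][c] for r in range(height)] for c in range(width)]

    def has_gap(lines: list[list[int]], size: int) -> bool:
        return any(1 in line and line.index(1) > threshold * size for line in lines)

    return {
        "left": has_gap(img, width),
        "right": has_gap([row[::-1] for row in img], width),
        "top": has_gap(cols, height),
        "bottom": has_gap([col[::-1] for col in cols], height),
    }
```

Under these assumptions, a "C" fails the vertical line test (its top and bottom strokes share columns), so it would need to be split into parts. Its opening shows up as a gap on the right side.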

  8. Transformations. Pixel Concentration Vector: which sides of the image, if any, have a higher concentration of pixels.
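A pixel concentration vector can be sketched by comparing ink counts between opposite halves of the glyph. This sketch uses the same hypothetical binary-image representation and returns two entries (top/bottom, then left/right). It treats the halves as "balanced" when they differ by at most a hypothetical 10% `tolerance`. The slide specifies none of these details.

```python
Image = list[list[int]]  # hypothetical: list of rows, 1 = ink

def compare(a: int, b: int, name_a: str, name_b: str, tolerance: float) -> str:
    """Name the heavier half, or "balanced" if the counts are within tolerance."""
    total = a + b
    if total == 0 or abs(a - b) <= tolerance * total:
        return "balanced"
    return name_a if a > b else name_b

def pixel_concentration_vector(img: Image, tolerance: float = 0.1) -> tuple[str, str]:
    height, width = len(img), len(img[0])
    # On odd-sized glyphs the middle row/column is excluded from both halves.
    top = sum(sum(row) for row in img[: height // 2])
    bottom = sum(sum(row) for row in img[(height + 1) // 2 :])
    left = sum(sum(row[: width // 2]) for row in img)
    right = sum(sum(row[(width + 1) // 2 :]) for row in img)
    return (
        compare(top, bottom, "top", "bottom", tolerance),
        compare(left, right, "left", "right", tolerance),
    )
```

A symmetric "O" comes out as ("balanced", "balanced"). An "L" is heavier at the bottom and on the left.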

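  9. Character Recognition. GCDD (Generic Character Definition Database): averages of the Character Models for every character, taken from many different fonts. [Figure: example GCDD entry for the character "0", with fields PixelConcentrationVector, SectorVector, and GapVector; values shown include "balanced", "balanced", 4, and 3]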
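The GCDD slide says each entry averages Character Models across many fonts. This sketch hypothetically assumes each model has already been encoded as a fixed-length numeric feature vector, which makes the averaging an element-wise mean. The `nearest_character` lookup by Euclidean distance is an added assumption for illustration. The transcript does not say how Idocrase compares a model against the GCDD.

```python
import math
from statistics import fmean

# Hypothetical encoding: each Character Model is a fixed-length numeric tuple.
FeatureVector = tuple[float, ...]

def build_gcdd(samples: dict[str, list[FeatureVector]]) -> dict[str, FeatureVector]:
    """Average each character's feature vectors across fonts, element by element."""
    return {
        char: tuple(fmean(column) for column in zip(*vectors))
        for char, vectors in samples.items()
    }

def nearest_character(gcdd: dict[str, FeatureVector], features: FeatureVector) -> str:
    """Assumed matching rule: pick the character whose averaged model is closest."""
    return min(gcdd, key=lambda char: math.dist(gcdd[char], features))
```

Averaging over fonts trades per-font precision for one compact definition per character, which fits the project's goal of generic recognition for Latin-based fonts.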

  10. Character Recognition. For a single character: [figure]. For words, dictionary and grammar references are used.
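For words, the slides say only that dictionary and grammar references are used. One common dictionary technique replaces a recognized word that is not in the dictionary with the entry at the smallest Levenshtein (edit) distance. It is shown here as an assumption, not as Idocrase's method, and grammar checks are not modeled. The function names and word list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def correct_word(raw: str, dictionary: list[str]) -> str:
    """Keep known words; otherwise choose the closest dictionary entry."""
    if raw in dictionary:
        return raw
    return min(dictionary, key=lambda word: edit_distance(raw, word))
```

For example, a misread "c0de" (zero instead of "o") is one substitution away from "code", so it would be corrected to "code" given a word list containing it.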

  11. Idocrase Application

  12. Results. Mediocre word recognition; doesn’t handle formatting well; doesn’t handle small letters well; fairly accurate single-character recognition (93.7%).
