Character Recognition Using Machine Learning Techniques

Matthew Peterson, Muktesh Khole, Aseem Gogte, Jeremy Kindseth

Presentation Transcript


  1. Character Recognition Using Machine Learning Techniques. Matthew Peterson, Muktesh Khole, Aseem Gogte, Jeremy Kindseth

  2. Problem Statement and Assumptions • Identify alphanumeric symbols in documents or images (A-Z, a-z, 0-9) • “Box” letters for analysis • Image → Character (for example: A, B, C)

  3. Problem Difficulty [Figure: sample images arranged along a scale from harder to easier] • Not easily separable; surrounding artifacts; no consistent font • Skewed but separable • Easy to box • Simple to decompose • Complex fonts, boxed • Simple fonts, boxed

  4. Hypotheses • Accuracy: K-Means Clustering + EM Algorithm can be used for accurate classification of alphanumeric symbols • K-Means Clustering is as accurate as pixel-by-pixel detection using MSE • Efficiency: K-Means Clustering is faster than pixel-by-pixel detection using MSE
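The pixel-by-pixel MSE baseline named in the hypotheses can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not describe their code, and the function names, the 3 x 3 templates, and the test image are all hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized images (lists of pixel rows)."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def classify_by_mse(image, templates):
    """Return the label of the template with the lowest MSE against the image."""
    return min(templates, key=lambda label: mse(image, templates[label]))


# Hypothetical 3 x 3 binary templates, far smaller than a real character image.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

# An "L" with one missing pixel still sits closest to the "L" template.
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
print(classify_by_mse(noisy_l, TEMPLATES))
```

Every test image is compared against every template at every pixel, which is why the efficiency hypothesis contrasts this approach with a comparison over a small set of cluster centroids.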

  5. Features • Analysis of geometric locations of centroids X axis Y axis
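One way centroid locations could be extracted as features is to run K-Means over the (x, y) coordinates of a character's foreground pixels. The sketch below assumes a binary image and seeds the centroids deterministically, since the conclusions slide reports that random starting locations did not work. The specific seeding rule, the function name, and the sample image are assumptions, not details from the slides.

```python
def kmeans_centroids(image, k, iterations=20):
    """Cluster foreground pixel coordinates and return k centroids sorted by (x, y)."""
    points = sorted(
        (x, y) for y, row in enumerate(image) for x, value in enumerate(row) if value
    )
    if len(points) < k:
        raise ValueError("need at least k foreground pixels")

    # Assumed seeding: evenly spaced picks from the sorted pixel list.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for px, py in points:
            nearest = min(
                range(k),
                key=lambda i: (px - centroids[i][0]) ** 2 + (py - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((px, py))

        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                updated.append((mean_x, mean_y))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated

    return sorted(centroids)


# Hypothetical image with two separate 2 x 2 blobs.
two_blobs = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
print(kmeans_centroids(two_blobs, k=2))
```

The returned list of k coordinate pairs is a much smaller description of the character than the full pixel grid, which matches the deck's closing remark that K-Means acts as a form of compression.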

  6. Training and Testing Criteria

  7. Accuracy Results [Charts, one per training/testing pair] • Training: OCRA, Testing: OCRA • Training: OCRA, Testing: Arial • Training: OCRA, Testing: Handwritten • Training: Frankenset, Testing: OCRA • Training: Frankenset, Testing: Arial • Training: Frankenset, Testing: Handwritten • Training: SuperSet, Testing: OCRA • Training: SuperSet, Testing: Arial • Training: SuperSet, Testing: Handwritten

  8. Time Performance • SuperSet Training vs. Pix-by-Pix • Testing against Fonts/Handwritten • K = 11 used

  9. Hard Problems • Trouble pairs • 2;7 • 3;6;8;9 • O;Q;0 • V;U • B;D;R • I;l (uppercase “i”, lowercase “L”) • Scaling problems (X vs. x; C vs. c)
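Trouble pairs like these can be surfaced by counting which labels are mistaken for each other in a test run. The following is a small sketch of that bookkeeping. The function name and the sample labels are made up for illustration and are not results from the presentation.

```python
from collections import Counter


def confusion_pairs(true_labels, predicted_labels):
    """Count unordered label pairs that were mistaken for each other, most frequent first."""
    pairs = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            pairs[frozenset((truth, guess))] += 1
    return pairs.most_common()


# Hypothetical labels only: not data from the slides.
truth = ["2", "7", "O", "Q", "V"]
guess = ["7", "7", "Q", "O", "V"]
for pair, count in confusion_pairs(truth, guess):
    print(sorted(pair), count)
```

Using an unordered pair means that "O read as Q" and "Q read as O" are tallied together, matching how the slide lists the pairs.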

  10. Conclusions. Method: • Random starting locations did not work • Scaling is important, especially for upper/lowercase differentiation (scaled to 128 x 128) • “Adaptable K” K-Means Clustering may be interesting • SuperSet method could be used to handle transformed data. Results: • Pixel-by-Pixel is more accurate when SuperSet is used for training • For K-Means Clustering, SuperSet worked better for training • SuperSet increases memory usage and processor time – O(n!) • Frankenset is not a training model
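The slides state only that images were scaled to 128 x 128 and do not say how. As one possible approach, the sketch below uses nearest-neighbor resampling. The interpolation method, the function name, and the 2 x 2 sample are assumptions.

```python
def resize_nearest(image, size=128):
    """Resize a 2D image (list of pixel rows) to size x size by nearest-neighbor sampling."""
    src_h = len(image)
    src_w = len(image[0])
    return [
        [image[(y * src_h) // size][(x * src_w) // size] for x in range(size)]
        for y in range(size)
    ]


# Hypothetical 2 x 2 input: each source pixel becomes a 64 x 64 block.
small = [[1, 0], [0, 1]]
big = resize_nearest(small)
print(len(big), len(big[0]))
```

Bringing every character to a common grid lets both pixel-by-pixel comparison and centroid comparison operate on same-sized inputs.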

  11. Conclusions Regarding Hypotheses • Accuracy: K-Means Clustering + EM Algorithm can be used for accurate classification of alphanumeric symbols • K-Means Clustering is as accurate as pixel-by-pixel detection using MSE • Efficiency: K-Means Clustering is faster than pixel-by-pixel detection using MSE • All hypotheses affirmed • K-Means Clustering effectively serves as a method for compressing alphanumeric data

  12. References • Sheshadri, Karthik, Pavan Ambekar, Deeksha Prasad, and Ramakanth Kumar. “An OCR System for Printed Kannada Using K-means Clustering.” 2010 IEEE International Conference on Industrial Technology (ICIT) (March 2010): 14-17.
