
On-Line Handwriting Recognition



1. On-Line Handwriting Recognition
• Transducer device (digitizer)
• Input: sequence of point coordinates with pen-down/up signals from the digitizer
• Stroke: sequence of points from pen-down to pen-up signals
• Word: sequence of one or more strokes
Davi Geiger, Courant Institute, NYU (work with student Jong Oh)

2. System Overview
Input → Pre-processing (high-curvature points) → Segmentation → Character Recognizer → Recognition Engine → Word Candidates, with the Dictionary and the Context Models feeding the Recognition Engine.

3. Segmentation Hypotheses
• High-curvature points and segmentation points:
[Figure: high-curvature points and segmentation points marked on a sample word]
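A minimal sketch of how such segmentation hypotheses might be extracted: measure the turning angle at each interior point of a stroke and keep the points where it exceeds a threshold. The function name, the 60° threshold, and the exact angle criterion are assumptions for illustration; the slides only say that high-curvature points are used.

```python
import numpy as np

def high_curvature_points(stroke, angle_thresh_deg=60.0):
    """Return indices of high-curvature points in a stroke.

    stroke: (N, 2) array of (x, y) pen coordinates, pen-down to pen-up.
    A point becomes a segmentation hypothesis when the turning angle
    between its incoming and outgoing directions exceeds the threshold.
    """
    pts = np.asarray(stroke, dtype=float)
    hypotheses = []
    for i in range(1, len(pts) - 1):
        v_in = pts[i] - pts[i - 1]
        v_out = pts[i + 1] - pts[i]
        # Angle between successive direction vectors (0 = straight line).
        denom = np.linalg.norm(v_in) * np.linalg.norm(v_out) + 1e-12
        cos_a = np.dot(v_in, v_out) / denom
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        if angle > angle_thresh_deg:
            hypotheses.append(i)
    return hypotheses
```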

4. Character Recognition I
• Fisher Discriminant Analysis (FDA): improves over PCA (Principal Component Analysis).
• A linear projection p = W^T x maps the original space to the projection space.
• Training set: 1040 lowercase letters; test set: 520 lowercase letters
• Test result: 91.5% correct

5. Fisher Discriminant Analysis
• Between-class scatter matrix: S_B = Σ_{i=1..C} N_i (μ_i − μ)(μ_i − μ)^T
• C: number of classes
• N_i: number of data vectors in class i
• μ_i: mean vector of class i; μ: overall mean vector
• Within-class scatter matrix: S_W = Σ_{i=1..C} Σ_{j=1..N_i} (v_j^i − μ_i)(v_j^i − μ_i)^T
• v_j^i: j-th data vector of class i
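A direct sketch of these two definitions in Python; the function name is hypothetical and the variable names mirror the slide's notation:

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_B) and within-class (S_W) scatter matrices.

    X: (N, n) data matrix, one feature vector per row.
    y: (N,) integer class labels.
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)                      # overall mean vector
    n = X.shape[1]
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)               # class mean vector mu_i
        d = (mu_c - mu).reshape(-1, 1)
        S_B += len(Xc) * d @ d.T             # N_i (mu_i - mu)(mu_i - mu)^T
        S_W += (Xc - mu_c).T @ (Xc - mu_c)   # sum_j (v_j - mu_i)(v_j - mu_i)^T
    return S_B, S_W
```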

6. Given a projection matrix W (of size n by m) and its linear transformation y = W^T x, the between-class scatter in the projection space is Y_B = W^T S_B W. Similarly, Y_W = W^T S_W W.

7. Fisher Discriminant Analysis (cont.)
• Optimization formulation of the Fisher projection solution: W* = argmax_W |Y_B| / |Y_W| = argmax_W |W^T S_B W| / |W^T S_W W|, where Y_B and Y_W are the scatter matrices in the projection space.

8. FDA (continued)
• Construction of the Fisher projection matrix:
• Compute the n eigenvalues and eigenvectors of the generalized eigenvalue problem S_B w = λ S_W w.
• Retain the m eigenvectors having the largest eigenvalues; they form the columns of the target projection matrix W.
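A sketch of this construction using scipy.linalg.eigh, which solves the generalized symmetric eigenproblem; the small ridge added to S_W is an assumption to guard against singularity and is not mentioned on the slides:

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(S_B, S_W, m, reg=1e-6):
    """Solve S_B w = lambda S_W w and keep the top-m eigenvectors
    as the columns of the Fisher projection matrix W."""
    n = S_B.shape[0]
    # Ridge keeps S_W positive definite when it is near-singular (assumed).
    eigvals, eigvecs = eigh(S_B, S_W + reg * np.eye(n))
    W = eigvecs[:, ::-1][:, :m]   # eigh returns ascending eigenvalues
    return W
```

A letter's feature vector x is then classified in the projection space via p = W.T @ x, matching the projection p = W^T x on slide 4.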

9. Character Recognition Results
• Training set: 1040 lowercase letters
• Test set: 520 lowercase letters
• Test result: 91.5% correct

10. Challenge I
• The problem with the previous approach: non-characters are classified as characters. Applied to cursive words, it generates too many nonsensical word hypotheses by extracting characters where none appear to exist.
• More generally, one wants to be able to generate shapes and their deformations.

11. Challenge II
• How can reliable local geometric features of images (corners, contour tangents, contour curvature, ...) be extracted?
• How should they be grouped?
• How can one input be matched quickly against a large database?
• Hierarchical clustering of the database, possibly over a tree structure or a more general graph: how should it be done? With which clustering criteria? With which methods?

12. Recognition Engine
• Integrates all available information; generates and grows the word-level hypotheses.
• Most general form: a graph and a search over it.
• Hypothesis Propagation Network

13. Hypothesis Propagation Network
[Figure: a grid H(t, m) over time t = 1..T and character classes m = "a".."z"; each node holds a hypothesis list of length up to T, fed by class m's legal predecessors within a look-back window of range 1-3]
• Recognition rate: 85% on 100 words (not good)
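The slides do not preserve the propagation algorithm itself, so the following is only a speculative sketch in the spirit of the figure: partial word hypotheses are held per (segmentation point, class) node H(t, m), extended from nodes within a look-back window, and pruned with a beam. The functions char_score and is_prefix, the dictionary-prefix filter standing in for "legal predecessors", and all parameter choices are hypothetical.

```python
def propagate_hypotheses(T, classes, char_score, is_prefix,
                         lookback=3, beam=20):
    """Speculative sketch of a hypothesis propagation network.

    T: number of segmentation points; segments run between them.
    char_score(s, t, c): hypothetical recognizer score for the ink
        between segmentation points s and t read as character c.
    is_prefix(s): hypothetical dictionary test (is s a legal prefix?).
    H[t][m] holds the best partial words ending at point t with
    character m; only the top `beam` survive at each node.
    """
    H = [dict() for _ in range(T + 1)]
    H[0][None] = [("", 0.0)]                 # empty hypothesis at the start
    for t in range(1, T + 1):
        for m in classes:
            cands = []
            # Extend hypotheses that ended within the look-back window.
            for s in range(max(0, t - lookback), t):
                for _prev, hyps in H[s].items():
                    for word, score in hyps:
                        ext = word + m
                        if is_prefix(ext):
                            cands.append((ext, score + char_score(s, t, m)))
            cands.sort(key=lambda ws: -ws[1])
            if cands:
                H[t][m] = cands[:beam]
    # Complete word hypotheses are those ending at the last point.
    return sorted((h for hyps in H[T].values() for h in hyps),
                  key=lambda ws: -ws[1])
```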

14. Challenge III
• How can the search be made more efficient in this network and, more generally, in Bayesian networks?

15. Visual Bigram Models (VBM)
• Some characters can be very ambiguous when isolated: “9” and “g”, “e” and “l”, “o” and “0”, etc., but become much clearer when put in context.
[Figure: “go” vs. “90” distinguished by character heights, relative height ratio, and positioning]

16. VBM: Parameters
• Height Difference Ratio: HDR = (h1 − h2) / h
• Top Difference Ratio: TDR = (top1 − top2) / h
• Bottom Difference Ratio: BDR = (bot1 − bot2) / h
[Figure: two adjacent characters annotated with heights h1, h2, tops top1, top2, bottoms bot1, bot2, and the overall height h]
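A small sketch computing the three ratios from two character bounding boxes. Taking h to be the height of the union of the two boxes is an assumption; the slides do not say how h is defined.

```python
def vbm_features(box1, box2):
    """Visual-bigram geometry for two adjacent characters.

    box1, box2: (top, bottom) y-extents of each character's bounding
    box, with y increasing downward.  h is assumed to be the height
    of the union of the two boxes (not specified on the slides).
    """
    top1, bot1 = box1
    top2, bot2 = box2
    h1, h2 = bot1 - top1, bot2 - top2
    h = max(bot1, bot2) - min(top1, top2)    # union height (assumption)
    HDR = (h1 - h2) / h                      # height difference ratio
    TDR = (top1 - top2) / h                  # top difference ratio
    BDR = (bot1 - bot2) / h                  # bottom difference ratio
    return HDR, TDR, BDR
```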

17. VBM: Ascendancy Categories
• 9 visual bigram categories in total (instead of 26 × 26 = 676).
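One plausible reading of how 9 categories arise: each character is tagged as ascender, descender, or neutral (x-height), giving 3 × 3 = 9 bigram categories. The category membership below is an assumption for illustration; the slides do not list it.

```python
ASCENDERS = set("bdfhklt")                   # assumed membership
DESCENDERS = set("gjpqy")                    # assumed membership

def ascendancy(ch):
    if ch in ASCENDERS:
        return "ascender"
    if ch in DESCENDERS:
        return "descender"
    return "neutral"                         # x-height characters

def bigram_category(c1, c2):
    """One of the 3 x 3 = 9 visual bigram categories."""
    return (ascendancy(c1), ascendancy(c2))
```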

18. VBM: Test Results
