On-Line Handwriting Recognition
• Transducer device (digitizer)
• Input: sequence of point coordinates with pen-down/pen-up signals from the digitizer
• Stroke: sequence of points from a pen-down signal to the next pen-up signal
• Word: sequence of one or more strokes
Work with student Jong Oh Davi Geiger, Courant Institute, NYU
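The input, stroke, and word definitions above can be illustrated with a minimal sketch. The event format used here (`"down"`, `"move"`, `"up"` tuples) and the function name `split_strokes` are assumptions made for illustration; the slides do not specify how the digitizer reports its signals.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical event format: (kind, x, y) with kind in {"down", "move", "up"}.
Event = Tuple[str, float, float]


def split_strokes(events: List[Event]) -> List[List[Point]]:
    """Group digitizer events into strokes (pen-down through pen-up)."""
    strokes: List[List[Point]] = []
    current: List[Point] = []
    pen_is_down = False
    for kind, x, y in events:
        if kind == "down":
            # A pen-down signal starts a new stroke.
            current = [(x, y)]
            pen_is_down = True
        elif kind == "move" and pen_is_down:
            current.append((x, y))
        elif kind == "up" and pen_is_down:
            # A pen-up signal closes the current stroke.
            current.append((x, y))
            strokes.append(current)
            current = []
            pen_is_down = False
    return strokes


events = [
    ("down", 0.0, 0.0), ("move", 1.0, 1.0), ("up", 2.0, 0.0),
    ("down", 3.0, 0.0), ("up", 3.0, 2.0),
]
# A word is then a sequence of one or more strokes.
word = split_strokes(events)
```

Events that arrive while the pen is up are ignored in this sketch; a real digitizer driver may need different handling.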
System Overview
[Block diagram with components: Input, Pre-processing (high-curvature points), Segmentation, Character Recognizer, Dictionary, Context Models, Recognition Engine, Word Candidates]
Segmentation Hypotheses
• High-curvature points and segmentation points [figure]
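One plausible way to locate high-curvature points on a stroke is to threshold the turning angle at each interior point. This is a sketch under stated assumptions, not the detector used in the slides: the turning-angle measure, the `threshold` value, and the function names are all chosen here for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turning_angle(a: Point, b: Point, c: Point) -> float:
    """Absolute change of direction (radians) when moving a -> b -> c."""
    heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
    heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
    diff = heading_out - heading_in
    # Wrap the difference into (-pi, pi] before taking its magnitude.
    diff = math.atan2(math.sin(diff), math.cos(diff))
    return abs(diff)


def high_curvature_points(stroke: List[Point],
                          threshold: float = math.pi / 3) -> List[int]:
    """Indices of interior points whose turning angle reaches the threshold."""
    return [
        i for i in range(1, len(stroke) - 1)
        if turning_angle(stroke[i - 1], stroke[i], stroke[i + 1]) >= threshold
    ]


# An L-shaped stroke: the only sharp turn is at index 2.
corner_stroke = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
corner_indices = high_curvature_points(corner_stroke)
```

In practice the stroke would usually be smoothed or resampled first, since raw digitizer points are noisy.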
Character Recognition I
• Fisher Discriminant Analysis (FDA): improves over PCA (Principal Component Analysis)
• Linear projection from the original space to the projection space: $p = W^T x$
• Training set: 1040 lowercase letters; test set: 520 lowercase letters
• Test results: 91.5% correct
Fisher Discriminant Analysis
• Between-class scatter matrix: $S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$
• $C$: number of classes
• $N_i$: number of data vectors in class $i$
• $\mu_i$: mean vector of class $i$; $\mu$: overall mean vector
• Within-class scatter matrix: $S_W = \sum_{i=1}^{C} \sum_{j=1}^{N_i} (v_j^i - \mu_i)(v_j^i - \mu_i)^T$
• $v_j^i$: $j$-th data vector of class $i$
Given a projection matrix $W$ (of size $n$ by $m$) and its linear transformation $p = W^T x$, the between-class scatter in the projection space is $Y_B = W^T S_B W$. Similarly, the within-class scatter in the projection space is $Y_W = W^T S_W W$.
Fisher Discriminant Analysis (cont.)
• Optimization formulation of the Fisher projection solution ($Y_B$, $Y_W$ are the scatter matrices in the projection space): $W_{opt} = \arg\max_W \frac{|Y_B|}{|Y_W|} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}$
FDA (continued)
• Construction of the Fisher projection matrix:
• Compute the $n$ eigenvalues and eigenvectors of the generalized eigenvalue problem $S_B w_k = \lambda_k S_W w_k$
• Retain the $m$ eigenvectors having the largest eigenvalues. They form the columns of the target projection matrix $W$.
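The construction steps above can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: it assumes the within-class scatter matrix is invertible (so the generalized problem can be solved as an ordinary eigenproblem of $S_W^{-1} S_B$), and the function name `fisher_projection` and the toy data are made up for the example.

```python
import numpy as np


def fisher_projection(X, y, m):
    """Return an n-by-m matrix whose columns are the top-m Fisher directions.

    X: (num_samples, n) data matrix, y: class labels, m: target dimension.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[1]
    mu = X.mean(axis=0)                      # overall mean vector
    S_B = np.zeros((n, n))                   # between-class scatter
    S_W = np.zeros((n, n))                   # within-class scatter
    for label in np.unique(y):
        Xi = X[y == label]
        mu_i = Xi.mean(axis=0)               # mean vector of this class
        d = (mu_i - mu)[:, None]
        S_B += Xi.shape[0] * (d @ d.T)
        centered = Xi - mu_i
        S_W += centered.T @ centered
    # Assumption: S_W is invertible, so S_B w = lambda S_W w becomes an
    # ordinary eigenproblem of S_W^{-1} S_B.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1][:m]   # largest eigenvalues first
    return eigvecs[:, order].real


# Toy data: two classes separated along the first coordinate.
X_toy = [[-0.5, -1.0], [0.5, 1.0], [0.5, -1.0], [-0.5, 1.0],
         [3.5, -1.0], [4.5, 1.0], [4.5, -1.0], [3.5, 1.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
W = fisher_projection(X_toy, y_toy, m=1)
P = np.asarray(X_toy) @ W   # each row is p = W^T x for one sample
```

For high-dimensional features with few samples, $S_W$ is often singular; a common workaround is to reduce dimension with PCA first or to add a small ridge term, neither of which is shown here.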
Character Recognition Results
• Training set: 1040 lowercase letters
• Test set: 520 lowercase letters
• Test results:
Challenge I
• The problem with the previous approach: non-characters are classified as characters. When applied to cursive words, it creates too many nonsense word hypotheses by extracting characters where none seem to exist.
• More generally, one wants to be able to generate shapes and their deformations.
Challenge II
• How can reliable local geometric features of images (corners, contour tangents, contour curvature, …) be extracted?
• How should they be grouped?
• With a large database to match against one input, how can matching be done fast?
• Hierarchical clustering of the database, possibly over a tree structure or some general graph. How should it be done? Which criteria should drive the clustering? Which methods should be used?
Recognition Engine
• Integrates all available information; generates and grows the word-level hypotheses.
• Most general form: a graph and its search.
• Hypothesis Propagation Network
Hypothesis Propagation Network
[Figure: network with time t = 1, 2, 3, …, T on one axis and character class m ("a", "b", …, "y", "z") on the other. Labels: node H(t, m), list length, class m's legal predecessors, look-back window range.]
• Recognition of 85% on 100 words (not good)
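The figure labels (H(t, m), list length, legal predecessors, look-back window) suggest a dynamic-programming search. The sketch below is one simplified reading of those labels, not the network from the slides: the scoring callback `char_score`, the additive scores, and the rule that any class may start a word are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Hypothesis = Tuple[float, str]  # (accumulated score, text so far)


def propagate(
    T: int,
    classes: List[str],
    predecessors: Dict[str, Set[str]],
    char_score: Callable[[int, int, str], float],
    window: int = 3,
    list_length: int = 2,
) -> List[Hypothesis]:
    """Grow word hypotheses over segmentation times 1..T; best first."""
    # H[t][m]: best hypotheses ending at time t with a character of class m.
    H: List[Dict[str, List[Hypothesis]]] = [
        {m: [] for m in classes} for _ in range(T + 1)
    ]
    for t in range(1, T + 1):
        for m in classes:
            candidates: List[Hypothesis] = []
            # Look back at most `window` segmentation points.
            for s in range(max(0, t - window), t):
                local = char_score(s, t, m)  # score of class m on span (s, t]
                if s == 0:
                    candidates.append((local, m))  # assumed: any class may start
                else:
                    for k in predecessors.get(m, set()):
                        for score, text in H[s].get(k, []):
                            candidates.append((score + local, text + m))
            candidates.sort(key=lambda h: h[0], reverse=True)
            H[t][m] = candidates[:list_length]  # bounded list length
    final = [h for m in classes for h in H[T][m]]
    final.sort(key=lambda h: h[0], reverse=True)
    return final


# Toy scores (hypothetical): "g" fits span (0, 1], "o" fits span (1, 2].
toy_scores = {(0, 1, "g"): -0.1, (1, 2, "o"): -0.2}
ranked = propagate(
    T=2,
    classes=["g", "o"],
    predecessors={"o": {"g"}, "g": set()},
    char_score=lambda s, t, m: toy_scores.get((s, t, m), -5.0),
)
best_score, best_text = ranked[0]
```

Bounding each list to `list_length` entries keeps the search tractable at the cost of possibly pruning the correct hypothesis early.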
Challenge III
• How can one search more efficiently in this network and, more generally, in Bayesian networks?
Visual Bigram Models (VBM)
• Some characters can be very ambiguous when isolated ("9" and "g"; "e" and "l"; "o" and "0"; etc.) but become more obvious when put in context.
[Figure: the handwritten pairs "go" and "90", annotated with character heights and with relative height ratio and positioning.]
VBM: Parameters
• Height Difference Ratio: HDR = (h1 - h2) / h
• Top Difference Ratio: TDR = (top1 - top2) / h
• Bottom Difference Ratio: BDR = (bot1 - bot2) / h
[Figure: two adjacent characters with their tops (top1, top2), bottoms (bot1, bot2), and heights (h1, h2), and the height h.]
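The three ratios can be computed directly from the vertical extents of two adjacent characters. In this sketch, `h` is assumed to be the height of the box enclosing both characters and the y-axis is assumed to grow upward; the slides do not state either convention, and the `Box` type and example coordinates are hypothetical.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    top: float     # y-coordinate of the character's top (assumed: y grows upward)
    bottom: float  # y-coordinate of the character's bottom


def vbm_features(first: Box, second: Box) -> Tuple[float, float, float]:
    """Return (HDR, TDR, BDR) for a pair of adjacent characters."""
    h1 = first.top - first.bottom
    h2 = second.top - second.bottom
    # Assumption: h is the height of the box enclosing both characters.
    h = max(first.top, second.top) - min(first.bottom, second.bottom)
    if h <= 0:
        raise ValueError("enclosing height must be positive")
    hdr = (h1 - h2) / h
    tdr = (first.top - second.top) / h
    bdr = (first.bottom - second.bottom) / h
    return hdr, tdr, bdr


# "go": the first character descends below the baseline, the second does not.
go_features = vbm_features(Box(top=1.0, bottom=-1.0), Box(top=1.0, bottom=0.0))
# "90": both characters span the same vertical extent.
ninety_features = vbm_features(Box(top=2.0, bottom=0.0), Box(top=2.0, bottom=0.0))
```

With these made-up extents the two pairs yield clearly different feature vectors, which is the kind of contrast the visual bigram model relies on to separate "go" from "90".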
VBM: Ascendancy Categories
• A total of 9 visual bigram categories (instead of 26 × 26 = 676).
VBM: Test Results