(Off-Line) Cursive Word Recognition Tal Steinherz Tel-Aviv University
Cursive Word Recognition • Preprocessing • Segmentation • Feature Extraction • Recognition • Post Processing
Preprocessing • Skew correction • Slant correction • Smoothing • Reference line finding
Segmentation Motivation • Given a 2-dimensional image and a model that expects a 1-dimensional input signal, one needs to derive an ordered list of features. • Fragmentation is an alternative in which the resulting pieces carry no literal (letter-level) meaning.
Segmentation Dilemma • To segment or not to segment? That is the question! • Sayre’s paradox: “To recognize a letter, one must know where it starts and where it ends; to isolate a letter, one must recognize it first.”
Recognition Model • What is the basic (atomic) model? • word (remains identical through training and recognition) • letter (concatenated on demand during recognition) • What are the training implications? • specific = total cover (several samples of each word) • dynamic = brick cover (samples of various words that together cover all possible letters)
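A minimal sketch of the dynamic/brick-cover alternative (all names hypothetical; the "models" are placeholders rather than real HMMs): letter sub-models are trained once, and a model for any word is assembled on demand by concatenation, so training only needs samples that cover every letter, not every word.

```python
# Hypothetical sketch: letter sub-models trained once ("brick cover"),
# word models assembled on demand by concatenation.

LETTER_MODELS = {}  # letter -> trained sub-model (placeholder dict here)

def train_letter_model(letter, samples):
    # Stand-in for real sub-model training; we store only the mean
    # sample length so the sketch stays runnable.
    LETTER_MODELS[letter] = {"letter": letter,
                             "mean_len": sum(map(len, samples)) / len(samples)}

def build_word_model(word):
    # Any word spelled from trained letters gets a model, with no
    # word-level training at all.
    return [LETTER_MODELS[ch] for ch in word]

train_letter_model("c", [[1, 2], [1, 2, 3]])
train_letter_model("a", [[4], [4, 5]])
train_letter_model("t", [[6, 7]])
print(build_word_model("cat"))  # three chained letter sub-models
```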
Basic Word Model [diagram: 1st letter sub-model → … → i-th letter sub-model → … → last letter sub-model, chained left to right]
Segmentation-Free • In a segmentation-free approach, recognition is based on measuring the distance between observation sequences.
Segmentation-Free (cont.) • The most popular metric is Levenshtein’s edit distance, where one sequence is transformed into another by atomic operations (insertion, deletion, and substitution), each associated with a different cost • Implementations: dynamic programming, HMM
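A minimal dynamic-programming implementation of the edit distance named above; the per-operation costs are parameters, matching the note that the atomic operations carry different costs.

```python
def edit_distance(a, b, ins=1, dele=1, sub=1):
    """Levenshtein edit distance between sequences a and b, with
    configurable insertion / deletion / substitution costs."""
    m, n = len(a), len(b)
    # dp[i][j] = cheapest transformation of a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * dele
    for j in range(1, n + 1):
        dp[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            dp[i][j] = min(dp[i - 1][j] + dele,      # delete a[i-1]
                           dp[i][j - 1] + ins,       # insert b[j-1]
                           dp[i - 1][j - 1] + cost)  # match / substitute
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```

In word recognition the sequence elements would be feature vectors rather than characters, with the substitution cost derived from a vector distance.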
Segmentation-Free (demo) • Each column was translated into a feature vector. • Two types of features: • number of zero-crossings • gradient of the word’s curve
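A rough sketch of such per-column feature extraction, assuming a binary NumPy image (1 = ink). The zero-crossing count is the number of background-to-ink transitions in the column; "gradient of the word's curve" is approximated here by the change in the column's ink centroid, which is an assumption about what the slide means.

```python
import numpy as np

def column_features(img):
    """img: 2-D binary array (1 = ink). Returns one feature pair per
    column: (zero-crossings, centroid gradient)."""
    h, w = img.shape
    rows = np.arange(h)
    feats, prev_centroid = [], None
    for x in range(w):
        col = img[:, x].astype(int)
        # 0 -> 1 transitions when scanning the column top to bottom
        crossings = int(np.count_nonzero(np.diff(col) == 1))
        ink = col.sum()
        centroid = float((rows * col).sum() / ink) if ink else 0.0
        grad = 0.0 if prev_centroid is None else centroid - prev_centroid
        prev_centroid = centroid
        feats.append((crossings, grad))
    return feats
```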
Letter sub-HMM components [diagram: normal transitions and null transitions]
Letter sub-HMM [diagram: a chain of states connected by normal and null transitions]
Segmentation-Based • In a segmentation-based approach, recognition is based on a complete bipartite matching between blocks of primitive segments and the letters of a word.
Segmentation-Based (cont.) • The best match is found by dynamic programming (the Viterbi algorithm) • An HMM implementation is very popular and enhances the model’s capabilities
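A textbook Viterbi sketch for a discrete-observation HMM, not specific to any particular word model; in the word models above, the states would be the letter sub-HMM states and the observations the per-segment feature codes.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for observation sequence `obs`.
    pi: initial probs (S,); A: transition matrix (S, S);
    B: emission matrix (S, V) over a discrete alphabet of size V."""
    with np.errstate(divide="ignore"):       # log(0) -> -inf is fine here
        lp, lA, lB = np.log(pi), np.log(A), np.log(B)
    T, S = len(obs), len(pi)
    delta = np.zeros((T, S))                 # best log-score ending in state s at time t
    psi = np.zeros((T, S), dtype=int)        # back-pointers
    delta[0] = lp + lB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + lA  # scores[from, to]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + lB[:, obs[t]]
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 1, 0, -1):            # follow back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```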
Segmentation-Based (demo) • First the word is heuristically segmented. • It is preferable to over-segment a character; nevertheless, a character must not span more than a predefined number of segments. • Each segment is translated into a feature vector.
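A sketch of the resulting block-to-letter matching under the constraint just stated: each letter may absorb between 1 and `max_span` consecutive segments. `letter_score` is a hypothetical stand-in for whatever classifier scores a block of segments against a letter.

```python
def match_word(segments, word, letter_score, max_span=4):
    """Best alignment log-score of the primitive `segments` against
    `word`, each letter spanning 1..max_span consecutive segments.
    letter_score(block, ch) is a hypothetical per-letter classifier."""
    n, m = len(segments), len(word)
    NEG = float("-inf")
    # dp[i][j] = best score using the first i segments for the first j letters
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for span in range(1, min(max_span, i) + 1):
                prev = dp[i - span][j - 1]
                if prev > NEG:
                    s = prev + letter_score(segments[i - span:i], word[j - 1])
                    dp[i][j] = max(dp[i][j], s)
    return dp[n][m]
```

Scoring every lexicon word this way and taking the argmax would be one lexicon-driven variant of the approach.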
Features in Segments (demo) • Global features: • ascenders, descenders, loops, i dots, t strokes • Local features: • X crossings, T crossings, end points, sharp curvatures, parametric strokes • Non-symbolic features: • pixel moments, pixel distributions, contour codings
Letter sub-HMM (maximum 4 segments per character) [diagram: states labeled 1–4]
Two-Letter joined sub-HMM (0.5–3 segments per character) [diagram: alternating L, M, R states]
Pattern Recognition Issues • Lexicon size: • small (up to 100 words) • limited (between 100 and 1000 words) • infinite (more than 1000 words)
Word Model Extension • A new approach to recognition? • path discriminant (a single general word model; each path is a hypothesis for one word) [diagram: parallel letter sub-HMMs ‘a’ … ‘m’ … ‘z’]
Online vs. Off-Line • Online – captured by pen-like devices. The input format is a two-dimensional signal of pixel locations as a function of time, (x(t), y(t)). • Off-line – captured by scanning devices. The input format is a two-dimensional image of gray-scale values as a function of location, I(m×n). Strokes have significant width.
Online vs. Off-Line (cont.) • In general, online classifiers are superior to off-line classifiers because some valuable strokes are blurred in the static image. Sometimes temporal information (stroke order) is also a must in order to distinguish between similar objects.
Online Weaknesses Sensitivity to variations in stroke order, stroke number, and stroke characteristics: • Shapes that look similar in the image domain might be produced by different sets of strokes. • Many redundant strokes (consecutive superfluous pixels) are byproducts of the continuous nature of cursive handwriting. • Incomplete (open) loops are more frequent.
Off-Line can improve Online • Sometimes the off-line representation enables one to recognize words that are not recognized given the online signal. • An optimal system would combine online and off-line based classifiers.
The desired integration between online and off-line classifiers • Use a single word recognition engine for both the online and the off-line data. • This requires an off-line to online transformation that extracts an alternative list of strokes, preserving off-line-like features while keeping a consistent stroke order.
The “pseudo-online” transformation [flowchart: the online signal is projected to the image domain, giving a bitmap image with stroke width = 1; “painting” (thickening the strokes) turns it into a real static image with stroke width > 1; the pseudo-online transformation then produces the pseudo-online representation. The online signal and the pseudo-online representation each feed the online recognition engine, yielding online and pseudo-online classification outputs from the respective classifiers; these are integrated by some combination scheme into the final recognition results]
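A rough sketch of the first two boxes of the flowchart, assuming SciPy is available: rasterize the online samples onto a width-1 bitmap, then "paint" it by morphological dilation so it resembles a scanned image. Coordinate scaling and interpolation between consecutive samples are left out.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def project_to_image(points, h, w):
    """Rasterize online samples (x, y) onto an h x w bitmap with
    stroke width 1; assumes coordinates already fit the canvas."""
    img = np.zeros((h, w), dtype=bool)
    for x, y in points:
        img[int(y), int(x)] = True
    return img

def paint(img, width=3):
    """'Painting': thicken the strokes by dilation so the bitmap
    looks like a real static image with stroke width > 1."""
    return binary_dilation(img, iterations=max(1, width // 2))
```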
Cursive Handwriting Terms • Axis - the main subset of strokes that forms the backbone, which is the shortest path from left to right, occasionally including loops. • Tarsi - the other subsets of connected strokes, which produce branches that hang above the axis (in the case of ascenders) or below it (in the case of descenders).
The Pseudo-Online Transformation • Follow the skeleton of the axis from the leftmost pixel until reaching the first intersection with a tarsus. • Surround the tarsus by tracking its contour until returning to the intersection point we started from. • Continue along the axis to the next intersection with a tarsus, and so on until the rightmost pixel is reached. • Loops that are encountered along the axis are also surrounded completely.
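A heavily simplified sketch of the ordering logic only; the skeletonization and contour tracking that would produce its inputs are assumed to exist and are not shown. `axis` (the backbone as an ordered point list) and `detours` (mapping an axis index to the contour points of the tarsus or loop met there, starting and ending at the intersection) are hypothetical data structures.

```python
def pseudo_online(axis, detours):
    """axis: backbone skeleton points, ordered left to right.
    detours: {axis_index: [contour points]} for each tarsus / loop
    encountered at that axis point. Returns a single ordered stroke,
    imitating a consistent pen trajectory."""
    stroke = []
    for i, p in enumerate(axis):
        stroke.append(p)
        # Surround the branch completely, ending back at the
        # intersection, then resume along the axis.
        stroke.extend(detours.get(i, []))
    return stroke

# Toy usage: a 3-point axis with one ascender met at index 1.
print(pseudo_online([(0, 5), (1, 5), (2, 5)],
                    {1: [(1, 4), (1, 3), (1, 4)]}))
```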
Experimental Setup • The online word recognition engine of Neskovic et al. – satisfies Trainability and Versatility. • A combination of 6/12 online and pseudo-online classifiers. • Several combination schemes – majority vote, max rule, sum rule. • An extension of the HP dataset that can be found in the UNIPEN collection.
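A minimal sketch of the three combination schemes named above, over a (classifiers × classes) score matrix; it assumes the scores are comparable across classifiers, which real systems ensure by normalization.

```python
import numpy as np

def combine(scores, scheme="sum"):
    """scores: (n_classifiers, n_classes) array. Returns the index of
    the winning class under the chosen combination rule."""
    if scheme == "sum":    # sum rule: add per-class scores, take argmax
        return int(scores.sum(axis=0).argmax())
    if scheme == "max":    # max rule: the single best score wins
        return int(scores.max(axis=0).argmax())
    if scheme == "vote":   # majority vote over each classifier's argmax
        votes = scores.argmax(axis=1)
        return int(np.bincount(votes, minlength=scores.shape[1]).argmax())
    raise ValueError(f"unknown scheme: {scheme}")
```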
Experimental Setup (cont.) • Different training sets of 46 writers. • Disjoint validation sets of 9 writers. • A disjoint test set of 11 writers. • The lexicon contains 862 words.
Result Analysis • Word level – for 110 word classes (12.8%), at least 7 word samples (10.6%) were correctly recognized only by the combination that included the pseudo-online classifiers. • Writer level – for 12 writers (18.2%), at least 65 of the words they produced (7.5%) were correctly recognized only by the combination that included the pseudo-online classifiers.
Result Analysis (cont.) • 909 of the input words (5.9%) were correctly recognized by at least one pseudo-online classifier and by none of the 12 online classifiers. • 357 of the input words (2.3%) were correctly recognized by at least 4 of the 12 pseudo-online classifiers and by none of the 12 online classifiers. • For 828 of the input words (5.3%), the difference between the number of pseudo-online and online classifiers that correctly recognized them was 6 or more.
Conclusions • The pseudo-online representation does add information that cannot be obtained by optimizing or extending a combination of online classifiers only.