Explore the application of a Bayesian framework to deformable pattern classification, with a focus on handwritten digit and word recognition. The thesis delves into modeling, matching, classification, and retrieval of non-rigid patterns, addressing critical factors for accuracy.
Bayesian Frameworks for Deformable Pattern Classification and Retrieval by Kwok-Wai Cheung January 1999
Model-Based Scene Analysis • Integrated Segmentation and Recognition [Figure: knowledge (an “H” model) and input lead to the output]
Template Matching: Limitation [Figure: knowledge (reference models) and input lead to the output; matching score of “H” = 10/12, matching score of “A” = 10/12]
Deformable Models • A deformable model usually refers to an abstraction of an object's shape that can vary in shape, which lets it model non-rigid objects. [Figure: a deformable “6” model]
A Common Formulation Modeling Matching Classification Retrieval
A Common Formulation: Modeling • Model representation Hj • Model shape parameter vector w [Figure: parameter space with points w1, w2, w3 and the corresponding shapes Hj(w1), Hj(w2), Hj(w3)]
A Common Formulation: Matching • A search process (multi-criterion optimization) • Model Deformation Criterion • Data Mismatch Criterion • Combined Criterion and Regularization [Figure: parameter space with points w0, w1, wf and the matched shape Hj(wf)]
A Common Formulation: Classification
A Common Formulation: Retrieval
Thesis Overview Reasoning: Bayesian Framework Approach: Deformable Models Problem: Deformable Pattern Classification Application: Handwritten Digit Recognition Problem: Deformable Pattern Retrieval Application: Handwritten Word Retrieval
Presentation Outline • A Bayesian framework for deformable pattern classification (applied to handwritten character recognition) • Extensions of the framework • A competitive mixture of deformable models • Robust deformable matching • A Bayesian framework for deformable pattern detection (applied to handwritten word retrieval) • Conclusions and future work
A Bayesian Framework for Deformable Pattern Classification with Application to Isolated Handwritten Character Recognition
Bayesian Background [Figure: prior distribution over w, likelihood function over w, data distribution over D, and posterior distribution over w]
Bayesian Formulation Shape Parameter Distribution • Prior distribution (without data) • Likelihood function • Posterior distribution (with data)
Bayesian Inference: Matching • Matching by maximum a posteriori (MAP) estimation. [Figure: the MAP estimate in parameter space]
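The formulas on these slides did not survive extraction. As a reminder of the standard relations they refer to, and not a reconstruction of the thesis's exact notation, Bayes' rule and the MAP estimate can be written as follows, with w the shape parameter vector, D the data, and Hj the model:

```latex
% Posterior = likelihood x prior / evidence, for model H_j
p(w \mid D, H_j) = \frac{p(D \mid w, H_j)\, p(w \mid H_j)}{p(D \mid H_j)}

% Matching: the MAP estimate maximizes the posterior; the evidence
% p(D | H_j) does not depend on w, so it drops out of the maximization
w^{*} = \arg\max_{w} \; p(D \mid w, H_j)\, p(w \mid H_j)
```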
Bayesian Inference: Classification • Classification by computing the model evidence (Laplace approximation).
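To make the evidence computation concrete, here is a minimal one-dimensional sketch, not the thesis's implementation. It assumes a zero-mean Gaussian prior on a scalar parameter w and a single observation x with Gaussian noise; the function names and the toy model are illustrative assumptions. In this all-Gaussian case the approximation is exact, which makes it easy to check.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def laplace_log_evidence(x, prior_var, noise_var):
    """Approximate log p(D|H) around the MAP estimate (toy model, see lead-in)."""
    # MAP estimate of w for prior N(0, prior_var) and likelihood N(x; w, noise_var)
    w_map = x * prior_var / (prior_var + noise_var)
    # Negative second derivative of the log posterior at the MAP estimate
    curvature = 1.0 / noise_var + 1.0 / prior_var
    return (log_gauss(x, w_map, noise_var)      # log likelihood at w_map
            + log_gauss(w_map, 0.0, prior_var)  # log prior at w_map
            + 0.5 * math.log(2 * math.pi / curvature))  # volume (Occam) term

def exact_log_evidence(x, prior_var, noise_var):
    """Closed-form evidence for the same toy model, for comparison."""
    return log_gauss(x, 0.0, prior_var + noise_var)

def classify(x, models):
    """Pick the model name with the highest approximate evidence.

    `models` maps a name to (prior_var, noise_var); both are hypothetical.
    """
    return max(models, key=lambda name: laplace_log_evidence(x, *models[name]))
```

Classification then amounts to evaluating the evidence once per candidate model and taking the maximum, which is why the cost grows with the size of the model set.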
Model Representation • Cubic B-splines for modeling handwritten character shape. • Shape parameter vector { w, A, T } • w = spline control points (local deformation) • {A,T} = affine transform parameters (global deformation) • Mixture of Gaussians for modeling black pixels.
Model Representation [Figure: a spline curve; control points with sequence numbers 1 to 8; Gaussian distributions modeling black pixels; stroke width]
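The representation above can be sketched as follows, assuming a uniform cubic B-spline and 2-D control points. The function names and the sampling scheme are illustrative assumptions, not taken from the thesis. Because the blending weights sum to one, transforming the control points and transforming the curve give the same result.

```python
def cubic_bspline_basis(t):
    """Uniform cubic B-spline blending weights for local parameter t in [0, 1]."""
    return (
        (1 - t) ** 3 / 6.0,
        (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0,
        (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0,
        t ** 3 / 6.0,
    )

def spline_point(control_points, segment, t):
    """Point on the curve segment governed by four consecutive control points."""
    weights = cubic_bspline_basis(t)
    pts = control_points[segment:segment + 4]
    x = sum(b * p[0] for b, p in zip(weights, pts))
    y = sum(b * p[1] for b, p in zip(weights, pts))
    return (x, y)

def affine(point, A, T):
    """Global deformation: apply the 2x2 matrix A, then the translation T."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y + T[0],
            A[1][0] * x + A[1][1] * y + T[1])

def sample_model(control_points, A, T, samples_per_segment=5):
    """Sample curve points after applying {A, T} to the control points w."""
    moved = [affine(p, A, T) for p in control_points]
    curve = []
    for segment in range(len(moved) - 3):
        for k in range(samples_per_segment):
            curve.append(spline_point(moved, segment, k / samples_per_segment))
    return curve
```

In a model of this kind, the sampled curve points would serve as the centers of the Gaussians that account for the black pixels.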
Criterion Function Formulation • Model Deformation Criterion: Mahalanobis distance • Data Mismatch Criterion: negative log of product of a mixture of Gaussians
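A rough sketch of the two criteria is given below, under simplifying assumptions that are not from the slides: a diagonal covariance for the shape prior, equal-weight isotropic Gaussians with a shared precision `beta`, and a hypothetical weight `alpha` on the deformation term. The Gaussian centers are passed in directly here; in a spline-based model they would be derived from the shape parameters.

```python
import math

def deformation_criterion(w, w_mean, w_var):
    """Squared Mahalanobis distance of w from the reference shape.

    Assumes a diagonal covariance, given as a list of variances.
    """
    return sum((wi - mi) ** 2 / vi for wi, mi, vi in zip(w, w_mean, w_var))

def data_mismatch_criterion(black_pixels, centers, beta):
    """Negative log of the product, over black pixels, of a Gaussian mixture.

    Each mixture component is an isotropic 2-D Gaussian with precision beta.
    """
    total = 0.0
    for px, py in black_pixels:
        density = 0.0
        for cx, cy in centers:
            sq_dist = (px - cx) ** 2 + (py - cy) ** 2
            density += (beta / (2 * math.pi)) * math.exp(-0.5 * beta * sq_dist)
        total -= math.log(density / len(centers))
    return total

def combined_criterion(w, w_mean, w_var, black_pixels, centers, alpha, beta):
    """Weighted sum of the two criteria; lower values mean a better match."""
    return (0.5 * alpha * deformation_criterion(w, w_mean, w_var)
            + data_mismatch_criterion(black_pixels, centers, beta))
```

Under these assumptions, a larger `alpha` penalizes deformation more heavily and a larger `beta` corresponds to narrower Gaussians, that is, a thinner stroke.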
Matching • MAP estimation for {w, A, T, a, b} using the expectation-maximization (EM) algorithm [Dempster et al. 1977]. • There is no closed-form solution, so the algorithm must iterate between estimating {w, A, T} (linear) and estimating {a, b}.
Matching Results [Figure panels: simple initialization; affine transform initialization; final match]
Matching Results • a* = 3.54, b* ~ 0.9: deformed less • a* = 0.89, b* ~ 0.9: deformed more • a* ~ 3.0, b* = 0.52: thicker stroke • a* ~ 3.0, b* = 0.9: thinner stroke
Classification Best Match with highest P(D|H6). The output class is “Six”.
Critical Factors for Higher Accuracy • Size of the Model Set • how many models for each class? • Model Flexibility Constraints • Likelihood Inaccuracy • use prior only for the best few candidates. [Figure: unconstrained vs. constrained]
Critical Factors for Higher Accuracy • Filtering Normalized “1”: For the NIST dataset we used, all the characters are normalized to 20x32. Some abnormal “1”s are observed. • Sub-part Detection: [Figure] These are the unmatched portions when matching model “2” to data “0”.
Experiment • Training Set (NIST SD-1) • 11,660 digits (32x32 by 100 writers) • Test Set (NIST SD-1) • 11,791 digits (32x32 by 100 writers) • Size of Model Set = 23 (manually created)
Accuracy and Size of Model Set [Figure: accuracy vs. number of models, with an optimal accuracy curve; 94.7% with 23 manually created models [Our system]; 99.25% with 2000 models using nearest neighbor [Jain et al. 1997]]
Summary • A unified framework based on Bayesian inference is proposed for modeling, matching and classifying non-rigid patterns, with promising results for handwritten character recognition. • Several critical factors related to the recognition accuracy are carefully studied.
Major Limitations of the Framework • The Scale-up Problem • The classification time increases linearly with the size of the model set. • The Outlier Problem • The framework is very sensitive to the presence of outlier data (e.g., strokes due to the adjacent characters)
The Scale-up Problem: Solutions • Hardware solution • Independent matching processes -> highly parallel computing architecture • Software solution • Cut down unnecessary computation by carefully designing the data structures and the implementation of the algorithm.
A Competitive Mixture of Deformable Models • Let H = {H1, H2, … , HM, p1, p2, … , pM} denote a mixture of M models. [Figure: input data D feeds the models H1, H2, …, HM through the weights p1, p2, …, pM]
A Competitive Mixture of Deformable Models • The Bayesian framework is extended so that {pi} can be estimated using the EM algorithm. • When p(D|H) is maximized and the data D comes from Hi, the ideal outcome is {pi} = [0 0 .. 0 1 0 .. 0], with the single 1 at position i.
Speed up: Elimination Process [Figure: input data D feeds the models H1, H2, …, HM through the weights p1, p2, …, pM]
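The competitive behavior can be illustrated with a deliberately simplified sketch. It assumes a single observation D and holds each model's likelihood p(D|Hi) fixed across iterations, whereas in the actual framework the models are also being matched to the data as the iterations proceed. The function name and the example likelihood values are hypothetical.

```python
def em_mixing_weights(likelihoods, iterations=50):
    """EM updates for the mixing weights {p_i} with fixed likelihoods p(D|H_i).

    Simplification (see lead-in): one observation, likelihoods held constant.
    """
    m = len(likelihoods)
    p = [1.0 / m] * m  # start from uniform weights
    for _ in range(iterations):
        # E-step: responsibility of each model for the observation
        joint = [pi * li for pi, li in zip(p, likelihoods)]
        total = sum(joint)
        # M-step: with a single observation, the new weight is the responsibility
        p = [j / total for j in joint]
    return p

# Hypothetical likelihoods for three models; the second one fits best.
weights = em_mixing_weights([0.2, 0.5, 0.1])
```

Each update multiplies a weight by its model's likelihood and renormalizes, so the best-fitting model's weight approaches 1 while the others decay toward 0.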
Experiment • Training Set (NIST SD-1) • 2,044 digits (32x32 by 30 writers) • Test Set (NIST SD-1) • 1,427 digits (32x32 by 19 writers) • Size of Model Set = 10 (manually created) • Elimination Rule • After the first iteration, only the best R models are retained.
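The elimination rule can be sketched as below, assuming that "best" means the largest mixing weights after the first iteration and that the surviving weights are renormalized. Both are assumptions, since the slide does not state the ranking criterion, and the function name is illustrative.

```python
def keep_best_models(weights, r):
    """Retain the r models with the largest weights and renormalize them.

    Returns a dict mapping the original model index to its new weight.
    """
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:r])  # restore the original model order
    total = sum(weights[i] for i in kept)
    return {i: weights[i] / total for i in kept}

# Hypothetical weights for four models; keep the best two.
survivors = keep_best_models([0.1, 0.4, 0.2, 0.3], 2)
```

Only the surviving models need to be matched in later iterations, which is where the speedup comes from.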
Experimental Results: Accuracy 95.1% 94.2% 92.7%
Experimental Results: Speedup 2.1 1.9 1.4
The Outlier Problem • The mixture of Gaussians noise model fails when some gross errors (outliers) are present. [Figure: well segmented input vs. badly segmented input]
The Outlier Problem • The true data must be distinguished from the outliers. • Utilize true data and suppress outliers. [Figure: outliers vs. true data]
Use of Robust Statistics • Robust statistics takes into account the outliers by either: 1) Modeling them explicitly using probability distributions, e.g. uniform distribution 2) Discounting their effect (M-estimation), e.g. defining the data mismatch measure (which is normally quadratic) such that
Use of Robust Statistics • Suppressing the outliers’ contribution
Robust Linear Regression [Figure: line fit without robust statistics vs. line fit with robust statistics]
Robust Deformable Matching • An M-estimator is proposed such that [equations not recovered: original data mismatch criterion; data mismatch criterion with robust statistics]
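The effect of M-estimation on a line fit can be sketched as follows. This uses Huber weights with iteratively reweighted least squares, a common textbook choice and not necessarily the estimator used in the thesis. The threshold `k`, the function names, and the example data are assumptions.

```python
def fit_line(xs, ys, weights):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return slope, intercept

def robust_fit(xs, ys, k=1.0, iterations=100):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    weights = [1.0] * len(xs)  # the first pass is ordinary least squares
    slope, intercept = fit_line(xs, ys, weights)
    for _ in range(iterations):
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Quadratic cost inside the threshold, linear outside it:
        # large residuals are down-weighted rather than trusted fully.
        weights = [1.0 if abs(r) <= k else k / abs(r) for r in residuals]
        slope, intercept = fit_line(xs, ys, weights)
    return slope, intercept

# Points on y = 2x + 1 with one gross outlier at the last position.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0
plain = fit_line(xs, ys, [1.0] * len(xs))
robust = robust_fit(xs, ys)
```

The ordinary fit is pulled strongly toward the outlier, while the reweighted fit stays close to the line followed by the remaining points.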
Experiment • Goal: To extract the leftmost characters from handwritten words. • Test Set - CEDAR database • Model Set - manually created • Model Initialization • Chamfer matching based on a distance transform.
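Chamfer matching for initialization can be sketched as below, assuming binary edge points on a small grid, a brute-force Euclidean distance transform, and a search over non-negative integer translations only. The function names and the search range are illustrative assumptions; a practical system would use a fast distance transform and may also search over scale.

```python
import math

def distance_transform(edge_points, height, width):
    """Distance from every pixel to the nearest edge pixel (brute force)."""
    return [[min(math.hypot(r - er, c - ec) for er, ec in edge_points)
             for c in range(width)]
            for r in range(height)]

def chamfer_score(dist, template, dr, dc):
    """Mean distance-transform value under the template shifted by (dr, dc)."""
    height, width = len(dist), len(dist[0])
    total = 0.0
    for r, c in template:
        rr, cc = r + dr, c + dc
        if not (0 <= rr < height and 0 <= cc < width):
            return math.inf  # the shifted template falls outside the image
        total += dist[rr][cc]
    return total / len(template)

def best_offset(dist, template, max_shift):
    """Translation (dr, dc) in [0, max_shift]^2 with the lowest chamfer score."""
    candidates = [(dr, dc)
                  for dr in range(max_shift + 1)
                  for dc in range(max_shift + 1)]
    return min(candidates, key=lambda o: chamfer_score(dist, template, o[0], o[1]))
```

The distance transform is computed once per image, so scoring each candidate placement only requires table lookups at the template points.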
Experimental Results [Figure panels: initialization; fixed window width 1; fixed window width 2; fixed window width 3; robust window]
Summary • The basic framework can be extended to a competitive mixture of deformable models where significant speedup can be achieved. • The robust statistical approach is found to be an effective solution for robust deformable matching in the presence of outliers.