Yongqiang Wang 1,2 , Qiang Huo 1 1 Microsoft Research Asia, Beijing, China 2 The University of Hong Kong, Hong Kong, Ch

Design Compact Recognizers of Handwritten Chinese Characters Using Precision Constrained Gaussian Models, Minimum Classification Error Training and Parameter Compression Yongqiang Wang1,2 , Qiang Huo1 1Microsoft Research Asia, Beijing, China 2The University of Hong Kong, Hong Kong, China (qianghuo@microsoft.com) ICDAR-2009, Barcelona, Spain, July 26-29, 2009

Outline • Prior art • Our approach • Experiments and results • Conclusions

Prior art to build a compact online CJK handwritten character recognizer • Feature vector: • 8-directional features (Bai & Huo, 2005) • Z.-L. Bai and Q. Huo, “A study on the use of 8-directional features for online handwritten Chinese character recognition,” Proc. ICDAR-2005, pp.262-266. • LDA for dimension reduction • Classifier: • Modified Quadratic Discriminant Function (MQDF, Kimura et al., 1987) • F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, “Modified quadratic discriminant functions and the application to Chinese character recognition,” IEEE Trans. on PAMI, vol. 9, pp.149-153, 1987. • Minimum Classification Error (MCE) training for MQDF-based classifier (Liu et al., 2004) • C.-L. Liu, H. Sako, and H. Fujisawa, “Discriminative learning quadratic discriminant function for handwriting recognition,” IEEE Trans. on Neural Networks, vol. 15, no. 2, pp.430-444, 2004. • Parameter compression using Split-VQ (Long & Jin, 2008) • T. Long and L.-W. Jin, “Building compact MQDF classifier for large character set recognition by subspace distribution sharing,” Pattern Recognition, vol. 41, no. 9, pp.2916-2925, 2008.

What’s wrong: bad accuracy-footprint tradeoff

Our New Approach • Precision Constrained Gaussian Model (PCGM) for handwritten CJK character recognition: • Borrowed from ASR area • Demonstrated previously that ML-trained PCGM can achieve much better accuracy-footprint tradeoff than the conventional ML-trained MQDF (Wang & Huo, 2009) • Y.-Q. Wang and Q. Huo, “Modeling inverse covariance matrices by expansion of tied basis matrices for online handwritten Chinese character recognition, ” Pattern Recognition, 2009. • What’s new in this study? • MCE training of PCGM-based classifier using Quickprop • Compression of PCGM parameters using scalar quantization and split VQ • Embed parameter compression into MCE training • Comparative study with their MQDF counterparts

Precision Constrained Gaussian Model (PCGM) • For a pattern classification problem with M “character classes”: • Each pattern class Cjis modeled by a Gaussian PDF, where the inverse covariance matrix (a.k.a. precision matrix) is constrained to lie in a subspace: • PCGM-based discriminant function: • Classifier parameters to be stored: • Character-dependent (transformed) mean vectors • Character-independent basis matrices • Character-dependent basis coefficients

MCE Training of PCGM-based Classifier • Discriminant function of PCGM-based classifier • For the i-th training feature vector, xji, belonging to j-th class, a misclassification measure is defined as • Per-sample loss function: • Batch-modeQuickprop algorithm is used to minimize the empirical loss functiondefined on the training data set: • Take advantage of “cluster computing” • In practice, only mean vectors are updated

Parameter Compression (1) • Compress mean vectors using split VQ • Split D-dimensional (transformed) mean vectors into Qstreams • Sub-vectors in different streams are quantized by VQ with different codebooks • Basis coefficients • Compressed using split VQ as well

Parameter Compression (2) • Compress basis matrices • Basis matrices are symmetric, therefore only need to store the diagonal and upper-diagonal elements • Diagonal elements are not compressed • Upper-diagonal elements are compressed using 8-bit scalar quantization

Combining MCE Training and Parameter Compression • Start from ML-trained PCGM models • Compress basis coefficients • Compress basis matrices • Do MCE training to fine-tune the mean vectors using the above compressed parameters • Compress the (transformed) mean vectors A similar strategy is also applied to MCE training and model compression for MQDF-based classifier!

Experimental Setup (1) • Task: • Recognition of 2965 “JIS level-1” handwritten Kanji characters • Data sets: • Training set: 704,650 samples from Nakayosidatabase • Development set: 229,398 samples from Nakayosi/Kuchibue databases • Testing set: 506,848 samples from Kuchibue database • Feature vector: • 512 8-directional raw features extracted from each handwriting sample • Use LDA to reduce dimension to 128

Experimental Setup (2) • ML training • MQDF • 20 eigenvectors are retained for each character class, i.e. MQDF(20) • PCGM • # of basis matrices : 32, 64, 128, i.e., PCGM(32/64/128) • MCE training • Run 20 epochs in Quickprop algorithm to update mean vectors • Both MQDF and PCGM • Parameter compression • Compress MQDF/PCGM-based recognizers to less than 2MB footprint

Experimental Results • Given the same footprint, PCGM achieves higher recognition accuracy than MQDF; • PCGM-based recognizer can be made very compact, yet achieves high recognition accuracy; • PCGM-based recognizers are slower than MQDF-based recognizers: • Evaluated on a PC with 3 GHz Pentium-4 CPU • Re-scoring on a short-list of 50 candidates

Conclusion • PCGM-based approach provides a good solution to designing a compact CJK handwritten character recognizer • Ongoing and future works: • Better discriminative training for both MQDF/PCGM to boost the recognition accuracy further • Try it out for Japanese and Korean

Backup slides

Experimental Results • Comparison of ML-trained PCGM and MQDF based classifiers without parameter compression

Experimental Results Before Precision Matrices Compression After Precision Matrices Compression

Experimental Results Before MCE training After MCE training

Experimental Results Before Mean Compression After Mean Compression

Yongqiang Wang 1,2 , Qiang Huo 1 1 Microsoft Research Asia, Beijing, China 2 The University of Hong Kong, Hong Kong, Ch

Yongqiang Wang 1,2 , Qiang Huo 1 1 Microsoft Research Asia, Beijing, China 2 The University of Hong Kong, Hong Kong, Ch

Presentation Transcript

(Hong Kong)

Hong Kong Adventure 1

Hong Kong

Hong Kong

Hong Kong

Hong Kong

Hong Kong

Hong Kong

Hong Kong China Business

Hong Kong

(1) ISEIS, Chinese University of Hong Kong, NT, Shatin, Hong Kong

Hong Kong China Business

Professor George KK Lau The University of Hong Kong Hong Kong SAR, China

1 Hong Kong

Hong Kong

Hong Kong

Hong Kong

Hong Kong, China

Hong Kong lawmakers defy Beijing

Hong Kong Adventure 1

Professor George KK Lau The University of Hong Kong Hong Kong SAR, China