End-to-End Text Recognition with Convolutional Neural Networks. Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng. Computer Science Department, Stanford University. * Denotes equal contribution. Scene Text Recognition Overview. Text “in the wild” is hard to recognize
End-to-End Text Recognition with Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department, Stanford University
* Denotes equal contribution
Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds, textures, fonts, and lighting conditions
Datasets: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011)
Tao Wang
Two-Stage Framework
[Figure: Detection/Classification, followed by High-level Inference; example output “HOTEL”]
Works | Classification and detection | High-level inference
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial Structure
Mishra et al., 2012 | HOG + SVM with RBF Kernel | CRF + N-gram model
Neumann and Matas, 2012 | MSER + SVM with RBF Kernel | Exhaustive Graph Search
Approach | Classification and detection | High-level inference
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph-based inference models
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics
Various Benchmarks
Detection/Classification: ICDAR 62-way cropped character classification (SOTA)
Lexicon: ICDAR and SVT cropped word recognition (SOTA on ICDAR)
End-to-end system after high-level inference: ICDAR and SVT end-to-end text recognition (SOTA)
Unsupervised Feature Learning
Contrast Normalization + ZCA whitening, then K-Means (Coates et al., 2011)
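The pipeline named on this slide (contrast normalization, ZCA whitening, then K-Means) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the patch size (8×8, flattened to 64 values), the regularization constants, the number of centroids, and the use of plain Euclidean K-Means on random stand-in patches are all assumptions made for the example.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    """Subtract each patch's mean and divide by its (regularized) std."""
    patches = patches - patches.mean(axis=1, keepdims=True)
    return patches / np.sqrt(patches.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=0.1):
    """Decorrelate pixel dimensions while staying close to the input space."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA transform: V * diag(1 / sqrt(lambda + eps)) * V^T
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W, mean, W

def kmeans_filters(data, k, iters=10, seed=0):
    """Plain K-Means; the learned centroids serve as a filter bank."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        dist = ((data ** 2).sum(axis=1)[:, None]
                - 2.0 * data @ centroids.T
                + (centroids ** 2).sum(axis=1)[None, :])
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

# Random stand-in for 500 flattened 8x8 image patches (assumed sizes).
rng = np.random.default_rng(0)
patches = rng.random((500, 64))
normalized = contrast_normalize(patches)
whitened, mean, W = zca_whiten(normalized)
filters = kmeans_filters(whitened, k=16)
```

Each row of `filters` can be reshaped to 8×8 and used as a first-layer convolution filter; with real image patches the centroids would typically resemble edge and stroke detectors.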
[Figure: Convolution, Spatial Pooling (1st layer), Convolution, Spatial Pooling (2nd layer), L2-SVM Classifier (√ Text / × Non-Text); figure labels: 96, 256; Backpropagation]
~10K parameters for detection
~50K parameters for classification
Large representation but not enough data. Overfitting?
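One convolution plus spatial-pooling stage of the kind drawn on this slide could look roughly like the sketch below. The 32×32 input, the four 8×8 filters, the rectification step, and the 5×5 average pooling are illustrative assumptions, not values taken from the slides; a second stage would need filters that span all channels of the pooled output.

```python
import numpy as np

def convolve_valid(image, filters):
    """Slide each k x k filter over the image (no padding, stride 1).

    image: (H, W); filters: (n, k, k); returns (n, H-k+1, W-k+1).
    Filters are applied without flipping (cross-correlation), as is
    common in CNN implementations.
    """
    n, k, _ = filters.shape
    H, W = image.shape
    out = np.empty((n, H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            window = image[i:i + k, j:j + k]
            out[:, i, j] = (filters * window).sum(axis=(1, 2))
    return out

def average_pool(responses, size):
    """Average over non-overlapping size x size blocks of each response map."""
    n, H, W = responses.shape
    H2, W2 = H // size, W // size
    cropped = responses[:, :H2 * size, :W2 * size]
    return cropped.reshape(n, H2, size, W2, size).mean(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((32, 32))               # assumed input window size
filters = rng.standard_normal((4, 8, 8))   # stand-in for learned filters
responses = np.maximum(convolve_valid(image, filters), 0.0)  # assumed rectification
pooled = average_pool(responses, size=5)   # (4, 25, 25) -> (4, 5, 5)
```

Pooling shrinks each 25×25 response map to 5×5, which is what keeps the representation fed to the classifier small relative to the raw convolution output.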
Synthetic Data
[Figure panels: Real Data, Unrealistic Synthetic Data, Synthetic Data]
Synthetic data: Java.Font + natural backgrounds; color statistics
Synthetic “hard negatives”
Detector Performance
[Figure: text line bounding boxes and candidate spaces]
Classifier Performance
62-way classification accuracy on ICDAR cropped characters
[Figure: accuracy (%) on ICDAR-Sample characters; higher is better; labeled value: 83.9]
[Figure: character classifier responses; axes: sliding window position, character class]
Word Recognition
Lexicon: … MAKE, SERIES, ESTATE, POKER …
[Figure: the characters S E R I E S matched against the classifier responses; scores shown: -5.45, 7.82, -1.74, -9.02; labels: max, ∑]
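The “max ∑” idea on this slide, scoring each lexicon word by the best sum of per-character classifier scores and keeping the highest-scoring word, can be sketched with a small dynamic program. The score layout (one list of per-window scores per character), the left-to-right alignment rule, and the toy numbers are assumptions for illustration; the slides do not specify the exact scoring function.

```python
def alignment_score(word, scores):
    """Best sum of character scores with window positions strictly increasing.

    scores maps each character to a list holding one classifier score per
    sliding-window position (a hypothetical layout for this sketch).
    """
    n = len(next(iter(scores.values())))
    neg_inf = float("-inf")
    prev = [scores[word[0]][t] for t in range(n)]
    for ch in word[1:]:
        cur = [neg_inf] * n
        best_before = neg_inf  # best score of the prefix ending before window t
        for t in range(n):
            if best_before > neg_inf:
                cur[t] = best_before + scores[ch][t]
            best_before = max(best_before, prev[t])
        prev = cur
    return max(prev)

def recognize(lexicon, scores):
    """Return the lexicon word with the highest alignment score."""
    return max(lexicon, key=lambda word: alignment_score(word, scores))

# Toy responses: eight windows, every character scores -1.0 except where
# the letters of "SERIES" appear in order.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
scores = {ch: [-1.0] * 8 for ch in alphabet}
for position, ch in enumerate("SERIES"):
    scores[ch][position] = 2.0

lexicon = ["MAKE", "SERIES", "ESTATE", "POKER"]
best_word = recognize(lexicon, scores)
```

Because the sum is not length-normalized, longer words can be favored when scores are mostly positive; a real system would likely need some normalization or thresholding on top of this.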
Cropped Word Recognition Accuracy
[Figure: accuracy (%) on cropped-word benchmarks; higher is better]
Candidate spaces generated by detector … …
End-to-end text recognition results
[Figure: F-score on end-to-end benchmarks; higher is better]
Sample Output Images from SVT
Sample Output Images from ICDAR-FULL
c: “confidence margin”
Lexicon: POSE, POST, PEOPLE, PISTOL, …
Hunspell: PEOST, PEOSTEL; suggested words: POS, POST
Our F-score: 0.38; Neumann and Matas, 2010: 0.40
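The spell-checker step can be illustrated with Python's standard-library `difflib` standing in for Hunspell. This is a swapped-in technique for demonstration only: it does not reproduce Hunspell's suggestion algorithm, and the small dictionary, the cutoff, and the `suggest_words` helper are hypothetical.

```python
import difflib

def suggest_words(raw_string, dictionary, max_suggestions=2, cutoff=0.6):
    """Return dictionary words similar to a raw recognized string.

    Uses difflib's similarity ratio as a stand-in for a real spelling
    checker; results are ordered from most to least similar.
    """
    return difflib.get_close_matches(
        raw_string, dictionary, n=max_suggestions, cutoff=cutoff
    )

# Hypothetical dictionary built from the words listed on the slide.
dictionary = ["POSE", "POST", "PEOPLE", "PISTOL"]
suggestions = suggest_words("PEOST", dictionary)
```

The suggested words then act as a small per-image lexicon, so the lexicon-driven recognition step can be reused without a hand-supplied word list.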
Conclusion
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on
  - ICDAR cropped character classification
  - ICDAR cropped word recognition
  - Lexicon-based end-to-end recognition on ICDAR and SVT
• Extensible to a more general lexicon with an off-the-shelf spelling checker
Questions?