End-to-End Text Recognition with Convolutional Neural Networks

Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department, Stanford University
* Denotes equal contribution

Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds, textures, fonts, and lighting conditions
Datasets: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011)

Two-Stage Framework
Detection/Classification → High-level Inference → “HOTEL”

Works | Classification and detection | High-level inference
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial Structure
Mishra et al., 2012 | HOG + SVM with RBF Kernel | CRF + N-gram model
Neumann and Matas, 2012 | MSER + SVM with RBF Kernel | Exhaustive Graph Search

Approach | Classification and detection | High-level inference
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph based inference models
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics

Various Benchmarks
Detection/Classification: ICDAR 62-way cropped character classification (SOTA)
Lexicon: ICDAR and SVT cropped word recognition (SOTA on ICDAR)
End-to-end system after high-level inference: ICDAR and SVT end-to-end text recognition (SOTA)

Unsupervised Feature Learning
Contrast Normalization + ZCA whitening → K-Means (Coates et al., 2011)
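
The feature-learning pipeline named on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the patch size (8x8), the regularization constants, and the use of plain Lloyd-style K-means (rather than whichever variant Coates et al. use) are all assumptions, and random data stands in for real image patches.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    """Subtract each patch's mean and divide by its (regularized) standard deviation."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / np.sqrt(patches.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=0.1):
    """Whiten patches; also return the mean and transform needed for new data."""
    mean = patches.mean(axis=0)
    cov = np.cov(patches - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA transform: rotate, rescale each direction, rotate back.
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return (patches - mean) @ transform, mean, transform

def kmeans(data, k, iters=10, seed=0):
    """Plain Lloyd iterations; the learned centroids act as a filter bank."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distances via ||x||^2 - 2 x.c + ||c||^2.
        dist = ((data ** 2).sum(axis=1, keepdims=True)
                - 2.0 * data @ centroids.T
                + (centroids ** 2).sum(axis=1))
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # leave empty clusters unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

# Hypothetical usage: 500 random 8x8 patches (flattened) stand in for real data.
rng = np.random.default_rng(0)
patches = rng.random((500, 64)) * 255.0
normalized = contrast_normalize(patches)
whitened, mean, transform = zca_whiten(normalized)
filters = kmeans(whitened, k=16)
```

Each row of `filters` can be reshaped to 8x8 and used as a first-layer convolution filter; the number of centroids is a free choice here.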

Convolution → Spatial Pooling (1st layer, 96) → Convolution → Spatial Pooling (2nd layer, 256) → L2-SVM Classifier → √ Text / × Non-Text
Backpropagation
~10K parameters for detection, ~50K parameters for classification
Large representation but not enough data. Overfitting?
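
One convolution plus spatial pooling stage of the kind shown on this slide might look like the sketch below. The 96 filters match the first-layer figure on the slide; the 32x32 input, 8x8 filter size, rectifying nonlinearity, and 5x5 average pooling are assumptions made for illustration, and the helper names are hypothetical.

```python
import numpy as np

def convolve_valid(image, filters):
    """Correlate a 2-D image with a bank of filters ('valid' mode, stride 1)."""
    _, fh, fw = filters.shape
    # windows has shape (out_h, out_w, fh, fw): every fh x fw crop of the image.
    windows = np.lib.stride_tricks.sliding_window_view(image, (fh, fw))
    return np.einsum("ijkl,nkl->nij", windows, filters)

def average_pool(maps, size):
    """Average each response map over non-overlapping size x size blocks."""
    n, h, w = maps.shape
    assert h % size == 0 and w % size == 0, "map size must be divisible by pool size"
    return maps.reshape(n, h // size, size, w // size, size).mean(axis=(2, 4))

# Hypothetical usage with random data in place of an image and learned filters.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
filters = rng.standard_normal((96, 8, 8))

raw = convolve_valid(image, filters)   # shape (96, 25, 25)
responses = np.maximum(raw, 0.0)       # assumed rectifying nonlinearity
pooled = average_pool(responses, 5)    # shape (96, 5, 5)
features = pooled.reshape(-1)          # input to the next layer or classifier
```

A second stage would apply the same two functions to the pooled maps (with multi-channel filters), followed by the classifier.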

Synthetic Data
[Figure: real data compared with synthetic data, including an example of unrealistic synthetic data]
Java.Font + Natural backgrounds
Color Statistics
Synthetic “hard negatives”

Detector Performance

Text line bounding boxes; candidate spaces

Classifier Performance
62-way classification accuracy on ICDAR cropped characters (higher is better)
[Chart: Accuracy (%) on ICDAR-Sample characters; value shown: 83.9]

[Figure: character class scores by sliding window position]

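Word Recognition
Lexicon: … MAKE SERIES ESTATE POKER …
[Figure: character scores for S E R I E S; word scores shown: -5.45, 7.82, -1.74, -9.02; the lexicon word with the maximum summed score (max ∑) is chosen]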
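
The “max ∑” step on this slide can be illustrated with a small dynamic program: for each lexicon word, find the assignment of its letters to left-to-right window positions that maximizes the summed character scores, then keep the best word. This is a simplified sketch under stated assumptions (a dense score matrix, no gap or spacing penalties, hypothetical function names, and a toy three-letter alphabet), not the authors' exact procedure.

```python
import numpy as np

def alignment_score(scores, word, alphabet):
    """Best summed score with the word's letters at strictly increasing positions.

    scores[c, j] is the classifier score for character alphabet[c] at window j.
    Returns -inf if the word has more letters than there are positions.
    """
    n_pos = scores.shape[1]
    prev = np.zeros(n_pos + 1)  # zero letters placed so far: score 0
    for ch in word:
        row = scores[alphabet.index(ch)]
        cur = np.full(n_pos + 1, -np.inf)
        for j in range(1, n_pos + 1):
            # Either skip position j-1, or place this letter there.
            cur[j] = max(cur[j - 1], prev[j - 1] + row[j - 1])
        prev = cur
    return float(prev[n_pos])

def recognize(scores, lexicon, alphabet):
    """Return the lexicon word with the highest alignment score, and that score."""
    best = max(lexicon, key=lambda w: alignment_score(scores, w, alphabet))
    return best, alignment_score(scores, best, alphabet)

# Toy example: 3 character classes, 5 sliding-window positions.
alphabet = "ACT"
scores = np.array([
    [-1.0,  2.0, -1.0, -1.0, -1.0],   # A
    [ 3.0, -1.0, -1.0, -1.0, -1.0],   # C
    [-1.0, -1.0, -1.0,  4.0,  1.0],   # T
])
word, score = recognize(scores, ["ACT", "CAT", "TAC"], alphabet)
print(word, score)  # CAT 9.0
```

A raw sum favors neither long nor short words in any principled way, so a practical system would likely add normalization or penalties; those are left out here.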

Cropped Word Recognition Accuracy
[Chart: Accuracy (%) on cropped words benchmarks; higher is better]

Candidate spaces generated by detector … …

End-to-end text recognition results
[Chart: F-score on end-to-end benchmarks; higher is better]
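
For reference, the F-score reported here is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition; how a predicted word is counted as a true positive (for example, the required bounding-box overlap) is benchmark-specific and not stated on the slide, and the function name is hypothetical.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2.0 * precision * recall / (precision + recall)

# Made-up counts: 3 correct words, 1 spurious word, 2 missed words.
example = f_score(true_positives=3, false_positives=1, false_negatives=2)
```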

Sample Output Images from SVT

Sample Output Images from ICDAR-FULL

c: “confidence margin”
Hunspell: PEOST, PEOSTEL
Suggested Words: POS, POST
LEXICON: POSE POST PEOPLE PISTOL …
Our F-score: 0.38; Neumann and Matas, 2010: 0.40

Conclusion
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on
  - ICDAR cropped character classification
  - ICDAR cropped word recognition
  - Lexicon-based end-to-end recognition on ICDAR and SVT
• Extensible to a more general lexicon with an off-the-shelf spelling checker

Questions?
