
End-to-End Text Recognition with Convolutional Neural Networks

Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng. Computer Science Department, Stanford University. * Denotes equal contribution. Scene text recognition overview: text “in the wild” is hard to recognize.


Presentation Transcript


  1. End-to-End Text Recognition with Convolutional Neural Networks. Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng, Computer Science Department, Stanford University. * Denotes equal contribution

  2. Scene Text Recognition Overview • Text “in the wild” is hard to recognize • Wide range of variations in backgrounds, textures, fonts, and lighting conditions • Example datasets: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011) Tao Wang

  3. Two-Stage Framework • Detection/Classification • High-level Inference • Example output: “HOTEL”

  4. Related Works
     Work                     | Classification and detection | High-level inference
     Weinman et al., 2008     | Appearance + Geometry        | Semi-Markov CRF
     K. Wang et al., 2011     | HOG + Random Ferns           | Pictorial Structure
     Mishra et al., 2012      | HOG + SVM with RBF Kernel    | CRF + N-gram model
     Neumann and Matas, 2012  | MSER + SVM with RBF Kernel   | Exhaustive Graph Search

  5. Approach                | Classification and detection                      | High-level inference
     Most other approaches   | Hand-designed features + off-the-shelf classifier | Graph-based inference models
     Our approach            | Learnt features + 2-layer CNN                     | Simple off-the-shelf heuristics

  6. Results on Various Benchmarks • Detection/Classification: ICDAR 62-way cropped character classification (SOTA) • High-level inference with a lexicon: ICDAR and SVT cropped word recognition (SOTA on ICDAR) • End-to-end system: ICDAR and SVT end-to-end text recognition (SOTA)

  7. Unsupervised Feature Learning • Contrast normalization + ZCA whitening • K-means (Coates et al., 2011)
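
A minimal NumPy sketch of this unsupervised pipeline, assuming 8×8 grayscale patches flattened to 64-vectors and a spherical (unit-norm centroid) K-means variant; the slide says only "K-means", so the variant, the regularization constants, the patch size, and all function names are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def contrast_normalize(X, eps=10.0):
    # Per-patch normalization: zero mean and unit variance for each row.
    # eps (assumed value) keeps near-constant patches from being blown up.
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.sqrt(X.var(axis=1, keepdims=True) + eps)

def zca_whiten(X, eps=0.1):
    # ZCA whitening: rotate into the PCA basis, rescale each direction, rotate back.
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, mean, W  # keep (mean, W) to whiten new patches identically

def spherical_kmeans(X, k, iters=10, seed=0):
    # Unit-norm centroids; each patch is assigned to the centroid with the
    # largest absolute projection and contributes that projection to the update.
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((k, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    rows = np.arange(X.shape[0])
    for _ in range(iters):
        S = X @ D.T                          # (n, k) projections
        best = np.argmax(np.abs(S), axis=1)  # one active centroid per patch
        codes = np.zeros_like(S)
        codes[rows, best] = S[rows, best]
        D = codes.T @ X + D                  # adding old D keeps empty clusters alive
        D /= np.linalg.norm(D, axis=1, keepdims=True)
    return D

# Usage with random stand-in data (hypothetical; real patches come from character images).
rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(2000, 64)).astype(float)  # 2000 flattened 8x8 patches
Xw, mean, W = zca_whiten(contrast_normalize(patches))
filters = spherical_kmeans(Xw, k=96)  # 96 filters, the first-layer count on slide 8
print(filters.shape)  # (96, 64)
```

The learned rows of `filters` would then serve as first-layer convolution kernels.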

  8. CNN Architecture • 1st layer: convolution (96 maps) → spatial pooling • 2nd layer: convolution (256 maps) → spatial pooling • L2-SVM classifier: text (√) vs. non-text (×) • Trained with backpropagation • ~10K parameters for detection, ~50K parameters for classification • Large representation but not enough data: overfitting?
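
A forward-pass sketch of such a two-layer network in NumPy. It assumes a 32×32 grayscale input window, 8×8 first-layer filters, 4×4 second-layer filters, non-overlapping average pooling, an assumed soft-threshold nonlinearity max(0, |z| - alpha), and a linear L2-SVM (squared hinge loss) on the pooled features. Only the 96/256 map counts come from the slide; everything else, including the function names, is hypothetical, and backpropagation is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(x, filters):
    # 'Valid' 2-D correlation. x: (H, W, C_in); filters: (K, k, k, C_in) -> (H-k+1, W-k+1, K)
    k = filters.shape[1]
    win = sliding_window_view(x, (k, k), axis=(0, 1))  # (H-k+1, W-k+1, C_in, k, k)
    return np.einsum("hwcij,nijc->hwn", win, filters)

def avg_pool(x, p):
    # Non-overlapping p x p average pooling over the two spatial axes.
    H, W, C = x.shape
    x = x[: H - H % p, : W - W % p]
    return x.reshape(H // p, p, W // p, p, C).mean(axis=(1, 3))

def soft_threshold(z, alpha=0.5):
    # Assumed nonlinearity: max(0, |z| - alpha).
    return np.maximum(0.0, np.abs(z) - alpha)

def forward(img, D1, D2, w, b):
    h1 = avg_pool(soft_threshold(conv_valid(img[..., None], D1)), 5)  # (5, 5, 96)
    h2 = avg_pool(soft_threshold(conv_valid(h1, D2)), 2)              # (1, 1, 256)
    return h2.ravel() @ w + b  # linear score; > 0 means "text" for the detector

def l2_svm_loss(score, y):
    # Squared hinge loss of an L2-SVM; y = +1 (text) or -1 (non-text).
    return np.maximum(0.0, 1.0 - y * score) ** 2

rng = np.random.default_rng(0)
img = rng.random((32, 32))                        # hypothetical 32x32 input window
D1 = rng.standard_normal((96, 8, 8, 1))           # 1st layer: 96 filters
D2 = 0.01 * rng.standard_normal((256, 4, 4, 96))  # 2nd layer: 256 filters
w, b = 0.01 * rng.standard_normal(256), 0.0
score = forward(img, D1, D2, w, b)
print(l2_svm_loss(score, +1))
```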

  9. Synthetic Data • Figure panels: real data; unrealistic synthetic data; our synthetic data • Synthetic characters: Java.Font + natural backgrounds + color statistics • Synthetic “hard negatives”
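
The compositing idea can be sketched as alpha-blending a glyph mask over a natural-image crop, with foreground/background colors drawn from a table of color pairs. This is a hypothetical illustration: the glyph mask is hand-made here instead of being rendered with Java.Font, and `COLOR_PAIRS`, `blend`, and `synth_char` are assumptions, not the authors' generator.

```python
import numpy as np

# Hypothetical (foreground, background) RGB pairs, standing in for color
# statistics that would be collected from real text images.
COLOR_PAIRS = np.array([
    [[0.05, 0.05, 0.05], [0.95, 0.95, 0.90]],  # dark text on a light sign
    [[1.00, 1.00, 1.00], [0.10, 0.30, 0.60]],  # white text on a blue sign
])

def synth_char(mask, background, rng, blend=0.5):
    # mask: (H, W) glyph coverage in [0, 1]; background: (H, W, 3) natural crop in [0, 1].
    fg, bg = COLOR_PAIRS[rng.integers(len(COLOR_PAIRS))]
    base = blend * background + (1.0 - blend) * bg  # tint the natural crop toward bg color
    m = mask[..., None]
    return m * fg + (1.0 - m) * base                # glyph pixels take the fg color

rng = np.random.default_rng(0)
mask = np.zeros((32, 32))
mask[4:8, 6:26] = 1.0    # crossbar of a crude "T"
mask[8:28, 14:18] = 1.0  # stem of the "T"
background = rng.random((32, 32, 3))  # stand-in for a natural-image crop
sample = synth_char(mask, background, rng)
```

Because every output pixel is a convex combination of values in [0, 1], the sample stays a valid image without clipping.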

  10. Detector Performance

  11. Text line • Bounding boxes • Candidate spaces
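
One common way to turn a detector's response along a text line into character locations is 1-D non-maximum suppression (NMS). The sketch assumes one detector score per horizontal position plus a hypothetical suppression radius and threshold; the slide does not state the exact procedure, and `nms_1d` and `character_peaks` are illustrative names.

```python
import numpy as np

def nms_1d(scores, radius):
    # Greedy 1-D NMS: repeatedly keep the highest remaining score and
    # suppress every position within +/- radius of it.
    scores = np.asarray(scores, dtype=float)
    suppressed = np.zeros(len(scores), dtype=bool)
    keep = []
    for i in np.argsort(scores)[::-1]:
        if suppressed[i]:
            continue
        keep.append(int(i))
        suppressed[max(0, i - radius): i + radius + 1] = True
    return sorted(keep)

def character_peaks(scores, radius, thresh):
    # Surviving maxima above thresh are treated as character centers.
    return [i for i in nms_1d(scores, radius) if scores[i] > thresh]

line_scores = [0.1, 0.9, 0.2, -0.8, 0.3, 0.95, 0.1]  # hypothetical detector outputs
print(character_peaks(line_scores, radius=1, thresh=0.5))  # [1, 5]
```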

  12. Classifier Performance • 62-way classification accuracy on ICDAR cropped characters (higher is better) • Figure: accuracy (%); labeled value 83.9 (on ICDAR-Sample characters)
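
For reference, 62-way means the classifier distinguishes 10 digits, 26 uppercase letters, and 26 lowercase letters (10 + 26 + 26 = 62). A minimal sketch of the label set and the accuracy metric, assuming the classifier emits one score per class (for example one-vs-all SVM outputs, an assumption) and predicts the highest-scoring class:

```python
import string
import numpy as np

# 62 classes: 10 digits + 26 uppercase + 26 lowercase letters.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase
CHAR_INDEX = {c: i for i, c in enumerate(CLASSES)}

def predict(scores):
    # scores: (N, 62) per-class classifier outputs -> predicted characters.
    return [CLASSES[i] for i in np.argmax(scores, axis=1)]

def accuracy_percent(pred, truth):
    # Percentage of exact (case-sensitive) matches.
    return 100.0 * np.mean([p == t for p, t in zip(pred, truth)])

scores = np.zeros((2, 62))
scores[0, CHAR_INDEX["A"]] = 1.0
scores[1, CHAR_INDEX["7"]] = 1.0
print(accuracy_percent(predict(scores), ["A", "1"]))  # 50.0
```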

  13.

  14. Figure axes: sliding window position; character class
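
A score matrix over window positions and character classes can be sketched by sliding a fixed-width window across a cropped word image and recording one score per class at each position. `classify` is a hypothetical stand-in for the 62-way character classifier, and the window width and stride are assumptions.

```python
import numpy as np

def sliding_window_scores(word_img, classify, win_w, stride=1):
    # Slide a win_w-wide window across an (H, W) word image. classify(patch)
    # returns a (C,) score vector; column p of the result holds the scores for
    # the p-th window position, so S[c, p] = score of class c at position p.
    _, W = word_img.shape
    xs = range(0, W - win_w + 1, stride)
    return np.stack([classify(word_img[:, x:x + win_w]) for x in xs], axis=1)

# Hypothetical 2-class "classifier" and a tiny synthetic word image.
toy_classify = lambda patch: np.array([patch.mean(), -patch.mean()])
word_img = np.arange(20, dtype=float).reshape(2, 10)
S = sliding_window_scores(word_img, toy_classify, win_w=4, stride=2)
print(S.shape)  # (2, 4)
```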

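  15. Word Recognition • Lexicon: … MAKE SERIES ESTATE POKER … • Each lexicon word is scored by aligning its characters (e.g. S E R I E S) to sliding-window positions and taking the max over alignments of the summed (∑) character scores • Word scores shown: -5.45, 7.82, -1.74, -9.02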
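
Scoring a lexicon word as the maximum, over left-to-right alignments, of the summed character scores can be computed with dynamic programming. This sketch assumes a score matrix `S[class, position]` and a hypothetical minimum gap between consecutive character positions; the exact alignment constraints used in the talk are not given on the slide.

```python
import numpy as np

def word_score(S, word, char_index, min_gap=1):
    # max over positions p_1 < p_2 < ... (consecutive gap >= min_gap) of sum_i S[c_i, p_i]
    assert min_gap >= 1
    P = S.shape[1]
    best = S[char_index[word[0]]].astype(float)  # best score with char 0 at each position
    for ch in word[1:]:
        row = S[char_index[ch]]
        prev = np.maximum.accumulate(best)       # prev[q] = max(best[:q + 1])
        nxt = np.full(P, -np.inf)
        nxt[min_gap:] = row[min_gap:] + prev[:P - min_gap]
        best = nxt
    return best.max()

def recognize(S, lexicon, char_index):
    # Return the lexicon word with the highest alignment score.
    return max(lexicon, key=lambda w: word_score(S, w, char_index))

char_index = {"A": 0, "B": 1, "C": 2}
S = np.array([[5.0, 0, 0, 0],    # "A" fires at position 0
              [0.0, 0, 5, 0],    # "B" fires at position 2
              [0.0, 0, 0, 5]])   # "C" fires at position 3
print(recognize(S, ["CA", "AB", "ABC"], char_index))  # ABC
```

The running prefix maximum keeps each step linear in the number of positions, so scoring a word costs O(len(word) × positions).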

  16. Cropped Word Recognition Accuracy • Figure: accuracy (%) on cropped-word benchmarks (higher is better)

  17. Candidate spaces generated by the detector … …
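
A hypothetical sketch of how candidate spaces might be read off the detector response along a line and turned into candidate word spans: low local minima become cut points, and every span between two cuts is a candidate word, optionally skipping a few spurious spaces. The threshold, radius, `max_skipped`, and function names are assumptions, not the talk's exact heuristic.

```python
import numpy as np

def candidate_spaces(scores, thresh, radius=1):
    # A position is a candidate space if the detector response there is below
    # thresh and is the minimum within +/- radius.
    s = np.asarray(scores, dtype=float)
    spaces = []
    for i in range(len(s)):
        lo, hi = max(0, i - radius), min(len(s), i + radius + 1)
        if s[i] < thresh and s[i] == s[lo:hi].min():
            spaces.append(i)
    return spaces

def candidate_segments(spaces, line_length, max_skipped=2):
    # Every span between two cut points (line ends or candidate spaces) is a
    # candidate word; up to max_skipped interior spaces may be ignored, so a
    # spurious space does not force a split.
    cuts = [0] + list(spaces) + [line_length]
    return [(cuts[a], cuts[b])
            for a in range(len(cuts))
            for b in range(a + 1, len(cuts))
            if b - a - 1 <= max_skipped]

line_scores = [0.1, 0.9, 0.2, -0.8, 0.3, 0.95, 0.1]  # hypothetical detector outputs
spaces = candidate_spaces(line_scores, thresh=0.0)
print(spaces)                                        # [3]
print(candidate_segments(spaces, len(line_scores)))  # [(0, 3), (0, 7), (3, 7)]
```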

  18.

  19. End-to-End Text Recognition Results • Figure: F-score on end-to-end benchmarks (higher is better)
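
The F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below assumes a predicted word counts as correct when its text matches an unmatched ground-truth word (case-insensitively, an assumption) and the two boxes overlap with intersection-over-union above 0.5; the box format `(x0, y0, x1, y1)` and the function names are hypothetical.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x0, y0, x1, y1).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def end_to_end_f_score(preds, gts, min_iou=0.5):
    # preds, gts: lists of (box, word). Each ground-truth word can be matched once.
    matched, tp = set(), 0
    for box, word in preds:
        for j, (gbox, gword) in enumerate(gts):
            if j not in matched and word.upper() == gword.upper() and iou(box, gbox) > min_iou:
                matched.add(j)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

preds = [((0, 0, 10, 10), "HOTEL"), ((20, 0, 30, 10), "POST")]
gts = [((1, 0, 10, 10), "HOTEL"), ((50, 0, 60, 10), "PEOPLE")]
print(end_to_end_f_score(preds, gts))  # 0.5
```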

  20. Sample Output Images from SVT

  21. Sample Output Images from ICDAR-FULL

  22. LEXICON: POSE, POST, PEOPLE, PISTOL, … with a “confidence margin” • Hunspell: PEOST, PEOSTEL → suggested words: POS, POST • Our F-score: 0.38; Neumann and Matas, 2010: 0.40
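
The “confidence margin” can be sketched as the gap between the best and second-best candidate word scores, with low-margin results rejected. This is a hypothetical reading of the slide: the scores, the threshold `tau`, and the function names are illustrative, and the Hunspell suggestion step is represented only by a hard-coded candidate list, not by a call to Hunspell.

```python
def confidence_margin(word_scores):
    # word_scores: dict word -> recognition score. Returns (best word, best - runner-up).
    if not word_scores:
        raise ValueError("need at least one candidate word")
    ranked = sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0], float("inf")
    (best, s1), (_, s2) = ranked[0], ranked[1]
    return best, s1 - s2

def accept(word_scores, tau):
    # Keep the top word only if it beats the runner-up by at least tau (hypothetical rule).
    best, margin = confidence_margin(word_scores)
    return best if margin >= tau else None

# Hypothetical scores for candidates, e.g. lexicon words plus spell-checker suggestions.
candidates = {"POSE": 1.0, "POST": 3.5, "PEOPLE": -2.0, "POS": 2.0}
print(confidence_margin(candidates))  # ('POST', 1.5)
print(accept(candidates, tau=2.0))    # None
```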

  23. Conclusion • Learnt features + 2-layer CNN for character detection and classification • Simple heuristics to build an end-to-end scene text recognition system • State-of-the-art performance on: - ICDAR cropped character classification - ICDAR cropped word recognition - Lexicon-based end-to-end recognition on ICDAR and SVT • Extensible to a more general lexicon with an off-the-shelf spelling checker

  24. Questions?
