150 likes | 465 Views
Con-Text: Text Detection Using Background Connectivity for Fine-Grained Object Classification. Sezer Karaoglu, Jan van Gemert, Theo Gevers. Can we achieve a better object recognition with the help of scene-text ?. Goal.
E N D
Con-Text: Text Detection Using Background Connectivity for Fine-Grained Object Classification Sezer Karaoglu, Jan van Gemert, Theo Gevers
Can we achieve a better object recognition with the help of scene-text?
Goal • Exploit hidden details by text in the scene to improve visual classification of very similar instances. DJ SUBS Breakfast Starbucks Coffee StarbucksCoffee SKY SKY SKY CAR CAR Application : Linking images from Google street view to textual business inforation as e.g. the Yellow pages, Geo-referencing, Information retrieval
Challenges of Text Detection in Natural Scene Images • Lighting • Surface Reflections • Unknown background • Non-Planar objects • Unknown Text Font • Unknown Text Size • Blur
Literature Review Text Detection • Texture Based: Wang et al. “End-to-End Scene Text Recognition” ICCV ‘11 • Computational Complexity • Dataset specific • Do not rely on heuristic rules • Region Based: Epshtein et al. “Detecting Text in Natural Scenes with Stroke Width Transform ” CVPR ‘10 • Hard to define connectivity • Segmentation helps to improve ocr performance
Motivation to remove background for Text Detection • To reduce majority of image regions for further processes. • To reduce false positives caused by text like image regions (fences, bricks, windows, and vegetation). • To reduce dependency on text style.
Proposed Text Detection Method Text detection by BG substraction Automatic BG seed selection BG reconstruction
Background Seed Selection • Color, contrast and objectness responses are used as feature. • Random Forest classifier with 100 trees based on out-of-bag error are used to create forest. • Each tree is constructed with three random features. • The splitting of the nodes is made based on GINI criterion. Original Image Color Boosting Contrast Objectness
ConditionalDilationfor BG connectivity where B is the structring element (3 by-3 square), M is the binary image where bg seeds are ones and X is the gray level input image until repeat
Text Recognition Experiments • ICDAR’03 Dataset with 251 test images, 5370 characters, 1106 words.
ICDAR 2003 DatasetChar. Recognition Results The proposed system removes 87% of the non-text regions where on average 91% of the test set contains non-text regions. It retains approximately %98 of text regions.
ImageNet Dataset Bakery Country House Discount House Funeral Pizzeria Steak • ImageNet building and place of business dataset ( 24255 images 28 classes, largest dataset ever used for scene tekst recognition) • The images do not necessarily contain scene text. • Visual features : 4000 visual words,standard gray SIFT only. • Text features: Bag-of-bigrams , ocr resultsobtained for each image in the dataset. • 3 repeats, to compute standard deviations in Avg. Precision. • Histogram Intersection Kernel in libsvm. • Text only, Visual only and Fused results are compared.
Fine-GrainedBuildingClassificationResults ocr : 15.6 ± 0.4 Fusion Text Visual Bow + ocr : 39.0 ± 2.6 Bow : 32.9 ± 1.7 Discount House Visual #269 #431 #584 #2752 Text #1 #4 #5 #8 Proposed #1 #4 #5 #8
Conclusion • Background removal is a suitable approach for scene text detection • A new text detection method, using background connectivity and, color, contrast and objectness cues is proposed. • Improvedperformance to scene text recognition. • Improved Fine-Grained Object Classification performance with visual and scene text information fusion.
DEMO TRY HERE