This presentation focuses on the detection of text in images from a video stream in order to enable semantic indexing for efficient retrieval. It includes an evaluation of different text detection algorithms and their effectiveness in various scenarios.
Detection of text in images from a video stream for semantic indexing. 3 June 2004. Christian Wolf, Christian.wolf@liris.cnrs.fr, http://rfv.insa-lyon.fr/~wolf. Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), FRE 2672 CNRS, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex
Outline • Introduction • Features • Evaluation / choice of features • Text detection • Experimental results • Conclusion
Image/video indexing • Content-based image retrieval (Master's degree): • Query by example • Indexing based on local texture (Gabor) features • Video indexing using semantic descriptors (PhD): • Text detection, enhancement, segmentation and recognition. [Figure: in the indexing phase, text such as "Patrick Mayhew", "Min. chargé de l'irlande de Nord" (Minister for Northern Ireland), "ISRAEL", "Jerusalem" and "montage T.Nouel" is extracted from the video; a keyword-based search for "Patrick Mayhew" then returns the matching result.]
Text detection [Figure: processing chain Detection, Enhancement, Segmentation, producing the recognized text "Soukaina Oufkir".]
Detection in an image • Problems: • Which features? Contrast and edge features, geometrical features, corner features, texture features, color features, region/stroke segmentation • How can the decision (text vs. non-text) be taken? Heuristics, separation of populations (discriminant analysis), learning a model (SVM, etc.), reinforcement learning (Master's thesis of Graham Taylor)
Videos vs. scanned documents • Temporal aspects • Complex and moving background • Artificial shadows
Videos vs. scanned documents (continued) • Low resolution • Low quality • Anti-aliasing artifacts • Compression artifacts • Color bleeding
What is text? Character segmentation [Figure: examples of scene text and artificial text.]
What is text? Texture. Example: Gabor energy features on a text image [Figure panels: original image; filter tuned to the example text; Gabor energy; thresholded Gabor energy.]
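Gabor energy is commonly computed as the magnitude of the responses of a quadrature pair of Gabor filters (an even, cosine-phase kernel and an odd, sine-phase kernel), so that it responds to periodic stroke patterns regardless of their phase. The sketch below is a minimal NumPy-only illustration under that assumption; the wavelength, orientation, envelope width and kernel size are hypothetical values, not the filter tuned to the example text on the slide, and the FFT-based convolution is just a way to avoid a SciPy dependency.

```python
import numpy as np

def gabor_pair(size, wavelength, theta, sigma):
    """Even (cosine) and odd (sine) Gabor kernels with the same frequency and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_theta = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the filter direction
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    phase = 2.0 * np.pi * x_theta / wavelength
    even = envelope * np.cos(phase)
    odd = envelope * np.sin(phase)
    even -= even.mean()   # remove the DC component so flat regions give no response
    return even, odd

def convolve_same(image, kernel):
    """'Same'-size 2D convolution via zero-padded FFT (odd-sized kernels)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    shape = (ih + kh - 1, iw + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    top, left = kh // 2, kw // 2
    return full[top:top + ih, left:left + iw]

def gabor_energy(image, wavelength=8.0, theta=0.0, sigma=4.0, size=21):
    """Energy = sqrt(even_response^2 + odd_response^2); parameters are illustrative only."""
    even, odd = gabor_pair(size, wavelength, theta, sigma)
    return np.sqrt(convolve_same(image, even) ** 2 + convolve_same(image, odd) ** 2)

x = np.arange(64)
stripes = np.tile(np.cos(2 * np.pi * x / 8.0), (64, 1))  # vertical "strokes", period 8 along x
flat = np.ones((64, 64))
e_text = gabor_energy(stripes)[32, 32]
e_flat = gabor_energy(flat)[32, 32]
e_orth = gabor_energy(stripes.T)[32, 32]  # same pattern rotated by 90 degrees
```

With a single orientation the energy is strongly direction selective: the vertical-stroke pattern gives a large response, while the flat patch and the rotated pattern give almost none. A practical detector would use several orientations and scales.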
What is text? Texture (continued)
What is text? Corners [Figure panels: derivative; 2nd derivative; smeared; unthresholded "Harris" corner response.]
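The Harris detector builds, at each pixel, the locally smoothed structure matrix M = [[Σ Ix², Σ IxIy], [Σ IxIy, Σ Iy²]] and scores it with R = det(M) − k·trace(M)²: R is positive at corners, negative along straight edges and zero in flat regions. The sketch below computes this unthresholded response with NumPy; using central differences (`np.gradient`), a box window instead of a Gaussian, and k = 0.04 are assumptions made for brevity, not details taken from the slides.

```python
import numpy as np

def box_blur(a, radius):
    """Mean over a (2r+1) x (2r+1) window, borders handled by edge replication."""
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros_like(a, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / size ** 2

def harris_response(image, k=0.04, radius=2):
    """Unthresholded Harris response R = det(M) - k * trace(M)^2."""
    iy, ix = np.gradient(image.astype(float))   # derivatives along rows (y) and columns (x)
    sxx = box_blur(ix * ix, radius)
    syy = box_blur(iy * iy, radius)
    sxy = box_blur(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0   # bright square: 4 corners, 4 edges, flat interior
r = harris_response(img)
```

Text contains many stroke ends and junctions, which is why a dense corner response (before any thresholding) can serve as a text feature.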
What is text? Contrast and geometry [Figure panels: example image; accumulated horizontal Sobel edges.]
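Accumulating horizontal edge responses along a horizontal window rewards regions with many vertical strokes in a row, which is typical of a text line. A minimal NumPy sketch, assuming "horizontal Sobel" means the 3×3 Sobel derivative along x and that the accumulation is a plain sum of absolute responses over a centered window (the half-width of 6 pixels is a hypothetical choice):

```python
import numpy as np

def horizontal_sobel(image):
    """Derivative along x with the 3x3 Sobel kernel [[-1,0,1],[-2,0,2],[-1,0,1]]."""
    p = np.pad(image.astype(float), 1, mode="edge")
    right = p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
    left = p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    return right - left

def accumulated_gradients(image, half_width=6):
    """Sum of |horizontal Sobel| over a centered horizontal window of 2*half_width+1 pixels."""
    g = np.abs(horizontal_sobel(image))
    padded = np.pad(g, ((0, 0), (half_width, half_width)), mode="constant")
    csum = np.cumsum(padded, axis=1)
    csum = np.concatenate([np.zeros((g.shape[0], 1)), csum], axis=1)  # prefix sums
    w = 2 * half_width + 1
    return csum[:, w:] - csum[:, :-w]

img = np.zeros((30, 60))
img[10:20, 10:50:4] = 1.0   # a "text line": vertical strokes every 4 pixels
acc = accumulated_gradients(img)
```

Prefix sums make the window sum cost constant per pixel, independent of the window width.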
What is text? Color • Special cases of text: • Small contrast in the luminance plane • High(er) contrast in the color plane [Figure panels: original image; Sobel on grayscale image; modified Sobel on L*u*v* image.]
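The slides do not define the "modified Sobel" operator, so the sketch below uses one common option as an assumption: compute the Sobel gradient magnitude separately on each channel of an image already converted to L*u*v* (the color conversion itself is not shown) and combine the channels with the Euclidean norm. The example shows the special case from the slide: an edge with no luminance change is invisible to grayscale Sobel but appears in the color version.

```python
import numpy as np

def sobel_magnitude(channel):
    """Sobel gradient magnitude of a single channel (edge-replicated borders)."""
    p = np.pad(channel.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def color_sobel(luv):
    """Assumed color variant: Euclidean norm of the per-channel Sobel magnitudes."""
    mags = np.stack([sobel_magnitude(luv[..., c]) for c in range(luv.shape[-1])])
    return np.sqrt((mags ** 2).sum(axis=0))

luv = np.zeros((10, 10, 3))
luv[..., 0] = 50.0        # constant luminance L*
luv[:, 5:, 1] = 40.0      # chromatic edge in u* only
gray = sobel_magnitude(luv[..., 0])
color = color_sobel(luv)
```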
Evaluation • A good evaluation algorithm permits: • A simple and intuitive interpretation of the obtained performance • An objective comparison between the algorithms under evaluation • A good correspondence between the performance measures and the real performance, taking into account the objective of the algorithm (goal-oriented approach) • An assessment of the algorithm's own performance only, free of side effects from other processing steps
Evaluation at different levels • Statistical separation: Bhattacharyya distance • Error rate, recall/precision on pixel level • Recall/precision on rectangle level • Goal oriented: recall/precision on character level. The first levels have lower computational complexity and are less influenced by later processing stages; the last levels have higher relevance to the application. [Figure: detection result vs. ground truth on a frame containing "Patrick Mayhew", "Min. chargé de l'irlande de Nord", "ISRAEL", "Jerusalem" and "montage T.Nouel".]
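The two lowest evaluation levels can be sketched in a few lines. The slides do not say how the Bhattacharyya distance was estimated, so the version below is an assumption: it builds normalized histograms p and q of a feature for text and non-text pixels on shared bins and returns D_B = −ln Σᵢ √(pᵢ qᵢ) (0 for identical distributions, infinite for disjoint ones). The pixel-level measures use recall = |detected ∩ truth| / |truth| and precision = |detected ∩ truth| / |detected|; the bin count is hypothetical.

```python
import numpy as np

def bhattacharyya_distance(text_values, nontext_values, bins=32):
    """Histogram estimate of D_B = -ln(sum_i sqrt(p_i * q_i)) between two samples."""
    lo = min(text_values.min(), nontext_values.min())
    hi = max(text_values.max(), nontext_values.max())
    p, _ = np.histogram(text_values, bins=bins, range=(lo, hi))
    q, _ = np.histogram(nontext_values, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient, in [0, 1]
    return np.inf if bc == 0 else -np.log(bc)

def pixel_recall_precision(detected, truth):
    """Pixel-level recall and precision of a binary detection mask."""
    hits = np.logical_and(detected, truth).sum()
    return hits / truth.sum(), hits / detected.sum()

a = np.linspace(0, 1, 200)
same = bhattacharyya_distance(a, a)          # identical feature distributions
apart = bhattacharyya_distance(a, a + 10)    # perfectly separated distributions

truth = np.zeros((20, 20), bool)
truth[5:15, 5:15] = True                     # 100 ground-truth pixels
det = np.zeros((20, 20), bool)
det[5:15, 10:20] = True                      # 100 detected pixels, 50 of them correct
recall, precision = pixel_recall_precision(det, truth)
```

A larger distance means the feature separates text from non-text better, independently of any later decision or post-processing step.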
Evaluation on rectangle level • Pure overlap is ambiguous across multiple images: a recall of 50% could mean: • 50% of the text rectangles have been detected perfectly • 100% of the rectangles have been detected with 50% of their surface • anything in between [Figure: detection vs. ground truth.]
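A short worked example makes the ambiguity concrete. Assuming a pooled overlap measure (total covered ground-truth area divided by total ground-truth area) and two hypothetical ground-truth rectangles of 100 pixels each, two very different detection results get exactly the same score:

```python
def pooled_area_recall(truth_areas, covered_areas):
    """Recall as total covered ground-truth area over total ground-truth area."""
    return sum(covered_areas) / sum(truth_areas)

truth_areas = [100, 100]  # two hypothetical ground-truth rectangles, 100 pixels each
# Scenario A: the first rectangle is detected perfectly, the second is missed entirely.
scenario_a = pooled_area_recall(truth_areas, [100, 0])
# Scenario B: both rectangles are detected, each with only half of its surface.
scenario_b = pooled_area_recall(truth_areas, [50, 50])
print(scenario_a, scenario_b)  # prints 0.5 0.5
```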
Evaluation on rectangle level • Requirements of an evaluation measure: • Tells intuitively how many rectangles have been detected and how many false alarms occurred • Measures the detection quality • Takes into account one-to-one, one-to-many and many-to-one matches • Scales up to multiple images. Problem: counting the number of correctly detected rectangles and measuring the detection quality are contradictory requirements.
Performance graphs • "Surface" recall and precision are computed between a ground-truth rectangle Gi and a detected rectangle Di, then thresholded by different thresholds on recall and precision. • For each rectangle, we therefore know whether it has been detected or not, depending on a quality threshold.
Performance graphs [Figure panels: performance as a function of the threshold on surface recall; performance as a function of the threshold on surface precision.]
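A simplified sketch of how such a performance graph can be produced. It assumes surface recall = |Gi ∩ Dj| / |Gi| and surface precision = |Gi ∩ Dj| / |Dj| for axis-aligned rectangles (x0, y0, x1, y1), counts a ground-truth rectangle as detected when some not-yet-used detection passes both thresholds, and sweeps the recall threshold with a fixed, hypothetical precision threshold of 0.4. Only one-to-one matches are handled here; the one-to-many and many-to-one cases from the requirements above are left out.

```python
def area(r):
    """Area of a rectangle (x0, y0, x1, y1) with exclusive upper bounds."""
    x0, y0, x1, y1 = r
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlap(a, b):
    """Area of the intersection of two rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def count_detected(truth, detections, t_recall, t_precision):
    """Greedy one-to-one matching: G_i counts as detected if an unused D_j passes both thresholds."""
    used = set()
    found = 0
    for g in truth:
        for j, d in enumerate(detections):
            if j in used:
                continue
            inter = overlap(g, d)
            surface_recall = inter / area(g)      # |G_i ∩ D_j| / |G_i|
            surface_precision = inter / area(d)   # |G_i ∩ D_j| / |D_j|
            if surface_recall >= t_recall and surface_precision >= t_precision:
                used.add(j)
                found += 1
                break
    return found

def recall_graph(truth, detections, recall_thresholds, t_precision=0.4):
    """Object-count recall as a function of the surface-recall threshold."""
    return [count_detected(truth, detections, t, t_precision) / len(truth)
            for t in recall_thresholds]

truth = [(0, 0, 100, 20), (0, 40, 100, 60)]
detections = [(0, 0, 100, 20), (0, 40, 50, 60)]   # second text line only half covered
curve = recall_graph(truth, detections, [0.2, 0.4, 0.6, 0.8])
print(curve)  # prints [1.0, 1.0, 0.5, 0.5]
```

Read from left to right, the curve shows directly that both rectangles are found at low quality requirements, while only one of them is found with more than 50% of its surface.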
Comparison of different detection algorithms • Method 1: local contrast • Method 2: SVM learning
The influence of the test database [Figure panels: local contrast; SVM learning.]
The local contrast method • Calculate a text probability image according to a text model (one value per pixel; F. LeBourgeois) • Separate the probability values into 2 classes (Fisher/Otsu) • Post-processing: • Mathematical morphology • Geometrical constraints • Verification of special cases • Combination of rectangles
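The two-class separation step can be illustrated with Otsu's criterion, which picks the threshold that maximizes the between-class variance of the histogram. The sketch below assumes the text probability image has already been computed (the text model itself is not shown) and leaves out all post-processing; the number of histogram bins is a hypothetical choice.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Threshold maximizing the between-class variance (Otsu) of a sample of values."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                   # number of values in bins 0..i
    w1 = w0[-1] - w0                       # number of values in bins i+1..end
    m0 = np.cumsum(hist * centers)
    mean0 = np.divide(m0, w0, out=np.zeros_like(m0, dtype=float), where=w0 > 0)
    mean1 = np.divide(m0[-1] - m0, w1, out=np.zeros_like(m0, dtype=float), where=w1 > 0)
    between = w0 * w1 * (mean0 - mean1) ** 2
    return edges[np.argmax(between) + 1]   # lower edge of the upper class

prob = np.full((20, 20), 0.1)              # background probability
prob[5:10, 2:18] = 0.9                     # a text-like band
t = otsu_threshold(prob)
mask = prob >= t                           # candidate text pixels before post-processing
```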
The learning method • Learning from gray values and edge maps alone may not generalize well. • Texture alone is not reliable, especially if the text is short. • Geometry is a valuable feature. • State of the art: geometrical constraints are enforced in the post-processing step (mathematical morphology). • We propose using geometrical features very early in the detection process, rather than only during post-processing.
Geometrical features: baseline • Text consists of: • A high density of strokes in the direction of the text baseline • A consistent baseline (a rectangular region with an upper and a lower border) • Two detection philosophies: • Detecting the baseline directly, before detecting the text region • Detecting the baseline as the boundary area of the detected text region, in order to refine the detection quality
Estimation of the text rectangle height [Figure panels: original image; accumulated gradients.]
Features • Mode width (= rectangle height) • Mode height (= contrast) • Height difference, left vs. right • Mode mean • Mode standard deviation • Difference in mode width
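A hedged sketch of the first mode features. It assumes that, for a given pixel, one takes the vertical profile of accumulated gradients through that pixel, that the "mode" is the run of values around the profile's maximum that stay above a fraction of it (the fraction 0.5 is hypothetical), and that its width then estimates the rectangle height while its peak value reflects the contrast. The left/right difference features are not reproduced because the slides do not define them precisely.

```python
import numpy as np

def mode_features(profile, fraction=0.5):
    """Width, height, mean and std of the peak ("mode") of a vertical gradient profile."""
    profile = np.asarray(profile, dtype=float)
    peak = int(np.argmax(profile))
    height = profile[peak]                 # assumed to reflect the contrast
    limit = fraction * height
    top = peak
    while top > 0 and profile[top - 1] >= limit:
        top -= 1
    bottom = peak
    while bottom < len(profile) - 1 and profile[bottom + 1] >= limit:
        bottom += 1
    mode = profile[top:bottom + 1]
    return {"width": bottom - top + 1,     # assumed estimate of the text rectangle height
            "height": height,
            "mean": mode.mean(),
            "std": mode.std()}

# Hypothetical profile crossing a text line roughly 5 pixels high
feats = mode_features([0, 0, 1, 8, 9, 10, 9, 8, 1, 0])
```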
Learning with Support Vector Machines • Training image database: positive samples and negative samples • Bootstrapping, cross-validation • Classification step: the computational complexity must be reduced: • Sub-sampling of the pixels to classify (4x4) • Approximation of the SVM model by SVM regression
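The 4x4 sub-sampling divides the number of classifier evaluations by 16. A minimal sketch, in which a hypothetical toy function stands in for the SVM (the trained model and its regression approximation are not shown) and each classified pixel's label is simply copied to its 4x4 block, which is an assumption about how the labels are propagated:

```python
import numpy as np

def classify_subsampled(features, classify, step=4):
    """Classify one pixel per step x step block and copy its label to the whole block.

    features: (H, W, F) per-pixel feature vectors.
    classify: callable mapping an (N, F) array to N labels (stand-in for the SVM).
    """
    h, w, _ = features.shape
    rows = np.arange(0, h, step)
    cols = np.arange(0, w, step)
    samples = features[np.ix_(rows, cols)]              # (len(rows), len(cols), F)
    labels = np.asarray(classify(samples.reshape(-1, samples.shape[-1])))
    labels = labels.reshape(len(rows), len(cols))
    # Nearest-sample upsampling back to full resolution, cropped to the image size
    full = np.repeat(np.repeat(labels, step, axis=0), step, axis=1)
    return full[:h, :w]

calls = []

def toy_classifier(x):
    """Hypothetical stand-in for the SVM: 'text' if the first feature exceeds 0.5."""
    calls.append(len(x))
    return (x[:, 0] > 0.5).astype(int)

feats = np.zeros((16, 16, 2))
feats[:, :8, 0] = 1.0              # left half looks like text
out = classify_subsampled(feats, toy_classifier)
print(calls)                       # prints [16]: 16 classifications instead of 256
```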
• AIM2: commercials • AIM3: news • AIM4: cartoons, news • AIM5: news
Detection in still images [Figure panels: local contrast; SVM learning.]
[Figures: further examples, local contrast vs. SVM learning.]
Detection in video sequences
Character segmentation: examples [Figure panels: original image; Fisher/Otsu; Fisher/Otsu (windowed); Yanowitz-Bruckstein; Yanowitz-Bruckstein + post-processing; Niblack; Sauvola et al.; contrast maximization.]
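Two of the compared binarization methods have well-known closed forms based on the local mean m and standard deviation s in a window around each pixel: Niblack uses T = m + k·s, and Sauvola et al. use T = m·(1 + k·(s/R − 1)). The sketch below implements both with integral images; the window size, k = −0.2 for Niblack, and k = 0.5, R = 128 for Sauvola are commonly used values assumed here, not parameters taken from the slides. It marks pixels brighter than the threshold (invert the comparison for dark text). The contrast maximization variant is not reproduced.

```python
import numpy as np

def local_mean_std(image, half):
    """Mean and standard deviation over (2*half+1)^2 windows via integral images."""
    img = np.pad(image.astype(float), half, mode="edge")
    w = 2 * half + 1

    def window_sum(a):
        s = np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
        return s[w:, w:] - s[:-w, w:] - s[w:, :-w] + s[:-w, :-w]

    n = w * w
    mean = window_sum(img) / n
    var = window_sum(img ** 2) / n - mean ** 2
    return mean, np.sqrt(np.maximum(var, 0.0))

def niblack(image, half=7, k=-0.2):
    """Niblack: T = m + k * s."""
    mean, std = local_mean_std(image, half)
    return image > mean + k * std

def sauvola(image, half=7, k=0.5, r=128.0):
    """Sauvola et al.: T = m * (1 + k * (s / R - 1))."""
    mean, std = local_mean_std(image, half)
    return image > mean * (1.0 + k * (std / r - 1.0))

img = np.zeros((30, 30))
img[10:20, 10:20] = 200.0          # bright 10 x 10 "character" on a dark background
nib, sau = niblack(img), sauvola(img)
```

On this toy image Sauvola recovers exactly the 100 bright pixels, whereas Niblack also marks dark background pixels near the block, since its threshold becomes negative where the window contains only a few bright pixels.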
Bayesian estimation using a Markov random field prior [Figure panels: Sauvola et al.; MRF.] OCR results [Figure: local contrast based binarization, recognition by ABBYY FineReader 5.0.]
TREC 2002 • Collaboration with the LAMP laboratory, University of Maryland [Figure: examples labeled "Oil", "Air plane", "Airline", "Dance", "Music" and "Energy Gas".]
Conclusion • The choice of features is essential in vision. • We propose a new evaluation method that allows intuitive visualization of the detection quality through performance graphs. • We developed a new system for the detection, tracking, enhancement and binarization of text. • Detection performance is high thanks to the integration of several types of features at a very early stage. The learning method is less sensitive to textured noise in the image.
Outlook • Possible improvement of the features (e.g. contrast normalization, non-linear texture filters) • Integration of different feature types (statistical, structural, ...) • Use of a priori knowledge about text in order to decrease the number of false alarms • Integration of the detected text into an indexing/browsing/segmentation framework