Detection and Extraction of Artificial Text from Videos
Christian Wolf and Jean-Michel Jolion, 10th July 2001
PROJECT France Télécom Research & Development 001B575
Laboratoire de Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA, 69621 Villeurbanne CEDEX
http://rfv.insa-lyon.fr/~{wolf,jolion}
Intro Detection Enhancement Binarisation Experiments Results
Plan of the presentation: • Introduction • Detection • Image enhancement: multiple frame integration • Binarisation of the text boxes • Setup of the experiments • Results: detection, binarisation, OCR • Conclusion and outlook
Content-based image retrieval [diagram: example image → similarity function → result; indexing phase]
Similarity measures [figure: image pairs labelled "similar", "similar" and "not similar"]
Indexing using text [diagram: keyword "Patrick Mayhew" → keyword-based search → result; the indexing phase extracts text such as "Patrick Mayhew", "Min. chargé de l´irlande de Nord", "ISRAEL", "Jerusalem", "montage T.Nouel", ...]
Video properties [figure: example frame annotated with text sizes of 80 px, 12 px and 8 px]
Text extraction: general scheme. Video → detection of the text in single frames → tracking → image enhancement (multiple frame integration) → segmentation/binarisation → OCR, producing strings such as "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur".
Detection in single frames. Video → calculation of the gradient → accumulation → binarisation → mathematical morphology → connected components analysis → verification of geometric constraints → verification of special cases → combination of the rectangles → list of rectangles.
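The connected-components and geometric-constraint steps of this pipeline can be sketched as follows, assuming the morphology stage has already produced a binary image (lists of 0/1 values). The 4-connectivity and the thresholds `min_height`, `min_width` and `min_aspect` are hypothetical illustrations; the slide does not state which constraints are actually verified.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive bounds

def component_boxes(binary: List[List[int]]) -> List[Rect]:
    """Bounding boxes of the 4-connected components of foreground (value 1) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Rect] = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def plausible_text_boxes(boxes: List[Rect], min_height: int = 2, min_width: int = 4,
                         min_aspect: float = 1.5) -> List[Rect]:
    """Keep boxes that are large enough and wider than tall (hypothetical constraints)."""
    kept = []
    for x0, y0, x1, y1 in boxes:
        width, height = x1 - x0 + 1, y1 - y0 + 1
        if height >= min_height and width >= min_width and width / height >= min_aspect:
            kept.append((x0, y0, x1, y1))
    return kept
```

Requiring boxes to be wider than tall reflects the horizontal alignment of artificial text; a real system would tune these limits on annotated video.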
Detection in single frames: examples
A filter for text detection: accumulation of horizontal gradients over a horizontal window of width W [figure: sliding window of width W]. Justification: text forms a regular texture containing vertical edges which are aligned horizontally.
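The filter can be sketched as follows, assuming a gray-level image stored as a list of rows. The horizontal derivative is approximated here by the central difference |I(x+1,y) - I(x-1,y)|; this operator, the function names and the border handling are assumptions, not necessarily what the project used. Each output value sums the gradient magnitudes inside a horizontal window of width W centred on the pixel, so rows crossing many vertical edges (text) respond strongly while flat regions give zero.

```python
from typing import List

Image = List[List[int]]

def horizontal_gradient(img: Image) -> Image:
    """Absolute horizontal derivative by central differences (left at zero on the border columns)."""
    h, w = len(img), len(img[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            grad[y][x] = abs(img[y][x + 1] - img[y][x - 1])
    return grad

def accumulate_horizontally(grad: Image, window: int) -> Image:
    """Sum of gradient magnitudes over a horizontal window of `window` pixels centred on x."""
    half = window // 2
    h, w = len(grad), len(grad[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        row = grad[y]
        for x in range(w):
            lo, hi = max(0, x - half), min(w - 1, x + half)  # window clipped at the border
            acc[y][x] = sum(row[lo:hi + 1])
    return acc
```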
Mathematical morphology. The binarised accumulation image is post-processed by the following sequence: close; deletion of small bridges between the components; dilation (special) to connect characters; erosion (special) to connect characters; horizontal erosion; horizontal dilation.
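The standard horizontal operations in this sequence can be sketched on a binary image as below. The "special" dilation and erosion and the deletion of small bridges are not reproduced, and the structuring-element radius is a hypothetical parameter. Windows are clipped at the image border, so pixels outside the image never contribute.

```python
from typing import List

Binary = List[List[int]]

def dilate_h(img: Binary, radius: int) -> Binary:
    """A pixel becomes 1 if any pixel within `radius` columns on the same row is 1."""
    w = len(img[0])
    return [[1 if any(row[max(0, x - radius):min(w, x + radius + 1)]) else 0 for x in range(w)]
            for row in img]

def erode_h(img: Binary, radius: int) -> Binary:
    """A pixel stays 1 only if all pixels within `radius` columns on the same row are 1."""
    w = len(img[0])
    return [[1 if all(row[max(0, x - radius):min(w, x + radius + 1)]) else 0 for x in range(w)]
            for row in img]

def close_h(img: Binary, radius: int) -> Binary:
    """Horizontal closing (dilation then erosion): fills interior gaps of up to 2 * radius pixels."""
    return erode_h(dilate_h(img, radius), radius)
```

For example, `close_h([[1, 0, 0, 1, 0, 0, 0, 0, 1]], 1)` joins the first two components across the two-pixel gap but leaves the four-pixel gap open.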
Detection in video sequences. Detection per single frame yields a list of rectangles per frame (frame number = time). Tracking keeps track of the text occurrences; it is followed by the suppression of false alarms and by image enhancement (multiple frame integration). Output: text occurrences.
Integration of the rectangles into text occurrences. At every new frame, the detected rectangles must be matched with the stored text occurrences. Two lists are involved: the list of rectangles detected for the current frame, and the list of text occurrences, which contains the most recent rectangle of each occurrence [figure: text occurrences over frame number (time)]. The integration is done using overlap information (an overlap matrix).
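The overlap matrix and a simple matching rule can be sketched as follows, assuming rectangles given as (x0, y0, x1, y1) with exclusive upper bounds. Normalising the intersection by the smaller of the two areas and the greedy threshold rule are hypothetical choices: the slide only states that overlap information is used. A detected rectangle left unmatched (`None`) would start a new text occurrence.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive

def intersection_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def area(r: Rect) -> int:
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap_matrix(detected: List[Rect], occurrences: List[Rect]) -> List[List[float]]:
    """Entry [i][j]: overlap of detected rectangle i with the latest rectangle of occurrence j,
    normalised by the smaller of the two areas (hypothetical normalisation)."""
    return [[intersection_area(d, o) / min(area(d), area(o)) for o in occurrences]
            for d in detected]

def match(detected: List[Rect], occurrences: List[Rect],
          threshold: float = 0.5) -> List[Optional[int]]:
    """For each detected rectangle: index of the best-overlapping occurrence, or None."""
    result: List[Optional[int]] = []
    for row in overlap_matrix(detected, occurrences):
        best = max(range(len(row)), key=lambda j: row[j], default=None)
        result.append(best if best is not None and row[best] >= threshold else None)
    return result
```

This greedy rule lets two detections join the same occurrence; a stricter version would resolve such conflicts, for example by keeping only the highest overlap per column of the matrix.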
Suppression of false alarms: examples [figure: all detections; after suppression of false alarms]
Image enhancement. Integration of multiple frames to create a single image of higher quality. Multiple frame integration: averaging, with F_i the i-th image, M the mean image and V the standard deviation image. Super-resolution (interpolation): robust bi-linear and robust bi-cubic interpolation, in which an additional weight is included in the interpolation scheme [figure: neighbouring pixels M1, M2, M3, M4].
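The statistics M and V and a robust averaging step can be sketched as below for frames of equal size. The weight 1 / (1 + z²), where z is the deviation of a frame from M in units of V, is a hypothetical robust weight chosen for illustration; the slide does not give the weight actually used. This sketch also averages at the original resolution and does not perform the bi-linear or bi-cubic super-resolution step.

```python
import math
from typing import List, Tuple

Image = List[List[float]]

def mean_std_images(frames: List[Image]) -> Tuple[Image, Image]:
    """Pixel-wise mean image M and standard deviation image V over all frames F_i."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    M = [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]
    V = [[math.sqrt(sum((f[y][x] - M[y][x]) ** 2 for f in frames) / n) for x in range(w)]
         for y in range(h)]
    return M, V

def robust_average(frames: List[Image], eps: float = 1e-6) -> Image:
    """Weighted average in which pixels far from the mean (in units of V) get a lower weight."""
    M, V = mean_std_images(frames)
    out: Image = []
    for y in range(len(M)):
        row = []
        for x in range(len(M[0])):
            num = den = 0.0
            for f in frames:
                z = (f[y][x] - M[y][x]) / (V[y][x] + eps)
                weight = 1.0 / (1.0 + z * z)  # hypothetical robust weight
                num += weight * f[y][x]
                den += weight
            row.append(num / den)
        out.append(row)
    return out
```

With three frames at gray value 10 and one outlier at 100, the plain mean is 32.5 while the robust average stays close to the majority value, which is the motivation for down-weighting pixels that deviate from M.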
Interpolation: examples [figure: bi-linear, robust bi-linear and robust bi-cubic interpolation]
Interpolation: thresholded examples [figure: bi-linear, robust bi-linear and robust bi-cubic interpolation]
Binarisation. Different binarisation algorithms have been implemented and evaluated: • Fisher/Otsu and windowed Fisher/Otsu • Yanowitz-Bruckstein • Niblack, Sauvola • our adaptive version of Niblack/Sauvola's method.
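For reference, the global Fisher/Otsu threshold can be sketched as below. It chooses the gray level t that maximises the between-class variance of the two classes {I <= t} and {I > t}; the windowed variant would apply the same computation to the pixels of each window. The function name and the 8-bit range are assumptions.

```python
from typing import List

def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Gray level t maximising the between-class variance of {p <= t} and {p > t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of their gray values
    best_t, best_between = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: this threshold separates nothing
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to the between-class variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t
```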
Binarisation methods. Yanowitz-Bruckstein: the threshold surface is calculated from the edge information [figure: threshold surface]. Windowed Fisher, Niblack, Sauvola: the threshold surface is calculated from the statistics collected in a window which is shifted across the image [figure: threshold surface].
Binarisation by Niblack. Niblack proposed a method which calculates a threshold surface by sliding a rectangular window over the image and computing statistics on this window: $T = m + k \cdot s$, with m the mean, s the standard deviation and k a parameter, k = -0.2.
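A direct (unoptimised) sketch of Niblack's threshold surface, assuming 8-bit gray values in a list of rows and a square window clipped at the image border. The window half-size of 7 pixels (a 15x15 window) is a hypothetical default, since the slide does not give the window size. Pixels above the surface are mapped to 255 and the others to 0, so dark text becomes black.

```python
import math
from typing import List, Tuple

Image = List[List[int]]

def window_stats(img: Image, x: int, y: int, half: int) -> Tuple[float, float]:
    """Mean and (population) standard deviation of the window centred on (x, y), clipped at the border."""
    h, w = len(img), len(img[0])
    values = [img[j][i]
              for j in range(max(0, y - half), min(h, y + half + 1))
              for i in range(max(0, x - half), min(w, x + half + 1))]
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m, s

def niblack_surface(img: Image, half: int = 7, k: float = -0.2) -> List[List[float]]:
    """Threshold surface T = m + k * s computed for every pixel."""
    surface = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            m, s = window_stats(img, x, y, half)
            row.append(m + k * s)
        surface.append(row)
    return surface

def binarise(img: Image, surface: List[List[float]]) -> Image:
    """255 where the pixel is above the threshold surface, 0 elsewhere."""
    return [[255 if v > t else 0 for v, t in zip(row, trow)]
            for row, trow in zip(img, surface)]
```

In a flat window s = 0, so T equals the local mean; any faint background texture then straddles the threshold, which is the weakness discussed on the next slide.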
Binarisation by Niblack: problems. Light textures in the background are treated as text with low contrast [figure].
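Binarisation: improvement by Sauvola. To overcome these problems, Sauvola et al. proposed an improved formula to calculate the threshold: $T = m \cdot \left(1 - k \cdot \left(1 - \frac{s}{R}\right)\right)$, with m the mean, s the standard deviation, k a parameter (k = 0.5) and R the dynamic range of the standard deviation (R = 128). Rewriting it as $T = m - k \cdot m \cdot \left(1 - \frac{s}{R}\right)$ shows that a hypothesis on the gray values of text and non-text pixels is used to remove the noise produced by background textures: in low-contrast windows (s much smaller than R) the threshold falls well below the mean, so that only pixels close to zero, assumed to be text, are kept.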
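Sauvola's threshold surface can be sketched in the same way as Niblack's, with the window statistics recomputed directly. The window half-size of 7 pixels is again a hypothetical default; k = 0.5 and R = 128 are the values given on the slide.

```python
import math
from typing import List, Tuple

Image = List[List[int]]

def window_stats(img: Image, x: int, y: int, half: int) -> Tuple[float, float]:
    """Mean and standard deviation of the window centred on (x, y), clipped at the border."""
    h, w = len(img), len(img[0])
    values = [img[j][i]
              for j in range(max(0, y - half), min(h, y + half + 1))
              for i in range(max(0, x - half), min(w, x + half + 1))]
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def sauvola_surface(img: Image, half: int = 7, k: float = 0.5,
                    R: float = 128.0) -> List[List[float]]:
    """Threshold surface T = m * (1 - k * (1 - s / R))."""
    surface = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            m, s = window_stats(img, x, y, half)
            row.append(m * (1 - k * (1 - s / R)))
        surface.append(row)
    return surface
```

In a flat window (s = 0) the threshold drops to m(1 - k) = m/2, so a light, uniform background stays white instead of being split by noise as with Niblack's formula.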
Binarisation by Sauvola: examples [figure: original image; binarised using Niblack's method; binarised using Sauvola et al.'s method]
Improvement: adaptive dynamic range. Fixing the dynamic range at R = 128 may be acceptable for document images, but not for text boxes taken from videos: if the contrast of the image is lower, the binarisation will not be correct, because s/R then stays small in every window, including those containing text, and the threshold is pushed down everywhere. We therefore set the parameter R to the maximum standard deviation over all windows: $R = \max(s)$. To avoid two passes of the windowing algorithm, the mean and standard deviation can be stored in a table during the first pass and the threshold surface calculated from this data [figure: results of Niblack, Sauvola with R = 128 and adaptive R].
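The adaptive dynamic range can be sketched as a two-stage computation: a first pass stores the mean and standard deviation of every window in tables, then R = max(s) is taken over the table before the threshold surface is computed. The window half-size and the guard for a perfectly flat box are hypothetical choices.

```python
import math
from typing import List, Tuple

Image = List[List[int]]
Table = List[List[float]]

def window_stats(img: Image, x: int, y: int, half: int) -> Tuple[float, float]:
    """Mean and standard deviation of the window centred on (x, y), clipped at the border."""
    h, w = len(img), len(img[0])
    values = [img[j][i]
              for j in range(max(0, y - half), min(h, y + half + 1))
              for i in range(max(0, x - half), min(w, x + half + 1))]
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def stats_tables(img: Image, half: int) -> Tuple[Table, Table]:
    """First pass: store the mean and standard deviation of every window in two tables."""
    means, stds = [], []
    for y in range(len(img)):
        stats = [window_stats(img, x, y, half) for x in range(len(img[0]))]
        means.append([m for m, _ in stats])
        stds.append([s for _, s in stats])
    return means, stds

def adaptive_sauvola_surface(img: Image, half: int = 7, k: float = 0.5) -> Table:
    """Sauvola's threshold with R set to the maximum standard deviation of all windows."""
    means, stds = stats_tables(img, half)
    R = max(max(row) for row in stds) or 1e-9  # hypothetical guard for a perfectly flat box
    return [[m * (1 - k * (1 - s / R)) for m, s in zip(mrow, srow)]
            for mrow, srow in zip(means, stds)]
```

The window with the highest contrast keeps T = m, and scaling all gray values of a box by a constant factor scales the whole surface by the same factor, which a fixed R = 128 does not guarantee.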
Improvement: shift of the image range. The strong hypothesis on the gray values (text pixels must be near zero) is not justified for some video text boxes [figure: gray value histogram; results of Niblack, Sauvola with R = 128 and adaptive R].
Improvement: shift of the image range. A correction of the image's histogram resolves this problem [figure: original image; corrected image binarised with adaptive R]. The same effect can also be achieved by changing the threshold formula: $T = m - k \cdot \left(1 - \frac{s}{R}\right) \cdot (m - M)$, with m the mean, s the standard deviation, k a parameter (k = 0.5), R the maximum of the standard deviations of all windows and M the minimum gray value of the text box. This is Sauvola's formula applied to the shifted gray values I - M, with M added back to the result.
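The modified threshold can be sketched as follows, with R = max(s) over all windows and M the minimum gray value of the box. The window half-size and the flat-box guard are hypothetical. Because only the differences m - M enter the correction term, adding a constant to every pixel of the box shifts the threshold surface by exactly that constant.

```python
import math
from typing import List, Tuple

Image = List[List[int]]

def window_stats(img: Image, x: int, y: int, half: int) -> Tuple[float, float]:
    """Mean and standard deviation of the window centred on (x, y), clipped at the border."""
    h, w = len(img), len(img[0])
    values = [img[j][i]
              for j in range(max(0, y - half), min(h, y + half + 1))
              for i in range(max(0, x - half), min(w, x + half + 1))]
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def shifted_range_surface(img: Image, half: int = 7, k: float = 0.5) -> List[List[float]]:
    """T = m - k * (1 - s / R) * (m - M), with R = max(s) and M = min gray value."""
    h, w = len(img), len(img[0])
    stats = [[window_stats(img, x, y, half) for x in range(w)] for y in range(h)]  # first pass
    R = max(s for row in stats for _, s in row) or 1e-9  # hypothetical guard for a flat box
    M = min(min(row) for row in img)
    return [[m - k * (1 - s / R) * (m - M) for m, s in row] for row in stats]
```

With M = 0 the formula reduces to Sauvola's with adaptive R; a box whose text is gray rather than black is handled as if its histogram had first been shifted down to zero.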
Fast incremental calculation. Mean and variance can be calculated in one pass from the sums $a = \sum I$ and $b = \sum I^2$ over the N pixels of the window: $m = a / N$, $s^2 = b / N - m^2$. At the beginning of each line, the full window is calculated and the variables a and b are kept. After each shift, a and b are updated incrementally by subtracting the column of pixels which left the window and adding the column which entered it [figure: window with its left (L) and right (R) columns]. Mean and standard deviation are stored in 2D tables; the maximum R = max(s) is then computed before the threshold surface is calculated.
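The incremental scheme can be sketched as below. For simplicity, this sketch only computes windows that lie fully inside the image (border handling is left out), and the window half-size is a parameter; both are assumptions. The running sums a and b are initialised at the start of each line and then updated column by column.

```python
import math
from typing import List, Tuple

Table = List[List[float]]

def incremental_stats(img: List[List[int]], half: int) -> Tuple[Table, Table]:
    """Mean and std of every (2*half+1)^2 window fully inside the image, using running sums."""
    h, w = len(img), len(img[0])
    size = 2 * half + 1
    n = size * size
    means: Table = []
    stds: Table = []
    for y in range(half, h - half):
        rows = img[y - half:y + half + 1]
        # full window at the beginning of the line
        a = sum(v for r in rows for v in r[0:size])
        b = sum(v * v for r in rows for v in r[0:size])
        m_row, s_row = [], []
        for x in range(half, w - half):
            if x > half:
                left = [r[x - half - 1] for r in rows]  # column leaving the window
                right = [r[x + half] for r in rows]     # column entering the window
                a += sum(right) - sum(left)
                b += sum(v * v for v in right) - sum(v * v for v in left)
            m = a / n
            var = max(0.0, b / n - m * m)  # clamp tiny negative rounding errors
            m_row.append(m)
            s_row.append(math.sqrt(var))
        means.append(m_row)
        stds.append(s_row)
    return means, stds
```

After this pass, `R = max(max(row) for row in stds)` gives the adaptive dynamic range, and the threshold surface can be computed from the two tables without revisiting the image.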
The experiments. Description of the experiments: • the videos used in the experiments • the evaluation process (OCR evaluation). Results for: • text detection • binarisation • OCR
Test videos. We performed experiments on 5 different MPEG-1 videos of resolution 384x288:
• AIM2: commercials
• AIM3: news
• AIM4: cartoon, news
• AIM5: news
Video example: France Télécom, ~22 minutes of video, ~33000 frames.
The interface to the OCR software. Ideal situation: pass the individual (binarised) text boxes to OCR software which recognises their contents box by box. In reality: we used standard commercial OCR software for our tests. This software was designed to recognise scanned A4 or US Letter pages and cannot process text boxes directly, so the boxes are assembled into a page [figure: A4 page].
OCR page, manual evaluation: an input image, ready for the OCR.
OCR output (raw): 051Q07Ô7 N*Verf 05JQ0707 PUBLICITE IPUBIIÏITE IPUBLICITE prenez prenez prenez boyard boyard boyard ^française ^française ^française FRANCE FRANCE FRANCE FRANCE FRANCE c'est plus musclé c'est plus musclé iï 'J fort fort fort fort fort .fort .fort .fort cotHfUet blé cotHfUet blé cQ#tfUet blé uutàfruuk On va beaucoup {&*$ loin avec Itineris. Partout Partout Partout Partout Partout Partout Partout Partout Partout Partout I22h35 I22h35 I22h35 I22h35 I22h35 PUBLICITE \PUBLICITE \PUBLICITE >3h55l23h55l23h55l23h55l23h55l23h55 20h.50120h50 |20h50120h50 |20h50120h50 ,f ort boyard ,f ort boyard ,f ort boyard ,f ort boyard 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J II II II II II II II II II gà dents gà dents gà dents IIH r Lessive classique lljir Lessive classique I[HT Lessive classique le temps le temps le temps le temps le temps ^PUBLICITE ^PUBLICITE ^PUBLICITE I Par Amour du Goût. Il Par Amour du Goût. I en en en en en en en en en révolution révolution révolution
Post-processing of OCR output.
Ground truth: dimanche 23h55 N Vert 05100707 Berlingo PUBLICITE prenez diffusion simultanée en stéréo sur boyard française FRANCE c'est plus musclé PUBLICITE fort Coral blé complet fruits On va beaucoup Plus loin avec Itineris. Bohême Partout 22h35 PUBLICITE 23h55 20h50 fort fort boyard 23h55
Post-processed OCR output: 051Q07Ô7 PUBLICITE prenez boyard ^française FRANCE c'est plus musclé fort blé cotHfUet uutàfruuk On va beaucoup {&*$ loin avec Itineris. Partout I22h35 PUBLICITE \ >3h55l 20h.50 ,f ort boyard
Automatic evaluation using markers. Manual processing of the OCR output (separating the output strings and finding the corresponding input box) is time-consuming and error-prone, especially when the quality of the OCR output is very poor. The OCR output can be processed automatically by placing marker images between the text boxes. The marker boxes contain text which is easily recognised by the OCR software. In the results section we present results for both types of evaluation.
An input image with markers, ready for the OCR.
OCR evaluation. The OCR output is searched for the individual text boxes with the help of a structure log (e.g. "# Page 1: P 1 T 1 2 M 1 2 T 2 3 M 2 2 T 3 2"). This yields a list of strings, each corresponding to the output for one text box, possibly repeated several times. The raw ground truth is prepared as a list of strings, each corresponding to the ground truth for one text box and repeated the same number of times as the corresponding text image appears in the OCR input image. The evaluation then computes the transformation cost, recall and precision. Example OCR output: Tkenchar 037 Tkenchar 037 'gfrançaise 'gfrançaise 'gfrançaise 'gfrançaise Tkenchar 038 Tkenchar 038 Mpe pire de| fj^e pire de| fj^e pire de| Tkenchar 039 Tkenchar 039. Example raw ground truth: @S Par Amour du Goût. @S en @S révolution @S la @S française @S le pire de @S 20H45
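Searching the OCR output for the individual text boxes can be sketched as below. It assumes, based on the example output above, that each marker image is read back by the OCR as "Tkenchar" followed by its number, and that the text following a marker belongs to the box with that number. The regular expression and the function name are hypothetical; whitespace is normalised and repeated marker readings are merged.

```python
import re
from typing import Dict

# Hypothetical marker pattern, modelled on the example OCR output ("Tkenchar 037").
MARKER = re.compile(r"Tkenchar\s+(\d+)")

def split_by_markers(ocr_text: str) -> Dict[int, str]:
    """Map each marker number to the OCR text that follows it, up to the next marker."""
    parts = MARKER.split(ocr_text)
    # parts = [text_before_first_marker, id1, text1, id2, text2, ...]
    boxes: Dict[int, str] = {}
    for i in range(1, len(parts), 2):
        marker_id = int(parts[i])
        text = " ".join(parts[i + 1].split())  # normalise whitespace
        if text:
            boxes[marker_id] = (boxes.get(marker_id, "") + " " + text).strip()
        else:
            boxes.setdefault(marker_id, "")
    return boxes
```

Each resulting string can then be compared with the ground truth of the corresponding box, repeated as many times as the box appears on the page.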
OCR evaluation: Wagner & Fischer. A measure of the resemblance of two character strings, e.g. the OCR output "AirbagGtroônn" and the ground truth "Airbag Citroën": the cost of transforming string A into string B is calculated. Basic transformation operations (substitution, insertion, deletion) are used, each associated with a cost, and the total cost is minimised.
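A minimal sketch of the Wagner & Fischer dynamic programme: `d[i][j]` holds the minimal cost of transforming the first i characters of A into the first j characters of B. The unit default costs are an assumption; the costs used in the project are not given on the slide.

```python
def wagner_fischer(a: str, b: str, sub_cost: float = 1.0,
                   ins_cost: float = 1.0, del_cost: float = 1.0) -> float:
    """Minimal total cost of transforming string a into string b."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost  # delete the first i characters of a
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost  # insert the first j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitution
                d[i - 1][j] + del_cost,                         # deletion
                d[i][j - 1] + ins_cost,                         # insertion
            )
    return d[n][m]
```

Calling `wagner_fischer("AirbagGtroônn", "Airbag Citroën")` gives the transformation cost for the example pair on this slide.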
Detection results: INA videos, no suppression of false alarms [table]
Binarisation methods: examples [figures on two slides: original image; Fisher; Fisher (windowed); Yanowitz-Bruckstein; Yanowitz-Bruckstein + post-processing; Niblack; Sauvola et al.; our method]
OCR results: classification by binarisation method (robust bi-cubic interpolation). Results obtained using the manual evaluation method (no markers in the input page), 44 pages [table].
OCR results: interpolation methods. Results obtained using the automatic evaluation method (markers included in the input page), 97 pages [tables: robust bi-linear interpolation; robust bi-cubic interpolation].
Conclusion • We developed a system for the detection, tracking, enhancement and binarisation of text. • A detection performance of 93.5% is obtained. • We derived a new binarisation method adapted to the type of text found in videos. • The total recognition rate is surprisingly high given the quality of the text, but not yet good enough for indexing purposes. • OCR integration problem: no software development kits giving direct access to the recognition functions are available. A collaboration with an OCR company seems inevitable.
Outlook. The perspectives of our work lie in extending the existing algorithms to text with more difficult properties, and in improving and studying the existing techniques in more depth: • Scene text: the binarisation techniques developed over the last 30 years are aimed either at document images or at images from computer vision. The method we introduced in this project improves on previously published work, but the quality of the text is not yet satisfactory. The binarisation of scene text in particular will require the development of new methods. • Detection recall: we are convinced that the recall of the detection system can be increased further, e.g. through research on the binarisation technique applied to the map of accumulated gradients.