Detection and Extraction of Artificial Text for Semantic Indexing

Christian Wolf and Jean-Michel Jolion
Laboratoire Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex, France
January 9th 2002, Dagstuhl Seminar on Content-Based Image and Video Retrieval
This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations

Introduction Detection Enh/Binarization Exp.Results Open problems Conclusion
Plan of the presentation (number of slides per section):
• Introduction (6)
• Detection and tracking (3)
• Enhancement and binarization of the text boxes (4)
• Experiments and results (2)
• Open problems (9)
• Conclusion and Outlook (1)
Total: 25 slides
This work resulted in a patent submitted by France Télécom on May 23rd, 2001 under the reference FR 01 06776.

Content-based image retrieval
[Figure: example image, similarity function, indexing phase, result]

Similarity measures
[Figure: image pairs labelled "similar", "similar" and "not similar"]

Indexing using text
[Figure: keyword-based search. The keyword "Patrick Mayhew" is looked up in the text collected during the indexing phase ("Patrick Mayhew", "Min. chargé de l´irlande de Nord", "ISRAEL Jerusalem", "montage T.Nouel", ...) to produce the result.]

Video properties
[Figure: video frame examples annotated with sizes of 80 px, 12 px and 8 px]

Text extraction: general scheme
Video -> Detection of the text in single frames -> Tracking -> Image enhancement (multiple frame integration) -> Segmentation/Binarisation -> OCR
Example output: "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur"

Text detection by accumulation of horizontal gradients (LeBourgeois, 1997).
Justification: text forms a regular texture containing vertical edges which are aligned horizontally.
Post-processing by mathematical morphology.
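
The idea of accumulating horizontal gradients can be sketched as follows. This is a minimal illustration and not the implementation behind the slides: the function names, the window size and the threshold are assumptions, and the morphological post-processing is left out.

```python
def accumulated_horizontal_gradients(image, window=13):
    """Sum |I(x+1) - I(x)| over a horizontal window centred on each pixel.

    `image` is a list of rows of gray values; `window` is a hypothetical
    default, not a value taken from the slides.
    """
    half = window // 2
    response = []
    for row in image:
        width = len(row)
        # Absolute forward difference along x; the last column has no right neighbour.
        grad = [abs(row[x + 1] - row[x]) for x in range(width - 1)] + [0]
        # Prefix sums so that every window sum costs O(1).
        prefix = [0]
        for g in grad:
            prefix.append(prefix[-1] + g)
        response.append([
            prefix[min(width, x + half + 1)] - prefix[max(0, x - half)]
            for x in range(width)
        ])
    return response


def candidate_mask(response, threshold):
    """Mark pixels whose accumulated gradient reaches the (assumed) threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in response]


# Toy image: a flat row, a row of alternating dark/bright "strokes", a flat row.
flat = [100] * 20
strokes = [0, 255] * 10
response = accumulated_horizontal_gradients([flat, strokes, flat])
mask = candidate_mask(response, threshold=1000)
print(mask[0][10], mask[1][10])
```

Rows with many closely spaced vertical edges accumulate a large response, while flat regions stay at zero, which is the property the justification above relies on.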

Detection in video sequences
• Detection per single frame: list of rectangles per frame
• Tracking: keeping track of text occurrences; suppression of false alarms
• Image enhancement: multiple frame integration
[Figure: text occurrences plotted against frame number (time)]
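
One way to turn per-frame rectangle lists into text occurrences is to match rectangles between frames by their overlap and to discard occurrences that are seen too rarely. The sketch below is an assumption-laden illustration: the overlap measure (intersection over union), the parameter names `min_overlap`, `max_gap` and `min_hits`, and their values are hypothetical and are not taken from the slides.

```python
def overlap(a, b):
    """Intersection over union of two rectangles given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def track_text(frames, min_overlap=0.5, max_gap=2, min_hits=3):
    """Group per-frame rectangles into text occurrences.

    `frames` is a list with one list of rectangles per frame.
    """
    active, closed = [], []
    for t, rectangles in enumerate(frames):
        unmatched = list(rectangles)
        for occ in active:
            best = max(unmatched, key=lambda r: overlap(occ["box"], r), default=None)
            if best is not None and overlap(occ["box"], best) >= min_overlap:
                occ.update(box=best, last=t, hits=occ["hits"] + 1)
                unmatched.remove(best)
        # Close occurrences that have not been seen for more than max_gap frames.
        closed += [occ for occ in active if t - occ["last"] > max_gap]
        active = [occ for occ in active if t - occ["last"] <= max_gap]
        # Every unmatched rectangle starts a new occurrence.
        active += [{"first": t, "last": t, "box": r, "hits": 1} for r in unmatched]
    # Occurrences seen in too few frames are treated as false alarms.
    return [occ for occ in closed + active if occ["hits"] >= min_hits]


# A stable text box over five frames plus a one-frame false alarm.
frames = [
    [(10, 10, 100, 30)],
    [(11, 10, 101, 30)],
    [(10, 10, 100, 30), (200, 200, 220, 210)],
    [(10, 11, 100, 31)],
    [(10, 10, 100, 30)],
]
occurrences = track_text(frames)
print(len(occurrences), occurrences[0]["first"], occurrences[0]["last"])
```

The surviving occurrence carries its first and last frame, which is the information a later multiple frame integration step would need.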

Image enhancement
Integration of multiple frames to create a single image of higher quality.
• Multiple frame integration: averaging
• Super-resolution (interpolation): an additional weight is included in the interpolation scheme, which decreases the weights of temporal outlier pixels.
[Figure: interpolation between the neighbouring pixels M1, M2, M3 and M4]
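
The effect of down-weighting temporal outliers can be illustrated with a weighted temporal average. This is only a sketch under stated assumptions: the weight function below is a hypothetical choice that shrinks as a pixel value moves away from the temporal mean, and the spatial interpolation (super-resolution) step is omitted.

```python
from statistics import mean, pstdev


def integrate_frames(frames):
    """Combine aligned gray-level frames of one text box into a single image.

    Each frame is a list of rows. For every pixel position, the values over
    time are averaged with weights that decrease for temporal outliers.
    The weight 1 / (1 + |v - m| / (1 + s)) is an assumed form, not one
    quoted from the slides.
    """
    height, width = len(frames[0]), len(frames[0][0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            m, s = mean(values), pstdev(values)
            weights = [1.0 / (1.0 + abs(v - m) / (1.0 + s)) for v in values]
            result[y][x] = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return result


# One pixel observed in five frames; the last observation is an outlier.
frames = [[[100]], [[100]], [[100]], [[100]], [[200]]]
print(integrate_frames(frames)[0][0])
```

In this toy case the plain average would be 120, while the weighted result lies closer to the four consistent observations.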

Binarization
• Niblack: T = m + k·s
• Sauvola et al.: T = m·(1 + k·(s/R − 1))
where T is the local threshold, m the mean of the window, s the standard deviation of the window, k a parameter and R the dynamics of the gray values of the image.
Further terms annotated on the slide: the contrast in the center of the image, the contrast of the window, the maximum local contrast, and M, the minimum gray value of the image.
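
The two local thresholds named on this slide can be sketched in Python using their commonly published forms, T = m + k·s for Niblack and T = m·(1 + k·(s/R − 1)) for Sauvola et al. The default values of `k` and `R`, the window half-size and the helper names are assumptions for illustration. The authors' own method is not reproduced here, because its formula is not part of this transcript.

```python
from statistics import mean, pstdev


def niblack_threshold(m, s, k=-0.2):
    """Niblack: T = m + k * s (k = -0.2 is an assumed, commonly used value)."""
    return m + k * s


def sauvola_threshold(m, s, k=0.5, R=128):
    """Sauvola et al.: T = m * (1 + k * (s / R - 1)) (k and R are assumed defaults)."""
    return m * (1 + k * (s / R - 1))


def binarize(image, threshold_fn, half=1):
    """Label each pixel 1 (background) if it exceeds the local threshold, else 0 (text)."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gray values in the (2 * half + 1)^2 window, clipped at the borders.
            window = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            t = threshold_fn(mean(window), pstdev(window))
            row.append(1 if image[y][x] > t else 0)
        result.append(row)
    return result


# Bright background (200) with one dark vertical stroke (20) in the middle column.
image = [[200, 200, 20, 200, 200] for _ in range(5)]
for row in binarize(image, sauvola_threshold):
    print(row)
```

In perfectly flat regions s is zero, so the Niblack threshold equals the local mean and no pixel exceeds it, whereas the Sauvola threshold drops well below the mean. This illustrates why the two methods behave differently on empty background.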

Binarization methods: examples
[Figure: original image and its binarization by Fisher, Fisher (windowed), Yanowitz B., Niblack, Sauvola et al., and our method]

Binarization using a priori knowledge
Bayesian MAP estimation using prior knowledge on the spatial relationships in the image, modeled as a Markov random field. (In collaboration with David Doermann from the Language and Media Processing Laboratory of the University of Maryland)

5 different MPEG-1 videos of resolution 384x288: 62 minutes, 93000 frames, 413 text appearances.

Detection and OCR results
[Figure: detection results (true positives, false positives, true negatives, false negatives) and OCR results, classified by binarization method]

Open questions
• Scene text (general orientations, deformations)
• Moving text

What is scene text?
[Figure: video frames, with the subsets of frames containing scene text and frames containing artificial text]
We do not have enough information about the importance of text in the destination domain. How many frames contain text, and how many of them contain scene text?

Detection: from artificial text to scene text
Several constraints have to be removed when passing from artificial text to scene text:
• The constraints on temporal stability need to be abandoned or at least softened (no initial frame integration).
• Text can be aligned in all orientations (creation of an oriented feature in multiple directions, similar to invariant features).
• Contrast is possibly lower because scene text is not designed to be read easily (is detection of unreadable text necessary?).

Text models
Simple models (sets of edges or vertical strokes...):
• Generalize well, respond to many kinds of text
• Many false alarms
Complex models (templates, probabilistic models (MRF)...):
• Powerful, fewer false alarms
• Do not generalize well
Main problem: distinction between characters and structures similar to text according to the chosen model. Assumptions are necessary (on the font, size, style, contrast, color, length, etc.) but not sufficient.

Sven Dickinson: evolution of models

What is text?
Whatever model we choose, we cannot detect/recognize all kinds of text without solving the general image understanding problem. The best thing we can do is to include richer features in the detection process: a composite model for text.
• Structural analysis (e.g. detection and recognition of characters by strokes). Very hard and very unlikely to work in the case of noisy images, low resolutions and difficult fonts.
• Statistical modeling of text features (e.g. by learning techniques). Problem: robust detection needs large neighborhood sizes, which lead to a combinatorial explosion.
Example of a composite model: texture-based methods for small text; segmentation + perceptual grouping and structural methods for big text.

Learning techniques: pro et contra
Bibliography:
• Learning directly the gray levels of the input image (Jung 2001)
• Learning features, i.e. coefficients of the Haar wavelet (Li and Doermann 2000) or edge strength (Lienhart 2000)
Pro: learning is an easy way to handle the complexity of text.
Contra: text can appear in videos in many different fonts, sizes, styles, colors, orientations etc. Learning all the different forms may not be feasible.

Color processing for detection?
[Figure: original image, Sobel on grayscale image, Sobel on L*u*v* image]
• Saturating distance or non-saturating distance?
• Reflection processing?
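
The difference between a saturating and a non-saturating colour distance can be made concrete with a small sketch. It assumes the pixels have already been converted to L*u*v* triples, uses the Euclidean distance between the left and right neighbours as a stand-in for the horizontal gradient (not the Sobel operator from the figure), and introduces a hypothetical `saturation` cap.

```python
import math


def color_distance(p, q, saturation=None):
    """Euclidean distance between two colour vectors, optionally capped."""
    d = math.dist(p, q)
    return d if saturation is None else min(d, saturation)


def horizontal_color_gradient(image, saturation=None):
    """Distance between the left and right neighbour of every pixel.

    `image` is a list of rows of colour tuples (assumed to be L*u*v*).
    Border columns are left at 0.
    """
    gradient = []
    for row in image:
        g = [0.0] * len(row)
        for x in range(1, len(row) - 1):
            g[x] = color_distance(row[x - 1], row[x + 1], saturation)
        gradient.append(g)
    return gradient


# Two regions with the same lightness (first component) but different chroma.
row = [(50, 0, 0), (50, 0, 0), (50, 80, 0), (50, 80, 0)]
lightness_only = [[(p[0],) for p in row]]

print(horizontal_color_gradient(lightness_only)[0])        # no edge visible
print(horizontal_color_gradient([row])[0])                 # edge visible in colour
print(horizontal_color_gradient([row], saturation=30)[0])  # edge strength capped
```

An edge between regions of equal lightness is invisible in the gray-level gradient but appears in the colour gradient. The cap limits how much a single very strong colour transition can dominate the response.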

Tracking of moving scene text
Do we detect the text in single frames (as for artificial text), or do we treat the video stream as a whole?
• Single frames: multiple frame integration of moving text needs robust registration of the text boxes in different frames (e.g. rough segmentation into text and background pixels before the registration of the text pixels only). Robust methods, which are able to track objects in clutter, are needed.
• Detection of moving objects, e.g. by optical flow or spatio-temporal methods.
• Mosaicing techniques can be employed for image enhancement.

Conclusion and Outlook
• We developed a system for detection, tracking, enhancement and binarization of artificial text in videos.
• The total recognition rate for artificial text is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.
• The remaining problems in text extraction seem to be typical for applications in visual information management: we went as far as we could with low-level features, but we cannot yet take the necessary step to semantic information.
• What is text? Possible definition: text is what a human or an OCR system can recognize as text.
• We have to include as much a priori knowledge as possible in the process.
