A New Approach for Video Text Detection and Localization

M. Cai, J. Song and M.R. Lyu
VIEW Technologies
The Chinese University of Hong Kong

Related work
• Text Area Detection
  • Uncompressed domain methods
    • Texture-based
    • Color-based
    • Edge-based
  • Compressed domain methods
    • DCT coefficients
    • Number of intra-coded blocks on P-/B-frames
• Text String Localization
  • Bottom-up scheme
  • Top-down scheme

Language-independent characteristics
• Contrast
  • An adaptive contrast threshold is chosen according to the background complexity
• Color
  • Color bleeding caused by compression
• Orientation
  • Well-defined size and orientation make the text easy to read
• Stationary location
  • Text stays on screen for a certain length of time

Language-dependent characteristics

Workflow
• Sampling & color space conversion
• Video text detection and localization on every sampled frame
• Multi-frame comparison

A sequential multi-resolution paradigm
[Figure: flowchart. Edge detection turns the original image into an edge map. At each level (Level = 1, Level = 2, ..., Level = n-1, Level = n) the input is scaled to Size / f(l), text area detection and text string localization produce text regions, and the regions are scaled by f(l) back to the original coordinates. Output: final text regions with original coordinates.]
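The level loop in the paradigm above can be sketched as follows. This is only an illustration under stated assumptions: the slide does not define f(l), so `halving_factor` (f(l) = 2^(l-1)) is a hypothetical choice, `detect` is a hypothetical callback standing in for text area detection plus text string localization, and nearest-neighbour subsampling stands in for whatever resizing the authors used.

```python
def halving_factor(level):
    """Hypothetical scale function f(l): 1, 2, 4, ... for levels 1, 2, 3, ..."""
    return 2 ** (level - 1)


def downscale(grid, factor):
    """Keep every factor-th row and column (nearest-neighbour subsampling)."""
    return [row[::factor] for row in grid[::factor]]


def multi_resolution(grid, detect, levels=3, f=halving_factor):
    """Run `detect` at each level and map its regions back to original coordinates.

    `grid` is a 2-D list (image or edge map). `detect` is an assumed callback
    that returns regions as (top, bottom, left, right) tuples in the
    coordinates of the grid it is given.
    """
    found = []
    for level in range(1, levels + 1):
        factor = f(level)
        small = downscale(grid, factor)
        for top, bottom, left, right in detect(small):
            # Scale the region by f(l) to recover original coordinates.
            found.append((top * factor, bottom * factor,
                          left * factor, right * factor))
    return found
```

Large text that is hard to delimit at full resolution becomes compact at a coarser level, which is the usual motivation for running the same detector at several scales.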

Text detection
• Edge detection
  • Sobel edge detector
• Local thresholding
  • Adaptive to background complexity
• Text-like area recovery
  • Enhance the density of text areas
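As a reminder of what the Sobel edge detector computes, here is a minimal pure-Python sketch. It assumes a grayscale image stored as a 2-D list and uses |gx| + |gy| as the edge strength; the slides do not say which magnitude formula or border handling the authors used, so both are assumptions.

```python
def sobel_edge_map(gray):
    """Return Sobel edge strength (|gx| + |gy|) for a 2-D list of gray values.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```

The resulting edge map is the input that the later thresholding and recovery steps operate on.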

Local Thresholding
• Use a small kernel (gray) to scan the whole edge map row by row.
• In the bigger window surrounding the kernel, check the background type: “Clear” or “Noisy”.
• For a Clear background, determine the local threshold from the low part of the edge strength histogram in the bigger window; for a Noisy background, use the high part.
[Figure: (a) Concentric kernel and window, labeled h and 3h. (b) A window on the multi-line text area and the horizontal projection in it. (c) Local threshold selection: histogram of count versus edge strength (0 to MAX), divided into a low part and a high part.]
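The kernel-and-window scan described above might look like the sketch below. The slides do not give the background test or the exact threshold rule, so several pieces are assumptions made for illustration: the background is called "Noisy" when more than `noisy_ratio` of the window pixels have non-zero edge strength, and the threshold is a fixed fraction of the strongest edge in the window (`low_frac` for Clear, `high_frac` for Noisy) rather than a value read from the histogram. All parameter names are hypothetical.

```python
def local_threshold(edges, kernel=2, margin=2, noisy_ratio=0.5,
                    low_frac=0.25, high_frac=0.75):
    """Binarize an edge map with a threshold chosen per kernel position.

    `edges` is a 2-D list of edge strengths. The kernel is a kernel x kernel
    block; the window is the kernel grown by `margin` pixels on every side.
    """
    h, w = len(edges), len(edges[0])
    out = [[0] * w for _ in range(h)]
    for ky in range(0, h, kernel):
        for kx in range(0, w, kernel):
            # Bigger window surrounding the kernel, clipped to the image.
            y0, y1 = max(0, ky - margin), min(h, ky + kernel + margin)
            x0, x1 = max(0, kx - margin), min(w, kx + kernel + margin)
            values = [edges[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            strongest = max(values)
            if strongest == 0:
                continue  # no edges in this window
            # Assumed background test: share of non-zero edge pixels.
            nonzero = sum(1 for v in values if v > 0)
            noisy = nonzero / len(values) > noisy_ratio
            # Clear background: low threshold. Noisy background: high threshold.
            threshold = strongest * (high_frac if noisy else low_frac)
            for y in range(ky, min(h, ky + kernel)):
                for x in range(kx, min(w, kx + kernel)):
                    if edges[y][x] >= threshold:
                        out[y][x] = 1
    return out
```

With these assumed settings, a weak edge survives on a clear background but is dropped when the surrounding window is full of edge responses, which is the adaptive behavior the slide describes.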

Thresholding result comparison
[Figure: video image, global thresholding results, local thresholding results.]

Text-like area recovery
• Labeling: classify each current edge pixel as “TEXT” or “NON_TEXT” based on its local density.
• Recovery/Suppression:
  • Bring back the lower-strength edge pixels neighboring the TEXT edge pixels.
  • Suppress the NON_TEXT edge pixels.
[Figure: edge map before recovery and after recovery.]
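The labeling and recovery/suppression steps above can be sketched as follows. The neighborhood size and density cutoff are not given on the slide, so `radius` and `min_density` are hypothetical parameters, and "local density" is assumed to mean the fraction of thresholded edge pixels in a square neighborhood.

```python
def recover_text_like(edges, mask, radius=1, min_density=0.3):
    """Keep dense (TEXT) edge pixels, restore their weak neighbours, drop the rest.

    `edges` holds the original edge strengths; `mask` is the 0/1 result of
    thresholding, i.e. the current edge pixels.
    """
    h, w = len(mask), len(mask[0])

    def neighbours(y, x):
        # All cells in the (2*radius+1)^2 square around (y, x), clipped.
        for ny in range(max(0, y - radius), min(h, y + radius + 1)):
            for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                yield ny, nx

    # Labeling: a current edge pixel is TEXT if its neighbourhood is dense enough.
    text = set()
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                cells = list(neighbours(y, x))
                density = sum(mask[ny][nx] for ny, nx in cells) / len(cells)
                if density >= min_density:
                    text.add((y, x))

    out = [[0] * w for _ in range(h)]
    for y, x in text:
        out[y][x] = 1
        # Recovery: bring back weaker edge pixels that thresholding removed.
        for ny, nx in neighbours(y, x):
            if not mask[ny][nx] and edges[ny][nx] > 0:
                out[ny][nx] = 1
    # Suppression: NON_TEXT pixels are never copied into `out`.
    return out
```

Filling in weak pixels next to TEXT pixels raises the edge density inside text areas, which is what the projection profiles in the localization stage rely on.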

Coarse-to-fine Text localization
• Projection-based top-down localization.
• To handle complex text layout.
[Figure: flowchart.
• Initialization: the whole edge map is the only region in the processing array.
• Pop the first region from the processing array. If the array is empty, terminate.
• Apply horizontal projection to the region, then vertical projection to each sub-region.
• Divisible? Y: add the sub-regions to the processing array. N: the region is indivisible.
• Indivisible regions: check aspect ratio. Y: add to the resulting text regions. N: discard false regions.]
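A minimal sketch of the projection-based top-down loop described above, assuming a binary edge mask as input. The slide does not say how "divisible" is decided or which aspect ratios are accepted, so `min_gap` (the number of empty rows or columns needed to split a region) and `min_aspect` (minimum width/height of a text region) are hypothetical parameters.

```python
def runs(profile, min_gap):
    """Return (start, end) spans of non-zero entries, merging gaps < min_gap."""
    spans = []
    for i, v in enumerate(profile):
        if not v:
            continue
        if spans and i - spans[-1][1] < min_gap:
            spans[-1][1] = i + 1  # gap too small to split: extend the span
        else:
            spans.append([i, i + 1])
    return [tuple(s) for s in spans]


def localize(mask, min_gap=2, min_aspect=1.5):
    """Split the mask top-down into regions (top, bottom, left, right), half-open."""
    queue = [(0, len(mask), 0, len(mask[0]))]  # the whole map is the first region
    results = []
    while queue:
        y0, y1, x0, x1 = queue.pop(0)
        # Horizontal projection: edge count per row of the region.
        rows = [sum(mask[y][x0:x1]) for y in range(y0, y1)]
        subs = []
        for ry0, ry1 in runs(rows, min_gap):
            top, bottom = y0 + ry0, y0 + ry1
            # Vertical projection inside each row band.
            cols = [sum(mask[y][x] for y in range(top, bottom))
                    for x in range(x0, x1)]
            for cx0, cx1 in runs(cols, min_gap):
                subs.append((top, bottom, x0 + cx0, x0 + cx1))
        if subs == [(y0, y1, x0, x1)]:
            # Indivisible: keep it only if it is wide enough to be a text line.
            if (x1 - x0) / (y1 - y0) >= min_aspect:
                results.append((y0, y1, x0, x1))
        else:
            queue.extend(subs)  # divisible: process the sub-regions later
    return results
```

Each sub-region is strictly smaller than its parent unless the region is indivisible, so the loop terminates; re-projecting inside every sub-region is what lets the scheme separate text blocks that a single global projection would merge.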

Localization steps
[Figure: four panels, (1) to (4).]

Experimental results

Performance statistics
Statistics of 10 news videos:
• Processing time per frame: 0.25 s (PIII 1G CPU)
• Detection rate = 93.6%
• Detection accuracy = 87.2%
• Localization accuracy > 90%
