Text Detection in Videos: Techniques and Challenges

Text Detection in Video
Min Cai, 2002.3.13

Background
• Video OCR: text detection, extraction, and recognition
• Detection target: artificial text
• Text detection:
  • Detect the region from a single frame
  • Refine the region by combining consecutive frames

Existing Work

Connected-component-based methods
• Basic idea
  • Treat text as having a uniform color (color level) and classify each pixel as text or non-text according to its color value.
  • Combine connected text pixels into connected components.
  • Group collinear connected components into a text string.
• Advantage
  • Can detect text of arbitrary orientation, provided the text has a similar color throughout and sits on a simple background.
• Disadvantage
  • Sensitive to color variance
  • Lossy video compression introduces color bleeding
  • Complex backgrounds
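
The three steps of the basic idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any surveyed method: the function names, the grayscale `text_level` and `tolerance` parameters, and the `max_gap` grouping rule are all assumptions, and the grouping step handles only horizontal strings even though the slide refers to collinear components in general.

```python
from collections import deque

def classify_pixels(image, text_level, tolerance):
    """Mark a pixel as text when its value is close to the assumed text color level."""
    return [[abs(p - text_level) <= tolerance for p in row] for row in image]

def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected text pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def group_horizontal(boxes, max_gap):
    """Simplified grouping (assumption): merge boxes that share rows and lie within max_gap columns."""
    strings = []
    for top, left, bottom, right in sorted(boxes, key=lambda b: b[1]):
        for i, (s_top, s_left, s_bottom, s_right) in enumerate(strings):
            shares_rows = top <= s_bottom and bottom >= s_top
            if shares_rows and left - s_right <= max_gap:
                strings[i] = (min(top, s_top), s_left, max(bottom, s_bottom), max(right, s_right))
                break
        else:
            strings.append((top, left, bottom, right))
    return strings

image = [
    [0, 0,   0, 0,   0, 0, 0],
    [0, 200, 0, 205, 0, 0, 0],
    [0, 200, 0, 205, 0, 0, 198],
    [0, 0,   0, 0,   0, 0, 0],
]
mask = classify_pixels(image, text_level=200, tolerance=10)
boxes = connected_components(mask)
strings = group_horizontal(boxes, max_gap=2)
```

The fixed `text_level` makes the listed disadvantage concrete: any color variance larger than `tolerance`, such as bleeding from lossy compression, breaks a character into fragments or drops it entirely.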

Texture segmentation method
• Basic idea
  • Treat text as a type of texture
  • Use texture segmentation algorithms to detect text
    • Gabor filter
    • Gaussian derivatives
• Advantage
  • Can efficiently separate text areas from graphic areas on a simple background. It is usually used in document analysis.
• Disadvantage
  • Time-consuming
  • Does not handle text embedded in varied backgrounds well.

Bottom-up method
• Basic idea
  • A seed region is defined as a small region with high edge density.
  • Grow each seed region into successively larger components until all seed regions in the image have been reached.
• Advantage
  • It is a generic method for detecting homogeneous objects of various shapes. That is, it can detect not only rectangular objects but also other shapes.
• Disadvantage
  • Sensitive to noise.
  • Cannot handle a large range of font sizes.
  • Sensitive to stroke density (which differs between languages).
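
A rough sketch of seed selection and growth is shown here. The slide does not give the growth rule, so this example assumes one: blocks whose edge density reaches `seed_threshold` become seeds, and a region grows into 4-connected neighboring blocks whose density reaches a lower `grow_threshold`. The function name, both thresholds, and the precomputed per-block `density` grid are hypothetical.

```python
from collections import deque

def grow_seed_regions(density, seed_threshold, grow_threshold):
    """Grow regions from high-density seed blocks (growth rule is an assumption)."""
    rows, cols = len(density), len(density[0])
    visited = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            # Only unvisited blocks with high edge density can start a region.
            if density[r][c] < seed_threshold or (r, c) in visited:
                continue
            region, queue = [], deque([(r, c)])
            visited.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in visited
                            and density[ny][nx] >= grow_threshold):
                        visited.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(sorted(region))
    return regions

# Fraction of edge pixels per block (made-up values for illustration).
density = [
    [0.9, 0.5, 0.1],
    [0.1, 0.4, 0.1],
    [0.1, 0.1, 0.8],
]
regions = grow_seed_regions(density, seed_threshold=0.7, grow_threshold=0.3)
```

Because a grown region is just a set of blocks, it can take any shape, which matches the stated advantage. The fixed block size and thresholds are where the sensitivity to font size and stroke density comes from.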

Top-down method
• Basic idea
  • Based on the run-length smoothing algorithm
  • Analyze horizontal and vertical projection profiles
• Advantage
  • Can detect the boundaries of horizontally aligned text strings quickly and correctly
  • Insensitive to noise
• Disadvantage
  • Cannot handle diagonally aligned text.
  • A single pass of horizontal and vertical projection cannot handle complex layouts.
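
The two building blocks named above can be illustrated as follows. This is a sketch on a binary image stored as lists of 0/1 values; the function names and the `max_gap` and `threshold` parameters are assumptions made for the example.

```python
def rlsa_row(row, max_gap):
    """Run-length smoothing: fill runs of 0s no longer than max_gap that lie between two 1s."""
    out = list(row)
    last_one = None
    for i, value in enumerate(row):
        if value:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out

def horizontal_profile(image):
    """Number of set pixels in each row."""
    return [sum(row) for row in image]

def vertical_profile(image):
    """Number of set pixels in each column."""
    return [sum(column) for column in zip(*image)]

def runs_above(profile, threshold):
    """Inclusive (start, end) index ranges where the profile exceeds the threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

smoothed = rlsa_row([1, 0, 0, 1, 0, 0, 0, 1], max_gap=2)
text_rows = runs_above([0, 3, 4, 0, 0, 2], threshold=0)
```

Smoothing joins the characters of one string into a solid bar, and the profile runs then give its boundaries. A diagonal string spreads across many rows and columns, so its profile has no clean peak, which is the first listed disadvantage.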

Analysis (1)
• A certain contrast against the background
  • Artificial text strings are designed to be read easily
• A certain stroke density
• Text strings always appear horizontally
• Spatial cohesion
  • Characters of the same text string have similar height, orientation, and spacing
• Size constraint
  • Text strings are subject to certain size restrictions
• A text string appears in multiple consecutive frames at similar positions.

Analysis (2)

Single Threshold

Local threshold (1)
• Use a small kernel (red) to scan the whole image.
• In a bigger window (gray) surrounding the kernel, calculate the local threshold from the window's local histogram.
[Figure: a. Window move; b. Local threshold selection (histogram of count versus edge strength, marked 0, MIN, T-local, MAX, with a low half and a high half)]
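
The scanning scheme can be sketched as below. The slide does not state how T-local is derived from the local histogram, so this example assumes the midpoint between the minimum and maximum edge strength in the window. That rule, the function name, and the `kernel` and `margin` parameters are assumptions made for illustration.

```python
def local_threshold_map(edges, kernel, margin):
    """Binarize an edge-strength map with one threshold per kernel position."""
    rows, cols = len(edges), len(edges[0])
    out = [[0] * cols for _ in range(rows)]
    for ky in range(0, rows, kernel):
        for kx in range(0, cols, kernel):
            # Bigger window surrounding the kernel, clipped to the image.
            y0, y1 = max(0, ky - margin), min(rows, ky + kernel + margin)
            x0, x1 = max(0, kx - margin), min(cols, kx + kernel + margin)
            window = [edges[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # ASSUMED rule: midpoint of the local range, not taken from the slides.
            t_local = (min(window) + max(window)) / 2
            # Apply the threshold only to the pixels inside the kernel.
            for y in range(ky, min(rows, ky + kernel)):
                for x in range(kx, min(cols, kx + kernel)):
                    out[y][x] = 1 if edges[y][x] > t_local else 0
    return out

edges = [
    [10, 10, 10, 10],
    [10, 90, 80, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
binary = local_threshold_map(edges, kernel=2, margin=1)
```

Computing the threshold in the larger window while applying it only inside the kernel keeps neighboring thresholds from changing abruptly between kernel positions.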

Local threshold (2)

Text-like area recovery (1)
[Figure: before recovery; after recovery]

Text-like area recovery (2)
[Figure: before recovery; after recovery]

High pass filter

Coarse-to-fine detection
• Use a top-down scheme to detect text-like areas
[Flowchart nodes: Initial: add the whole image to the processing array; take the first region from the array; horizontal projection; vertical projection; Can divide? (Yes / No); add to processing array; add to result array]
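
One way to read the flowchart is as a queue-driven recursive split. The sketch below assumes that a region which can still be divided has its parts returned to the processing array, and that a region which cannot be divided goes to the result array. It also treats any non-empty projection bin as occupied. The function names and these rules are assumptions, not details confirmed by the slides.

```python
from collections import deque

def _runs(profile):
    """Inclusive (start, end) index ranges where the profile is non-zero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

def coarse_to_fine(image):
    """Split a binary image into regions (top, left, bottom, right) by repeated projection."""
    processing = deque([(0, 0, len(image) - 1, len(image[0]) - 1)])
    results = []
    while processing:
        top, left, bottom, right = processing.popleft()
        # Horizontal projection: split the region into row bands.
        h_profile = [sum(image[y][left:right + 1]) for y in range(top, bottom + 1)]
        pieces = []
        for a, b in _runs(h_profile):
            band_top, band_bottom = top + a, top + b
            # Vertical projection inside each band: split into column spans.
            v_profile = [sum(image[y][x] for y in range(band_top, band_bottom + 1))
                         for x in range(left, right + 1)]
            pieces += [(band_top, left + c, band_bottom, left + d) for c, d in _runs(v_profile)]
        if pieces == [(top, left, bottom, right)]:
            results.append(pieces[0])   # cannot divide further
        else:
            processing.extend(pieces)   # divided: revisit each part later
    return results

image = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
regions = coarse_to_fine(image)
```

Re-queueing the parts is what lets the scheme cope with layouts that a single pass of horizontal and vertical projection cannot separate.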

Detect text-like areas
[Figure: panels 1) to 4); b. Coarse vertical projection]

Refinement
• Combine neighboring text areas of similar height
• Use size constraints to remove areas that do not satisfy them
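
Both refinement steps can be sketched on bounding boxes of the form (top, left, bottom, right). The merge rule (shared rows, a horizontal gap of at most `max_gap` pixels, and a height ratio of at least `ratio`) and the specific size limits are assumptions chosen for the example. The slides do not give concrete values.

```python
def _height(box):
    return box[2] - box[0] + 1

def merge_neighbors(boxes, max_gap, ratio):
    """Merge horizontally adjacent boxes whose heights are similar (assumed rule)."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for i, m in enumerate(merged):
            shares_rows = box[0] <= m[2] and box[2] >= m[0]
            close = box[1] - m[3] - 1 <= max_gap
            similar = min(_height(box), _height(m)) / max(_height(box), _height(m)) >= ratio
            if shares_rows and close and similar:
                merged[i] = (min(m[0], box[0]), m[1], max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged

def size_filter(boxes, min_height, max_height, min_width):
    """Keep only boxes whose height and width satisfy the size constraints."""
    kept = []
    for top, left, bottom, right in boxes:
        height, width = bottom - top + 1, right - left + 1
        if min_height <= height <= max_height and width >= min_width:
            kept.append((top, left, bottom, right))
    return kept

candidates = [(10, 0, 19, 30), (11, 33, 20, 60), (10, 200, 12, 203), (0, 100, 79, 180)]
merged = merge_neighbors(candidates, max_gap=5, ratio=0.8)
refined = size_filter(merged, min_height=8, max_height=40, min_width=16)
```

In this example the two word-sized boxes join into one string, while the tiny speck and the oversized block are discarded by the size limits.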

Multi-frame analysis
• Text region matching
  • Find all the regions corresponding to the same text
• Text region enhancement
  • Enhance the text image quality by multi-frame integration
• Repetitive text elimination
  • Record the text only at its first appearance.
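
The three steps can be illustrated with a small tracker. The matching criterion (intersection over union of at least `min_iou` between consecutive frames) and the integration method (pixel-wise averaging of aligned patches) are assumptions. The slides name the steps but not how they are carried out.

```python
def iou(a, b):
    """Intersection over union of two boxes (top, left, bottom, right), inclusive coordinates."""
    def area(r):
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    height = min(a[2], b[2]) - max(a[0], b[0]) + 1
    width = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, height) * max(0, width)
    return inter / (area(a) + area(b) - inter)

def track_text(frames, min_iou=0.5):
    """Match regions across consecutive frames; each track records its first frame only once."""
    tracks = []
    for index, boxes in enumerate(frames):
        for box in boxes:
            for track in tracks:
                if track["last_frame"] == index - 1 and iou(track["box"], box) >= min_iou:
                    track["box"], track["last_frame"] = box, index
                    break
            else:
                # No match in the previous frame: this text appears for the first time.
                tracks.append({"first_frame": index, "last_frame": index, "box": box})
    return tracks

def integrate(patches):
    """ASSUMED enhancement: pixel-wise mean of equally sized, aligned patches."""
    count = len(patches)
    rows, cols = len(patches[0]), len(patches[0][0])
    return [[sum(p[y][x] for p in patches) / count for x in range(cols)] for y in range(rows)]

frames = [
    [(10, 10, 20, 60)],
    [(10, 11, 20, 61)],
    [(10, 11, 20, 61), (50, 5, 60, 40)],
]
tracks = track_text(frames)
enhanced = integrate([[[0, 10]], [[10, 30]]])
```

Each track corresponds to one text string, so running recognition once per track rather than once per frame gives the repetitive text elimination described above.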

Thank you! End
