Text Detection in Video Min Cai 2002.3.13
Background
• Video OCR: text detection, extraction, and recognition
• Detection target: artificial text
• Text detection:
  • Detect the text region in a single frame
  • Refine the region by combining consecutive frames
Connected-component-based methods
• Basic idea
  • Treat text as a region of uniform color (color level) and classify each pixel as text or non-text according to its color value.
  • Combine connected text pixels into connected components.
  • Group collinear connected components into a text string.
• Advantage
  • Can detect text of arbitrary orientation, provided the text has a similar color throughout and sits on a simple background.
• Disadvantage
  • Sensitive to color variance
  • Lossy video compression introduces color bleeding
  • Struggles with complex backgrounds
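The first two steps of the basic idea can be sketched as follows. This is a minimal illustration, not the method of any particular paper: the function names, the grayscale `text_level`/`tolerance` test, and the use of 4-connectivity are all assumptions made for the example. Grouping collinear components into strings is left out.

```python
from collections import deque


def classify_pixels(gray, text_level, tolerance):
    """Mark a pixel as text when its level is within `tolerance` of `text_level`.

    `text_level` and `tolerance` are hypothetical parameters for this sketch.
    """
    return [[abs(v - text_level) <= tolerance for v in row] for row in gray]


def label_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Because the classification depends on a single color level, small color shifts (for example from compression) change the mask directly, which is the sensitivity listed under Disadvantage.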
Texture segmentation method
• Basic idea
  • Treat text as a type of texture
  • Use texture segmentation algorithms to detect text
    • Gabor filter
    • Gaussian derivatives
• Advantage
  • Can efficiently segment text areas and graphic areas on a simple background. It is commonly used in document analysis.
• Disadvantage
  • Time-consuming
  • Does not handle text embedded in varied backgrounds well.
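As an illustration of the kind of filter named above, here is a minimal sketch of the real (cosine) part of a Gabor kernel and its response at one pixel. The parameter names, the isotropic Gaussian envelope, and the zero padding at the image border are assumptions for this example; a texture-based detector would typically use a bank of such kernels at several orientations and wavelengths.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength):
    """Real part of a Gabor filter as a size x size nested list (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(image, kernel, row, col):
    """Correlate the kernel with the image, centered at (row, col).

    Pixels outside the image are treated as 0 (an assumption of this sketch).
    """
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = row + dy, col + dx
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += kernel[dy + half][dx + half] * image[y][x]
    return total
```

Evaluating `filter_response` at every pixel for every kernel in a bank is what makes this family of methods time-consuming.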
Bottom-up method
• Basic idea
  • A seed region is defined as a small region with high edge density.
  • Grow each seed region into successively larger components until all seed regions in the image have been reached.
• Advantage
  • It is a generic method for detecting homogeneous objects of various shapes: not only rectangular objects but other shapes as well.
• Disadvantage
  • Sensitive to noise.
  • Cannot handle a large range of font sizes.
  • Sensitive to stroke density (which differs between languages).
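One way to picture the seed-and-grow idea is the sketch below. It assumes a binary edge map, fixed-size square blocks as candidate seeds, and growth by merging the bounding boxes of seeds that overlap or touch (including at corners). The block size, the density threshold, and the bounding-box representation are all hypothetical choices; merging into boxes also gives up the arbitrary-shape property that a true region-growing implementation would keep.

```python
def seed_blocks(edge_mask, block=4, min_density=0.5):
    """Return boxes (top, left, bottom, right) of blocks whose edge density is high."""
    rows, cols = len(edge_mask), len(edge_mask[0])
    seeds = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1, c1 = min(rows, r0 + block), min(cols, c0 + block)
            count = sum(edge_mask[r][c] for r in range(r0, r1) for c in range(c0, c1))
            if count / ((r1 - r0) * (c1 - c0)) >= min_density:
                seeds.append((r0, c0, r1 - 1, c1 - 1))
    return seeds


def touches(a, b):
    """True if two inclusive boxes overlap or are directly adjacent."""
    return (a[0] <= b[2] + 1 and b[0] <= a[2] + 1
            and a[1] <= b[3] + 1 and b[1] <= a[3] + 1)


def grow(seeds):
    """Repeatedly merge touching regions until no pair can be merged."""
    regions = list(seeds)
    changed = True
    while changed:
        changed = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if touches(regions[i], regions[j]):
                    a, b = regions[i], regions[j]
                    regions[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                  max(a[2], b[2]), max(a[3], b[3]))
                    del regions[j]
                    changed = True
                    break
            if changed:
                break
    return regions
```

The fixed `block` size hints at why font size matters: strokes of very large or very small text produce a different edge density per block than the threshold was tuned for.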
Top-down method
• Basic idea
  • Based on the run-length smoothing algorithm
  • Analyze horizontal and vertical projection profiles
• Advantage
  • Can detect the boundary of a horizontally aligned text string quickly and correctly
  • Insensitive to noise
• Disadvantage
  • Cannot handle diagonally aligned text.
  • A single pass of horizontal and vertical projection cannot handle complex layouts.
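The two building blocks named above can be sketched as follows. This is a minimal version, assuming a binary mask of text-like pixels; `max_gap` and `threshold` are hypothetical tuning parameters, and only the horizontal direction of run-length smoothing is shown.

```python
def rlsa_row(bits, max_gap):
    """Horizontal run-length smoothing of one row.

    Fills every run of 0s that lies between two 1s and is at most `max_gap` long.
    """
    out = list(bits)
    last_one = None
    for i, b in enumerate(bits):
        if b:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


def horizontal_profile(mask):
    """Number of set pixels in each row."""
    return [sum(row) for row in mask]


def vertical_profile(mask):
    """Number of set pixels in each column."""
    return [sum(col) for col in zip(*mask)]


def runs_above(profile, threshold):
    """Inclusive (start, end) index ranges where the profile exceeds the threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs
```

Runs in the horizontal profile give candidate text lines, and runs in the vertical profile of each line give its left and right boundaries. A diagonal string smears across both profiles, which is why it is not separated cleanly.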
Analysis (1)
• A certain contrast against the background
  • Artificial text strings are designed to be read easily
• A certain stroke density
• Text strings are assumed to appear horizontally
• Spatial cohesion
  • Characters of the same text string have similar height, orientation, and spacing
• Size constraint
  • Text strings fall within a certain size range
• A text string appears in multiple consecutive frames at a similar position.
Local threshold (1)
• Use a small kernel (red) to scan the whole image.
• In a larger window (gray) surrounding the kernel, calculate the local threshold from the window's local histogram.
[Figure a: Window move]
[Figure b: Local threshold selection. Histogram of count versus edge strength, labeled 0, MIN, T-local, MAX, Low half, High half]
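The scan described above might look like the sketch below. The slide does not state how T-local is derived from the histogram, so the rule used here (the midpoint between the minimum and maximum edge strength in the window) is only a placeholder assumption, as are the parameter names `kernel`, `margin`, and `min_range`.

```python
def local_threshold_map(edge, kernel=2, margin=2, min_range=10):
    """Binarize an edge-strength map with a threshold computed per kernel position.

    edge:      2D list of edge strengths
    kernel:    side length of the small scanning kernel
    margin:    how far the surrounding window extends beyond the kernel
    min_range: windows whose MAX - MIN is smaller than this are treated as flat
    """
    rows, cols = len(edge), len(edge[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, kernel):
        for c0 in range(0, cols, kernel):
            # Larger window around the kernel, clipped to the image.
            wr0, wr1 = max(0, r0 - margin), min(rows, r0 + kernel + margin)
            wc0, wc1 = max(0, c0 - margin), min(cols, c0 + kernel + margin)
            window = [edge[r][c] for r in range(wr0, wr1) for c in range(wc0, wc1)]
            lo, hi = min(window), max(window)
            if hi - lo < min_range:
                continue  # flat window: nothing is marked
            t_local = (lo + hi) / 2.0  # placeholder rule, not taken from the slides
            # Only the pixels under the kernel are classified with this threshold.
            for r in range(r0, min(rows, r0 + kernel)):
                for c in range(c0, min(cols, c0 + kernel)):
                    if edge[r][c] > t_local:
                        out[r][c] = 1
    return out
```

Computing the threshold from the surrounding window rather than from the whole image lets low-contrast text in one part of the frame survive even when another part contains much stronger edges.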
Text-like area recovery (1)
[Figure: before recovery and after recovery]
Text-like area recovery (2)
[Figure: before recovery and after recovery]
Coarse-to-fine detection
• Use a top-down scheme to detect text-like areas
[Flowchart: Initial step: add the whole image to the processing array. Loop: take the first region from the array, apply horizontal projection, apply vertical projection, then test "Can divide?" with Yes/No branches leading to "Add to processing array" and "Add to result array"]
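The flow in this slide can be sketched as a work-queue loop. This is a minimal sketch under stated assumptions: a region that can be divided has its sub-regions put back on the processing array, a region that cannot be divided goes to the result array, and "can divide" means the projections yield something other than the region itself. The helper names and the `min_gap` parameter are hypothetical.

```python
from collections import deque


def split_runs(profile, min_gap=1):
    """Inclusive (start, end) runs of non-zero entries.

    Runs separated by fewer than `min_gap` zero entries are merged into one.
    """
    runs = []
    for i, v in enumerate(profile):
        if v <= 0:
            continue
        if runs and i - runs[-1][1] - 1 < min_gap:
            runs[-1][1] = i  # gap too small to split: extend the previous run
        else:
            runs.append([i, i])
    return [tuple(r) for r in runs]


def coarse_to_fine(mask, min_gap=1):
    """Return boxes (top, left, bottom, right) that cannot be divided further."""
    rows, cols = len(mask), len(mask[0])
    processing = deque([(0, 0, rows - 1, cols - 1)])  # start with the whole image
    result = []
    while processing:
        top, left, bottom, right = processing.popleft()
        # Horizontal projection: set pixels per row inside the region.
        row_profile = [sum(mask[r][left:right + 1]) for r in range(top, bottom + 1)]
        parts = []
        for r0, r1 in split_runs(row_profile, min_gap):
            # Vertical projection restricted to this band of rows.
            col_profile = [sum(mask[r][c] for r in range(top + r0, top + r1 + 1))
                           for c in range(left, right + 1)]
            for c0, c1 in split_runs(col_profile, min_gap):
                parts.append((top + r0, left + c0, top + r1, left + c1))
        if parts == [(top, left, bottom, right)]:
            result.append(parts[0])   # cannot divide: final text-like area
        else:
            processing.extend(parts)  # divided (or empty): process the pieces
    return result
```

Re-queuing the pieces is what lets repeated projection passes resolve layouts that a single horizontal and vertical pass would leave merged.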
Detect text-like areas
[Figure: panels 1) to 4); panel b: coarse vertical projection]
Refinement
• Combine neighboring text areas of similar height
• Use size constraints to remove areas that do not satisfy them
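Both refinement steps can be sketched on bounding boxes. The thresholds below (`max_gap`, `min_height_ratio`, `min_height`, `min_width`) are hypothetical values chosen for the example, and the single left-to-right merging pass is a simplification.

```python
def height(box):
    return box[2] - box[0] + 1


def width(box):
    return box[3] - box[1] + 1


def similar_neighbors(a, b, max_gap, min_height_ratio):
    """True if two boxes (top, left, bottom, right) look like parts of one string."""
    ha, hb = height(a), height(b)
    if min(ha, hb) / max(ha, hb) < min_height_ratio:
        return False  # heights differ too much
    shared_rows = min(a[2], b[2]) - max(a[0], b[0]) + 1
    if shared_rows <= 0:
        return False  # not on the same text line
    gap = max(a[1], b[1]) - min(a[3], b[3]) - 1  # columns between the boxes
    return gap <= max_gap


def refine(boxes, max_gap=10, min_height_ratio=0.8, min_height=8, min_width=16):
    """Merge neighboring boxes of similar height, then drop boxes that are too small."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):  # left to right
        for i, m in enumerate(merged):
            if similar_neighbors(m, box, max_gap, min_height_ratio):
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return [b for b in merged if height(b) >= min_height and width(b) >= min_width]
```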
Multi-frame analysis
• Text region matching
  • Find all the regions that correspond to the same text
• Text region enhancement
  • Enhance the text image quality by multi-frame integration
• Repetitive text elimination
  • Record each text only at its first appearance
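The three steps can be sketched as follows. The slides do not say how regions are matched or integrated, so this example assumes matching by bounding-box overlap (intersection over union) between consecutive frames and integration by per-pixel averaging; `min_iou` and the track dictionary layout are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two inclusive boxes (top, left, bottom, right)."""
    h = min(a[2], b[2]) - max(a[0], b[0]) + 1
    w = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if h <= 0 or w <= 0:
        return 0.0
    inter = h * w
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def track_text(frames_boxes, min_iou=0.5):
    """Link boxes across consecutive frames.

    frames_boxes: one list of boxes per frame.
    Each returned track records the frame of first appearance, so a text is
    reported once rather than in every frame where it is visible.
    """
    tracks = []
    for t, boxes in enumerate(frames_boxes):
        for box in boxes:
            for track in tracks:
                # Only tracks seen in the previous frame can be continued.
                if track["last_frame"] == t - 1 and iou(track["box"], box) >= min_iou:
                    track["last_frame"], track["box"] = t, box
                    break
            else:
                tracks.append({"first_frame": t, "last_frame": t, "box": box})
    return tracks


def average_patches(patches):
    """Per-pixel mean of equally sized grayscale patches of the same text."""
    n = len(patches)
    rows, cols = len(patches[0]), len(patches[0][0])
    return [[sum(p[r][c] for p in patches) / n for c in range(cols)]
            for r in range(rows)]
```

In this sketch a text that disappears for a frame and then returns starts a new track; whether that should count as a new appearance is a design decision the slides leave open.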
Thank you! End