Recognition of Video Text Through Temporal Integration
Trung Quy Phan, Palaiahnakote Shivakumara, Tong Lu and Chew Lim Tan
Introduction • Text extraction from video frames for video search and retrieval
Introduction • Low resolution • Complex background • Unconstrained appearance • Temporal information
Problem • Input: word bounding box in a reference frame, frame ID • Output: binarized image • Scope: static texts, linearly moving texts
Approach • Tracking • Alignment • Integration • Refinement
1. Tracking • Find the text frame span [tstart, tend] • Find the bounding box in each frame • (Figure: a text instance spanning frames tstart … tend, with reference frame tref)
1. Tracking • Text descriptors • Stroke Width Transform-SIFT
1. Tracking • For t = tref + 1, tref + 2, … • Initialize search area • If matchRatio ≥ 0.1, estimate the new bounding box (BB) • Otherwise, tend has been found
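The forward tracking loop on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not say how matchRatio is computed, so the matcher is passed in as a hypothetical callback (`match_fn`) that returns a ratio and a new box. The names `track_forward`, `Box` and `fake_match` are likewise assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def track_forward(
    t_ref: int,
    ref_box: Box,
    last_frame: int,
    match_fn: Callable[[int, Box], Tuple[float, Optional[Box]]],
    min_match_ratio: float = 0.1,  # threshold quoted on the slide
) -> Tuple[int, Dict[int, Box]]:
    """Follow a word forward from the reference frame until matching fails."""
    boxes = {t_ref: ref_box}
    t_end = t_ref
    for t in range(t_ref + 1, last_frame + 1):
        # The matcher is expected to set up its search area around the
        # most recent box and report (matchRatio, estimated new box).
        ratio, new_box = match_fn(t, boxes[t_end])
        if ratio < min_match_ratio or new_box is None:
            break  # the previous frame was the last one containing the text
        boxes[t] = new_box
        t_end = t
    return t_end, boxes


# Hypothetical matcher: the text drifts 2 px left per frame and
# disappears after frame 13.
def fake_match(t: int, prev_box: Box) -> Tuple[float, Optional[Box]]:
    if t > 13:
        return 0.02, None
    x, y, w, h = prev_box
    return 0.6, (x - 2, y, w, h)


t_end, boxes = track_forward(10, (100, 50, 80, 20), 30, fake_match)
print(t_end, boxes[t_end])
```

The slide only shows the forward direction; finding tstart would presumably run the same loop backward from tref, but that is not stated in the source.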
2. Alignment • Align at pixel level for better integration • Slide the reference text mask over the individual masks to find the optimal alignment
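The sliding step can be illustrated with a brute-force search over small shifts. This is a sketch under stated assumptions: the slides do not give the alignment criterion, so the example scores each shift by the number of overlapping text pixels, and the search radius `max_shift` is a made-up parameter.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def overlap(ref: Mask, mask: Mask, dy: int, dx: int) -> int:
    """Count text pixels shared by `mask` and `ref` shifted by (dy, dx)."""
    h, w = len(mask), len(mask[0])
    total = 0
    for y, row in enumerate(ref):
        for x, v in enumerate(row):
            yy, xx = y + dy, x + dx
            if v and 0 <= yy < h and 0 <= xx < w and mask[yy][xx]:
                total += 1
    return total


def best_shift(ref: Mask, mask: Mask, max_shift: int = 2) -> Tuple[int, int]:
    """Try every shift within the radius and keep the highest-overlap one."""
    shifts = [
        (dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    return max(shifts, key=lambda s: overlap(ref, mask, *s))


ref = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# The same shape moved one row down and two columns right.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
shift = best_shift(ref, mask)
print(shift)  # (1, 2)
```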
3. Integration • Text probability map
3. Integration • Initial binarization
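One simple way to picture the two integration slides is shown below. The slides do not define the probability map or the binarization rule, so both are assumptions here: the probability of a pixel is taken as the fraction of aligned masks that mark it as text, and the initial binarization is a fixed threshold at 0.5.

```python
from typing import List

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def probability_map(masks: List[Mask]) -> List[List[float]]:
    """Per-pixel fraction of aligned masks that mark the pixel as text."""
    n = len(masks)
    h, w = len(masks[0]), len(masks[0][0])
    return [[sum(m[y][x] for m in masks) / n for x in range(w)] for y in range(h)]


def binarize(prob: List[List[float]], threshold: float = 0.5) -> Mask:
    """Assumed rule: keep pixels that are text in at least half the frames."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


# Three already-aligned masks of the same word; the second one is noisy.
masks = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 1, 0]],
]
prob = probability_map(masks)
binary = binarize(prob)
print(binary)  # [[1, 1, 0], [0, 1, 0]]
```

In this toy case the noise in the second mask is voted out, which is the intuition behind using several frames instead of one.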
4. Refinement • SWT: rounded strokes • Intensity values: preserve sharp edges and holes, suppress background pixels
Experiments • Moving text dataset (English + German): 250 words, 1,545 characters; motion from bottom to top, right to left and left to right • Static text dataset (English): 212 words, 1,389 characters
Experiments • Methods for comparison: Niblack (single frame), Min/max (multiple frames), Average-Min/max (multiple frames), Ours (single frame), Ours (multiple frames)
Results on Moving Texts • Character recognition rate (CRR) • Word recognition rate (WRR)
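The two metrics can be computed along the following lines. The slides do not give exact definitions, so this sketch assumes WRR is the fraction of words recognized exactly and CRR is the number of correctly recognized characters (measured by the longest common subsequence with the ground truth) divided by the number of ground-truth characters. The word lists are invented for illustration and are not from the paper's datasets.

```python
from typing import List


def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def recognition_rates(truth: List[str], ocr: List[str]):
    """Return (CRR, WRR) under the assumed definitions above."""
    total_chars = sum(len(t) for t in truth)
    correct_chars = sum(lcs_len(t, o) for t, o in zip(truth, ocr))
    correct_words = sum(t == o for t, o in zip(truth, ocr))
    return correct_chars / total_chars, correct_words / len(truth)


truth = ["video", "text", "news"]  # made-up ground truth
ocr = ["video", "test", "new"]     # made-up OCR output
crr, wrr = recognition_rates(truth, ocr)
print(f"CRR={crr:.3f} WRR={wrr:.3f}")
```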
Results on Static Texts • Multiple-frame: ~20% improvement over single-frame
Summary • A variation of SIFT for robust tracking • Integration based on word masks • Future work: handle complex text movements