Superresolution of Texts from Nonideal Video

Superresolution of Texts from Nonideal Video • Xin Li, Lane Dept. of CSEE, West Virginia University, Morgantown, WV 26506-6109 • This work is partially supported by NASA WV EPSCoR Award 2005-2006

Outline • Introduction • What is SR? Why SR? How to achieve SR? • A general framework for SR: registration + restoration • Understand the boundary of formulating SR as an inverse problem • SR of texts from nonideal video • Problem statement: why texts and nonideal video? • Analyze error accumulation in multiframe registration • Address the issue of quality/PSF consistency in restoration • Experimental Results • Conclusions

Image Resolution • (Figure: sensor array with width W and height H; after Gonzalez, “Digital Image Processing”) • Chip size → Field-Of-View: H×W • Pixel size → Sampling Distance

Why Higher Resolution? • Improved objective fidelity • Natural scenes are seldom band-limited • Higher resolution implies smaller representation errors • Improved subjective quality • Attention enhances spatial resolution • Spatial resolution enhances attention? • Improved mensuration/recognition • Law enforcement, forensics/biometrics: face recognition grand challenge (FRGC), iris recognition, vehicle license plate recognition

Towards Gigapixel: Artistic Approach • (Figure labels: Mega-pel, Giga-pel) • Photographers and artists have manually or semi-automatically stitched hundreds of mega-pel pictures together to demonstrate what a giga-pel picture looks like: the power of pixels • http://triton.tpd.tno.nl/gigazoom/Delft2.htm

Scientific Solutions • Sensor-based • Reduce pixel size: limit of 0.40 μm² for a 0.35 μm CMOS process • Increase chip size: ineffective due to increased capacitance (which slows the charge transfer rate) • Computational (super-resolution) • Exploit the tradeoff between space and time: obtain an HR image from multiple LR copies • Physical principles of imaging play the fundamental role in defining the relationship between LR and HR • Hybrid: the convergence of the camera and the computer • Computational cameras: catadioptric camera, jitter camera (Ben-Ezra, Zomet and Nayar)

SR: A General Framework • S.C. Park et al., “Super-resolution image reconstruction: a technical overview”, IEEE Signal Processing Magazine, pp. 21-36, May 2003 • SR can be formulated as an inverse problem, assuming that a mathematical model linking LR to HR images is known
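A commonly used form of such an observation model is sketched below. The symbols are labels chosen here for illustration, not taken from the slide: x is the HR image as a vector, y_k is the k-th of p LR images, M_k is the warp (motion), B_k the blur (PSF), D the down-sampling operator, and n_k additive noise.

```latex
y_k = D\,B_k\,M_k\,x + n_k, \qquad k = 1, \dots, p
```

Under this model, SR amounts to estimating x from the set of y_k, which is only as reliable as the assumed D, B_k and M_k.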

SR: At the Intersection of SP and CV • Registration problem • Translational models • Subpixel-accuracy phase correlation (Foroosh, Zerubia and Berthod’1996) • Subspace methods in the frequency domain (Vandewalle, Sbaiz, Süsstrunk and Vetterli) • Projective models or planar homography (Capel and Zisserman’2003) • Images of a planar surface under arbitrary camera motion or images of a scene under fixed camera • Restoration problem • Model-based: regularized deblurring, robust SR (Farsiu, Elad and Milanfar’2004) • Learning-based: exemplar-based SR (Freeman, Jones and Pasztor’2002), video epitome (Cheung, Frey and Jojic’2005)
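As an illustration of the translational registration model, here is a minimal sketch of basic phase correlation. It recovers integer-pixel circular shifts only; the subpixel refinement of the cited work is not reproduced. The function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate the integer (dy, dx) circular shift that maps ref onto moved."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moved)
    # Normalized cross-power spectrum: keep the phase, discard the magnitude.
    cross = F_mov * np.conj(F_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)
    # The inverse transform peaks at the displacement between the two images.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices past the midpoint correspond to negative shifts.
    shifts = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shifts)

# Usage: shift a random test image by a known amount and recover the shift.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
shifted = np.roll(image, (3, -5), axis=(0, 1))
estimated = phase_correlation_shift(image, shifted)
```

Real frames are not circularly shifted copies of one another, so in practice windowing and a subpixel peak fit would be needed on top of this.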

Understand the Boundary of SR as an Inverse Problem • Limited modeling capability • Fixed enhancement ratio specified by the down-sampling operation • We formulate scalable (progressive) SR: as more data become available, higher resolution can be achieved • Inevitable approximation when warping gets complex • We advocate a nonuniform-interpolation-based forward approach in the case of arbitrary camera motion • Sensor PSF is often unknown and time-varying • We propose to adaptively select a subset of LR images

Outline • Introduction • What is SR? Why SR? How to achieve SR? • A general framework for SR: registration + restoration • Understand the boundary of formulating SR as an inverse problem • SR of texts from nonideal video • Problem statement: why texts and nonideal video? • Analysis of error accumulation in multiframe registration • Issue of phase/PSF consistency in restoration : NOT all LR images are useful • Experimental Results • Conclusions

SR-of-Texts from Nonideal Video • (Figure: SR producing an HR image of a license plate) • Problem Statement: Given a video segment containing text that is illegible because of limited resolution, how can we produce an HR image in which the text becomes clearly readable (by a human)?

Defining the Boundary of the Problem • Why texts? • Texts represent an important class of visual information (e.g., law enforcement applications) • SR results are relatively easy for human observers to assess • Texts are often printed on a planar surface, which facilitates the registration • What do we mean by nonideal video? • Uncontrolled real-world acquisition conditions: handheld camera (arbitrary camera motion), unfavorable illumination, unknown PSF, inevitable compression artifacts, and so on

Our Practical Approach • Consistency-guided Preprocessing: not all LR images are used in our SR scheme • Homography-based Registration: accuracy is guaranteed by the planar surface assumption • Nonuniform Interpolation: search for an appropriate magnifying ratio and phase • Diffusion-aided Blind Deconvolution: tailored for bimodal textual images

LR Image Consistency • Quality consistency • PSF consistency • Human vision helps the selection of consistent LR images

Homography-based Multiframe Registration • (Figure: two registration schemes linking image 1, image 2, …, image K by homography matrices: sequential or parallel) • Mosaicing: slightly-overlapped FOV → sequential • Superresolution: severely-overlapped FOV → parallel
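The sequential scheme can be sketched as a product of pairwise homographies, whereas the parallel scheme uses one directly estimated matrix per frame. The code below is a minimal sketch under an assumed convention (not stated on the slide): `pairwise[k]` maps coordinates of frame k+1 into frame k, and frame 0 is the reference. The names `apply_homography`, `chain_to_reference` and `translation` are hypothetical.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def chain_to_reference(pairwise):
    """Sequential scheme: compose pairwise homographies back to frame 0.

    pairwise[k] is assumed to map frame k+1 into frame k, so the returned
    list holds the homographies mapping frames 1, 2, ... into frame 0.
    """
    chained = []
    H = np.eye(3)
    for H_step in pairwise:
        H = H @ H_step                  # every link's error is carried along
        chained.append(H / H[2, 2])     # fix the scale (assumes H[2, 2] != 0)
    return chained

def translation(tx, ty):
    """Pure-translation homography, used here only as a toy example."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

# Usage: frame 2 -> frame 1 shifts by (0, 2), frame 1 -> frame 0 by (1, 0).
to_ref = chain_to_reference([translation(1, 0), translation(0, 2)])
origin_in_ref = apply_homography(to_ref[1], np.array([[0.0, 0.0]]))
```

In the parallel scheme each frame's homography to the reference would instead be estimated directly, so no product of estimates is involved.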

Nonuniform Interpolation • (Figure labels: distance of HR lattice, phase of HR lattice) • Data grid: fused data points from registered LR images • Lattice: targeted data points at HR • Target HR lattice: minimize d(data grid, lattice) over two parameters: distance and phase
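A one-dimensional sketch of this search follows. The slide does not define the distance d, so the cost used here (mean distance from each lattice node to its nearest fused sample) is an assumption, as are the exhaustive grid search and the names `lattice_cost` and `best_lattice`.

```python
import numpy as np

def lattice_cost(samples, spacing, phase, extent):
    """Mean distance from each lattice node to its nearest data sample (assumed d)."""
    nodes = np.arange(phase, extent, spacing)
    dist = np.abs(nodes[:, None] - samples[None, :]).min(axis=1)
    return dist.mean()

def best_lattice(samples, spacings, n_phases, extent):
    """Exhaustive search over candidate spacings and phases; returns (cost, spacing, phase)."""
    best = None
    for spacing in spacings:
        # Phases beyond one spacing only repeat the same lattice.
        for phase in np.linspace(0.0, spacing, n_phases, endpoint=False):
            cost = lattice_cost(samples, spacing, phase, extent)
            if best is None or cost < best[0]:
                best = (cost, spacing, phase)
    return best

# Usage: fused samples that happen to sit on a lattice with spacing 0.5 and phase 0.25.
samples = np.arange(0.25, 8.0, 0.5)
cost, spacing, phase = best_lattice(samples, spacings=[0.4, 0.5], n_phases=4, extent=8.0)
```

A two-dimensional version would search a spacing and a phase per axis; the idea of aligning the target lattice with where the fused data actually fall is the same.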

Experimental Results (I): SR Comparison on Benchmark Data • Input: 20 LR images • (Figure panels: UCSC-SR vs. ours, before and after deblurring) • Thanks to Prof. Milanfar for providing us the UCSC-SR software

Experimental Results (II): SR Results Comparison on Nonideal Video • Input: 4 LR images • (Figure panels: UCSC-SR vs. ours)

Experimental Results (II): SR Results Comparison on Nonideal Video • Input: 4 LR images • (Figure panels: ours vs. UCSC-SR, after deblurring)

Experimental Results (III): Impact of Error Accumulation • (Figure panels: parallel vs. sequential registration for K=4 and K=8) • Error accumulation in sequential registration degrades image quality when K is large
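The effect can be illustrated with a toy simulation; this is not the experiment behind the slide. It assumes a 1-D translation model in which every pairwise estimate carries an independent Gaussian error of standard deviation `sigma`, and that a direct (parallel) estimate has the same error as a single pairwise estimate. The function name and all parameter values are hypothetical.

```python
import numpy as np

def registration_rms(K, sigma=0.1, trials=20000, seed=0):
    """RMS registration error of frame K relative to the reference frame.

    Returns (sequential_rms, parallel_rms) under the toy assumptions above.
    """
    rng = np.random.default_rng(seed)
    pair_err = rng.normal(0.0, sigma, size=(trials, K - 1))
    sequential = pair_err.sum(axis=1)   # K-1 chained estimates: errors add up
    parallel = pair_err[:, -1]          # one direct estimate: a single error term
    rms = lambda e: float(np.sqrt(np.mean(e ** 2)))
    return rms(sequential), rms(parallel)

# Usage: compare the two schemes for the frame counts shown on the slide.
seq4, par4 = registration_rms(4)
seq8, par8 = registration_rms(8)
```

Under these assumptions the sequential error grows roughly as sigma times the square root of K-1, while the parallel error stays near sigma, which is consistent with the degradation observed for large K.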

Conclusions and Perspectives • SR of texts from nonideal video • A class of SR problems whose boundary can be well defined • An example supporting a practical, forward approach towards SR • To have a better understanding of SR techniques • We need to look at the problem from a perceptual perspective • New applications such as video compression, distributed coding, iris recognition, and biomedical imaging will help us define the boundary of SR • Spatial vs. temporal SR: fundamental space-time tradeoff
