Image, Texture, Video & Structure Completion

Nikos Paragios (http://cermics.enpc.fr/~paragios), MAS, Ecole Centrale de Paris, Paris, France. Joint work with: Cedric Allene (ESIEE/CNRS/CERTIS)

Outline • Problem Statement • Literature Review (inpainting in the form of an interpolation problem) • Completion in the form of a labeling problem using MRFs • On the selection of candidate patches and on their positions • Optimization of the Cost function • (Brief) Sketch of Extensions to Video and Structure • Discussion

Problem Statement • Inpainting consists of modifying a partially destroyed image towards its original form • In a way that makes the process undetectable for an observer who has not seen the original image

Inpainting by the Propagation of Information • According to museum conservators, an automated inpainting process should satisfy the following conditions: • The global picture determines how to fill in the gap • The structure of the area surrounding the missing part is continued into it, through the prolongation of the contour lines arriving at the boundary of the missing part • The different regions within the missing part are filled in with colors that match those found at the boundary along the contour lines used to fill in the information • Small details that are not part of the structure (texture) are added once the filling-in procedure has been completed.

Image Inpainting through PDEs • Let us consider an image $I_0$ with a missing part $\Omega$ • Inpainting consists of creating a sequence of images $I(x, y, n)$ such that $I(x, y, 0) = I_0(x, y)$ and $\lim_{n \to \infty} I(x, y, n) = I_R(x, y)$, the restored image • One can consider a general form of this algorithm and write an evolution equation of the following nature: $I^{n+1}(x, y) = I^n(x, y) + \Delta t \, I_t^n(x, y)$ for $(x, y) \in \Omega$ • Let us consider the information to be propagated and denote such information with $L$, as well as the propagation direction $\vec{N}$ • Then a quite simple method to perform such propagation is through the following PDE: $I_t = \nabla L \cdot \vec{N}$

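Image Inpainting • Taking the image Laplacian as the propagated information, $L = \Delta I$, and the isophote direction as the propagation direction, $\vec{N} = \nabla^{\perp} I = (-I_y, I_x)$, leads to the following third-order PDE: $I_t = \nabla(\Delta I) \cdot \nabla^{\perp} I$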
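The third-order transport equation above can be illustrated with a single explicit time step. The following is only a minimal sketch, assuming central finite differences on a grayscale image stored as nested lists. The function name `transport_step`, the time step `dt`, and the boolean `mask` convention (True inside the missing region) are assumptions made for illustration. A practical scheme would normally add upwind differencing and interleaved diffusion steps, which are omitted here.

```python
def transport_step(img, mask, dt=0.05):
    """One explicit Euler step of I_t = grad(Laplacian I) . perp(grad I).

    img  : 2D list of floats (grayscale intensities)
    mask : 2D list of bools, True where the pixel belongs to the missing region
    Only masked pixels at least two pixels away from the image border are updated.
    """
    h, w = len(img), len(img[0])

    # Discrete 5-point Laplacian, left at zero on the one-pixel image border.
    lap = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1] - 4.0 * img[y][x])

    out = [row[:] for row in img]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            if not mask[y][x]:
                continue
            # Central differences of the Laplacian (the propagated information).
            lx = (lap[y][x + 1] - lap[y][x - 1]) / 2.0
            ly = (lap[y + 1][x] - lap[y - 1][x]) / 2.0
            # Central differences of the image; perp(grad I) = (-I_y, I_x).
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            out[y][x] = img[y][x] + dt * (lx * (-iy) + ly * ix)
    return out
```

A linear intensity ramp has zero Laplacian, so it is a steady state of this update: smooth content with nothing to transport is left unchanged.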

Variational Methods for Filling-In (Elastica-based Model) • Consider an image $I$; then one can determine the image from its upper (or lower) level sets $X_\lambda = \{x : I(x) \ge \lambda\}$ • Through the following reconstruction process: $I(x) = \sup\{\lambda : x \in X_\lambda\}$ • That consists of separating the image into iso-intensity lines (or level sets) • Euler's elastica model consists of defining a cost functional that, given two T-junction points $p_0$ and $p_1$ and their tangents $\tau_0$, $\tau_1$, seeks a smooth curve $C$ between the two points according to the following cost function: $E(C) = \int_C (\alpha + \beta \kappa^2) \, ds$, with $\kappa$ the curvature of $C$, $s$ its arc length, and $\alpha, \beta > 0$ • Where the minimum is taken over all curves joining the two points with the prescribed tangents
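To make the elastica cost concrete, here is a small sketch that evaluates a discrete version of $\int_C (\alpha + \beta \kappa^2) \, ds$ on a polyline. The discretization is an assumption chosen for illustration: curvature at an interior vertex is approximated by the turning angle divided by the mean length of the two adjacent segments. The function name `elastica_energy` is hypothetical. Consecutive points are assumed to be distinct.

```python
import math

def elastica_energy(points, alpha=1.0, beta=1.0):
    """Discrete elastica energy of a polyline given as a list of (x, y) points.

    Length term   : alpha * total length
    Bending term  : beta * sum(theta_i^2 / ds_i), where theta_i is the turning
                    angle at interior vertex i and ds_i is the mean length of
                    the two adjacent segments (so kappa_i ~ theta_i / ds_i).
    """
    segments = [(x1 - x0, y1 - y0)
                for (x0, y0), (x1, y1) in zip(points, points[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in segments]

    energy = alpha * sum(lengths)
    for (ax, ay), (bx, by), la, lb in zip(segments, segments[1:],
                                          lengths, lengths[1:]):
        # Signed turning angle between consecutive segments.
        theta = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        ds = 0.5 * (la + lb)
        energy += beta * theta * theta / ds
    return energy
```

For a straight polyline the bending term vanishes and only the length term remains, while a sharp corner is penalized quadratically in its turning angle. This is why the minimizer favors smooth continuations of the level lines.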

Image Inpainting through the Euler Elastica

Joint Interpolation of Vector Fields and Gray Levels • Both the information propagation model and the level-line Euler's elastica approach are mostly based on the interpolation of intensities (even though some geometry is used in the case of level lines) • One can consider a more efficient approach that constrains not only the appearance but also the geometry of the filled-in part • To this end, one can use the principle of good continuation in the joint intensity/orientation space • Let $\theta$ be a vector field giving the directions of the gradient of the original image $I$, satisfying the following condition: $\theta \cdot \nabla I = |\nabla I|$ • Subject to the constraint: $|\theta| \le 1$

Joint Interpolation of Vector Fields and Gray Levels

Problems with PDE approaches • The good continuation principle is reasonable in the case of uniform images • The good continuation principle provides good results if the missing content has a small volume • Convergence, as well as recovering the optimal solution of the minimization of the cost function, might be two problematic issues since the cost functions involved are non-convex • Introducing metrics and content that go beyond images is problematic; a definition of good continuation does not exist for several kinds of content, with texture, video and structure being the examples

Completion Using Concepts from Game Theory

Defining the Problem • Inpainting consists of modifying a partially destroyed image towards its original form • In a way that makes the process undetectable for an observer who has not seen the original image • Since the information is not available in the region to be inpainted, we can assume, without loss of generality, that the missing content is present elsewhere in the image • Let us consider n possible candidates to fill in the content for a particular segment (randomly selected from the image content) [tetris/lego bars].

Defining the Problem • Inpainting then consists of finding, for every position, the most probable configuration • Such a configuration should, on the one hand, be in agreement with the existing content; in the simple image completion problem, this can be enforced through the minimization of the discrepancy between the candidate patch and the existing image content • More general metrics can also be used, taking into account richer image content that can also encode geometry, the importance of boundaries, etc.

Defining the Problem • Preserving discontinuities and contrast is important for the human eye, and therefore particular attention is to be paid when introducing content in these areas • Therefore, we can modify the image term to encode such perception from biological vision systems through the penalization of errors in areas with important gradients. • Solving such an optimization problem is ill-posed: • We have pixel-wise constraints that do not encode a global coherence of the solution • Pixels will be labeled independently, based on the best match with the existing content • Missing content in areas with no overlap with the existing content will be falsely recovered
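The gradient-weighted image term described above can be sketched as a weighted sum of squared differences over the pixels where a candidate patch overlaps existing content. The slide's exact formula is not available, so the weighting `1 + grad_weight * |grad I|`, the function name `weighted_patch_cost`, and the `known` mask convention are all assumptions made for illustration.

```python
def weighted_patch_cost(image, known, patch, top, left, grad_weight=1.0):
    """Gradient-weighted SSD between a candidate patch and the existing content.

    image : 2D list of floats (values at unknown pixels are ignored)
    known : 2D list of bools, True where the image content exists
    patch : 2D list of floats, placed with its top-left corner at (top, left)
    Errors at pixels with a strong image gradient are penalized more heavily.
    """
    h, w = len(image), len(image[0])
    cost = 0.0
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if not (0 <= y < h and 0 <= x < w) or not known[y][x]:
                continue  # no data support at this pixel
            # Forward differences, using only neighbours that are known.
            gx = image[y][x + 1] - image[y][x] if x + 1 < w and known[y][x + 1] else 0.0
            gy = image[y + 1][x] - image[y][x] if y + 1 < h and known[y + 1][x] else 0.0
            grad = (gx * gx + gy * gy) ** 0.5
            cost += (1.0 + grad_weight * grad) * (image[y][x] - value) ** 2
    return cost
```

Setting `grad_weight=0` recovers the plain sum of squared differences. Because this cost is evaluated per placement and per patch, minimizing it alone labels positions independently, which is the ill-posedness noted above.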

Defining the Problem • Such an image attraction term should be combined with a smoothness term on the labeling • Since we are considering image patches to fill in the gaps, it is natural to assume that, within the inpainted region, neighboring pixels are filled in from the same patch • Using the Markovian property, one can introduce local potentials that force neighboring pixels to be filled in from the same patch • Through the addition of penalties on discrepancies between neighboring labels
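The combination of an image attraction term with a smoothness term on the labels can be written as a standard MRF energy. The sketch below assumes a Potts penalty (a constant cost `lam` whenever two 4-connected neighbors take different patch labels), which is one common choice and not necessarily the potential used in the talk. The names `labeling_energy`, `data_cost` and `lam` are hypothetical.

```python
def labeling_energy(labels, data_cost, lam=1.0):
    """MRF energy of a labeling: data term plus Potts smoothness term.

    labels    : 2D list of ints, labels[y][x] is the patch index chosen at (y, x)
    data_cost : data_cost[y][x][l] is the image attraction cost of label l at (y, x)
    lam       : penalty paid for each pair of 4-connected neighbours whose labels differ
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += data_cost[y][x][labels[y][x]]
            # Count each neighbouring pair once (right and down neighbours).
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                energy += lam
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                energy += lam
    return energy
```

With `lam = 0` the energy decouples into independent per-pixel decisions. Increasing `lam` favors large regions filled in from a single patch.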

Defining the Problem • However, the missing content is only partially constrained, and therefore data support is not available for areas at a significant distance from the borders of the inpainted region • Therefore, we can relax the constraint and consider the notion of progressive stitching, where in areas with existing content we use the distance between this content and the candidate seeds • While in areas with non-overlapping content, the already inpainted content is used instead [Figure: distance from the borders of the inpainted region]

Defining the Problem • Such a modification introduces the notion of time into the process • Its interpretation is simple: • First, areas with strongly overlapping content and strong discontinuities (boundaries) will be inpainted • Then, areas with strongly overlapping content and less important boundaries will be filled in • And last, completion will be performed through stitching on already stitched components using the same principles, moving from areas close to the existing content to the ones further away • While in all stages of the reconstruction, pixels in the local neighborhood will be forced to be inpainted from the same patch
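The "close to the existing content first, further away later" part of this schedule can be sketched with a multi-source breadth-first search from the known pixels. This covers only the distance ordering. The prioritization by boundary strength mentioned above is not modeled. The function name `fill_order` and the 4-connected distance are assumptions.

```python
from collections import deque

def fill_order(known):
    """Order missing pixels by 4-connected distance to the existing content.

    known : 2D list of bools, True where the image content exists
    Returns a list of (y, x) for the missing pixels, nearest to the border first.
    """
    h, w = len(known), len(known[0])
    dist = [[0 if known[y][x] else None for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if known[y][x])
    order = []
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                order.append((ny, nx))
                queue.append((ny, nx))
    return order
```

Pixels are emitted in non-decreasing distance from the known region, so each pixel is visited only after at least one of its neighbors is either original or already scheduled for filling.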

An Example… • If we have candidate seeds and a minimization method, we can perform inpainting by seeking the lowest potential of the cost function, which amounts to a discrete optimization problem

On the selection of Seeds • Finding appropriate candidates for the completion is one of the two most critical components of our approach • We rely on the principle that the missing content is not necessarily in the vicinity of the destroyed region • We can describe a seed as a random variable that consists of: • The center of the seed (in the image) • Its form (mostly rectangular) as well as its dimensions and orientation • The problem then consists of finding, given an origin point in the inpainted region, a number of candidate patches in the image domain whose content is potentially similar to the content that is missing and to the content that is partially present

On the Selection of Candidate Seeds • Such a seed selection can be cast in a probabilistic formulation: • Given an origin point, create a number of perturbations [in terms of the position of the seed, its form and its orientation] according to some known distribution • Once new candidate seeds have been determined, evaluate the probability that they contain information able to complete the missing content • Keep a number of the most probable seeds and repeat the process until a number of seeds of varying form, position and orientation have been selected, with content potentially interesting to complete the missing one

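Gaussian PDF estimation • Compute the present state $x_t$ of the system using the observations from 1 to $t$, $z_{1:t}$, or determine the probability $p(x_t \mid z_{1:t})$ • If we assume that the density function is known at $t-1$, $p(x_{t-1} \mid z_{1:t-1})$, then one can consider the Bayes rule: $p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t) \, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}$ • Where the prediction term can be re-written using the Chapman-Kolmogorov equation: $p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid z_{1:t-1}) \, dx_{t-1}$ • While a number of approaches can be used to estimate this density (e.g. incremental EM, since otherwise infinite memory is needed), efficient numerical techniques are lacking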

Particle Filters • Particle filters, which are sequential Monte-Carlo techniques, estimate the Bayesian posterior probability density with a set of samples • Let us consider M random measures: $\{x_t^m, w_t^m\}_{m=1}^{M}$ • A particle filter is a non-linear approximation of this density that consists of: $p(x_t \mid z_{1:t}) \approx \sum_{m=1}^{M} w_t^m \, \delta(x_t - x_t^m)$ • With $w_t^m$ being weights that reflect the importance of the different samples • Once a set of samples has been drawn, $p(z_t \mid x_t^m)$ can be computed from the observation for each sample, and the estimate of the posterior pdf can be sequentially updated.
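One iteration of the seed search described on the preceding slides can be sketched as a sampling-importance-resampling step: perturb the current seed centers, weight each perturbed seed by how well it explains the content to be completed, and resample in proportion to the weights. This is a minimal sketch restricted to the seed position. The function name `particle_filter_step`, the Gaussian perturbation with standard deviation `sigma`, and the user-supplied `likelihood` callable are assumptions. Perturbations of the seed form and orientation are omitted.

```python
import random

def particle_filter_step(particles, likelihood, sigma, rng):
    """One predict / weight / resample step over candidate seed centers.

    particles  : list of (x, y) seed centers
    likelihood : callable mapping a seed center to a non-negative score
    sigma      : standard deviation of the Gaussian position perturbation
    rng        : a random.Random instance (passed in for reproducibility)
    Returns (resampled particles, normalized weights of the perturbed particles).
    """
    # Predict: perturb every seed according to a known distribution.
    moved = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
             for x, y in particles]

    # Weight: evaluate each perturbed seed against the observation.
    raw = [likelihood(p) for p in moved]
    total = sum(raw)
    if total > 0:
        weights = [r / total for r in raw]
    else:
        # No seed is supported by the data: fall back to uniform weights.
        weights = [1.0 / len(moved)] * len(moved)

    # Resample: keep the most probable seeds, in proportion to their weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, weights
```

A possible use is to call the step repeatedly, for instance `seeds, w = particle_filter_step(seeds, score, 3.0, random.Random(0))`, where `score` is a hypothetical function such as a decreasing function of a patch matching cost.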

And a Demonstration…

On the optimization of the Cost function • Suboptimal Methods • Simulated Annealing • Iterated Conditional Modes • Highest Confidence First • Semi-Optimal Methods • Graph-based optimization • Optimal methods • Mean-field annealing • Belief Propagation Networks

Graph Cuts • A graph consists of: • A set of nodes • A set of links/directed edges that connect these nodes • Two special terminal nodes, often called source and sink, that have a different nature than the rest of the nodes • Each graph edge is assigned a positive weight • The cost of an edge in one direction can differ from that of the opposite direction • An edge is called a t-link if it connects a non-terminal node with a terminal, and an n-link if it connects two non-terminal nodes

The Min-Cut Problem • An $s$-$t$ cut refers to a partition of the graph nodes into two disjoint subsets $S$ and $T$ such that the source is in $S$ and the sink in $T$ • The cost of a cut is the sum of the weights of the “boundary edges” $(p, q)$ such that $p \in S$ and $q \in T$ • The MINIMUM cut problem is to find the cut that has minimum cost among all possible cuts • One of the fundamental results in combinatorial optimization is that the MIN-CUT problem is equivalent to the MAX-FLOW problem.

The Max-Flow Problem • Consider the source to be a water source, and the graph a network of directed pipes with capacities equal to the edge weights • The maximum flow is the maximum amount of water that can be sent from the source to the sink • The theorem of Ford-Fulkerson states that the maximum flow from the source to the sink saturates a set of edges in the graph, dividing the nodes into two disjoint sets corresponding to the minimum cut • Two types of algorithms solve the max-flow problem: • Push-relabel techniques • Augmenting path methods
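As an illustration of the augmenting path family, the sketch below computes a maximum flow with breadth-first augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson) and reads the minimum cut from the final residual graph. It is a minimal, unoptimized example, not the specialized solver typically used in vision. The function name `max_flow_min_cut` and the nested-dictionary graph format are assumptions.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Max flow by shortest augmenting paths; returns (flow value, source side S).

    capacity : dict of dicts, capacity[u][v] is the weight of directed edge u -> v
    S is the set of nodes still reachable from the source in the residual graph,
    so the saturated edges leaving S form a minimum cut.
    """
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            return flow, set(parent)
        # Bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

On any graph, the returned flow value equals the total capacity of the edges going from `S` to its complement, which is the min-cut/max-flow equivalence stated above.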

Expansion move algorithm [Figure: input labeling f; red expansion move from f] • Find the red expansion move that most decreases E • Move there, then find the best blue expansion move, etc. • Done when no α-expansion move decreases the energy, for any label α • Many nice theoretical properties
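The outer loop of the expansion move algorithm can be sketched on a one-dimensional chain of pixels with a Potts smoothness term. On images, the best α-expansion is found with a graph cut. To keep this example short, that inner step is swapped here for an exact dynamic program over the binary "keep the current label / switch to α" choices, which is only possible because the graph is a chain. All function names and the Potts penalty `lam` are assumptions.

```python
def chain_energy(labels, unary, lam):
    """Energy of a chain labeling: unary costs plus Potts penalty on neighbours."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    return data + lam * sum(a != b for a, b in zip(labels, labels[1:]))

def best_expansion_move(labels, unary, lam, alpha):
    """Optimal alpha-expansion on a chain: each node keeps its label or takes alpha."""
    def lab(i, choice):
        return alpha if choice else labels[i]

    cost = [unary[0][lab(0, 0)], unary[0][lab(0, 1)]]
    back = []
    for i in range(1, len(labels)):
        new_cost, ptr = [], []
        for c in (0, 1):
            cands = [cost[pc] + lam * (lab(i - 1, pc) != lab(i, c)) for pc in (0, 1)]
            best = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[best] + unary[i][lab(i, c)])
            ptr.append(best)
        cost = new_cost
        back.append(ptr)

    # Backtrack the optimal keep/switch decisions from the last node to the first.
    c = 0 if cost[0] <= cost[1] else 1
    choices = [c]
    for ptr in reversed(back):
        c = ptr[c]
        choices.append(c)
    choices.reverse()
    return [lab(i, ch) for i, ch in enumerate(choices)]

def alpha_expansion(labels, unary, lam, num_labels):
    """Cycle over labels, accepting expansion moves until none lowers the energy."""
    labels = list(labels)
    energy = chain_energy(labels, unary, lam)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            candidate = best_expansion_move(labels, unary, lam, alpha)
            e = chain_energy(candidate, unary, lam)
            if e < energy:
                labels, energy, improved = candidate, e, True
    return labels, energy
```

Each accepted move strictly decreases the energy, so the loop terminates. The result is a labeling that no single expansion move can improve.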

In Practice

The Case of Video & Structure • Video can be considered as a three-dimensional volume • The exact same concept can be used, where now, during the construction of the graph in the temporal domain, object correspondences derived from the optical flow computation can be used to determine the costs of the cut in this direction

Discussion • Image, Texture, Video & Structure Completion was formulated as a MAP problem • Image terms, as well as smoothness considerations, were used to fill in the missing content • Random walks, particle filters and statistical hypothesis testing were used to determine candidate completion structures in the image • Combinatorial optimization and the alpha-expansion algorithm were used to recover the optimal configuration in terms of labels • The method is rather general and can be used to deal with higher dimensions • The method compares favorably to PDE-based approaches since no assumption on the image content is made.

Future Work • Replacing the video content through coupling with optical flow estimation (we are almost there) • Stereo reconstruction through a similar fitting process (starting to work on that) • Pre-reconstruction of a number of basic elements (3D patches) and the complete reconstruction of an unseen scene through “lego-ing” these patches to the image content • Augmentation of archeological scenes to fill in the missing content from either different locations of the same monument, or components from different monuments

Advertisement Section The Handbook of Mathematical Models in Computer Vision Springer (2005), ISBN 0387263713, 596 pages • Editors • Nikos Paragios • Yunmei Chen • Olivier Faugeras http://cermics.enpc.fr/~paragios • Chapter 3: PDE-Based Image and Surface Inpainting, Bertalmio, Caselles, Haro, and Sapiro • Chapter 5: Graph Cuts in Vision and Graphics: Theories and Applications, Boykov and Veksler
