Scene Completion Using Millions of Photographs James Hays, Alexei A. Efros Carnegie Mellon University ACM SIGGRAPH 2007
Outline • Introduction • Overview • Semantic Scene Matching • Local Context Matching • Results and Comparison • Conclusion
Introduction • Image completion (inpainting, hole-filling) • Filling in or replacing an image region with new image data such that the modification cannot be detected
Introduction • Two standards for a successful completion • The data could have been there • The data should have been there
Introduction • Existing methods operate by extending adjacent textures and contours into the unknown region • They fill the unknown region with content drawn from the known parts of the same input image
Introduction • The assumption is that all the image data needed to fill the unknown region is located somewhere else in the same image • This assumption is flawed
Outline • Introduction • Overview • Semantic Scene Matching • Local Context Matching • Results and Comparison • Conclusion
Overview • We perform image completion by leveraging a massive database of images • Two compelling reasons • Some regions are impossible to fill plausibly using only image data from the source image • Even when usable source content exists, reusing it often leaves obvious duplications
Overview • Drawing content from other images raises several challenges • Computational cost • Semantic validity • Seamless compositing
Overview • To alleviate the computational and semantic challenges • Find images depicting semantically similar scenes • Use only the best-matching scenes to find patches that match the content surrounding the missing region • To combine image regions seamlessly • Graph cut seam finding • Poisson blending
Outline • Introduction • Overview • Semantic Scene Matching • Local Context Matching • Results and Comparison • Conclusion
Semantic Scene Matching • Our image database • Downloaded images from thirty Flickr.com groups • Downloaded images based on keyword searches • Discarded duplicate images and images that are too small • Distributed the collection among a cluster of 15 machines • Acquired about 2.3 million unique images
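As a rough illustration of the filtering step, the sketch below discards exact duplicates and undersized images. The MD5-based duplicate check and the MIN_DIM threshold are assumptions; the slides only say that duplicates and too-small images were discarded.

```python
import hashlib
from pathlib import Path
from PIL import Image

MIN_DIM = 800  # hypothetical "too small" threshold; not given in the slides

def keep_image(path, seen_hashes):
    """Return True if the image is new and large enough to keep."""
    digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
    if digest in seen_hashes:           # exact duplicate already downloaded
        return False
    width, height = Image.open(path).size
    if min(width, height) < MIN_DIM:    # discard images that are too small
        return False
    seen_hashes.add(digest)
    return True
```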
Semantic Scene Matching • Look for scenes most likely to be semantically equivalent to the image requiring completion • GIST descriptor • Augment the scene descriptor with the color information of the query image, down-sampled to the spatial resolution of the gist
Semantic Scene Matching • Given an input image to be hole-filled, we first compute its gist descriptor with the missing regions excluded • We calculate the SSD between the gist of the query image and every gist in the database • The color difference is computed in the L*a*b* color space
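The sketch below illustrates this matching step under stated assumptions: gist_fn is a hypothetical stand-in for a real GIST implementation (oriented-filter responses pooled on a coarse grid), and color_weight is an assumed relative weighting of the color term, which the slides do not specify.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.transform import resize

def scene_descriptor(img_rgb, gist_fn, grid=4):
    """Build the augmented scene descriptor: gist responses plus a
    coarse Lab color layout at the same spatial resolution.

    gist_fn is a hypothetical helper returning a (grid, grid, F) map
    of filter responses; a real GIST implementation is not shown.
    """
    gist = gist_fn(img_rgb)                            # (grid, grid, F)
    color = resize(rgb2lab(img_rgb), (grid, grid, 3))  # Lab at gist resolution
    return gist, color

def scene_distance(q_gist, q_color, db_gist, db_color, known, color_weight=0.5):
    """Masked SSD between query and database descriptors.

    known is a (grid, grid) array, 1 where the query is valid and 0
    over the hole, so missing regions are excluded from the comparison.
    color_weight is an assumed weighting of the color term.
    """
    m = known[..., None]
    d_gist = np.sum(m * (q_gist - db_gist) ** 2)
    d_color = np.sum(m * (q_color - db_color) ** 2)
    return d_gist + color_weight * d_color
```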
Outline • Introduction • Overview • Semantic Scene Matching • Local Context Matching • Results and Comparison • Conclusion
Local Context Matching • Having constrained our search to semantically similar scenes, we can use template matching to align the matched scenes more precisely to the local context around the hole
Local Context Matching • Pixel-wise alignment score • We define the local context to be all pixels within an 80-pixel radius of the hole's boundary • This context is compared against the 200 best-matching scenes • Using SSD error in L*a*b* color space
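A minimal sketch of the local context and its alignment score, assuming the hole is given as a boolean mask. Only the score for a single placement is shown; the full algorithm evaluates this over translations (and scales) of each matched scene.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.color import rgb2lab

def local_context_mask(hole_mask, radius=80):
    """All known pixels within `radius` pixels of the hole's boundary."""
    dist_to_hole = distance_transform_edt(~hole_mask)
    return (~hole_mask) & (dist_to_hole <= radius)

def context_ssd(query_rgb, match_rgb, context):
    """SSD alignment error in L*a*b* space over the local context."""
    q = rgb2lab(query_rgb)
    m = rgb2lab(match_rgb)
    return np.sum((q[context] - m[context]) ** 2)
```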
Local Context Matching • Texture similarity score • Measures the coarse compatibility of the proposed fill-in region with the source image within the local context • Computed as a 5x5 median filter of the image gradient magnitude at each pixel • The descriptors of the two images are compared via SSD
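A sketch of the texture descriptor as described: a 5x5 median filter of the gradient magnitude, compared via SSD over the local context. The Sobel operator is an assumed choice of gradient filter.

```python
import numpy as np
from scipy.ndimage import median_filter, sobel

def texture_descriptor(gray):
    """Per-pixel texture: 5x5 median filter of the gradient magnitude."""
    grad_mag = np.hypot(sobel(gray, axis=1), sobel(gray, axis=0))
    return median_filter(grad_mag, size=5)

def texture_distance(query_gray, match_gray, context):
    """SSD between the two texture descriptors over the local context."""
    return np.sum((texture_descriptor(query_gray)[context]
                   - texture_descriptor(match_gray)[context]) ** 2)
```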
Local Context Matching • Composite each matching scene into the incomplete image at its best placement, using a form of graph cut seam finding and standard Poisson blending
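A minimal sketch of the blending step using OpenCV's cv2.seamlessClone, which performs standard Poisson blending. It assumes patch_bgr is the matched scene already warped to its best placement in the query frame, and fill_mask is the region the graph cut assigned to the patch.

```python
import cv2
import numpy as np

def blend(existing_bgr, patch_bgr, fill_mask):
    """Poisson-blend the chosen patch into the incomplete image.

    fill_mask: uint8 mask, 255 inside the region supplied by the patch.
    """
    ys, xs = np.nonzero(fill_mask)
    center = (int(xs.mean()), int(ys.mean()))  # anchor point for cloning
    return cv2.seamlessClone(patch_bgr, existing_bgr, fill_mask,
                             center, cv2.NORMAL_CLONE)
```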
Local Context Matching • Past image completion algorithms • The remaining valid pixels in the image cannot be changed • Our completion algorithm • Allows removal of valid pixels from the query image • But discourages cutting away too many pixels
Local Context Matching • Past seam-finding • Minimizes the intensity difference between the two images along the seam • Causes the seam to pass through many high-frequency edges • Our seam-finding • Minimizes the gradient of the image difference along the seam
Local Context Matching • We find the seam by minimizing the cost function C(L) = Σ_p C_d(p, L(p)) + Σ_{p,q} C_i(p, q, L(p), L(q)) • C_d(p, L(p)) : the unary cost of assigning pixel p the label L(p) • L(p) ∈ {patch, exist}
Local Context Matching • For missing regions of the existing image • C_d(p, exist) is a very large number • For regions of the image not covered by the scene match • C_d(p, patch) is a very large number • For all other pixels • C_d(p, patch) = k · dist(p)³, where dist(p) is the pixel's distance from the hole • k = 0.02
Local Context Matching • C_i is non-zero only for immediately adjacent, 4-way connected pixels • If L(p) = L(q), the cost is zero • If L(p) ≠ L(q), C_i(p, q, L(p), L(q)) is the magnitude of the gradient of the SSD between the existing image and the scene match at pixels p and q
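A sketch of the seam-finding cut using the PyMaxflow library. It follows the cost structure above but simplifies the pairwise term to a per-node weight taken from the gradient magnitude of the SSD image; the dist(p)³ falloff and the t-link conventions are assumptions of this reconstruction, not a definitive implementation.

```python
import numpy as np
import maxflow  # pip install PyMaxflow

BIG = 1e9  # effectively infinite unary cost

def find_seam(hole, covered, dist_to_hole, grad_ssd, k=0.02):
    """Choose 'patch' vs 'exist' per pixel with a binary graph cut.

    hole:         True where the existing image is missing
    covered:      True where the scene match supplies pixels
    dist_to_hole: distance transform of the query image from the hole
    grad_ssd:     gradient magnitude of the SSD between query and match,
                  used here as a simplified per-node seam cost
    Returns a boolean array, True where the composite takes the patch.
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(hole.shape)

    # Unary terms: missing pixels must be 'patch'; pixels outside the
    # match must be 'exist'; elsewhere taking the patch costs k*dist^3
    cost_exist = np.where(hole, BIG, 0.0)
    cost_patch = np.where(covered, k * dist_to_hole ** 3, BIG)

    # PyMaxflow convention: a node on the sink side of the cut pays the
    # source capacity, so we let the sink segment stand for 'patch'
    g.add_grid_tedges(nodes, cost_patch, cost_exist)

    # Pairwise terms on the 4-connected grid, paid only across the seam
    g.add_grid_edges(nodes, weights=grad_ssd, symmetric=True)

    g.maxflow()
    return g.get_grid_segments(nodes)  # True = 'patch'
```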
Local Context Matching • Finally, we assign each composite a score combining • The scene matching distance • The local context matching distance • The local texture similarity distance • The cost of the graph cut • We present the user with the 20 composites with the lowest scores
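A tiny sketch of the final ranking; equal weighting of the four terms is an assumption, as the slides do not give the weights.

```python
import numpy as np

def rank_composites(scene_d, context_d, texture_d, cut_cost, top_k=20):
    """Sum the four distances per composite and return the indices of
    the top_k lowest-scoring composites."""
    scores = (np.asarray(scene_d) + np.asarray(context_d)
              + np.asarray(texture_d) + np.asarray(cut_cost))
    return np.argsort(scores)[:top_k]
```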
Outline • Introduction • Overview • Semantic Scene Matching • Local Context Matching • Results and Comparison • Conclusion
Results and Comparison • Lucky cases • We sometimes find another image taken at the same physical location • But it is not our goal to complete scenes and objects with their true selves from the database
Results and Comparison • Failure cases: artifacts
Results and Comparison • Failure cases: semantic violations
Results and Comparison • Failure cases: no object recognition
Results and Comparison • Failure cases where past methods perform well • For uniformly textured backgrounds, our method is unlikely to find the exact same texture in another photograph
Outline • Introduction • Overview • Semantic Scene Matching • Local Context Matching • Results and Comparison • Conclusion
Conclusion • This paper • Presents a new image completion algorithm powered by a huge image database, unlike past methods that reuse visual data from within the source image • Further work • Two million images are still a tiny fraction of the high-quality photographs available • Our approach would make an attractive web-based application