3D Vision CMPSCI 591A/691A CMPSCI 570/670 Topic 9 Stereo Vision (I)
2D to 3D Inference • Observations • Objects are mostly 3D • Images are 2D arrays of intensity, color values, etc. • 3D depth information is not explicitly encoded in images (it is explicitly recorded in range images) • However, 2D analysis implicitly uses 3D info • 3D structures are generally not random • coherency in motion • Man-made objects have regular shapes and boundaries • straight lines and smooth curves in images • Explicit 3D information can be recovered using 2D shape cues • disparities in stereo • shading changes due to surface orientation • texture gradients due to viewpoint change, etc.
Stereo • The Stereo Problem • Reconstruct scene geometry from two or more calibrated images • Basic Principle: Triangulation • Gives reconstruction as intersection of two rays • Requires point correspondence • This is the hard part
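To make the triangulation principle concrete, here is a minimal NumPy sketch, assuming the two camera centres and the ray directions through the matched image points are already expressed in a common world frame. Because noisy rays rarely intersect exactly, it returns the midpoint of the shortest segment joining them; the function name and interface are illustrative, not from the lecture.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Approximately intersect two viewing rays.

    c1, c2 : 3-vectors, camera centres in world coordinates
    d1, d2 : 3-vectors, ray directions through the matched image points
    Returns the midpoint of the shortest segment joining the two rays,
    the usual least-squares 'intersection' when the rays are skew.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Solve for the ray parameters t1, t2 minimising |(c1 + t1 d1) - (c2 + t2 d2)|
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([d1 @ (c2 - c1), d2 @ (c2 - c1)])
    t1, t2 = np.linalg.solve(A, b)     # singular only if the rays are parallel
    p1 = c1 + t1 * d1                  # closest point on ray 1
    p2 = c2 + t2 * d2                  # closest point on ray 2
    return 0.5 * (p1 + p2)             # reconstructed 3D point
```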
Stereo Vision • Problem • Infer the 3D structure of a scene from two or more images taken from different viewpoints • Two primary sub-problems • Correspondence problem (stereo match) -> disparity map • Similarity instead of identity • Occlusion problem: some parts of the scene are visible in one view only • Reconstruction problem -> 3D • What we need to know about the cameras’ parameters • Often a stereo calibration problem • Lectures on Stereo Vision • Stereo Geometry – Epipolar Geometry (*) • Correspondence Problem (*) – Two classes of approaches • 3D Reconstruction Problems – Three approaches
A Stereo Pair • Problems • Correspondence problem (stereo match) -> disparity map • Reconstruction problem -> 3D • (figure: left and right views of the Castle sequence, CMU CIL Stereo Dataset, http://www-2.cs.cmu.edu/afs/cs/project/cil/ftp/html/cil-ster.html)
Stereo View • (figure: left view and right view of a stereo pair, with the resulting disparity map)
Stereo Disparity • The separation between the image positions of a matched point in the two views is called the stereo disparity. • Disparity is measured in pixels and can be positive or negative (conventions differ). It varies across the image.
Disparity and Depth • If the cameras are pointing in the same direction, the geometry is simple. • b is the baseline of the camera system, • Z is the depth of the object, • d is the disparity (left x minus right x) and • f is the focal length of the cameras. • Then the unknown depth is given by Z = f b / d
Parallel Cameras • Disparity is inversely proportional to depth, so stereo is most accurate for close objects. • Once we have found depth, the other coordinates in 3D follow easily. Taking either one of the images, X = x Z / f, where x is the image coordinate measured from the principal point, and likewise Y = y Z / f.
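A small worked sketch of the two slides above, assuming image coordinates measured from the principal point and a focal length given in pixels; the numbers in the example call are made up purely for illustration.

```python
import numpy as np

def depth_from_disparity(d_px, f_px, baseline_m):
    """Depth of a point from its disparity: Z = f * b / d.

    d_px       : disparity in pixels (left x minus right x), must be non-zero
    f_px       : focal length expressed in pixels
    baseline_m : distance between the two camera centres, in metres
    """
    return f_px * baseline_m / d_px

def backproject(x_px, y_px, d_px, f_px, baseline_m):
    """Recover the full 3D point (X, Y, Z) from one image's coordinates.

    x_px, y_px are measured from the principal point (image centre).
    Uses X = x * Z / f and Y = y * Z / f once Z is known.
    """
    Z = depth_from_disparity(d_px, f_px, baseline_m)
    return np.array([x_px * Z / f_px, y_px * Z / f_px, Z])

# Illustrative values: f = 700 px, baseline = 0.12 m, disparity = 14 px -> Z = 6 m
print(backproject(x_px=50.0, y_px=-20.0, d_px=14.0, f_px=700.0, baseline_m=0.12))
```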
Converging Cameras • This is the more realistic case. • The depth at which the cameras converge, Z0, is the depth at which objects have zero disparity. • Finding Z0 is part of stereo calibration. • Closer objects have convergent disparity (numerically positive) and farther objects have divergent disparity (numerically negative). • (figure: two converging cameras viewing an object at depth Z)
The Correspondence Problem • To measure disparity, we first have to find corresponding points in the two images. • This turns out not to be easy. • Our own visual systems can match at a low level, as shown by random-dot stereograms, in which the individual images have no structure above pixel scale, but which when fused show a clear 3-D shape.
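The random-dot stereogram idea can be reproduced in a few lines of NumPy; the image size, square size and the choice to shift the hidden square to the left are arbitrary assumptions. Each image on its own is featureless noise, yet the fused pair shows a square floating in front of the background.

```python
import numpy as np

def random_dot_stereogram(size=256, square=96, disparity=8, seed=0):
    """Generate a random-dot stereogram pair as two uint8 images.

    Both images are binary noise; the only structure is that a central square
    of the left image reappears in the right image shifted by `disparity`
    pixels, so the square stands out in depth when the pair is fused.
    """
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, (size, size)).astype(np.uint8) * 255
    right = left.copy()
    lo, hi = (size - square) // 2, (size + square) // 2
    # Copy the central square into the right image, shifted left by `disparity` ...
    right[lo:hi, lo - disparity:hi - disparity] = left[lo:hi, lo:hi]
    # ... and fill the uncovered strip with fresh noise so it carries no cue.
    right[lo:hi, hi - disparity:hi] = (
        rng.integers(0, 2, (square, disparity)) * 255).astype(np.uint8)
    return left, right

left, right = random_dot_stereogram()
```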
Stereo Matching • Stereo matchers need to start from some assumptions. • Corresponding image regions are similar. • A point in one image may match only a single point in the other image. • If two matched features are close together in the images, then in most cases their disparities will be similar, because the environment is made of continuous surfaces separated by boundaries. • Many matching methods exist. The basic distinction is between • feature-based methods, which start from image structure extracted by preprocessing; and • correlation-based methods, which start from individual grey-levels.
Feature-Based Methods • Extract feature descriptions: regions, lines, … • Pick a feature in the left image. • Take each feature in the right image in turn (or just those close to the epipolar line), and measure how different it is from the original feature: size, aspect ratio, average grey level, etc.
Feature-Based Methods • S is a measure of similarity • w0 etc. are weights, and • the other symbols are different measures of the feature in the right and left images, such as length, orientation, average grey level and so on. • Choose the right-image feature with the largest value of S as the best match for the original left-image feature. • Repeat starting from the matched feature in the right image, to see if we achieve consistency.
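The similarity formula itself did not survive the slide export, so the sketch below shows one plausible form, assuming each feature is summarised by a few scalar measurements (length, orientation, average grey level, contrast): S is the inverse of a weighted sum of absolute differences, so identical features score very high and dissimilar ones score low. The measurement names and weights are illustrative, not the lecture's.

```python
def feature_similarity(feat_left, feat_right, weights=(1.0, 1.0, 1.0, 1.0)):
    """Score a candidate feature pair: larger S means a better match.

    Each feature is a dict of scalar measurements; S is the inverse of a
    weighted sum of absolute differences between the two descriptions.
    """
    w0, w1, w2, w3 = weights
    diff = (w0 * abs(feat_left["length"] - feat_right["length"])
            + w1 * abs(feat_left["orientation"] - feat_right["orientation"])
            + w2 * abs(feat_left["avg_grey"] - feat_right["avg_grey"])
            + w3 * abs(feat_left["contrast"] - feat_right["contrast"]))
    return 1.0 / (1.0 + diff)

def best_match(feat_left, candidates_right):
    """Pick the right-image candidate with the largest similarity S."""
    return max(candidates_right, key=lambda f: feature_similarity(feat_left, f))
```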
Feature-Based Methods • It is possible to use very simple features (just points, in effect) if the constraint that the disparity should vary smoothly is taken into account. • Feature-based methods give a sparse set of disparities: disparities are only found at feature positions.
Correlation-Based Methods • Take a small patch of the left image as a template and convolve (cross-correlate) it with the part of the right image close to the epipolar line, sliding it over an area. • The peak of the correlation output gives the position of the matching area of the right image, and hence the disparity of the best match.
Correlation-Based Methods • Convolution-based methods can give a dense set of disparities: disparities are found for every pixel. • These methods can be very computationally intensive, but can be done efficiently on parallel hardware. • (figure: a section of the left image is convolved along the right image near the epipolar line; the convolution peak marks the position of the corresponding patch, answering "where to search?")
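A minimal dense-matching sketch, assuming the pair has been rectified so that matches lie on the same image row, and using a sum-of-squared-differences score in place of an explicit convolution; the window size, disparity range and border handling are simplifying assumptions.

```python
import numpy as np

def block_match_row(left, right, row, x, window=5, max_disp=64):
    """Find the disparity of pixel (row, x) in a rectified pair by SSD.

    left, right : 2D float arrays (rectified, so matches lie on the same row)
    Assumes (row, x) is far enough from the image border for a full window.
    Returns the disparity d >= 0 such that left[row, x] matches right[row, x - d].
    """
    h = window // 2
    patch = left[row - h:row + h + 1, x - h:x + h + 1]
    best_d, best_ssd = 0, np.inf
    for d in range(0, min(max_disp, x - h) + 1):
        cand = right[row - h:row + h + 1, x - d - h:x - d + h + 1]
        ssd = np.sum((patch - cand) ** 2)      # smaller SSD = better match
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d
```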
Stereo Correspondence • Epipolar Constraint • Reduces correspondence problem to 1D search along conjugate epipolar lines • Stereo rectification: make epipolar lines horizontal
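The 1D search can be made explicit with the fundamental matrix F from the epipolar-geometry lecture: the match of a left-image point x must lie on the line l' = F x in the right image. A small sketch, assuming F is already known from calibration:

```python
import numpy as np

def epipolar_line(F, x_left):
    """Epipolar line in the right image for a point in the left image.

    F      : 3x3 fundamental matrix satisfying x_right^T F x_left = 0
    x_left : pixel coordinates (u, v) in the left image
    Returns line coefficients (a, b, c) with a*u + b*v + c = 0 in the right image.
    """
    x_h = np.array([x_left[0], x_left[1], 1.0])   # homogeneous coordinates
    line = F @ x_h
    return line / np.linalg.norm(line[:2])        # normalise so (a, b) is a unit normal
```

Candidate matches then only need to be examined on (or within a pixel or two of) this line, which is what collapses the 2D search to a 1D search.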
Correspondence Difficulties • Why is the correspondence problem difficult? • Some points in each image will have no corresponding points in the other image, because (1) the cameras may have different fields of view, and (2) parts of the scene are occluded in one view. • A stereo system must be able to determine the image parts that should not be matched.
Variations • Scale-space. Methods that exploit scale-space can be very powerful. • Smooth the image using Gaussian kernels • Match each feature with the nearest one in the other image — location will be approximate because of blurring. • Use the disparities found to guide the search for matches in a less smoothed image. • Relaxation. This is important in the context of neural modelling. • Set up possible feature matches in a network. • Construct an energy function that captures the constraints of the problem. • Make incremental changes to the matches so that the energy is steadily reduced.
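The scale-space idea can be sketched as a two-level coarse-to-fine search: smooth and subsample the pair, find an approximate disparity there, then search only a narrow band around the (doubled) estimate at full resolution. The narrowed matcher match_fn is an assumed helper (e.g. a block matcher restricted to the interval [d_min, d_max]), and the smoothing and subsampling factors are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def coarse_to_fine_disparity(left, right, row, x, match_fn, sigma=2.0):
    """Two-level coarse-to-fine disparity estimate at pixel (row, x).

    match_fn(left, right, row, x, d_min, d_max) -> disparity is assumed to
    search only the interval [d_min, d_max].
    """
    # Level 1: smooth and subsample by 2, search a wide disparity range.
    left_c = gaussian_filter(left, sigma)[::2, ::2]
    right_c = gaussian_filter(right, sigma)[::2, ::2]
    d_coarse = match_fn(left_c, right_c, row // 2, x // 2, 0, 64)
    # Level 0: full resolution, search only near the predicted disparity.
    d_pred = 2 * d_coarse
    return match_fn(left, right, row, x, max(0, d_pred - 2), d_pred + 2)
```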
Other Aspects of Stereo • Very precise stereo systems can estimate disparity to sub-pixel accuracy. This is important for industrial inspection and mapping from aerial and satellite images. • In controlled environments, structured light (e.g. stripes of light) can be used to provide easy matches for a stereo system.
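A common route to sub-pixel disparity is to fit a parabola through the matching cost at the best integer disparity and its two neighbours and take the parabola's minimum; the sketch below assumes an SSD-style cost where smaller means a better match.

```python
def subpixel_refine(ssd_prev, ssd_best, ssd_next, d_best):
    """Refine an integer disparity by fitting a parabola to the SSD scores.

    ssd_prev, ssd_best, ssd_next are the matching costs at disparities
    d_best - 1, d_best, d_best + 1; the parabola's minimum gives a
    sub-pixel correction in the range (-0.5, 0.5).
    """
    denom = ssd_prev - 2.0 * ssd_best + ssd_next
    if denom <= 0:          # flat or degenerate cost curve: keep the integer value
        return float(d_best)
    return d_best + 0.5 * (ssd_prev - ssd_next) / denom
```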
Structured Light • Structured lighting • Feature-based methods are not applicable when the objects have smooth surfaces (i.e., sparse disparity maps make surface reconstruction difficult). • Patterns of light are projected onto the surface of objects, creating interesting points even in regions which would be otherwise smooth. • Finding and matching such points is simplified by knowing the geometry of the projected patterns.
Stanford’s Digital Michelangelo Project • http://graphics.stanford.edu/projects/mich/ • maximum height of gantry: 7.5 meters • weight including subbase: 800 kilograms
Stanford’s Digital Michelangelo Project • 480 individually aimed scans • 2 billion polygons • 7,000 color images • 32 gigabytes • 30 nights of scanning • 1,080 man-hours • 22 people
Correspondence Problems • Which point matches with which point?
Difficulties • Multiple matches are always likely • Simple features (e.g., black dots): large number of potential matches, but precise disparity • Complex features (e.g., polygons): small number of potential matches, but less precise disparity
Trinocular Stereo • (illustration from H. G. Wells, War of the Worlds)
Three Camera Stereo • A powerful way to eliminate spurious matches • Hypothesize matches between A & B • Matches between A & C must lie on the green epipolar line • Matches between B & C must lie on the red epipolar line • There had better be something at the intersection
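One way to implement the check, sketched under the assumption that the A-B match has already been triangulated (e.g. with the midpoint sketch earlier) and that camera C's 3x4 projection matrix is known: project the 3D point into C and accept the hypothesis only if some detected feature lies within a few pixels of the prediction, i.e. near the intersection of the two epipolar lines.

```python
import numpy as np

def verify_in_third_view(X_world, P_C, features_C, tol_px=2.0):
    """Check a hypothesised A-B match against a third camera C.

    X_world    : 3D point triangulated from the A-B correspondence
    P_C        : 3x4 projection matrix of camera C
    features_C : iterable of (u, v) feature positions detected in image C
    Accepts the hypothesis only if some feature in C lies within tol_px
    pixels of the projection of X_world.
    """
    x_h = P_C @ np.append(X_world, 1.0)                  # project into image C
    pred = x_h[:2] / x_h[2]
    feats = np.asarray(features_C, dtype=float).reshape(-1, 2)
    dists = np.linalg.norm(feats - pred, axis=1)
    return bool(dists.size) and float(dists.min()) <= tol_px
```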