Accurate Stereophotogrammetry. John Morris, Electrical and Computer Engineering / Computer Science, The University of Auckland.
Accurate Stereophotogrammetry • John Morris • Electrical and Computer Engineering / Computer Science, The University of Auckland [Image: Iolanthe on the Hauraki Gulf]
What is Stereo Photogrammetry? • Pairs of images giving different views of the scene can be used to compute a depth (disparity) map • Key task – Correspondence: locate matching regions in both images
Vision Research tends to be rather visual! • Tendency to publish images ‘proving’ efficacy, efficiency, etc • Which is the better algorithm? [Figure: depth maps. Panels: Computed: Census; Ground Truth; Computed: Pixel-to-Pixel]
Motivation • Stereophotogrammetry started with a focus on accuracy • Used to produce accurate maps from aerial photography • Relied on • Large, expensive, mechanical ‘machines’ to align images and measure disparities • High resolution photographic film in precise cameras
Motivation • Large, expensive, mechanical ‘machines’ to align images and measure disparities • High resolution photographic film in precise cameras [Figures: Wild A10; Santoni Model III]
Motivation • Then along came digital cameras and computers • Low resolution ‘toy’ applications became the focus! • Web cameras • are cheap and • stream low resolution images into a machine • Potential for • tracking objects • limited accuracy real-time environment mapping • All you need is • a piece of wood, • 2 webcams and • some of Cliff’s time to interface two cameras to a single PC
Stereophotogrammetry • Total cost • Webcams 2 x $100* • Wood $2 • Cliff’s time priceless** • Total $202 • but … what can you really do with such a system (except pass COMPSCI 773)? • In reality, not much • Resolution and accuracy too low! • Lenses also distort images • Not much stereophotogrammetry • * Choose some expensive ones! • ** Already done, incremental cost $0
Stereophotogrammetry • But I’m a CS graduate: software can do anything! • Correct for lens distortion • Interpolate • Sub-pixel accuracy • but … accuracy is related to the quality of the input data! • Correction factors have limited accuracy • They’re derived from low accuracy images! • In reality, there’s a limited amount you can do with poor input! • ‘True’ signal enhancement usually relies on multiple samples of the same signal! • In image processing, multiple samples from the same image → lower resolution
Need for accuracy • Self-evident! • One example • Application: Collision avoidance (or navigating through any dynamic environment) • Critical measurement • Relative velocity • Obtained from two scene measurements • $z = a \pm 10\%$ • Then $v \approx \Delta z / \Delta t = (z(t_2) - z(t_1)) / (t_2 - t_1)$ • $\mathrm{Error}(v) \approx \mathrm{Error}(z(t_1)) + \mathrm{Error}(z(t_2)) + \mathrm{Error}(t_1) + \mathrm{Error}(t_2) = 10\% + 10\% + (\text{negligible}, < 0.1\%) = 20\%$ • Would you sit in an autonomous vehicle at 100 km/h which measured its distance to other vehicles with this accuracy? • 10% error in z? High? • Check the stereo test images in the Middlebury database! • Maximum disparities ~ 20 • If $d_{measured} = 10$, a one-pixel disparity error is 10%
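The arithmetic on this slide can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the disparity uncertainty of one pixel is an assumption (the slide does not state it), and the velocity estimate simply adds the contributing relative errors as the slide does.

```python
def relative_depth_error(d_measured, d_uncertainty=1.0):
    """First-order relative error in z for a given disparity uncertainty.

    z is proportional to 1/d, so |dz/z| is roughly dd/d.
    The default of one pixel is an assumption, not a value from the slides.
    """
    if d_measured <= 0:
        raise ValueError("disparity must be positive")
    return d_uncertainty / d_measured


def velocity_error_estimate(err_z1, err_z2, err_t1=0.0, err_t2=0.0):
    """The slide's simple estimate: sum the contributing relative errors."""
    return err_z1 + err_z2 + err_t1 + err_t2


if __name__ == "__main__":
    for d in (5, 10, 20):
        print(d, relative_depth_error(d))
    print(velocity_error_estimate(0.10, 0.10))
```

With a maximum disparity near 20, most measured disparities sit well below 20 pixels, which is why a 10% error in z is not a pessimistic figure for such images.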
Photogrammetry Lab • High resolution cameras • Stable platforms / precise alignment • Error reduction at source • Rectification of images • Introduced errors • Canon Digital SLR + 50mm fixed focus lens • Measured distortion ~ 1 pixel max in 3000 × 2000 pixel image (subject to confirmation!) [Figure labels: High quality, fixed focal length lens; Precise alignment; Precise, stable base; Verging optics]
Stereo Camera Configuration • Standard Case – Two cameras with parallel optical axes • Rays are drawn through each pixel in the image • Ray intersections represent points imaged onto the centre of each pixel • but an object must fit into the Common Field of View • Distance, $z = \frac{b f}{p d}$, where b is the baseline, f the focal length, p the pixel size and d the disparity • Clearly depth resolution increases as the object gets closer to the camera [Figure label: points along these lines have the same L-R displacement (disparity)]
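A minimal sketch of the distance formula for parallel axes, z = b f / (p d), and of how the depth step between neighbouring integer disparities shrinks as the object gets closer. The function names and the numerical values (baseline, pixel size) are assumptions chosen only for illustration; the 50 mm focal length echoes the lab lens mentioned earlier.

```python
def depth_from_disparity(b, f, p, d):
    """Distance z = b*f / (p*d) for two cameras with parallel optical axes.

    b: baseline, f: focal length, p: pixel size (all in the same length unit)
    d: disparity in pixels (must be positive)
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    return b * f / (p * d)


def depth_step(b, f, p, d):
    """Depth difference between disparity d and disparity d + 1."""
    return depth_from_disparity(b, f, p, d) - depth_from_disparity(b, f, p, d + 1)


if __name__ == "__main__":
    b, f, p = 0.10, 0.050, 7.4e-6   # metres; assumed values
    for d in (10, 100, 1000):
        print(d, depth_from_disparity(b, f, p, d), depth_step(b, f, p, d))
```

Large disparities correspond to near objects, so the step between adjacent disparities (the depth resolution cell) is smallest close to the cameras.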
Depth Accuracy – Parallel Camera Axes • Given an object of extent a, there’s an optimum position for it! • Assuming the baseline, b, can be varied • Common fallacy – just increase b to increase accuracy
Stereo Camera Configuration • This result is easily understood if you consider an object of extent, a • To be completely measured, it must lie in the Common Field of View • but place it as close to the camera as you can so that you can obtain the best accuracy, say at D • Now increase b to increase the accuracy at D • But you must increase D so that the object stays within the CFoV! • Detailed analysis leads to the previous curve and an optimum value of b [Figure labels: a, D, b; points along these lines have the same L-R displacement (disparity)]
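The trade-off described here can be reproduced with a simple model. The sketch below is not the detailed analysis referred to on the slide; it assumes two identical parallel cameras with half field-of-view angle `half_fov` (a hypothetical parameter), a one-pixel disparity uncertainty, and the first-order depth error z² p / (b f). Under those assumptions the error at the closest usable distance is proportional to (a + b)² / b, which is smallest when b = a.

```python
import math


def nearest_full_view_distance(a, b, half_fov):
    """Closest distance at which an object of extent a fits in the CFoV.

    At distance D each camera sees a strip of width 2*D*tan(half_fov);
    the strips are offset by b, so they overlap over 2*D*tan(half_fov) - b.
    Setting the overlap equal to a gives D.
    """
    return (a + b) / (2.0 * math.tan(half_fov))


def depth_error_at(distance, b, f, p):
    """First-order depth change for a one-pixel disparity change."""
    return distance * distance * p / (b * f)


def best_baseline(a, f, p, half_fov, candidates):
    """Candidate baseline giving the smallest depth error for extent a."""
    def error_for(b):
        distance = nearest_full_view_distance(a, b, half_fov)
        return depth_error_at(distance, b, f, p)
    return min(candidates, key=error_for)


if __name__ == "__main__":
    candidates = [k / 10.0 for k in range(1, 51)]   # 0.1 .. 5.0 (same unit as a)
    best = best_baseline(a=1.0, f=0.05, p=1e-5,
                         half_fov=math.radians(20), candidates=candidates)
    print(best)
```

Increasing b beyond the object's extent pushes the object further away faster than the wider baseline can compensate, which is the fallacy named on the previous slide.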
Stereophotogrammetry vs Collision Avoidance • This result is more relevant for stereo photogrammetry • You are trying to accurately determine the geometry of some object • It’s fragile, dangerous, … and you must use non-contact measurement • For collision avoidance, you are more concerned with measuring the closest approach of an object (ie any point on the object!) • you can increase the baseline so that the critical point stays within the CFoV [Figure label: Dcritical]
Parallel Camera Axis Configuration • Accuracy depends on d, the difference in image position in the L and R images, and, in a digital system, on the number of pixels in d • Measurable regions also must lie in the CFoV • This configuration is rather wasteful • Observe how much of the image planes of the two cameras is wasted! [Figure label: Dcritical]
Evolution • Human eyes ‘verge’ on an object to estimate its distance, ie the eyes fix on the object in the field of view • Parallel axes: configuration commonly used in stereo systems • Verging axes: configuration discovered by evolution millions of years ago • Note immediately that the CFoV is much larger!
Nothing is free! • Since the CFoV is much larger, more sensor pixels are being used and depth accuracy should increase • but geometry is much more complicated! • Position on the image planes of a point at (x,z) in the scene: $x_L = \frac{f}{p}\tan\left(\arctan\frac{b+2x}{2z} - \phi\right)$, $x_R = \frac{f}{p}\tan\left(\arctan\frac{b-2x}{2z} - \phi\right)$, where $\phi$ is the vergence angle • Does the increased accuracy warrant the additional computational complexity?
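The two projection formulas can be checked numerically. The sketch below writes the vergence angle as `phi` and reads the second formula as the right-image coordinate. It also assumes a sign convention in which the two image coordinates are measured in mirrored directions, so that the disparity is their sum; with `phi = 0` this reduces to d = b f / (p z), matching the parallel-axis case. The function names and test values are illustrative only.

```python
import math


def image_positions(x, z, b, f, p, phi):
    """Pixel coordinates of scene point (x, z) in the left and right images.

    b: baseline, f: focal length, p: pixel size, phi: vergence angle (radians).
    """
    left = (f / p) * math.tan(math.atan((b + 2 * x) / (2 * z)) - phi)
    right = (f / p) * math.tan(math.atan((b - 2 * x) / (2 * z)) - phi)
    return left, right


def disparity(x, z, b, f, p, phi):
    """Disparity in pixels under the assumed (mirrored) sign convention."""
    left, right = image_positions(x, z, b, f, p, phi)
    return left + right


if __name__ == "__main__":
    b, f, p = 0.2, 0.05, 1e-5           # assumed values, metres
    phi = math.radians(2.0)
    z_fix = b / (2 * math.tan(phi))     # fixation distance on the centre line
    for z in (0.5 * z_fix, z_fix, 2.0 * z_fix):
        print(z, disparity(0.0, z, b, f, p, phi))
```

Points nearer than the fixation distance come out with positive disparity and points beyond it with negative disparity, while the fixation point itself has zero disparity.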
Depth Accuracy OK - better … but it’s not exactly spectacular! Is it worth the additional computational load?
A minor improvement? • What happened? • As the cameras turn in, Dmin gets smaller! • If Dmin is the critical distance, D < Dmin isn’t useful! • This area is now wasted!
Depth Accuracy - Verging axes, increased f • Small vergence angle: significantly better depth accuracy • Note that at large f, the CFoV does not extend very far!
Increased focal length • Lenses with large f • Thinner • Fewer aberrations • Better images • Cheaper? • Alternatively, lower pixel resolution can be used to achieve better depth accuracy ...
Zero disparity matching • With verging axes, at the fixation point, scene points appear with zero disparity (in the same place on both L and R images) • If the fixation point is set at some sub-critical distance (eg an ‘early warning’ point), then matching algorithms can focus on a small range of disparities about 0 • With verging axes, both +ve and -ve disparities appear • Potential for fast, high performance matching focussing on this region
Non-parallel axis geometry • Points with the same disparity lie on circles now • For parallel axes, they lie on straight lines [Figure: loci for d = -1, d = 0 and d = +1]
Verging axis geometry • Points with the same disparity lie on Vieth-Müller circles with the baseline as a chord
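The circle property follows from the inscribed angle theorem: every point on a circle through both camera centres sees the baseline under the same angle, so the angular disparity is constant along it. The sketch below checks this numerically for the zero-disparity circle through an assumed fixation point; function names and values are illustrative, and the check is on angular disparity rather than on pixel positions.

```python
import math


def zero_disparity_circle(b, z_fix):
    """Centre height c and radius r of the circle through the camera
    centres (-b/2, 0), (b/2, 0) and the fixation point (0, z_fix)."""
    r = (z_fix ** 2 + (b / 2) ** 2) / (2 * z_fix)
    c = z_fix - r
    return c, r


def subtended_angle(x, z, b):
    """Angle between the rays from point (x, z), z > 0, to the two cameras."""
    return math.atan2(x + b / 2, z) - math.atan2(x - b / 2, z)


if __name__ == "__main__":
    b, z_fix = 0.2, 2.0                 # assumed values, metres
    c, r = zero_disparity_circle(b, z_fix)
    for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
        x, z = r * math.sin(t), c + r * math.cos(t)
        print(round(x, 3), round(z, 3), subtended_angle(x, z, b))
```

Circles of other radii through the same two centres correspond to the other constant-disparity loci.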
Zero disparity matching (ZDM) • Using a fixation point in some critical region introduces the possibility of faster matching • It can alleviate the statistical factor reducing matching quality • You search over a restricted disparity range • Several ‘pyramidal’ matching techniques have been proposed (and success claimed!) for conventional parallel geometries • These techniques could be adapted to ZDM • Care: • It has no effect on the other three factors!
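Searching a restricted disparity range about zero can be illustrated with a plain sum-of-absolute-differences (SAD) match along one scan line. SAD is used here only as a familiar stand-in; the slides do not say which cost function is intended, and the function name and test data are hypothetical.

```python
def best_disparity(left, right, x, half_window, d_min, d_max):
    """Disparity in [d_min, d_max] with the smallest SAD cost.

    Compares a window centred on left[x] with one centred on right[x - d].
    Disparities whose windows fall outside either row are skipped.
    """
    best_d, best_cost = None, float("inf")
    for d in range(d_min, d_max + 1):
        lo_left, hi_left = x - half_window, x + half_window
        lo_right, hi_right = lo_left - d, hi_left - d
        if lo_left < 0 or hi_left >= len(left):
            continue
        if lo_right < 0 or hi_right >= len(right):
            continue
        cost = sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# One scan line pair: the left row is the right row shifted by one pixel.
right_row = [10, 10, 10, 50, 90, 40, 10, 10, 10, 10, 10, 10]
left_row = [10] + right_row[:-1]

if __name__ == "__main__":
    print(best_disparity(left_row, right_row, x=5, half_window=1, d_min=-2, d_max=2))
```

The cost of the search grows with the number of disparities tried, so a range of a few pixels either side of zero is much cheaper than scanning from zero to the maximum disparity.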
Correspondence • OK .. now we have an optimum geometry .. • We just match up the images and • Sit back and enjoy the ride as our car weaves its way through the traffic! • Unfortunately, digital computers aren’t as good as human operators! eg the ones who produce maps from aerial photos!
Stereo Photogrammetry • Pairs of images giving different views of the scene can be used to compute a depth (disparity) map • Key task – Correspondence: locate matching regions in both images • Epipolar constraint: align images so that matches must appear in the same scan line in the L & R images
Sources of ‘noise’ in automated stereophotogrammetry • Signal noise • Electromagnetic interference (eg cross-talk) • Quantum behaviour of electronic devices (eg resistor shot-noise) • Quantization: digitization of real-valued signals • Geometric sources • Discrete pixel sensors with finite area • Occlusions • Perspective distortion • Electronic sources • Intensity sensitivity variations between cameras (eg different optical or electronic gain settings) • Different ‘dark noise’ levels • Optical sources • Non-uniform scattering (non-Lambertian sources) • Reflections and specular highlights • Angle dependent colour scattering (‘grating’ effects) • Lighting variation due to differing view angles
Discrete Pixels • CMOS image sensors • Usually matrix of sensors with coloured dye mask arranged in BGRG arrangement • Values for each colour at each pixel position derived by interpolation • We’ve already lost some accuracy in this process! • Cameras aim to produce pleasing pictures – the interpolation process is not visible • Some cameras provide ‘RAW’ output – more suitable for photogrammetry?
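To make the loss of accuracy concrete, here is a minimal sketch of the simplest kind of colour interpolation: the green value at a red or blue sensor site is estimated as the mean of its edge-adjacent neighbours, which are all green sites in such a mosaic. This is an assumption for illustration; real cameras use more elaborate and usually undisclosed methods, and the function name and sample values are hypothetical.

```python
def green_at(raw, row, col):
    """Estimate green at a non-green site of a colour-mosaic sensor.

    raw is a list of rows of raw sensor values. The estimate is the mean
    of the up/down/left/right neighbours that exist (fewer at the border).
    """
    neighbours = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(raw) and 0 <= c < len(raw[0]):
            neighbours.append(raw[r][c])
    return sum(neighbours) / len(neighbours)


if __name__ == "__main__":
    # Centre site is red (value 200); its four edge neighbours are green.
    raw = [[30, 100, 34],
           [120, 200, 80],
           [28, 60, 33]]
    print(green_at(raw, 1, 1))
```

The interpolated value is a smoothed guess, so two of the three colour values at every pixel are already estimates before any matching starts.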
Rectification • Given all these sources of noise, it’s important to eliminate as many as possible at source! • Clearly, the smaller you can make the needed corrections, the better the input to the matching algorithms will be • Real lens distortion: this is what your camera gives you • This is what it should look like in image plane coordinates • Calculate fractions of neighbouring pixel intensities • This is what you’d like to input to your stereo matching program
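"Calculate fractions of neighbouring pixel intensities" is commonly done by bilinear interpolation: a corrected pixel whose source position falls between sensor pixels takes a weighted mix of the four surrounding values. The sketch below assumes bilinear weights, which the slide does not specify, and uses a hypothetical function name.

```python
import math


def bilinear_sample(image, x, y):
    """Intensity at fractional position (x = column, y = row).

    image is a list of rows. The result mixes the four neighbouring pixels
    in proportion to how close the sample point is to each of them.
    """
    height, width = len(image), len(image[0])
    if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
        raise ValueError("sample position lies outside the image")
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


if __name__ == "__main__":
    image = [[0, 10],
             [20, 30]]
    print(bilinear_sample(image, 0.5, 0.5))
    print(bilinear_sample(image, 0.25, 0.0))
```

Every resampled value is a blend, so the larger the correction, the more the rectified image differs from what the sensor actually recorded.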
Discrete Pixels • Pixelization noise • Assume a uniform green object on a red background • Pixels in the ‘body’ of the object’s projection will be saturated green • Pixels in the edge will have some R:G ratio • Pixels in the same edge in the other image will generally have a different ratio • No possible match! (if you’re trying for a ‘perfect’ match)
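A one-dimensional sketch of why the edge pixels differ between the two images: if a pixel simply averages the light falling on its area, the green fraction of the edge pixel depends on where the edge lands within it, and a non-integer disparity puts the edge at a different sub-pixel offset in the other image. The area-averaging model, the function name and the edge positions are assumptions for illustration.

```python
def green_fraction(pixel_index, edge_position):
    """Fraction of pixel [pixel_index, pixel_index + 1) covered by a green
    object that occupies everything from edge_position rightwards."""
    covered = (pixel_index + 1) - edge_position
    return max(0.0, min(1.0, covered))


if __name__ == "__main__":
    left_edge = 3.3                  # edge position in the left image
    right_edge = left_edge - 2.5     # same edge, shifted by a 2.5 pixel disparity
    print(green_fraction(3, left_edge))    # edge pixel in the left image
    print(green_fraction(0, right_edge))   # edge pixel in the right image
```

Here the same physical edge gives a mostly green pixel in one image and a mostly red pixel in the other, so an exact-intensity match cannot succeed.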
Noise model • Each ‘correction’ introduces some additional uncertainty (or noise) • Matching algorithms should work in the context of a noise model • Most matching algorithms assume ‘ideal’ systems • ‘Ideal’ has many connotations here!! • Concurrent Stereo Matching • Work in progress (Liu, Gimel’farb, Delmas, Morris) • Initially accepts all possible matches • Given a model of the noise (including all sources) • Ask Jiang to talk about it!
Tsukuba Stereo Test Image • Real image – 384 × 240 • Hand generated disparity map • Very low resolution • Dmax = 14
CSM – Processing the Tsukuba Image Set • Step 1 – Identify possible matches [Figure panels: d = 5, d = 6, d = 8, d = 14] • Step 2 – Form surfaces from local data & propagate back into scene
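Step 1 can be pictured as follows. This is not the Concurrent Stereo Matching implementation, only a rough sketch of "accept every match that the noise model allows": for each disparity, keep all positions where the two intensities agree to within a tolerance. The fixed tolerance standing in for the noise model, the function name and the sample rows are all assumptions.

```python
def candidate_matches(left_row, right_row, d_max, tolerance):
    """For each disparity 0..d_max, list the x positions in the left row
    whose intensity matches right_row[x - d] to within the tolerance."""
    candidates = {}
    for d in range(d_max + 1):
        candidates[d] = [x for x in range(d, len(left_row))
                         if abs(left_row[x] - right_row[x - d]) <= tolerance]
    return candidates


if __name__ == "__main__":
    # The left row is the right row shifted by two pixels, with small noise.
    left_row = [10, 10, 52, 91, 40, 10]
    right_row = [50, 90, 41, 10, 10, 10]
    for d, xs in candidate_matches(left_row, right_row, d_max=3, tolerance=3).items():
        print(d, xs)
```

Uniform regions pass the test at several disparities at once, which is why a second step is needed to form surfaces and resolve the ambiguity.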
‘Competing’ techniques • Structure from motion • Motion is equivalent to baseline of stereo system • If accuracy of motion ≈ accuracy of baseline • Accuracy similar to parallel axis stereo • Generally relies on small movements to make matching problem tractable • Much smaller distance resolution
‘Competing’ techniques • Structured light • Requires two devices (camera and projector) of comparable resolution • Slower • Unique labeling of pixels requires O(log n) images • Projector is a ‘real’ optical device too (with a real lens) • Pattern edges are only sharp over a limited depth of field • Efficient pixel labeling over a small depth range only • Closing lens aperture to increase depth of field not an option • Structured light ideas combined with stereo cameras • Most effective combination?
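The O(log n) figure comes from giving each projector column a binary label, one bit per projected image. A reflected binary (Gray) code is one common way to do this and is used below as an example; the slides do not say which coding is meant, and the function names are hypothetical.

```python
def patterns_needed(n_columns):
    """Number of binary patterns needed to label n_columns uniquely."""
    return max(1, (n_columns - 1).bit_length())


def gray_encode(index):
    """Reflected binary (Gray) code of a column index."""
    return index ^ (index >> 1)


def gray_decode(code):
    """Inverse of gray_encode."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def pattern_bits(column, n_patterns):
    """Bright (1) / dark (0) value seen at a column in each pattern,
    most significant pattern first."""
    code = gray_encode(column)
    return [(code >> k) & 1 for k in reversed(range(n_patterns))]


def column_from_bits(bits):
    """Recover the column index from the observed sequence of bits."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return gray_decode(code)


if __name__ == "__main__":
    n = patterns_needed(1024)
    print(n, pattern_bits(300, n), column_from_bits(pattern_bits(300, n)))
```

Neighbouring columns differ in only one bit of a Gray code, so a blurred pattern edge mislabels a pixel by at most one column, which matters given the limited depth of field of the projector.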
‘Competing’ techniques • Laser Range Finder • Produces depths ‘directly’ from time of flight or phase difference measurements • Single device • High precision scanning optics required • Limits portability and robustness • Slow • One point at a time • Very high potential accuracy • Interferometer (λ/n) accuracy possible • Time of flight systems limited by pulse length • High accuracy still possible! • Affected by reflectivity of targets • Sparse point clouds • Doesn’t need texture in the scene!
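For the time-of-flight case the relationship between timing and distance is simple enough to sketch. The light travels to the target and back, so range is c t / 2; taking c τ / 2 as the range resolution for a pulse of length τ is the usual rule of thumb and is an assumption here, as are the function names and sample values.

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second


def range_from_round_trip(round_trip_seconds):
    """Distance to the target from the measured round-trip time."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def range_resolution_from_pulse(pulse_length_seconds):
    """Rule-of-thumb range resolution for a pulse of the given duration."""
    return SPEED_OF_LIGHT * pulse_length_seconds / 2.0


if __name__ == "__main__":
    print(range_from_round_trip(1e-6))          # a 1 microsecond round trip
    print(range_resolution_from_pulse(1e-9))    # a 1 nanosecond pulse
```

A nanosecond corresponds to roughly 15 cm of range, which is why pulse length and timing electronics set the practical limit for this kind of device.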
Future work • Real-time environment maps • Very large numbers of trivial computations! • High degree of parallelism (esp CSM algorithm)! • Ideal application for custom hardware • Limited accuracy system is feasible on 2005 FPGA hardware • Current work • Efficient parallel algorithms • Concurrent Stereo Matching (EMMCVPR, Florida, Sept 2005) • Custom hardware implementation • Goal: Depth maps at 30 fps video rates (3D movies!) • Efficient optical systems • Manufacturable • Robust • Next stage • 3D streaming video with custom processor support