Accuracy in Real-Time Depth Maps • John MORRIS • Centre for Image Technology and Robotics (CITR), Computer Science/Electrical Engineering, University of Auckland, New Zealand • and School of Electrical and Electronics Engineering, Chung-Ang University, Seoul • [Photo: Iolanthe II drifting off Waiheke Is]
Outline • Background • Motivation • Problem • Collision Avoidance • Accuracy • Parallel Axes case • Verging Axes • Optimizing • Active Illumination • Algorithm Performance • Stereo Algorithms • Which one is best?
Motivation • Stereo Vision has many applications • Aerial Mapping • Forensics • Crime Scenes • Traffic Accidents • Mining • Mine face measurement • Civil Engineering • Structure monitoring • General Photogrammetry • Non contact measurement • Most of these are not time critical …
Motivation • Time-critical Applications • Most existing applications are not time critical • Several would benefit from real-time feedback as data was collected • Traffic accident scene assessment • There’s pressure to clear the scene and let traffic continue • Investigators have to rely on experience while taking images • Mining • Real-time feedback could direct machinery to follow a pre-determined plan • … • and then there’s … • Collision avoidance • Without real-time performance, it’s useless!
Motivation • Collision avoidance • Why stereo? • RADAR keeps airplanes from colliding • SONAR • Keeps soccer-playing robots from fouling each other • Guides your automatic vacuum cleaner • Active methods are fine for `sparse’ environments • Airplane density isn’t too large • Only 5 robots / team • Only one vacuum cleaner
Motivation • Collision avoidance • What about Seoul (Bangkok, London, New York, …) traffic? • How many vehicles can rely upon active methods? • Reflected pulse is many dB below probe pulse! • What fraction of other vehicles can use the same active method before even the most sophisticated detectors get confused? (and car insurance becomes unaffordable) • Sonar, in particular, is also subject to considerable environmental noise • Passive methods (sensor only) are the only ‘safe’ solution • In fact, with stereo, one technique for resolving problems may be assisted by environmental noise!
Stereo Photogrammetry • Pairs of images giving different views of the scene can be used to compute a depth (disparity) map • Key task – Correspondence • Locate matching regions in both images • Epipolar constraint • Align images so that matches must appear in the same scan line in L & R images
Depth Maps • Vision Research tends to be rather visual! • Tendency to publish images ‘proving’ efficacy, efficiency, etc • [Figure: depth maps. Ground Truth; Computed: Census; Computed: Pixel-to-Pixel] • Which is the better algorithm?
Performance and Accuracy • I will use • Performance to describe the quality of matching • For how many points was the distance computed correctly? • Metrics • % of good matches, • Standard deviation of matching error distribution • Function of the matching algorithm, image quality, etc • Accuracy for precision of depth measurements • Assuming a pixel is matched correctly, how accurate is the computed depth? • or • What is the resolution of depth measurements? • Metric • Error in depth - absolute or relative (% of measured depth) • Function of stereo configuration and sensor resolution (pixel number and size)
Accuracy • Traditional (film-based) stereophotogrammetry limited by film grain size • Small enough so that mechanical accuracy of the measuring equipment became the limiting factor • and • Accuracy was determined by your $ budget • More $s -> higher resolution equipment • Mapping • Digital cameras • discrete (large but shrinking!) pixels • significant accuracy considerations
Stereo Geometry • How accurate are these depth maps? • In collision avoidance, we need to know the current distance to an object and be able to derive our relative velocity • Example: • An object’s image ‘has a disparity of 20 pixels’ • Its image in the R image is displaced by 20 pixels relative to the L image • Accuracy of its position? • First approximation ~ 5% ( 1 / 20 ) • How do we obtain better accuracy?
Stereo Camera Configuration • Standard Case – Two cameras with parallel optical axes • b: baseline (camera separation) • θ: camera angular FoV • Dsens: sensor width • n: number of pixels • p: pixel width • f: focal length • a: object extent • D: distance to object
Stereo Camera Configuration • Standard Case – Two cameras with parallel optical axes • Rays are drawn through each pixel in the image • Ray intersections represent points imaged onto the centre of each pixel • Points along these lines have the same L-R displacement (disparity) • Distance, $z = \frac{b f}{p d}$ (b: baseline, f: focal length, p: pixel size, d: disparity) • Clearly depth resolution increases as the object gets closer to the camera • but • an object must fit into the Common Field of View
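The relation z = b f / (p d) can be sketched in a few lines to see how the depth step per pixel of disparity shrinks as an object gets closer. This is only an illustration: the function names and the camera values in the usage note are assumptions, not values from the slides.

```python
def depth_from_disparity(baseline_m, focal_length_m, pixel_size_m, disparity_px):
    """Depth z = b f / (p d) for parallel optical axes (d in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return baseline_m * focal_length_m / (pixel_size_m * disparity_px)


def depth_step(baseline_m, focal_length_m, pixel_size_m, disparity_px):
    """Change in depth when the disparity grows from d to d + 1 pixels."""
    far = depth_from_disparity(baseline_m, focal_length_m, pixel_size_m, disparity_px)
    near = depth_from_disparity(baseline_m, focal_length_m, pixel_size_m, disparity_px + 1)
    return far - near
```

For example, with an assumed 0.5 m baseline, 10 mm focal length and 10 µm pixels, `depth_from_disparity(0.5, 0.01, 1e-5, 20)` gives 25 m, and `depth_step` is smaller at a disparity of 40 than at 20, which is the "resolution increases as the object gets closer" effect.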
Depth Accuracy – Canonical Configuration • Given an object of an extent, a, there’s an optimum position for it! • Assuming baseline, b, can be varied • Common fallacy – just increase b to increase accuracy
Stereo Camera Configuration • This result is easily understood if you consider an object of extent, a • To be completely measured, it must lie in the Common Field of View • but • place it as close to the camera as you can so that you can obtain the best accuracy, say at D • Now increase b to increase the accuracy at D • But you must increase D so that the object stays within the CFoV! • Detailed analysis leads to the previous curve and an optimum value of b • [Figure: object of extent a at distance D, baseline b; points along the drawn lines have the same L-R displacement (disparity)]
Stereophotogrammetry vs Collision Avoidance • This result is more relevant for stereo photogrammetry • You are trying to accurately determine the geometry of some object • It’s fragile, dangerous, … and you must use non-contact measurement • For collision avoidance, you are more concerned with measuring the closest approach of an object (i.e. any point on the object!) • you can increase the baseline so that the critical point (at Dcritical) stays within the CFoV
Increasing the baseline • Increasing the baseline decreases performance!! • [Plot: % good matches against baseline, b. Images: ‘corridor’ set (ray-traced). Matching algorithms: P2P, SAD]
Increasing the baseline • Examine the distribution of errors • Increasing the baseline decreases performance!! • [Plot: standard deviation against baseline, b. Images: ‘corridor’ set (ray-traced). Matching algorithms: P2P, SAD]
Increased Baseline Decreased Performance • Reasons • Statistical • Higher disparity range • increased probability of matching incorrectly - you’ve simply got more choices! • Perspective • Scene objects are not fronto-planar • Angled to camera axes • subtend different numbers of pixels in L and R images • Scattering • Perfect scattering (Lambertian) surface assumption • OK at small angular differences • increasing failure at higher angles • Occlusions • Number of hidden regions increases as angular difference increases • increasing number of ‘monocular’ points for which there is no 3D information!
Accuracy in Collision Avoidance • Accuracy is important! • Your ability to calculate an optimum avoidance strategy depends on an accurate measure of the collision velocity • Luckily, accuracy does increase as an object approaches the critical region, but we’d still like to measure the collision velocity accurately at as large a distance as possible! • For parallel camera axes, $D = f b / d$ • where $d = x_L - x_R = n p$ • D: distance • f: focal length • b: baseline • d: measured disparity • $x_L$, $x_R$: position in L, R image • n: number of pixels • p: pixel size • Nice, simple (if reciprocal) relationship!
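As a worked step, the depth resolution follows from differentiating the parallel-axis relation with respect to the pixel count n. This is a first-order sketch that assumes the match itself is correct and that the only error is a quantisation of $\delta n$ pixels in the disparity.

\[
D = \frac{f b}{n p}
\quad\Rightarrow\quad
|\delta D| \approx \frac{f b}{p\, n^{2}}\,\delta n = \frac{D^{2} p}{f b}\,\delta n ,
\qquad
\frac{|\delta D|}{D} \approx \frac{\delta n}{n}
\]

With $\delta n = 1$ and $n = 20$ pixels the relative error is $1/20 = 5\%$. The absolute error grows as $D^{2}$, so halving the distance to an object quarters the depth step.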
Parallel Camera Axis Configuration • Accuracy depends on d - or the difference in image position in L and R images and, in a digital system, on the number of pixels in d • Measurable regions also must lie in the CFoV • This configuration is rather wasteful • Observe how much of the image planes of the two cameras is wasted! • [Figure: CFoV for parallel axes, with Dcritical marked]
Evolution • Human eyes ‘verge’ on an object to estimate its distance, i.e. the eyes fix on the object in the field of view • Parallel axes: configuration commonly used in stereo systems • Verging axes: configuration discovered by evolution millions of years ago • Note immediately that the CFoV is much larger!
Nothing is free! • Since the CFoV is much larger, more sensor pixels are being used and depth accuracy should increase • but • Geometry is much more complicated! • Position on the image planes of a point at (x,z) in the scene: • $x_L = \frac{f}{p} \tan\left( \arctan\frac{b+2x}{2z} - \phi \right)$ • $x_R = \frac{f}{p} \tan\left( \arctan\frac{b-2x}{2z} - \phi \right)$ • $\phi$: vergence angle • Does the increased accuracy warrant the additional computational complexity? • Note: In real fixed systems, computational complexity can be reduced, see the notes on real-time stereo!
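A small sketch of the verging-axis projection may help here. It assumes the two image-plane expressions above are the left and right image coordinates, both measured inward, so that the disparity is their sum. With a vergence angle of zero that sum reduces to the parallel-axis value b f / (p z). The function names are illustrative only.

```python
import math


def verging_image_coords(x, z, baseline, focal_length, pixel_size, vergence):
    """Pixel coordinates of a scene point (x, z) in the left and right images.

    Each camera is turned inward by `vergence` radians; 0 means parallel axes.
    """
    scale = focal_length / pixel_size
    x_left = scale * math.tan(math.atan((baseline + 2 * x) / (2 * z)) - vergence)
    x_right = scale * math.tan(math.atan((baseline - 2 * x) / (2 * z)) - vergence)
    return x_left, x_right


def verging_disparity(x, z, baseline, focal_length, pixel_size, vergence):
    """Disparity in pixels, assuming both coordinates are measured inward."""
    x_left, x_right = verging_image_coords(
        x, z, baseline, focal_length, pixel_size, vergence
    )
    return x_left + x_right
```

Setting `vergence = math.atan(baseline / (2 * z0))` fixes the cameras on the point (0, z0), which then has zero disparity. On-axis points nearer than z0 get positive disparity and farther points get negative disparity.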
Depth Accuracy OK - better … but it’s not exactly spectacular! Is it worth the additional computational load?
A minor improvement? • What happened? • As the cameras turn in, Dmin gets smaller! • If Dmin is the critical distance, D < Dmin isn’t useful! • This area is now wasted!
Look at the optical configuration! • If we increase f, then Dmin returns to the critical value! • [Figure: original f compared with increased f]
Depth Accuracy - Verging axes, increased f Now the depth accuracy has increased dramatically! Note that at large f, the CFoV does not extend very far!
Increased focal length • Lenses with large f • Thinner • Fewer aberrations • Better images • Cheaper? • Alternatively, lower pixel resolution can be used to achieve better depth accuracy ...
Zero disparity matching • With verging axes, at the fixation point, scene points appear with zero disparity (in the same place on both L and R images) • If the fixation point is set at some sub-critical distance (eg an ‘early warning’ point), then matching algorithms can focus on a small range of disparities about 0 • With verging axes, both +ve and -ve disparities appear • Potential for fast, high performance matching focussing on this region • Possible research project! • This is similar to the way our vision system works: we focus on the area around the fixation point and have a higher density of cones in the centre of our retina
Non-parallel axis geometry • Points with the same disparity lie on circles now • For parallel axes, they lie on straight lines • [Figure: loci for d = -1, d = 0 and d = +1]
Verging axis geometry • Points with the same disparity lie on Vieth-Müller circles with the baseline as a chord
Zero disparity matching (ZDM) • Using a fixation point in some critical region introduces the possibility of faster matching • It can alleviate the statistical factor reducing matching quality • You search over a restricted disparity range • Several ‘pyramidal’ matching techniques have been proposed (and success claimed!) for conventional parallel geometries • These techniques could be adapted to ZDM • Care: • It has no effect on the other three factors!
Why is stereo such a good candidate for dense collision avoidance applications? • One serious drawback • It doesn’t work with textureless or featureless regions • There’s nothing for the matching algorithm to match! • Active illumination • Impressing a textured pattern (basically any one will do!) on the scene • Several groups (including ours!) have demonstrated that this is effective - increasing matching performance significantly • Real benefit • Environmental ‘noise’ (ambient light patterns) does not interfere!! • In fact, it may provide the texture needed to assist matching • Thus multiple vehicles impressing ‘eye-safe’ (near IR) patterns onto the environment should only help each other
Metrics (Introduce some science!) • Compute the distribution of differences between depth maps derived for an algorithm and the ground truth • From this distribution, we can derive several measures: • % of good matches (error ≤ 0.5) • Histogram mean (bias) • Histogram Std Dev (spread) • Generally we have used the “% of good matches” metric • Mean and standard deviation are used as auxiliary metrics • Running time was also measured
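The three measures can be sketched as below. This is a minimal illustration, assuming the depth (disparity) maps are equal-sized lists of rows and that every pixel has a ground-truth value; occluded or invalid pixels are not handled. The function name and the dictionary keys are hypothetical.

```python
from statistics import mean, pstdev


def depth_map_metrics(computed, ground_truth, tolerance=0.5):
    """Compare a computed map with the ground truth, pixel by pixel."""
    errors = [
        c - g
        for computed_row, truth_row in zip(computed, ground_truth)
        for c, g in zip(computed_row, truth_row)
    ]
    good = sum(1 for e in errors if abs(e) <= tolerance)
    return {
        "good_match_percent": 100.0 * good / len(errors),  # |error| <= tolerance
        "bias": mean(errors),                               # histogram mean
        "spread": pstdev(errors),                           # histogram std dev
    }
```

A narrow error histogram centred on zero shows up as a bias near 0 and a small spread, alongside a high good-match percentage.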
Ray-Traced Images … • The ‘Corridor’ set are synthetic (perfect) images • Generated by ray-tracing software • Possible to corrupt them with various levels of noise to test robustness • [Figure: Corridor images at different noise levels, including SNR = +36 dB and SNR = 0 dB]
Algorithms Taxonomy • Area-based • Match regions in both images, eg 9×9 windows surrounding a pixel • Dense depth maps • A depth assigned to every pixel • Tend to have dataflow computation styles • Most suitable for hardware implementation • Feature-based • Look for features first, then attempt matches • eg edge-detect, then match edges • Sparse depth maps • Less suitable for hardware implementation • More branches in the logic • We concentrated on area-based algorithms • Our original goal was hardware (FPGA) implementation • Some trials on simple feature-based matching showed no improvement over area-based algorithms
Algorithms • Area-based • Correlation • A window is moved along a scanline in the R image until the best match with a similar-sized window in the L image is found • ‘Best match’ defined by various cost functions • Multiplicative correlation • Normalized squared differences • … (many other variations!) • Sum of absolute differences (SAD) • Ignore occlusions (pixels visible in one image only) • Dynamic • Attempt to find the best matching path through a region defined by corresponding pixel in the R image – maximum disparity, • Can recognize occlusions • … and many more (graph cut, pyramidal, optical flow, … )
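Window-based correlation with the SAD cost can be sketched as follows. The sketch assumes images are lists of rows of integer intensities, that a point at column c in the L image appears at column c - d in the R image, and that the caller keeps the window inside both images. The function names are illustrative, and occlusions are ignored, as noted above for correlation methods.

```python
def sad_cost(left, right, row, col, disparity, radius):
    """Sum of absolute differences over a (2r+1) x (2r+1) window."""
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            l_val = left[row + dy][col + dx]
            r_val = right[row + dy][col + dx - disparity]
            total += abs(l_val - r_val)
    return total


def best_disparity(left, right, row, col, max_disparity, radius=1):
    """Slide the window along the scanline and keep the lowest-cost disparity."""
    candidates = [
        d for d in range(max_disparity + 1) if col - d - radius >= 0
    ]
    return min(candidates, key=lambda d: sad_cost(left, right, row, col, d, radius))
```

Restricting `max_disparity` is the only tuning shown here. A larger range gives the matcher more wrong candidates to choose from, which is the statistical effect discussed for large baselines.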
Algorithms Evaluated • Area-based • Correlation • 3 different cost functions • Multiplicative correlation • Normalized squared differences • Sum of absolute differences (SAD) • Census • Reduces pixel intensity differences to a single bit • Counts bit differences • Claimed suitable for hardware implementation • Dynamic • Birchfield and Tomasi’s Pixel-to-Pixel chosen because it takes occlusions into account • Most others are too computationally expensive for real-time implementation • Even taking potential parallelism in hardware into account! • eg graph-cut (best results, but slow – >100s per image!)
Algorithm Details • Correlation Algorithm Cost Functions • Corr1 – Normalized intensity difference: $C(x,y,d) = \frac{\sum (I_L(x,y) - I_R(x,y-d))^2}{\sum I_L(x,y)^2 \, \sum I_R(x,y-d)^2}$ • Corr2 – Normalized multiplicative correlation: $C(x,y,d) = \frac{\sum I_L(x,y) \, I_R(x,y-d)}{\sum I_L(x,y)^2 \, \sum I_R(x,y-d)^2}$ • SAD: $C(x,y,d) = \sum |I_L(x,y) - I_R(x,y-d)|$ • Census • Rank ordering of pixel intensities over an inner window forms a ‘census vector’ (one bit / pixel in window) • Cost function is Hamming distance of these vectors • Summed over outer window
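The Census cost can be sketched as below. It assumes the common formulation in which each bit records whether a neighbour in the inner window is darker than the centre pixel; the slides do not say which variant was used, so treat this as an assumption. Images are lists of rows of integers, the function names are illustrative, and the windows are assumed to stay inside both images.

```python
def census_vector(image, row, col, radius):
    """Pack one bit per neighbour in the inner window: 1 if darker than centre."""
    centre = image[row][col]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            bit = 1 if image[row + dy][col + dx] < centre else 0
            bits = (bits << 1) | bit
    return bits


def hamming(a, b):
    """Number of differing bits between two census vectors."""
    return bin(a ^ b).count("1")


def census_cost(left, right, row, col, disparity, inner_radius, outer_radius):
    """Hamming distances of census vectors, summed over the outer window."""
    total = 0
    for dy in range(-outer_radius, outer_radius + 1):
        for dx in range(-outer_radius, outer_radius + 1):
            l_vec = census_vector(left, row + dy, col + dx, inner_radius)
            r_vec = census_vector(right, row + dy, col + dx - disparity, inner_radius)
            total += hamming(l_vec, r_vec)
    return total
```

Because only the ordering of intensities matters, adding a constant brightness offset to one image leaves the cost at the true disparity unchanged. The bit operations are also a natural fit for hardware.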
Typical set of experiments • Census algorithm • Two operational parameters • α – length of the census vector, ie size of the window over which a rank (ordering) transform is performed • β – size of correlation window • Trials were run for all reasonable combinations of the two parameters on all 6 test images • + one additional aerial photograph pair from IGN, Paris • These trials locate optimal values of the algorithm parameters • w, window ‘radius’ for simple correlation algorithms • (α, β) for Census • (kr matching reward, kocc occlusion penalty) for Pixel-to-pixel
Census Good Matches – Corridor • α = 4, β = 3 • [Plot: good match % against α and β]
Census Good Matches – All images • α = 4, β = 3 is close to best for all images • [Plot: good match % against α and β]
Census – Corridor - Metrics • Approaches 0 as expected for larger windows • Becomes smaller for larger windows • Narrower error peak centred on zero • Matching really is improving!
Pixel-to-Pixel • Birchfield and Tomasi • ‘Dynamic’ algorithm • Attempts to find the best matching ‘path’ • Cost function $\gamma(M) = N_{occ} k_{occ} - N_m k_r + \sum \text{dissimilarity}$ • $N_m$ – Number of matches • $N_{occ}$ – Number of occlusions • Variable parameters • $k_{occ}$ – Occlusion penalty • $k_r$ – Matching reward • Dissimilarity • Usually $| I_L - I_R |$ • Other variations possible • Sub-pixel matching, etc
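The cost of one candidate match sequence can be evaluated as sketched below. This only scores a given set of matches along a scanline; it is not the dynamic-programming search for the best path. The representation of a match as a `(left_column, right_column)` pair, the function name, and the use of |IL - IR| as the dissimilarity are assumptions made for the illustration.

```python
def match_sequence_cost(left_row, right_row, matches, n_occlusions, k_occ, k_r):
    """Cost = N_occ * k_occ - N_m * k_r + sum of pixel dissimilarities.

    `matches` is a list of (left_column, right_column) index pairs.
    """
    dissimilarity = sum(
        abs(left_row[x_left] - right_row[x_right]) for x_left, x_right in matches
    )
    return n_occlusions * k_occ - len(matches) * k_r + dissimilarity
```

For example, three perfect matches and one occlusion with k_occ = 30 and k_r = 8 cost 30 - 24 + 0 = 6. Raising k_occ makes the search more reluctant to declare occlusions, and raising k_r favours longer runs of matches.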
Pixel-to-Pixel Results • kocc = 30, kr = 8 produces good results for all images • [Plot: good match % against kr and kocc]
Correlation Results • Optimum, r ~ 4 (9×9 window) • [Plot: good match % against window radius, r, for a (2r+1)×(2r+1) window]
Compare algorithms • Measure the performance! • SAD performs as well as the others over a range of images!