Evaluation Techniques in Computer Vision EE4H, M.Sc 0407191 Computer Vision Dr. Mike Spann m.spann@bham.ac.uk http://www.eee.bham.ac.uk/spannm
Contents • Why evaluate? • Images – synthetic/natural? • Noise • Example 1. Evaluation of thresholding/segmentation methods • Example 2. Evaluation of optical flow methods
Why evaluate? • Computer vision algorithms are complex and difficult to analyse mathematically • Evaluation is usually through measurement of the algorithm’s performance on test images • Use of a range of images to establish performance envelope • Comparison with existing algorithms • Performance on degraded (noise-added) images (robustness) • Sensitivity to algorithm parameter settings
Test images • Real images • ‘Ground truth’ difficult to establish • Pseudo-real images • Could be synthetic objects moving against real background • Often a good compromise • Synthetic images • Noise and illumination variation over object surfaces hard to model realistically
Simple synthetic images • Simple ‘object-background’ synthetic images used to evaluate thresholding and segmentation algorithms • They obey a very simple image model (piecewise constant + Gaussian noise) • Unrealistic in practice – images are not like this!
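The piecewise constant + Gaussian noise model above can be sketched in a few lines. This is only an illustration: the function name, the image size, the grey levels and the square object shape are assumptions chosen for the example, not values from the slides.

```python
import numpy as np

def synthetic_object_image(size=64, object_level=160.0, background_level=96.0,
                           noise_sigma=10.0, seed=0):
    """Return (noisy image, ground-truth mask) for a square object on a flat background.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    truth = np.zeros((size, size), dtype=bool)
    q = size // 4
    truth[q:size - q, q:size - q] = True            # central square is the object
    clean = np.where(truth, object_level, background_level)
    noisy = clean + rng.normal(0.0, noise_sigma, clean.shape)  # additive iid Gaussian noise
    return noisy, truth
```

Because the ground-truth mask is returned alongside the image, any thresholding result can be scored against it directly.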
Simple synthetic images [Figure: three example images labelled zero noise, low noise and medium noise]
Pseudo-real images • More realistic object/background images are better for evaluating segmentation algorithms • Images of natural objects in natural illumination • Ground truth can be established using hand segmentation tools (such as those built into many image processing packages)
Pseudo-real images [Figure: four example images labelled Screws, Keys, Cars and Washers]
Simple synthetic edges • Again, piecewise constant + Gaussian noise image model • ‘Ideal’ step edge • Precise edge location but not achievable by finite aperture imaging systems
Simple synthetic edges [Figure: three example edge images labelled low noise, medium noise and high noise]
Pseudo-real edges • More realistic edge profiles can be created by smoothing an ideal step edge [Figure: step edge * Gaussian filter = smoothed edge profile]
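A minimal sketch of this smoothing step, assuming a 1D edge profile and a Gaussian kernel truncated at three standard deviations. The function names and default grey levels are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 3 sigma and normalised to unit sum."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smoothed_step_edge(length=101, low=50.0, high=200.0, sigma=2.0):
    """Convolve an ideal step edge with a Gaussian to get a more realistic profile."""
    step = np.where(np.arange(length) < length // 2, low, high)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(step, radius, mode="edge")   # replicate ends to avoid border dips
    return np.convolve(padded, kernel, mode="valid")
```

A larger `sigma` gives a wider, more gradual transition, loosely mimicking a larger imaging aperture.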
Pseudo-real movies • The ‘yosemite’ sequence is a computer-generated movie rendering a fly-through of the Yosemite valley • Background clouds are real • Enables true flow (ground truth) to be determined • Used extensively in the evaluation of optical flow algorithms • yosemite.avi • yosemite_flow.avi
Noise • Often used to evaluate the ‘robustness’ of algorithms • Additive noise usual in optical images but multiplicative is more realistic in sonar/radar images • Noise level proportional to signal level • Usual noise model is independent random variables (usually Gaussian) • Correlated noise often more realistic
Noise • Standard noise model is zero-mean, independent and identically distributed (iid) Gaussian (normal) random variables • Characterised by variance $\sigma^2$ • Probability distribution of rv’s: $p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right)$
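Noise • Noise level characterised by the signal-to-noise ratio (SNR) • Usually expressed in dB • Defined as: $\mathrm{SNR} = 10 \log_{10}\left(S / \sigma^2\right)$, where $\sigma^2$ is the noise variance • $S$ is the mean-square grey level defined (for an $N \times N$ pixel image $g(x,y)$) as $S = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} g(x,y)^2$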
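As a sketch, noise can be added at a requested SNR by solving the dB definition for the noise variance. This assumes the SNR is 10 log10 of the mean-square grey level divided by the noise variance. The function names are hypothetical.

```python
import numpy as np

def mean_square_grey_level(image):
    """Mean of the squared grey levels over the whole image."""
    return float(np.mean(np.asarray(image, dtype=float) ** 2))

def add_gaussian_noise(image, snr_db, seed=0):
    """Add zero-mean iid Gaussian noise so that the image has the requested SNR (dB)."""
    image = np.asarray(image, dtype=float)
    signal = mean_square_grey_level(image)
    variance = signal / (10.0 ** (snr_db / 10.0))   # invert SNR = 10 log10(S / variance)
    rng = np.random.default_rng(seed)
    return image + rng.normal(0.0, np.sqrt(variance), image.shape)

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, treating (noisy - clean) as the noise."""
    noise = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return float(10.0 * np.log10(mean_square_grey_level(clean) / np.mean(noise ** 2)))
```

Under this definition, 0 dB means the noise variance equals the mean-square grey level, and every additional 10 dB reduces the noise variance by a factor of ten.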
Noise [Figure: example image shown at three signal-to-noise ratios; surviving panel labels include 30dB and 0dB]
Noise (mean-square error) • We can regard the mean-square error (difference) between 2 images as noise • Often used to evaluate image compression algorithms in comparing the original and decompressed images • Image differences can also be expressed as the peak-signal-to-noise-ratio (PSNR) in dB by taking the signal level as 255
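A short sketch of both measures for 8-bit images. The function names are hypothetical, and the PSNR uses the conventional form 10 log10(255² / MSE).

```python
import numpy as np

def mse(image_a, image_b):
    """Mean-square error (difference) between two images of the same shape."""
    a = np.asarray(image_a, dtype=float)
    b = np.asarray(image_b, dtype=float)
    return float(np.mean((a - b) ** 2))

def psnr_db(image_a, image_b, peak=255.0):
    """Peak signal-to-noise ratio in dB, taking the signal level as `peak`."""
    err = mse(image_a, image_b)
    if err == 0.0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(peak ** 2 / err))
```

For a compression test, `image_a` would be the original and `image_b` the decompressed image.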
Other types of noise • The other main category of (additive) noise is impulse (sometimes called ‘salt and pepper’) noise • Characterised by the impulse rate (spatial density of noise impulses) and mean square amplitude of impulse • Can normally be easily filtered out using median filters
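The following sketch adds impulse noise at a given rate and removes it with a 3x3 median filter written in plain NumPy. The equal split between salt (255) and pepper (0), and the function names, are assumptions made for the example.

```python
import numpy as np

def add_salt_and_pepper(image, rate, seed=0):
    """Set a fraction `rate` of pixels to 0 or 255 (assumed equally likely)."""
    rng = np.random.default_rng(seed)
    noisy = np.array(image, dtype=float, copy=True)
    hits = rng.random(noisy.shape) < rate        # which pixels receive an impulse
    salt = rng.random(noisy.shape) < 0.5         # impulse polarity
    noisy[hits & salt] = 255.0
    noisy[hits & ~salt] = 0.0
    return noisy

def median_filter_3x3(image):
    """3x3 median filter with replicated borders."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    padded = np.pad(image, 1, mode="edge")
    # Stack the nine shifted copies of the image and take the median across them.
    windows = [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0)
```

An isolated impulse occupies only one of the nine values in any window, so the median ignores it. The filter begins to fail only when impulses of the same polarity fill most of a window.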
Other types of noise [Figure: three images labelled original, salt and pepper noise, and de-speckled]
Other types of noise • There are many other types of noise which can be considered in algorithm evaluation • Essentially more sophisticated and realistic probability distributions of noise rv’s • For example a ‘generalised’ Gaussian model is often considered to model ‘heavy’ tailed distributions • However, in my humble opinion, a more realistic source of noise is the deviation away from the ‘ideal’ of the illumination variation across object surfaces
Evaluation of thresholding & segmentation methods • Segmentation and thresholding algorithms essentially group pixels into regions (or classes) • Simplest case is object/background • Simple evaluation metrics just quantify the number of misclassified pixels • For basic image models such as constant greylevel in object/background regions plus iid Gaussian noise, the probability of error can be computed analytically
Evaluation of thresholding & segmentation methods • For a simple object/background image :
Evaluation of thresholding & segmentation methods • Misclassification probability is a function of a threshold T • For a simple constant region greylevel model plus additive iid Gaussian noise we can easily derive an analytical expression for this probability • Not very useful in practice as the image model is limited and we also require the ground truth • More useful simply to measure the misclassification error as a function of threshold
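For reference, one form the analytical expression can take under the constant greylevel plus iid Gaussian noise model is sketched below. The symbols are assumptions introduced here, not notation from the slides: object and background grey levels $\mu_o > \mu_b$, noise standard deviation $\sigma$, prior probabilities $P_o$ and $P_b$ of a pixel being object or background, and $\Phi$ the standard normal cumulative distribution function. A pixel is assumed to be labelled object when its grey level exceeds T.

```latex
% Misclassification probability as a function of threshold T.
% First term: object pixels falling below T (labelled background).
% Second term: background pixels exceeding T (labelled object).
P_{\mathrm{err}}(T) = P_o \, \Phi\!\left( \frac{T - \mu_o}{\sigma} \right)
                    + P_b \left[ 1 - \Phi\!\left( \frac{T - \mu_b}{\sigma} \right) \right]
```

With equal priors, this expression is minimised at the midpoint $T = (\mu_o + \mu_b)/2$.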
Evaluation of thresholding & segmentation methods • Usual to represent correct classification probabilities and false alarm probabilities jointly within a receiver operating characteristic (ROC) curve • For example, the ROC curve shows how these vary as a function of threshold for an object/background classification
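A minimal sketch of measuring the ROC empirically from an image and its ground-truth mask. It assumes the object is brighter than the background, so pixels above the threshold are labelled object. Here 'correct classification' is taken to mean object pixels labelled object. The function name is hypothetical.

```python
import numpy as np

def roc_points(image, truth, thresholds=range(256)):
    """Return a list of (T, prob. of false alarm, prob. of correct classification)."""
    image = np.asarray(image, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    points = []
    for t in thresholds:
        detected = image > t                       # pixels labelled 'object' at this T
        p_correct = float(np.mean(detected[truth]))    # object pixels labelled object
        p_false = float(np.mean(detected[~truth]))     # background pixels labelled object
        points.append((t, p_false, p_correct))
    return points
```

Plotting the second value against the third for every threshold traces the curve. A curve that hugs the top-left corner indicates classes that are easy to separate.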
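Evaluation of thresholding & segmentation methods [Figure: ROC curve. Vertical axis: prob. of correct classification (0.0 to 1.0). Horizontal axis: prob. of false alarm (0.0 to 1.0). The ends of the curve are labelled T=0 and T=255]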
Evaluation of thresholding & segmentation methods • More useful methods of evaluation can be found by taking account of the application of the segmentation • Segmentation is rarely an end in itself but a component in an overall machine vision system • Also, the level of under- or over- segmentation of an algorithm needs to be determined
Evaluation of thresholding & segmentation methods [Figure: three images labelled ground truth, under-segmentation and over-segmentation]
Evaluation of thresholding & segmentation methods • Under-segmentation is bad as distinct regions are merged • Over-segmentation can be acceptable as sub-regions comprising a single ground truth region can be merged using ‘high’ level knowledge • Also, the level of over-segmentation can be controlled by parameter settings of the algorithm
Evaluation of thresholding & segmentation methods • A possible segmentation metric is to quantify correctly detected regions, over-segmentation and under-segmentation • Depends upon some threshold setting T • Region rather than pixel based • Used in Koester and Spann’s paper (IEEE Trans. PAMI, 2000) to evaluate range image segmentations
Evaluation of thresholding & segmentation methods • Correct detection • At least T % of the pixels in region k of the segmented image are marked as pixels in region j of the ground truth image • And vice versa [Figure: segmentation and GT image]
Evaluation of thresholding & segmentation methods • Over-segmentation • Region j in the ground truth image corresponds to regions k1, k2… km in the segmented image if : • At least T % of the pixels in region ki are marked as pixels of region j • At least T % of the pixels in region j are marked as pixels in the union of regions k1, k2… km
Evaluation of thresholding & segmentation methods [Figure: over-segmentation example, GT image and segmentation]
Evaluation of thresholding & segmentation methods • Under-segmentation • Regions j1, j2… jm in the ground truth image correspond to region k in the segmented image if : • At least T % of the pixels in region k are marked as pixels in the union of regions j1, j2… jm • At least T % of the pixels in region ji are marked as pixels in region k
Evaluation of thresholding & segmentation methods [Figure: under-segmentation example, GT image and segmentation]
Evaluation of thresholding & segmentation methods • The metric also allows us to quantify missed and noise regions • Missed regions – regions in the ground truth image not found in the segmented image • Noise regions – regions in the segmented image not found in the ground truth image • Overall, the average number of correct, over, under, missed and noise regions can be quantified over an image database and different algorithms compared
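The region-based criteria above can be sketched as follows for two integer label images. This is a simplified reading of the slides, not the implementation from the cited paper. T is expressed as a fraction rather than a percentage, every label (including any background label) is treated as a region, and the function names are hypothetical.

```python
import numpy as np

def overlap_table(seg, gt):
    """Pixel counts shared by each (segmented region k, ground-truth region j) pair."""
    seg_ids, seg_idx = np.unique(np.asarray(seg).ravel(), return_inverse=True)
    gt_ids, gt_idx = np.unique(np.asarray(gt).ravel(), return_inverse=True)
    table = np.zeros((len(seg_ids), len(gt_ids)), dtype=int)
    np.add.at(table, (seg_idx, gt_idx), 1)
    return seg_ids, gt_ids, table

def classify_regions(seg, gt, T=0.8):
    """Count correct, over-, under-segmented, missed and noise regions."""
    seg_ids, gt_ids, n = overlap_table(seg, gt)
    seg_size = n.sum(axis=1)
    gt_size = n.sum(axis=0)
    frac_of_seg = n / seg_size[:, None]   # share of region k lying inside region j
    frac_of_gt = n / gt_size[None, :]     # share of region j lying inside region k
    used_seg, used_gt = set(), set()
    counts = {"correct": 0, "over": 0, "under": 0}

    # Correct detection: k and j cover at least T of each other.
    for k, j in zip(*np.nonzero((frac_of_seg >= T) & (frac_of_gt >= T))):
        counts["correct"] += 1
        used_seg.add(int(k))
        used_gt.add(int(j))

    # Over-segmentation: several segmented regions jointly cover one GT region.
    for j in range(len(gt_ids)):
        if j in used_gt:
            continue
        parts = np.nonzero(frac_of_seg[:, j] >= T)[0]
        if len(parts) >= 2 and n[parts, j].sum() / gt_size[j] >= T:
            counts["over"] += 1
            used_gt.add(j)
            used_seg.update(int(p) for p in parts)

    # Under-segmentation: one segmented region swallows several GT regions.
    for k in range(len(seg_ids)):
        if k in used_seg:
            continue
        parts = np.nonzero(frac_of_gt[k, :] >= T)[0]
        if len(parts) >= 2 and n[k, parts].sum() / seg_size[k] >= T:
            counts["under"] += 1
            used_seg.add(k)
            used_gt.update(int(p) for p in parts)

    counts["missed"] = len(gt_ids) - len(used_gt)    # GT regions never matched
    counts["noise"] = len(seg_ids) - len(used_seg)   # segmented regions never matched
    return counts
```

Averaging the returned counts over an image database gives the per-algorithm figures described above. Values of T above 0.5 keep the correct-detection matches one-to-one.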
Evaluation of optical flow methods • Optical flow algorithms compute the 2D optical flow vector at each pixel using consecutive frames in a video sequence • Optical flow algorithms are notoriously un-robust • Crucial to evaluate the effectiveness of any method used (or any new method devised) • Usually ground truth difficult to come by
Evaluation of optical flow methods • This simple error measurement naturally amplifies errors when the flow vectors are large (for the same relative flow error) • Can normalize the error by the product of the magnitudes of the ground truth flow and flow estimate
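The surviving text does not define the simple error measurement. The sketch below assumes it is the magnitude of the difference between the estimated and ground truth flow vectors. For the normalised form, it assumes the squared difference is divided by the product of the two magnitudes. Both choices, and the function name, are illustrative assumptions.

```python
import numpy as np

def flow_errors(estimate, truth, eps=1e-12):
    """Return (difference magnitude, normalised error) per pixel.

    estimate, truth: arrays of shape (rows, cols, 2) holding (u, v) flow vectors.
    The definitions used here are assumptions, see the accompanying text.
    """
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    diff = np.linalg.norm(estimate - truth, axis=-1)
    magnitude_product = (np.linalg.norm(estimate, axis=-1)
                         * np.linalg.norm(truth, axis=-1))
    normalised = diff ** 2 / (magnitude_product + eps)   # eps guards against zero flow
    return diff, normalised
```

Scaling both flows by the same factor scales the first measure but leaves the second unchanged, which is the property the normalisation is after.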
Evaluation of optical flow methods • Often the ground truth is not available • A useful (but often crude) way of comparing the quality of two optical flow fields is to compute the displaced frame difference (DFD) statistic for each • Uses the two consecutive frames of a sequence from which the flows were computed
Evaluation of optical flow methods • DFD is a crude estimate because it says nothing about the accuracy of the motion field directly – just the quality of the pixel mapping from one frame to the next • Plus it says nothing about the confidence attached to optical flow estimates • However, it is the basis of motion compensation algorithms for most of the current video compression standards (MPEG, H.261 etc.)
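A minimal sketch of a DFD statistic, assuming it is the mean-square difference between the first frame and the second frame sampled at the flow-displaced positions. Nearest-pixel sampling and border clamping are simplifications, and the function name is hypothetical.

```python
import numpy as np

def displaced_frame_difference(frame1, frame2, flow):
    """Mean-square difference between frame1(x, y) and frame2(x + u, y + v).

    flow has shape (rows, cols, 2): flow[..., 0] is the horizontal (u) component
    and flow[..., 1] the vertical (v) component, in pixels.
    """
    frame1 = np.asarray(frame1, dtype=float)
    frame2 = np.asarray(frame2, dtype=float)
    flow = np.asarray(flow, dtype=float)
    rows, cols = frame1.shape
    y, x = np.mgrid[0:rows, 0:cols]
    # Round displaced coordinates to the nearest pixel and clamp to the image.
    x2 = np.clip(np.rint(x + flow[..., 0]).astype(int), 0, cols - 1)
    y2 = np.clip(np.rint(y + flow[..., 1]).astype(int), 0, rows - 1)
    return float(np.mean((frame1 - frame2[y2, x2]) ** 2))
```

Given two candidate flow fields for the same frame pair, the one with the lower DFD maps pixels more consistently, although that does not guarantee it is closer to the true motion.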
Evaluation of optical flow methods • In optical flow estimation, as in other types of estimation algorithms, we are often interested in the quality of the estimates • In classic estimation theory, we often compute confidence limits on estimates • We can say with a certain degree of confidence (say 90%) that the parameter lies within certain bounds • We usually assume that the quantities we are estimating follow some known probability distribution (for example chi-squared)
Evaluation of optical flow methods • In the case of optical flow vectors, confidence regions are ellipses in 2 dimensions • They essentially characterise the distribution of the estimation error • Assuming a normal distribution of the flow error, confidence ellipses can be drawn for any confidence limit • Orientation and shape of ellipses determined by the covariance matrix defining the normal distribution • The eigenvalues of the covariance matrix, scaled according to the chosen confidence limit, give the lengths of the ellipse axes
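As a sketch, the ellipse for a given confidence limit can be computed from the eigen-decomposition of a 2x2 error covariance matrix. For a 2D normal distribution the squared Mahalanobis distance is chi-squared with 2 degrees of freedom, so its quantile at confidence p is -2 ln(1 - p). The function name is hypothetical.

```python
import numpy as np

def confidence_ellipse(cov, confidence=0.9):
    """Return (semi-axis lengths, axis directions) for a 2x2 covariance matrix.

    Directions are the columns of the returned matrix, ordered to match the
    semi-axis lengths (smallest first, as returned by numpy.linalg.eigh).
    """
    cov = np.asarray(cov, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    scale = -2.0 * np.log(1.0 - confidence)     # chi-squared quantile, 2 dof
    semi_axes = np.sqrt(scale * eigenvalues)
    return semi_axes, eigenvectors
```

A flow estimate whose ellipse is large compared with the flow vector itself can be rejected, which is one way to implement confidence thresholding.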
Evaluation of optical flow methods [Figure: nested confidence ellipses at the 70%, 90% and 99% confidence limits]
Evaluation of optical flow methods [Figure: four panels labelled Yosemite, Yosemite true flow, Yosemite flow (L&K), and Yosemite flow (L&K) confidence thresholded]
Conclusions • Evaluation in computer vision is a difficult and often controversial topic • I would suggest 3 rules of thumb to consider when evaluating your work for the purposes of assignments • Consider carefully your test data. Make it as realistic as possible • Make your evaluations as much as possible ‘application driven’ • Make your algorithms ‘self evaluating’ if possible through the use of confidence statistics