Evaluation Techniques in Computer Vision

EE4H, M.Sc 0407191 Computer Vision • Dr. Mike Spann • m.spann@bham.ac.uk • http://www.eee.bham.ac.uk/spannm

Contents • Why evaluate? • Images – synthetic/natural? • Noise • Example 1. Evaluation of thresholding/segmentation methods • Example 2. Evaluation of optical flow methods

Why evaluate? • Computer vision algorithms are complex and difficult to analyse mathematically • Evaluation is usually carried out by measuring the algorithm’s performance on test images • A range of images is used to establish the performance envelope • Comparison with existing algorithms • Performance on degraded (noise-added) images (robustness) • Sensitivity to algorithm parameter settings

Test images • Real images • ‘Ground truth’ difficult to establish • Pseudo-real images • Could be synthetic objects moving against real background • Often a good compromise • Synthetic images • Noise and illumination variation over object surfaces hard to model realistically

Simple synthetic images • Simple ‘object-background’ synthetic images used to evaluate thresholding and segmentation algorithms • They obey a very simple image model (piecewise constant + Gaussian noise) • Unrealistic in practice – images are not like this!

Simple synthetic images [Figure: zero noise, low noise and medium noise examples]
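A minimal sketch of how such a test image might be generated, assuming NumPy and a square object centred in the frame. The function name `synthetic_object_image` and the grey levels 150/100 are illustrative choices, not values from the slides; the returned mask serves as the ground truth.

```python
import numpy as np

def synthetic_object_image(size=64, obj_level=150.0, bg_level=100.0,
                           noise_sigma=10.0, seed=0):
    """Piecewise-constant object/background image plus iid Gaussian noise.

    Hypothetical helper: a square object is placed in the centre. Returns the
    noisy image (clipped to 0..255) and the ground-truth mask (True = object).
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((size, size), dtype=bool)
    q = size // 4
    mask[q:size - q, q:size - q] = True           # central square object
    clean = np.where(mask, obj_level, bg_level)   # piecewise-constant model
    noisy = clean + rng.normal(0.0, noise_sigma, clean.shape)
    return np.clip(noisy, 0, 255), mask

# Zero, low and medium noise versions of the same scene (sigma values assumed)
images = {s: synthetic_object_image(noise_sigma=s)[0] for s in (0.0, 10.0, 30.0)}
```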

Pseudo-real images • More realistic object/background images are better suited to evaluating segmentation algorithms • Images of natural objects in natural illumination • Ground truth can be established using hand segmentation tools (such as those built into many image processing packages)

Pseudo-real images [Figure: example images of screws, keys, cars and washers]

Simple synthetic edges • Again, a piecewise constant + Gaussian noise image model • ‘Ideal’ step edge • Precise edge location, but not achievable by finite-aperture imaging systems

Simple synthetic edges [Figure: low noise, medium noise and high noise examples]

Pseudo-real edges • More realistic edge profiles can be created by smoothing an ideal step edge with a Gaussian filter [Figure: step edge * Gaussian filter = smoothed edge]
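The step-edge-plus-smoothing construction can be sketched in one dimension as follows, assuming NumPy. `pseudo_real_edge`, `gaussian_kernel`, the grey levels and σ = 2 are hypothetical illustrative choices. For a 2D image the same 1D kernel can be applied separably along rows and columns.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel truncated at +/- 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def pseudo_real_edge(length=64, low=50.0, high=200.0, sigma=2.0):
    """1D edge profile: an ideal step convolved with a Gaussian filter."""
    step = np.where(np.arange(length) < length // 2, low, high)  # ideal step edge
    k = gaussian_kernel(sigma)
    # Replicate the end values so the profile is not pulled towards zero at the ends
    padded = np.pad(step, len(k) // 2, mode="edge")
    return np.convolve(padded, k, mode="valid")   # same length as the step

profile = pseudo_real_edge()
```

Noise can then be added to the smoothed profile exactly as for the ideal step.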

Pseudo-real movies • The ‘yosemite’ sequence is a computer-generated rendering of a fly-through of the Yosemite valley • The background clouds are real • Enables the true flow (ground truth) to be determined • Used extensively in the evaluation of optical flow algorithms • yosemite.avi • yosemite_flow.avi

Noise • Often used to evaluate the ‘robustness’ of algorithms • Additive noise is usual in optical images, but multiplicative noise is more realistic in sonar/radar images • Multiplicative noise: noise level proportional to signal level • The usual noise model is independent random variables (usually Gaussian) • Correlated noise is often more realistic
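A sketch contrasting the two noise models, assuming NumPy and, for the multiplicative case, the form f(1 + n) with zero-mean Gaussian n. That form is one common model chosen here for illustration, not necessarily the one appropriate for a particular sonar/radar system, and the helper names are hypothetical. The additive noise has the same standard deviation at every grey level, while the multiplicative noise scales with the signal.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_additive_noise(img, sigma):
    """Additive iid Gaussian noise: independent of the signal level."""
    return img + rng.normal(0.0, sigma, img.shape)

def add_multiplicative_noise(img, sigma):
    """Assumed multiplicative model f * (1 + n): noise std proportional to f."""
    return img * (1.0 + rng.normal(0.0, sigma, img.shape))

dark = np.full((200, 200), 20.0)     # flat dark region
bright = np.full((200, 200), 200.0)  # flat bright region, 10x the signal level
```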

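Noise • The standard noise model is zero-mean, independent and identically distributed (iid) Gaussian (normal) random variables • Characterised by the variance $\sigma^2$ • Probability distribution of the rv’s: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$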

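Noise • The noise level is characterised by the signal-to-noise ratio (SNR) • Usually expressed in dB • Defined as: $\mathrm{SNR} = 10\log_{10}\left(\frac{S^2}{\sigma^2}\right)$ dB • $S^2$ is the mean-square grey level, defined (for an $N \times M$ pixel image) as $S^2 = \frac{1}{NM}\sum_{x=1}^{N}\sum_{y=1}^{M} f(x,y)^2$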
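With the SNR defined as 10 log10 of the mean-square grey level over the noise variance, the noise standard deviation needed for a target SNR follows by inverting the definition. A minimal NumPy sketch; the function names and the flat test image are hypothetical.

```python
import numpy as np

def mean_square_level(img):
    """Mean-square grey level of an image."""
    return float(np.mean(np.asarray(img, dtype=float) ** 2))

def snr_db(clean, sigma):
    """SNR in dB: 10 log10(mean-square grey level / noise variance)."""
    return 10.0 * np.log10(mean_square_level(clean) / sigma ** 2)

def sigma_for_snr(clean, target_db):
    """Noise standard deviation that gives the requested SNR (in dB)."""
    return float(np.sqrt(mean_square_level(clean) / 10.0 ** (target_db / 10.0)))

def add_noise_at_snr(clean, target_db, rng):
    """Add zero-mean iid Gaussian noise at the requested SNR."""
    sigma = sigma_for_snr(clean, target_db)
    return clean + rng.normal(0.0, sigma, np.shape(clean))

clean = np.full((64, 64), 100.0)          # hypothetical flat test image
noisy_30db = add_noise_at_snr(clean, 30.0, np.random.default_rng(0))
```

For this image (mean-square level 10⁴), 30 dB corresponds to σ = √10 ≈ 3.16 and 0 dB to σ = 100.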

Noise [Figure: noisy images at different SNRs, including 30 dB and 0 dB]

Noise (mean-square error) • We can regard the mean-square error (difference) between 2 images as noise • Often used to evaluate image compression algorithms by comparing the original and decompressed images • Image differences can also be expressed as the peak signal-to-noise ratio (PSNR) in dB by taking the signal level as 255 (the maximum grey level of an 8-bit image)

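Noise (mean-square error) • $\mathrm{MSE} = \frac{1}{NM}\sum_{x=1}^{N}\sum_{y=1}^{M}\left(f_1(x,y) - f_2(x,y)\right)^2$ • $\mathrm{PSNR} = 10\log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right)$ dB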
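A NumPy sketch of the mean-square error and PSNR with the signal level taken as 255; `mse` and `psnr_db` are hypothetical names. Converting to floating point before subtracting avoids wrap-around with 8-bit (`uint8`) images.

```python
import numpy as np

def mse(f1, f2):
    """Mean-square difference between two images of equal size."""
    d = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    return float(np.mean(d ** 2))

def psnr_db(f1, f2, peak=255.0):
    """Peak signal-to-noise ratio in dB, taking the signal level as `peak`."""
    e = mse(f1, f2)
    if e == 0:
        return float("inf")        # identical images: no noise at all
    return 10.0 * np.log10(peak ** 2 / e)
```

For example, comparing an 8-bit original with its decompressed version is simply `psnr_db(original, decompressed)`.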

Other types of noise • The other main category of (additive) noise is impulse (sometimes called ‘salt and pepper’) noise • Characterised by the impulse rate (spatial density of noise impulses) and the mean-square impulse amplitude • Can normally be filtered out easily using median filters

Other types of noise [Figure: original; salt and pepper noise; de-speckled]
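A minimal sketch of impulse noise and its removal, assuming NumPy. The helpers are hypothetical and a plain 3×3 median is written out for clarity; in practice a library routine such as `scipy.ndimage.median_filter` would normally be preferred.

```python
import numpy as np

def add_salt_and_pepper(img, rate, rng):
    """Set a fraction `rate` of pixels to 0 or 255 at random (impulse noise)."""
    out = img.copy()
    hit = rng.random(img.shape) < rate
    out[hit] = rng.choice(np.array([0, 255], dtype=img.dtype), size=hit.sum())
    return out

def median_filter3(img):
    """3x3 median filter; borders handled by replicating edge pixels."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted copies of the image and take the median per pixel
    stack = np.stack([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0).astype(img.dtype)

rng = np.random.default_rng(2)
original = np.full((64, 64), 128, dtype=np.uint8)
noisy = add_salt_and_pepper(original, 0.05, rng)   # 5% impulse rate (assumed)
despeckled = median_filter3(noisy)
```

An isolated impulse is outvoted by its eight neighbours, which is why the median removes it while a linear smoothing filter would only spread it out.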

Other types of noise • There are many other types of noise which can be considered in algorithm evaluation • Essentially more sophisticated and realistic probability distributions of the noise rv’s • For example, a ‘generalised’ Gaussian model is often used to model ‘heavy’-tailed distributions • However, in my humble opinion, a more realistic source of noise is the deviation away from the ‘ideal’ of the illumination variation across object surfaces
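As an illustration of the ‘generalised’ Gaussian, one common parameterisation (an assumption; the slides do not give one) is the generalised normal density with scale α and shape β:

```latex
p(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
       \exp\!\left(-\left(\frac{|x|}{\alpha}\right)^{\beta}\right),
\qquad \alpha > 0,\ \beta > 0
```

With β = 2 this reduces to the zero-mean Gaussian with σ² = α²/2; β = 1 gives the Laplacian, and β < 2 in general gives heavier tails than the Gaussian.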

Other types of noise

Evaluation of thresholding & segmentation methods • Segmentation and thresholding algorithms essentially group pixels into regions (or classes) • The simplest case is object/background • Simple evaluation metrics just quantify the number of misclassified pixels • For basic image models, such as a constant grey level in object/background regions plus iid Gaussian noise, the probability of error can be computed analytically

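Evaluation of thresholding & segmentation methods • For a simple object/background image, with pixels labelled object when their grey level exceeds a threshold T (object brighter than background), the probability of error is: $P_e(T) = P(O)\int_{-\infty}^{T} p(x \mid O)\,dx + P(B)\int_{T}^{\infty} p(x \mid B)\,dx$ • $P(O)$ and $P(B)$ are the prior probabilities of object and background pixels, and $p(x \mid O)$, $p(x \mid B)$ the grey-level distributions in each region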

Evaluation of thresholding & segmentation methods • The misclassification probability $P_e(T)$ is a function of the threshold T • For a simple constant-region grey-level model plus additive iid Gaussian noise, we can easily derive an analytical expression for $P_e(T)$ • This is not very useful in practice, as the image model is limited and we also require the ground truth • It is more useful simply to measure the misclassification error as a function of threshold
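Under the model of a bright object (mean grey level μ_o) on a darker background (mean μ_b) with iid Gaussian noise of standard deviation σ, and labelling pixels above T as object, the error probability is P(O)Φ((T − μ_o)/σ) + P(B)[1 − Φ((T − μ_b)/σ)], where Φ is the standard normal CDF. The sketch below evaluates this analytically and compares it with the measured misclassification rate on a synthetic image. The object-brighter convention, the function names and the parameter values are assumptions for illustration.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def error_probability(T, mu_obj, mu_bg, sigma, p_obj=0.5):
    """Analytical P_e(T), assuming a bright object on a darker background.

    Pixels with grey level > T are labelled object, so errors are object
    pixels at or below T (misses) plus background pixels above T (false alarms).
    """
    miss = normal_cdf(T, mu_obj, sigma)
    false_alarm = 1.0 - normal_cdf(T, mu_bg, sigma)
    return p_obj * miss + (1.0 - p_obj) * false_alarm

def empirical_error(img, gt_mask, T):
    """Measured fraction of misclassified pixels when thresholding at T."""
    return float(np.mean((img > T) != gt_mask))

# Hypothetical test image: left half object (150), right half background (100)
rng = np.random.default_rng(0)
gt_mask = np.zeros((128, 128), dtype=bool)
gt_mask[:, :64] = True
img = np.where(gt_mask, 150.0, 100.0) + rng.normal(0.0, 20.0, gt_mask.shape)
errors = {T: empirical_error(img, gt_mask, T) for T in range(90, 161, 10)}
```

With equal priors the analytical minimum lies midway between the two grey levels (T = 125 here, where P_e ≈ 0.106).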

Evaluation of thresholding & segmentation methods • It is usual to represent correct classification probabilities and false alarm probabilities jointly as a receiver operating characteristic (ROC) curve • For example, the ROC shows how these vary as a function of threshold for an object/background classification

Evaluation of thresholding & segmentation methods [Figure: ROC curve of probability of correct classification (0.0 to 1.0) against probability of false alarm (0.0 to 1.0); T=0 at the top right, T=255 at the origin]
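The ROC can be traced by sweeping the threshold and recording, at each T, the fraction of object pixels correctly detected and the fraction of background pixels labelled object. A NumPy sketch, assuming (consistent with the figure) that pixels above T are labelled object; `roc_points` and `roc_area` are hypothetical helpers, the latter computing the area under the curve by the trapezoidal rule.

```python
import numpy as np

def roc_points(img, gt_mask, thresholds=range(256)):
    """Return an array of (P_false_alarm, P_correct) pairs, one per threshold.

    Assumes pixels with grey level > T are labelled object, so T = 0 gives
    (close to) the top-right corner (1, 1) and T = 255 the origin (0, 0).
    """
    obj = img[gt_mask]         # grey levels of true object pixels
    bg = img[~gt_mask]         # grey levels of true background pixels
    pts = [(np.mean(bg > T), np.mean(obj > T)) for T in thresholds]
    return np.array(pts)

def roc_area(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = pts[np.argsort(pts[:, 0], kind="stable")]   # order by false-alarm rate
    fa, pc = pts[:, 0], pts[:, 1]
    return float(np.sum(np.diff(fa) * (pc[1:] + pc[:-1]) / 2.0))

# Hypothetical test image: object level 150, background 100, noise sigma 20
rng = np.random.default_rng(3)
gt_mask = np.zeros((100, 100), dtype=bool)
gt_mask[:, :50] = True
img = np.clip(np.where(gt_mask, 150.0, 100.0)
              + rng.normal(0.0, 20.0, gt_mask.shape), 1, 254)
pts = roc_points(img, gt_mask)
```

An area close to 1 means object and background are well separated; an area of 0.5 corresponds to chance-level classification.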

Evaluation of thresholding & segmentation methods • More useful methods of evaluation can be found by taking into account the application of the segmentation • Segmentation is rarely an end in itself but a component in an overall machine vision system • Also, the level of under- or over-segmentation of an algorithm needs to be determined

Evaluation of thresholding & segmentation methods [Figure: ground truth; under-segmentation; over-segmentation]

Evaluation of thresholding & segmentation methods • Under-segmentation is bad, as distinct regions are merged • Over-segmentation can be acceptable, as sub-regions that together make up a single ground truth region can be merged using ‘high’-level knowledge • Also, the level of over-segmentation can be controlled by the parameter settings of the algorithm

Evaluation of thresholding & segmentation methods • A possible segmentation metric is to quantify correctly detected regions, over-segmentation and under-segmentation • Depends on a threshold setting T • Region-based rather than pixel-based • Used in Koester and Spann’s paper (IEEE Trans. PAMI, 2000) to evaluate range image segmentations

Evaluation of thresholding & segmentation methods • Correct detection • At least T % of the pixels in region k of the segmented image are marked as pixels in region j of the ground truth image • And vice versa [Figure: segmentation; GT image]

Evaluation of thresholding & segmentation methods • Over-segmentation • Region j in the ground truth image corresponds to regions k1, k2… km in the segmented image if : • At least T % of the pixels in region ki are marked as pixels of region j • At least T % of the pixels in region j are marked as pixels in the union of regions k1, k2… km

Evaluation of thresholding & segmentation methods [Figure: GT image; segmentation]

Evaluation of thresholding & segmentation methods • Under-segmentation • Regions j1, j2… jm in the ground truth image correspond to region k in the segmented image if : • At least T % of the pixels in region k are marked as pixels in the union of regions j1, j2… jm • At least T % of the pixels in region ji are marked as pixels in region k

Evaluation of thresholding & segmentation methods [Figure: GT image; segmentation]

Evaluation of thresholding & segmentation methods • The metric also allows us to quantify missed and noise regions • Missed regions – regions in the ground truth image not found in the segmented image • Noise regions – regions in the segmented image not found in the ground truth image • Overall, the average number of correct, over, under, missed and noise regions can be quantified over an image database and different algorithms compared
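A simplified sketch of the region-based classification described above, assuming NumPy and integer label images. It is not necessarily the exact procedure used by Koester and Spann: the order of the tests and the handling of conflicts are assumptions. The threshold is given as a fraction `t` (T % / 100) and is restricted to t > 0.5, so that each region can lie mostly inside at most one region of the other image. `region_metrics` is a hypothetical name.

```python
import numpy as np

def region_metrics(gt, seg, t=0.8):
    """Count correct, over-, under-segmented, missed and noise regions.

    Simplified sketch: correct detections are found first, then
    over-segmentations, then under-segmentations; leftovers are missed
    (ground truth) or noise (segmentation) regions.
    """
    assert 0.5 < t <= 1.0
    gt_ids, gt_idx = np.unique(gt.ravel(), return_inverse=True)
    seg_ids, seg_idx = np.unique(seg.ravel(), return_inverse=True)
    # overlap[j, k] = number of pixels in GT region j and segmented region k
    overlap = np.zeros((len(gt_ids), len(seg_ids)), dtype=np.int64)
    np.add.at(overlap, (gt_idx, seg_idx), 1)
    gt_size = overlap.sum(axis=1)
    seg_size = overlap.sum(axis=0)

    # in_gt[j, k]: at least t of segmented region k lies in GT region j
    in_gt = overlap >= t * seg_size[np.newaxis, :]
    # in_seg[j, k]: at least t of GT region j lies in segmented region k
    in_seg = overlap >= t * gt_size[:, np.newaxis]

    result = {"correct": 0, "over": 0, "under": 0, "missed": 0, "noise": 0}
    gt_used = np.zeros(len(gt_ids), dtype=bool)
    seg_used = np.zeros(len(seg_ids), dtype=bool)

    # Correct detection: the overlap test holds in both directions
    for j, k in zip(*np.nonzero(in_gt & in_seg)):
        result["correct"] += 1
        gt_used[j] = seg_used[k] = True

    # Over-segmentation: GT region j split into several segmented regions
    for j in range(len(gt_ids)):
        ks = np.nonzero(in_gt[j] & ~seg_used)[0]
        if not gt_used[j] and len(ks) >= 2 and overlap[j, ks].sum() >= t * gt_size[j]:
            result["over"] += 1
            gt_used[j] = True
            seg_used[ks] = True

    # Under-segmentation: several GT regions merged into segmented region k
    for k in range(len(seg_ids)):
        js = np.nonzero(in_seg[:, k] & ~gt_used)[0]
        if not seg_used[k] and len(js) >= 2 and overlap[js, k].sum() >= t * seg_size[k]:
            result["under"] += 1
            seg_used[k] = True
            gt_used[js] = True

    result["missed"] = int((~gt_used).sum())
    result["noise"] = int((~seg_used).sum())
    return result
```

Averaging these counts over an image database then allows different algorithms to be compared.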

Evaluation of optical flow methods • Optical flow algorithms compute the 2D optical flow vector at each pixel using consecutive frames in a video sequence • Optical flow algorithms are notoriously non-robust • It is crucial to evaluate the effectiveness of any method used (or any new method devised) • Ground truth is usually difficult to come by

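Evaluation of optical flow methods • Given the ground truth flow $(u, v)$ and the estimated flow $(\hat{u}, \hat{v})$ at each pixel, a simple error measure is the magnitude of the difference vector: $e = \sqrt{(u - \hat{u})^2 + (v - \hat{v})^2}$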

Evaluation of optical flow methods • This simple error measurement naturally amplifies errors when the flow vectors are large (for the same relative flow error) • Can normalize the error by the product of the magnitudes of the ground truth flow and flow estimate
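A sketch of both error measures, assuming NumPy. The normalised measure is implemented here as the angular error used by Barron, Fleet and Beauchemin, which treats each flow vector as (u, v, 1) and divides the dot product by the product of the magnitudes before taking the arccosine. This is one common way of realising the normalisation, not necessarily the exact measure intended on the slide; the function names are hypothetical.

```python
import numpy as np

def endpoint_error(u, v, u_est, v_est):
    """Magnitude of the difference between true and estimated flow vectors."""
    return np.hypot(u - u_est, v - v_est)

def angular_error_deg(u, v, u_est, v_est):
    """Angular error in degrees between (u, v, 1) and (u_est, v_est, 1).

    The dot product is divided by the product of the vector magnitudes, so
    the error no longer grows with the size of the flow vectors.
    """
    num = u * u_est + v * v_est + 1.0
    den = np.sqrt(u**2 + v**2 + 1.0) * np.sqrt(u_est**2 + v_est**2 + 1.0)
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
```

For a 10% magnitude error, the endpoint error grows from 0.1 for a flow of (1, 0) to 1.0 for (10, 0), whereas the angular error falls from about 2.7° to about 0.5°, so this measure tends to weight errors on small flows more heavily.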

Evaluation of optical flow methods • Often the ground truth is not available • A useful (but often crude) way of comparing the quality of two optical flow fields is to compute the displaced frame difference (DFD) statistic for each • This uses the two consecutive frames of the sequence from which the flows were computed

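Evaluation of optical flow methods • For a flow field $(u(x,y), v(x,y))$ computed between frames $f_t$ and $f_{t+1}$ of size $N \times M$, the mean-square DFD is $\mathrm{DFD} = \frac{1}{NM}\sum_{x,y}\left(f_{t+1}\big(x + u(x,y),\, y + v(x,y)\big) - f_t(x,y)\right)^2$ • The flow field with the lower DFD gives the better mapping from one frame to the next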

Evaluation of optical flow methods • The DFD is a crude measure because it says nothing directly about the accuracy of the motion field, only about the quality of the pixel mapping from one frame to the next • It also says nothing about the confidence attached to the optical flow estimates • However, it is the basis of the motion compensation algorithms in most current video compression standards (MPEG, H.261, etc.)
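A sketch of the mean-square DFD for a dense flow field, assuming NumPy, flow given in pixels as arrays (or scalars) `u`, `v` that map frame t to frame t + 1, bilinear interpolation for sub-pixel displacements, and clamping at the image border. The helper names and the test pattern are hypothetical.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y) by bilinear interpolation (edge-clamped)."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

def dfd(frame0, frame1, u, v):
    """Mean-square displaced frame difference for flow (u, v) from frame0 to frame1."""
    h, w = frame0.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    warped = bilinear_sample(frame1.astype(float), xx + u, yy + v)
    return float(np.mean((warped - frame0) ** 2))

# Hypothetical smooth test pattern, translated 2 pixels to the right in frame1
yy, xx = np.mgrid[0:32, 0:64].astype(float)
frame0 = 100 + 50 * np.sin(0.3 * xx) + 20 * np.cos(0.2 * yy)
frame1 = 100 + 50 * np.sin(0.3 * (xx - 2)) + 20 * np.cos(0.2 * yy)
```

Here the correct flow (u = 2, v = 0) gives a much lower DFD than zero flow; the small residue comes only from the clamped right-hand border.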

Evaluation of optical flow methods • In optical flow estimation, as in other types of estimation algorithms, we are often interested in the quality of the estimates • In classic estimation theory, we often compute confidence limits on estimates • We can say with a certain degree of confidence (say 90%) that the parameter lies within certain bounds • We usually assume that the quantities we are estimating follow some known probability distribution (for example chi-squared)

Evaluation of optical flow methods • In the case of optical flow vectors, confidence regions are ellipses in 2 dimensions • They essentially characterise the distribution of the estimation error • Assuming a normal distribution of the flow error, confidence ellipses can be drawn for any confidence limit • The orientation and shape of the ellipses are determined by the covariance matrix defining the normal distribution • The eigenvectors of the covariance matrix give the directions of the ellipse axes, and the axis lengths are proportional to the square roots of the eigenvalues, scaled by a factor set by the chosen confidence limit

Evaluation of optical flow methods [Figure: confidence ellipses at the 70%, 90% and 99% confidence limits]
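For a 2D Gaussian error with covariance matrix C, the confidence region at level p is the set of error vectors e with eᵀC⁻¹e ≤ k², where k² is the chi-squared quantile with 2 degrees of freedom, k² = −2 ln(1 − p). A NumPy sketch that converts C into ellipse semi-axes and orientation; the function name and the covariance values are hypothetical.

```python
import numpy as np

def confidence_ellipse(cov, confidence):
    """Semi-axis lengths (minor, major) and orientation of a 2D confidence ellipse.

    The region is e^T cov^{-1} e <= k2, with k2 = -2 ln(1 - confidence), the
    chi-squared quantile for 2 degrees of freedom.
    """
    k2 = -2.0 * np.log(1.0 - confidence)
    evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    semi_axes = np.sqrt(k2 * evals)           # minor, major semi-axis lengths
    major = evecs[:, 1]                       # eigenvector of the largest eigenvalue
    angle_deg = np.degrees(np.arctan2(major[1], major[0]))
    return semi_axes, angle_deg

cov = np.array([[4.0, 1.5], [1.5, 1.0]])      # hypothetical flow-error covariance
ellipses = {c: confidence_ellipse(cov, c) for c in (0.70, 0.90, 0.99)}
```

All three ellipses share the same orientation; only their size grows with the confidence limit (k² ≈ 2.41, 4.61 and 9.21 respectively).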

Evaluation of optical flow methods [Figure: Yosemite; Yosemite true flow; Yosemite flow (L&K); Yosemite flow (L&K), confidence thresholded]

Conclusions • Evaluation in computer vision is a difficult and often controversial topic • I would suggest 3 rules of thumb to consider when evaluating your work for the purposes of assignments • Consider carefully your test data. Make it as realistic as possible • Make your evaluations as much as possible ‘application driven’ • Make your algorithms ‘self evaluating’ if possible through the use of confidence statistics
