
Classification

Classification. ECE 847: Digital Image Processing. Stan Birchfield Clemson University. Acknowledgment. Many slides are courtesy of Frank Dellaert and Jim Rehg at Georgia Tech. from http://www-static.cc.gatech.edu/classes/AY2007/cs4495_fall/html/materials.html.


Presentation Transcript


  1. Classification. ECE 847: Digital Image Processing. Stan Birchfield, Clemson University

  2. Acknowledgment Many slides are courtesy of Frank Dellaert and Jim Rehg at Georgia Tech from http://www-static.cc.gatech.edu/classes/AY2007/cs4495_fall/html/materials.html

  3. Classification problems • Detection – Search set, find all instances of class • Recognition – Given instance, label its identity • Verification – Given instance and hypothesized identity, verify whether correct • Tracking – Like detection, but local search and fixed identity

  4. Classification issues • Feature extraction – needed for practical reasons; the distinction between feature extraction and classification is somewhat arbitrary: • Perfect feature extraction → classification is trivial • Perfect classifier → no need for feature extraction • Occlusion (missing features) • Mereology – study of part/whole relationships: POLOPONY, BEATS (not BE EATS) • Segmentation – how can we classify before segmenting? How can we segment before classifying? • Context • Computational complexity: a 20x20 binary input has 2^400 ≈ 10^120 possible patterns!

  5. Mereology example What does this say?

  6. Decision theory • Decision theory – goal is to make a decision (i.e., set a decision boundary) so as to minimize cost • Pattern classification is perhaps the most important subfield of decision theory • Supervised learning: features, data sets, algorithm (figure label: decision boundary)

  7. Overfitting • Could separate the classes perfectly using nearest neighbors • But poor generalization (overfitting) – will not work well on new data • Occam’s razor – the simplest explanation is the best (philosophical principle based upon the orderliness of the creation) (figure label: decision boundary)

  8. Bayes decision theory • Problem: given a feature x, determine the most likely class: w1 or w2 • The class-conditional pdfs are easy to measure with enough examples (figure: class-conditional pdfs, vertical axis from 0 to 1)

  9. Bayes’ rule • posterior = likelihood (class-conditional pdf) × prior / evidence (normalization factor) • In symbols: P(wj|x) = p(x|wj) P(wj) / p(x)
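A minimal sketch of Bayes' rule as labelled on this slide (posterior = likelihood × prior / evidence). The function name `posteriors` and the example numbers are illustrative assumptions, not taken from the slides.

```python
def posteriors(likelihoods, priors):
    """Return P(w_j | x) for every class j using Bayes' rule.

    likelihoods[j] is the class-conditional pdf p(x | w_j) evaluated at
    the observed x; priors[j] is P(w_j).
    """
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x), the normalization factor
    if evidence == 0:
        raise ValueError("x has zero probability under every class")
    return [j / evidence for j in joint]


# Hypothetical values: p(x|w1) = 0.6, p(x|w2) = 0.2, P(w1) = 0.3, P(w2) = 0.7
post = posteriors([0.6, 0.2], [0.3, 0.7])
```

Because every posterior is divided by the same evidence term, the returned values always sum to 1.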

  10. What is this P(w1|x)? • Probability of class 1 given data x • P(w1|x) + P(w2|x) = 1 • Note: the area under each curve is not 1 (figure: P(w1|x) and P(w2|x) plotted against x, vertical axis from 0.0 to 1.0)

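  11. Bayes Classifier • Classifier: select the class with the larger posterior, i.e. w1 if P(1|x) > P(2|x), otherwise w2 • Decision boundaries occur where P(1|x) = P(2|x) (figure: P(1|x) and P(2|x) on a 0.0 to 1.0 axis, with regions labelled select w2, select w1, select w2)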
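The select w2 / select w1 / select w2 pattern in this slide's figure can be reproduced with a small sketch. It assumes two 1D Gaussian class-conditional pdfs with the same mean and different spreads, plus equal priors; these parameters and the grid-scan approach are illustrative choices, not the slide's own data.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify(x, params, priors):
    """Return the index of the class with the largest posterior at x."""
    # The evidence p(x) is shared by all classes, so comparing
    # likelihood * prior is equivalent to comparing posteriors.
    scores = [gaussian_pdf(x, m, s) * p for (m, s), p in zip(params, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


def decision_boundaries(params, priors, lo, hi, steps=2000):
    """Approximate the boundaries as grid points where the decision changes."""
    boundaries = []
    prev = classify(lo, params, priors)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        cur = classify(x, params, priors)
        if cur != prev:
            boundaries.append(x)
        prev = cur
    return boundaries


# Assumed setup: index 0 is a narrow class (w1), index 1 a wide class (w2).
params = [(0.0, 1.0), (0.0, 3.0)]
priors = [0.5, 0.5]
bounds = decision_boundaries(params, priors, -10.0, 10.0)
```

With these assumed parameters the narrow class wins only near the shared mean, so the scan finds two boundaries, one on each side.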

  12. Bayes Risk • The total risk is the expected loss when using the classifier (we’re assuming the loss is constant here) • The shaded area is called the Bayes risk (figure: P(1|x) and P(2|x) on a 0.0 to 1.0 axis, with the shaded area marked)

  13. Discriminative vs. Generative Finding a decision boundary is not the same as modeling a conditional density. Note: Bug in Forsyth-Ponce book: P(1|x)+P(2|x) != 1

  14. Histograms • One way to compute class-conditional pdfs is to collect a bunch of examples and store a histogram • Then normalize
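As a rough sketch of the two steps on this slide (collect examples into a histogram, then normalize), assuming a 1D feature and equal-width bins. The function name and the sample values are hypothetical.

```python
def histogram_pdf(samples, lo, hi, bins):
    """Estimate a class-conditional pdf on [lo, hi) with equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against floating-point round-up at the top edge
            counts[min(int((s - lo) / width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi)")
    # Dividing by total * width makes the estimate integrate to 1.
    return [c / (total * width) for c in counts]


pdf = histogram_pdf([0.1, 0.2, 0.6, 0.9], lo=0.0, hi=1.0, bins=2)
```

Dividing by the bin width as well as the sample count gives a density rather than a table of bin probabilities; for a purely discrete feature, dividing by the count alone would be enough.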

  15. Application: Skin Histograms • Skin has a very small range of (intensity independent) colours, and little texture • Compute colour measure, check if colour is in this range, check if there is little texture (median filter) • See this as a classifier - we can set up the tests by hand, or learn them. • get class conditional densities (histograms), priors from data (counting) • Classifier is
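One possible reading of the learned version of this classifier is sketched below: class-conditional histograms over coarsely quantized RGB, priors obtained by counting, and a decision by comparing likelihood × prior. The bin count, function names, and pixel values are assumptions for illustration; the texture test mentioned on the slide is not modelled.

```python
from collections import Counter


def quantize(rgb, levels=8):
    """Map an 8-bit (r, g, b) triple to a coarse 3D histogram bin.

    Assumes `levels` divides 256 evenly.
    """
    step = 256 // levels
    return tuple(c // step for c in rgb)


def train(skin_pixels, nonskin_pixels, levels=8):
    """Build both histograms; both pixel lists are assumed non-empty."""
    skin_hist = Counter(quantize(p, levels) for p in skin_pixels)
    non_hist = Counter(quantize(p, levels) for p in nonskin_pixels)
    n_skin, n_non = len(skin_pixels), len(nonskin_pixels)
    prior_skin = n_skin / (n_skin + n_non)  # prior from counting
    return skin_hist, non_hist, n_skin, n_non, prior_skin


def is_skin(rgb, model, levels=8):
    skin_hist, non_hist, n_skin, n_non, prior_skin = model
    b = quantize(rgb, levels)
    score_skin = skin_hist[b] / n_skin * prior_skin
    score_non = non_hist[b] / n_non * (1.0 - prior_skin)
    # Ties, including colours never seen in training, go to non-skin.
    return score_skin > score_non


# Hypothetical training pixels
model = train([(200, 150, 120)] * 3, [(20, 30, 200)] * 5)
```

Each pixel is classified on its own, which matches the independence assumption noted later in the deck.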

  16. Finding skin color 3D histogram in RGB space M. J. Jones and J. M. Rehg, Statistical Color Models with Application to Skin Detection, Int. J. of Computer Vision, 46(1):81-96, Jan 2002.

  17. Histogram skin non-skin

  18. Results Note: We have assumed that all pixels are independent! Context is ignored.

  19. Confusion matrix • true positive = hit • false positive = false alarm = false detection = Type I error • false negative = miss = false dismissal = Type II error • sensitivity = true positive rate = hit rate = recall: TPR = TP / (TP+FN) • false negative rate: FNR = FN / (TP+FN) • false positive rate = false alarm rate = fallout: FPR = FP / (FP+TN) • specificity: SPC = TN / (FP+TN) • TPR + FNR = 1 • FPR + SPC = 1
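The four rates defined on this slide follow directly from the confusion-matrix counts. A small sketch, with hypothetical counts chosen only to make the arithmetic easy to follow:

```python
def rates(tp, fp, fn, tn):
    """Compute the rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity, hit rate, recall
        "FNR": fn / (tp + fn),  # miss rate
        "FPR": fp / (fp + tn),  # false alarm rate, fallout
        "SPC": tn / (fp + tn),  # specificity
    }


# Hypothetical counts: 50 actual positives, 50 actual negatives
r = rates(tp=40, fp=5, fn=10, tn=45)
```

The two identities on the slide hold by construction, since TPR and FNR share the denominator TP+FN, and FPR and SPC share FP+TN.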

  20. Receiver operating characteristic (ROC) curve • Plots TPR against FPR • equal error rate (EER) = 88% (figure: ROC curve and confusion matrix for the image classifier)
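An ROC curve can be traced by sweeping a threshold over classifier scores and recording (FPR, TPR) at each setting. The sketch below assumes scores where higher means "more likely positive" and 0/1 labels. It takes the equal-error point to be the threshold where FPR and FNR are closest, which is one common convention and an assumption here. The scores are made up.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) for each distinct score used as threshold.

    A sample is predicted positive when its score >= threshold.
    Assumes labels are 0/1 and both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def equal_error_point(points):
    """Pick the ROC point where FPR and FNR = 1 - TPR are closest."""
    return min(points, key=lambda p: abs(p[1] - (1.0 - p[2])))


# Hypothetical scores and ground-truth labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
eer_threshold, eer_fpr, eer_tpr = equal_error_point(roc_points(scores, labels))
```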

  21. Cross-validation

  22. Naïve Bayes • Quantize image patches, then compute a histogram of patch types within a face • But histograms suffer from the curse of dimensionality • Histogram in N dimensions is intractable with N>5 • To solve this, assume independence among the pixels • Features are the patch types: P(image|face) = P(label 1 at (x1,y1)|face) ... P(label k at (xk,yk)|face)
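The product on this slide is usually evaluated as a sum of logs to avoid underflow. A minimal sketch under that assumption follows. The table layout (one dict of label probabilities per position), the probability floor, the comparison against a non-face model, and all numbers are illustrative choices rather than details from the slides.

```python
import math


def log_likelihood(labels, table, floor=1e-6):
    """log P(image | class) under the independence assumption.

    labels[i] is the quantized patch label observed at position i;
    table[i][label] is P(label at position i | class).
    """
    total = 0.0
    for position, label in enumerate(labels):
        p = table[position].get(label, 0.0)
        total += math.log(max(p, floor))  # floor avoids log(0) for unseen labels
    return total


def is_face(labels, face_table, nonface_table, log_threshold=0.0):
    """Accept when the log-likelihood ratio exceeds a threshold."""
    ratio = log_likelihood(labels, face_table) - log_likelihood(labels, nonface_table)
    return ratio > log_threshold


# Hypothetical two-position model with two patch labels
face_table = [{"eye": 0.8, "flat": 0.2}, {"eye": 0.1, "flat": 0.9}]
nonface_table = [{"eye": 0.3, "flat": 0.7}, {"eye": 0.3, "flat": 0.7}]
```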

  23. Histograms applied to faces and cars H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000)

  24. Alternative: Kernel density estimation (Parzen windows) • K/N is the fraction of the N samples that fall into volume V, so the density estimate is p(x) ≈ K / (N V)

  25. Parzen windows • Non-parametric technique • Center kernel at each data point, sum results (and normalize) to get pdf
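A short sketch of the recipe on this slide for 1D data, assuming a Gaussian kernel of width `h` (the smoothing parameter). The function name and the sample data are hypothetical.

```python
import math


def parzen_pdf(x, data, h):
    """Gaussian Parzen-window density estimate at x.

    One kernel is centred at each data point; the kernels are summed and
    divided by len(data) so the estimate integrates to 1.
    """
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


data = [0.0, 1.0, 3.0]  # hypothetical samples
density_at_one = parzen_pdf(1.0, data, h=0.5)
```

Unlike a histogram, the estimate needs the stored samples at query time, and its smoothness is controlled by `h` rather than by a bin count.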

  26. Parzen windows

  27. Gaussian Parzen Windows

  28. Parzen Window Density Estimation

  29. Comparison • Histograms: non-parametric; smoothing parameter = # of bins; discard data afterwards; discontinuous; boundaries arbitrary; d dimensions → M^d bins (curse of dimensionality) • Parzen windows: non-parametric; smoothing parameter = size of kernel; need data always; discontinuous (box) or continuous (Gaussian); boundaries data driven (box) or no boundaries (Gaussian); dimensionality not as much of a curse

  30. Another alternative: Locally Weighted Averaging (LWA) • Keep instance database • At each query point, form locally weighted average • Equivalent to Parzen windows • Memory based, lazy learning, applicable to any kernel, can be slow • f(i) = 1 for positive examples, 0 for negative examples
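A hedged sketch of locally weighted averaging with f(i) = 1 for positive and 0 for negative examples, so the weighted average acts as a posterior estimate for the positive class. It assumes a circular (uniform) kernel of a given radius; the function name and the data are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math


def lwa_posterior(query, points, f, radius):
    """Locally weighted average of f over the stored instances.

    Circular kernel: weight 1 for instances within `radius` of the query,
    0 otherwise. Returns None when no instance falls inside the kernel.
    """
    weights = [1.0 if math.dist(query, p) <= radius else 0.0 for p in points]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * fi for w, fi in zip(weights, f)) / total


# Hypothetical instance database
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
f = [1, 0, 1]
```

Every query scans the whole database, which is why the slide describes the method as memory based and potentially slow.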

  31. LWA Classifier, Circular Kernel (figure panels: All Data; Data, 2 classes; LWA Posterior; Kernel Weights)

  32. K-Nearest Neighbors Classification = majority vote of K nearest neighbors
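The majority vote can be sketched in a few lines, assuming Euclidean distance and a brute-force search over all stored points (both are assumptions; the slide does not specify either). It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k nearest stored points."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled data
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b"]
```

An odd k avoids ties in the two-class case.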

  33. Recognition by finding patterns • We have seen very simple template matching (under filters) • Some objects behave like quite simple templates, e.g. frontal faces • Strategy: find image windows, correct lighting, and pass them to a statistical test (a classifier) that accepts faces and rejects non-faces

  34. Finding faces • Faces “look like” templates (at least when they’re frontal) • General strategy: search image windows at a range of scales, correct for illumination, present corrected window to classifier • Issues: How corrected? What features? What classifier? (figure labels: test image, classifier, training image, feature extraction, training database, decision, learner)

  35. Face detection http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

  36. Face recognition http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf

  37. Linear discriminant functions • g(x) = w^T x + w0 • decision surface is hyperplane • w is perpendicular to hyperplane • neural network: combination of linear discriminant functions • sigmoid function is differentiable, enables backpropagation
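A small sketch of the linear discriminant g(x) = w^T x + w0 from this slide. The decision rule (class 1 when g(x) > 0) and the example weights are assumptions for illustration.

```python
import math


def g(x, w, w0):
    """Linear discriminant: dot product of w and x, plus the bias w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def signed_distance(x, w, w0):
    """Signed distance from x to the hyperplane g(x) = 0.

    Because w is perpendicular to the hyperplane, dividing g(x) by the
    length of w gives the distance along that normal direction.
    """
    return g(x, w, w0) / math.sqrt(sum(wi * wi for wi in w))


def classify(x, w, w0):
    return 1 if g(x, w, w0) > 0 else 2


w, w0 = [3.0, 4.0], -5.0  # hypothetical weights
```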

  38. Neural networks for detecting faces Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, Neural Network-Based Face Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.

  39. Neural networks for detecting faces positive training images: scaled, rotated, translated, and mirrored negative training images

  40. Neural networks for detecting faces

  41. Arbitration

  42. Bootstrapping • Hardest examples to classify are those near the decision boundary • These are also the most useful for training • Approach: Run detector, find examples of misclassification, feed back into training process
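The feedback loop described on this slide might be sketched as follows. The `train` and `predict` callables are placeholders supplied by the caller. The toy 1D threshold detector exists only to make the example runnable, and only false alarms on a pool of known negatives are fed back (an assumption).

```python
def bootstrap(train, predict, positives, seed_negatives, negative_pool, rounds=5):
    """Retrain repeatedly, adding misclassified negatives to the training set."""
    negatives = list(seed_negatives)
    model = train(positives, negatives)
    for _ in range(rounds):
        # Run the detector on data known to contain no positives.
        false_alarms = [x for x in negative_pool
                        if predict(model, x) and x not in negatives]
        if not false_alarms:
            break  # nothing left to learn from this pool
        negatives.extend(false_alarms)
        model = train(positives, negatives)
    return model, negatives


# Toy stand-ins (hypothetical): the model is a threshold halfway between
# the smallest positive and the largest negative.
def toy_train(positives, negatives):
    return (min(positives) + max(negatives)) / 2.0


def toy_predict(threshold, x):
    return x >= threshold


model, negatives = bootstrap(toy_train, toy_predict,
                             positives=[8, 9, 10],
                             seed_negatives=[1, 2],
                             negative_pool=[1, 2, 3, 4, 5, 6, 7])
```

In the toy run, the examples added in the first round are the negatives closest to the positives, i.e. those near the decision boundary.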

  43. Results

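  44. Real-time face detection (Viola and Jones, CVPR 2001) • Components • Cascade architecture • Box sum features (integral image) (figure: cascade of stages H1, H2, ..., Hn; a window rejected at any stage is labelled Non-face, and one that passes all stages is labelled Face)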
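The cascade idea (stages H1, H2, ..., Hn, with rejection at any stage meaning non-face) can be sketched as a short loop. The stage functions below are arbitrary toy tests on a list of numbers, not real face classifiers.

```python
def cascade_classify(window, stages):
    """Run the stages in order and stop at the first rejection.

    Returns (is_face, stages_evaluated).
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index  # rejected early: non-face
    return True, len(stages)  # passed every stage: face


# Hypothetical stages, cheapest first
stages = [
    lambda w: sum(w) > 10,
    lambda w: max(w) - min(w) > 3,
    lambda w: w[0] < w[-1],
]
```

Most windows in an image are not faces, so rejecting them in the first cheap stages is what keeps the average cost per window low.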

  45. Haar-like features (integral image makes computation fast)

  46. More features

  47. Example • Feature’s value is calculated as the difference between the sum of the pixels within white and black rectangle regions.
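A sketch of how such a feature value can be computed with an integral image, so that each rectangle sum costs four lookups regardless of its size. The two-rectangle layout (white left half, black right half), the sign convention, and the tiny image are illustrative assumptions.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """White (left half) sum minus black (right half) sum; width must be even."""
    half = width // 2
    white = box_sum(ii, top, left, height, half)
    black = box_sum(ii, top, left + half, height, half)
    return white - black


img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]  # hypothetical 2x4 image
ii = integral_image(img)
```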

  48. Boosting

  49. Adaboost The more distinctive the feature, the larger the weight.
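The statement about weights can be made concrete with the standard discrete AdaBoost update, shown here as an assumption since the slide's own formulas are not in the transcript. A weak classifier (for instance a single thresholded feature) with weighted error `err` receives the weight alpha = 0.5 ln((1 - err) / err), so a lower error gives a larger weight.

```python
import math


def classifier_weight(err):
    """Weight of a weak classifier with weighted error err (0 < err < 1)."""
    return 0.5 * math.log((1.0 - err) / err)


def adaboost_round(weights, predictions, labels):
    """One round: compute alpha, then re-weight and normalize the samples."""
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
    alpha = classifier_weight(err)
    # Misclassified samples gain weight, correctly classified ones lose it.
    new = [w * math.exp(alpha if p != y else -alpha)
           for w, p, y in zip(weights, predictions, labels)]
    z = sum(new)
    return alpha, [w / z for w in new]


# Hypothetical round: four samples, the last one is misclassified
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

After the update the misclassified sample carries half of the total weight, which pushes the next weak classifier to focus on it.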

  50. Training images
