
Categorization by Learning and Combining Object Parts



Presentation Transcript


  1. Categorization by Learning and Combining Object Parts B. Heisele, T. Serre, M. Pontil, T. Vetter, T. Poggio. Presented by Manish Jethwa

  2. Overview • Learn discriminatory components of objects with Support Vector Machine (SVM) classifiers.

  3. Background • Global approach: attempt to classify the entire object; successful when applied to problems in which the object pose is fixed. • Component-based techniques: individual components vary less than the whole object when the object pose changes; usable even when some of the components are occluded.

  4. Linear Support Vector Machines • Linear SVMs are used to discriminate between two classes by determining the separating hyperplane.

  5. The Decision Function • The decision function of the SVM has the form: f(x) = Σᵢ₌₁ˡ αᵢ yᵢ ⟨xᵢ · x⟩ + b, where l is the number of training data points, the xᵢ are the training data points, x is a new data point, yᵢ ∈ {−1, 1} is the class label, b is the bias, and the αᵢ are adjustable coefficients: the solution of a quadratic programming problem, positive for support vectors and zero for all other data points. f(x) defines a hyperplane dividing the data; the sign of f(x) indicates the class of x.
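The decision function on this slide can be sketched numerically. This is a toy illustration: the support vectors, weights, and bias below are made up for the example, not the solution of an actual quadratic program.

```python
import numpy as np

# Hypothetical toy setup: two support vectors per class in 2-D.
support_vectors = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0], [-2.0, 0.0]])
alphas = np.array([0.5, 0.5, 0.5, 0.5])   # positive weights (QP solution assumed)
labels = np.array([1, 1, -1, -1])          # class labels y_i in {-1, +1}
bias = 0.0

def decision_function(x):
    """f(x) = sum_i alpha_i * y_i * <x_i, x> + b"""
    return float(np.sum(alphas * labels * (support_vectors @ x)) + bias)

# sign(f(x)) gives the predicted class of a new point x
print(int(np.sign(decision_function(np.array([1.5, 0.5])))))  # → 1
```

The sum runs only over the support vectors in practice, since every other training point has αᵢ = 0 and contributes nothing.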

  6. Significance of αᵢ • The αᵢ correspond to the weights of the support vectors. • Learned from the training data set. • Used to compute the margin M of the support vectors to the hyperplane: M = (√Σᵢ αᵢ)⁻¹.
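A minimal sketch of the margin computation, assuming a hypothetical vector of α weights returned by the QP solver (the zero entries stand for non-support-vector points):

```python
import math

# Hypothetical alpha weights from the QP solver (zeros for non-support vectors).
alphas = [0.5, 0.5, 0.0, 0.25, 0.75, 0.0]

# Margin M = (sqrt(sum_i alpha_i))^(-1); only support vectors contribute.
margin = 1.0 / math.sqrt(sum(alphas))
print(margin)  # → 0.7071... (= 1/sqrt(2))
```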

  7. Non-separable Data • The notion of a margin extends to non-separable data as well. • Misclassified points result in errors. • The hyperplane is now defined by maximizing the margin while minimizing the summed error. • The expected error probability of the SVM satisfies the bound: E Pₑᵣᵣ ≤ l⁻¹ E[D²/M²], where D is the diameter of the sphere containing all the training data.

  8. Measuring Error • The error bound depends on the ratio ρ = D²/M². • Scaling the data scales D and M by the same factor, which renders ρ invariant to scale: ρ₁ = D₁²/M₁² = D₂²/M₂² = ρ₂.
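The scale invariance of ρ can be checked numerically; D₁ and M₁ below are arbitrary made-up values.

```python
# Scaling all data points by s scales both the enclosing-sphere diameter D
# and the margin M by s, so rho = D**2 / M**2 is unchanged.
D1, M1 = 4.0, 0.5          # hypothetical diameter and margin
s = 3.0                    # uniform scale factor applied to the data
D2, M2 = s * D1, s * M1

rho1 = D1 ** 2 / M1 ** 2
rho2 = D2 ** 2 / M2 ** 2
print(rho1 == rho2)  # → True
```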

  9. Learning Components

  10. Learning Facial Components • Extracting face components by hand is time-consuming: each component must be manually extracted from every training image. • Use textured head models instead: they automatically produce a large number of faces under differing illumination and poses. • Seven textured head models were used to generate 2,457 face images of size 58x58.

  11. Negative Training Set • Extract 58x58 patches from 502 non-face images to give 10,209 negative training points. • Train the SVM classifier on this data, then add the false positives to the negative training set. • This augments the negative training set with the images that look most like faces.

  12. Learned Components • Start with fourteen manually selected 5x5 seed regions. • The learned components: the eyes (17x17 pixels), the nose (15x20 pixels), the mouth (31x15 pixels), the cheeks (21x20 pixels), the lip (13x16 pixels), the nostrils (22x12 pixels), the corners of the mouth (15x20 pixels), the eyebrows (15x20 pixels), and the bridge of the nose (15x20 pixels).

  13. Combining Components • Shift the component experts (Left Eye, Nose, and Mouth, each a linear SVM) over the 58x58 window. • Determine each expert's maximum output and its location. • Feed these into the combining classifier, also a linear SVM. • Shift the 58x58 window over the input image. • Final decision: face / background.
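The detection pipeline on this slide can be sketched roughly as follows. This is a toy sketch: the "experts" are random linear scorers standing in for the trained component SVMs, and the patch sizes are assumptions based on the component list on slide 12.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained component experts: each scores a
# fixed-size patch linearly (real experts would be trained linear SVMs).
def make_expert(shape):
    w = rng.standard_normal(shape)
    return lambda patch: float(np.sum(w * patch))

# (height, width) patch sizes assumed from slide 12's component list.
experts = {
    "left_eye": (make_expert((17, 17)), (17, 17)),
    "nose":     (make_expert((20, 15)), (20, 15)),
    "mouth":    (make_expert((15, 31)), (15, 31)),
}

def component_features(window):
    """Shift each expert over the 58x58 window, keeping the maximum
    output and the (x, y) location at which it occurred."""
    feats = []
    for expert, (h, w) in experts.values():
        best, best_xy = -np.inf, (0, 0)
        for y in range(window.shape[0] - h + 1):
            for x in range(window.shape[1] - w + 1):
                out = expert(window[y:y + h, x:x + w])
                if out > best:
                    best, best_xy = out, (x, y)
        feats.extend([best, *best_xy])
    return np.array(feats)   # 9-dim input to the combining linear SVM

window = rng.standard_normal((58, 58))
print(component_features(window).shape)  # → (9,)
```

The combining classifier would then be another linear SVM applied to this 9-dimensional vector, and the whole 58x58 window would itself be shifted over the input image.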

  14. Experiments • Training data for the 3 classifiers: 2,457 faces, 13,654 non-face grey images. • Test data: 1,834 faces, 24,464 non-face grey images. • Components vs. whole face: the component-based method performs better than the benchmark, a whole-face detector with a 2nd-degree polynomial SVM.

  15. Results Faces detected by the component-based classifier

  16. Computer Examples

  17. The Data • Used images from the MIT CBCL faces dataset: images are 19x19 pixels, greyscale, histogram equalized. • The full training set contains 2,429 faces and 4,548 non-faces; only 100 face examples and 300 non-face examples were used. • The test set contains 472 faces and 23,573 non-faces; all 24,045 images were used.

  18. Learning Components Left eye component: • Extract three negative examples for every positive one. • This biases the classifier toward correctly classifying non-face examples, minimizing false positives. • The training set contains 400 examples: 100 left-eye examples and 300 non-left-eye examples.

  19. Results • Left Eye learned using 400 examples: training set classified 97% correctly; test set classified 96.2% correctly (of 24,045). • Right Eye learned using 400 examples: training set classified 100% correctly; test set classified 96.5% correctly (of 24,045). • Mouth learned using 400 examples: training set classified 95% correctly; test set classified 95.7% correctly (of 24,045).

  20. Locating Learned Components Left eye is reasonably well localized

  21. Locating Learned Components Histograms are fairly flat compared to face images

  22. Face detector • Learned components: left eye 8x6 pixels, right eye 8x6 pixels, mouth 11x4 pixels. • Each learned from 400 examples (100 positive, 300 negative). • Trained the SVM with a fixed location for each component (no shifting).

  23. Face Detector Training • The output Oi from each component i (its distance from the hyperplane) is used together with the center location (Xi, Yi) of the corresponding component. • All components are combined into one input feature vector: X = (Oleft, Xleft, Yleft, Oright, Xright, Yright, Omouth, Xmouth, Ymouth)
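Building the combined feature vector is straightforward; a minimal sketch with made-up component outputs and center coordinates:

```python
# Hypothetical per-component results: (output O_i, center X_i, center Y_i).
components = {
    "left":  (1.2, 20, 18),   # left eye
    "right": (0.9, 38, 18),   # right eye
    "mouth": (1.5, 29, 44),
}

# Flatten into the combining classifier's input vector:
# X = (O_left, X_left, Y_left, O_right, X_right, Y_right, O_mouth, X_mouth, Y_mouth)
X = [v for triple in components.values() for v in triple]
print(X)  # → [1.2, 20, 18, 0.9, 38, 18, 1.5, 29, 44]
```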

  24. Results • The resulting SVM correctly classified all 400 training examples. • For non-face examples: placing component centers in random locations resulted in 100% correct classification; placing component centers in positions identical to the face examples resulted in 98.4% accuracy.

  25. Is this the best face detector ever? • It performed well on the given dataset, at low resolution. But… • The dataset contains centered faces. • Component positions were given. • It did not shift the component window and look for the maximum. • There was no opportunity to test against other algorithms. So… We will never know.
