Face Detection and Neural Networks. Todd Wittman. Math 8600: Image Analysis, Prof. Jackie Shen. December 2001.
Face Detection. Problem: Given a color image, determine whether the image contains a human face. That is, can you tell our governor from a toaster? [Image] vs. [Image] Answer: The picture on the right contains a human face. I think. Applications: AI, tracking, automated security, video retrieval.
Overview of Face Detection Methods
• Edge detection to recognize features and spatial relationships (Marsicoi, ’97).
• HSV-space segmentation and a vector angular-based distance measure (Androutsos, ’99).
• Chroma chart to detect skin tones, plus edge detection to identify eyes and mouth (Cai, ’99).
• Unsupervised Adaptive Skin Color Model, also called clustering (Bergasa, ’99).
Neural Network. Goal: Given a set of inputs X and desired outputs T, determine the weights such that X generates T. Idea: Similar inputs will give similar outputs. [Diagram: inputs X feed a hidden layer, which feeds outputs T.] Training: Set the weights to minimize the error between the network's output and T, using the Levenberg-Marquardt algorithm (a blend of Gauss-Newton and steepest descent). Training is very expensive computationally. If there are x input nodes, t output nodes, and p hidden nodes, then # weights = (x+t)p, not counting bias terms.
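To make the weight count concrete, here is a small Python sketch (not the Matlab code from the talk). The `include_biases` flag and the choice of p = 20 hidden nodes are assumptions: the slides never state the hidden-layer size, but 60 inputs, 20 hidden nodes, 1 output, and one bias per hidden and output node happen to reproduce the 1241 weights quoted in the later results.

```python
def count_weights(x, p, t, include_biases=False):
    """Count trainable weights in a network with one hidden layer.

    x: input nodes, p: hidden nodes, t: output nodes.
    """
    weights = x * p + p * t      # same as the slide's (x + t) * p
    if include_biases:
        weights += p + t         # assumption: one bias per hidden and output node
    return weights


print(count_weights(60, 20, 1))                       # 1220
print(count_weights(60, 20, 1, include_biases=True))  # 1241
```

The count grows linearly in the input size, which is one reason a shorter input vector trains faster.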
Face Detection NN. Input: Color image. Output: P(w|x) = probability that the image contains a face. (Only 1 output node.) Training targets: 1 for face, 0 for no face. [Example images labeled P=0 and P=1.]
3 Possible Outputs
• P > 0.5: FACE
• P < 0.5: NOT FACE
• P = 0.5: DON’T KNOW
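The three-way decision rule can be sketched in a few lines of Python. The function name `classify` and the tolerance `tol` are assumptions for illustration. A floating-point output is rarely exactly 0.5, so a small tolerance stands in for the exact equality on the slide.

```python
def classify(p, tol=1e-9):
    """Map the single network output P to one of three labels."""
    if abs(p - 0.5) <= tol:      # assumption: treat "P = 0.5" as "within tol of 0.5"
        return "DON'T KNOW"
    return "FACE" if p > 0.5 else "NOT FACE"


for p in (0.93, 0.5, 0.12):
    print(p, classify(p))
```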
1st Attempt: Interpolated Image. Input X: The pixel values of the image at N selected grid points. [Figure: original image, interpolated image, output P=1.] Since each pixel has three values (RGB), the input vector X has length 3N. I tried a small case, N=25 (so X has length 75). The network took over an hour to train on the training set shown on the next slide.
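A minimal Python sketch of building such an input vector follows. The slides do not say how the grid points were chosen or interpolated, so everything here is an assumption: a 5 x 5 grid (N = 25), the nearest pixel at the centre of each grid cell, and an image stored as a list of rows of (r, g, b) tuples.

```python
def grid_sample(image, n_side=5):
    """Sample an n_side x n_side grid of pixels and flatten to one vector.

    image: list of rows, each row a list of (r, g, b) tuples.
    Returns a list of length 3 * n_side**2.
    """
    height, width = len(image), len(image[0])
    features = []
    for i in range(n_side):
        # Centre of the i-th of n_side equal horizontal bands, as a pixel row.
        row = (2 * i + 1) * height // (2 * n_side)
        for j in range(n_side):
            col = (2 * j + 1) * width // (2 * n_side)
            features.extend(image[row][col])
    return features


# Toy 10 x 10 image whose pixel at (row, col) is (row, col, 0).
toy = [[(r, c, 0) for c in range(10)] for r in range(10)]
x = grid_sample(toy)
print(len(x))  # 75
```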
Results. [Plot: P values for the 20 images in the training set.] P=0.5 for all training images, i.e. DON’T KNOW every time. The interpolated images can’t be interpreted.
2nd Attempt: RGB Histograms. Input X: The 3 histograms of the R, G, and B values, appended as 1 vector. Each histogram has N=20 bins, so the input vector has size 3N=60. Idea: The neural network will pick out the frequency of flesh tones.
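The histogram input can be sketched in Python as below. This is an illustration, not the talk's Matlab code. The 8-bit value range (0 to 255) and the normalization of each histogram to sum to 1 are assumptions; the slides give only the bin count.

```python
def channel_histogram(values, n_bins=20, max_value=255):
    """Histogram of integer channel values in 0..max_value, normalized to sum to 1."""
    counts = [0] * n_bins
    for v in values:
        b = min(int(v * n_bins / (max_value + 1)), n_bins - 1)
        counts[b] += 1
    total = len(values) or 1     # avoid dividing by zero for an empty image
    return [c / total for c in counts]


def rgb_histogram_features(pixels, n_bins=20):
    """Append the R, G, and B histograms into one vector of length 3 * n_bins."""
    features = []
    for channel in range(3):
        features.extend(channel_histogram([p[channel] for p in pixels], n_bins))
    return features


pixels = [(0, 128, 255), (10, 130, 250)]   # a flat list of (r, g, b) tuples
x = rgb_histogram_features(pixels)
print(len(x))  # 60
```

Normalizing makes images of different sizes comparable, at the cost of discarding all spatial layout.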
Results. After 100 iterations (1 hour, 1241 weights), the Levenberg-Marquardt algorithm was able to correctly classify all 20 training images. [Plot of P values; vertical axis from 0 to 0.7.] But on a test set of 13 images, it got only 7 correct (53.8%).
3rd Attempt: YES Histograms. RGB histograms were too similar, so convert to YES color space:
Y = 0.253R + 0.684G + 0.063B
E = 0.5R - 0.5G
S = 0.25R + 0.25G - 0.5B
[Figure: an image shown in RGB and in YES.] Input X: 3 YES histograms appended as one vector.
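Here is a Python sketch of the YES conversion and the resulting input vector, using the three formulas from the slide. The rest is assumed: 20 bins per histogram (the slide gives no bin count for this attempt), 8-bit RGB input, normalized histograms, and the per-channel ranges in `YES_RANGES`, which follow from the coefficients (Y spans 0 to 255; E and S span -127.5 to 127.5).

```python
def rgb_to_yes(r, g, b):
    """Convert one RGB pixel to YES using the coefficients on the slide."""
    y = 0.253 * r + 0.684 * g + 0.063 * b
    e = 0.5 * r - 0.5 * g
    s = 0.25 * r + 0.25 * g - 0.5 * b
    return y, e, s


def histogram(values, lo, hi, n_bins=20):
    """Normalized histogram of values over the range [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        b = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[b] += 1
    total = len(values) or 1
    return [c / total for c in counts]


# Assumed value ranges for 8-bit RGB input, derived from the coefficients.
YES_RANGES = [(0.0, 255.0), (-127.5, 127.5), (-127.5, 127.5)]


def yes_histogram_features(pixels, n_bins=20):
    """Append the Y, E, and S histograms into one vector of length 3 * n_bins."""
    yes = [rgb_to_yes(*p) for p in pixels]
    features = []
    for ch, (lo, hi) in enumerate(YES_RANGES):
        features.extend(histogram([v[ch] for v in yes], lo, hi, n_bins))
    return features


print(rgb_to_yes(200, 150, 120))
print(len(yes_histogram_features([(200, 150, 120), (255, 255, 255)])))  # 60
```

Any grey pixel (R = G = B) maps to E = S = 0, so the E and S channels carry only color information, separate from brightness.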
Results. After training for 100 iterations, 3 images in the training set were misclassified. [Plot of P values; vertical axis from -0.1 to 0.8.] But on the test set, the network correctly identified 13 out of 13 images (100%).
Th-th-that’s All, Folks! You can try my Matlab code: www.math.umn.edu/~wittman/faces/main.html
[Figure: neural network diagram. Input layer X, hidden layer, output layer T.]