
Understanding Eigenface Generation for Face Recognition

Learn about Eigenfaces and how eigenvectors play a crucial role in the technology of human face recognition, including practical implementation steps and applications beyond facial recognition.


Presentation Transcript


  1. Eigenface Dr. Hiren D. Joshi Dept. of Computer Science Rollwala Computer Center Gujarat University

  2. Eigenface • Eigenfaces are a set of eigenvectors used in the computer vision problem of human face recognition. • The approach of using eigenfaces for recognition was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland in face classification. • It is considered the first successful example of facial recognition technology. • These eigenvectors are derived from the covariance matrix of the probability distribution of the high-dimensional vector space of possible faces of human beings.

  3. Eigenface generation • To generate a set of eigenfaces, a large set of digitized images of human faces, taken under the same lighting conditions, is normalized to line up the eyes and mouths. • They are then all resampled at the same pixel resolution. • Eigenfaces can be extracted from the image data by means of a mathematical tool called principal component analysis (PCA).

  4. Eigenface generation • The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. • This pattern is how different features of a face are singled out to be evaluated and scored. • Some patterns evaluate symmetry, whether there is any style of facial hair, where the hairline is, or the size of the nose or mouth. • Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face.

  5. Eigenface generation • The technique used in creating eigenfaces and using them for recognition is also used outside of facial recognition: it is applied to handwriting analysis, lip reading, voice recognition, sign language/hand gestures and medical imaging. • Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'. • Research that applies similar eigen techniques to sign language images has also been conducted.

  6. Eigenface generation • Informally, eigenfaces are a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. • Any human face can be considered to be a combination of these standard faces. • For example, your face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even -3% from eigenface 3. • Remarkably, it does not take many eigenfaces summed together to give a fair likeness of most faces. • Also, because a person's face is no longer stored as a full digital photograph but simply as a list of values (one value for each eigenface in the database used), much less space is needed for each person's face.

  7. Practical implementation • Prepare a training set. • The faces constituting the training set should be prepared for processing, in the sense that they should all have the same resolution and that the faces should be roughly aligned. • Each image is seen as one vector, simply by concatenating the rows of pixels in the original image. • A greyscale image with r rows and c columns is therefore represented as a vector with r x c elements. • In the following discussion we assume all images of the training set are stored in a single matrix T, where each row of the matrix is an image.
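
One possible NumPy sketch of this step (the array shapes, variable names and synthetic data below are illustrative assumptions, not values taken from the slides):

    import numpy as np

    # Hypothetical training set: N aligned greyscale face images of r x c pixels each.
    # In practice these would be loaded from image files; random data stands in here.
    N, r, c = 300, 100, 100
    images = np.random.rand(N, r, c)

    # Flatten each image into one row by concatenating its rows of pixels,
    # giving the N x (r*c) training matrix T.
    T = images.reshape(N, r * c)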

  8. Practical implementation • Subtract the mean. • The average image a has to be calculated and subtracted from each original image in T.
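
A continuation of the same illustrative sketch (shapes and data are again assumed):

    import numpy as np

    # T: hypothetical training matrix with one flattened image per row (see the sketch above).
    T = np.random.rand(300, 100 * 100)

    # The average image a is the mean over all rows; subtract it from every image in T.
    a = T.mean(axis=0)
    T = T - a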

  9. Practical implementation • Calculate the eigenvectors and eigenvalues of the covariance matrix S. • Each eigenvector has the same dimensionality as the original images and can itself be seen as an image. • The eigenvectors of this covariance matrix are therefore called eigenfaces. • They are the directions in which the images in the training set differ from the mean image.
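
A direct, memory-hungry sketch of this step in NumPy; small 20 x 20 images are assumed here purely so the full covariance matrix stays tractable (in practice the trick described on the "Computing the eigenvectors" slides below is used instead):

    import numpy as np

    # Hypothetical mean-subtracted training matrix of 50 images, 20 x 20 pixels each.
    T = np.random.rand(50, 20 * 20)
    T = T - T.mean(axis=0)

    # Covariance matrix S of the pixel dimensions: (r*c) x (r*c), here 400 x 400.
    S = np.cov(T, rowvar=False)

    # S is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    # Each column of eigvecs has the dimensionality of a flattened image and,
    # reshaped back to 20 x 20, can itself be displayed as an eigenface.
    eigvals, eigvecs = np.linalg.eigh(S)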

  10. Practical implementation • Choose the principal components. • The D x D covariance matrix will result in D eigenvectors, each representing a direction in the image space (here D = r x c, the number of pixels). • Keep the eigenvectors with the largest associated eigenvalues.
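
Continuing the illustrative sketch (k and the shapes are arbitrary choices, not values from the slides):

    import numpy as np

    # Hypothetical mean-subtracted training matrix and the eigendecomposition
    # of its covariance matrix, as in the previous sketch.
    T = np.random.rand(50, 20 * 20)
    T = T - T.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(T, rowvar=False))

    # eigh returns eigenvalues in ascending order; keep the k eigenvectors
    # associated with the largest eigenvalues.
    k = 10
    order = np.argsort(eigvals)[::-1][:k]
    eigenfaces = eigvecs[:, order]    # one eigenface per column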

  11. Practical implementation • The eigenfaces can now be used to represent new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. • The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. • We lose information by projecting the image on a subset of the eigenvectors, but we minimise this loss by keeping those eigenfaces with the largest eigenvalues.
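
A sketch of the projection step under the same assumptions (synthetic data, arbitrary shapes):

    import numpy as np

    # Hypothetical training data, mean face and retained eigenfaces (as above).
    T = np.random.rand(50, 20 * 20)
    mean_face = T.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(T - mean_face, rowvar=False))
    eigenfaces = eigvecs[:, np.argsort(eigvals)[::-1][:10]]

    # Project a new, mean-subtracted face onto the eigenfaces: the weight vector
    # records how this face differs from the mean face in each direction.
    new_face = np.random.rand(20 * 20)
    weights = eigenfaces.T @ (new_face - mean_face)

    # Reconstructing from the weights shows how much information the projection keeps.
    approximation = mean_face + eigenfaces @ weights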

  12. Practical implementation • For instance, if we are working with a 100 x 100 image, then we will obtain 10,000 eigenvectors. • In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded.

  13. Computing the eigenvectors • Performing PCA directly on the covariance matrix of the images is often computationally infeasible. • If small, say 100 x 100, greyscale images are used, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 x 10,000 = 10^8 elements. • However, the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N-1 eigenvectors with non-zero eigenvalues. • If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows.

  14. Computing the eigenvectors • Let T be the matrix of preprocessed training examples, where each row contains one mean-subtracted image. • The covariance matrix can then be computed as $S = T^T T$, and the eigenvector decomposition of S is given by $S v_i = T^T T v_i = \lambda_i v_i$.

  15. Computing the eigenvectors • However, $T^T T$ is a large matrix, and if instead we take the eigenvalue decomposition of $T T^T u_i = \lambda_i u_i$, then we notice that by pre-multiplying both sides of the equation with $T^T$, we obtain $T^T T (T^T u_i) = \lambda_i (T^T u_i)$.

  16. Computing the eigenvectors • Meaning that, if $u_i$ is an eigenvector of $T T^T$, then $v_i = T^T u_i$ is an eigenvector of S. • If we have a training set of 300 images of 100 x 100 pixels, the matrix $T T^T$ is a 300 x 300 matrix, which is much more manageable than the 10,000 x 10,000 covariance matrix. • Notice however that the resulting vectors $v_i$ are not normalised; if normalisation is required it should be applied as an extra step.
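
A NumPy sketch of this trick (synthetic data; the sizes match the slide's 300-image example, but the code itself is an assumption, not the author's implementation):

    import numpy as np

    # Hypothetical mean-subtracted training matrix: 300 images of 100 x 100 pixels.
    T = np.random.rand(300, 100 * 100)
    T = T - T.mean(axis=0)

    # Eigenvectors u_i of the small 300 x 300 matrix T T^T ...
    eigvals, U = np.linalg.eigh(T @ T.T)

    # ... are mapped to eigenvectors of the 10,000 x 10,000 covariance matrix
    # S = T^T T via v_i = T^T u_i, without ever forming S explicitly.
    V = T.T @ U

    # The resulting vectors are not normalised; normalise them as an extra step.
    V = V / np.linalg.norm(V, axis=0)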

  17. Use in facial recognition • Facial recognition was the source of motivation behind the creation of eigenfaces. • For this use, eigenfaces have advantages over other available techniques, such as speed and efficiency. • Using eigenfaces is very fast and can process many faces in very little time. • Unfortunately, this type of facial recognition does have a drawback: it has trouble recognizing faces when they are viewed under different lighting conditions or from different angles.

  18. Use in facial recognition • For the system to work well, the faces need to be seen from a frontal view under similar lighting. • Face recognition using eigenfaces has been shown to be quite accurate. • By experimenting with the system to test it under variations of certain conditions, the following correct recognitions were found: an average of 96% with light variation, 85% with orientation variation, and 64% with size variation. (Turk & Pentland 1991, p. 590)

  19. Use in facial recognition • To complement eigenfaces, another approach has been developed called eigenfeatures. • This combines facial metrics (measuring distance between facial features) with the eigenface approach. • Another method, which competes with the eigenface technique, uses 'fisherfaces'. • This method for facial recognition is less sensitive to variation in lighting and pose of the face than the method using eigenfaces.

  20. Use in facial recognition • A more modern alternative to eigenfaces and fisherfaces is the active appearance model, which decouples the face's shape from its texture: it does an eigenface decomposition of the face after warping it to mean shape. • This allows it to perform better on different projections of the face, and when the face is tilted.

  21. PCA

  22. Standard Deviation • To understand standard deviation, we need a data set. • Statisticians are usually concerned with taking a sample of a population. • To use election polls as an example, the population is all the people in the country, whereas a sample is a subset of the population that the statisticians measure.

  23. The great thing about statistics is that by only measuring (in this case by doing a phone survey or similar) a sample of the population, you can work out what is most likely to be the measurement if you used the entire population.

  24. Example • X = [1 2 4 6 12 15 25 45 68 67 65 98] • If I want to refer to an individual number in this data set, I will use subscripts on the symbol X to indicate a specific number. • E.g. $X_3$ refers to the 3rd number in X, namely the number 4. • Note that $X_1$ is the first number in the sequence.

  25. The symbol n will be used to refer to the number of elements in the set X. • There are a number of things that we can calculate about a data set. • For example, we can calculate the mean of the sample. • I assume that the reader understands what the mean of a sample is, and will only give the formula:

  26. $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ • Notice the symbol $\bar{X}$ (X bar) to indicate the mean of the set X. • All this formula says is "Add up all the numbers and then divide by how many there are".

  27. Unfortunately, the mean doesn’t tell us a lot about the data except for a sort of middle point. • For example, these two data sets have exactly the same mean (10), but are obviously quite different: [0 8 12 20] and [8 9 11 12] • So what is different about these two sets? It is the spread of the data that is different. • The Standard Deviation (SD) of a data set is a measure of how spread out the data is.

  28. Standard Deviation • “The average distance from the mean of the data set to a point”. • The way to calculate it is to compute the squares of the distance from each data point to the mean of the set, add them all up, divide by n-1, and take the positive square root.

  29. SD • $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$ • where s is the usual symbol for the standard deviation of a sample.

  30. SD
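
As a worked application of the recipe above to the two data sets introduced earlier (both have mean 10; the arithmetic is added here for illustration):

$s_{[0\;8\;12\;20]} = \sqrt{\frac{(0-10)^2 + (8-10)^2 + (12-10)^2 + (20-10)^2}{4-1}} = \sqrt{\frac{208}{3}} \approx 8.33$

$s_{[8\;9\;11\;12]} = \sqrt{\frac{(8-10)^2 + (9-10)^2 + (11-10)^2 + (12-10)^2}{4-1}} = \sqrt{\frac{10}{3}} \approx 1.83$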

  31. The first set has a much larger standard deviation, due to the fact that the data is much more spread out from the mean.

  32. Variance • Variance is another measure of the spread of data in a data set. • In fact it is almost identical to the standard deviation. The formula is this: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ • $s^2$ is the usual symbol for the variance of a sample. Both these measurements are measures of the spread of the data. Standard deviation is the most common measure, but variance is also used.

  33. Covariance • The last two measures we have looked at are purely 1-dimensional. Data sets like this could be: • heights of all the people in the room, • marks for the last COMP101 exam etc. • However many data sets have more than one dimension, and the aim of the statistical analysis of these data sets is usually to see if there is any relationship between the dimensions. • For example, we might have as our data set both the height of all the students in a class, and the mark they received for that paper. • We could then perform statistical analysis to see if the height of a student has any effect on their mark.

  34. Covariance • Standard deviation and variance only operate on 1 dimension, so that you could only calculate the standard deviation for each dimension of the data set independently of the other dimensions. • However, it is useful to have a similar measure to find out how much the dimensions vary from the mean with respect to each other.

  35. Covariance • Covariance is such a measure. • Covariance is always measured between 2 dimensions. • If you calculate the covariance between one dimension and itself, you get the variance.

  36. Covariance • If you had a 3-dimensional data set (x, y, z) then you could measure the covariance between the x and y dimensions, the x and z dimensions, and the y and z dimensions. • Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z dimensions respectively.

  37. Covariance • The formula for variance could also be written like this: $\mathrm{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n-1}$ • The formula for covariance: $\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$

  38. Covariance • “For each data item, multiply the difference between the x value and the mean of x, by the difference between the y value and the mean of y. Add all these up, and divide by (n -1) ”. • Imagine we have gone into the world and collected some 2-dimensional data, say, we have asked a bunch of students how many hours in total that they spent studying COSC241, and the mark that they received.

  39. Covariance • So we have two dimensions, the first is the H dimension, the hours studied, and the second is the M dimension, the mark received. • The exact value is not as important as its sign (i.e. positive or negative). If the value is positive, as it is here, then that indicates that both dimensions increase together, meaning that, in general, as the number of hours of study increased, so did the final mark.

  40. Covariance • If the value is negative, then as one dimension increases, the other decreases. If we had ended up with a negative covariance here, then that would have said the opposite: that as the number of hours of study increased, the final mark decreased. • In the last case, if the covariance is zero, it indicates that the two dimensions are independent of each other.
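
A small NumPy sketch of the covariance formula; the hours/marks numbers below are made up for illustration and are not the data from the original example:

    import numpy as np

    # Made-up data: hours studied (H) and mark received (M) for five students.
    H = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
    M = np.array([45.0, 50.0, 62.0, 70.0, 80.0])

    # cov(H, M): sum of (H_i - mean(H)) * (M_i - mean(M)), divided by n - 1.
    n = len(H)
    cov_hm = np.sum((H - H.mean()) * (M - M.mean())) / (n - 1)

    print(cov_hm)               # positive: hours and marks increase together
    print(np.cov(H, M)[0, 1])   # NumPy's built-in covariance agrees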

  41. Covariance

  42. Since the covariance value can be calculated between any 2 dimensions in a data set, this technique is often used to find relationships between dimensions in high-dimensional data sets where visualization is difficult. • Cov(x,y) = Cov(y,x)

  43. The covariance Matrix • Covariance is always measured between 2 dimensions. • If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. • For example, from a 3 dimensional data set (dimensions x, y, z) you could calculate cov(x,y), cov(x,z) and cov(y,z).

  44. The covariance Matrix • In fact, for an n-dimensional data set, you can calculate $\frac{n!}{(n-2)! \cdot 2}$ different covariance values. • A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix.

  45. The covariance Matrix • The definition of the covariance matrix for a set of data with n dimensions is $C^{n \times n} = (c_{i,j}),\ c_{i,j} = \mathrm{cov}(Dim_i, Dim_j)$, where $C^{n \times n}$ is a matrix with n rows and n columns, and $Dim_x$ is the xth dimension. If you have an n-dimensional data set, then the matrix has n rows and columns (so is square) and each entry in the matrix is the result of calculating the covariance between two separate dimensions.

  46. The covariance Matrix • Eg. The entry on row 2, column 3, is the covariance value calculated between the 2nd dimension and the 3rd dimension. • We’ll make up the covariance matrix for an imaginary 3 dimensional data set, using the usual dimensions x, y and z. • the covariance matrix has 3 rows and 3 columns, and the values are this:

  47. The covariance Matrix • $C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix}$

  48. The covariance Matrix • Down the main diagonal, you see that the covariance value is between one of the dimensions and itself. These are the variances for that dimension. • The other point is that since cov(a,b) = cov(b,a), the matrix is symmetrical about the main diagonal.
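
A short NumPy sketch of such a matrix for a made-up 3-dimensional data set (the numbers are illustrative only):

    import numpy as np

    # Made-up data set: each row is one observation of (x, y, z).
    data = np.array([[2.0,  7.0, 4.0],
                     [4.0,  6.0, 9.0],
                     [6.0,  9.0, 3.0],
                     [8.0, 12.0, 8.0]])

    # The n x n covariance matrix (here 3 x 3): entry (i, j) is cov(dim_i, dim_j).
    C = np.cov(data, rowvar=False)

    print(C)                    # the diagonal holds the variances of x, y and z
    print(np.allclose(C, C.T))  # True: cov(a, b) == cov(b, a), so C is symmetric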

  49. Eigenvectors • We can multiply two matrices together, provided they are of compatible sizes. • Eigenvectors are a special case of this. • Consider two multiplications between a matrix and a vector (an illustrative pair is shown after this slide). • In the first example, the resulting vector is not an integer multiple of the original vector, whereas in the second example, the result is exactly 4 times the vector we began with.
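
The slide's two matrix-vector products are not reproduced in this transcript; the following pair is an illustrative reconstruction consistent with the description (the second result being exactly 4 times the starting vector), not necessarily the original figures:

$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix} \qquad \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$

Here $(3, 2)^T$ is an eigenvector of the matrix, with eigenvalue 4.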
