Principal Component Analysis

  1. Principal Component Analysis CSE 4310 – Computer Vision Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

  2. Images as High-Dimensional Vectors • Consider these five images. • Each of them is a 100x100 grayscale image. • What is the dimensionality of each image?

  3. Do We Need That Many Dimensions? • Consider these five images. • Each of them is a 100x100 grayscale image. • What is the dimensionality of each image? 10,000. • However, each image is generated by: • Picking an original image (like the image on the left). • Translating (moving) the image up or down by a certain amount Δy. • Translating the image left or right by a certain amount Δx. • Rotating the image by a certain angle θ. • If we know the original image, to reconstruct any other image we just need three numbers: Δx, Δy, θ.
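To make the "image as vector" view concrete, here is a minimal MATLAB sketch (not from the slides; the image is a random stand-in) of flattening a 100x100 grayscale image into a 10,000-dimensional column vector and back:

    img = rand(100, 100);                       % stand-in for a real 100x100 grayscale image
    vector = img(:);                            % flatten column-by-column: a 10000 x 1 vector
    disp(size(vector));                         % prints 10000 1
    reconstructed = reshape(vector, 100, 100);  % back to a 100x100 image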

  4. Dimensionality Reduction • The goal of dimensionality reduction methods is to build models that allow representing high-dimensional vectors using a smaller number of dimensions. • Hopefully, a much smaller number of dimensions. • The model is built using training data. • In this example, the model consists of: • A projection function F: given an input image v, it outputs the corresponding translation and rotation parameters (Δx, Δy, θ). • A backprojection function B: given translation parameters Δx, Δy and rotation parameter θ, it outputs the corresponding image.

  5. Dimensionality Reduction • The goal of dimensionality reduction methods is to build models that allow representing high-dimensional vectors using a smaller number of dimensions. • Hopefully, a much smaller number of dimensions. • The model is built using training data. • In this example, the model consists of: • A projection function F: given an input image v, it outputs the corresponding translation and rotation parameters (Δx, Δy, θ). • A backprojection function B: given translation parameters Δx, Δy and rotation parameter θ, it outputs the corresponding image. • If we have a lossless projection function, then B(F(v)) = v. • Typically, projection functions are lossy. • We try to find F and B so that B(F(v)) tends to be close to v.

  6. Linear Dimensionality Reduction • Linear dimensionality reduction methods use linear functions for projection and backprojection. • Projection: F(v) = Pv, where: • v is a column vector of D dimensions. • P is a d x D matrix, where hopefully d << D (d is much smaller than D). • This way, F projects D-dimensional vectors to d-dimensional vectors. • Advantage of linear methods: there are well known methods for finding a good P. We will study some of those methods. • Disadvantage: they cannot capture non-linear transformations, such as the image translations and rotations of our example.
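To make the matrix notation concrete, here is a minimal sketch (made-up sizes and data, not part of the slides), assuming the rows of P are orthonormal, so that P' can serve as the backprojection matrix (this is the case for PCA eigenvectors):

    D = 5;  d = 2;                              % assumed dimensionalities
    P = orth(rand(D, d))';                      % d x D matrix with orthonormal rows
    v = rand(D, 1);                             % a D-dimensional column vector
    y = P * v;                                  % projection: a d-dimensional vector
    v_hat = P' * y;                             % backprojection: an estimate of v
    backprojection_error = sum((v - v_hat) .^ 2);   % squared error of the estimate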

  7. Intrinsic Dimensionality • Sometimes, high dimensional data is generated using some process that uses only a few parameters. • The translated and rotated images of the digit 3 are such an example. • In that case, the number of those few parameters is called the intrinsic dimensionality of the data. • It is desirable (but oftentimes hard) to discover the intrinsic dimensionality of the data.

  8. Lossy Dimensionality Reduction • Suppose we want to project all points to a single line. • This will be lossy. • What would be the best line?

  9. Lossy Dimensionality Reduction • Suppose we want to project all points to a single line. • This will be lossy. • What would be the best line? • Optimization problem. • The number of choices is infinite. • We must define an optimization criterion.

  10. Optimization Criterion • Consider a pair of 2-dimensional points: v_i and v_j. • Let F map each 2D point to a point on a line. • So, F(v_i) and F(v_j) are points on that line. • Define D_ij = ||v_i - v_j||^2. • Squared distance from v_i to v_j. • Define d_ij = ||F(v_i) - F(v_j)||^2. • Define error function E_ij = D_ij - d_ij. • Will E_ij ever be negative?

  11. Optimization Criterion • Consider a pair of 2-dimensional points: v_i and v_j. • Let F map each 2D point to a point on a line. • So, F(v_i) and F(v_j) are points on that line. • Define D_ij = ||v_i - v_j||^2. • Squared distance from v_i to v_j. • Define d_ij = ||F(v_i) - F(v_j)||^2. • Define error function E_ij = D_ij - d_ij. • Will E_ij ever be negative? • NO: D_ij >= d_ij always. Projecting to fewer dimensions can only shrink distances.

  12. Optimization Criterion • Now, consider all points: • v_1, v_2, ..., v_N. • Define error function E as: • E = sum over all pairs (i, j) of E_ij = sum over all pairs (i, j) of (D_ij - d_ij). • Interpretation: error function E measures how well projection F preserves distances.

  13. Optimization Criterion • Now, consider all points: • v_1, v_2, ..., v_N. • Define error function E as: • E = sum over all pairs (i, j) of (D_ij - d_ij). • Suppose that F perfectly preserves distances. • Then, for every pair (i, j): D_ij = d_ij. • In that case, E = ???

  14. Optimization Criterion • Now, consider all points: • v_1, v_2, ..., v_N. • Define error function E as: • E = sum over all pairs (i, j) of (D_ij - d_ij). • Suppose that F perfectly preserves distances. • Then, for every pair (i, j): D_ij = d_ij. • In that case, E = 0. • In the example shown on the figure, obviously E > 0.

  15. Optimization Criterion: Preserving Distances • We have defined an error function E that tells us how good a linear projection F is. • Therefore, the best line projection is the one that minimizes E.

  16. Optimization Criterion: Preserving Distances • We have defined an optimization criterion E that measures how well a projection F preserves the pairwise distances of the original data. • Another criterion we could use: minimizing the sum of backprojection errors: • sum over all points v_i of ||v_i - B(F(v_i))||^2. • We will not prove it here, but the criterion of preserving distances is mathematically equivalent to minimizing backprojection errors.
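As a sanity check on these two criteria, here is a minimal MATLAB sketch (assumed random data and a hand-picked direction, not from the slides) that computes both the distance-preservation error and the sum of squared backprojection errors for projecting centered 2D points onto a unit direction u:

    points = randn(2, 100);                                % assumed data: each column is a 2D point
    number = size(points, 2);
    centered = points - repmat([mean(points')]', 1, number);   % center the data
    u = [1; 0];                                            % assumed candidate unit direction
    coords = u' * centered;                                % 1 x N projected coordinates
    distance_error = 0;
    for i = 1:number
        for j = (i+1):number
            Dij = sum((centered(:, i) - centered(:, j)) .^ 2);  % original squared distance
            dij = (coords(i) - coords(j)) ^ 2;                  % projected squared distance
            distance_error = distance_error + (Dij - dij);
        end
    end
    backprojections = u * coords;                          % 2D estimates B(F(v_i))
    backprojection_error = sum(sum((centered - backprojections) .^ 2));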

  17. Finding the Best Projection: PCA • First step: center the data. [Figures: the original points and the centered_points.]

  18. Finding the Best Projection: PCA • First step: center the data.
% Each column of points is a vector.
% Each vector in our dataset is a column in points.
number = size(points, 2);
% note that we are transposing twice
average = [mean(points')]';
centered_points = zeros(size(points));
for index = 1:number
    centered_points(:, index) = points(:, index) - average;
end
plot_points(centered_points, 2);

  19. Finding the Best Projection: PCA • Second step: compute the covariance matrix.
% Each column of centered_points is a vector from our
% centered dataset.
covariance_matrix = centered_points * centered_points';

  20. Finding the Best Projection: PCA • Second step: compute the covariance matrix.
% Each column of centered_points is a vector from our
% centered dataset.
covariance_matrix = centered_points * centered_points';
• Third step: compute the eigenvectors and eigenvalues of the covariance matrix.
[eigenvectors eigenvalues] = eig(covariance_matrix);

  21. Eigenvectors and Eigenvalues
eigenvectors =
    0.4837   -0.8753
   -0.8753   -0.4837
eigenvalues =
    2.0217         0
         0   77.2183
• Each eigenvector v is a column that specifies a line going through the origin. • The importance of the i-th eigenvector is reflected by the i-th eigenvalue. • Second eigenvalue = 77, first eigenvalue = 2 => the second eigenvector is far more important.

  22. Eigenvectors and Eigenvalues
eigenvectors =
    0.4837   -0.8753
   -0.8753   -0.4837
eigenvalues =
    2.0217         0
         0   77.2183
• Suppose we want to find the optimal one-dimensional projection (“optimal” according to the criteria we defined earlier). • The eigenvector with the highest eigenvalue is the best line for projecting our data.

  23. Eigenvectors and Eigenvalues
eigenvectors =
    0.4837   -0.8753
   -0.8753   -0.4837
eigenvalues =
    2.0217         0
         0   77.2183
• In higher dimensions: • Suppose that each vector is D-dimensional. • Suppose that we want to project our vectors to a d-dimensional space (where d < D). • Then, the optimal subspace to project to is defined by the d eigenvectors with the highest eigenvalues.
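A minimal sketch of that higher-dimensional case (assumed variable names: vectors holds one D-dimensional vector per column, average is their mean, and eigenvectors is assumed to be already sorted so that the first columns have the highest eigenvalues, as the compute_pca function on the following slides guarantees):

    d = 3;                                         % assumed target dimensionality
    number = size(vectors, 2);
    top_eigenvectors = eigenvectors(:, 1:d);       % D x d: the d best directions
    centered = vectors - repmat(average, 1, number);
    projected = top_eigenvectors' * centered;      % d x number: the projected data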

  24. Visualizing the Eigenvectors
black: v1 (eigenvalue = 2.02)
red: v2 (eigenvalue = 77.2)
Which of these two lines is better to project our points to?
plot_points(points, 1);
p1 = eigenvectors(:, 1);
p2 = eigenvectors(:, 2);
plot([0, p1(1)], [0, p1(2)], 'k-', 'linewidth', 3);
hold on;
plot([0, p2(1)], [0, p2(2)], 'r-', 'linewidth', 3);

  25. Visualizing the Eigenvectors
black: v1 (eigenvalue = 2.02)
red: v2 (eigenvalue = 77.2)
Which of these two lines is better to project our points to?
The red line clearly would preserve more information.
plot_points(points, 1);
p1 = eigenvectors(:, 1);
p2 = eigenvectors(:, 2);
plot([0, p1(1)], [0, p1(2)], 'k-', 'linewidth', 3);
hold on;
plot([0, p2(1)], [0, p2(2)], 'r-', 'linewidth', 3);

  26. PCA Code
function [average, eigenvectors, eigenvalues] = ...
    compute_pca(vectors)

number = size(vectors, 2);
% note that we are transposing twice
average = [mean(vectors')]';
centered_vectors = zeros(size(vectors));
for index = 1:number
    centered_vectors(:, index) = vectors(:, index) - average;
end
covariance_matrix = centered_vectors * centered_vectors';
[eigenvectors eigenvalues] = eig(covariance_matrix);
% eigenvalues is a matrix, but only the diagonal
% matters, so we throw away the rest
eigenvalues = diag(eigenvalues);
[eigenvalues, indices] = sort(eigenvalues, 'descend');
eigenvectors = eigenvectors(:, indices);
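A short usage sketch (assuming the function above is saved as compute_pca.m, and that points holds one 2D vector per column):

    points = randn(2, 100);                              % assumed example data
    [average, eigenvectors, eigenvalues] = compute_pca(points);
    p1 = eigenvectors(:, 1);                             % top eigenvector (highest eigenvalue)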

  27. PCA Projection of 2D Points to 1D • The compute_pca function shows how to compute: • All eigenvectors and corresponding eigenvalues. • The average of all vectors in our dataset. • Important: note that the eigenvectors returned by compute_pca are sorted in decreasing order of their eigenvalues. • Suppose that we have applied this function to our 2D point dataset. • How can we get the top eigenvector from the result? • The top eigenvector is simply the first column of eigenvectors (the second return value).

  28. PCA Projection of 2D Points to 1D • Suppose that we have computed the eigenvectors, and now we want to project our 2D points to 1D numbers. • Suppose that P1 is the first eigenvector (i.e., the eigenvector with the highest eigenvalue). • Projection: P(V) = <V - avg, P1> = P1' * (V - avg) • Dot product between (V - avg) and P1. • NOTE: The eigenvectors that Matlab returns have unit norm. • So, projection of a 2D vector to a 1D number is done in two steps: • First, center the vector by subtracting the average computed by compute_pca. • Second, take the dot product of the centered vector with the top eigenvector.
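Putting those two steps into code, a minimal sketch (reusing points, average, and p1 as computed by the compute_pca usage example above) that projects every 2D point to a single number:

    number = size(points, 2);
    projections = zeros(1, number);
    for index = 1:number
        centered = points(:, index) - average;      % step 1: center the vector
        projections(index) = p1' * centered;        % step 2: dot product with the top eigenvector
    end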

  29. Example: From 2 Dimensions to 1 • Our original set of 2D vectors.

  30. Example: From 2 Dimensions to 1 • We run compute_pca, and we compute the first eigenvector, which we call p1. • The black line shows the direction of p1.

  31. Example: From 2 Dimensions to 1 • We choose a point v4 = [-1.556, 0.576]’. • Shown in red. • We will compute the PCA projection of v4.

  32. Example: From 2 Dimensions to 1 • centered_v4 = v4 – average. • Shown in cyan.

  33. Example: From 2 Dimensions to 1 • projection = p1' * (centered_v4); • result: projection = 1.43

  34. Example: From 2 Dimensions to 1 • Note: the projection is a single number. • One way to visualize this number is shown in pink: it is the point on the x axis with x=projection.

  35. Example: From 2 Dimensions to 1 • A more intuitive way is to show the projection of the centered point (shown in cyan) on the black line. • This point is: projection * p1 = p1' * (centered_v4) * p1

  36. Example: From 2 Dimensions to 1 • b1 = projection * p1; • shown in red, on top of black line. • How are b1 and projection related?

  37. Example: From 1 Dimension to 2 • b1 = projection * p1; • shown in red, on top of black line. • projection = distance of b1 from the origin.

  38. Backprojection • In the previous slides we saw how to compute the projection P(V) from 2D to 1D: P(V) = <V - avg, P1> = P1' * (V - avg) • Another useful operation is the backprojection: • In backprojection, we are given P(V), and based on that we try to estimate V as best as we can.

  39. Backprojection • Obviously, it is impossible to estimate V with certainty given P(V). • In our example: • P(V) has how many dimensions? • V has how many dimensions?

  40. Backprojection • Obviously, it is impossible to estimate V with certainty given P(V). • In our example: • P(V) has how many dimensions? 1. • V has how many dimensions? 2. • An infinite number of points will project to P(V). • What backprojection gives us is the “best estimate” that has the smallest squared error (averaged over all vectors V in our dataset). • The backprojection formula for our 2D to 1D example is: B(P(V)) = P1 * P(V) + average
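Continuing the 2D-to-1D example, here is a minimal sketch (assumed names, reusing points, average, and p1 from the compute_pca example above; the chosen index is arbitrary) of projecting a vector, backprojecting it, and measuring the backprojection error:

    v = points(:, 4);                        % some vector from the dataset (assumed index)
    projection = p1' * (v - average);        % P(V): a single number
    estimate = p1 * projection + average;    % B(P(V)): our best 2D estimate of v
    backprojection_error = sum((v - estimate) .^ 2);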

  41. Backprojection from 1D to 2D • Input: P(V) = 1.43, which is just a number.

  42. Backprojection from 1D to 2D • Step 1: map this number on the line corresponding to the top eigenvector: b1 = P(V) * p1. • The result is the red point on the black line.

  43. Backprojection from 1D to 2D • Step 2: add to b1 the average of our dataset. • The result is shown in green.

  44. Backprojection from 1D to 2D • Step 2: add to b1 the average of our dataset. • The result is shown in green. • That is it: the green point is B(P(V)), our best estimate.

  45. Example Application: PCA on Faces • In this example, the data are face images, like:

  46. Example Application: PCA on Faces • Each image has size 31x25. • Therefore, each image is represented as a 775-dimensional vector.

  47. PCA on Faces • Motivation: If a face is a 31x25 window, we need 775 numbers to describe the face. • With PCA, we can store (approximately) the same information with far fewer numbers. • One benefit is that we can do much faster computations, using fewer numbers. • Another benefit is that PCA provides useful information for face detection and face recognition. • How? Using the backprojection error. • The backprojection error measures the sum-of-squares error between a vector V and the backprojection B(P(V)). • It shows how much of the information in V is lost by P(V).
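As an illustration of that idea, a minimal sketch (assumed variable names: V is a 775-dimensional window vector, average is the mean face, and top_eigenvectors is a 775x10 matrix holding the 10 eigenvectors with the highest eigenvalues) of computing the backprojection error for a candidate window:

    centered = V - average;
    projection = top_eigenvectors' * centered;                 % P(V): 10 numbers
    backprojection = top_eigenvectors * projection + average;  % B(P(V)): 775 numbers
    backprojection_error = sum((V - backprojection) .^ 2);
    % A small backprojection error means V is well explained by the face
    % subspace, which is what makes it useful for detection and recognition.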

  48. PCA vs Template Matching • If we use template matching to detect faces, what is the perfect face (easiest to be detected, gives the best score)? • How about PCA?

  49. PCA vs Template Matching • Template matching (assuming normalized correlation): • The template of the face is perfect. • The only other faces that are perfect are faces that (after we normalize for brightness and contrast) become equal to the normalized template. • This approach is very restrictive. • Out of all normalized images, only one would qualify as a “perfect face”.

  50. PCA vs Template Matching • Just to make it concrete: • As we said before, we have a face dataset where each face is a 775-dimensional vector. • Suppose that we use PCA to project each face to a 10-dimensional vector. • When we do face detection, for every image subwindow V that we consider, we compute its PCA projection P(V) to 10 dimensions. • Then what? Can you guess how we would compute a detection score? Can you guess what type of faces would give a perfect score?
