Chap. 19 PCA • Principal Component Analysis • Multivariate Analysis
Example: Red Sox • Dataset: 110 years of Red Sox performance data • Question: Do pitchers' and batters' ages matter for performance?
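Transform • Linear transformation • A point (x, y) in Cartesian coordinates • The same point becomes (a, b) in another coordinate system • Assuming a linear transformation: • a = f(x, y) = x*c11 + y*c12 • b = g(x, y) = x*c21 + y*c22 • In matrix form: $\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$ • For a review of matrices, see www.cs.uml.edu/~kim/580/review_matrix.pdf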
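The two equations for a and b can be tried out directly. The following is a minimal sketch in plain Python; the function name `transform` and the example matrix `C` are illustrative assumptions, not part of the slides.

```python
def transform(c, point):
    """Apply the 2x2 matrix c (a list of two rows) to the point (x, y)."""
    x, y = point
    a = x * c[0][0] + y * c[0][1]   # a = x*c11 + y*c12
    b = x * c[1][0] + y * c[1][1]   # b = x*c21 + y*c22
    return (a, b)


# Hypothetical transform matrix: a 90-degree rotation of the axes.
C = [[0, 1],
     [-1, 0]]

print(transform(C, (2, 3)))   # (3, -2)
```

With the identity matrix as `c`, the point is returned unchanged, which is a quick sanity check on the index order.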
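Eigenvector • $A\mathbf{v} = \lambda\mathbf{v}$ • In the example, multiplying the matrix by the vector gives 4 times the same vector, so $\lambda = 4$ • Eigenvector: a vector that the transformation maps onto a multiple of itself, so its direction is unchanged • Eigenvectors of a symmetric matrix (such as a covariance matrix) that belong to distinct eigenvalues are orthogonal • Each eigenvector is associated with a single eigenvalue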
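The eigenvector relation can be checked numerically. In the sketch below, the matrix `A` and vector `v` are assumed example values, chosen only so that the eigenvalue comes out as 4; the symmetric matrix `S` and its eigenvectors are likewise hypothetical and illustrate the orthogonality property.

```python
def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Assumed example: v is an eigenvector of A with eigenvalue 4.
A = [[2, 3],
     [2, 1]]
v = [3, 2]
print(matvec(A, v))          # [12, 8]
print([4 * x for x in v])    # [12, 8]

# Assumed symmetric example: eigenvectors u1 (eigenvalue 3) and
# u2 (eigenvalue 1) are orthogonal, so their dot product is 0.
S = [[2, 1],
     [1, 2]]
u1, u2 = [1, 1], [1, -1]
print(matvec(S, u1), matvec(S, u2), dot(u1, u2))
```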
Redundancy • Arbitrary observations recorded by r1 and r2 • Redundancy goes from low to high from (a) to (c) • (c) can be represented by a single variable • Spread about the best-fit line: covariance between the two variables
Transform = New Coordinate • What makes a good transform matrix for PCA? • Covariance
Mean, Variance, Covariance • $X = (x_1, x_2, \ldots, x_n)$, $Y = (y_1, y_2, \ldots, y_n)$ • $E[X] = \sum_i x_i / n$, $E[Y] = \sum_i y_i / n$ • Variance = (st. dev.)$^2$: • $V[X] = \sum_i (x_i - E[X])^2 / (n-1)$ • Covariance: • $\mathrm{cov}[X,Y] = \sum_i (x_i - E[X])(y_i - E[Y]) / (n-1)$
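These definitions translate almost line for line into code. The sketch below uses plain Python and the (n-1) denominator from the slide; the function names and the sample data `X` and `Y` are illustrative assumptions.

```python
def mean(values):
    """E[X]: the arithmetic mean."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """cov[X, Y] with the (n - 1) denominator used on the slide."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    """V[X] is the covariance of X with itself."""
    return covariance(xs, xs)


# Hypothetical data: Y is exactly 2 * X.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]
print(mean(X), variance(X), covariance(X, Y))
```

Because Y = 2X here, the covariance is twice the variance of X, which makes the output easy to check by hand.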
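Covariance Matrix • Three variables X, Y, Z • $\begin{pmatrix} \mathrm{cov}[X,X] & \mathrm{cov}[X,Y] & \mathrm{cov}[X,Z] \\ \mathrm{cov}[Y,X] & \mathrm{cov}[Y,Y] & \mathrm{cov}[Y,Z] \\ \mathrm{cov}[Z,X] & \mathrm{cov}[Z,Y] & \mathrm{cov}[Z,Z] \end{pmatrix}$ • cov[X,X] = V[X] • cov[X,Y] = cov[Y,X]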
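A short sketch of how the 3x3 covariance matrix could be assembled, assuming the (n-1) covariance definition; the function names and the three sample variables are hypothetical. The result is symmetric, with the variances on the diagonal.

```python
def covariance(xs, ys):
    """Sample covariance with the (n - 1) denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def covariance_matrix(variables):
    """Entry [i][j] is cov[variables[i], variables[j]]."""
    return [[covariance(a, b) for b in variables] for a in variables]


# Hypothetical observations of three variables.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 3.0]
Z = [4.0, 3.0, 2.0, 1.0]

M = covariance_matrix([X, Y, Z])
for row in M:
    print(row)
```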
PCA, Multivariate Analysis • Principal Component Analysis (PCA) • Coordinate transformation • Projection
PCA of Amino Acids (AAs) • How can different properties be combined in order to group similar AAs? • Visual clustering with volume and pI
PCA • Given an N×P matrix (e.g., 20×7), • each row represents a P-dimensional data point • Each data point is • scaled and shifted to the origin • rotated to spread out the points as much as possible • Scaling • For property j, compute the average and the s.d. • $\mu_j = \sum_i x_{ij} / N$, $\sigma_j^2 = \sum_i (x_{ij} - \mu_j)^2 / N$ • Since each property has a different scale and mean, define normalized variables, • $z_{ij} = (x_{ij} - \mu_j) / \sigma_j$ • $z_{ij}$ measures the deviation from the mean for each property; each normalized property has a mean of 0 and an s.d. of 1
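The scaling step can be sketched as follows, assuming the /N (population) form of the s.d. given on the slide. The function name `standardize` and the small 3x2 data set are illustrative assumptions; a constant column would give a zero s.d. and is not handled here.

```python
import math


def standardize(rows):
    """Return z[i][j] = (x[i][j] - mu[j]) / sigma[j] for an N x P table."""
    n = len(rows)
    p = len(rows[0])
    # Column means mu_j.
    mu = [sum(row[j] for row in rows) / n for j in range(p)]
    # Column s.d. sigma_j with the /N denominator (assumes no constant column).
    sigma = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in rows) / n)
        for j in range(p)
    ]
    return [[(row[j] - mu[j]) / sigma[j] for j in range(p)] for row in rows]


# Hypothetical data: 3 data points, 2 properties on very different scales.
data = [[1.0, 10.0],
        [2.0, 20.0],
        [3.0, 60.0]]

Z = standardize(data)
for row in Z:
    print(row)
```

After this step every column has mean 0 and s.d. 1, so no property dominates merely because of its units.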
PCA • New orthogonal coordinate system • Find $v_j = (v_{j1}, v_{j2}, \ldots, v_{jP})$ such that • $\sum_k v_{ik} v_{jk} = 0$ for $i \ne j$ (orthogonal) and $\sum_k v_{jk}^2 = 1$ (unit length) • $v_j$ represents a new coordinate vector • Data points in z-coordinates become • $y_{ij} = \sum_k z_{ik} v_{jk}$ • The new y coordinate system is a rotation of the z coordinate system • The $v_j$ turn out to be related to the correlation coefficients: they are eigenvectors of the correlation matrix
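The conditions on the new axes (mutually orthogonal, each of unit length, i.e. squared components summing to 1) and the projection of standardized points onto them can be sketched as below. The axes `V` are an assumed 45-degree rotation in two dimensions and the points `Z` are hypothetical; in real PCA the axes would be computed from the data rather than chosen by hand.

```python
import math


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def is_orthonormal(vectors, tol=1e-9):
    """Check sum_k v_ik v_jk = 0 for i != j and = 1 for i == j."""
    for i, vi in enumerate(vectors):
        for j, vj in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(vi, vj) - expected) > tol:
                return False
    return True


def project(z_rows, vectors):
    """y[i][j] = sum_k z[i][k] * v[j][k]: point i along new axis j."""
    return [[dot(z, v) for v in vectors] for z in z_rows]


s = 1.0 / math.sqrt(2.0)
V = [[s, s],      # assumed new axis 1
     [-s, s]]     # assumed new axis 2 (together: a 45-degree rotation)

# Hypothetical standardized data points.
Z = [[-1.2, -1.0], [0.0, 0.2], [1.2, 0.8]]

Y = project(Z, V)
print(is_orthonormal(V))
for row in Y:
    print(row)
```

Because the axes form a rotation, each point keeps its distance from the origin; only its coordinates change.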
PCA • Correlation coefficient, $C_{ij}$ • $C_{ij} = \sum_k (z_{ik} - m_i)(z_{jk} - m_j) / (P\, s_i s_j)$ ($m_i$, $s_i$: mean and s.d. of the i-th row) • $-1 \le C_{ij} \le 1$ • Results in an N×N similarity matrix, $S_{ij}$
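The row-by-row correlation can be sketched as follows. This assumes the s.d. of each row is computed with the /P denominator, so that a row correlates with itself at exactly 1; the function names and the three sample rows are hypothetical.

```python
import math


def correlation(row_i, row_j):
    """C_ij = sum_k (z_ik - m_i)(z_jk - m_j) / (P * s_i * s_j)."""
    p = len(row_i)
    m_i = sum(row_i) / p
    m_j = sum(row_j) / p
    # Row s.d. with the /P denominator (assumes rows are not constant).
    s_i = math.sqrt(sum((a - m_i) ** 2 for a in row_i) / p)
    s_j = math.sqrt(sum((b - m_j) ** 2 for b in row_j) / p)
    cross = sum((a - m_i) * (b - m_j) for a, b in zip(row_i, row_j))
    return cross / (p * s_i * s_j)


def similarity_matrix(rows):
    """N x N matrix of pairwise correlations between rows."""
    return [[correlation(a, b) for b in rows] for a in rows]


# Hypothetical rows: the second is proportional to the first,
# the third runs in the opposite direction.
rows = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [3.0, 2.0, 1.0]]

S = similarity_matrix(rows)
for r in S:
    print(r)
```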
Example: Similarities • Pixels with similar spectral responses fall into similar locations in the score plot
Example: Cancer region • 1. Inspect the reflectance polarization image for known tumor locations • Select several points in this region to display how cancerous tissue trends in the score space • Score plots: pixels projected onto PC1 vs. PC2 and PC1 vs. PC3