Dynamic graphics, Principal Component Analysis
Ker-Chau Li, UCLA Department of Statistics
Xlisp-stat (demo)
• (plot-points x y)
• (scatterplot-matrix (list x y z u w))
• (spin-plot (list x y z))
• Link, remove, select, rescale
• Examples: (1) simulated data (2) Iris data (3) Boston Housing data
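For readers without Xlisp-stat, here is a rough Python/NumPy + matplotlib sketch of the same three displays; the stand-in data and the use of matplotlib and pandas are assumptions, not part of the original demo, and the plots are static rather than linked and brushable.

# Illustrative Python analogues of the Xlisp-stat demo commands.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y, z, u, w = rng.normal(size=(5, 100))     # stand-in data for the demo

plt.scatter(x, y)                             # analogue of (plot-points x y)
plt.show()

pd.plotting.scatter_matrix(pd.DataFrame({"x": x, "y": y, "z": z, "u": u, "w": w}))
plt.show()                                    # analogue of (scatterplot-matrix ...)

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(x, y, z)                           # analogue of (spin-plot ...); drag to rotate
plt.show()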
PCA (principal component analysis)
• A fundamental tool for reducing dimensionality by finding the projections with the largest variance
• (1) Data version
• (2) Population version
• Each has a number of variations
• (3) Let's begin with an illustration using (pca-model (list x y z))
Find a 2-D plane in 4-D space
• Generate 100 cases of u from uniform(0,1)
• Generate 100 cases of v from uniform(0,1)
• Define x = u + v, y = u - v
• Apply pca-model to (x, y, u, v); demo
• It still works with small errors (e ~ N(0,1)) present: x = u + v + 0.01 e_1; y = u - v + 0.01 e_2
• Define x = u + v^2, y = u - v^2, z = v^2
• Apply pca-model to (x, y, z, u); works fine, since x = u + z and y = u - z are still linear relations
• But not so well with a nonlinear manifold; try (pca-model (list x y u v))
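The first experiment can be reproduced outside Xlisp-stat; the NumPy sketch below (an illustration, not the pca-model code itself) shows that the sample covariance matrix of (x, y, u, v) has only two sizable eigenvalues, and that small errors barely change this.

# "2-D plane in 4-D space" experiment, sketched in NumPy.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100)
v = rng.uniform(size=100)
x = u + v
y = u - v

data = np.column_stack([x, y, u, v])             # 100 x 4 data matrix
print(np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1])   # two eigenvalues are essentially zero: a 2-D plane

# With small errors the picture barely changes:
x_e = u + v + 0.01 * rng.normal(size=100)
y_e = u - v + 0.01 * rng.normal(size=100)
noisy = np.column_stack([x_e, y_e, u, v])
print(np.linalg.eigvalsh(np.cov(noisy, rowvar=False))[::-1])  # last two eigenvalues are tiny, not exactly zero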
Other examples
• 1-D from 2-D
• rings
• Ying and Yang
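As a guess at the kind of "rings" data meant here (the exact datasets are not shown in this write-up), points on a circle carry 1-D structure that PCA cannot summarize with a single projection:

# Points on a circle: 1-D structure in 2-D, but no dominant linear direction.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
ring = np.column_stack([np.cos(theta), np.sin(theta)])

print(np.linalg.eigvalsh(np.cov(ring, rowvar=False)))   # both eigenvalues near 0.5: PCA finds no 1-D summary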
Data version
• 1. Construct the sample variance-covariance matrix
• 2. Find the eigenvectors
• 3. Projection: use each eigenvector to form a linear combination of the original variables
• 4. The larger, the better: the k-th principal component is the projection with the k-th largest eigenvalue
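The four steps can be written out in a few lines; the NumPy sketch below is illustrative (the function name and the centering convention are my own, not from the lecture).

# Data-version PCA: covariance, eigenvectors, projections, sorted by eigenvalue.
import numpy as np

def pca(data):
    # data: n x p array; returns eigenvalues (descending) and the principal component scores
    cov = np.cov(data, rowvar=False)            # step 1: sample variance-covariance matrix (p x p)
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 2: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]           # step 4: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    centered = data - data.mean(axis=0)
    scores = centered @ eigvecs                 # step 3: linear combinations of the original variables
    return eigvals, scores                      # k-th column of scores = k-th principal component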
Data version (alternative view)
• 1-D data matrix: rank 1
• 2-D data matrix: rank 2
• k-D data matrix: rank k
• Eigenvectors for 1-D sample covariance matrix: rank 1
• Eigenvectors for 2-D sample covariance matrix: rank 2
• Eigenvectors for k-D sample covariance matrix: rank k
• Adding i.i.d. noise
• Connection with automatic basis curve finding (to be discussed later)
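A small check of the rank statements (illustrative NumPy code with made-up dimensions): data confined to a k-dimensional subspace give a sample covariance matrix of rank k, and adding i.i.d. noise restores full rank while the k largest eigenvalues still dominate.

# Rank of the covariance matrix vs. the dimension of the subspace the data live in.
import numpy as np

rng = np.random.default_rng(0)
k, p, n = 2, 5, 200
basis = rng.normal(size=(k, p))                  # a 2-D subspace inside 5-D space
data = rng.normal(size=(n, k)) @ basis           # n points confined to that subspace

print(np.linalg.matrix_rank(np.cov(data, rowvar=False)))        # 2 = dimension of the subspace

noisy = data + 0.05 * rng.normal(size=(n, p))    # add small i.i.d. noise
print(np.linalg.eigvalsh(np.cov(noisy, rowvar=False))[::-1])    # 2 large eigenvalues, 3 tiny ones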
Population version
• Let the sample size tend to infinity
• The sample covariance matrix converges to the population covariance matrix (by the law of large numbers)
• The rest of the steps remain the same
• We shall use the population version for theoretical discussion
Some basic facts
• Variance of a linear combination of random variables: var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x, y)
• Easier with matrix representation:
• (B.1) var(m' X) = m' Cov(X) m, where m is a p-vector and X consists of the p random variables (x_1, …, x_p)'
• From (B.1), it follows that:
Basic facts (cont.)
• Maximizing var(m' X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1 (here ||m|| denotes the length of the vector m)
• Eigenvalue decomposition:
• (B.2) M v_i = λ_i v_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_p
• Basic linear algebra tells us that the first eigenvector will do: the solution of max m' M m subject to ||m|| = 1 must satisfy M m = λ_1 m
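A quick numerical confirmation of (B.1)-(B.2): among unit vectors m, the quadratic form m' M m is maximized at the first eigenvector, and the maximum equals the largest eigenvalue (illustrative NumPy sketch with made-up data).

# Check that the top eigenvector maximizes m' Cov(X) m over unit vectors m.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated 3-D data
M = np.cov(X, rowvar=False)                               # Cov(X)

eigvals, eigvecs = np.linalg.eigh(M)
lam1, v1 = eigvals[-1], eigvecs[:, -1]                    # largest eigenvalue, first eigenvector
print(v1 @ M @ v1, lam1)                                  # equal up to rounding

m = rng.normal(size=(1000, 3))
m /= np.linalg.norm(m, axis=1, keepdims=True)             # 1000 random unit vectors
print(np.max(np.einsum("ij,jk,ik->i", m, M, m)) <= lam1 + 1e-12)   # True: no unit vector does better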
Basic facts (cont.)
• The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S
• Rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S
• This explains why PCA works for our first example
• Why can small errors be tolerated?
• Large i.i.d. errors are fine too
• Heterogeneity is harmful, and so are correlated errors
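One way to see why i.i.d. errors are tolerated: if e is i.i.d. noise with variance sigma^2, independent of X, then Cov(X + e) = Cov(X) + sigma^2 I, so every eigenvalue shifts by sigma^2 and the eigenvectors are unchanged; correlated or heteroscedastic errors add a non-spherical term that can tilt the leading directions. A small NumPy illustration (made-up data, not from the lecture):

# i.i.d. noise shifts all eigenvalues by about sigma^2 and keeps the same eigenvectors.
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.uniform(size=(2, 1000))
data = np.column_stack([u + v, u - v, u, v])     # rank-2 structure in 4-D

sigma = 0.3
noisy = data + sigma * rng.normal(size=data.shape)
print(np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1])
print(np.linalg.eigvalsh(np.cov(noisy, rowvar=False))[::-1])   # each eigenvalue up by roughly sigma^2 = 0.09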
Further discussion
• No guarantee of finding nonlinear structure such as clusters, curves, etc.
• In fact, sampling properties of PCA are mostly developed for normal data
• Still useful
• Scaling problem
• Projection pursuit: guided; random
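On the scaling problem: the eigenvalues depend on the units of the variables, so rescaling one variable can change which projection comes out first; standardizing the variables (i.e., working with the correlation matrix) is one common remedy. A minimal NumPy illustration, not from the lecture:

# Rescaling one variable makes it dominate the first principal component.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))   # correlated 3-D data
rescaled = data * np.array([1.0, 1.0, 100.0])    # same data, third variable in different units

_, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
print(vecs[:, -1])                               # first PC of the original data
_, vecs = np.linalg.eigh(np.cov(rescaled, rowvar=False))
print(vecs[:, -1])                               # now points almost entirely along variable 3

standardized = (rescaled - rescaled.mean(0)) / rescaled.std(0)
_, vecs = np.linalg.eigh(np.cov(standardized, rowvar=False))
print(vecs[:, -1])                               # unit dependence removed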