Dynamic graphics, Principal Component Analysis
Ker-Chau Li, UCLA Department of Statistics
Xlisp-stat (demo)
• (plot-points x y)
• (scatterplot-matrix (list x y z u w))
• (spin-plot (list x y z))
• Link, remove, select, rescale
• Examples: (1) simulated data (2) Iris data (3) Boston Housing data
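For readers without Xlisp-stat, here is a rough Python/NumPy + matplotlib sketch of the same three displays; the stand-in data and the use of matplotlib and pandas are assumptions, not part of the original demo, and the plots are static rather than linked and brushable.

# Illustrative Python analogues of the Xlisp-stat demo commands.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y, z, u, w = rng.normal(size=(5, 100))     # stand-in data for the demo

plt.scatter(x, y)                             # analogue of (plot-points x y)
plt.show()

pd.plotting.scatter_matrix(pd.DataFrame({"x": x, "y": y, "z": z, "u": u, "w": w}))
plt.show()                                    # analogue of (scatterplot-matrix ...)

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(x, y, z)                           # analogue of (spin-plot ...); drag to rotate
plt.show()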
PCA (principal component analysis)
• A fundamental tool for reducing dimensionality by finding the projections with the largest variance
• (1) Data version
• (2) Population version
• Each has a number of variations
• (3) Let's begin with an illustration using (pca-model (list x y z))
Find a 2-D plane in 4-D space
• Generate 100 cases of u from uniform(0,1)
• Generate 100 cases of v from uniform(0,1)
• Define x = u + v, y = u - v
• Apply pca-model to (x, y, u, v); demo
• It still works with small errors (e ~ N(0,1)) present: x = u + v + 0.01 e_1; y = u - v + 0.01 e_2
• Define x = u + v^2, y = u - v^2, z = v^2
• Apply pca-model to (x, y, z, u); works fine, since x = u + z and y = u - z are still linear relations
• But not so well with a nonlinear manifold; try (pca-model (list x y u v))
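The first experiment can be reproduced outside Xlisp-stat; the NumPy sketch below (an illustration, not the pca-model code itself) shows that the sample covariance matrix of (x, y, u, v) has only two sizable eigenvalues, and that small errors barely change this.

# "2-D plane in 4-D space" experiment, sketched in NumPy.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100)
v = rng.uniform(size=100)
x = u + v
y = u - v

data = np.column_stack([x, y, u, v])             # 100 x 4 data matrix
print(np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1])   # two eigenvalues are essentially zero: a 2-D plane

# With small errors the picture barely changes:
x_e = u + v + 0.01 * rng.normal(size=100)
y_e = u - v + 0.01 * rng.normal(size=100)
noisy = np.column_stack([x_e, y_e, u, v])
print(np.linalg.eigvalsh(np.cov(noisy, rowvar=False))[::-1])  # last two eigenvalues are tiny, not exactly zero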
Other examples
• 1-D from 2-D
• rings
• Ying and Yang
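As a guess at the kind of "rings" data meant here (the exact datasets are not shown in this write-up), points on a circle carry 1-D structure that PCA cannot summarize with a single projection:

# Points on a circle: 1-D structure in 2-D, but no dominant linear direction.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
ring = np.column_stack([np.cos(theta), np.sin(theta)])

print(np.linalg.eigvalsh(np.cov(ring, rowvar=False)))   # both eigenvalues near 0.5: PCA finds no 1-D summary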
Data version
• 1. Construct the sample variance-covariance matrix
• 2. Find the eigenvectors
• 3. Projection: use each eigenvector to form a linear combination of the original variables
• 4. The larger, the better: the k-th principal component is the projection with the k-th largest eigenvalue
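The four steps can be written out in a few lines; the NumPy sketch below is illustrative (the function name and the centering convention are my own, not from the lecture).

# Data-version PCA: covariance, eigenvectors, projections, sorted by eigenvalue.
import numpy as np

def pca(data):
    # data: n x p array; returns eigenvalues (descending) and the principal component scores
    cov = np.cov(data, rowvar=False)            # step 1: sample variance-covariance matrix (p x p)
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 2: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]           # step 4: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    centered = data - data.mean(axis=0)
    scores = centered @ eigvecs                 # step 3: linear combinations of the original variables
    return eigvals, scores                      # k-th column of scores = k-th principal component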
Data version (alternative view)
• 1-D data matrix: rank 1
• 2-D data matrix: rank 2
• k-D data matrix: rank k
• Eigenvectors for 1-D sample covariance matrix: rank 1
• Eigenvectors for 2-D sample covariance matrix: rank 2
• Eigenvectors for k-D sample covariance matrix: rank k
• Adding i.i.d. noise
• Connection with automatic basis curve finding (to be discussed later)
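A small check of the rank statements (illustrative NumPy code with made-up dimensions): data confined to a k-dimensional subspace give a sample covariance matrix of rank k, and adding i.i.d. noise restores full rank while the k largest eigenvalues still dominate.

# Rank of the covariance matrix vs. the dimension of the subspace the data live in.
import numpy as np

rng = np.random.default_rng(0)
k, p, n = 2, 5, 200
basis = rng.normal(size=(k, p))                  # a 2-D subspace inside 5-D space
data = rng.normal(size=(n, k)) @ basis           # n points confined to that subspace

print(np.linalg.matrix_rank(np.cov(data, rowvar=False)))        # 2 = dimension of the subspace

noisy = data + 0.05 * rng.normal(size=(n, p))    # add small i.i.d. noise
print(np.linalg.eigvalsh(np.cov(noisy, rowvar=False))[::-1])    # 2 large eigenvalues, 3 tiny ones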
Population version
• Let the sample size tend to infinity
• The sample covariance matrix converges to the population covariance matrix (by the law of large numbers)
• The rest of the steps remain the same
• We shall use the population version for theoretical discussion
Some basic facts
• Variance of a linear combination of random variables: var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x, y)
• Easier with matrix representation:
• (B.1) var(m' X) = m' Cov(X) m, where m is a p-vector and X consists of the p random variables (x_1, …, x_p)'
• From (B.1), it follows that:
Basic facts (cont.)
• Maximizing var(m' X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1 (here ||m|| denotes the length of the vector m)
• Eigenvalue decomposition:
• (B.2) M v_i = λ_i v_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_p
• Basic linear algebra tells us that the first eigenvector will do: the solution of max m' M m subject to ||m|| = 1 must satisfy M m = λ_1 m
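A quick numerical confirmation of (B.1)-(B.2): among unit vectors m, the quadratic form m' M m is maximized at the first eigenvector, and the maximum equals the largest eigenvalue (illustrative NumPy sketch with made-up data).

# Check that the top eigenvector maximizes m' Cov(X) m over unit vectors m.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated 3-D data
M = np.cov(X, rowvar=False)                               # Cov(X)

eigvals, eigvecs = np.linalg.eigh(M)
lam1, v1 = eigvals[-1], eigvecs[:, -1]                    # largest eigenvalue, first eigenvector
print(v1 @ M @ v1, lam1)                                  # equal up to rounding

m = rng.normal(size=(1000, 3))
m /= np.linalg.norm(m, axis=1, keepdims=True)             # 1000 random unit vectors
print(np.max(np.einsum("ij,jk,ik->i", m, M, m)) <= lam1 + 1e-12)   # True: no unit vector does better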
Basic facts (cont.)
• The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S
• Rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S
• This explains why PCA works for our first example
• Why can small errors be tolerated?
• Large i.i.d. errors are fine too
• Heterogeneity is harmful, and so are correlated errors
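One way to see why i.i.d. errors are tolerated: if e is i.i.d. noise with variance sigma^2, independent of X, then Cov(X + e) = Cov(X) + sigma^2 I, so every eigenvalue shifts by sigma^2 and the eigenvectors are unchanged; correlated or heteroscedastic errors add a non-spherical term that can tilt the leading directions. A small NumPy illustration (made-up data, not from the lecture):

# i.i.d. noise shifts all eigenvalues by about sigma^2 and keeps the same eigenvectors.
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.uniform(size=(2, 1000))
data = np.column_stack([u + v, u - v, u, v])     # rank-2 structure in 4-D

sigma = 0.3
noisy = data + sigma * rng.normal(size=data.shape)
print(np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1])
print(np.linalg.eigvalsh(np.cov(noisy, rowvar=False))[::-1])   # each eigenvalue up by roughly sigma^2 = 0.09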
Further discussion
• No guarantee of finding nonlinear structure such as clusters, curves, etc.
• In fact, sampling properties of PCA are mostly developed for normal data
• Still useful
• Scaling problem
• Projection pursuit: guided; random
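On the scaling problem: the eigenvalues depend on the units of the variables, so rescaling one variable can change which projection comes out first; standardizing the variables (i.e., working with the correlation matrix) is one common remedy. A minimal NumPy illustration, not from the lecture:

# Rescaling one variable makes it dominate the first principal component.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))   # correlated 3-D data
rescaled = data * np.array([1.0, 1.0, 100.0])    # same data, third variable in different units

_, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
print(vecs[:, -1])                               # first PC of the original data
_, vecs = np.linalg.eigh(np.cov(rescaled, rowvar=False))
print(vecs[:, -1])                               # now points almost entirely along variable 3

standardized = (rescaled - rescaled.mean(0)) / rescaled.std(0)
_, vecs = np.linalg.eigh(np.cov(standardized, rowvar=False))
print(vecs[:, -1])                               # unit dependence removed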