
Dimensionality Reduction


Presentation Transcript


  1. Dimensionality Reduction

  2. Another unsupervised task • Clustering, etc. -- all forms of data modeling • Trying to identify statistically supportable patterns in data • Another way of looking at it: reduce complexity of data • Clustering: 1000 data points → 3 clusters • Dimensionality reduction: reduce complexity of space in which data lives • Find low-dimensional projection of data

  3. Objective functions • All learning methods depend on optimizing some objective function • Otherwise, can’t tell if you’re making any progress • Measures whether model A is better than B • Supervised learning: loss function • Difference between predicted and actual values • Unsupervised learning: model fit/distortion • How well does model represent data?
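
For concreteness (one common choice among many, not the only one): a supervised learner might minimize a squared-error loss, L(f) = Σᵢ (f(Xᵢ) − Yᵢ)², while an unsupervised model is scored by a fit/distortion measure such as D = Σᵢ ‖Xᵢ − X̂ᵢ‖², where X̂ᵢ is the model's reconstruction of Xᵢ.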

  4. The fit of dimensions • Given: Data set X={X1,...,XN} in feature space F • Goal: find a low-dimensional representation of data set • Projection of X into F’⊂F • That is: find g() such that g(X)∈F’ • Constraint: preserve some property of X as much as possible

  5. Capturing classification • Easy “fit” function: keep aspects of data that make it easy to classify • Uses dimensionality reduction in conjunction with classification • Goal: find g() such that loss of model learned on g(X) is minimized:
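
Written out, the objective here is roughly

    g* = argmin_g L( f̂_g(g(X)), Y ),   where f̂_g is the model learned on the projected data g(X)

i.e., pick the projection under which the downstream classifier's loss is smallest.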

  6. Feature subset selection • Early idea: • Let g() keep only a subset of the feature dimensions • E.g., if X=[X[1], X[2], ..., X[d]] • Then g(X)=[X[2], X[17], ..., X[k]] for k≪d • Tricky part: picking the indices to keep • Q: How many such index sets are possible?
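
As a quick check on why exhaustive search over index sets is hopeless (d and k below are hypothetical sizes, not values from the slides):

    import math

    d = 100   # hypothetical number of original features
    k = 10    # hypothetical size of the kept subset

    print(math.comb(d, k))   # subsets of exactly k features: C(100, 10), about 1.7e13
    print(2 ** d)            # subsets of any size: 2^100, about 1.3e30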

  7. Wrapper method • Led to wrapper method for FSS • Kohavi et al. (KDD-1995, AIJ 97(1-2), etc.) • Core idea: use target learning algorithm as black-box subroutine • Wrap (your favorite) search for feature subset around black box

  8. An example wrapper FSS

     // hill-climbing, search-based wrapper FSS
     function wrapper_FSS_hill(X, Y, L, baseLearn)
       // Inputs:  data X, labels Y, loss function L,
       //          base learning method baseLearn()
       // Outputs: feature subset S, model fHat
       S = {}                                 // initialize: empty feature set
       [XTr, YTr, XTst, YTst] = split_data(X, Y)
       l = Inf
       do {
         lLast = l
         nextSSet = extend_feature_set(S)     // candidate extensions of S
         foreach sp in nextSSet {
           model = baseLearn(XTr[sp], YTr)    // train on candidate subset sp
           err = L(model(XTst[sp]), YTst)     // evaluate on held-out split
           if (err < l) {
             l = err
             S = sp                           // remember best subset so far
             fHat = model
           }
         }
       } while (l < lLast)                    // stop when no extension improves
       return (S, fHat)
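
A minimal, runnable version of the same greedy forward-selection wrapper; scikit-learn, logistic regression as the base learner, and 0/1 loss on a single held-out split are illustrative assumptions, not part of the slide:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def wrapper_fss_hill(X, y, make_model=lambda: LogisticRegression(max_iter=1000)):
        """Greedy forward feature-subset selection wrapped around a base learner."""
        X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.25, random_state=0)
        S, f_hat = [], None                # current subset and best model so far
        best_err = np.inf
        while True:
            last_err = best_err
            best_cand = S
            for j in range(X.shape[1]):    # try adding each unused feature
                if j in S:
                    continue
                cand = S + [j]
                model = make_model().fit(X_tr[:, cand], y_tr)
                err = 1.0 - model.score(X_tst[:, cand], y_tst)   # held-out 0/1 loss
                if err < best_err:
                    best_err, best_cand, f_hat = err, cand, model
            if best_err >= last_err:       # no single-feature extension improved the loss
                return S, f_hat
            S = best_cand

    # Hypothetical usage:
    # from sklearn.datasets import make_classification
    # X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    # S, model = wrapper_fss_hill(X, y)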

  9. More general projections • FSS uses orthogonal projection onto a subspace • Essentially: drop some dimensions, keep others • Often useful to work with more general projection functions, g() • Example: linear projection: • Pick A to reduce dimension: k×d matrix, k≪d
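
The linear projection referred to here is g(x) = A x: each point x ∈ ℝ^d is mapped to A x ∈ ℝ^k by a k×d matrix A, so choosing A chooses both the low-dimensional subspace and the coordinates within it.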

  10. The right linearity • How to pick A? • What property of the data do we want to preserve? • Typical answer: squared-error between the original data point and the low-dimensional representation of that point: • Leads to method of principal component analysis (PCA), a.k.a. the Karhunen-Loève (KL) transform
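
In its standard form (assuming A has orthonormal rows, so AᵀA is the projection onto the chosen subspace), the squared-error criterion is

    J(A, μ) = Σᵢ ‖ Xᵢ − ( μ + Aᵀ A (Xᵢ − μ) ) ‖²

and PCA gives the minimizing A in closed form via the eigenvectors computed on the slides that follow.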

  11-15. PCA (steps, with the formulas written out below) • Find mean of data: • Find scatter matrix: • Essentially, denormalized covariance matrix • Find eigenvectors/eigenvalues of S: • Take top k≪d eigenvectors: • Form A from those vectors:
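
Written out in the usual notation, the standard PCA steps matching the bullets above are:

    μ = (1/N) Σᵢ Xᵢ                             // mean of data
    S = Σᵢ (Xᵢ − μ)(Xᵢ − μ)ᵀ                    // scatter matrix
    S eⱼ = λⱼ eⱼ,   λ₁ ≥ λ₂ ≥ ... ≥ λ_d         // eigenvectors/eigenvalues of S
    A = [e₁ e₂ ... e_k]ᵀ                        // top k ≪ d eigenvectors as rows (A is k×d)
    g(x) = A (x − μ)                            // the low-dimensional representation

A minimal NumPy sketch of the same steps (the data matrix X, one row per point, and the call below are hypothetical):

    import numpy as np

    def pca_project(X, k):
        """Project the rows of X onto the top-k principal directions."""
        mu = X.mean(axis=0)                 # mean of data
        Xc = X - mu                         # centered data
        S = Xc.T @ Xc                       # scatter matrix (d x d)
        _, eigvecs = np.linalg.eigh(S)      # eigenvalues ascending; eigenvectors as columns
        A = eigvecs[:, ::-1][:, :k].T       # top-k eigenvectors as rows (k x d)
        return Xc @ A.T                     # N x k projection of the data

    # Hypothetical usage: Y = pca_project(X, k=2)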

  16. Nonlinearity • The coolness of PCA: • Finds directions of “maximal variance” in data • Good for linear data sets • The downfall of PCA: • Lots of stuff in the world is nonlinear

  17. LLE et al. • Leads to a number of methods for nonlinear dimensionality reduction (NLDR) • LLE, Isomap, MVUE, etc. • Core idea to all of them: • Look at small “patch” on surface of data manifold • Make low-dim local, linear approximation to patch • “Stitch together” all local approximations into global structure

  18. Unfolding the swiss roll • [figure: the 3-d swiss-roll data and its 2-d approximation]
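
The unfolding described on slides 17-18 can be reproduced with an off-the-shelf LLE implementation; the scikit-learn calls and parameter values below are illustrative choices, not taken from the slides:

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    # 3-d swiss-roll data, one row per point
    X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

    # local, linear patches stitched into a global 2-d embedding
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
    X_2d = lle.fit_transform(X)    # 1000 x 2 "unrolled" approximation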
