An Overview of Dimension Reduction Methods • 박정희, Department of Computer Engineering, Chungnam National University
Outline • dimension reduction • dimension reduction methods • Linear dimension reduction • Nonlinear dimension reduction • Graph-based dimension reduction • Applications
Dimension reduction • Data are arranged as a matrix whose rows are objects and whose columns are features • Dimension reduction takes two forms: feature extraction and feature selection • dimension: the number of features
Dimension reduction • Reduce the dimensionality of high-dimensional data • Identify new, meaningful underlying features • Reduce the computational overhead of the subsequent processing stages • Reduce noise effects • Enable visualization of the data
Dimension reduction methods • PCA (principal component analysis) • LDA (linear discriminant analysis) • LLE (locally linear embedding) • Isomap • LPP (locality preserving projection) • UDP (unsupervised discriminant projection) • Kernel-based nonlinear dimension reduction • …
Linear dimension reduction • W: a matrix of size $m \times p$ ($p \ll m$) that maps $a \in \mathbb{R}^m$ to $W^T a \in \mathbb{R}^p$ • How to find the p column vectors of W? - use a training data set - what is the objective criterion? • Traditional linear dimension reduction methods - PCA (principal component analysis) - LDA (linear discriminant analysis)
Principal component analysis • Minimize the information loss caused by dimension reduction • Capture as much of the variability in the data as possible
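PCA (or Karhunen-Loève transform) • Given a data set $\{a_1, \ldots, a_n\}$ with centroid $c = \frac{1}{n} \sum_{i=1}^{n} a_i$ • Total scatter matrix: $S_t = \sum_{i=1}^{n} (a_i - c)(a_i - c)^T$ • Find a projection vector $w$ with $\|w\| = 1$ to maximize $w^T S_t w$ • This $w$ is the eigenvector corresponding to the largest eigenvalue of $S_t$: $S_t w = \lambda w$ • The eigenvectors $w_1, \ldots, w_p$ corresponding to the p largest eigenvalues of $S_t$ give the reduction to p-dimensional space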
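The PCA steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function name `pca`, the column-per-data-point layout, and the synthetic data are assumptions made for the example, and the data are centered before projection.

```python
import numpy as np

def pca(A, p):
    """A is m x n with one data point a_i per column (layout is an assumption).
    Returns W (m x p) and the centroid c (m x 1)."""
    c = A.mean(axis=1, keepdims=True)        # centroid of the data set
    Z = A - c                                # centered data
    St = Z @ Z.T                             # total scatter matrix
    vals, vecs = np.linalg.eigh(St)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:p]       # indices of the p largest eigenvalues
    return vecs[:, order], c

# Synthetic data: 200 points scattered close to the line through direction (1, 2)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
A = np.vstack([t, 2.0 * t + 0.01 * rng.normal(size=200)])

W, c = pca(A, 1)
Y = W.T @ (A - c)                            # reduced 1-dimensional representation
```

Here `W[:, 0]` should come out nearly parallel to (1, 2), the direction that carries almost all of the variability in this synthetic data set.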
Linear discriminant analysis (LDA) • PCA seeks directions that are efficient for representing data • LDA seeks directions that are useful for discriminating between data in different classes
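LDA • Given a data set $\{a_1, \ldots, a_n\}$ with class label information, global centroid $c = \frac{1}{n} \sum_{j=1}^{n} a_j$ and class centroids $c_i = \frac{1}{n_i} \sum_{a_j \in \text{class } i} a_j$ • Within-class scatter matrix: $S_w = \sum_{i} \sum_{a_j \in \text{class } i} (a_j - c_i)(a_j - c_i)^T$ • Between-class scatter matrix: $S_b = \sum_{i} n_i (c_i - c)(c_i - c)^T$ • Maximize between-class scatter and minimize within-class scatter • Solve the generalized eigenvalue problem $S_b w = \lambda S_w w$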
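A minimal NumPy sketch of the LDA procedure follows. It assumes the standard scatter-matrix definitions, solves the generalized eigenvalue problem through $S_w^{-1} S_b$, and adds a small ridge term `reg` to keep $S_w$ invertible; the function name, the ridge term, and the synthetic two-class data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def lda(A, labels, p, reg=1e-8):
    """A is m x n with one data point per column; labels has length n.
    Returns the p leading discriminant directions as columns (m x p)."""
    m, _ = A.shape
    c = A.mean(axis=1, keepdims=True)                 # global centroid
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]                        # points of class k
        ck = Ak.mean(axis=1, keepdims=True)           # class centroid
        Sw += (Ak - ck) @ (Ak - ck).T                 # within-class scatter
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T     # between-class scatter
    # Sb w = lambda Sw w, rewritten as (Sw + reg*I)^{-1} Sb w = lambda w
    M = np.linalg.solve(Sw + reg * np.eye(m), Sb)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1][:p]
    return vecs[:, order].real

# Two classes separated along the first axis, with large spread along the second
rng = np.random.default_rng(1)
class0 = np.vstack([rng.normal(-2.0, 0.1, 100), rng.normal(0.0, 5.0, 100)])
class1 = np.vstack([rng.normal(2.0, 0.1, 100), rng.normal(0.0, 5.0, 100)])
A = np.hstack([class0, class1])
labels = np.array([0] * 100 + [1] * 100)

w = lda(A, labels, 1)[:, 0]
```

On this data PCA would pick the second axis (largest variance), whereas the LDA direction `w` should point almost entirely along the first axis, which is the one that separates the classes.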
Nonlinear dimension reduction • It is difficult to represent nonlinearly structured data by linear dimension reduction
Locally linear embedding (LLE) • Assumption: each data point and its neighbors lie on a locally linear patch of the manifold • [reference] Nonlinear dimensionality reduction by locally linear embedding, S. T. Roweis and L. K. Saul, Science, vol. 290, pp. 2323-2326, 2000
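LLE • Algorithm 1. Find the nearest neighbors of each data point 2. Express each data point $x_i$ as a linear combination of its neighbors: minimize $\sum_i \| x_i - \sum_j w_{ij} x_j \|^2$ subject to $\sum_j w_{ij} = 1$, with $w_{ij} = 0$ if $x_j$ is not a near neighbor of $x_i$ 3. Find the coordinates $y_i$ of each point $x_i$ in the lower-dimensional space by using the weights $w_{ij}$ found in step 2: minimize $\sum_i \| y_i - \sum_j w_{ij} y_j \|^2$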
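Step 2 of the algorithm, computing the reconstruction weights of one point from its neighbors, can be sketched as below. The sketch assumes the usual sum-to-one constraint on the weights and a small regularization term for the local Gram matrix; the function name `lle_weights` and the parameter `reg` are hypothetical choices for this example.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights that reconstruct x from its neighbors (one neighbor per row),
    constrained to sum to one."""
    Z = neighbors - x                                   # shift so that x is the origin
    C = Z @ Z.T                                         # local Gram matrix (k x k)
    C = C + reg * np.trace(C) * np.eye(len(C))          # regularize: C may be singular
    w = np.linalg.solve(C, np.ones(len(C)))             # solve C w = 1
    return w / w.sum()                                  # enforce sum_j w_j = 1

x = np.array([1.0, 1.0])
neighbors = np.array([[0.0, 0.0], [2.0, 2.0]])
w = lle_weights(x, neighbors)
reconstruction = w @ neighbors
```

Because `x` is the midpoint of its two neighbors, both weights come out as 0.5 and the reconstruction reproduces `x`.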
Isomap • Preserve intrinsic geometry of the data, as captured in the geodesic manifold distances between all pairs of data points • [reference] A Global Geometric Framework for Nonlinear Dimensionality Reduction, J. B. Tenenbaum, V. de Silva and J. C. Langford, Science, vol.290, pp.2319-2323, 2000
Isomap • Algorithm 1. Find the nearest neighbors of each data point and create a weighted graph by connecting each point to its nearest neighbors 2. Redefine the distance between two points to be the length of the shortest path between them in the graph 3. Apply classical MDS (multidimensional scaling)
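The three Isomap steps can be sketched compactly with NumPy. This is a small illustrative version that assumes the neighborhood graph is connected, uses Floyd-Warshall for the shortest paths (adequate only for small n), and uses a double-centering form of classical MDS; the function name and the semicircle test data are assumptions for the example.

```python
import numpy as np

def isomap(X, k, p):
    """X has one data point per row. Returns (geodesic distances, p-dim embedding)."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # Euclidean distances
    # Step 1: weighted k-nearest-neighbor graph (made symmetric)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]          # skip the point itself
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]
    # Step 2: shortest-path (geodesic) distances by Floyd-Warshall
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    # Step 3: classical MDS on the geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:p]
    Y = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
    return G, Y

# Seven points on a unit semicircle: the end points are 2 apart in a straight line
angles = np.linspace(0.0, np.pi, 7)
X = np.column_stack([np.cos(angles), np.sin(angles)])
G, Y = isomap(X, k=2, p=1)
```

For the two end points of the semicircle, the graph distance `G[0, -1]` follows the curve and is therefore clearly larger than the straight-line distance of 2, which is the behavior Isomap relies on.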
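Kernel-based dimension reduction • Map the data by a nonlinear mapping $\Phi$, then apply linear dimension reduction in the mapped space • Example: $\Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)^T$ for $x = (x_1, x_2)^T$ • How to define a nonlinear mapping $\Phi$?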
Kernel methods • If a kernel function $k(x,y)$ satisfies Mercer's condition, then there exists a mapping $\Phi$ such that $\langle \Phi(x), \Phi(y) \rangle = k(x,y)$ • Gaussian kernel: $k(x,y) = \exp(-\|x - y\|^2 / 2\sigma^2)$ • Polynomial kernel: $k(x,y) = (\langle x,y \rangle + \beta)^d$ • As long as an algorithm can be written in terms of inner products, it can be performed in the feature space
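A short numerical check makes the kernel idea concrete. It assumes the explicit degree-2 feature map $\Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$ for two-dimensional inputs and compares the inner product in the feature space with the polynomial kernel for $\beta = 0$, $d = 2$. The helper names are hypothetical.

```python
import math

def phi(x):
    """Explicit feature map for 2-D input (assumed for this example)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y, beta=0.0, d=2):
    """Polynomial kernel k(x, y) = (<x, y> + beta)^d."""
    return (dot(x, y) + beta) ** d

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = dot(phi(x), phi(y))     # inner product computed in the feature space
rhs = poly_kernel(x, y)       # same value computed from the original vectors
```

Both quantities agree up to floating-point error, so an algorithm that only needs inner products never has to form $\Phi(x)$ explicitly.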
Nonlinear discriminant analysis • As long as an algorithm can be written in terms of inner products, it can be performed in the feature space • LDA can be rewritten in terms of inner products, so each inner product can be replaced by a kernel value $k(a_i, a_j)$
Graph-based dimension reduction • Graph $G = \langle X, W \rangle$ • $X = \{x_i, 1 \le i \le n\}$: n nodes of data points • $W = \{w_{ij}\}_{1 \le i,j \le n}$: similarity (or weight) matrix • W can be made sparse by using $\epsilon$-neighborhoods or k nearest neighbors for edge construction • Let $y_i$ be the embedding of $x_i$ • [LPP (Locality preserving projection)] ensures that if $x_i$ and $x_j$ are close then $y_i$ and $y_j$ are close as well: minimize $\sum_{i,j} \|y_i - y_j\|^2 w_{ij}$
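Graph-based methods • Let $y_i$ be the embedding of $x_i$: minimize $\sum_{i,j} \|y_i - y_j\|^2 w_{ij}$ • Linearization: with a linear projection $y_i = V^T x_i$, minimize $\sum_{i,j} \|V^T x_i - V^T x_j\|^2 w_{ij}$ • Using a penalty graph $\langle X, P = (p_{ij}) \rangle$: maximize $\sum_{i,j} \|y_i - y_j\|^2 p_{ij}$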
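The graph objective is usually handled through the graph Laplacian. The sketch below assumes a symmetric weight matrix and checks numerically the standard identity $\sum_{i,j} \|y_i - y_j\|^2 w_{ij} = 2\,\mathrm{tr}(Y^T L Y)$ with $L = D - W$, where $D$ is the diagonal matrix of row sums of $W$. The function names and the random test data are assumptions for illustration.

```python
import numpy as np

def graph_objective(Y, W):
    """Direct evaluation of sum_ij ||y_i - y_j||^2 w_ij (rows of Y are the y_i)."""
    n = len(Y)
    return sum(W[i, j] * np.sum((Y[i] - Y[j]) ** 2)
               for i in range(n) for j in range(n))

def laplacian_objective(Y, W):
    """Same quantity via the graph Laplacian L = D - W (W assumed symmetric)."""
    D = np.diag(W.sum(axis=1))     # degree matrix
    L = D - W
    return 2.0 * np.trace(Y.T @ L @ Y)

rng = np.random.default_rng(2)
S = rng.random((6, 6))
W = (S + S.T) / 2.0                # symmetric similarity matrix
np.fill_diagonal(W, 0.0)           # no self-loops
Y = rng.normal(size=(6, 2))        # candidate 2-D embedding

direct = graph_objective(Y, W)
via_laplacian = laplacian_objective(Y, W)
```

Writing the objective as a trace is what turns the minimization into an eigenvalue problem once a normalization constraint on the embedding is added.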
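Applications: Face recognition • Face image data has high dimensionality • Dimension reduction is applied before recognizing an unknown face [figure: face images before and after dimension reduction]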
Applications: text mining • Each document becomes a 'term' vector • Each term is a component (attribute) of the vector • The value of each component is the number of times the corresponding term occurs in the document • Document data has high dimensionality
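The term-vector representation can be sketched with the Python standard library. The whitespace tokenization, lowercasing, and the two toy documents are simplifying assumptions for this example; real text-mining pipelines typically add stop-word removal and stemming.

```python
from collections import Counter

def term_vectors(docs):
    """Return (vocabulary, list of term-count vectors), one vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)                       # term -> number of occurrences
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

docs = [
    "data mining finds patterns in data",
    "text mining is data mining on text",
]
vocab, vectors = term_vectors(docs)
```

Even these two short documents yield 8-dimensional vectors; with a realistic vocabulary the dimension grows to tens of thousands, which is why dimension reduction is needed.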
References • Introduction to Data Mining, P.-N. Tan, M. Steinbach and V. Kumar, Addison Wesley, 2006 • Pattern Classification, R. Duda, P. Hart and D. Stork, Wiley-Interscience, 2001 • [LPP] Locality preserving projections, X. He and P. Niyogi, Proc. Conf. Neural Information Processing Systems, 2003 • Graph embedding and extensions: a general framework for dimensionality reduction, S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang and S. Lin, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29(1), 2007