Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-Jiang Zhang, Qiang Yang, Stephen Lin. Presented by meconin
Outline • Introduction • Graph Embedding (GE) • Marginal Fisher Analysis (MFA) • Experiments • Conclusion and Future Work
Introduction • Dimensionality Reduction • Linear • PCA and LDA are the two most popular, due to their simplicity and effectiveness • LPP preserves local relationships in the data set and uncovers its essential manifold structure
Introduction • Dimensionality Reduction • For nonlinear dimensionality reduction, ISOMAP, LLE, and Laplacian Eigenmap are three recently developed algorithms • Kernel trick: • turns linear methods into nonlinear ones • performs linear operations in a higher- or even infinite-dimensional feature space induced by a kernel mapping function
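A minimal sketch of the kernel idea, not taken from the paper: the Gram matrix K holds the inner products in the implicit feature space, so a linear algorithm only ever touches K rather than the mapped points themselves; the RBF kernel and the bandwidth gamma are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2) for the columns of X."""
    # X has shape (m, N): each column is a sample, matching the slides' convention.
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.exp(-gamma * d2)

X = np.random.default_rng(0).normal(size=(10, 50))
K = rbf_kernel(X)   # kernel methods work with K, never with the explicit mapping
```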
Introduction • Dimensionality Reduction • Tensor based algorithms • 2DPCA, 2DLDA, DATER
Introduction • Graph Embedding is a general framework for dimensionality reduction • With its linearization, kernelization, and tensorization, we have a unified view for understanding DR algorithms • The above-mentioned algorithms can all be reformulated within it
Introduction • This paper shows that GE can be used as a platform for developing new DR algorithms • Marginal Fisher Analysis (MFA) • Overcomes limitations of LDA
Introduction • LDA (Linear Discriminant Analysis) • Finds the linear combination of features that best separates classes of objects • The number of available projection directions is less than the number of classes • Based on interclass and intraclass scatters; optimal only when the data of each class is approximately Gaussian distributed
Introduction • MFA advantages (compared with LDA): • The number of available projection directions is much larger • No assumption on the data distribution, so it is more general for discriminant analysis • The interclass margin can better characterize the separability of different classes
Graph Embedding • For a classification problem, the sample set is represented as a matrix X = [x1, x2, …, xN], xi ∈ R^m • In practice, the feature dimension m is often very high, so it is necessary to transform the data to a low-dimensional representation yi = F(xi), for all i
Graph Embedding • DR algorithms have different motivations, but their objectives are similar – to derive a lower-dimensional representation • Can we reformulate them within a unifying framework? Does the framework assist in designing new algorithms?
Graph Embedding • Give a possible answer • Represent each vertex of a graph as a low-dimensional vector that preserves similarities between the vertex pairs • The similarity matrix of the graph characterizes certain statistical or geometric properties of the data set
Graph Embedding • Let G = { X, W } be an undirected weighted graph with vertex set X and similarity matrix W ∈ R^(N×N) • The diagonal matrix D and the Laplacian matrix L of graph G are defined as L = D − W, Dii = Σ_{j≠i} Wij, ∀i
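A minimal NumPy sketch of these definitions; the random symmetric W is only a placeholder for a real similarity matrix.

```python
import numpy as np

# Hypothetical similarity matrix for N = 5 samples (symmetric, nonnegative).
rng = np.random.default_rng(0)
W = rng.random((5, 5))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))   # D_ii = sum_{j != i} W_ij
L = D - W                    # graph Laplacian L = D - W
```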
Graph Embedding • Graph embedding of G is an algorithm that finds low-dimensional vector representations preserving the relationships among the vertices of G • B is the constraint matrix and d is a constant, introduced to avoid trivial solutions
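The objective this slide refers to is, as formulated in the paper (restated here from memory, so treat the exact constraint form as recalled rather than quoted):

$$
y^{*} = \arg\min_{y^{\top} B y = d} \sum_{i \neq j} \lVert y_i - y_j \rVert^{2} W_{ij} = \arg\min_{y^{\top} B y = d} y^{\top} L y
$$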
Graph Embedding • The larger the similarity between samples xi and xj, the smaller the distance between yi and yj must be to minimize the objective function • To provide mappings for data points throughout the entire feature space, three extensions are used: Linearization, Kernelization, Tensorization
Graph Embedding • Linearization: assuming a linear projection y = X^T w • Kernelization: map x → φ(x) ∈ F, assuming w = Σ_i αi φ(xi)
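Written out (again restated from memory, following the paper's notation), the linearized and kernelized objectives become:

$$
w^{*} = \arg\min_{w^{\top} X B X^{\top} w = d} w^{\top} X L X^{\top} w,
\qquad
\alpha^{*} = \arg\min_{\alpha^{\top} K B K \alpha = d} \alpha^{\top} K L K \alpha,
$$

where $K_{ij} = k(x_i, x_j)$ is the kernel Gram matrix.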
Graph Embedding • The solutions are obtained by solving the generalized eigenvalue decomposition problem • F. Chung, “Spectral Graph Theory,” Regional Conf. Series in Math.,no. 92, 1997
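A minimal SciPy sketch of this step for the linearized case, assuming X, L, and B have been built as above; the helper name and the small ridge term are illustrative, not from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def linear_graph_embedding(X, L, B, n_dims=2):
    """Solve X L X^T w = lambda X B X^T w and keep the n_dims
    projection directions with the smallest eigenvalues."""
    A = X @ L @ X.T
    C = X @ B @ X.T
    # A small ridge keeps C positive definite when X B X^T is singular.
    C = C + 1e-8 * np.eye(C.shape[0])
    eigvals, eigvecs = eigh(A, C)     # generalized symmetric eigenproblem, ascending
    return eigvecs[:, :n_dims]        # columns are projection vectors w
```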
Graph Embedding • Tensor • the extracted feature from an object may contain higher-order structure • Ex: • an image is a second-order tensor • sequential data such as a video sequence is a third-order tensor
Graph Embedding • Tensor • In an n-dimensional space there are n^r directions, where r is the rank (order) of the tensor • For tensors A, B ∈ R^(m1×m2×…×mn), the inner product is ⟨A, B⟩ = Σ_{i1,…,in} A_{i1…in} B_{i1…in}
Graph Embedding • Tensor • For a matrix U ∈ R^(mk×m'k), the mode-k product is B = A ×_k U, which contracts the k-th mode of A against the rows of U
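A minimal NumPy sketch of the mode-k product under the convention above (the k-th axis of A is contracted against the rows of U); the function name is illustrative.

```python
import numpy as np

def mode_k_product(A, U, k):
    """Mode-k product B = A x_k U for U of shape (m_k, m'_k)."""
    # tensordot contracts A's k-th axis with U's first axis and appends
    # the remaining axis of U (size m'_k) at the end; move it back to position k.
    B = np.tensordot(A, U, axes=([k], [0]))
    return np.moveaxis(B, -1, k)

A = np.random.default_rng(0).normal(size=(4, 5, 6))   # third-order tensor
U = np.random.default_rng(1).normal(size=(5, 3))       # acts on mode k = 1
B = mode_k_product(A, U, k=1)                           # shape (4, 3, 6)
```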
Graph Embedding • The objective function: • In many cases there is no closed-form solution, but a local optimum can be obtained by optimizing one projection vector at a time while fixing the others
General Framework for DR • DR algorithms differ in: • the computation of the similarity matrix of the graph • the selection of the constraint matrix
General Framework for DR • PCA • seeks projection directions with maximal variance • in the graph-embedding (minimization) view, it finds and removes the projection directions with minimal variance
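A small NumPy sanity check of this reformulation, assuming the graph the paper assigns to PCA (Wij = 1/N for i ≠ j, constraint B = I): the graph-embedding scatter X L X^T then coincides with the centered scatter matrix that PCA diagonalizes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 100, 5
X = rng.normal(size=(m, N))            # columns are samples, as in the slides

W = np.full((N, N), 1.0 / N)           # PCA's intrinsic graph: W_ij = 1/N
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W

S_ge = X @ L @ X.T                      # graph-embedding scatter
Xc = X - X.mean(axis=1, keepdims=True)
S_pca = Xc @ Xc.T                       # centered scatter used by PCA

print(np.allclose(S_ge, S_pca))         # True: the two scatters match
```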
General Framework for DR • KPCA • applies the kernel trick on PCA, hence it is a kernelization of graph embedding • 2DPCA is a simplified second-order tensorization of PCA and only optimizes one projection direction
General Framework for DR • LDA • searches for the directions that are most effective for discrimination by minimizing the ratio between the intraclass and interclass scatters
General Framework for DR • LDA • follows the linearization of graph embedding • the intrinsic graph connects all pairs of samples with the same class label • the weights are in inverse proportion to the sample size of the corresponding class
General Framework for DR • The intrinsic graph of PCA is used as the penalty graph of LDA
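Concretely (restating the paper's graph weights for LDA from memory, so treat the exact constants as recalled rather than quoted):

$$
W_{ij} = \frac{\delta_{c_i, c_j}}{n_{c_i}}, \qquad W^{p}_{ij} = \frac{1}{N},
$$

where $c_i$ is the class label of $x_i$, $n_c$ is the number of samples in class $c$, and $W^{p}$ is the penalty-graph weight matrix, i.e., PCA's intrinsic graph.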
General Framework for DR • KDA is the kernel extension of LDA • 2DLDA is the second-order tensorization of LDA • DATER is the tensorization of LDA in arbitrary order
General Framework for DR • LPP • ISOMAP • LLE • Laplacian Eigenmap (LE)
Related Works • Kernel Interpretation • Ham et al. • KPCA, ISOMAP, LLE, and LE share a common KPCA formulation with different kernel definitions • Kernel matrix vs. Laplacian matrix derived from the similarity matrix • Only unsupervised vs. more general
Related Works • Out-of-Sample Extension • Brand • Mentioned the concept of graph embedding • Brand’s work can be considered as a special case of our graph embedding
Related Works • Laplacian Eigenmap • Works with only a single graph, i.e., the intrinsic graph, and cannot be used to explain algorithms such as ISOMAP, LLE, and LDA • Some works use a Gaussian function to compute the nonnegative similarity matrix
Marginal Fisher Analysis
Marginal Fisher Analysis • Intraclass compactness (intrinsic graph)
Marginal Fisher Analysis • Interclass separability (penalty graph)
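Restating the MFA construction from the paper (from memory, so the neighbor-set notation is approximate): the intrinsic graph connects each sample to its k1 nearest neighbors within the same class, the penalty graph connects the k2 nearest between-class (marginal) pairs for each class, and the projection minimizes intraclass compactness over interclass separability:

$$
w^{*} = \arg\min_{w}
\frac{\sum_{i,\, j \in N^{+}_{k_1}(i)} \lVert w^{\top} x_i - w^{\top} x_j \rVert^{2}}
     {\sum_{(i,j) \in P_{k_2}} \lVert w^{\top} x_i - w^{\top} x_j \rVert^{2}}
= \arg\min_{w} \frac{w^{\top} X (D - W) X^{\top} w}{w^{\top} X (D^{p} - W^{p}) X^{\top} w}
$$

where $N^{+}_{k_1}(i)$ is the set of $k_1$ nearest same-class neighbors of $x_i$ and $P_{k_2}$ collects the $k_2$ nearest between-class pairs on the class margins.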
LDA vs. MFA • The number of available projection directions is much larger than in LDA • There is no assumption on the data distribution of each class • The interclass margin in MFA can better characterize the separability of different classes than the interclass variance in LDA
Kernel MFA • The distance between two samples • For a new data point x, its projection to the derived optimal direction
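The two formulas this slide refers to are, in their standard kernelized form (consistent with the paper's kernelization, though any normalization factor is omitted here):

$$
d(x_i, x_j) = \sqrt{k(x_i, x_i) + k(x_j, x_j) - 2\,k(x_i, x_j)},
\qquad
F(x) = \sum_{i=1}^{N} \alpha_i \, k(x, x_i),
$$

where $\alpha$ is the coefficient vector obtained from the kernelized eigenproblem.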
Experiments • Face Recognition • XM2VTS, CMU PIE, ORL • A Non-Gaussian Case
Experiments • XM2VTS, PIE-1, PIE-2, ORL