Local and Global Structures Preserving Projection Authors: Hao Cheng, Kien A Hua, and Khanh Vu, University of Central Florida ICTAI '07
Overview • Introduction • Proposed Algorithm • Experiments • Conclusions
Introduction • Data usually reside in a high-dimensional space. • The intrinsic dimensionality of the data is much lower. • Manifold learning • finds a low-dimensional embedding of the raw data that preserves the intrinsic structures of the data well; • is a popular recent research topic.
Related Work • Principal Component Analysis (PCA) • Locality Preserving Projection (LPP) • Many others…
PCA • Principal Component Analysis (PCA) • PCA projects the data along the axes that exhibit the greatest variance; • PCA minimizes the distortion of the pairwise distances of the data after the reduction. • PCA preserves the global structure of the data well.
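For concreteness, a minimal PCA sketch in Python (an illustration, not the authors' code; the function name pca_project and the parameter k are my own choices):

```python
# Minimal PCA sketch: project data onto the top-k eigenvectors of the
# sample covariance matrix (the axes of greatest variance).
import numpy as np

def pca_project(X, k):
    """X: (n_samples, n_features) array; returns an (n_samples, k) projection."""
    X_centered = X - X.mean(axis=0)           # center the data
    cov = np.cov(X_centered, rowvar=False)    # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # sort axes by decreasing variance
    axes = eigvecs[:, order[:k]]              # top-k principal axes
    return X_centered @ axes
```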
LPP • Locality Preserving Projection (LPP) • LPP constructs a similarity matrix W: • If point i is among the K nearest neighbors of point j (or vice versa), then W(i,j) = W(j,i) = 1; otherwise W(i,j) = 0. • W encodes the local neighborhood information. • LPP finds a set of axes that minimize the pairwise distances of neighboring points (as indicated by W). • LPP preserves local neighborhoods well.
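A hedged LPP sketch in Python (my reading of the standard LPP formulation, not the authors' implementation; parameter names are my own). It builds the kNN similarity matrix W described above and keeps the generalized eigenvectors with the smallest eigenvalues:

```python
# LPP sketch: build a kNN similarity matrix W, form the graph Laplacian
# L = D - W, and solve the generalized eigenproblem
#     X^T L X p = lambda X^T D X p,
# keeping the eigenvectors with the smallest eigenvalues as projection axes.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def lpp_project(X, k_neighbors=5, k_dims=2):
    n = X.shape[0]
    nbrs = NearestNeighbors(n_neighbors=k_neighbors + 1).fit(X)
    _, idx = nbrs.kneighbors(X)               # idx[:, 0] is the point itself
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i, 1:]] = 1.0                # mark the K nearest neighbors
    W = np.maximum(W, W.T)                    # symmetrize: W(i,j) = W(j,i) = 1
    D = np.diag(W.sum(axis=1))
    L = D - W                                 # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X                           # assumed positive definite (n > d, full-rank data)
    eigvals, eigvecs = eigh(A, B)             # generalized eigenproblem, ascending eigenvalues
    P = eigvecs[:, :k_dims]                   # axes with the smallest eigenvalues
    return X @ P
```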
Nonlinear Methods • Both PCA and LPP are linear methods. • Nonlinear methods: • ISOMAP, Locally Linear Embedding (LLE), Hessian LLE (HLLE), Local Tangent Space Alignment (LTSA), Diffusion Maps (DM). • Problems: • Computationally intensive. • Do not scale well. • Performance is not very robust.
Motivation • PCA: global structure. • LPP: local structure. • Both global and local structures are important and should be properly preserved! • Consider the two toy examples below.
Toy Example 1 • Two classes of data (figure: PCA and LPP projections).
Toy Example 2 • Two classes of data (figure: LPP and PCA projections). Neither of them does well!
LGSPP • Local and Global Structure Preserving Projection (LGSPP): • Extracts local and global structures; • Derives an embedding that preserves these structures and minimizes their distortions.
Local Structure • For each data point x, • S(x) is the set of points consisting of x itself and its Ks nearest neighbors (Ks is a system parameter). • S(x) is the local neighborhood around the point x.
Global Structure • For each data point x, • D(x) is the set of Kd points which are far from point x and also far from each other (Kd is another parameter). • For example (see figure): the blue dot is x; the red/green dots are in D(x). Points in D(x) and point x are from different dense regions. • Preserving these distances (black dotted lines in the figure) can prevent the space from collapsing!
Extraction Algorithm (for D(x)) • Select a random sample set. • Pick the sample point farthest from point x, denoted d1. • Pick the sample point farthest from both x and d1, denoted d2. • Continue until Kd points have been found.
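A sketch of this extraction step in Python (my interpretation of the slides, where "farthest from x and d1" is read as maximizing the distance to the nearest of the points already chosen; the names and the sample_size parameter are my own):

```python
# Greedy extraction of D(x): from a random sample, repeatedly pick the
# candidate whose distance to the nearest already-chosen anchor (x, d1, ...)
# is largest, until k_d points have been found.
import numpy as np

def extract_far_points(X, x, k_d, sample_size=50, seed=None):
    """X: (n, d) data array; x: a single (d,) point; returns k_d indices into X."""
    rng = np.random.default_rng(seed)
    sample = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    anchors = [x]                             # distances are measured to x and to all picks so far
    chosen = []
    for _ in range(k_d):
        # for each candidate, compute its distance to the nearest anchor
        dists = np.min(
            [np.linalg.norm(X[sample] - a, axis=1) for a in anchors], axis=0)
        best = sample[np.argmax(dists)]       # farthest remaining candidate
        chosen.append(int(best))
        anchors.append(X[best])
    return chosen
```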
S(x) and D(x) • S(x): the local neighborhood of x. • D(x): point x and the points in D(x) are highly likely to come from different dense regions of the dataset. • The local and global structures are captured by S(x) and D(x) for each point x.
Embedding • Goals of the embedding: • Keep the points in S(x) close to each other in the reduced space: minimize the pairwise distances within S(x). • Keep the points in D(x) far from those in S(x) in the reduced space: maximize the pairwise distances between S(x) and D(x).
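As a hedged reconstruction of these two goals (my notation, not necessarily the paper's, with x_i denoting data points and p a projection axis):

```latex
% Keep each local neighborhood S(x) tight in the reduced space:
\min_{p} \sum_{x} \sum_{x_i, x_j \in S(x)} \left\| p^{\top} x_i - p^{\top} x_j \right\|^2
% Push each far set D(x) away from S(x) in the reduced space:
\max_{p} \sum_{x} \sum_{x_i \in S(x)} \sum_{x_j \in D(x)} \left\| p^{\top} x_i - p^{\top} x_j \right\|^2
```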
Optimization • Find a set of projection axes pi that minimize the within-S(x) distances while maximizing the S(x)-to-D(x) distances, as sketched above.
Rewrite • The objective can be rewritten as a Generalized Eigenvalue Problem.
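A hedged sketch of the resulting computation, assuming the local and global terms have been collected into matrices B (within S(x)) and A (between S(x) and D(x)); these names and the regularization term are my own, not necessarily the paper's:

```python
# Solve the generalized eigenproblem  A p = lambda B p  and keep the axes
# with the LARGEST eigenvalues: directions that maximize the S(x)-to-D(x)
# spread relative to the within-S(x) spread.
import numpy as np
from scipy.linalg import eigh

def solve_projection(A, B, k_dims, reg=1e-6):
    B_reg = B + reg * np.eye(B.shape[0])      # small ridge for numerical stability
    eigvals, eigvecs = eigh(A, B_reg)         # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k_dims]       # projection axes, largest eigenvalues first
```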
Toy Examples Revisited • LGSPP results on Toy Examples 1 and 2 (figures).
Synthetic Datasets • 2-dimensional data. • A free variable ranges from -1 to 1. • The 1st and 2nd dimensions are each defined as a function of this free variable.
More Datasets • LGSPP results on additional datasets (figures).
Conclusions • LGSPP: • Extracts local and global structures and computes a salient embedding. • Addresses the limitations of PCA and LPP. • Is linear, fast, and robust. • Works well on both synthetic and real-world examples.