Local and Global Structures Preserving Projection Authors: Hao Cheng, Kien A Hua, and Khanh Vu, University of Central Florida ICTAI '07
Overview • Introduction • Proposed Algorithm • Experiments • Conclusions
Introduction • Data usually reside in a high-dimensional space. • The intrinsic dimensionality of the data is much lower. • Manifold learning • finds a low-dimensional embedding of the raw data that preserves the intrinsic structures of the data well; • is a popular recent research topic.
Related Work • Principal Component Analysis (PCA) • Locality Preserving Projection (LPP) • Many others…
PCA • Principal Component Analysis (PCA) • PCA projects the data along the axes that exhibit the greatest variance; • PCA minimizes the distortion of the pairwise distances of the data after the reduction. • PCA preserves the global structure of the data well.
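For concreteness, a minimal PCA sketch in Python (an illustration, not the authors' code; the function name pca_project and the parameter k are my own choices):

```python
# Minimal PCA sketch: project data onto the top-k eigenvectors of the
# sample covariance matrix (the axes of greatest variance).
import numpy as np

def pca_project(X, k):
    """X: (n_samples, n_features) array; returns an (n_samples, k) projection."""
    X_centered = X - X.mean(axis=0)           # center the data
    cov = np.cov(X_centered, rowvar=False)    # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # sort axes by decreasing variance
    axes = eigvecs[:, order[:k]]              # top-k principal axes
    return X_centered @ axes
```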
LPP • Locality Preserving Projection (LPP) • LPP constructs a similarity matrix W: • If point i is among the K nearest neighbors of point j (or vice versa), then W(i,j) = W(j,i) = 1; otherwise W(i,j) = 0. • W encodes the local neighborhood information. • LPP finds a set of axes that minimize the pairwise distances of neighboring points (as indicated by W). • LPP preserves local neighborhoods well.
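A hedged LPP sketch in Python (my reading of the standard LPP formulation, not the authors' implementation; parameter names are my own). It builds the kNN similarity matrix W described above and keeps the generalized eigenvectors with the smallest eigenvalues:

```python
# LPP sketch: build a kNN similarity matrix W, form the graph Laplacian
# L = D - W, and solve the generalized eigenproblem
#     X^T L X p = lambda X^T D X p,
# keeping the eigenvectors with the smallest eigenvalues as projection axes.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def lpp_project(X, k_neighbors=5, k_dims=2):
    n = X.shape[0]
    nbrs = NearestNeighbors(n_neighbors=k_neighbors + 1).fit(X)
    _, idx = nbrs.kneighbors(X)               # idx[:, 0] is the point itself
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i, 1:]] = 1.0                # mark the K nearest neighbors
    W = np.maximum(W, W.T)                    # symmetrize: W(i,j) = W(j,i) = 1
    D = np.diag(W.sum(axis=1))
    L = D - W                                 # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X                           # assumed positive definite (n > d, full-rank data)
    eigvals, eigvecs = eigh(A, B)             # generalized eigenproblem, ascending eigenvalues
    P = eigvecs[:, :k_dims]                   # axes with the smallest eigenvalues
    return X @ P
```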
Nonlinear Methods • Both PCA and LPP are linear methods. • Nonlinear methods: • ISOMAP, Locally Linear Embedding (LLE), Hessian LLE (HLLE), Local Tangent Space Alignment (LTSA), Diffusion Maps (DM). • Problems: • Computationally intensive. • Do not scale well. • Performance is not very robust.
Motivation • PCA: global structure. • LPP: local structure. • Both global and local structures are important and should be properly preserved! • Consider the two toy examples below.
Toy Example 1 • Two classes of data (figure: PCA and LPP projections).
Toy Example 2 • Two classes of data (figure: LPP and PCA projections). Neither of them does well!
LGSPP • Local and Global Structure Preserving Projection (LGSPP): • Extracts local and global structures; • Derives an embedding that preserves these structures and minimizes their distortions.
Local Structure • For each data point x, • S(x) is the set of points consisting of x itself and its Ks nearest neighbors (Ks is a system parameter). • S(x) is the local neighborhood around the point x.
Global Structure • For each data point x, • D(x) is the set of Kd points which are far from point x and also far from each other (Kd is another parameter). • For example (see figure): the blue dot is x; the red/green dots are in D(x). Points in D(x) and point x are from different dense regions. • Preserving these distances (black dotted lines in the figure) can prevent the space from collapsing!
Extraction Algorithm (for D(x)) • Select a random sample set. • Pick the sample point farthest from point x, denoted d1. • Pick the sample point farthest from both x and d1, denoted d2. • Continue until Kd points have been found.
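A sketch of this extraction step in Python (my interpretation of the slides, where "farthest from x and d1" is read as maximizing the distance to the nearest of the points already chosen; the names and the sample_size parameter are my own):

```python
# Greedy extraction of D(x): from a random sample, repeatedly pick the
# candidate whose distance to the nearest already-chosen anchor (x, d1, ...)
# is largest, until k_d points have been found.
import numpy as np

def extract_far_points(X, x, k_d, sample_size=50, seed=None):
    """X: (n, d) data array; x: a single (d,) point; returns k_d indices into X."""
    rng = np.random.default_rng(seed)
    sample = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    anchors = [x]                             # distances are measured to x and to all picks so far
    chosen = []
    for _ in range(k_d):
        # for each candidate, compute its distance to the nearest anchor
        dists = np.min(
            [np.linalg.norm(X[sample] - a, axis=1) for a in anchors], axis=0)
        best = sample[np.argmax(dists)]       # farthest remaining candidate
        chosen.append(int(best))
        anchors.append(X[best])
    return chosen
```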
S(x) and D(x) • S(x): the local neighborhood of x. • D(x): point x and the points in D(x) are highly likely to come from different dense regions of the dataset. • The local and global structures are captured by S(x) and D(x) for each point x.
Embedding • Goals of the embedding: • Keep the points in S(x) close to each other in the reduced space: minimize the pairwise distances within S(x). • Keep the points in D(x) far from those in S(x) in the reduced space: maximize the pairwise distances between S(x) and D(x).
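As a hedged reconstruction of these two goals (my notation, not necessarily the paper's, with x_i denoting data points and p a projection axis):

```latex
% Keep each local neighborhood S(x) tight in the reduced space:
\min_{p} \sum_{x} \sum_{x_i, x_j \in S(x)} \left\| p^{\top} x_i - p^{\top} x_j \right\|^2
% Push each far set D(x) away from S(x) in the reduced space:
\max_{p} \sum_{x} \sum_{x_i \in S(x)} \sum_{x_j \in D(x)} \left\| p^{\top} x_i - p^{\top} x_j \right\|^2
```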
Optimization • Find a set of projection axes pi that minimize the within-S(x) distances while maximizing the S(x)-to-D(x) distances, as sketched above.
Rewrite • The objective can be rewritten as a Generalized Eigenvalue Problem.
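A hedged sketch of the resulting computation, assuming the local and global terms have been collected into matrices B (within S(x)) and A (between S(x) and D(x)); these names and the regularization term are my own, not necessarily the paper's:

```python
# Solve the generalized eigenproblem  A p = lambda B p  and keep the axes
# with the LARGEST eigenvalues: directions that maximize the S(x)-to-D(x)
# spread relative to the within-S(x) spread.
import numpy as np
from scipy.linalg import eigh

def solve_projection(A, B, k_dims, reg=1e-6):
    B_reg = B + reg * np.eye(B.shape[0])      # small ridge for numerical stability
    eigvals, eigvecs = eigh(A, B_reg)         # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k_dims]       # projection axes, largest eigenvalues first
```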
Toy Examples Revisited • LGSPP results on Toy Examples 1 and 2 (figures).
Synthetic Datasets • 2-dimensional data. • A free variable ranges from -1 to 1. • The 1st and 2nd dimensions are each defined as a function of this free variable.
More Datasets • LGSPP results on additional datasets (figures).
Conclusions • LGSPP: • Extracts local and global structures and computes a salient embedding. • Addresses the limitations of PCA and LPP. • Is linear, fast, and robust. • Works well on both synthetic and real-world examples.