Dimensionality Reduction


Random Projections
• Johnson-Lindenstrauss lemma. For:
  • $0 < \epsilon < 1/2$,
  • any (sufficiently large) set $S$ of $M$ points in $\mathbb{R}^n$,
  • $k = O(\epsilon^{-2} \ln M)$,
  there exists a linear map $f: S \to \mathbb{R}^k$ such that $(1-\epsilon)\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\epsilon)\|u-v\|^2$ for all $u, v \in S$.
• A random projection satisfies these bounds with constant probability.
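To make the lemma's guarantee concrete, the sketch below measures how far a given map distorts pairwise squared distances. It is a minimal illustration, assuming points and their images are plain Python lists of floats and that the points in $S$ are distinct; the names `jl_distortion` and `satisfies_jl` are hypothetical helpers, not from the slides.

```python
import itertools


def squared_distance(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jl_distortion(points, images):
    """Return (min, max) of ||f(u) - f(v)||^2 / ||u - v||^2 over all pairs.

    points[i] is u in R^n and images[i] is f(u) in R^k.
    Assumes the points are pairwise distinct (no division by zero).
    """
    ratios = [
        squared_distance(images[i], images[j]) / squared_distance(points[i], points[j])
        for i, j in itertools.combinations(range(len(points)), 2)
    ]
    return min(ratios), max(ratios)


def satisfies_jl(points, images, eps):
    """True if every pairwise ratio lies in [1 - eps, 1 + eps]."""
    lo, hi = jl_distortion(points, images)
    return 1 - eps <= lo and hi <= 1 + eps
```

For example, the identity map has distortion `(1.0, 1.0)`, while scaling every point by 2 multiplies every squared distance by 4 and so violates the bound for any $\epsilon < 1/2$.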

Random Projection
• Set $k = O(\epsilon^{-2} \ln M)$.
• Select $k$ random $n$-dimensional vectors (one approach is to draw their entries independently from a Gaussian with mean 0 and variance 1, i.e. $N(0,1)$).
• Project the original points onto the $k$ vectors and scale the result by $1/\sqrt{k}$.
• With high probability, the resulting $k$-dimensional space approximately preserves the distances.
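The steps of this slide can be sketched in plain Python (no NumPy, to keep it self-contained). This is a minimal sketch, assuming points are lists of floats; `gaussian_projection_matrix` and `project` are hypothetical names. The $1/\sqrt{k}$ factor makes $E\|f(x)\|^2 = \|x\|^2$, so distances are preserved in expectation instead of being inflated by a factor of $k$.

```python
import math
import random


def gaussian_projection_matrix(n, k, rng):
    """n x k matrix whose entries are i.i.d. N(0, 1): mean 0, variance 1."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]


def project(x, R):
    """Map x in R^n to (1 / sqrt(k)) * x R in R^k (a linear map)."""
    k = len(R[0])
    scale = 1.0 / math.sqrt(k)
    return [scale * sum(x[i] * R[i][j] for i in range(len(x))) for j in range(k)]


# Usage: project 10 random points from R^100 down to R^400... or any k you choose.
rng = random.Random(0)
n, k = 100, 400
points = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(10)]
R = gaussian_projection_matrix(n, k, rng)
images = [project(p, R) for p in points]
```

Here $k = 400$ is an arbitrary illustrative choice, not a value derived from the lemma; with it, each pairwise ratio $\|f(u)-f(v)\|^2/\|u-v\|^2$ typically lands within a few tenths of 1. In practice one would use a vectorized matrix product rather than Python loops.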

“Database friendly” RP
• Achlioptas showed that random projections with the same guarantees can be built using matrix entries drawn only from $\{1, -1\}$ or $\{1, 0, -1\}$.
• Thus computing the projection needs only additions and subtractions, not multiplications (apart from a final rescaling).
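A minimal sketch of the addition/subtraction idea for the $\{1, -1\}$ case, assuming points are lists of floats; `sign_matrix` and `project_add_sub` are hypothetical names. Each output coordinate is the sum of the $x_i$ whose entry is $+1$ minus the sum of those whose entry is $-1$; the only multiplications are the $k$ final rescalings by $1/\sqrt{k}$.

```python
import math
import random


def sign_matrix(n, k, rng):
    """n x k matrix with entries +1 or -1, each with probability 1/2."""
    return [[1 if rng.random() < 0.5 else -1 for _ in range(k)] for _ in range(n)]


def project_add_sub(x, R):
    """Compute (1 / sqrt(k)) * x R using only additions and subtractions,
    followed by one multiplication per output coordinate."""
    k = len(R[0])
    out = [0.0] * k
    for i, xi in enumerate(x):
        row = R[i]
        for j in range(k):
            if row[j] > 0:
                out[j] += xi  # entry +1: add
            else:
                out[j] -= xi  # entry -1: subtract
    scale = 1.0 / math.sqrt(k)
    return [scale * v for v in out]


# Usage: R = sign_matrix(5, 7, random.Random(1)); y = project_add_sub(x, R)
```

If only relative distances matter, the final rescaling can be dropped entirely, since it multiplies every distance by the same constant.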

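Theorem. Let $P$ be a set of $n$ points in $\mathbb{R}^d$, stored as an $n \times d$ matrix $A$. Given $\epsilon, \beta > 0$, let
$$k_0 = \frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3} \log n.$$
For integer $k > k_0$, let $R$ be a $d \times k$ matrix with entries $R(i, j) = r_{ij}$ generated randomly and independently from the following distribution:
$$r_{ij} = \begin{cases} +1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2. \end{cases}$$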

Let and let f: Rd Rk With probability at least 1-n-b, for all u, v in P (1- e) ||u-v||2 ≤ ||f(u)-f(v)||2 ≤ (1+ e)||u-v||2

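The same is true if you use
$$r_{ij} = \sqrt{3} \cdot \begin{cases} +1 & \text{with probability } 1/6, \\ 0 & \text{with probability } 2/3, \\ -1 & \text{with probability } 1/6. \end{cases}$$
The factor $\sqrt{3}$ gives each entry unit variance. Since two thirds of the entries are zero in expectation, the projection touches only about a third of the entries of $R$.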
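A minimal sketch of the sparse variant, assuming points are lists of floats; `sparse_sign`, `sparse_sign_rows` and `project_sparse` are hypothetical names. It stores only the nonzero signs of each row of $R$ and folds the $\sqrt{3}$ factor and the $1/\sqrt{k}$ factor into one final rescaling by $\sqrt{3/k}$, so the inner loop is still pure additions and subtractions.

```python
import math
import random


def sparse_sign(rng):
    """Return +1 w.p. 1/6, -1 w.p. 1/6, 0 w.p. 2/3 (the sqrt(3) factor is applied later)."""
    u = rng.random()
    if u < 1 / 6:
        return 1
    if u < 1 / 3:
        return -1
    return 0


def sparse_sign_rows(n, k, rng):
    """For each input coordinate i, keep only the nonzero entries (j, s) of row i of R."""
    rows = []
    for _ in range(n):
        row = []
        for j in range(k):
            s = sparse_sign(rng)
            if s != 0:
                row.append((j, s))
        rows.append(row)
    return rows


def project_sparse(x, rows, k):
    """Compute (1 / sqrt(k)) * x R with R = sqrt(3) * signs, skipping zero entries."""
    out = [0.0] * k
    for xi, row in zip(x, rows):
        for j, s in row:
            if s > 0:
                out[j] += xi
            else:
                out[j] -= xi
    # sqrt(3) from the entry distribution times 1/sqrt(k) from the theorem.
    scale = math.sqrt(3.0 / k)
    return [scale * v for v in out]


# Usage: rows = sparse_sign_rows(100, 50, random.Random(0)); y = project_sparse(x, rows, 50)
```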

The proof is similar in spirit to the previous one but differs in the details. Again, we need to show that the squared length of a projected vector concentrates around its mean. We show that the worst-case unit vectors for this projection matrix are $w = \frac{1}{\sqrt{d}}(\pm 1, \ldots, \pm 1)$, and that the even moments of the projections of these vectors are dominated by the corresponding moments of the spherically symmetric (Gaussian) projection.
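As a worked illustration of the moment comparison (the fourth moment only, not the full proof), let $w$ be a unit vector and $Q = \sum_j w_j r_j$, where the $r_j$ are independent, symmetric, with $E[r_j^2] = 1$. The symbols $Q$ and $r_j$ are introduced here for the illustration. Odd moments vanish, so only pairings survive:

```latex
\begin{align*}
E[Q^4] &= \sum_j w_j^4\, E[r^4] + 3 \sum_{i \ne j} w_i^2 w_j^2
        = \sum_j w_j^4\, E[r^4] + 3\Bigl(1 - \sum_j w_j^4\Bigr) \\
       &= 3 - \bigl(3 - E[r^4]\bigr) \sum_j w_j^4 .
\end{align*}
% Gaussian entries: Q ~ N(0,1), so E[Q^4] = 3.
% Entries +1/-1 each w.p. 1/2: E[r^4] = 1, so E[Q^4] = 3 - 2 \sum_j w_j^4 \le 3,
%   largest when \sum_j w_j^4 is smallest, i.e. |w_j| = 1/\sqrt{d}: E[Q^4] = 3 - 2/d.
% Entries \sqrt{3} * {+1, 0, -1} w.p. {1/6, 2/3, 1/6}: E[r^4] = 9 * (1/3) = 3, so E[Q^4] = 3.
```

In both discrete cases the fourth moment is at most the Gaussian value 3, and for the $\pm 1$ entries it is maximized exactly at $w = \frac{1}{\sqrt{d}}(\pm 1, \ldots, \pm 1)$.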
