Explore two methods, Latent Semantic Indexing and Random Projection, to pack vectors into fewer dimensions while preserving distances, for faster cosine computation.
Packing to fewer dimensions Paolo Ferragina Dipartimento di Informatica Università di Pisa
Speeding up cosine computation
• What if we could take our vectors and “pack” them into fewer dimensions (say 50,000 → 100) while preserving distances?
• Now: O(nm) to compute cos(d,q) for all n docs
• Then: O(km + kn), where k << n,m
• Two methods:
  • “Latent semantic indexing”
  • Random projection
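A minimal sketch of this cost trade-off (Python/NumPy, with illustrative sizes smaller than the slide’s 50,000 terms, and a random matrix standing in for whichever projection is used):

```python
import numpy as np

# Illustrative sizes: n docs, m terms, k packed dimensions (hypothetical values).
n, m, k = 1_000, 10_000, 100
rng = np.random.default_rng(0)

A = rng.random((m, n))    # term-document matrix (m x n)
q = rng.random(m)         # query vector over m terms

# Direct scoring: O(nm) per query.
scores_full = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))

# Packed scoring: project the docs once offline, then each query costs
# O(km) to project plus O(kn) to score.
P = rng.standard_normal((k, m)) / np.sqrt(k)   # placeholder m -> k projection
A_packed = P @ A                               # k x n, computed once
q_packed = P @ q                               # k-dimensional query
scores_packed = (A_packed.T @ q_packed) / (
    np.linalg.norm(A_packed, axis=0) * np.linalg.norm(q_packed))
```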
Briefly
• LSI is data-dependent
  • Create a k-dim subspace by eliminating redundant axes
  • Pull together “related” axes – hopefully (e.g. car and automobile)
• Random projection is data-independent
  • Choose a k-dim subspace that guarantees good stretching properties, with high probability, between any pair of points
What about polysemy?
Sec. 18.4 Latent Semantic Indexing courtesy of Susan Dumais
Notions from linear algebra
• Matrix A, vector v
• Matrix transpose (Aᵗ)
• Matrix product
• Rank
• Eigenvalue λ and eigenvector v: Av = λv
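A quick NumPy reminder of these notions, on a small made-up matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # small symmetric example matrix
v = np.array([1.0, 1.0])              # a vector

print(A.T)                            # transpose A^t
print(A @ v)                          # matrix-vector product
print(np.linalg.matrix_rank(A))       # rank

lam, V = np.linalg.eigh(A)            # eigenvalues and eigenvectors
assert np.allclose(A @ V[:, 0], lam[0] * V[:, 0])   # A v = lambda v
```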
Overview of LSI
• Pre-process docs using a technique from linear algebra called Singular Value Decomposition
• Create a new (smaller) vector space
• Queries are handled (faster) in this new space
Singular-Value Decomposition
• Recall the m × n matrix of terms × docs, A
  • A has rank r ≤ m,n
• Define the term-term correlation matrix T = AAᵗ
  • T is a square, symmetric m × m matrix
  • Let U be the m × r matrix of the r eigenvectors of T
• Define the doc-doc correlation matrix D = AᵗA
  • D is a square, symmetric n × n matrix
  • Let V be the n × r matrix of the r eigenvectors of D
A’s decomposition
• Given U (for T, m × r) and V (for D, n × r), formed by orthonormal columns (unit dot-product)
• It turns out that A = U Σ Vᵗ
  • where Σ is an r × r diagonal matrix holding the singular values of A (the square roots of the eigenvalues of T = AAᵗ) in decreasing order
• In terms of dimensions: A (m × n) = U (m × r) · Σ (r × r) · Vᵗ (r × n)
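A small NumPy check of the decomposition (random A, toy sizes); it also verifies that U’s columns are eigenvectors of T = AAᵗ, with eigenvalues equal to the squared singular values:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.random((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s holds singular values, decreasing
Sigma = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)              # A = U Sigma V^t

T = A @ A.T                                        # term-term correlation matrix
assert np.allclose(T @ U, U @ np.diag(s**2))       # columns of U are eigenvectors of T
```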
Dimensionality reduction
• Fix some k << r, zero out all but the k biggest singular values in Σ [the choice of k is crucial]
• Denote by Σk this new version of Σ, having rank k
• Typically k is about 100, while r (A’s rank) is > 10,000
• The resulting Ak = U Σk Vᵗ is still m × n; the zeroed rows/columns of Σk make all but the first k columns of U and the first k rows of Vᵗ useless, so Ak is effectively determined by an m × k, a k × k and a k × n factor
Guarantee
• Ak is a pretty good approximation to A:
  • Relative distances are (approximately) preserved
  • Of all m × n matrices of rank k, Ak is the best approximation to A wrt the following measures:
    • min over B with rank(B)=k of ||A − B||₂ = ||A − Ak||₂ = σ_{k+1}
    • min over B with rank(B)=k of ||A − B||F² = ||A − Ak||F² = σ_{k+1}² + σ_{k+2}² + ... + σ_r²
  • Frobenius norm: ||A||F² = σ_1² + σ_2² + ... + σ_r²
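These guarantees can be checked numerically; a hedged sketch (random A, toy sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 50, 30, 5
A = rng.random((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Zero out all but the k biggest singular values and rebuild Ak.
s_k = np.where(np.arange(len(s)) < k, s, 0.0)
A_k = U @ np.diag(s_k) @ Vt

# Spectral-norm error equals sigma_{k+1} ...
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
# ... and the squared Frobenius error equals sigma_{k+1}^2 + ... + sigma_r^2.
assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
```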
Reduction
• Since we are interested in doc/doc correlation, we consider:
  • D = AᵗA = (U Σ Vᵗ)ᵗ (U Σ Vᵗ) = (Σ Vᵗ)ᵗ (Σ Vᵗ), because U, V are formed by orthonormal eigenvectors of T, D (so UᵗU = I)
  • Hence X = Σ Vᵗ, an r × n matrix, may play the role of A
• To reduce its size we set Xk = Σk Vᵗ, a k × n matrix, and thus get AᵗA ≈ Xkᵗ Xk (both are n × n matrices)
• We use Xk to define how to project A:
  • Since Xk = Σk Vkᵗ, it holds Xk = Ukᵗ A (use the definition of the SVD of A)
  • Since Xk may play the role of A, its columns are the projected docs
• Similarly, a query q can be interpreted as a new column of A, and thus it is enough to multiply Ukᵗ times q to get the projected query, in O(km) time
Which are the concepts?
• The c-th concept = the c-th column of Uk (which is m × k)
• Uk[i][c] = strength of association between the c-th concept and the i-th term
• Vkᵗ[c][j] = strength of association between the c-th concept and the j-th document
• Projected document: d’j = Ukᵗ dj
  • d’j[c] = strength of concept c in dj
• Projected query: q’ = Ukᵗ q
  • q’[c] = strength of concept c in q
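A minimal sketch of the whole projection step (NumPy, toy sizes and random data): documents become the columns of Xk = Ukᵗ A, the query is mapped the same way, and cosine scores are then computed in k dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 100, 20, 5                 # terms, docs, concepts (illustrative)
A = rng.random((m, n))               # term-document matrix
q = rng.random(m)                    # query in term space

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]                       # m x k: the k strongest concepts

X_k = U_k.T @ A                      # k x n: projected documents
q_k = U_k.T @ q                      # projected query, O(km) time

# Cosine scores between the projected query and each projected document.
scores = (X_k.T @ q_k) / (np.linalg.norm(X_k, axis=0) * np.linalg.norm(q_k))
ranking = np.argsort(-scores)        # best-matching documents first
```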
Random Projections Paolo Ferragina Dipartimento di Informatica Università di Pisa Slides only!
An interesting math result
Lemma (Johnson–Lindenstrauss, ’82). Let P be a set of n distinct points in m dimensions. Given ε > 0, there exists a function f : P → IRᵏ such that for every pair of points u,v in P it holds:
(1 − ε) ||u − v||² ≤ ||f(u) − f(v)||² ≤ (1 + ε) ||u − v||²
where k = O(ε⁻² log n). f() is called a JL-embedding.
Setting v = 0 we also get a bound on f(u)’s stretching!
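A hedged numerical illustration of the lemma (Gaussian projection; the constant in the choice of k is arbitrary): the squared-distance ratios should land inside [1 − ε, 1 + ε] with high probability.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, eps = 200, 10_000, 0.2
k = int(np.ceil(8 * np.log(n) / eps ** 2))     # k = O(eps^-2 log n); the constant 8 is illustrative

P = rng.standard_normal((k, m)) / np.sqrt(k)   # one common JL map: a scaled Gaussian matrix
X = rng.random((n, m))                         # n points in m dimensions
Y = X @ P.T                                    # their k-dimensional images f(x)

for _ in range(5):                             # spot-check a few pairs
    i, j = rng.choice(n, size=2, replace=False)
    ratio = np.sum((Y[i] - Y[j]) ** 2) / np.sum((X[i] - X[j]) ** 2)
    print(ratio)                               # should lie in [1 - eps, 1 + eps] w.h.p.
```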
What about the cosine distance?
• Write the dot product via lengths: u·v = (||u||² + ||v||² − ||u − v||²) / 2, and similarly f(u)·f(v) = (||f(u)||² + ||f(v)||² − ||f(u) − f(v)||²) / 2
• Bounding ||f(u)||² and ||f(v)||² via f(u)’s and f(v)’s stretching, and substituting the formula above for ||u − v||², gives |f(u)·f(v) − u·v| ≤ ε (||u||² + ||v||² + ||u − v||²) / 2
• So for (near-)unit vectors the cosine is preserved up to an additive O(ε) error
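A short check of the same effect on dot products (unit vectors, the same kind of Gaussian map as above):

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 10_000, 1_000

u = rng.random(m); u /= np.linalg.norm(u)      # two unit vectors
v = rng.random(m); v /= np.linalg.norm(v)

P = rng.standard_normal((k, m)) / np.sqrt(k)   # Gaussian JL map
fu, fv = P @ u, P @ v

# With ||u|| = ||v|| = 1 the additive error is O(eps) w.h.p.
print(abs(fu @ fv - u @ v))
```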
How to compute a JL-embedding?
• Set the projection matrix P = (p_{i,j}) as a random m × k matrix whose components are independent random variables with E[p_{i,j}] = 0 and Var[p_{i,j}] = 1 (e.g. p_{i,j} = ±1, each with probability 1/2, or p_{i,j} ~ N(0,1))
• Then map u to f(u) = (1/√k) · Pᵗ u
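A sketch of one such zero-mean, unit-variance choice (±1 entries with probability 1/2 each, assumed here as an example; a standard Gaussian works the same way):

```python
import numpy as np

rng = np.random.default_rng(6)
m, k = 10_000, 500

# Entries +1 or -1, each with probability 1/2: E[p_ij] = 0, Var[p_ij] = 1.
P = rng.choice([-1.0, 1.0], size=(m, k))       # m x k projection matrix

def jl_embed(u, P):
    """Map u from m to k dimensions: f(u) = P^t u / sqrt(k)."""
    return (P.T @ u) / np.sqrt(P.shape[1])

u = rng.random(m)
fu = jl_embed(u, P)
print(fu @ fu / (u @ u))                       # squared-length ratio, close to 1
```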
Finally...
• Random projections hide large constants
  • k ≈ (1/ε)² · log n, so k may be large…
  • but they are simple and fast to compute
• LSI is intuitive and may scale to any k
  • it is optimal under various metrics
  • but it is costly to compute; good libraries do exist