
Similarities, Distances and Manifold Learning


Presentation Transcript


  1. Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York

  2. Background • Typically objects are characterised by features • Face images • SIFT features • Object spectra • ... • If we measure n features → n-dimensional space • The arena for our problem is an n-dimensional vector space

  3. Background • Example: Eigenfaces • Raw pixel values: n by m gives nm features • Feature space is space of all n by m images

  4. Background • The space of all face-like images is smaller than the space of all images • The assumption is that faces lie on a smaller manifold embedded in the global space • (Diagram: ‘Face images’ shown as a small region inside ‘All images’)

  5. Manifold: a space which locally looks Euclidean • Manifold learning: finding the manifold representing the objects we are interested in • All objects should be on the manifold, non-objects outside

  6. Part I: Euclidean Space • Position, Similarity and Distance • Manifold Learning in Euclidean space • Some famous techniques • Part II: Non-Euclidean Manifolds • Assessing Data • Nature and Properties of Manifolds • Data Manifolds • Learning some special types of manifolds • Part III: Advanced Techniques • Methods for intrinsically curved manifolds • Thanks to Edwin Hancock, Eliza Xu, Bob Duin for contributions • And support from the EU SIMBAD project

  7. Part I: Euclidean Space

  8. Position • The main arena for pattern recognition and machine learning problems is a vector space • A set of n well-defined features is collected into a vector x ∈ ℝn • Also defined are addition of vectors and multiplication by a scalar • Feature vector → position

  9. Similarity • To make meaningful progress, we need a notion of similarity: the inner product • The inner-product ‹x,y› can be considered to be a similarity between x and y

  10. Induced norm • The self-similarity ‹x,x› is (the square of) the ‘size’ of x and gives rise to the induced norm, the length of x: ‖x‖ = √‹x,x› • Finally, the length allows the definition of a distance in our vector space as the length of the vector joining x and y: d(x,y) = ‖x − y‖ • Inner product also gets us distance
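
A minimal numpy sketch of these three quantities (similarity, induced norm, distance); the two vectors are illustrative, nothing here comes from the slides beyond the standard definitions:

```python
# Similarity (inner product), induced norm and distance in ordinary
# Euclidean space, as on the slide above.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.0])

similarity = np.dot(x, y)                  # <x, y>
norm_x = np.sqrt(np.dot(x, x))             # ||x|| = sqrt(<x, x>)
distance = np.sqrt(np.dot(x - y, x - y))   # d(x, y) = ||x - y||

print(similarity, norm_x, distance)
```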

  11. Euclidean space • If we have a vector space for features, and the usual inner product, all three are connected: ‹x,y› is the similarity, ‖x‖^2 = ‹x,x› the (squared) length, and d(x,y)^2 = ‖x − y‖^2 = ‹x,x› + ‹y,y› − 2‹x,y› the (squared) distance

  12. Non-Euclidean inner product • If the inner-product has the standard form ‹x,y› = x^Ty = x1y1 + x2y2 + … + xnyn then the vector space is Euclidean • Note we recover all the expected stuff for Euclidean space, i.e. ‖x‖^2 = Σi xi^2 and d(x,y)^2 = Σi (xi − yi)^2 • The inner-product doesn’t have to be like this; for example in Einstein’s special relativity, the inner-product of spacetime is the Minkowski product ‹x,y› = x1y1 + x2y2 + x3y3 − c^2 tx ty (mixed signs, so it is not Euclidean)

  13. The Golden Trio • In Euclidean space, the concepts of position, similarity and distance are elegantly connected • (Diagram: the triangle linking Position X, Similarity K and Distance D)

  14. Point position matrix • In a normal manifold learning problem, we have a set of samples X = {x1, x2, ..., xm} • These can be collected together in a matrix X whose i-th row is xi^T • I use this (row-wise) convention, but others may write the points vertically, as columns

  15. Centreing • A common and important operation is centreing – moving the mean to the origin • Centred points behave better • (1/m)JX is the mean matrix, so Xc = X − (1/m)JX is the centred matrix, where J is the all-ones matrix • This can be done with a single matrix C: Xc = CX with C = I − (1/m)J • C is the centreing matrix (and is symmetric, C = C^T)
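
A short numpy sketch of the centreing operation under the row-wise convention above; the matrix sizes are illustrative only:

```python
# Centre a point position matrix X (one sample per row) by
# pre-multiplying with the centreing matrix C = I - (1/m)J.
import numpy as np

m, n = 5, 3                       # illustrative sizes
X = np.random.rand(m, n)          # point position matrix

J = np.ones((m, m))               # all-ones matrix
C = np.eye(m) - J / m             # centreing matrix, symmetric: C = C.T
Xc = C @ X                        # centred points

print(np.allclose(Xc.mean(axis=0), 0.0))   # mean moved to the origin
```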

  16. Position-Similarity • The similarity matrix K is defined by Kij = ‹xi, xj› • From the definition of X, we simply get K = XX^T • The Gram matrix Kc = XcXc^T = CXX^TC^T = CKC is the similarity matrix of the centred points (from the definition of Xc) • i.e. a centreing operation on K • Kc is really a kernel matrix for the points (linear kernel)

  17. Position-Similarity • To go from K to X, we need to consider the eigendecomposition of K: K = UΛU^T • As long as we can take the square root of Λ (i.e. no negative eigenvalues), then we can find X as X = UΛ^(1/2)

  18. Kernel embedding First manifold learning method – kernel embedding Finds a Euclidean manifold from object similarities • Embeds a kernel matrix into a set of points in Euclidean space (the points are automatically centred) • K must have no negative eigenvalues, i.e. it is a kernel matrix (Mercer condition)
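
A sketch of kernel embedding as described on the two slides above, assuming K is a centred, positive semidefinite kernel matrix:

```python
# Recover point positions from a kernel/similarity matrix via its
# eigendecomposition: K = U Lambda U^T  =>  X = U Lambda^(1/2).
import numpy as np

def kernel_embedding(K, dim=None):
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1]              # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    if dim is not None:
        evals, evecs = evals[:dim], evecs[:, :dim]
    evals = np.clip(evals, 0.0, None)            # guard against round-off
    return evecs * np.sqrt(evals)                # X = U Lambda^(1/2)

# Usage: X = kernel_embedding(Kc); then X @ X.T reproduces Kc
```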

  19. Similarity-Distance • We can easily determine the (squared) distances Ds from K: Ds,ij = ‹xi − xj, xi − xj› = Kii + Kjj − 2Kij

  20. Similarity-Distance • What about finding K from Ds? • Looking at the top equation, we might imagine that K = −½Ds is a suitable choice • Not centred; the relationship is actually Kc = −½ C Ds C

  21. Classic MDS • Classic Multidimensional Scaling embeds a (squared) distance matrix into Euclidean space • Using what we have so far, the algorithm is simple: form Kc = −½ C Ds C, take the eigendecomposition Kc = UΛU^T, and set X = UΛ^(1/2) • This is MDS
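
A compact sketch of the classic MDS algorithm just outlined; it assumes Ds holds squared distances, as in the slides:

```python
# Classic MDS: double-centre the squared distance matrix to obtain the
# kernel Kc = -1/2 C Ds C, then kernel-embed it.
import numpy as np

def classic_mds(Ds, dim=2):
    m = Ds.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ Ds @ C
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:dim]        # top `dim` eigenpairs
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))
```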

  22. The Golden Trio • (Diagram: Position X, Similarity K and Distance D, with Kernel Embedding taking K to X and MDS taking D to X)

  23. Kernel methods • A kernel is a function k(i, j) which computes an inner-product between objects i and j • But without needing to know the actual points (the space is implicit) • Using a kernel function we can directly compute K without knowing X • (Diagram: the golden trio, with the kernel function supplying Similarity K directly)
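
As an example of computing K directly from a kernel function, here is a Gaussian (RBF) kernel; the slides do not commit to a particular kernel, so both the choice of kernel and the sigma parameter are illustrative:

```python
# Build the similarity matrix K directly from a kernel function,
# without constructing the implicit feature space.
import numpy as np

def rbf_kernel(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)                       # squared norms
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    return np.exp(-D2 / (2.0 * sigma**2))           # K[i, j] = k(x_i, x_j)
```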

  24. Kernel methods • The implied space may be very high dimensional, but a true kernel will always produce a positive semidefinite K and the implied space will be Euclidean • Many (most?) PR algorithms can be kernelized • Made to use K rather than X or D • The trick is to note that any interesting vector should lie in the space spanned by the examples we are given • Hence it can be written as a linear combination u = Σi αi xi = X^Tα • Look for α instead of u

  25. Kernel PCA • What about PCA? PCA solves the following problem: find the direction u maximising the (projected) variance u^T(Xc^TXc)u subject to u^Tu = 1 • Let’s kernelize: writing u = Xc^Tα and K = XcXc^T, this becomes maximising α^TK^2α subject to α^TKα = 1

  26. Kernel PCA • K^2 has the same eigenvectors as K, so the eigenvectors of PCA are the same as the eigenvectors of K • The eigenvalues of PCA are related to the eigenvalues of K by a simple scaling (from the normalisation of the covariance matrix) • Kernel PCA is a kernel embedding with an externally provided kernel matrix
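
A small numerical check of the claim above, using a centred linear kernel built from random points (the data and sizes are illustrative):

```python
# K and K^2 share eigenvectors; their eigenvalues are related by squaring,
# which is why kernel PCA and kernel embedding give the same directions.
import numpy as np

m, n = 6, 3
X = np.random.rand(m, n)
C = np.eye(m) - np.ones((m, m)) / m
K = C @ X @ X.T @ C                               # centred linear kernel

evals_K, evecs_K = np.linalg.eigh(K)
evals_K2, evecs_K2 = np.linalg.eigh(K @ K)

print(np.allclose(evals_K2, evals_K**2))          # eigenvalues square
# top (non-zero) eigenvectors agree up to sign
print(np.allclose(np.abs(evecs_K[:, -n:]), np.abs(evecs_K2[:, -n:])))
```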

  27. Kernel PCA • So kernel PCA gives the same solution as kernel embedding; the eigenvalues are modified a bit • MDS uses the same kernel and kernel embedding • Kernel embedding, MDS and PCA are essentially the same thing in Euclidean space: all three give the same answer for a set of points in Euclidean space

  28. Some useful observations • Your similarity matrix is Euclidean iff it has no negative eigenvalues (i.e. it is a kernel matrix and PSD) • By similar reasoning, your distance matrix is Euclidean iff the similarity matrix derived from it is PSD • If the feature space is small but the number of samples is large, then the covariance matrix is small and it is better to do normal PCA (on the covariance matrix) • If the feature space is large and the number of samples is small, then the kernel matrix will be small and it is better to do kernel embedding

  29. Part II: Non-Euclidean Manifolds

  30. Non-linear data • Much of the data in computer vision lies in a high-dimensional feature space but is constrained in some way • The space of all images of a face is a subspace of the space of all possible images • The subspace is highly non-linear but low dimensional (described by a few parameters)

  31. Non-linear data • This cannot be exploited by the linear subspace methods like PCA • These assume that the subspace is a Euclidean space as well • A classic example is the ‘Swiss roll’ data set

  32. ‘Flat’ Manifolds • Fundamentally different types of data exist; the Swiss roll is an example of the first type • The embedding of this data into the high-dimensional space is highly curved • This is called extrinsic curvature, the curvature of the manifold with respect to the embedding space • Now imagine that this manifold was a piece of paper; you could unroll the paper into a flat plane without distorting it • No intrinsic curvature; in fact it is homeomorphic to Euclidean space

  33. Curved manifold • This manifold is different: it must be stretched to map it onto a plane • It has non-zero intrinsic curvature • A flatlander living on this manifold can tell that it is curved, for example by measuring the ratio of the radius to the circumference of a circle • In the first case, we might still hope to find a Euclidean embedding • We can never find a distortion-free Euclidean embedding of the second (in the sense that the distances will always have errors)

  34. Intrinsically Euclidean Manifolds • We cannot use the previous methods on the second type of manifold, but there is still hope for the first • The manifold is embedded in Euclidean space, but Euclidean distance is not the correct way to measure distance • The Euclidean distance ‘shortcuts’ the manifold • The geodesic distance calculates the shortest path along the manifold

  35. Geodesics • The geodesic generalizes the concept of distance to curved manifolds • The shortest path joining two points which lies completely within the manifold • If we can correctly compute the geodesic distances, and the manifold is intrinsically flat, we should get Euclidean distances which we can plug into our Euclidean geometry machine • (Diagram: geodesic distances feeding Distance D in the trio of Position X, Similarity K and Distance D)

  36. ISOMAP • ISOMAP is exactly such an algorithm • Approximate geodesic distances are computed for the points from a graph • Nearest-neighbours graph • For neighbours, Euclidean distance ≈ geodesic distance • For non-neighbours, geodesic distance is approximated by the shortest path in the graph • Once we have the distances D, we can use MDS to find the Euclidean embedding

  37. ISOMAP • ISOMAP: • Neighbourhood graph • Shortest path algorithm • MDS • ISOMAP is distance-preserving – embedded distances should be close to geodesic distances
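
A sketch of this three-step pipeline using scipy for the shortest-path step; the neighbourhood size is illustrative, and the neighbourhood graph is assumed to be connected:

```python
# ISOMAP sketch: k-nearest-neighbour graph -> shortest paths (approximate
# geodesics) -> classic MDS on the squared geodesic distances.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbours=8, dim=2):
    D = squareform(pdist(X))                    # Euclidean distances
    W = np.full_like(D, np.inf)                 # inf = no edge
    for i in range(len(D)):
        idx = np.argsort(D[i])[1:n_neighbours + 1]
        W[i, idx] = D[i, idx]                   # keep k nearest neighbours
    W = np.minimum(W, W.T)                      # symmetrise the graph
    G = shortest_path(W, method='D')            # geodesic approximation
    m = len(G)
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ (G**2) @ C                   # MDS kernel
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:dim]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))
```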

  38. Laplacian Eigenmap • The Laplacian Eigenmap is another graph-based method of embedding non-linear manifolds into Euclidean space • As with ISOMAP, form a neighbourhood graph for the datapoints • Find the graph Laplacian as follows • The adjacency matrix A has Aij = 1 if i and j are neighbours and 0 otherwise (weighted variants are also used) • The ‘degree’ matrix D is the diagonal matrix with Dii = Σj Aij • The normalized graph Laplacian is L = D^(−1/2)(D − A)D^(−1/2)

  39. Laplacian Eigenmap • We find the Laplacian eigenmap embedding using the eigendecomposition of L • The embedded positions are given by the eigenvectors of L with the smallest non-zero eigenvalues (one embedding coordinate per eigenvector) • Similar to ISOMAP • Structure-preserving, not distance-preserving
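
A sketch of the Laplacian eigenmap with a simple 0/1 neighbourhood graph; the exact edge weighting and the scaling of the embedded coordinates differ between presentations, so treat this as one concrete choice:

```python
# Laplacian eigenmap: neighbourhood graph -> normalized graph Laplacian
# -> eigenvectors with the smallest non-zero eigenvalues as coordinates.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def laplacian_eigenmap(X, n_neighbours=8, dim=2):
    D = squareform(pdist(X))
    A = np.zeros_like(D)
    for i in range(len(D)):
        idx = np.argsort(D[i])[1:n_neighbours + 1]
        A[i, idx] = 1.0
    A = np.maximum(A, A.T)                        # symmetric 0/1 adjacency
    deg = A.sum(axis=1)                           # degree of each node
    Dm = np.diag(deg)
    Dm12 = np.diag(1.0 / np.sqrt(deg))
    L = Dm12 @ (Dm - A) @ Dm12                    # normalized Laplacian
    evals, evecs = np.linalg.eigh(L)              # eigenvalues ascending
    return evecs[:, 1:dim + 1]                    # skip the trivial eigenvector
```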

  40. Locally-Linear Embedding • Locally-linear Embedding is another classic method which also begins with a neighbourhood graph • We make point i (in the original data) from a weighted sum of the neighbouring points: xi ≈ Σj Wij xj • Wij is 0 for any point j not in the neighbourhood (and for i = j) • We find the weights by minimising the reconstruction error E(W) = Σi ‖xi − Σj Wij xj‖^2 • Subject to the constraints that the weights are non-negative and sum to 1 • Gives a relatively simple closed-form solution

  41. Locally-Linear Embedding • These weights encode how well a point j represents a point i and can be interpreted as the adjacency between i and j • A low-dimensional embedding is found by then finding points yi to minimise the error Σi ‖yi − Σj Wij yj‖^2 • In other words, we find a low-dimensional embedding which preserves the adjacency relationships • The solution to this embedding problem turns out to be simply the bottom eigenvectors of the matrix M = (I − W)^T(I − W) • LLE is scale-free: the final points have the covariance matrix I (unit scale)
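
A sketch of LLE along these lines; it enforces only the sum-to-one constraint in closed form (the non-negativity mentioned above is not imposed here) and adds a small regulariser for numerical stability:

```python
# LLE sketch: reconstruction weights from each point's local Gram matrix,
# then an embedding from the bottom eigenvectors of M = (I - W)^T (I - W).
import numpy as np
from scipy.spatial.distance import pdist, squareform

def lle(X, n_neighbours=8, dim=2, reg=1e-3):
    m = len(X)
    D = squareform(pdist(X))
    W = np.zeros((m, m))
    for i in range(m):
        idx = np.argsort(D[i])[1:n_neighbours + 1]
        Z = X[idx] - X[i]                          # neighbours relative to x_i
        G = Z @ Z.T                                # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(idx))  # regularise for stability
        w = np.linalg.solve(G, np.ones(len(idx)))
        W[i, idx] = w / w.sum()                    # weights sum to one
    I = np.eye(m)
    M = (I - W).T @ (I - W)
    evals, evecs = np.linalg.eigh(M)               # ascending eigenvalues
    return evecs[:, 1:dim + 1]                     # skip the constant eigenvector
```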

  42. Comparison • LLE might seem like quite a different process to the previous two, but actually very similar • We can interpret the process as producing a kernel matrix followed by scale-free kernel embedding

  43. Comparison • ISOMAP is the only method which directly computes and uses the geodesic distances • The other two depend indirectly on the distances through local structure • LLE is scale-free, so the original distance scale is lost, but the local structure is preserved • Computing the necessary local dimensionality to find the correct nearest neighbours is a problem for all such methods

  44. Non-Euclidean data • Data is Euclidean iff K is psd • Unless you are using a kernel function, this is often not true • Why does this happen?

  45. What type of data do I have? • Starting point: distance matrix • However, we do not know a priori if our measurements are representable on a manifold • We will call them dissimilarities • Our starting point for answering the question “What type of data do I have?” will be a matrix of dissimilarities D between objects • Types of dissimilarities • Euclidean (no intrinsic curvature) • Non-Euclidean, metric (curved manifold) • Non-metric (no point-like manifold representation)
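
A simple numerical test for the first category, using the earlier fact that the data is Euclidean iff the similarity matrix derived from the dissimilarities is positive semidefinite (the tolerance is an illustrative choice):

```python
# Euclidean test: double-centre the squared dissimilarities and check
# that the derived similarity matrix has no significantly negative
# eigenvalues.
import numpy as np

def is_euclidean(D, tol=1e-10):
    m = len(D)
    C = np.eye(m) - np.ones((m, m)) / m
    K = -0.5 * C @ (D**2) @ C                 # derived similarity matrix
    evals = np.linalg.eigvalsh(K)
    return evals.min() >= -tol * max(evals.max(), 1.0)
```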

  46. Causes • Example: Chicken pieces data • Distance by alignment • Global alignment of everything could find Euclidean distances • Only local alignments are practical

  47. Causes • Dissimilarities may also be non-metric • The data is metric if it obeys the metric conditions • 1. Dij ≥ 0 (non-negativity) • 2. Dij = 0 iff i = j (identity of indiscernibles) • 3. Dij = Dji (symmetry) • 4. Dij ≤ Dik + Dkj (triangle inequality) • Reasonable dissimilarities should meet conditions 1 and 2
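
A sketch of checking the four metric conditions directly on a dissimilarity matrix D (the tolerance is illustrative):

```python
# Check non-negativity, zero diagonal, symmetry and the triangle
# inequality for a dissimilarity matrix D.
import numpy as np

def check_metric(D, tol=1e-10):
    nonneg   = bool((D >= -tol).all())
    identity = bool(np.allclose(np.diag(D), 0.0))
    symmetry = bool(np.allclose(D, D.T))
    # triangle inequality: D[i, j] <= D[i, k] + D[k, j] for all i, j, k
    triangle = bool((D[:, None, :] <= D[:, :, None] + D[None, :, :] + tol).all())
    return nonneg, identity, symmetry, triangle
```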

  48. Causes • Symmetry Dij= Dji • May not be symmetric by definition • Alignment: i→j may find a better solution than j→i

  49. Causes • Triangle violations: Dij ≤ Dik + Dkj may fail • ‘Extended objects’ • Finally, noise in the measurement of D can cause all of these effects
