Advanced Machine Learning & Perception
Instructor: Tony Jebara
Topic 12
• Manifold Learning (Unsupervised)
• Beyond Principal Components Analysis (PCA)
• Multidimensional Scaling (MDS)
• Generative Topographic Map (GTM)
• Locally Linear Embedding (LLE)
• Convex Invariance Learning (CoIL)
• Kernel PCA (KPCA)
Manifolds
• Data often lies on a lower-dimensional manifold embedded in the observed space
• Consider an image of a face being translated from left to right
• How do we capture the true coordinates of the data on the manifold or embedding space and represent them compactly?
• Open problem: many possible approaches…
• PCA: linear manifold
• MDS: get inter-point distances, find 2D data with the same distances
• LLE: mimic local neighborhoods using low-dimensional vectors
• GTM: fit a grid of Gaussians to the data via a nonlinear warp
• CoIL: linear after nonlinear normalization/invariance of the data
• KPCA: linear in Hilbert space (kernels)
Principal Components Analysis
• Reconstruction from the mean, eigenvectors, and coefficients: x ≈ μ + Σ_i c_i v_i
• Getting the eigenvectors (i.e. approximating the covariance): Σ ≈ V Λ V^T
• Eigenvectors are orthonormal: v_i^T v_j = δ_ij
• In the coordinates of the v's, the Gaussian is diagonal with covariance Λ
• All eigenvalues are non-negative
• Higher eigenvalues mean higher variance, so use those directions first
• To compute the coefficients: c_i = v_i^T (x - μ) (a numerical sketch follows below)
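A minimal numerical sketch of these PCA steps (an illustration only, not the lecture's own code; the function name and return values are assumptions):

```python
import numpy as np

def pca(X, d):
    """Toy PCA sketch: X is (N, D) data, d is the target dimensionality."""
    mu = X.mean(axis=0)                      # mean
    C = np.cov(X - mu, rowvar=False)         # sample covariance, approximated by V Lambda V^T
    evals, V = np.linalg.eigh(C)             # orthonormal eigenvectors, ascending eigenvalues
    order = np.argsort(evals)[::-1][:d]      # keep the top-d (highest-variance) directions
    V, evals = V[:, order], evals[order]
    coeffs = (X - mu) @ V                    # c_i = v_i^T (x - mu)
    X_rec = mu + coeffs @ V.T                # x ≈ mu + sum_i c_i v_i
    return mu, V, evals, coeffs, X_rec
```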
Multidimensional Scaling (MDS)
• Idea: capture only the distances between the points X in the original space
• Construct another set of low-dimensional or 2D points Y having the same distances
• A dissimilarity d(x,y) is a function of two objects x and y such that d(x,y) ≥ 0, d(x,x) = 0, and d(x,y) = d(y,x)
• A metric also has to satisfy the triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)
• Standard example: the Euclidean l2 metric d(x,y) = ||x - y||
• Assume that for N objects we compute an N×N dissimilarity matrix D which tells us how far apart they are (a small sketch follows below)
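As a small illustration of building the dissimilarity matrix D under the Euclidean l2 metric (an assumed helper, not from the slides):

```python
import numpy as np

def dissimilarity_matrix(X):
    """Pairwise Euclidean (l2) dissimilarities between the N rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    return np.sqrt(np.maximum(D2, 0.0))              # clip tiny negative values from round-off
```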
Multidimensional Scaling
• Given the dissimilarity matrix D between the original X points under the original metric d(), find Y points whose dissimilarity matrix D' under another metric d'() is similar to D
• Want to find Y's that minimize some measure of the difference between D' and D
• E.g. least-squares stress: Σ_ij (D_ij - D'_ij)²
• E.g. invariant stress: the stress normalized to be invariant to the scale of the embedding
• E.g. Sammon mapping: Σ_ij (D_ij - D'_ij)² / D_ij, which emphasizes small distances
• E.g. strain: the classical MDS objective
• Some of these criteria are global and some are local; minimize by gradient descent (a sketch follows below)
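A minimal gradient-descent sketch for the least-squares stress (an illustration with assumed step size, iteration count, and initialization, not the lecture's own code):

```python
import numpy as np

def mds_stress(D, dim=2, steps=500, lr=0.01, seed=0):
    """Minimize least-squares stress sum_ij (D_ij - ||y_i - y_j||)^2 by gradient descent."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    Y = rng.normal(scale=1e-2, size=(N, dim))             # small random initialization
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]               # (N, N, dim) pairwise differences
        Dp = np.linalg.norm(diff, axis=2)                   # current embedded distances D'
        np.fill_diagonal(Dp, 1.0)                           # avoid divide-by-zero on the diagonal
        G = (Dp - D) / Dp                                   # per-pair error scaled by 1/D'_ij
        np.fill_diagonal(G, 0.0)
        grad = 4.0 * np.sum(G[:, :, None] * diff, axis=1)   # d stress / d y_i
        Y -= lr * grad
    return Y
```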
MDS Example: 3D to 2D
• Have distances from cities to cities; these lie on the surface of a sphere (the Earth) in 3D space
• The reconstructed 2D points on the plane capture the essential properties (what happens near the poles?)
MDS Example: Multi-D to 2D
• A more elaborate example: we have a correlation matrix between crimes; the objects are of arbitrary dimensionality
• Hack: convert the correlations to dissimilarities and show the reconstructed Y points (one possible conversion is sketched below)
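The slides leave the correlation-to-dissimilarity conversion unspecified; one common choice (an assumption here, not necessarily the one used in the lecture) is d_ij = sqrt(2 (1 - r_ij)), which behaves like a Euclidean distance between standardized variables:

```python
import numpy as np

def corr_to_dissimilarity(R):
    """One common correlation-to-dissimilarity conversion (an assumed choice, not the slides')."""
    return np.sqrt(np.maximum(2.0 * (1.0 - R), 0.0))     # clip round-off negatives
```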
Locally Linear Embedding
• Instead of distances, look at the neighborhood of each point; preserve the reconstruction of each point from its neighbors in the low-dimensional space
• Find the K nearest neighbors of each point
• Describe the neighborhood as the best weights on the neighbors for reconstructing the point: minimize ε(W) = Σ_i || x_i - Σ_j W_ij x_j ||² subject to Σ_j W_ij = 1
• Find the best low-dimensional vectors Y that still have the same weights: minimize Φ(Y) = Σ_i || y_i - Σ_j W_ij y_j ||²
• Why? Locally the manifold is approximately linear, so weights that reconstruct a point from its neighbors in the original space should also reconstruct it in the low-dimensional space
Locally Linear Embedding
• Finding the W's (a convex combination of weights on the neighbors):
1) Take the derivative of the reconstruction error with respect to the weights and set it to 0
2) Solve the resulting linear system over the local Gram matrix of the neighbors
3) Find the Lagrange multiplier λ that enforces the sum-to-one constraint
4) Find the weights w
• A minimal sketch of this step follows below
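A sketch of the weight step (an illustration with assumed names and a small regularizer added for numerical safety, not the lecture's own code):

```python
import numpy as np

def lle_weights(X, K, reg=1e-3):
    """LLE weight step sketch: for each point, solve C w = 1 over its neighbors and normalize."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:K + 1]                 # K nearest neighbors (skip the point itself)
        Z = X[nbrs] - X[i]                            # center the neighbors on the point
        C = Z @ Z.T                                   # local K x K Gram matrix
        C += reg * np.trace(C) * np.eye(K)            # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(K))            # derivative set to 0 -> linear system
        W[i, nbrs] = w / w.sum()                      # enforce the sum-to-one constraint
    return W
```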
Locally Linear Embedding
• Finding the Y's (new low-dimensional points that agree with the W's): minimize Φ(Y) = Σ_i || y_i - Σ_j W_ij y_j ||², a quadratic form in M = (I - W)^T (I - W)
• Solve for Y using the bottom d+1 eigenvectors of M; the very bottom, constant eigenvector is discarded (a sketch of this eigenvector step follows below)
• Plot the Y values
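A matching sketch of the embedding step (again illustrative, assuming the output of the `lle_weights` sketch above):

```python
import numpy as np

def lle_embed(W, d):
    """LLE embedding step sketch: bottom eigenvectors of M = (I - W)^T (I - W)."""
    N = W.shape[0]
    I_W = np.eye(N) - W
    M = I_W.T @ I_W
    evals, vecs = np.linalg.eigh(M)        # ascending eigenvalues
    # Skip the constant eigenvector (eigenvalue ~ 0); the next d give the coordinates Y.
    return vecs[:, 1:d + 1]
```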
LLE Examples
• The original X data are raw images
• The dots are the reconstructed two-dimensional Y points
LLEs
• Top = PCA
• Bottom = LLE
Generative Topographic Map
• A principled alternative to the Kohonen map
• Forms a generative model of the manifold; we can sample from it, etc.
• Find a nonlinear mapping y() from a 2D grid of Gaussians
• Pick the parameters W of the mapping such that the mapped Gaussians in data space maximize the likelihood of the observed data
• We have two spaces: the data space t (the X's in the old notation) and the hidden latent space x (the Y's in the old notation)
• The mapping goes from latent space to observed space
GTM as a Grid of Gaussians
• We choose priors and conditionals for all variables of interest
• Assume Gaussian noise on the y() mapping: p(t | x, W, β) = N(t; y(x; W), β^-1 I)
• Assume the prior over latent variables is a grid model, equally spaced in latent space: p(x) = (1/K) Σ_k δ(x - x_k)
• Can now write out the full likelihood: p(t | W, β) = ∫ p(t | x, W, β) p(x) dx
GTM Distribution Model
• Integrating over the delta functions turns the integral into a summation: p(t | W, β) = (1/K) Σ_k N(t; y(x_k; W), β^-1 I)
• Note the log of a sum in the likelihood; we need to apply EM to maximize it
• Also, use the following parametric (linear in the basis) form of the mapping: y(x; W) = W φ(x) for fixed basis functions φ
• Examples of manifolds for randomly chosen W mappings
• Typically, we are given the data and want to find the maximum-likelihood mapping W for it… (a sketch of the likelihood follows below)
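A minimal sketch of evaluating this log-likelihood for a fixed W and β (illustrative only; the grid, the basis function `phi`, and the array shapes are assumptions, and the EM fit itself is not shown):

```python
import numpy as np

def gtm_log_likelihood(T, Xgrid, W, beta, phi):
    """Log-likelihood of data T under a GTM with fixed parameters (illustrative sketch).

    T: (N, D) observed data; Xgrid: (K, 2) latent grid points; W: (M, D) mapping weights;
    beta: noise precision; phi: maps (K, 2) latent points to (K, M) basis activations."""
    Y = phi(Xgrid) @ W                                         # mapped centers y(x_k; W), shape (K, D)
    K, D = Y.shape[0], T.shape[1]
    d2 = ((T[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)    # (N, K) squared distances
    # log of (1/K) * N(t_n; y_k, beta^-1 I) for every data point / grid component pair
    log_comp = 0.5 * D * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * d2 - np.log(K)
    m = log_comp.max(axis=1, keepdims=True)                    # log-sum-exp for numerical stability
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))
```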
GTM Examples
• Recover the non-linear manifold by warping the grid with the W parameters
• Synthetic example: left = initialized, right = converged
• Real example: oil data, 3 classes; left = GTM, right = PCA