Manifold Learning Via Homology
Presenter: Ronen Talmon
Topological Methods in Electrical Engineering and Networks
January 19, 2011
Sources
- P. Niyogi, S. Smale, and S. Weinberger, "Finding the homology of submanifolds with high confidence from random samples," Discrete & Computational Geometry, 2008.
- P. Niyogi, S. Smale, and S. Weinberger, "A topological view of unsupervised learning from noisy data," to appear, 2011.
Contents
- Introduction & Preliminaries
- Main Results
- Practice
- Conclusion
Modern Data Analysis
- Data points in high-dimensional space
- Examples from image and audio processing, finance, neuroscience
- The data cannot fill up the high-dimensional space uniformly
- Usually, the space dimensionality is arbitrarily chosen by the user or the acquisition system
- The data lie on a low-dimensional structure, conveying its intrinsic degrees of freedom
Manifold Learning
Principal Component Analysis (PCA):
- Linearly project the data into a lower-dimensional subspace
- Along the directions of maximal variance
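A minimal numpy sketch of this projection (the function name and usage are illustrative, not from the slides):

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the k directions of maximal variance."""
    Xc = X - X.mean(axis=0)                    # center the data
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # coordinates in the top-k subspace
```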
Manifold Learning
Common manifold learning techniques:
- ISOMAP [Tenenbaum, '00], LLE [Roweis & Saul, '00], Laplacian Eigenmaps [Belkin & Niyogi, '01], Hessian Eigenmaps [Donoho & Grimes, '02]
- Define a pairwise affinity metric (kernel)
- Spectral characterization of the manifold via spectral graph theory
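A rough sketch of the graph-based pipeline these methods share; a full Laplacian Eigenmaps implementation would typically use a k-nearest-neighbor graph and a normalized Laplacian, and the kernel width `sigma` here is a user parameter:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def spectral_embedding(X, sigma, n_coords=2):
    """Gaussian-kernel affinities, graph Laplacian, lowest eigenvectors."""
    W = np.exp(-squareform(pdist(X, "sqeuclidean")) / (2 * sigma ** 2))
    L = np.diag(W.sum(axis=1)) - W             # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0 on a connected graph).
    return eigvecs[:, 1:1 + n_coords]
```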
Manifold Learning via Homology
Goal: identify the homology of the submanifold from random samples.
Natural topological invariants that provide a good characterization:
- The dimension of the 0th homology group is the # of connected components (clustering tasks)
- The largest non-trivial homology gives the dimension of the submanifold
While graph-based techniques characterize the samples (the graph), this method characterizes the manifold via the samples.
Condition Number
Definition (Normal Bundle) [Wikipedia]: the normal bundle of a submanifold M ⊂ R^N is the collection of pairs (p, v) with p ∈ M and v a vector perpendicular to the tangent space T_pM.
Condition Number
Theorem (Tubular Neighborhood): for a compact submanifold M ⊂ R^N and sufficiently small r > 0, the map (p, v) ↦ p + v on normal vectors of length < r is a diffeomorphism onto the open tube of radius r around M; in particular, every point of the tube has a unique nearest point on M.
Condition Number
Definition (Condition Number): the condition number of M is 1/τ, where τ is the largest r such that the tube of radius r around M has no self-intersections. For example, a sphere of radius R has τ = R; "flatter", less twisted manifolds have larger τ.
Computing the Homology
- Acquire n uniform i.i.d. samples x̄ = {x₁, …, xₙ} of the manifold M
- Construct the union of balls U = ∪ᵢ B_ε(xᵢ)
- Exploit the Čech complex to compute the homology of U
Computing the Homology
Definition (Čech Complex): given x̄ and ε > 0, the Čech complex contains a k-simplex for every k+1 points of x̄ whose ε-balls have a common intersection. By the Nerve Lemma, this complex is homotopy-equivalent to U, so it carries the same homology.
Computing the simplicial complex: for any set of points, determine whether the balls of radius ε around these points intersect. Given a set of points, find the ball with the smallest radius enclosing all these points; the ε-balls share a common point iff this smallest radius is at most ε.
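A sketch of this construction up to 2-simplices, using the closed-form minimum enclosing ball of two or three points; function names are illustrative, and real use would rely on a computational-topology library such as GUDHI or Dionysus:

```python
import numpy as np
from itertools import combinations

def meb_radius(pts):
    """Minimum-enclosing-ball radius of 1, 2, or 3 points in R^N."""
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return 0.5 * np.linalg.norm(pts[0] - pts[1])
    a, b, c = pts
    # Squared side lengths of the triangle abc.
    A2, B2, C2 = ((b - c) ** 2).sum(), ((a - c) ** 2).sum(), ((a - b) ** 2).sum()
    if 2 * max(A2, B2, C2) >= A2 + B2 + C2:
        # Right/obtuse (or degenerate) triangle: the ball on the longest
        # edge already covers the third point.
        return 0.5 * np.sqrt(max(A2, B2, C2))
    # Acute triangle: the circumradius, R = abc / (4 * area).
    u, v = b - a, c - a
    area = 0.5 * np.sqrt(u.dot(u) * v.dot(v) - u.dot(v) ** 2)
    return np.sqrt(A2 * B2 * C2) / (4.0 * area)

def cech_complex(X, eps, max_dim=2):
    """Simplices of the Cech complex at scale eps; X is an (n, N) array."""
    simplices = [(i,) for i in range(len(X))]
    for k in range(1, max_dim + 1):
        for idx in combinations(range(len(X)), k + 1):
            # The eps-balls around these centers share a common point iff
            # the minimum enclosing ball of the centers has radius <= eps.
            if meb_radius(X[list(idx)]) <= eps:
                simplices.append(idx)
    return simplices
```

For instance, `cech_complex(np.random.rand(20, 2), eps=0.15)` returns the vertices, edges, and triangles of the complex at that scale.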
The Homology
Theorem (main result) [NSW '08]: let x̄ = {x₁, …, xₙ} be drawn i.i.d. from the uniform measure on M, and let 0 < ε < τ/2. If n > β₁(log β₂ + log(1/δ)), where β₁ and β₂ are (up to constants) covering numbers of M at scale ε, then with probability greater than 1 − δ the union U = ∪ᵢ B_ε(xᵢ) deformation retracts onto M; in particular, the homology of U equals the homology of M.
Note: in practice, the manifold is unknown, and hence so is its condition number τ.
Dense Sampling
Definition: a finite set x̄ ⊂ M is ε-dense in M if every point p ∈ M lies within distance ε of some xᵢ ∈ x̄.
Retract
Definition (Retract): for A ⊆ X, a retraction is a continuous map r : X → A whose restriction to A is the identity; A is then called a retract of X.
Deformation Retract
Definition (Deformation Retract) [Wikipedia]: a deformation retraction of X onto A ⊆ X is a homotopy between a retraction r : X → A and the identity map on X.
- A deformation retract is a special case of homotopy equivalence
- It implies that X and A have the same homology groups
Deterministic Setting
Proposition [NSW '08]: if x̄ is ε/2-dense in M and ε < √(3/5)·τ, then U = ∪ᵢ B_ε(xᵢ) deformation retracts onto M.
Intuition: for "complex" manifolds (small τ):
- dense sampling is required
- U stays "tight" around M
The Deformation Retract
Proposition: for ε as above, U lies inside the tube of radius τ around M, so every point of U has a unique nearest point on M; the map π : U → M sending each point to its nearest point on M gives the deformation retraction.
Probability Bounds
Proposition: let A₁, …, A_l be a cover of M, each of probability measure at least α. If n > (1/α)(log l + log(1/δ)), then n i.i.d. samples hit every Aᵢ with probability at least 1 − δ.
Applying this to a cover of M by small balls shows how many uniform samples make x̄ ε/2-dense with high confidence, where the covering numbers depend on vol(M), ε, and the dimension of M.
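As a quick sanity check on how this coupon-collector-style bound scales, a tiny sketch with hypothetical numbers (the values of alpha, l, and delta are illustrative only):

```python
import numpy as np

def samples_needed(alpha, l, delta):
    """n > (1/alpha) * (ln l + ln(1/delta)) i.i.d. draws hit all l sets,
    each of measure >= alpha, with probability >= 1 - delta."""
    return int(np.ceil((np.log(l) + np.log(1 / delta)) / alpha))

# Hypothetical cover by 1000 balls, each of measure 1/2000, confidence 99%.
print(samples_needed(alpha=1 / 2000, l=1000, delta=0.01))  # ~23026
```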
Practical Extensions
In practice:
- The samples are not uniformly distributed on the manifold
- Noisy data: the samples concentrate around the manifold but do not lie exactly on it
Extensions:
- "Clean" noisy samples and simplify the construction of the complex
- The combinatorial Laplacian
Probability Model
Key question: whether the homology of M can be inferred from examples drawn according to a probability distribution P concentrated around M.
Investigated under the strong variance condition, which (roughly) requires the noise around M to concentrate at a scale small relative to the condition number τ.
Mixture of Gaussians
Consider the widely used probability distribution P = Σᵢ wᵢ N(μᵢ, σ²I), with Σᵢ wᵢ = 1.
Relating to the setting:
- The manifold M consists of the points {μ₁, …, μ_k}
- The probability distribution on the manifold puts weight wᵢ on μᵢ
- The probability distribution of the noise is a single Gaussian with mean 0 and variance σ²
Mixture of Gaussians
Given a collection of points sampled from a mixture of Gaussians, learning the homology of the underlying manifold yields:
- The # of connected components (0th Betti number), which equals the # of Gaussians (and the # of clusters)
- Through higher Betti numbers, whether the connected components retract to a point
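A minimal illustration of reading the 0th Betti number off noisy Gaussian samples; the means, noise level, and scale eps below are all hypothetical, chosen so the clusters are well separated:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Three well-separated Gaussians in R^10, 200 samples each.
means = rng.normal(size=(3, 10)) * 10.0
X = np.vstack([m + 0.5 * rng.normal(size=(200, 10)) for m in means])

# beta_0 of the union of eps-balls = # of connected components of the
# graph joining samples whose balls overlap (distance < 2 * eps).
eps = 3.0
adj = csr_matrix(squareform(pdist(X)) < 2 * eps)
n_components, labels = connected_components(adj, directed=False)
print("estimated # of clusters (0th Betti number):", n_components)  # 3
```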
The Algorithm
Cleaning procedure:
- The # of samples is chosen to guarantee that the manifold is well covered by balls around these points
- Random sampling tends to oversample certain regions on the way to coverage, so the "extra" sample points may be disregarded
- Disregard the noisier samples
- Choosing a minimal covering set from the data makes the associated simplicial complex simpler, and the boundary maps in the chain complex sparse
The Algorithm
(Algorithm box omitted.) In outline: discard samples whose surrounding ball contains too few other samples (de-noising), keep a minimal covering subset of the survivors, and build the nerve complex on the chosen points.
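A sketch of the cleaning step described on the previous slide; the function and parameter names (r, density_threshold) are illustrative, and the paper ties these parameters to the condition number and the noise variance:

```python
import numpy as np
from scipy.spatial.distance import cdist

def clean_and_cover(X, r, density_threshold):
    """Drop low-density samples, then greedily keep a minimal r-covering."""
    D = cdist(X, X)
    # 1) De-noising: keep points whose r-ball holds enough samples.
    counts = (D < r).sum(axis=1)
    survivors = np.where(counts >= density_threshold)[0]
    # 2) Greedy covering: repeatedly pick a survivor and discard everything
    #    within distance r of it, until all survivors are covered.
    remaining = set(survivors.tolist())
    centers = []
    while remaining:
        i = remaining.pop()
        centers.append(i)
        remaining -= set(np.where(D[i] < r)[0].tolist())
    return X[centers]
```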
The Algorithm - details
Choice of parameters (set in the paper as functions of the condition number τ and the noise variance σ²):
- The radius r of the balls
- The density threshold for discarding samples
- The nerve scale at which the complex is built
Main Theorem
Theorem [NSW '11]: for suitable choices of the radius, threshold, and nerve scale, and sufficiently many samples drawn from P, the complex output by the algorithm has the same homology as M with high probability.
Notes:
- The probability distribution P is supported on all of R^N
- However, it concentrates around a low-dimensional structure (for sufficiently small noise variance σ²)
Example
Mixture of Gaussians:
- The manifold M is the set of points {μ₁, …, μ_k}
- The manifold condition number is given by τ = ½ minᵢ≠ⱼ ||μᵢ − μⱼ||, half the minimal distance between the means
- The homology can be computed (e.g., for the task of clustering) when the variance of the Gaussians is small relative to the distance between their means
The Combinatorial Laplacian
Recall: a simplicial complex gives rise to a chain complex with boundary operators ∂_k : C_k → C_{k−1} satisfying ∂_k ∘ ∂_{k+1} = 0, and its homology groups are H_k = ker ∂_k / im ∂_{k+1}.
The Combinatorial Laplacian
Definition (Combinatorial Laplacian): Δ_k = ∂_{k+1} ∂_{k+1}* + ∂_k* ∂_k, acting on C_k.
Remark:
- Δ₀ corresponds to the standard graph Laplacian
- C₀ is the set of functions on the vertex set of the complex
- C₁ is the set of functions on the edges of the complex
Example 4 4 2 1 3 1 2 3 The boundary operator : 2 1 1 Manifold Learning via Homology
Example 4 4 2 1 3 1 2 3 The boundary operator : Manifold Learning via Homology
Example 4 2 1 3 The graph-Laplacian: Manifold Learning via Homology
The Combinatorial Laplacian
Claim [Friedman, 1998]: H_k ≅ ker Δ_k, so the k-th Betti number equals the dimension of the null space of Δ_k.
Remark:
- The dimensionality of the null space of the graph Laplacian gives the # of connected components of the graph
- Related to the # of connected components of the manifold
- In practice, the # of connected components is interpreted as the # of clusters (as in classical spectral clustering)
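Putting the definition and the claim together, a small numerical sketch; the 4-cycle is a hypothetical graph chosen to echo the 4-vertex example above:

```python
import numpy as np

def combinatorial_laplacian(d_k, d_k1):
    """Delta_k = d_{k+1} d_{k+1}^T + d_k^T d_k for real boundary matrices
    d_k : C_k -> C_{k-1} and d_{k+1} : C_{k+1} -> C_k."""
    return d_k1 @ d_k1.T + d_k.T @ d_k

def betti(laplacian, tol=1e-10):
    """Per the claim above, the k-th Betti number is dim ker(Delta_k)."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(laplacian)) < tol))

# A 4-cycle 1-2-3-4-1: d1 is the signed vertex-edge incidence matrix.
d1 = np.array([[-1,  0,  0,  1],
               [ 1, -1,  0,  0],
               [ 0,  1, -1,  0],
               [ 0,  0,  1, -1]], dtype=float)
d0 = np.zeros((0, 4))                        # C_{-1} = 0, so d0 is empty
L0 = combinatorial_laplacian(d0, d1)         # equals the graph Laplacian
print(betti(L0))                             # 1 connected component

d2 = np.zeros((4, 0))                        # no 2-simplices in the complex
L1 = combinatorial_laplacian(d1, d2)
print(betti(L1))                             # 1 independent loop
```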
Conclusion
- Computing the homology of a manifold from samples
  - Sufficient sampling density w.r.t. the manifold "complexity" (condition number)
  - The number of uniform random samples needed to obtain sufficient density
- A more practical scenario
  - Presence of noise
  - A simplified algorithm adjusted to both the "complexity" of the manifold and the noise variance
- The construction of the combinatorial Laplacian based on the homology, and the connection to common manifold learning techniques
Thank you