Manifold Learning Via Homology
Presenter: Ronen Talmon
Topological Methods in Electrical Engineering and Networks
January 19, 2011
Sources
- P. Niyogi, S. Smale, and S. Weinberger, "Finding the homology of submanifolds with high confidence from random samples," Discrete & Computational Geometry, 2008.
- P. Niyogi, S. Smale, and S. Weinberger, "A topological view of unsupervised learning from noisy data," to appear, 2011.
Contents
- Introduction & Preliminaries
- Main Results
- Practice
- Conclusion
Modern Data Analysis
- Data points in high-dimensional space
- Examples from image and audio processing, finance, neuroscience
- The data cannot fill up the high-dimensional space uniformly
- Usually, the space dimensionality is arbitrarily chosen by the user or the acquisition system
- The data lie on a low-dimensional structure, conveying its intrinsic degrees of freedom
Manifold Learning
Principal Component Analysis (PCA):
- Linearly project the data into a lower-dimensional subspace
- Along the directions of maximal variance
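A minimal numpy sketch of this projection (the function name and usage are illustrative, not from the slides):

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the k directions of maximal variance."""
    Xc = X - X.mean(axis=0)                    # center the data
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # coordinates in the top-k subspace
```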
Manifold Learning
Common manifold learning techniques:
- ISOMAP [Tenenbaum, '00], LLE [Roweis & Saul, '00], Laplacian Eigenmaps [Belkin & Niyogi, '01], Hessian Eigenmaps [Donoho & Grimes, '02]
- Define a pairwise affinity metric (kernel)
- Spectral characterization of the manifold via spectral graph theory
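A rough sketch of the graph-based pipeline these methods share; a full Laplacian Eigenmaps implementation would typically use a k-nearest-neighbor graph and a normalized Laplacian, and the kernel width `sigma` here is a user parameter:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def spectral_embedding(X, sigma, n_coords=2):
    """Gaussian-kernel affinities, graph Laplacian, lowest eigenvectors."""
    W = np.exp(-squareform(pdist(X, "sqeuclidean")) / (2 * sigma ** 2))
    L = np.diag(W.sum(axis=1)) - W             # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0 on a connected graph).
    return eigvecs[:, 1:1 + n_coords]
```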
Manifold Learning via Homology
Goal: identify the homology of the submanifold from random samples.
Natural topological invariants that provide a good characterization:
- The dimension of the 0th homology group is the # of connected components (clustering tasks)
- The largest non-trivial homology gives the dimension of the submanifold
While graph-based techniques characterize the samples (the graph), this method characterizes the manifold via the samples.
Condition Number
Definition (Normal Bundle) [Wikipedia]: the normal bundle of a submanifold M ⊂ R^N is the collection of pairs (p, v) with p ∈ M and v a vector perpendicular to the tangent space T_pM.
Condition Number
Theorem (Tubular Neighborhood): for a compact submanifold M ⊂ R^N and sufficiently small r > 0, the map (p, v) ↦ p + v on normal vectors of length < r is a diffeomorphism onto the open tube of radius r around M; in particular, every point of the tube has a unique nearest point on M.
Condition Number
Definition (Condition Number): the condition number of M is 1/τ, where τ is the largest r such that the tube of radius r around M has no self-intersections. For example, a sphere of radius R has τ = R; "flatter", less twisted manifolds have larger τ.
Computing the Homology
- Acquire n uniform i.i.d. samples x̄ = {x₁, …, xₙ} of the manifold M
- Construct the union of balls U = ∪ᵢ B_ε(xᵢ)
- Exploit the Čech complex to compute the homology of U
Computing the Homology
Definition (Čech Complex): given x̄ and ε > 0, the Čech complex contains a k-simplex for every k+1 points of x̄ whose ε-balls have a common intersection. By the Nerve Lemma, this complex is homotopy-equivalent to U, so it carries the same homology.
Computing the simplicial complex: for any set of points, determine whether the balls of radius ε around these points intersect. Given a set of points, find the ball with the smallest radius enclosing all these points; the ε-balls share a common point iff this smallest radius is at most ε.
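A sketch of this construction up to 2-simplices, using the closed-form minimum enclosing ball of two or three points; function names are illustrative, and real use would rely on a computational-topology library such as GUDHI or Dionysus:

```python
import numpy as np
from itertools import combinations

def meb_radius(pts):
    """Minimum-enclosing-ball radius of 1, 2, or 3 points in R^N."""
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return 0.5 * np.linalg.norm(pts[0] - pts[1])
    a, b, c = pts
    # Squared side lengths of the triangle abc.
    A2, B2, C2 = ((b - c) ** 2).sum(), ((a - c) ** 2).sum(), ((a - b) ** 2).sum()
    if 2 * max(A2, B2, C2) >= A2 + B2 + C2:
        # Right/obtuse (or degenerate) triangle: the ball on the longest
        # edge already covers the third point.
        return 0.5 * np.sqrt(max(A2, B2, C2))
    # Acute triangle: the circumradius, R = abc / (4 * area).
    u, v = b - a, c - a
    area = 0.5 * np.sqrt(u.dot(u) * v.dot(v) - u.dot(v) ** 2)
    return np.sqrt(A2 * B2 * C2) / (4.0 * area)

def cech_complex(X, eps, max_dim=2):
    """Simplices of the Cech complex at scale eps; X is an (n, N) array."""
    simplices = [(i,) for i in range(len(X))]
    for k in range(1, max_dim + 1):
        for idx in combinations(range(len(X)), k + 1):
            # The eps-balls around these centers share a common point iff
            # the minimum enclosing ball of the centers has radius <= eps.
            if meb_radius(X[list(idx)]) <= eps:
                simplices.append(idx)
    return simplices
```

For instance, `cech_complex(np.random.rand(20, 2), eps=0.15)` returns the vertices, edges, and triangles of the complex at that scale.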
The Homology
Theorem (main result) [NSW '08]: let x̄ = {x₁, …, xₙ} be drawn i.i.d. from the uniform measure on M, and let 0 < ε < τ/2. If n > β₁(log β₂ + log(1/δ)), where β₁ and β₂ are (up to constants) covering numbers of M at scale ε, then with probability greater than 1 − δ the union U = ∪ᵢ B_ε(xᵢ) deformation retracts onto M; in particular, the homology of U equals the homology of M.
Note: in practice, the manifold is unknown, and hence so is its condition number τ.
Dense Sampling
Definition: a finite set x̄ ⊂ M is ε-dense in M if every point p ∈ M lies within distance ε of some xᵢ ∈ x̄.
Retract
Definition (Retract): for A ⊆ X, a retraction is a continuous map r : X → A whose restriction to A is the identity; A is then called a retract of X.
Deformation Retract
Definition (Deformation Retract) [Wikipedia]: a deformation retraction of X onto A ⊆ X is a homotopy between a retraction r : X → A and the identity map on X.
- A deformation retract is a special case of homotopy equivalence
- It implies that X and A have the same homology groups
Deterministic Setting
Proposition [NSW '08]: if x̄ is ε/2-dense in M and ε < √(3/5)·τ, then U = ∪ᵢ B_ε(xᵢ) deformation retracts onto M.
Intuition: for "complex" manifolds (small τ):
- dense sampling is required
- U stays "tight" around M
The Deformation Retract
Proposition: for ε as above, U lies inside the tube of radius τ around M, so every point of U has a unique nearest point on M; the map π : U → M sending each point to its nearest point on M gives the deformation retraction.
Probability Bounds
Proposition: let A₁, …, A_l be a cover of M, each of probability measure at least α. If n > (1/α)(log l + log(1/δ)), then n i.i.d. samples hit every Aᵢ with probability at least 1 − δ.
Applying this to a cover of M by small balls shows how many uniform samples make x̄ ε/2-dense with high confidence, where the covering numbers depend on vol(M), ε, and the dimension of M.
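As a quick sanity check on how this coupon-collector-style bound scales, a tiny sketch with hypothetical numbers (the values of alpha, l, and delta are illustrative only):

```python
import numpy as np

def samples_needed(alpha, l, delta):
    """n > (1/alpha) * (ln l + ln(1/delta)) i.i.d. draws hit all l sets,
    each of measure >= alpha, with probability >= 1 - delta."""
    return int(np.ceil((np.log(l) + np.log(1 / delta)) / alpha))

# Hypothetical cover by 1000 balls, each of measure 1/2000, confidence 99%.
print(samples_needed(alpha=1 / 2000, l=1000, delta=0.01))  # ~23026
```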
Practical Extensions
In practice:
- The samples are not uniformly distributed on the manifold
- Noisy data: the samples concentrate around the manifold but do not lie exactly on it
Extensions:
- "Clean" noisy samples and simplify the construction of the complex
- The combinatorial Laplacian
Probability Model
Key question: whether the homology of M can be inferred from examples drawn according to a probability distribution P concentrated around M.
Investigated under the strong variance condition, which (roughly) requires the noise around M to concentrate at a scale small relative to the condition number τ.
Mixture of Gaussians
Consider the widely used probability distribution P = Σᵢ wᵢ N(μᵢ, σ²I), with Σᵢ wᵢ = 1.
Relating to the setting:
- The manifold M consists of the points {μ₁, …, μ_k}
- The probability distribution on the manifold puts weight wᵢ on μᵢ
- The probability distribution of the noise is a single Gaussian with mean 0 and variance σ²
Mixture of Gaussians
Given a collection of points sampled from a mixture of Gaussians, learning the homology of the underlying manifold yields:
- The # of connected components (0th Betti number), which equals the # of Gaussians (and the # of clusters)
- Through higher Betti numbers, whether the connected components retract to a point
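A minimal illustration of reading the 0th Betti number off noisy Gaussian samples; the means, noise level, and scale eps below are all hypothetical, chosen so the clusters are well separated:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Three well-separated Gaussians in R^10, 200 samples each.
means = rng.normal(size=(3, 10)) * 10.0
X = np.vstack([m + 0.5 * rng.normal(size=(200, 10)) for m in means])

# beta_0 of the union of eps-balls = # of connected components of the
# graph joining samples whose balls overlap (distance < 2 * eps).
eps = 3.0
adj = csr_matrix(squareform(pdist(X)) < 2 * eps)
n_components, labels = connected_components(adj, directed=False)
print("estimated # of clusters (0th Betti number):", n_components)  # 3
```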
The Algorithm
Cleaning procedure:
- The # of samples is chosen to guarantee that the manifold is well covered by balls around these points
- Random sampling tends to oversample certain regions on the way to coverage, so the "extra" sample points may be disregarded
- Disregard the noisier samples
- Choosing a minimal covering set from the data makes the associated simplicial complex simpler, and the boundary maps in the chain complex sparse
The Algorithm
(Algorithm box omitted.) In outline: discard samples whose surrounding ball contains too few other samples (de-noising), keep a minimal covering subset of the survivors, and build the nerve complex on the chosen points.
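A sketch of the cleaning step described on the previous slide; the function and parameter names (r, density_threshold) are illustrative, and the paper ties these parameters to the condition number and the noise variance:

```python
import numpy as np
from scipy.spatial.distance import cdist

def clean_and_cover(X, r, density_threshold):
    """Drop low-density samples, then greedily keep a minimal r-covering."""
    D = cdist(X, X)
    # 1) De-noising: keep points whose r-ball holds enough samples.
    counts = (D < r).sum(axis=1)
    survivors = np.where(counts >= density_threshold)[0]
    # 2) Greedy covering: repeatedly pick a survivor and discard everything
    #    within distance r of it, until all survivors are covered.
    remaining = set(survivors.tolist())
    centers = []
    while remaining:
        i = remaining.pop()
        centers.append(i)
        remaining -= set(np.where(D[i] < r)[0].tolist())
    return X[centers]
```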
The Algorithm - details
Choice of parameters (set in the paper as functions of the condition number τ and the noise variance σ²):
- The radius r of the balls
- The density threshold for discarding samples
- The nerve scale at which the complex is built
Main Theorem
Theorem [NSW '11]: for suitable choices of the radius, threshold, and nerve scale, and sufficiently many samples drawn from P, the complex output by the algorithm has the same homology as M with high probability.
Notes:
- The probability distribution P is supported on all of R^N
- However, it concentrates around a low-dimensional structure (for sufficiently small noise variance σ²)
Example
Mixture of Gaussians:
- The manifold M is the set of points {μ₁, …, μ_k}
- The manifold condition number is given by τ = ½ minᵢ≠ⱼ ||μᵢ − μⱼ||, half the minimal distance between the means
- The homology can be computed (e.g., for the task of clustering) when the variance of the Gaussians is small relative to the distance between their means
The Combinatorial Laplacian
Recall: a simplicial complex gives rise to a chain complex with boundary operators ∂_k : C_k → C_{k−1} satisfying ∂_k ∘ ∂_{k+1} = 0, and its homology groups are H_k = ker ∂_k / im ∂_{k+1}.
The Combinatorial Laplacian
Definition (Combinatorial Laplacian): Δ_k = ∂_{k+1} ∂_{k+1}* + ∂_k* ∂_k, acting on C_k.
Remark:
- Δ₀ corresponds to the standard graph Laplacian
- C₀ is the set of functions on the vertex set of the complex
- C₁ is the set of functions on the edges of the complex
Example 4 4 2 1 3 1 2 3 The boundary operator : 2 1 1 Manifold Learning via Homology
Example 4 4 2 1 3 1 2 3 The boundary operator : Manifold Learning via Homology
Example 4 2 1 3 The graph-Laplacian: Manifold Learning via Homology
The Combinatorial Laplacian
Claim [Friedman, 1998]: H_k ≅ ker Δ_k, so the k-th Betti number equals the dimension of the null space of Δ_k.
Remark:
- The dimensionality of the null space of the graph Laplacian gives the # of connected components of the graph
- Related to the # of connected components of the manifold
- In practice, the # of connected components is interpreted as the # of clusters (as in classical spectral clustering)
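Putting the definition and the claim together, a small numerical sketch; the 4-cycle is a hypothetical graph chosen to echo the 4-vertex example above:

```python
import numpy as np

def combinatorial_laplacian(d_k, d_k1):
    """Delta_k = d_{k+1} d_{k+1}^T + d_k^T d_k for real boundary matrices
    d_k : C_k -> C_{k-1} and d_{k+1} : C_{k+1} -> C_k."""
    return d_k1 @ d_k1.T + d_k.T @ d_k

def betti(laplacian, tol=1e-10):
    """Per the claim above, the k-th Betti number is dim ker(Delta_k)."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(laplacian)) < tol))

# A 4-cycle 1-2-3-4-1: d1 is the signed vertex-edge incidence matrix.
d1 = np.array([[-1,  0,  0,  1],
               [ 1, -1,  0,  0],
               [ 0,  1, -1,  0],
               [ 0,  0,  1, -1]], dtype=float)
d0 = np.zeros((0, 4))                        # C_{-1} = 0, so d0 is empty
L0 = combinatorial_laplacian(d0, d1)         # equals the graph Laplacian
print(betti(L0))                             # 1 connected component

d2 = np.zeros((4, 0))                        # no 2-simplices in the complex
L1 = combinatorial_laplacian(d1, d2)
print(betti(L1))                             # 1 independent loop
```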
Conclusion
- Computing the homology of a manifold from samples
  - Sufficient sampling density w.r.t. the manifold "complexity" (condition number)
  - The number of uniform random samples needed to obtain sufficient density
- A more practical scenario
  - Presence of noise
  - A simplified algorithm adjusted to both the "complexity" of the manifold and the noise variance
- The construction of the combinatorial Laplacian based on the homology, and the connection to common manifold learning techniques
Thank you