Topology in Manifold Learning Jonathan Huang Presented at misc-read, 11.22.06
Bibliography • *Simultaneous Inference of View and Body Pose Using Torus Manifolds*, Chan-Su Lee and Ahmed Elgammal, The 18th International Conference on Pattern Recognition (ICPR), Hong Kong, August 21-24, 2006. • *Finding the Homology of Submanifolds with High Confidence from Random Samples*, P. Niyogi, S. Smale, and S. Weinberger, to appear, Discrete and Computational Geometry, 2006. • *On the Local Behavior of Spaces of Natural Images*, G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, preprint, May 31, 2006. • *Computing Persistent Homology*, A. Zomorodian and G. Carlsson, Discrete and Computational Geometry, 33 (2), pp. 247-274, 2005.
Outline • The Role of Topology in Manifold Learning • Constrained Topology Manifold Learning • Topology Basics • Learning a Topology from Noisy Data • Statistical Approaches • Multi-scale Approaches
The ISOMAP Algorithm • Compute pairwise distances for some point cloud • Choose an embedding dimension, d (d=2 in this example) • Run the following code in MATLAB: >> options.dims = 2; >> [Y, R, E] = Isomap(DistanceMatrix, options); [Figure: ISOMAP]
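The first step above, building the pairwise distance matrix that is passed to `Isomap`, can be sketched as follows. This is a minimal illustration in Python rather than MATLAB; the point cloud and the names `pairwise_distances`, `cloud`, and `distance_matrix` are assumptions made for the example, not part of the original slides.

```python
import math

def pairwise_distances(points):
    """Return the symmetric n x n matrix of Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical point cloud: 8 evenly spaced points on the unit circle.
cloud = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
         for k in range(8)]
distance_matrix = pairwise_distances(cloud)
```

The resulting matrix plays the role of `DistanceMatrix` in the MATLAB call.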
Selecting the Dimensionality • Problem: How do we choose the embedding dimension? • Several solutions (maybe some non-NIPS solutions too): • Brand (NIPS 2003) • Kégl (NIPS 2003) • Levina and Bickel (NIPS 2005) • Raginsky and Lazebnik (NIPS 2006) [Figure: Dimensionality Estimation, d=2]
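Why none of these are actually solutions • Even when dimensionality estimation correctly returns d=2 for samples from a sphere, ISOMAP cannot produce a faithful 2-dimensional embedding: a sphere cannot be flattened into the plane without tearing or overlapping • [Figure: samples from a sphere; Dimensionality Estimation (d=2); ISOMAP]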
Moral • Manifold Learning is hard if you don’t take topology into account!
Outline • The Role of Topology in Manifold Learning • Constrained Topology Manifold Learning • Topology Basics • Learning a Topology from Noisy Data • Statistical Approaches • Multi-scale Approaches
Manifold Learning with Known Topology • Images of a periodic gait from varying viewpoints • What is the intrinsic topology of this set of images? [Figure: grid of gait images; axes: Body Pose, View angle (0°-330°)]
Learning a Mapping from a Torus • Body pose (gait phase) and view angle are each periodic, and the product of two periodic spaces (circles) is a torus! • Use kernel methods to learn a map to/from a torus
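To make the torus concrete, here is a minimal sketch of the standard parameterization of a torus in 3D by two angles. It only illustrates the coordinate space; it is not the kernel-based mapping learned by Lee and Elgammal. The function name `torus_embedding` and the radii `R` and `r` are assumptions chosen for the example.

```python
import math

def torus_embedding(view_angle, pose_phase, R=2.0, r=1.0):
    """Map two angles (in radians) to a point on a torus in R^3.

    R is the distance from the torus center to the tube center,
    r is the tube radius (both hypothetical defaults).
    """
    ring = R + r * math.cos(pose_phase)
    x = ring * math.cos(view_angle)
    y = ring * math.sin(view_angle)
    z = r * math.sin(pose_phase)
    return (x, y, z)

# Both coordinates wrap around: adding a full turn to either angle
# returns (up to rounding) the same point.
p = torus_embedding(0.3, 0.7)
q = torus_embedding(0.3 + 2 * math.pi, 0.7 + 2 * math.pi)
```

Because both inputs are periodic, a full gait cycle or a full turn of the camera ends where it started, which a flat 2D embedding cannot express.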
Outline • The Role of Topology in Manifold Learning • Constrained Topology Manifold Learning • Topology Basics • Learning a Topology from Noisy Data • Statistical Approaches • Multi-scale Approaches
What is Topology? • Popular answer: It’s the branch of math that can’t tell the difference between a coffee cup and a donut
What is Topology? • Topology cares about how a space is connected • It does not care about distances • We will define a very general class of topological spaces – the simplicial complexes
Simplex • An n-simplex is the convex hull of (n+1) affinely independent points [Figure: 0-simplex (point), 1-simplex (edge), 2-simplex (triangle)]
Simplicial Complex • A simplicial complex is a finite collection of simplices S such that • Any face of a simplex in S is also in S • The intersection of two simplices in S is either empty or a face of both simplices • The dimension of S is the maximum dimension over all simplices in S
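The definition can be checked mechanically when simplices are stored as sets of vertex labels (an abstract simplicial complex). The sketch below assumes that representation; the helper names are hypothetical. In this combinatorial setting the intersection condition follows automatically once the collection is closed under taking faces, so only closure is tested.

```python
from itertools import combinations

def faces(simplex):
    """All nonempty proper faces of a simplex given as a set of vertices."""
    verts = sorted(simplex)
    return {frozenset(c)
            for k in range(1, len(verts))
            for c in combinations(verts, k)}

def is_simplicial_complex(simplices):
    """True if every face of every simplex is also in the collection."""
    s = {frozenset(x) for x in simplices}
    return all(f in s for x in s for f in faces(x))

def dimension(simplices):
    """Maximum dimension over all simplices (a k-simplex has k+1 vertices)."""
    return max(len(x) for x in simplices) - 1

# A hollow triangle: three vertices and three edges, no filled face.
hollow_triangle = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
# Not a complex: the filled triangle is listed without its edges and vertices.
broken = [{"a", "b", "c"}]
```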
Boundary Maps • Graphs are one-dimensional simplicial complexes • Define a boundary matrix M1 with rows corresponding to vertices and columns corresponding to edges • M1(v,e) = 0 if vertex v is not part of edge e • M1(v,e) = -1 if vertex v is the first vertex of edge e • M1(v,e) = 1 if vertex v is the second vertex of edge e [Figure: graph with vertices a, b, c and edges ab, ac, bc]
Boundary Maps • Example: • The boundary of an edge is its two vertices: applying M1 to the edge ab gives b - a
Boundary Maps • Example: • The boundary of the loop is empty: applying M1 to ab + bc - ac gives (b - a) + (c - b) - (c - a) = 0
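The two examples can be written out in matrix form for the triangle graph with vertices a, b, c and edges ab, ac, bc. The ordering of rows (a, b, c) and columns (ab, ac, bc), and the orientation of each edge from its first to its second letter, are assumptions made for this worked example.

```latex
M_1 =
\begin{pmatrix}
-1 & -1 &  0 \\
 1 &  0 & -1 \\
 0 &  1 &  1
\end{pmatrix},
\qquad
M_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}
\quad (\partial\, ab = b - a),
\qquad
M_1 \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\quad (\partial\,(ab - ac + bc) = 0).
```

The second vector lies in the nullspace of M_1, which is exactly what it means for the loop to have empty boundary.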
Boundary Maps • In general, dim(Nullspace(M1)) is the number of independent loops in the graph • And (#vertices) - Rank(M1) is the number of connected components
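Both counts can be computed directly from the boundary matrix. The sketch below builds M1 with the sign convention from the slides and computes its rank by Gaussian elimination over the rationals; the function names and the example graph are assumptions made for illustration.

```python
from fractions import Fraction

def boundary_matrix(vertices, edges):
    """Rows are vertices, columns are edges given as (first, second)."""
    index = {v: i for i, v in enumerate(vertices)}
    M = [[0] * len(edges) for _ in vertices]
    for j, (u, v) in enumerate(edges):
        M[index[u]][j] = -1  # first vertex of the edge
        M[index[v]][j] = 1   # second vertex of the edge
    return M

def rank(matrix):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def loops_and_components(vertices, edges):
    """Return (dim Nullspace(M1), #vertices - Rank(M1))."""
    rk = rank(boundary_matrix(vertices, edges))
    return len(edges) - rk, len(vertices) - rk

# Hypothetical graph: a triangle a-b-c plus an isolated vertex d.
result = loops_and_components(
    ["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("b", "c")]
)
```

The nullspace dimension is obtained as (#edges) - Rank(M1) by the rank-nullity theorem.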
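Betti Numbers • For a simplicial complex S, we can define a boundary matrix M_k at each dimension k of S, with rows corresponding to (k-1)-simplices and columns corresponding to k-simplices (M_0 is the zero map) • Define the kth Betti number to be • β_k = Dim(Nullspace(M_k)) - Dim(Column space(M_{k+1})) • For a graph there are no 2-simplices, so β_1 = Dim(Nullspace(M_1)) and β_0 = (#vertices) - Rank(M_1)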
Betti Numbers • For an object in 3d space • β_0 is the number of connected components • β_1 is the number of tunnels or handles • β_2 is the number of ‘voids’ [Figure: Point, Circle, Torus]
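As a sanity check on these interpretations, the sketch below computes Betti numbers of small triangulated versions of a point, a circle, and a torus. It assumes the convention that M_k maps k-simplices to their (k-1)-dimensional faces, so that β_k = (#k-simplices) - Rank(M_k) - Rank(M_{k+1}). To keep the code short it works modulo 2 rather than with signed matrices, which is a substitution: the two agree for these examples but can differ for spaces with torsion. All function names are hypothetical, and the torus uses the standard 7-vertex triangulation.

```python
from itertools import combinations

def closure(top_simplices):
    """All nonempty faces of the given simplices, grouped by dimension."""
    by_dim = {}
    for s in top_simplices:
        verts = sorted(s)
        for k in range(1, len(verts) + 1):
            for face in combinations(verts, k):
                by_dim.setdefault(k - 1, set()).add(face)
    return {d: sorted(fs) for d, fs in by_dim.items()}

def rank_gf2(columns):
    """Rank mod 2; each column is an int bitmask of its nonzero rows."""
    pivots = {}  # highest set bit -> reduced column with that pivot
    for col in columns:
        while col:
            high = col.bit_length() - 1
            if high not in pivots:
                pivots[high] = col
                break
            col ^= pivots[high]
    return len(pivots)

def betti_numbers(top_simplices):
    cx = closure(top_simplices)
    top = max(cx)
    ranks = {0: 0, top + 1: 0}  # M_0 and M_{top+1} are zero maps
    for k in range(1, top + 1):
        row = {f: i for i, f in enumerate(cx[k - 1])}
        cols = []
        for s in cx[k]:
            mask = 0
            for f in combinations(s, k):  # the k+1 facets of a k-simplex
                mask |= 1 << row[f]
            cols.append(mask)
        ranks[k] = rank_gf2(cols)
    return [len(cx[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]

point = [(0,)]
circle = [(0, 1), (1, 2), (0, 2)]  # hollow triangle
torus = ([(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)]
         + [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)])

print(betti_numbers(point), betti_numbers(circle), betti_numbers(torus))
# prints [1] [1, 1] [1, 2, 1]
```

The torus has one component, two independent tunnels, and one enclosed void, matching the list above.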
Outline • The Role of Topology in Manifold Learning • Constrained Topology Manifold Learning • Topology Basics • Learning a Topology from Noisy Data • Statistical Approaches • Multi-scale Approaches
Problem Formulation • Given i.i.d. samples x1, x2, …, xn from a manifold, what are the Betti numbers of the manifold?
A “Well Known” Algorithm • Two Steps: • First put an ε-ball around each point • Compute the Betti numbers of the union of these balls using the Nerve Complex
The Nerve Complex • Given a collection of balls, U1,U2,… in Euclidean space, what is the topology of their union? • Construct the Nerve Complex: • For each ball, add a vertex • Add a k-simplex whenever k+1 balls have a nonempty common intersection
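For equal-radius balls the construction can be sketched up to triangles using a simple geometric fact: a set of closed balls of radius eps has a common point exactly when the smallest ball enclosing their centers has radius at most eps. For two centers that means distance at most 2*eps; for three centers the enclosing radius follows from the side lengths of the triangle they form. The function names and the restriction to the 2-skeleton are assumptions made to keep the example short.

```python
import math
from itertools import combinations

def miniball_radius_3(p, q, r):
    """Radius of the smallest ball enclosing three points (any dimension)."""
    a, b, c = sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))
    if c * c >= a * a + b * b:
        # Right, obtuse, or degenerate triangle: the longest side is a diameter.
        return c / 2
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    return a * b * c / (4 * area)  # circumradius of an acute triangle

def nerve_2_skeleton(points, eps):
    """Vertices, edges, and triangles of the nerve of eps-balls around points."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= 2 * eps]
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if miniball_radius_3(points[i], points[j], points[k]) <= eps]
    return vertices, edges, triangles

# Three points forming an equilateral triangle with side 1
# (circumradius about 0.577).
pts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
small = nerve_2_skeleton(pts, 0.55)  # balls meet pairwise but not all three
large = nerve_2_skeleton(pts, 0.60)  # all three balls share a point
```

With eps = 0.55 the nerve is a hollow triangle, so the union of balls has a loop; with eps = 0.60 the triangle is filled and the loop disappears. This sensitivity to the radius is the question raised in the slides about how to choose it.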
Nerve Complex • The Nerve Lemma states that the union of the balls and the Nerve Complex have the same Betti numbers! • Example [Figure: three sets A, B, C, their pairwise unions, and the corresponding Nerve Complex]
A PAC-bound • With high probability and enough samples, we can recover the true Betti numbers! • The number of samples depends on the volume of the manifold and its condition number (how close it gets to itself) • (Niyogi, Smale, Weinberger, 2006) • (but how does one choose ε?)
Outline • The Role of Topology in Manifold Learning • Constrained Topology Manifold Learning • Topology Basics • Learning a Topology from Noisy Data • Statistical Approaches • Multi-scale Approaches
Another Example Spiral… or Torus???
Topological Persistence • Topological Persistence looks at topology at all scales at once • Surprisingly, it’s not much harder to compute the Betti numbers at every scale!
Filtered Complex • A Filtered Complex is an increasing sequence of simplicial complexes (C_t ⊆ C_{t+1}) • t is called the filtration index [Figure: complexes at t=0, t=1, t=2, t=3]
Barcodes • Barcodes represent Betti numbers as a function of filtration index • Intuitively, Barcodes measure the “lifetime” of topological features • The Persistence Algorithm provides a way to compute barcodes efficiently
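A compact way to see how barcodes arise is the textbook column-reduction form of the persistence computation with coefficients mod 2. The sketch below is an illustration under those assumptions, not necessarily the exact procedure of Zomorodian and Carlsson; the function name `barcodes`, the input format, and the example filtration are all hypothetical. Each simplex either creates a feature (its reduced boundary column becomes empty) or destroys the youngest feature still present in its boundary.

```python
from itertools import combinations

def barcodes(filtration):
    """Compute (dimension, birth, death) bars; death is None if never killed.

    filtration: list of (t, simplex) pairs, where t is the filtration index
    at which the simplex appears and every face appears no later than it.
    """
    # Order by filtration index, then by dimension, so faces come first.
    simplices = sorted(((t, tuple(sorted(s))) for t, s in filtration),
                       key=lambda ts: (ts[0], len(ts[1])))
    index = {s: i for i, (_, s) in enumerate(simplices)}
    reduced = {}  # pivot (largest row index) -> reduced boundary column
    paired = {}   # index of creating simplex -> index of destroying simplex
    for j, (_, s) in enumerate(simplices):
        col = set()
        if len(s) > 1:
            col = {index[f] for f in combinations(s, len(s) - 1)}
        while col and max(col) in reduced:
            col = col ^ reduced[max(col)]  # add columns mod 2
        if col:
            reduced[max(col)] = col
            paired[max(col)] = j
    destroyers = set(paired.values())
    bars = []
    for i, (t, s) in enumerate(simplices):
        if i in paired:
            death = simplices[paired[i]][0]
            if death > t:  # skip features that die as soon as they appear
                bars.append((len(s) - 1, t, death))
        elif i not in destroyers:
            bars.append((len(s) - 1, t, None))
    return bars

# Hypothetical filtration: three vertices at t=0, two edges at t=1,
# the closing edge at t=2, and the filled triangle at t=3.
example = [(0, (0,)), (0, (1,)), (0, (2,)),
           (1, (0, 1)), (1, (1, 2)),
           (2, (0, 2)),
           (3, (0, 1, 2))]
bars = barcodes(example)
# bars == [(0, 0, None), (0, 0, 1), (0, 0, 1), (1, 2, 3)]
```

In this example two of the three components merge away at t=1, one component persists forever, and the loop born at t=2 is filled in at t=3. Short bars correspond to features that exist only over a narrow range of scales.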
Persistence Algorithm • Empirically, the algorithm runs in linear time • Worst-case complexity bound: O(m^3), where m is the number of simplices • Pros: • Persistence distinguishes local features from global features • Applies to learning manifold topology from noisy data • Cons: • No real probabilistic semantics
Conclusion • Learning a topology is not hopeless. So… • The next time you decide to learn a manifold, take a moment to contemplate the underlying topology!