
A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction


Presentation Transcript


  1. A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction Fuxin Li1, Jian Yang2 and Jue Wang1 1 Institute of Automation, Chinese Academy of Sciences 2 Beijing University of Technology

  2. Metric Learning: What does it do? • Given an a priori metric, metric learning tries to adapt it to the information coming from the training items. • For example, items from the same class can be pushed together, while items from different classes can be pulled apart.

  3. What’s good? • Many pattern recognition and machine learning methods depend on a good choice of metric, which may not always be available. • As a compromise, the Euclidean metric is often used as the default one. Questionable! • Metric learning can be used to make things better.

  4. Endless Learning Cycle • Moreover, the learned metric can be used again as the a priori metric the next time.

  5. How to learn? • The basic idea in many metric learning algorithms is to balance the information from the training data against the a priori metric. • Meanwhile, many methods seek to learn a Mahalanobis metric in the original space or in the feature space induced by kernels. • (Xing, Ng, Jordan and Russell, 2003) • (Kwok and Tsang, 2003) • (Weinberger, Blitzer and Saul, 2006)

  6. Wait a minute… • We can get a Mahalanobis metric by linearly transforming the input data. • Basically, when we say we learn a Mahalanobis metric, we are looking for “good” directions in the input space and projecting onto them. • This reminds us of another field of research: dimensionality reduction.
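
A minimal numerical sketch of this equivalence (NumPy; the transform L below is an arbitrary illustrative choice): a Mahalanobis distance with M = LᵀL equals the Euclidean distance after projecting the data with L.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

# An arbitrary linear transform L (illustrative only); M = L^T L is the Mahalanobis matrix.
L = rng.normal(size=(2, 3))          # project onto two "good" directions
M = L.T @ L

d_mahalanobis = np.sqrt((x - y) @ M @ (x - y))
d_projected = np.linalg.norm(L @ x - L @ y)

print(np.isclose(d_mahalanobis, d_projected))  # True: the two distances coincide
```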

  7. Dimensionality Reduction • Linear or nonlinear. • Tries to find a low-dimensional representation of the original data. • Nonlinear dimensionality reduction has been quite popular in recent years. • Mainly unsupervised.

  8. And Metric Learning? • Dimensionality reduction learns metrics. • Metric learning = a supervised or semi-supervised version of dimensionality reduction? • Under certain assumptions and formulations, yes. • A combined view of these two topics can give something interesting.

  9. A Metric Learning Formulation • c_ijk is positive if d_ij should be larger than d_ik, where x_j and x_k are neighbors of x_i. • Intuitively, minimizing the first part of the criterion moves labeled items with the same label together and items with different labels apart. • The second part ensures that the a priori structure is preserved.

  10. Graph Transduction • A possible way to specify the second term is to link the items as a graph and use the Laplacian as the penalty P. • Intuitively, when dragging an item, the items linked to it will follow.
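
One common way to build such a graph penalty, sketched here under the assumption of a symmetrized k-nearest-neighbour graph (the slide does not specify the construction): the Laplacian P = D − W penalizes embeddings that pull linked items apart.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized k-NN graph (illustrative choice)."""
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)                  # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))              # degree matrix
    return D - W                            # Laplacian P = D - W

# For a one-dimensional embedding f, f^T P f = 1/2 * sum_ij W_ij (f_i - f_j)^2,
# so minimizing it keeps linked items close: "dragging an item drags its neighbours".
X = np.random.default_rng(0).normal(size=(30, 4))
P = knn_laplacian(X)
f = np.random.default_rng(1).normal(size=30)
print(f @ P @ f >= 0)   # the Laplacian is positive semi-definite
```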

  11. The Euclidean Assumption • Further assume that the metric is Euclidean. • This means there exists a Euclidean space in which we can place all the input items as points, and our metric is just the Euclidean metric of that space. • This assumption was first used in metric learning by (Zhang, 2003), but they did not make a connection with spectral methods.

  12. And Kernels • Being Euclidean implies that there exists an inner product in the space. • The distance is completely determined by the inner product, so it suffices to learn the inner product only. • For a finite sample, this means that it suffices to learn the Gram matrix. Thus the metric learning problem in this case is the same as the kernel learning problem.
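
The relation behind this claim, as a small sketch: squared Euclidean distances follow from the Gram matrix G = XXᵀ via d_ij² = G_ii + G_jj − 2·G_ij, so learning G fixes the metric.

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(5, 3))   # 5 items in a 3-d Euclidean space
G = X @ X.T                                        # Gram matrix of inner products

# Squared distances from the Gram matrix: d_ij^2 = G_ii + G_jj - 2 G_ij
g = np.diag(G)
D2_from_G = g[:, None] + g[None, :] - 2 * G

# Direct squared Euclidean distances for comparison
diff = X[:, None, :] - X[None, :, :]
D2_direct = (diff ** 2).sum(axis=-1)

print(np.allclose(D2_from_G, D2_direct))   # True: the metric is fixed by the inner products
```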

  13. Learning a Kernel • The following semi-supervised optimization problem learns a kernel matrix G of order n+m, with G = XX^T; the rows of X give the coordinates of each training item in a low-dimensional Euclidean space. • C_i and P are penalty matrices from the training data and the a priori structure, respectively. With suitable choices of these matrices, this problem is the same as the previous one.
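
A sketch of how coordinates can be read off from a learned Gram matrix, assuming G is positive semi-definite (a classical MDS / kernel PCA style factorization; the paper's own optimization is not reproduced here): keep the top eigenvectors, scaled by the square roots of their eigenvalues.

```python
import numpy as np

def coordinates_from_gram(G, dim=2):
    """Recover low-dimensional coordinates X with G ~ X X^T (MDS-style factorization)."""
    eigvals, eigvecs = np.linalg.eigh(G)            # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:dim]           # top `dim` eigenvalues
    vals = np.clip(eigvals[idx], 0.0, None)         # guard against tiny negative eigenvalues
    return eigvecs[:, idx] * np.sqrt(vals)          # rows are the embedded items

# Example: a Gram matrix built from known points is factorized back into coordinates.
Z = np.random.default_rng(0).normal(size=(10, 2))
G = Z @ Z.T
X = coordinates_from_gram(G, dim=2)
print(np.allclose(X @ X.T, G))                      # the factorization reproduces G
```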

  14. Dimensionality Reduction + Supervision = Metric Learning • Spectral dimensionality reduction (Kernel PCA / Kernel MDS / Laplacian Eigenmaps / other spectral methods; Bengio et al., 2004), combined with the (semi-)supervision penalties C_i and P, gives metric learning under the Euclidean assumption. • With the particular loss function used here, the problem remains an eigenvalue problem in the semi-supervised case. • With a sparse graph Laplacian penalty, the algorithm has a time complexity of O((n+m)^2).
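
A generic sketch of the computational pattern behind this claim (not the paper's specific matrices): spectral methods only need a few eigenpairs of a large sparse symmetric matrix, which Lanczos-type solvers such as SciPy's eigsh extract without ever forming the full spectrum. The chain-graph Laplacian below is a stand-in for a sparse penalty matrix.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# A sparse Laplacian of a chain graph on n nodes, standing in for a sparse penalty matrix.
n = 1000
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], offsets=[-1, 1], format='csr')
L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

# A handful of the smallest eigenpairs via shift-invert Lanczos iteration.
vals, vecs = eigsh(L.tocsc(), k=4, sigma=-0.01, which='LM')
print(vals)   # eigenvalues near zero: the smoothest eigenvectors of the graph
```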

  15. More to give: RKHS regularization • In the one-dimensional case, it is proved to be equivalent to an RKHS regularization problem. • Interesting because, unlike common regularization problems, it penalizes pairwise differences: the loss is a function of (f(x_i), f(x_j)) instead of (f(x_i), y_i). • It also gives a natural out-of-sample extension.
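
A sketch of the contrast, with V, w_ij and the exact functional treated as assumed placeholders rather than the paper's precise formulation:

```latex
% Common RKHS regularization: the loss couples each prediction f(x_i) with its label y_i
\min_{f \in \mathcal{H}} \; \sum_{i} V\bigl(f(x_i), y_i\bigr) \;+\; \lambda\, \lVert f \rVert_{\mathcal{H}}^{2}

% Pairwise form suggested by the slide (sketch; w_{ij} and V are assumed placeholders)
\min_{f \in \mathcal{H}} \; \sum_{i,j} w_{ij}\, V\bigl(f(x_i), f(x_j)\bigr) \;+\; \lambda\, \lVert f \rVert_{\mathcal{H}}^{2}
```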

  16. Moving y to the weights • The information in y is moved into the weights. • Setting w_ij = t(d(x_i, x_j), d(y_i, y_j)) makes it possible to solve problems whose outputs y lie in any metric space. • Example: multi-class classification (sketched below).
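
A sketch of what such a weight rule could look like for multi-class labels, assuming the discrete metric d(y_i, y_j) = [y_i ≠ y_j] on the outputs and a simple illustrative choice of t (neither is the paper's exact definition):

```python
import numpy as np

def pairwise_weights(D_x, y, sigma=1.0):
    """w_ij = t(d(x_i, x_j), d(y_i, y_j)) with an illustrative t:
    input-space similarity, flipped in sign when the labels disagree."""
    same_label = (y[:, None] == y[None, :]).astype(float)    # discrete metric on outputs
    affinity = np.exp(-D_x ** 2 / (2 * sigma ** 2))          # input-space similarity
    return affinity * (2 * same_label - 1)                   # positive if same class, negative otherwise

# Multi-class example with three labels
X = np.random.default_rng(0).normal(size=(6, 2))
y = np.array([0, 0, 1, 1, 2, 2])
D_x = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = pairwise_weights(D_x, y)
print(W.shape, (W > 0).sum(), (W < 0).sum())
```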

  17. The parameter λ • λ controls the strength of our prior belief. • Intuitively, a “learned” person should have strong prior beliefs that are not easily changed by what he sees, while a newcomer might have weaker beliefs that change rapidly with observations. • However, in semi-supervised settings it is difficult to decide on a good value of λ. In our experiments it is chosen by grid search.
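
A minimal sketch of the grid-search step (the candidate grid and the validation score `evaluate` are placeholders; the paper does not specify them):

```python
import numpy as np

def grid_search_lambda(evaluate, grid=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Pick the lambda with the best validation score; `evaluate` is a user-supplied callable."""
    scores = {lam: evaluate(lam) for lam in grid}
    best = max(scores, key=scores.get)
    return best, scores

# Toy usage: an artificial score that peaks at lambda = 0.1 (purely illustrative)
best_lam, scores = grid_search_lambda(lambda lam: -abs(np.log10(lam) + 1))
print(best_lam)   # 0.1
```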

  18. Experiments: Two Moons

  19. Experiments: UCI Data

  20. Experiments: MNIST • In the MNIST experiments we used Tangent Distance (Simard et al., 1998) to see whether better a priori knowledge leads to better results. • The results show that Tangent Distance is much better than Euclidean Distance, while our algorithm yields only marginal improvements over Laplacian Eigenmaps in this case.

  21. Conclusion • The work is preliminary, but a framework for distance metric learning is useful. • Under the Euclidean assumption, distance metric learning can be carried out by adding label information to spectral dimensionality reduction methods. • It also gives a regularization problem that is different from others and quite interesting.

  22. Ongoing Work • We are currently experimenting with loss functions other than the current one. • Particularly interesting candidates include the hinge loss and the exponential loss. • However, optimizing these loss functions requires semidefinite programming or general convex optimization, whose scalability is not as good as that of the current algorithm. We are working on fast solvers for these optimization problems.

  23. Beyond Euclidean • What happens when the Euclidean assumption is not satisfied? • A particularly interesting scenario is when the Euclidean assumption only holds locally. • Then the dataset is locally homeomorphic to a subset of a Euclidean space, which means it lies on a topological manifold. • Unlike the “locally linear” techniques in current manifold learning, it is possible to design “locally Euclidean” techniques grounded in the current framework.

  24. Thanks!
