Random Forest for Metric Learning with Pairwise Position Dependence
Caiming Xiong, David Johnson, Ran Xu, Jason J. Corso
Department of Computer Science and Engineering, SUNY at Buffalo
{cxiong, davidjoh, rxu2, jcorso}@buffalo.edu
Presented at ACM SIGKDD 2012 in Beijing, China
Distance Function
• Distance functions are widely used in machine learning and data mining:
• Classification: the Gaussian kernel in SVMs.
• Clustering: spectral clustering, K-means.
• Is a predefined metric, such as the Euclidean distance, reliable?
[Figure: three points a, b, and c, with b and c equidistant from a.]
Euc(a,b) = Euc(a,c), but this is not what we want! Distance metric learning is needed.
Outline • Introduction • Our Methods • Experiments • Conclusion
Distance Metric Learning
• Learn a single Mahalanobis metric; representative methods:
• RCA, DCA, LMNN, ITML, PSDBoost.
Problem: a uniform Mahalanobis distance for ALL instances.
• Learn multiple metrics; representative methods:
• FSM, FSSM (Frome et al. 2006, 2007): high time and space complexity.
• ISD (Zhan et al. 2009): high time and space complexity.
• Bregman distance (Wu et al. 2009): high complexity at test time.
Our Work
Can we obtain a single distance function that achieves both the efficiency of the global methods and the specificity of the multi-metric methods?
Yes: we learn an implicitly pairwise-position-dependent metric via random forests.
Outline • Introduction • Our Methods • Experiments • Conclusion
Distance Metric Learning Revisited
• Given the instance set X = {x_1, ..., x_N} and two pairwise constraint sets: the must-link set S = {(x_i, x_j)} and the cannot-link set D = {(x_i, x_j)}.
• For the Mahalanobis distance function, the distance function space is the space of PSD matrices.
• Redefine the distance as d(x_i, x_j) = F(φ(x_i, x_j)), where F is a classification model and φ is a feature mapping function for the pair.
• The problem becomes a classification problem with a function space constraint: must-link pairs {S(A,B)} form one class, cannot-link pairs {D(A,B)} the other.
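A minimal sketch of this reformulation in Python (the helper name build_pair_dataset is illustrative, not from the paper): each constrained pair becomes one labeled training example for the binary classifier F.

```python
import numpy as np

def build_pair_dataset(X, S, D, phi):
    """Turn pairwise constraints into a binary classification dataset.

    X   : (N, d) array of instances
    S   : must-link index pairs (i, j)  -> label 0 (similar)
    D   : cannot-link index pairs (i, j) -> label 1 (dissimilar)
    phi : feature mapping function for a pair of points
    """
    feats = [phi(X[i], X[j]) for (i, j) in list(S) + list(D)]
    labels = [0] * len(S) + [1] * len(D)
    return np.array(feats), np.array(labels)
```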
Example: Single Mahalanobis Metric Learning
Standard Mahalanobis-based methods learn a distance function of the form
d(x_i, x_j) = (x_i − x_j)^T W (x_i − x_j), with W positive semidefinite. (1)
The mapping function is defined as φ(x_i, x_j) = (x_i − x_j)(x_i − x_j)^T, so that d(x_i, x_j) = ⟨W, φ(x_i, x_j)⟩ is linear in W. Using the hinge loss, the metric learning problem is therefore transformed into a classification problem with a PSD matrix constraint.
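A quick numerical check of the identity above (a sketch, not from the paper): the quadratic form equals the Frobenius inner product of W with the outer product of the difference vector.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.random((d, d))
W = A @ A.T                                  # a PSD Mahalanobis matrix
xi, xj = rng.random(d), rng.random(d)

delta = xi - xj
quad = delta @ W @ delta                     # (x_i - x_j)^T W (x_i - x_j)
inner = np.sum(W * np.outer(delta, delta))   # <W, phi(x_i, x_j)>
assert np.isclose(quad, inner)
```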
Mapping Function for Position-Dependent Metric Learning
Now we define a simple and more general mapping function:
φ(x_i, x_j) = [ |x_i − x_j| ; (x_i + x_j)/2 ],
the concatenation of the relative location/difference of the pair and the mean of the two point vectors.
It allows our method to adapt to heterogeneous distributions of data. But we proved that this mapping function cannot be used in single Mahalanobis metric learning methods (see the "Why not Mahalanobis?" slide).
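As a sketch, the mapping function takes only a few lines (assuming, as in the reconstruction above, an elementwise absolute difference for the relative-position block):

```python
import numpy as np

def phi(xi, xj):
    """Position-dependent pair mapping: [relative difference; pair mean]."""
    return np.concatenate([np.abs(xi - xj), (xi + xj) / 2.0])
```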
Tree-Structured Metric with Implicit Position Dependence
• Instead of a Mahalanobis metric, we design a tree-structured metric: d(x_i, x_j) = f(φ(x_i, x_j)), where f is a tree-structured classifier. This tree metric can be learned with a decision tree.
Drawback: a single tree learned greedily is prone to overfitting.
Random Forest for Metric Learning
• To make the algorithm more general, we adopt the random trees technique to learn our tree metric:
d(x_i, x_j) = (1/T) Σ_{t=1}^{T} f_t(φ(x_i, x_j)),
where each tree f_t is learned using Breiman's random forest method. We name our distance the Random Forest Distance (RFD)!
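A minimal end-to-end sketch of this idea using scikit-learn (an illustrative reimplementation under the assumptions above, not the authors' released code; see their project page for the original):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def phi(xi, xj):
    """Pair mapping: [elementwise |difference|; pair mean]."""
    return np.concatenate([np.abs(xi - xj), (xi + xj) / 2.0])

def fit_rfd(X, S, D, n_trees=100, seed=0):
    """Fit a random forest on constrained pairs.
    S: must-link pairs -> label 0; D: cannot-link pairs -> label 1."""
    feats = np.array([phi(X[i], X[j]) for (i, j) in list(S) + list(D)])
    labels = np.array([0] * len(S) + [1] * len(D))
    forest = RandomForestClassifier(n_estimators=n_trees,
                                    n_jobs=-1,        # trees build in parallel
                                    random_state=seed)
    forest.fit(feats, labels)
    return forest

def rfd(forest, xi, xj):
    """RFD = average tree vote for 'dissimilar' (a pseudosemimetric)."""
    return forest.predict_proba(phi(xi, xj).reshape(1, -1))[0, 1]
```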
RFD Comments
• Pseudosemimetric
• Because of the nonlinearity of the tree structure and the new mapping function, our distance function does not satisfy the triangle inequality.
• Computational complexity
• Building the trees in parallel, training is O(N log N).
• The learning processes for nodes at the same depth can also be parallelized.
Outline • Introduction • Our Methods • Experiments • Conclusion
Datasets and Methods
• Datasets
• 10 UCI datasets.
• Corel image database.
• Methods
• Ours: RFD.
• Baselines: Euclidean; Mahalanobis (inverse of the covariance matrix).
• Single-metric learning methods: RCA, DCA, ITML, LMNN.
• Multi-metric learning methods: FSM, FSSM, ISD.
Classification Performance
• Comparison with global Mahalanobis metric learning methods.
[Results charts: one panel for homogeneous, low-dimensional datasets; one for heterogeneous, high-dimensional datasets.]
Classification Performance
• Comparison with position-specific multi-metric methods:
Outline • Introduction • Our Methods • Experiments • Conclusion
Conclusion
• Main contribution:
• Overcoming the limitation of a single global metric, we incorporate conventional relative position information as well as the absolute position of point pairs into the learned metric, and hence implicitly adapt the position-based metric throughout the feature space.
• Our code is publicly released (http://www.cse.buffalo.edu/~cxiong/).
Acknowledgements: This work was partially supported by the National Science Foundation CAREER grant (IIS-0845282), the Army Research Office (W911NF-11-1-0090), and DARPA CSSG (HR0011-09-1-0022 and D11AP00245).
Thank you!
Multiple Metric Learning
• [Frome et al. NIPS 06, ICCV 07]: estimate a distance metric per instance or exemplar.
• [Zhan et al. ICML 09]: propagate metrics learned on training exemplars to learn a metric matrix for each unlabeled point.
Drawback: high time and space complexity due to the need to learn and store O(N) d×d metric matrices.
• [Wu et al. NIPS 09]: learns a Bregman distance function for semi-supervised clustering, but it takes O(N) time to compute the distance between a single pair of points, which is impractical for large-scale datasets.
Retrieval Performance • We do not include LMNN for the retrieval problem because it requires a label for each training point, which we do not have.
Why not Mahalanobis?
Consider plugging the new mapping function into a Mahalanobis-style quadratic form, d(x_i, x_j) = φ(x_i, x_j)^T M φ(x_i, x_j). For any two identical points x_i = x_j = αu, where α is a scalar, we have φ(x_i, x_j) = [0; αu], so the distance grows as α². Thus, as α → ∞, d(x_i, x_j) → ∞ even though the two points coincide. This is a nonsensical result, and clearly undesirable in a metric.
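A small numerical illustration of this failure mode (a sketch under the reconstruction above; M is an arbitrary PSD matrix, not anything from the paper):

```python
import numpy as np

def phi(xi, xj):
    return np.concatenate([np.abs(xi - xj), (xi + xj) / 2.0])

rng = np.random.default_rng(0)
d = 3
R = rng.random((2 * d, 2 * d))
M = R @ R.T                      # any PSD matrix on the 2d-dim pair features

u = np.ones(d)
for alpha in [1.0, 10.0, 100.0]:
    x = alpha * u                # x_i = x_j = alpha * u: identical points
    f = phi(x, x)
    print(alpha, f @ M @ f)      # grows like alpha**2, yet the points coincide
```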