
Evaluation of Distance Metrics for Recognition Based on  Non-Negative Matrix Factorization


Presentation Transcript


  1. Evaluation of Distance Metrics for Recognition Based on  Non-Negative Matrix Factorization David Guillamet, Jordi Vitrià Pattern Recognition Letters 24:1599-1605, June, 2003 John Galeotti Advanced Perception March 23, 2004

  2. Actually, Two ICPR’02 Papers Analyzing Non-Negative Matrix Factorization for Image Classification David Guillamet, Bernt Schiele, Jordi Vitrià Determining a Suitable Metric When using Non-negative Matrix Factorization David Guillamet, Jordi Vitrià

  3. Non-Negative Matrix Factorization • TLA: NMF • Used for dimensionality reduction • V (n×m) ≈ W (n×r) H (r×m), with r < nm/(n+m) • V has non-negative training samples as its columns • W contains the non-negative basis vectors • H contains the non-negative coefficients to approximate each column of V using W • Results similar in concept to PCA, but with non-negative “basis vectors”
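
Below is a minimal sketch of the factorization and its shapes, using scikit-learn's NMF as a stand-in (an assumption; the authors' own implementation is not shown in the slides). Note that r < nm/(n+m) ensures that W and H together hold fewer numbers than V, so the factorization genuinely compresses the data.

```python
# Minimal sketch (not the authors' code), assuming scikit-learn is available.
# The slides store training samples as COLUMNS of V; sklearn expects rows,
# hence the transposes below. All sizes are illustrative.
import numpy as np
from sklearn.decomposition import NMF

n, m, r = 512, 1000, 20                 # feature dim, #samples, #basis vectors
V = np.random.rand(n, m)                # non-negative data, one sample per column

model = NMF(n_components=r, init='random', max_iter=500, random_state=0)
H = model.fit_transform(V.T).T          # r x m non-negative coefficients
W = model.components_.T                 # n x r non-negative basis vectors

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```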

  4. NMF Distinguishing Properties • Requires non-negative data • Computationally expensive • Part-based decomposition • Because only additive combinations of original data are allowed • Not an orthonormal basis

  5. Different Decomposition Types • [Figure: PCA vs. NMF bases for numeric digits, shown at 20 and at 50 dimensions]

  6. Why not just use PCA? • PCA is optimal for reconstruction • PCA is not optimal for separation and recognition of classes

  7. NMF Issues Addressed • If/when is NMF better at dimensionality reduction than PCA for classification? • Can combining PCA and NMF lead to better performance? • What is the best distance metric to use with the non-orthonormal basis of NMF?

  8. How NMF Works • V (n×m) ≈ W (n×r) H (r×m), with r < nm/(n+m) • Begin with an n×m matrix of training data V • Each column is a vectorized data point • Randomly initialize W and H with positive values • Iterate according to the multiplicative update rules (a sketch follows below):
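
The slide's update rules were shown as an image; below is a minimal numpy sketch of the standard Lee & Seung multiplicative updates for the divergence objective, which is what such slides usually show (treat the exact form as an assumption). Each update multiplies the current estimate by a non-negative factor, so W and H stay non-negative throughout.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-9):
    """Lee & Seung-style multiplicative updates for V (n×m) ≈ W (n×r) H (r×m)."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, r)) + eps
    W /= W.sum(axis=0, keepdims=True)      # normalize each basis vector to sum to 1
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        R = V / (W @ H + eps)              # element-wise ratio V_iu / (WH)_iu
        H *= W.T @ R                       # H_au <- H_au * sum_i W_ia * R_iu
        R = V / (W @ H + eps)
        W *= R @ H.T                       # W_ia <- W_ia * sum_u R_iu * H_au
        W /= W.sum(axis=0, keepdims=True)  # re-normalize the bases
    return W, H
```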

  9. How NMF Works • In general, NMF requires the non-linear optimization of an objective function • The update rules just given correspond to a popular objective function, and are guaranteed to converge. • That objective function relates to the probability of generating the images in V from the bases W and encodings H:
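
For reference, the widely cited Lee & Seung objective this most likely refers to is, up to an additive constant, the log-likelihood of V under a Poisson model with mean WH:

$$F(W,H) \;=\; \sum_{i=1}^{n}\sum_{\mu=1}^{m}\Big[\,V_{i\mu}\,\log (WH)_{i\mu} \;-\; (WH)_{i\mu}\,\Big]$$

The multiplicative updates sketched above are guaranteed not to decrease F, which is where the convergence guarantee comes from.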

  10. NMF vs. PCA Experiments • Dataset: 10 classes of natural textures • Clouds, grass, ice, trees, sand, sky, etc. • 932 color images total • Each image tessellated into 10x10 patches • 1000 patches for training, 1000 for testing • Each patch classified as a single texture • Raw feature vectors: Color histograms • Each region histogrammed into 8 bins per color, 16 colors → 512-dimensional vectors
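
One construction consistent with a 512-dimensional vector is a joint histogram with 8 bins per channel of a 3-channel color patch (8³ = 512); the paper's exact color space and binning may differ, so the sketch below is illustrative only.

```python
import numpy as np

def color_histogram(patch_rgb, bins_per_channel=8):
    """512-dim joint color histogram of a small RGB patch with values in 0..255.
    (Illustrative assumption: the paper's exact binning is not reproduced here.)"""
    pixels = patch_rgb.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins_per_channel,) * 3,
                             range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)     # normalize to sum to 1
```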

  11. NMF vs. PCA Experiments • Learn both NMF and PCA subspaces for each class of histogram • For both NMF and PCA: • Project queries onto the learned subspaces of each class • Label each query by the subspace that best reconstructs the query • This seems like a poor scheme for NMF • (Other experiments allow better schemes)
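
A minimal sketch of this reconstruction-error scheme follows (it also covers the per-class NMF/PCA choice of slide 13). Because the NMF basis is non-negative and not orthonormal, projecting a query onto it is a non-negative least-squares problem; scipy's nnls is used here as an assumption, not as the authors' solver.

```python
import numpy as np
from scipy.optimize import nnls

def nmf_error(W, v):
    """Reconstruction error of query v in the NMF subspace spanned by the
    columns of W (non-negative least squares, since W is not orthonormal)."""
    h, _ = nnls(W, v)
    return np.linalg.norm(v - W @ h)

def pca_error(U, mean, v):
    """Reconstruction error of v in a PCA subspace (U has orthonormal columns)."""
    d = v - mean
    return np.linalg.norm(d - U @ (U.T @ d))

def classify(v, class_models):
    """class_models maps label -> ('nmf', W) or ('pca', (U, mean)); the label
    whose subspace best reconstructs v wins."""
    errs = {}
    for label, (kind, params) in class_models.items():
        errs[label] = nmf_error(params, v) if kind == 'nmf' else pca_error(*params, v)
    return min(errs, key=errs.get)
```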

  12. NMF vs. PCA Results • NMF works best for dispersed classes • PCA works best for compact classes • Both seem useful…try combining them • But why are fewer than half of the sky vectors best reconstructed by PCA, when for sky PCA’s mean reconstruction error is less than 1/4 that of NMF? Mistakes?

  13. NMF+PCA Experiments • During training, we learned whether NMF or PCA worked best for each class • Project a query to a class using only the method that works best for that class • Result: 2.3% improvement in the recognition rate over NMF alone (PCA: 5.8%), but is this significant at 60%?

  14. Hierarchy Experiments • At level k of the hierarchy, project the query onto each original class’ NMF or PCA subspace • But, to choose the direction to descend the hierarchy, we only care about the level k super-class containing the matching class • Furthermore, for each class the choice of PCA vs. NMF can be independently set at each level of the hierarchy

  15. Hierarchy Results • 2% improvement in recognition rate • I really suspect that this is insignificant, and results only from the additional degrees of freedom • They employ various additional neighborhood-based hacks to increase their accuracy further, but I don’t see any relevance to NMF specifically

  16. Need for a better metric • Want to classify based on nearest neighbor, rather than reprojection error • Unfortunately, NMF generates a non-orthonormal basis, so the relative distance to a base depends on the uniqueness of that base • Bases will share a lot of pixels in common areas

  17. Earth Mover’s Distance (EMD) • Defined as the minimal amount of “work” that must be performed to transform one feature distribution into the other • A special case of the “transportation problem” from linear optimization • Let I = set of suppliers, J = set of consumers, c_ij = cost to ship from supplier i to consumer j, f_ij = amount shipped from i to j • Distance = cost to make datasets equal

  18. Earth Mover’s Distance (EMD) • Based on finding a measure of correlation between bases to define its cost matrix • The cost matrix weights the transition of one basis (b_i) to another (b_j) • c_ij = distangle(b_i, b_j) = −( b_i • b_j )/( ||b_i|| ||b_j|| )

  19. EMD: Transportation Problem • f_ij = quantity shipped from supplier i to consumer j • Consumers don’t ship (flows f_ij ≥ 0 run only from suppliers to consumers) • Don’t exceed demand • Don’t exceed supply • Total demand must equal total supply for EMD to be a metric
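
Below is a minimal sketch of the transportation LP behind EMD, assuming both encodings are normalized to the same total mass so that the supply/demand constraints become equalities; the distangle cost of slide 18 is included as a helper. This is not Rubner’s implementation (linked on slide 22), just an illustration built on scipy.optimize.linprog. Note that with the distangle cost the resulting values lie in [−1, 0]; as with a distance, smaller values indicate a closer match.

```python
import numpy as np
from scipy.optimize import linprog

def distangle(bi, bj):
    """Cost between two NMF bases: negative cosine similarity (slide 18)."""
    return -np.dot(bi, bj) / (np.linalg.norm(bi) * np.linalg.norm(bj))

def emd(h1, h2, C):
    """Earth Mover's Distance between encodings h1 and h2 (equal total mass),
    given an r x r ground-cost matrix C. Flow variables f_ij are flattened
    row-major; row sums must match h1 and column sums must match h2."""
    r = len(h1)
    A_eq, b_eq = [], np.concatenate([h1, h2])
    for i in range(r):                       # sum_j f_ij = h1[i]
        row = np.zeros((r, r)); row[i, :] = 1
        A_eq.append(row.ravel())
    for j in range(r):                       # sum_i f_ij = h2[j]
        col = np.zeros((r, r)); col[:, j] = 1
        A_eq.append(col.ravel())
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method='highs')
    return res.fun / h1.sum()                # normalize by the total flow
```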

  20. EMD vs. “Other” Experiments • Digit recognition from MNIST digit database • 60,000 training images + 10,000 for test • Classify by NN and 5NN in the subspace • Result: EMD works best in low-dimensional subspaces, but does not work well in high-dimensional subspaces • More specifically, EMD works well when the bases contain some intersecting pixels

  21. Occlusion Experiments • Randomly occlude either 1 or 2 of the 4 quadrants of an image (25% and 50% occlusion) • Why does distangle do so well?
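
A hedged sketch of the occlusion setup as described on the slide (the fill value and the random choice of quadrants are assumptions):

```python
import numpy as np

def occlude_quadrants(img, k=1, fill=0, seed=None):
    """Occlude k of the 4 quadrants of an image (k=1 -> 25%, k=2 -> 50%)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape[:2]
    quads = [(slice(0, h // 2), slice(0, w // 2)),
             (slice(0, h // 2), slice(w // 2, w)),
             (slice(h // 2, h), slice(0, w // 2)),
             (slice(h // 2, h), slice(w // 2, w))]
    for q in rng.choice(4, size=k, replace=False):
        out[quads[q]] = fill
    return out
```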

  22. Demo • NMF difficulties • EMD experiments instead • Demonstrate using existing code within the desired framework of a cost matrix • Their code: http://robotics.stanford.edu/~rubner/emd/default.htm • My code: http://www.vialab.org/john/Pres9-code/

  23. Conclusion • NMF is a parts-based alternative to PCA • NMF and PCA should be combined for minimum-reprojection-error classification • For nearest-neighbor classification, NMF needs a better metric • When the subspace dimensionality is chosen appropriately for good bases, NMF+EMD or NMF+distangle have the highest recognition rates
