
Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications



Presentation Transcript


  1. Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications. Article by Benjamin D. Haeffele and René Vidal (2017). CMAP Machine Learning Journal Club. Speaker: Imke Mayer. CMAP, December 13th 2018

  2. Outline • Structured matrix factorization • Context and definition • Special case 1: sparse dictionary learning (SDL) • Special case 2: subspace clustering (SC) • Global optimality for structured matrix factorization • Main theorem • Polar problem • Application: SDL global optimality • Extension to tensor factorization and deep learning

  3. Structured Matrix Factorization: Context • (Large) high-dimensional datasets (images, videos, user ratings, etc.) • difficult to handle (computational issues, memory complexity) • but the relevant information often lies in a low-dimensional structure • Goal: recover this underlying low-dimensional structure of the given (large-scale) data X. [Figure: example applications from [12], motion segmentation and face clustering.] [12] VIDAL, R., MA, Y., AND SASTRY, S. S. Generalized Principal Component Analysis, vol. 5. Springer, 2016.

  4. Structured Matrix Factorization: Context • Large high-dimensional datasets (images, videos, user ratings, etc.) • difficult to handle (computational issues, memory complexity) • but the relevant information often lies in a general low-dimensional structure • Goal: recover this underlying low-dimensional structure of the given (large-scale) data X • Model assumption: linear subspace model. The data can be approximated by one or more low-dimensional subspace(s), X ≈ UV^T, where U is a basis of the linear low-dimensional structure and V is the low-dimensional data representation.

  5. Structured Matrix Factorization: Context • Issue: without any assumptions there are infinitely many choices of U and V such that X ≈ UV^T. • Solution: constrain the factors to satisfy certain properties via the regularized objective (1), in which a loss measures the quality of the approximation and a regularizer imposes restrictions on the factors (see the sketch below). • Non-convex • Structured factors allow more modeling flexibility • Explicit representation
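
A sketch of formulation (1), which did not survive the transcript, following the notation of [6]: ℓ is a convex loss and Θ a regularizer built from a rank-1 function θ that is positively homogeneous of degree 2.

\[
\min_{r \in \mathbb{N}} \;\; \min_{U \in \mathbb{R}^{D \times r},\, V \in \mathbb{R}^{N \times r}}
\ell\big(X,\, U V^{T}\big) \;+\; \lambda\, \Theta(U, V),
\qquad
\Theta(U, V) \;=\; \sum_{i=1}^{r} \theta(U_i, V_i).
\tag{1}
\]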

  6. Structured Matrix Factorization: Special Case 1: Sparse Dictionary Learning • Given a set of signals, find a set of dictionary atoms and sparse codes to approximate the signals (3). [9] • Applications: denoising, inpainting, classification. • The signals X are approximated by sparse linear combinations of dictionary atoms: the columns of U are the dictionary atoms and V holds the sparse codes. [Figure: noisy image, denoised image, learned dictionary atoms.] [9] OLSHAUSEN, B. A., AND FIELD, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325.

  7. Structured Matrix Factorization: Special Case 1: Sparse Dictionary Learning • Approximate the signals X by a dictionary U and sparse codes V, X ≈ UV^T (3); a sketch of the objective follows this slide. • Challenges: • Optimization strategies without global convergence guarantees • Which size for U and V? The number of columns r must be picked a priori (4).
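
The SDL objective (3) is not in the transcript; a common instantiation within framework (1) (my choice of norms, with an ℓ1 sparsity penalty on the codes) is

\[
\min_{U, V}\; \tfrac{1}{2}\, \lVert X - U V^{T} \rVert_F^{2}
\;+\; \lambda \sum_{i=1}^{r} \lVert U_i \rVert_2\, \lVert V_i \rVert_1,
\]

which corresponds to the rank-1 regularizer \(\theta(u, v) = \lVert u \rVert_2 \lVert v \rVert_1\).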

  8. Structured Matrix Factorization: Special Case 2: Subspace Clustering • Given data X coming from a union of subspaces, find these underlying subspaces and separate the data according to these subspaces. • clustering • recover low-dimensional structures

  9. Structured Matrix Factorization: Special Case 2: Subspace Clustering • Given data X coming from a union of subspaces, determine these underlying subspaces and separate the data according to these subspaces. • clustering • recover low-dimensional structures • The subspaces S1, ..., Sn are characterized by bases U; segmentation is obtained by finding a subspace-preserving representation V: recover the number and the dimensions of the subspaces, then recover the data segmentation. • Challenges: • Model selection: how many subspaces? Which dimension for each subspace? • Potentially difficult subspace configurations

  10. Structured Matrix Factorization: Special Case 2: Subspace Clustering • One way to do subspace clustering: Sparse Subspace Clustering [4] • Self-expressive dictionary: fix the dictionary as U = X • Find a sparse representation over U that allows one to segment the data (see the sketch below). • But the optimality of the dictionary is not addressed. • Idea: sparse dictionary learning on a union-of-subspaces model is suited to recover a more compact factorization with subspace-sparse codes. [1] [1] ADLER, A., ELAD, M., AND HEL-OR, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246. [4] ELHAMIFAR, E., AND VIDAL, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781.
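
For reference, the sparse representation the slide refers to is the standard SSC program from [4]: each data point is expressed as a sparse combination of the other points,

\[
\min_{V}\; \lVert V \rVert_1
\quad \text{s.t.} \quad X = X V, \;\; \operatorname{diag}(V) = 0,
\]

after which the data are segmented by spectral clustering on the affinity matrix \(|V| + |V|^{T}\).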

  11. Structured Matrix Factorization: Theory for Global Optimality? • Matrix factorization (1), e.g. low-rank matrix factorization: non-convex; small problem size; structured factors allow more modeling flexibility; explicit representation. • Matrix approximation (2), e.g. low-rank matrix approximation: convex; large problem size; unstructured.

  12. Structured Matrix Factorization: Theory for Global Optimality? • Matrix factorization (1): non-convex; small problem size; structured factors allow more modeling flexibility; explicit representation. • Matrix approximation (2): convex; large problem size; unstructured. • Ideas: • Find a convex relaxation for a general regularization function to couple the two problems (1) and (2). • Allow the number of columns of U and V to change in (1). • Results: • Problem (2) gives a global lower bound for problem (1). • This convex lower bound makes it possible to analyze global optimality for problem (1). A sketch of the coupling follows.
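
A sketch of the two coupled problems, following [6] (the equations were lost in the transcript):

\[
\text{(1)} \quad \min_{r,\, U,\, V}\; \ell\big(X,\, U V^{T}\big) + \lambda\, \Theta(U, V),
\qquad\quad
\text{(2)} \quad \min_{Y}\; \ell(X, Y) + \lambda\, \Omega_{\theta}(Y),
\]

where the matrix factorization regularizer

\[
\Omega_{\theta}(Y) \;=\; \inf_{r} \; \inf_{U V^{T} = Y} \; \sum_{i=1}^{r} \theta(U_i, V_i)
\]

is convex and satisfies \(\Omega_{\theta}(U V^{T}) \le \Theta(U, V)\), so (2) lower-bounds (1).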

  13. Global Optimality of Structured Matrix Factorization: At a Local Minimum • Assumptions: • The factorization size r is allowed to change. • The loss ℓ is convex and once differentiable w.r.t. its matrix argument. • The regularizer Θ is a sum of positively homogeneous functions of degree 2. • THEOREM [6]: Local minima of f are globally optimal if (U_i, V_i) = (0, 0) for some column i; in other words, all local minima of f of sufficient size are global minima (sketch below). [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
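
A sketch of the THEOREM statement, reconstructed from [6] (the symbols were dropped in the transcript). Writing

\[
f(U, V) \;=\; \ell\big(X,\, U V^{T}\big) \;+\; \lambda \sum_{i=1}^{r} \theta(U_i, V_i),
\]

the claim is: if \((\tilde{U}, \tilde{V})\) is a local minimizer of \(f\) such that \((\tilde{U}_i, \tilde{V}_i) = (0, 0)\) for some column \(i\), then \((\tilde{U}, \tilde{V})\) is a global minimizer of \(f\).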

  14. Global Optimality of Structured Matrix Factorization: At ANY Point • Assumptions: • The factorization size r is allowed to change. • The loss ℓ is convex and once differentiable w.r.t. its matrix argument. • The regularizer Θ is a sum of positively homogeneous functions of degree 2. • COROLLARY [6]: A point (U, V) is a global optimum of f if it satisfies two conditions (sketched after slide 15). • For many choices of ℓ and θ, condition 1 is satisfied by first-order optimal points. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).

  15. Global Optimality of Structured Matrix Factorization: At ANY Point • Assumptions: as above. • COROLLARY [6]: Given a point (U, V), we can test whether it is a local minimum of sufficient size via the polar test (5), sketched below. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
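
The two conditions of the COROLLARY and the test (5), reconstructed from [6] (the exact scaling by \(\lambda\) is my reading of the paper). With \(\Lambda = -\tfrac{1}{\lambda} \nabla \ell\big(X,\, U V^{T}\big)\):

\[
\text{(i)} \;\; \langle \Lambda,\, U V^{T} \rangle \;=\; \Theta(U, V),
\qquad\quad
\text{(ii)} \;\; \Omega_{\theta}^{\circ}(\Lambda) \;\le\; 1.
\tag{5}
\]

Condition (i) typically holds at first-order optimal points; condition (ii) is the polar test performed in practice.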

  16. Global Optimality of Structured Matrix Factorization: At ANY Point • COROLLARY [6]: Given a point (U, V), we can test whether it is a local minimum of sufficient size by testing (5). • The test can be rewritten using the directional derivative of f in a direction (u, v) (6), which leads to the polar problem (7) on the scaled negative gradient of the loss. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).

  17. Polar Problem (7) • Why are we interested in solving this polar problem? • For a non-convex problem, first-order optimality is not sufficient to guarantee local minimality. • The theorem on global optimality only applies to local minima. • The polar problem allows us to test global optimality at any (critical) point. • It is a higher-order non-smooth saddle point problem. • The difficulty of solving the polar depends on the choice of θ: • for θ(u, v) = ‖u‖₂‖v‖₂ the polar of Z is its largest singular value (the square root of the top eigenvalue of Z^T Z); • for θ(u, v) = ‖u‖₁‖v‖₁ the polar is the largest entry (in absolute value) of Z; • in general, solving the polar problem is NP-hard [7]. [7] HENDRICKX, J. M., AND OLSHEVSKY, A. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications 31, 5 (2010), 2802–2812.
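
The polar problem (7) itself, as defined in [6]:

\[
\Omega_{\theta}^{\circ}(Z) \;=\; \sup_{u,\, v}\; u^{T} Z\, v
\quad \text{s.t.} \quad \theta(u, v) \le 1.
\tag{7}
\]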

  18. Elements of Proof (THEOREM) • Rank-1 regularizer θ: • positively homogeneous of degree 2 • positive semi-definite • well-defined as a regularizer for rank-1 matrices • Matrix factorization regularizer Ω_θ (8): • generalization of the decomposition/atomic norm • convex function • the infimum in (8) is achieved with r ≤ DN.

  19. Elements of Proof (THEOREM) • Rank-1 regularizer θ and matrix factorization regularizer Ω_θ (8): • generalization of the decomposition/atomic norm • convex function • the infimum in (8) is achieved with r ≤ DN. • Example: for θ(u, v) = ‖u‖₂‖v‖₂, (8) recovers the variational form of the nuclear norm (the sum of the singular values of the given matrix), sketched below.
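
A sketch of (8) and the nuclear-norm example:

\[
\Omega_{\theta}(Y) \;=\; \inf_{r} \; \inf_{U V^{T} = Y} \; \sum_{i=1}^{r} \theta(U_i, V_i),
\tag{8}
\]

and for \(\theta(u, v) = \lVert u \rVert_2 \lVert v \rVert_2\) this recovers the variational form of the nuclear norm:

\[
\lVert Y \rVert_{*} \;=\; \inf_{r} \; \inf_{U V^{T} = Y} \; \sum_{i=1}^{r} \lVert U_i \rVert_2\, \lVert V_i \rVert_2.
\]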

  20. Elements of Proof (THEOREM) • The matrix factorization regularizer Ω_θ (8) couples • the convex optimization problem (9): min over Y of ℓ(X, Y) + λΩ_θ(Y), and • the non-convex factorized formulation (10): min over (U, V) of ℓ(X, UV^T) + λΘ(U, V). • If Ỹ is an optimal solution to problem (9), then any factorization (U, V) such that UV^T = Ỹ and Θ(U, V) = Ω_θ(Ỹ) is also an optimal solution to the non-convex problem (10). • Idea of the proof: local minima of the factorized objective that satisfy the conditions of the THEOREM also satisfy the conditions for global optimality of the convex objective.

  21. Additional Comments on the Polar Problem • Matrix factorization regularizer Ω_θ (8) and its polar function Ω_θ° (11), (12), sketched below. • Given a first-order optimal point of the non-convex problem, the value of the polar problem at this point provides a bound on how far this point is from being globally optimal.
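
A sketch of the polar function, (11) and (12), following the standard polar duality used in [6]:

\[
\Omega_{\theta}^{\circ}(Z) \;=\; \sup_{Y} \; \langle Z, Y \rangle
\quad \text{s.t.} \quad \Omega_{\theta}(Y) \le 1
\tag{11}
\]

\[
\phantom{\Omega_{\theta}^{\circ}(Z)} \;=\; \sup_{u,\, v} \; u^{T} Z\, v
\quad \text{s.t.} \quad \theta(u, v) \le 1.
\tag{12}
\]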

  22. Algorithm for Sparse Dictionary Learning • Instantiation of the structured MF meta-algorithm. • Built upon the COROLLARY (on global optimality at any given point). • Alternates between: • (a) local descent to a critical point; • (b) evaluation of the polar function (by solving the polar problem) in order to test for global optimality at this critical point; • (c) augmentation of the current factorization with the solution of the polar step (b), as long as the polar value is > 1. • From [6] we know that condition 2 of the COROLLARY will hold for finite r. A sketch of the loop follows. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
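
A minimal numerical sketch of this meta-algorithm for the nuclear-norm case θ(u, v) = ‖u‖₂‖v‖₂, where the polar value is the largest singular value. The descent routine, step sizes, and tolerances below are my own illustrative choices, not the paper's:

import numpy as np

def local_descent(X, U, V, lam, steps=500, lr=1e-3):
    # (a) Gradient descent on f(U, V) = 0.5*||X - U V^T||_F^2
    #     + lam * sum_i ||U_i||_2 * ||V_i||_2.
    for _ in range(steps):
        R = U @ V.T - X                          # gradient of the loss w.r.t. U V^T
        nu = np.linalg.norm(U, axis=0) + 1e-12   # column norms of U
        nv = np.linalg.norm(V, axis=0) + 1e-12   # column norms of V
        gU = R @ V + lam * U * (nv / nu)         # d/dU of loss + regularizer
        gV = R.T @ U + lam * V * (nu / nv)       # d/dV of loss + regularizer
        U, V = U - lr * gU, V - lr * gV
    return U, V

def polar_step(Z):
    # (b) For theta(u, v) = ||u||_2 ||v||_2 the polar is sigma_max(Z),
    #     attained at the top singular vector pair.
    Us, s, Vt = np.linalg.svd(Z)
    return s[0], Us[:, 0], Vt[0, :]

def meta_algorithm(X, lam, max_rank=20, tol=1e-6, tau=1e-3):
    rng = np.random.default_rng(0)
    U = 0.01 * rng.standard_normal((X.shape[0], 1))
    V = 0.01 * rng.standard_normal((X.shape[1], 1))
    for _ in range(max_rank):
        U, V = local_descent(X, U, V, lam)             # (a) descend to a critical point
        polar, u, v = polar_step((X - U @ V.T) / lam)  # -(1/lam) * gradient of the loss
        if polar <= 1 + tol:                           # COROLLARY test (5) passed
            break
        U = np.hstack([U, np.sqrt(tau) * u[:, None]])  # (c) append the polar direction
        V = np.hstack([V, np.sqrt(tau) * v[:, None]])
    return U, V

# Usage: factorize a noisy low-rank matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40))
U, V = meta_algorithm(X, lam=1.0)
print("final factorization size r =", U.shape[1])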

  23. Beyond Matrix Factorization • Structured tensor factorization and deep learning. • Mapping: input features pass through layers of weights and non-linearities. • Dimension: the role of r (the number of columns in U and V) is played by the number of parallel subnetworks. • Optimization problem: same shape as (1), relying on positive homogeneity of the network architecture and a parallel-subnetwork structure. • Example of regularizer: product of norms (sketched below). [Figure from the ICCV '17 tutorial on Global Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond (Ben Haeffele and René Vidal).]
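
A sketch of the product-of-norms regularizer for K layers of weights W^1, ..., W^K and r parallel subnetworks, following the companion paper (Haeffele and Vidal, 2015); the choice of norm per layer is left generic:

\[
\Theta\big(W^{1}, \dots, W^{K}\big) \;=\; \sum_{i=1}^{r} \prod_{k=1}^{K} \big\lVert W_i^{k} \big\rVert,
\]

which is positively homogeneous of degree K, matching the degree of the network mapping.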

  24. Conclusion and Future Work • Structured matrix factorization as a general formulation of many popular problems (low-rank MF, sparse PCA, NMF, SDL, etc.). • Global optimality theory for structured matrix factorization. • Iterative algorithm using local descent and the polar problem to reach a global optimum. • Limits of direct applicability: the polar problem itself can be hard to solve. • Extendable to the analysis of global optimality for deep learning problems. • Further reading: • HAEFFELE, B. D., AND VIDAL, R. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540 (2015). • BACH, F. Convex relaxations of structured matrix factorizations. arXiv preprint arXiv:1309.3117 (2013). • SCHWAB, E., HAEFFELE, B., CHARON, N., AND VIDAL, R. Separable dictionary learning with global optimality and applications to diffusion MRI. arXiv preprint arXiv:1807.05595 (2018).

  25. Thank You for Your Attention. CMAP, December 13th 2018. Presented article: HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications (2017). URL: https://arxiv.org/abs/1708.07850

  26. References I • ADLER, A., ELAD, M., AND HEL-OR, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246. • BACH, F. Convex relaxations of structured matrix factorizations. arXiv preprint arXiv:1309.3117 (2013). • BACH, F., MAIRAL, J., AND PONCE, J. Convex sparse matrix factorizations. arXiv preprint arXiv:0812.1869 (2008). • ELHAMIFAR, E., AND VIDAL, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781. • HAEFFELE, B. D., AND VIDAL, R. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540 (2015). • HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017). • HENDRICKX, J. M., AND OLSHEVSKY, A. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications 31, 5 (2010), 2802–2812. • LIU, G., LIN, Z., AND YU, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 663–670.

  27. References II • OLSHAUSEN, B. A., AND FIELD, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325. • SCHWAB, E., HAEFFELE, B., CHARON, N., AND VIDAL, R. Separable dictionary learning with global optimality and applications to diffusion MRI. arXiv preprint arXiv:1807.05595 (2018). • SUN, J., QU, Q., AND WRIGHT, J. Complete dictionary recovery over the sphere. arXiv preprint arXiv:1504.06785 (2015). • VIDAL, R., MA, Y., AND SASTRY, S. S. Generalized Principal Component Analysis, vol. 5. Springer, 2016. • ZHU, Z., LI, Q., TANG, G., AND WAKIN, M. B. Global optimality in low-rank matrix optimization. IEEE Transactions on Signal Processing 66, 13 (2018), 3614–3628.
