1/20 Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications Article by Benjamin D. Haeffele and René Vidal (2017) CMAP Machine Learning Journal Club Speaker: Imke Mayer CMAP, December 13th 2018
2/20 Outline • Structured Matrix Factorization • Context and definition • Special case 1: Sparse dictionary learning (SDL) • Special case 2: Subspace clustering (SC) • Global optimality for structured matrix factorization • Main theorem • Polar problem • Application: SDL global optimality • Extension to tensor factorization and deep learning CMAP Machine Learning Journal Club, December 13th 2018
3/20 Structured Matrix Factorization Context • (Large) high-dimensional datasets (images, videos, user ratings, etc.) • difficult to process (computational issues, memory complexity) • but relevant information often lies in a low-dimensional structure • Goal: recover this underlying low-dimensional structure of given (large-scale) data X Motion segmentation Face clustering [12] VIDAL, R., MA, Y., AND SASTRY, S. S. Generalized Principal Component Analysis, vol. 5. Springer, 2016.
4/20 Structured Matrix Factorization Context • Large high-dimensional datasets (images, videos, user ratings, etc.) • difficult to process (computational issues, memory complexity) • but relevant information often lies in a low-dimensional structure • Goal: recover this underlying low-dimensional structure of given (large-scale) data X • Model assumption: linear subspace model. The data can be approximated by one or more low-dimensional subspace(s). Basis of the linear low-dimensional structure Low-dimensional data representation
4/20 Structured Matrix Factorization Context Basis of the linear low-dimensional structure Low-dimensional data representation • Issue: Without any assumptions there are infinitely many choices for U and V such that X ≈ UVᵀ. • Solution: Constrain the factors to satisfy certain properties: min_{U,V} ℓ(X, UVᵀ) + λ Θ(U, V) (1) • Non-convex • Structured factors → more modeling flexibility • Explicit representation Loss ℓ: measures the quality of the approximation Regularization Θ: imposes restrictions on the factors
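To make problem (1) concrete, here is a minimal numpy sketch of a common heuristic for it: alternating minimization over U and V with a squared loss and a Frobenius-norm regularizer (each subproblem is then a ridge regression with a closed-form solution). The function name and defaults are mine, not from the slides or the paper.

```python
import numpy as np

def alt_min_factorization(X, r, lam=0.1, n_iter=200, seed=0):
    """Heuristic for min_{U,V} 0.5*||X - U V^T||_F^2 + (lam/2)*(||U||_F^2 + ||V||_F^2)
    by alternating closed-form ridge updates of U and V (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.standard_normal((d, r))
    V = rng.standard_normal((n, r))
    I = np.eye(r)
    for _ in range(n_iter):
        # U-update: U = X V (V^T V + lam I)^{-1}  (convex in U for fixed V)
        U = np.linalg.solve(V.T @ V + lam * I, V.T @ X.T).T
        # V-update: V = X^T U (U^T U + lam I)^{-1}  (convex in V for fixed U)
        V = np.linalg.solve(U.T @ U + lam * I, U.T @ X).T
    return U, V
```

Each update solves one convex subproblem exactly, but the joint problem stays non-convex; this is exactly the setting where the global-optimality theory of the talk applies.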
5/20 Structured Matrix Factorization, Special Case 1: Sparse Dictionary Learning • Given a set of signals, find a set of dictionary atoms and sparse codes to approximate the signals. [9] • denoising, inpainting • classification Sparse linear combinations of dictionary atoms Denoised image Noisy image Dictionary atoms Signals X ≈ UVᵀ, with dictionary U and sparse codes V (3) [9] OLSHAUSEN, B. A., AND FIELD, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325.
6/20 Structured Matrix Factorization, Special Case 1: Sparse Dictionary Learning Signals X ≈ UVᵀ, with dictionary U and sparse codes V (3) min_{U,V} ½‖X − UVᵀ‖_F² + λ Σᵢ ‖uᵢ‖₂ ‖vᵢ‖₁ (4) • Challenges: • Optimization strategies without global convergence guarantees • Which size for U and V? Need to pick r (number of columns) a priori
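A standard heuristic for sparse dictionary learning (not the authors' algorithm, which comes later in the talk) alternates ISTA sparse coding of V with a least-squares dictionary update of U. The sketch below, with hypothetical function names, solves min 0.5*||X - U Vᵀ||² + λ||V||₁ with renormalized atoms:

```python
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of t*||.||_1 (elementwise soft thresholding)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def sparse_dictionary_learning(X, r, lam=0.05, n_outer=30, n_ista=50, seed=0):
    """Alternating heuristic for 0.5*||X - U V^T||_F^2 + lam*||V||_1 (sketch)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.standard_normal((d, r))
    U /= np.linalg.norm(U, axis=0, keepdims=True)
    V = np.zeros((n, r))
    for _ in range(n_outer):
        # Sparse coding: ISTA on V with step 1/L, L = ||U^T U||_2 (Lipschitz const.)
        L = np.linalg.norm(U.T @ U, 2)
        for _ in range(n_ista):
            grad = (U @ V.T - X).T @ U           # gradient of the quadratic term
            V = soft_threshold(V - grad / L, lam / L)
        # Dictionary update: least squares U = X V (V^T V)^{-1}, slightly damped
        G = V.T @ V + 1e-8 * np.eye(r)
        U = np.linalg.solve(G, V.T @ X.T).T
        # Renormalize atoms; rescale V so the product U V^T is unchanged
        norms = np.maximum(np.linalg.norm(U, axis=0), 1e-12)
        U /= norms
        V *= norms
    return U, V
```

This illustrates the slide's first challenge: each half-step converges, but nothing in this scheme certifies a global optimum, and r must be fixed in advance.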
7/20 Structured Matrix Factorization, Special Case 2: Subspace Clustering • Given data X coming from a union of subspaces, find these underlying subspaces and separate the data according to these subspaces. • clustering • recover low-dimensional structures
8/20 Structured Matrix Factorization, Special Case 2: Subspace Clustering • Given data X coming from a union of subspaces, determine these underlying subspaces and separate the data according to these subspaces. • clustering • recover low-dimensional structures Subspaces S1, ..., Sn characterized by bases U Segmentation by finding a subspace-preserving representation V recover number and dimensions of the subspaces recover data segmentation • Challenges: • Model selection: how many subspaces? Dimension of each subspace? • Potentially: difficult subspace configurations
9/20 Structured Matrix Factorization, Special Case 2: Subspace Clustering • One solution for subspace clustering: Sparse Subspace Clustering [4] • Self-expressive dictionary: fix the dictionary as U = X • Find a sparse representation over U that allows segmenting the data. • But optimality of the dictionary is not addressed. • Idea: Sparse dictionary learning on a union-of-subspaces model is suited to recover a more compact factorization with subspace-sparse codes. [1] [1] ADLER, A., ELAD, M., AND HEL-OR, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246. [4] ELHAMIFAR, E., AND VIDAL, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781.
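The self-expressive idea of [4] can be sketched in a few lines: each column of X is sparsely coded over the remaining columns (here by plain ISTA, a simplification of the solvers used in practice), and the resulting coefficient magnitudes form the affinity used for spectral clustering. Function names are mine.

```python
import numpy as np

def soft(a, t):
    """Elementwise soft thresholding, the prox of t*||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def ssc_affinity(X, lam=0.01, n_iter=300):
    """Self-expressive sparse coding: for each column x_j, approximately solve
    min_c 0.5*||x_j - X_{-j} c||^2 + lam*||c||_1 by ISTA, then return the
    symmetric affinity |C| + |C|^T used for spectral clustering (sketch)."""
    d, n = X.shape
    C = np.zeros((n, n))
    L = np.linalg.norm(X.T @ X, 2)      # Lipschitz constant bound for every subproblem
    for j in range(n):
        mask = np.ones(n, dtype=bool)
        mask[j] = False                  # exclude x_j from its own dictionary
        A = X[:, mask]
        c = np.zeros(n - 1)
        for _ in range(n_iter):
            c = soft(c - A.T @ (A @ c - X[:, j]) / L, lam / L)
        C[mask, j] = c
    return np.abs(C) + np.abs(C).T
```

For well-separated subspaces the affinity is (near) block-diagonal, which is what makes the segmentation step work; the dictionary U = X itself is never optimized, which is the gap the structured-factorization view addresses.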
10/20 Structured Matrix Factorization: Theory for Global Optimality? Matrix factorization (1) vs. matrix approximation (2) • Non-convex • Small problem size • Structured factors → more modeling flexibility • Explicit representation versus • Convex • Large problem size • Unstructured Low-rank matrix factorization vs. low-rank matrix approximation • Ideas: • Find a convex relaxation for a general regularization function to couple the two problems (1) and (2). • Allow the number of columns of U and V to change in (1). • Results: • Problem (2) gives a global lower bound for problem (1). • This convex lower bound makes it possible to analyze global optimality for problem (1).
11/20 Global Optimality of Structured Matrix Factorization: At a Local Minimum • Assumptions: • Factorization size r is allowed to change. • Loss ℓ is convex and once differentiable w.r.t. Y. • Θ is a sum of positively homogeneous functions θ of degree 2. THEOREM [6] A local minimum (U, V) of f is globally optimal if (uᵢ, vᵢ) = (0, 0) for some i. All local minima of f of sufficient size are global minima. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
12/20 Global Optimality of Structured Matrix Factorization: At ANY Point • Assumptions: • Factorization size r is allowed to change. • Loss ℓ is convex and once differentiable w.r.t. Y. • Θ is a sum of positively homogeneous functions θ of degree 2. • COROLLARY [6] • A point (U, V) is a global optimum of f if it satisfies the following conditions: 1) ⟨−∇ℓ(UVᵀ), UVᵀ⟩ = λ Σᵢ θ(uᵢ, vᵢ) 2) Ω°(−∇ℓ(UVᵀ)/λ) ≤ 1 • for many choices of θ, condition 1 is satisfied by first-order optimal points • Given a point (U, V) we can test whether it is a local minimum of sufficient size by testing: Ω°(−∇ℓ(UVᵀ)/λ) ≤ 1 (5) [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
13/20 Global Optimality of Structured Matrix Factorization: At ANY Point • COROLLARY [6] • Given a point (U, V) we can test whether it is a local minimum of sufficient size by testing: Ω°(−∇ℓ(UVᵀ)/λ) ≤ 1 (5) This checks that appending a column pair (u, v) cannot decrease the objective, i.e. that the directional derivative of f in direction (u, v) is non-negative (6), where Ω°(Z) = sup_{u,v} uᵀ Z v s.t. θ(u, v) ≤ 1 (7) is the polar problem. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
14/20 Polar Problem Ω°(Z) = sup_{u,v} uᵀ Z v s.t. θ(u, v) ≤ 1 (7) • Why are we interested in solving this polar problem? • For a non-convex problem, first-order optimality is not sufficient to guarantee local minimality. • The theorem on global optimality only applies to local minima. • The polar problem allows us to test global optimality at any (critical) point. • It is a higher-order non-smooth saddle point problem. • The difficulty of solving the polar depends on the choice of θ: • For θ(u, v) = ‖u‖₂‖v‖₂, Ω°(Z) is the largest singular value of Z, i.e. the square root of the top eigenvalue of ZᵀZ. • For θ(u, v) = ‖u‖₁‖v‖₁, Ω°(Z) is the largest entry (in absolute value) of Z. • In general, solving this polar problem is NP-hard [7]. [7] HENDRICKX, J. M., AND OLSHEVSKY, A. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications 31, 5 (2010), 2802–2812.
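The two tractable cases on this slide are easy to check numerically. The sketch below (illustrative, random Z) computes the polar value Ω°(Z) = sup { uᵀ Z v : θ(u, v) ≤ 1 } for both choices of θ:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 8))

# theta(u, v) = ||u||_2 ||v||_2: the polar is the spectral norm of Z,
# i.e. the square root of the top eigenvalue of Z^T Z.
polar_spectral = np.linalg.norm(Z, 2)

# theta(u, v) = ||u||_1 ||v||_1: the polar is the largest entry of Z in
# absolute value (u and v each concentrate on a single coordinate).
polar_linf = np.max(np.abs(Z))
```

For other structured choices of θ (e.g. mixed ℓ₂/ℓ₁ products as in SDL) no such closed form exists, which is the NP-hardness obstacle the slide refers to.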
15/20 Elements of Proof (THEOREM) • Rank-1 regularizer θ: • Positively homogeneous of degree 2 • Positive semi-definite • Well-defined as a regularizer for rank-1 matrices • Matrix factorization regularizer: Ω(Y) = inf_{U,V : UVᵀ = Y} Σᵢ θ(uᵢ, vᵢ) (8) • Generalization of decomposition/atomic norms. • Convex function. • The infimum is achieved with r ≤ DN. • Example: for θ(u, v) = ‖u‖₂‖v‖₂, Ω is the variational form of the nuclear norm (sum of the singular values of the given matrix).
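The nuclear-norm example can be verified numerically: for θ(u, v) = ‖u‖₂‖v‖₂, splitting the singular values evenly between the two factors of the SVD attains the infimum in (8), and Σᵢ θ(uᵢ, vᵢ) equals the sum of singular values. A small check (random Y, variable names mine):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 4))
nuclear = np.linalg.svd(Y, compute_uv=False).sum()   # ||Y||_* = sum of singular values

# Optimal factorization for (8): scale each singular vector pair by sqrt(sigma_i)
U1, s, V1t = np.linalg.svd(Y, full_matrices=False)
U = U1 * np.sqrt(s)
V = V1t.T * np.sqrt(s)

# The regularizer value sum_i ||u_i||_2 ||v_i||_2 at this factorization
sum_theta = (np.linalg.norm(U, axis=0) * np.linalg.norm(V, axis=0)).sum()
```

Here sum_theta equals the nuclear norm, while any other factorization of Y can only give a value at least as large, which is what makes Ω a convex lower bound usable in the proof.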
16/20 Elements of Proof (THEOREM) Ω(Y) = inf_{U,V : UVᵀ = Y} Σᵢ θ(uᵢ, vᵢ) (8) • Matrix factorization regularizer • Convex optimization problem: min_Y ℓ(X, Y) + λ Ω(Y) (9) • Non-convex factorized formulation: min_{U,V} ℓ(X, UVᵀ) + λ Σᵢ θ(uᵢ, vᵢ) (10) • If Ŷ is an optimal solution to problem (9), then any factorization UVᵀ = Ŷ such that Σᵢ θ(uᵢ, vᵢ) = Ω(Ŷ) is also an optimal solution to the non-convex problem (10). • Idea of the proof: Local minima of f that satisfy the conditions of the theorem also satisfy the conditions for global optimality of F.
17/20 Additional Comments on the Polar Problem • Matrix factorization regularizer: Ω(Y) = inf_{U,V : UVᵀ = Y} Σᵢ θ(uᵢ, vᵢ) (8) • Polar function: Ω°(Z) = sup_Y ⟨Z, Y⟩ s.t. Ω(Y) ≤ 1 (11), which reduces to Ω°(Z) = sup_{u,v} uᵀ Z v s.t. θ(u, v) ≤ 1 (12) • Given a first-order optimal point of the non-convex problem, the value of the polar problem at this point provides a bound on how far this point is from being globally optimal.
18/20 Algorithm for Sparse Dictionary Learning • Instantiation of the Structured MF Meta-Algorithm • Builds upon the COROLLARY (on global optimality at any given point). • Alternates between: • (a) local descent to a critical point • (b) evaluation of the polar function (by solving the polar problem) in order to test for global optimality at this critical point • (c) augmentation of the current factorization with the solution of the polar step (b), as long as the polar value is > 1. • From [6] we know that condition 2) of the COROLLARY will hold for finite r. [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
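The (a)/(b)/(c) loop above can be sketched for the one case where every step has a closed form: θ(u, v) = ‖u‖₂‖v‖₂ (the nuclear-norm regularizer) with squared loss. This is my illustrative reduction, not the authors' SDL instantiation, where the polar step is much harder:

```python
import numpy as np

def meta_algorithm_nuclear(X, lam=1.0, max_cols=50, tol=1e-8):
    """Descend / polar-check / augment loop for 0.5*||X - U V^T||_F^2 + lam*||Y||_*
    in factored form, for theta(u, v) = ||u||_2 ||v||_2 (illustrative sketch)."""
    d, n = X.shape
    U, V = np.zeros((d, 0)), np.zeros((n, 0))
    for _ in range(max_cols):
        grad = U @ V.T - X                   # gradient of the loss w.r.t. Y = U V^T
        # (b) polar step: Omega°(-grad/lam) = sigma_max(-grad)/lam, via SVD
        u1, s, v1t = np.linalg.svd(-grad)
        if s[0] <= lam + tol:                # polar value <= 1: certified global optimum
            break
        # (c) augment with the polar maximizers, scaled by an exact line search:
        # along t*u v^T the objective is minimized at t = sigma_max - lam
        u, v = u1[:, 0], v1t[0]
        t = s[0] - lam
        U = np.hstack([U, np.sqrt(t) * u[:, None]])
        V = np.hstack([V, np.sqrt(t) * v[:, None]])
        # (a) a local-descent refit of (U, V) would go here; for this special
        # case the greedy augmentation alone reaches the optimum
    return U, V
```

Because the singular directions are orthogonal, this loop terminates with the singular-value soft-thresholding solution; for structured θ (SDL), step (b) becomes the NP-hard polar problem discussed on slide 14.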
19/20 Beyond Matrix Factorization Input features Non-linearity Weights • Structured Tensor Factorization and Deep Learning. • Mapping: output generated by r parallel subnetworks • Dimension: r = number of columns in U and V ↔ r = number of parallel subnetworks • Optimization problem: same structure as (1), exploiting positive homogeneity of the network architecture and the parallel subnetwork structure. • Example of regularizer: product of norms. Figure from ICCV '17 tutorial on Global Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond (Ben Haeffele and René Vidal)
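The structural property the extension relies on is easy to check: a linear-ReLU-linear subnetwork is positively homogeneous of degree 2 in its weights, just like uvᵀ is in (u, v). A minimal numerical check (hypothetical subnet, weights and sizes mine):

```python
import numpy as np

def subnet(W1, W2, x):
    """One parallel subnetwork: linear layer, ReLU, linear readout."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W1 = rng.standard_normal((7, 5))
W2 = rng.standard_normal(7)

# Scaling both weight layers by alpha > 0 scales the output by alpha^2,
# since ReLU(alpha * z) = alpha * ReLU(z) for alpha > 0.
alpha = 3.0
scaled = subnet(alpha * W1, alpha * W2, x)
```

Summing r such subnetworks plays the role of summing r rank-1 terms uᵢvᵢᵀ, which is what lets the matrix-factorization optimality theory carry over.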
20/20 Conclusion and Future Work • Structured matrix factorization as a general formulation of many popular problems (low-rank MF, sparse PCA, NMF, SDL, etc.) • Global optimality of structured matrix factorization. • Iterative algorithm using local descent and the polar problem to reach a global optimum. • Limits of direct applicability: the polar problem's optimization landscape can be complicated. • Extendable to the analysis of global optimality for deep learning problems. • Further reading • HAEFFELE, B. D., AND VIDAL, R. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540 (2015). • BACH, F. Convex relaxations of structured matrix factorizations. arXiv preprint arXiv:1309.3117 (2013). • SCHWAB, E., HAEFFELE, B., CHARON, N., AND VIDAL, R. Separable dictionary learning with global optimality and applications to diffusion MRI. arXiv preprint arXiv:1807.05595 (2018).
20/20 Thank You for Your Attention CMAP, December 13th 2018 Presented article: Haeffele, B. D. and Vidal, R. (2017). Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications. URL: https://arxiv.org/abs/1708.07850
References I • [1] ADLER, A., ELAD, M., AND HEL-OR, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246. • [2] BACH, F. Convex relaxations of structured matrix factorizations. arXiv preprint arXiv:1309.3117 (2013). • [3] BACH, F., MAIRAL, J., AND PONCE, J. Convex sparse matrix factorizations. arXiv preprint arXiv:0812.1869 (2008). • [4] ELHAMIFAR, E., AND VIDAL, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781. • [5] HAEFFELE, B. D., AND VIDAL, R. Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540 (2015). • [6] HAEFFELE, B. D., AND VIDAL, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017). • [7] HENDRICKX, J. M., AND OLSHEVSKY, A. Matrix p-norms are NP-hard to approximate if p ≠ 1, 2, ∞. SIAM Journal on Matrix Analysis and Applications 31, 5 (2010), 2802–2812. • [8] LIU, G., LIN, Z., AND YU, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 663–670.
References II • [9] OLSHAUSEN, B. A., AND FIELD, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325. • [10] SCHWAB, E., HAEFFELE, B., CHARON, N., AND VIDAL, R. Separable dictionary learning with global optimality and applications to diffusion MRI. arXiv preprint arXiv:1807.05595 (2018). • [11] SUN, J., QU, Q., AND WRIGHT, J. Complete dictionary recovery over the sphere. arXiv preprint arXiv:1504.06785 (2015). • [12] VIDAL, R., MA, Y., AND SASTRY, S. S. Generalized Principal Component Analysis, vol. 5. Springer, 2016. • [13] ZHU, Z., LI, Q., TANG, G., AND WAKIN, M. B. Global optimality in low-rank matrix optimization. IEEE Transactions on Signal Processing 66, 13 (2018), 3614–3628.