The Diagonalized Newton Algorithm for Non-negative Matrix Factorization
Hugo Van Hamme
Reporter: Yi-Ting Wang, 2013/3/26
Outline
• Introduction
• NMF formulation
• The Diagonalized Newton Algorithm for KLD-NMF
• Experiments
• Conclusions
Introduction
• Non-negative matrix factorization (NMF) denotes the approximation of a non-negative matrix $V \in \mathbb{R}_{\ge 0}^{M \times N}$ by a product of two non-negative low-rank factors, $V \approx WH$, with $W \in \mathbb{R}_{\ge 0}^{M \times R}$ and $H \in \mathbb{R}_{\ge 0}^{R \times N}$.
• In this paper, the closeness of the reconstruction $WH$ to its target $V$ is measured by their Kullback-Leibler divergence:
$$\mathrm{KLD}(V \,\|\, WH) = \sum_{m,n} \left[ v_{mn} \log \frac{v_{mn}}{(WH)_{mn}} - v_{mn} + (WH)_{mn} \right] \quad (1)$$
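To make (1) concrete, here is a minimal NumPy sketch of the divergence computation; the function name and the `eps` guard against log(0) are our additions, not from the paper.

```python
import numpy as np

def kld(V, W, H, eps=1e-12):
    # Generalized KL divergence between V and its reconstruction WH, eq. (1).
    # `eps` guards against log(0) and division by zero (our addition).
    WH = W @ H
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```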
Introduction
• The multiplicative updates (MU) algorithm solves exactly this problem in an iterative manner.
• Its simplicity and the availability of many implementations make it a popular algorithm to date for solving NMF problems.
• There are, however, some drawbacks.
• Firstly, it only converges locally and is not guaranteed to yield the global minimum of the cost function. It is hence sensitive to the choice of the initial guesses for $W$ and $H$.
• Secondly, MU is very slow to converge. The goal of this paper is to speed up convergence while retaining the local convergence property.
Introduction
• The resulting Diagonalized Newton Algorithm (DNA) shows faster convergence, while the implementation remains simple and suitable for high-rank problems.
• An animation of Newton's method: http://zh.wikipedia.org/wiki/File:NewtonIteration_Ani.gif
NMF formulation
• To induce sparsity on the matrix factors, the KL-divergence is often regularized, i.e. one seeks to minimize:
$$L(W, H) = \mathrm{KLD}(V \,\|\, WH) + \lambda_W \mathbf{1}^T W \mathbf{1} + \lambda_H \mathbf{1}^T H \mathbf{1} \quad (2)$$
• Minimizing (2) can be achieved by alternating updates of $W$ and $H$ for which the cost is non-increasing. The updates for this form of block coordinate descent are (a sketch of one such sweep follows below):
$$h_{rn} \leftarrow h_{rn} \, \frac{\sum_m w_{mr} \, v_{mn} / (WH)_{mn}}{\lambda_H + \sum_m w_{mr}} \quad (3)$$
$$w_{mr} \leftarrow w_{mr} \, \frac{\sum_n h_{rn} \, v_{mn} / (WH)_{mn}}{\lambda_W + \sum_n h_{rn}} \quad (4)$$
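A minimal sketch of one alternating sweep of (3)–(4), assuming the standard regularized multiplicative updates for KL-NMF; the names `mu_sweep`, `lam_W`, `lam_H` and the `eps` guard are ours.

```python
import numpy as np

def mu_sweep(V, W, H, lam_W=0.0, lam_H=0.0, eps=1e-12):
    # One alternating sweep of the regularized multiplicative updates (3)-(4).
    WH = W @ H + eps
    H = H * (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + lam_H)
    WH = W @ H + eps                       # refresh reconstruction after H update
    W = W * ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + lam_W)
    return W, H
```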
NMF formulation
• Let $v$ denote any column of $V$ and let $h$ denote the corresponding column of $H$; then the following is the core minimization problem to be considered:
$$\min_{h \ge 0} \; \mathrm{KLD}(v \,\|\, Wh) + \lambda \mathbf{1}^T h \quad (5)$$
where $\mathbf{1}$ denotes a vector of ones of appropriate length. The solution of (5) should satisfy the KKT conditions, i.e. the partial derivative of the cost with respect to $h_r$ must vanish for all $r$ with $h_r > 0$.
NMF formulation
• If $h_r = 0$, the partial derivative is positive.
• Hence the product of $h_r$ and the partial derivative is always zero for a solution of (5), i.e. for all $r$:
$$h_r \left( \lambda + \sum_m w_{mr} - \sum_m w_{mr} \frac{v_m}{(Wh)_m} \right) = 0 \quad (6)$$
• Since $W$-columns with all-zero entries do not contribute to $Wh$, it can be assumed that the column sums of $W$ are non-zero, so the above can be recast as:
NMF formulation • Where . To facilitate the derivations below, the following notations are introduced: (7) • Which are functions of via . The KKT conditions are hence recast as for (8) • Finally, summing (6) over yields (9) • Which is satisfied for and guess by renormalizing: (10)
Multiplicative updates
• For the KL-divergence, the MU are identical to a fixed-point update of (6):
$$h_r \leftarrow h_r \, y_r \quad (11)$$
• Update (11) has two fixed points: $h_r = 0$ and $y_r = 1$.
• In the former case, the KKT conditions imply that $y_r - 1$ is negative, i.e. the partial derivative is positive.
Newton updates
• To find the stationary points of (2), equations (8) need to be solved for $h$. Newton's update then states:
$$h^{(k+1)} = h^{(k)} - J^{-1} f(h^{(k)}), \quad \text{with } f_r(h) = h_r (y_r - 1) \quad (12)$$
where $J$ is the Jacobian of $f$.
• Applied to equations (8):
$$J_{rs} = \frac{\partial f_r}{\partial h_s} = \delta_{rs} (y_r - 1) - h_r \sum_m \tilde{w}_{mr} w_{ms} \frac{v_m}{(Wh)_m^2} \quad (13)$$
Newton updates
• To avoid the matrix inversion in update (12), the last term in (13) is diagonalized, which is equivalent to solving the $r$-th equation in (8) for $h_r$ with all other components fixed. With
$$d_r = \sum_m \tilde{w}_{mr} w_{mr} \frac{v_m}{(Wh)_m^2} \quad (14)$$
which is always positive, an element-wise Newton update for $h_r$ is obtained:
$$h_r \leftarrow h_r + \frac{y_r - 1}{d_r} \quad (15)$$
Step size limitation
• To respect non-negativity and to avoid the singularity of the divergence, the Newton update is bounded below by a function with the same local behavior around zero (16).
• Hence, if the Newton step would fall below this bound, the bounded update (17) is used instead. (A hedged sketch of the element-wise step with a simple safeguard follows below.)
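A minimal sketch of one diagonalized Newton step on a single column, using the notation reconstructed above ($\tilde{w}$, $y_r$, $d_r$). The exact bounding function of (16)–(17) is not recoverable from these slides, so a crude positivity floor stands in for the step-size limitation; all names are ours.

```python
import numpy as np

def dna_step(h, v, W, lam=0.0, floor=1e-12):
    # One element-wise (diagonalized) Newton step on a column h, eq. (15).
    omega = W.sum(axis=0) + lam            # omega_r = lambda + column sums of W
    Wn = W / omega                         # normalized columns w~_mr
    Wh = W @ h + floor
    y = Wn.T @ (v / Wh)                    # y_r, eq. (7)
    d = (Wn * W).T @ (v / Wh**2)           # d_r, eq. (14): always positive
    h_new = h + (y - 1.0) / d              # Newton step, eq. (15)
    # Crude stand-in for the paper's bounding function (16)-(17):
    # clip to a small positive floor to respect non-negativity.
    return np.maximum(h_new, floor)
```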
Non-increase of the cost
• Despite the step-size limitation of Section 2.3, the divergence can still increase.
• A very safe option is to also compute the EM (multiplicative) update.
• If the EM update is better, the Newton update is rejected and the EM update is taken instead (see the sketch below).
• This guarantees non-increase of the cost function.
• The computational cost of this safeguard is dominated by evaluating the KL-divergence, not by computing the update itself.
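A sketch of this fallback strategy for a single column, reusing `dna_step` from the previous sketch; the per-column cost function and the comparison logic are our reading of the slide, not the paper's code.

```python
import numpy as np

def guarded_step(h, v, W, lam=0.0, eps=1e-12):
    # DNA step with fallback to the multiplicative (EM) update: compute both
    # candidates, keep the one with lower cost, so the cost never increases.
    def cost(hc):                          # regularized divergence of eq. (5)
        Wh = W @ hc + eps
        return np.sum(v * np.log((v + eps) / Wh) - v + Wh) + lam * hc.sum()

    omega = W.sum(axis=0) + lam
    y = (W / omega).T @ (v / (W @ h + eps))
    h_mu = h * y                           # multiplicative update, eq. (11)
    h_dna = dna_step(h, v, W, lam)         # Newton candidate (sketch above)
    return h_dna if cost(h_dna) <= cost(h_mu) else h_mu
```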
Conclusions
• Depending on the case and the matrix sizes, DNA iterations are 2 to 3 times slower than MU iterations.
• In most cases, the diagonal approximation is good enough that faster convergence is observed and a net gain results.
• Since Newton updates cannot in general ensure a monotonic decrease of the cost function, the step size was controlled with a brute-force strategy of falling back to MU whenever the cost increased.
• More refined step-damping methods could speed up DNA by avoiding evaluations of the cost function; this is next on the research agenda.