Nonnegative Matrix Factorization with Sparseness Constraints
S. Race, MA591R
Introduction to NMF
• Factor A = WH
• A – matrix of data
  • m non-negative scalar variables
  • n measurements form the columns of A
• W – m × r matrix of "basis vectors"
• H – r × n coefficient matrix
  • Describes how strongly each building block is present in the measurement vectors
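A minimal NumPy sketch of the shapes involved (the dimensions, rank, and random data below are purely illustrative):

    import numpy as np

    m, n, r = 100, 40, 5                 # variables, measurements, chosen rank
    A = np.abs(np.random.randn(m, n))    # nonnegative data; columns are measurements
    W = np.abs(np.random.randn(m, r))    # basis vectors in the columns of W
    H = np.abs(np.random.randn(r, n))    # H[:, j] mixes the basis for measurement j
    print((W @ H).shape)                 # (100, 40): an m x n approximation A ~ WH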
Introduction to NMF (cont.)
• Purpose:
  • "Parts-based" representation of the data
  • Data compression
  • Noise reduction
• Examples:
  • Term–document matrices
  • Image processing
  • Any data composed of hidden parts
Introduction to NMF (cont.)
• Optimize accuracy of the solution:
  • min ||A − WH||_F subject to W, H ≥ 0
• We can drop the nonnegativity constraints by squaring elementwise:
  • min ||A − (W.*W)(H.*H)||_F over unconstrained W, H
• Many options for the objective function
• Many options for the algorithm
• W, H depend on the initial choices
• Convergence is not always guaranteed
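A small sketch of the Frobenius objective and of the elementwise-squaring trick that removes the nonnegativity constraints (function names are illustrative):

    import numpy as np

    def nmf_objective(A, W, H):
        # Frobenius norm of the residual, ||A - WH||_F
        return np.linalg.norm(A - W @ H, 'fro')

    def unconstrained_objective(A, U, V):
        # optimize freely over U, V; U*U and V*V are nonnegative by construction
        return nmf_objective(A, U * U, V * V)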
Common Algorithms
• Alternating least squares
  • Paatero 1994
• Multiplicative update rules
  • Lee-Seung 2000, Nature
  • Used by Hoyer
• Gradient descent
  • Hoyer 2004
  • Berry-Plemmons 2004
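As a concrete example, a sketch of the Lee-Seung multiplicative update rules for the Frobenius objective; the iteration count and the small epsilon guarding the divisions are illustrative choices:

    import numpy as np

    def multiplicative_updates(A, W, H, n_iter=200, eps=1e-9):
        # each update keeps W, H nonnegative and does not increase ||A - WH||_F
        for _ in range(n_iter):
            H *= (W.T @ A) / (W.T @ W @ H + eps)
            W *= (A @ H.T) / (W @ H @ H.T + eps)
        return W, H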
Why is sparsity important?
• Nature of some data
  • Text mining
  • Disease patterns
• Better interpretation of results
• Storage concerns
Non-negative Sparse Coding I
• Proposed by Patrik Hoyer in 2002
• Add a penalty function to the objective to encourage sparseness
• OBJ: min ||A − WH||²_F + λ f(H), subject to W, H ≥ 0
• Parameter λ controls the trade-off between accuracy and sparseness
• f is strictly increasing: f(H) = Σᵢⱼ Hᵢⱼ works
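A sketch of the penalized objective with the linear choice f(H) = Σ Hᵢⱼ (lam stands for λ):

    import numpy as np

    def sparse_coding_objective(A, W, H, lam):
        # reconstruction error plus the sparseness penalty on H
        return np.linalg.norm(A - W @ H, 'fro')**2 + lam * H.sum()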
Sparse Objective Function
• The objective can always be decreased by scaling W up and H down
  • Set W = cW and H = (1/c)H with c > 1: the product WH is unchanged, but the penalty on H shrinks
• Unchecked, this scaling drives the penalty toward zero, so the objective simply yields the plain NMF solution
• A constraint on the scale of H or W is needed
• Fix the norm of the columns of W or the rows of H
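A sketch of that fix: rescale so each column of W has unit L2 norm and absorb the scale into the corresponding row of H, which leaves the product WH unchanged:

    import numpy as np

    def normalize_columns(W, H):
        norms = np.linalg.norm(W, axis=0)     # L2 norm of each column of W
        return W / norms, H * norms[:, None]  # (W/norms) @ (H*norms) == W @ H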
Non-negative Sparse Coding I
• Pros
  • Simple and efficient
  • The multiplicative update rule is guaranteed to reach the global minimum (of the convex subproblem in H)
• Cons
  • Sparseness is controlled only implicitly: the optimal λ is found by trial and error
  • Sparseness is constrained only for H
NMF with Sparseness Constraints II
• First we need some way to define the sparseness of a vector
• A vector with one nonzero entry is maximally sparse
• A multiple of the vector of all ones, e, is minimally sparse
• The Cauchy-Schwarz (CBS) inequality gives ||x||₂ ≤ ||x||₁ ≤ √n ||x||₂, and the two extremes are attained exactly at these two cases
• How can we combine these ideas?
Hoyer's Sparseness Parameter
• sparseness(x) = (√n − ||x||₁ / ||x||₂) / (√n − 1)
• where n is the dimensionality of x
• The measure is 1 for a vector with a single nonzero entry and 0 for a vector with all entries equal
• It indicates that we can control a vector's sparseness by manipulating its L1 and L2 norms
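A direct transcription of the measure, assuming x is a nonzero NumPy vector:

    import numpy as np

    def sparseness(x):
        # 1 for a single nonzero entry, 0 when all entries are equal
        n = x.size
        l1, l2 = np.abs(x).sum(), np.linalg.norm(x)
        return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)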
Implementing Sparseness Constraints
• Now that we have an explicit measure of sparseness, how can we incorporate it into the algorithm?
• Hoyer: at each step, project each column of the matrix onto the nearest vector of the desired sparseness.
Hoyer's Projection Algorithm
• Problem: given any vector x, find the closest (in the Euclidean sense) non-negative vector s with a given L1 norm and a given L2 norm
• We can easily solve this problem in the three-dimensional case and extend the result
Hoyer's Projection Algorithm
• Set sᵢ = xᵢ + (L1 − Σxᵢ)/n for all i
• Set Z = {}
• Iterate:
  • Set mᵢ = L1/(n − size(Z)) if i not in Z, 0 otherwise
  • Set s = m + β(s − m), where β ≥ 0 solves the quadratic ||m + β(s − m)||₂ = L2
  • If s is non-negative, we're finished
  • Set Z = Z ∪ {i : sᵢ < 0}
  • Set sᵢ = 0 for all i in Z
  • Calculate c = (Σsᵢ − L1)/(n − size(Z))
  • Set sᵢ = sᵢ − c for all i not in Z
The Algorithm in Words
• Project x onto the hyperplane Σsᵢ = L1
• Within this space, move radially outward from the center of the joint-constraint hypersphere toward the point
• If the result is non-negative, the destination is reached
• Else, set the negative values to zero and project to a new point in similar fashion
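A NumPy sketch of the projection following the steps above; it assumes a feasible (L1, L2) pair and glosses over numerical edge cases (e.g., a zero search direction):

    import numpy as np

    def project(x, L1, L2):
        # closest nonnegative s to x with sum(s) = L1 and ||s||_2 = L2
        n = x.size
        s = x + (L1 - x.sum()) / n                 # onto the hyperplane sum(s) = L1
        zeroed = np.zeros(n, dtype=bool)           # the set Z
        while True:
            m = np.where(zeroed, 0.0, L1 / (n - zeroed.sum()))
            d = s - m
            # solve ||m + beta*d||_2 = L2 for the nonnegative root beta
            a, b, c = d @ d, 2 * (m @ d), m @ m - L2**2
            beta = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
            s = m + beta * d
            if (s >= 0).all():
                return s
            zeroed |= s < 0                        # Z = Z U {i : s_i < 0}
            s[zeroed] = 0.0
            shift = (s.sum() - L1) / (n - zeroed.sum())
            s[~zeroed] -= shift                    # restore sum(s) = L1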
NMF with Sparseness Constraints
• Step 1: Initialize W, H to random positive matrices
• Step 2: If constraints apply to W, H, or both, project each column or row respectively to have unchanged L2 norm and the desired L1 norm
NMF with Sparseness Algorithm
• Step 3: Iterate
  • If sparseness constraints on W apply:
    • Set W = W − μ_W (WH − A)Hᵀ
    • Project the columns of W as in Step 2
  • Else, take a standard multiplicative step
  • If sparseness constraints on H apply:
    • Set H = H − μ_H Wᵀ(WH − A)
    • Project the rows of H as in Step 2
  • Else, take a standard multiplicative step
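Putting Steps 1-3 together, a hedged sketch of the whole loop, reusing the project sketch above. The fixed step size mu is a simplification (Hoyer adapts it each iteration), and target_l1 converts a desired sparseness level into the L1 norm the projection should enforce at the vector's current L2 norm:

    import numpy as np

    def nmf_sparseness(A, r, sW=None, sH=None, n_iter=500, mu=1e-3, eps=1e-9):
        m, n = A.shape
        W = np.abs(np.random.randn(m, r))          # Step 1: random positive init
        H = np.abs(np.random.randn(r, n))

        def target_l1(v, s):
            k = v.size                             # invert the sparseness formula
            return np.linalg.norm(v) * (np.sqrt(k) - s * (np.sqrt(k) - 1))

        for _ in range(n_iter):                    # Step 3
            if sW is not None:
                W -= mu * (W @ H - A) @ H.T        # gradient step on W
                for j in range(r):                 # project columns as in Step 2
                    W[:, j] = project(W[:, j], target_l1(W[:, j], sW),
                                      np.linalg.norm(W[:, j]))
            else:
                W *= (A @ H.T) / (W @ H @ H.T + eps)    # multiplicative step
            if sH is not None:
                H -= mu * W.T @ (W @ H - A)        # gradient step on H
                for i in range(r):                 # project rows as in Step 2
                    H[i] = project(H[i], target_l1(H[i], sH),
                                   np.linalg.norm(H[i]))
            else:
                H *= (W.T @ A) / (W.T @ W @ H + eps)    # multiplicative step
        return W, H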
Advantages of the New Method
• Sparseness is controlled explicitly, with a parameter that is easily interpreted
• Sparseness of W, H, or both can be constrained
• The number of iterations required grows very slowly with the dimensionality of the problem
[Figure: iterations required vs. problem dimensionality. Dotted lines represent the min and max iterations; the solid line shows the average number required.]
Text Mining Results
• Text to Matrix Generator (TMG)
  • Dimitrios Zeimpekis and E. Gallopoulos, University of Patras
  • http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/
• NMF with sparseness constraints code from Hoyer's web page