Unsupervised Learning by Convex and Conic Coding
D. D. Lee and H. S. Seung, NIPS 1997
Introduction
• Learning algorithms based on convex and conic encoders are introduced.
• These encoders are less constrained than VQ but more constrained than PCA.
• VQ
  • Encodes each input as the index of the closest prototype.
  • Captures nonlinear structure.
  • Highly localized representation.
• PCA
  • Encodes each input as the coefficients of a linear superposition of a set of basis vectors.
  • Distributed representation.
  • Can only model linear structure.
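To make the contrast concrete, here is a minimal sketch (not from the paper; the helper names `vq_encode` and `pca_encode` and all data are illustrative) of the two kinds of codes:

```python
import numpy as np

# Illustrative sketch: VQ yields a single index (localized code),
# PCA yields a vector of real coefficients (distributed code).

def vq_encode(x, W):
    """VQ: the code is the index of the closest prototype (column of W)."""
    return np.argmin(np.linalg.norm(W - x[:, None], axis=0))

def pca_encode(x, W, mean):
    """PCA: the code is the coefficient vector of a linear superposition
    of orthonormal basis vectors (columns of W)."""
    return W.T @ (x - mean)

rng = np.random.default_rng(0)
X = rng.random((256, 100))          # toy data: 16x16 images flattened, values in [0, 1]
mean = X.mean(axis=1)

W_vq = X[:, :10]                    # toy prototypes: a few training examples
U, _, _ = np.linalg.svd(X - mean[:, None], full_matrices=False)
W_pca = U[:, :10]                   # toy PCA basis from the SVD

x = X[:, 0]
print(vq_encode(x, W_vq))           # a single index
print(pca_encode(x, W_pca, mean))   # 10 real coefficients
```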
• Convex and conic coding can produce sparse, distributed representations.
• Their learning algorithms can be understood as approximate matrix factorization.
Affine, Convex, Conic, and Point Coding
• Definition
  • Given a set of basis vectors $\{w_a\}$, the linear combination $\sum_a v_a w_a$ lies in the
    • Affine hull if $\sum_a v_a = 1$
    • Convex hull if $\sum_a v_a = 1$ and $v_a \ge 0$
    • Conic hull if $v_a \ge 0$
  • Point coding (VQ) is the special case $v_a \in \{0, 1\}$ with $\sum_a v_a = 1$.
Goal of encoding
• Find the nearest point to the input $x$ in the respective hull.
• Minimize the reconstruction error $\|x - \sum_a v_a w_a\|^2$ over the code $v$.
Encoding of convex and conic encoders
• Sparse encoding
  • The optimal codes contain coefficients that vanish, due to the nonnegativity constraints in the optimization (see the sketch below).
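For the conic case, finding this nearest point is exactly a nonnegative least-squares problem, so a minimal sketch (not the paper's code; the helper name `conic_encode` is ours) can lean on SciPy's standard solver:

```python
import numpy as np
from scipy.optimize import nnls

def conic_encode(x, W):
    """Conic encoding: minimize ||x - W v|| subject to v >= 0.
    Returns the code v and the residual norm."""
    v, residual = nnls(W, x)
    return v, residual

rng = np.random.default_rng(0)
W = rng.random((256, 25))       # 25 nonnegative basis vectors as columns
x = rng.random(256)             # one input with entries in [0, 1]
v, err = conic_encode(x, W)
print(np.sum(v > 1e-12), "of 25 coefficients are active")  # typically sparse
```

Convex encoding additionally imposes $\sum_a v_a = 1$, which turns the problem into a small quadratic program over the simplex.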
Learning
• Objective function: $\min_{W, V} \|X - WV\|^2 = \sum_{i\mu} \big(X_{i\mu} - \sum_a W_{ia} V_{a\mu}\big)^2$
  • X : n×m matrix of the training set
  • n : dimension, m : number of data points
  • W : n×r matrix (basis vectors)
  • V : r×m matrix (encodings)
• Description
  • Approximate factorization of the data matrix X into a matrix W of basis vectors and a matrix V of code vectors.
Constraints
• If the input vectors in X have been scaled to the range [0, 1], the constraints on the optimizations are
  • Affine: $\sum_a V_{a\mu} = 1$
  • Convex: $\sum_a V_{a\mu} = 1$, $V_{a\mu} \ge 0$, $0 \le W_{ia} \le 1$
  • Conic: $V_{a\mu} \ge 0$, $0 \le W_{ia} \le 1$
• The nonnegativity constraints prevent cancellations from occurring in the linear combinations.
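A minimal sketch of the constrained factorization for the conic case (our own simplification, assuming plain alternating projected gradient steps with a fixed step size; both the step size and iteration count are arbitrary and would need tuning):

```python
import numpy as np

def conic_factorize(X, r, steps=500, lr=1e-3, seed=0):
    """Approximate X (n x m, entries in [0, 1]) as W @ V with
    0 <= W <= 1 and V >= 0, by alternating projected gradient steps."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, r))
    V = rng.random((r, m))
    for _ in range(steps):
        R = W @ V - X                                 # reconstruction residual
        V = np.clip(V - lr * (W.T @ R), 0.0, None)    # gradient step, project V >= 0
        R = W @ V - X
        W = np.clip(W - lr * (R @ V.T), 0.0, 1.0)     # gradient step, project 0 <= W <= 1
    return W, V
```

Each update follows the gradient of the squared reconstruction error and then clips back into the feasible set; the paper's exact optimization schedule is not reproduced here.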
Example: modeling handwritten digits
• Experimental setup
  • Affine, Convex, Conic, and VQ learning were applied to the USPS database.
  • Handwritten digits segmented from actual zip codes.
  • 7291 training and 2007 test images were normalized to a 16×16 grid with pixel intensities in the range [0, 1].
  • Training examples were segregated by digit class.
  • Separate basis vectors were trained for each of the ten classes using the four encodings.
VQ
• The k-means algorithm was used.
  • Restarted with various initial conditions, and the best solution was chosen (see the sketch after this slide).
Affine
• Determines the affine space that best models the input data.
• The basis vectors have no obvious interpretation.
Convex
• Finds the r basis vectors whose convex hull best fits the input data.
  • Alternates between projected gradient steps on W and V.
• The basis vectors are interpretable as templates and are less blurred than those found by VQ.
• Many invariant transformations are eliminated, because they would violate the nonnegativity constraints.
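The restart-and-keep-best procedure for the VQ baseline corresponds directly to the `n_init` parameter of scikit-learn's k-means, used here as a convenience stand-in for the paper's own implementation (the data is a placeholder):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_class = rng.random((700, 256))     # stand-in for one digit class: flattened 16x16 images as rows

# r = 25 prototypes; 10 restarts with different initializations, and the run
# with the lowest within-cluster squared error is kept automatically.
km = KMeans(n_clusters=25, n_init=10, random_state=0).fit(X_class)
prototypes = km.cluster_centers_     # shape (25, 256): the VQ basis vectors
```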
Conic
• Finds basis vectors whose conic hull best models the input images.
• The representation allows combinations of basis vectors.
• The basis vectors found are features rather than templates.
Classification
• Each test image is separately reconstructed with the model for each digit class.
• The image is associated with the model having the smallest reconstruction error.
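In code, this decision rule might look like the following sketch (`encode` could be the hypothetical `conic_encode` from above; `models` is assumed to map each digit label to its learned basis matrix W):

```python
def classify(x, models, encode):
    """Assign x to the class whose model reconstructs it best.
    models: dict mapping class label -> basis matrix W (n x r)
    encode: function (x, W) -> (code v, residual norm)"""
    errors = {label: encode(x, W)[1] for label, W in models.items()}
    return min(errors, key=errors.get)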
Results
• With r = 25 patterns per digit class
  • Convex: error rate = 113/2007 = 5.6%
    • With r = 100 patterns: 89/2007 = 4.4%
  • Conic: 138/2007 = 6.9%
    • With r > 50, performance worsens as the features shrink to small spots.
• Non-trivial correlations still remain in the encodings and also need to be taken into account.
Discussion
• Convex coding is similar to other locally linear models.
• Conic coding is similar to the noisy-OR and harmonium models.
  • Conic coding uses continuous variables rather than binary ones.
  • This makes the encoding computationally tractable and allows for interpolation between basis vectors.
• Convex and conic coding can be viewed as probabilistic latent variable models.
  • No explicit model P(v_a) for the hidden variables was used.
    • This limits the quality of the conic models.
  • Building hierarchical representations is needed.