Nonlinear Unsupervised Feature Learning: How Local Similarities Lead to Global Coding • Amirreza Shaban
Outline • Feature Learning • Coding methods • Vector Quantization • Sparse Coding • Local Coordinate Coding • Locality-constrained Linear Coding • Local Similarity Global Coding • Experiments
Feature Learning • The goal of feature learning is to convert a complex, high-dimensional nonlinear learning problem into a much simpler linear one. • The learned features capture the nonlinearity of the data structure so that the problem can be solved by a simple linear learning method. • The topic is closely related to nonlinear dimensionality reduction.
Coding Methods • Coding methods are a class of algorithms aimed at finding high-level representations of low-level features. • Given unlabeled input data $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ and a codebook $C = \{c_1, \dots, c_m\}$ of $m$ atoms, the goal is to learn a coding vector $\gamma(x) \in \mathbb{R}^m$ whose $i$-th element indicates the affinity of the data point $x$ to the codebook atom $c_i$.
Vector Quantization • Assign each data point to its nearest dictionary basis: $\gamma_i(x) = 1$ if $i = \arg\min_j \|x - c_j\|_2$, and $\gamma_i(x) = 0$ otherwise. • The dictionary bases are the cluster centers learned by K-means.
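To make the one-hot coding concrete, here is a minimal sketch in Python (not from the slides): it learns the codebook with scikit-learn's KMeans and assigns each point the indicator vector of its nearest center. The data matrix `X` and the number of atoms below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_code(X, n_atoms=3, random_state=0):
    """One-hot VQ codes: each point is assigned to its nearest K-means center."""
    km = KMeans(n_clusters=n_atoms, n_init=10, random_state=random_state).fit(X)
    codes = np.zeros((X.shape[0], n_atoms))
    codes[np.arange(X.shape[0]), km.predict(X)] = 1.0  # gamma_i(x) = 1 only for the nearest atom
    return codes, km.cluster_centers_

# Illustrative usage on random 2-D points
X = np.random.RandomState(0).randn(100, 2)
codes, C = vq_code(X)
```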
Vector Quantization • Figure: three Voronoi regions R1, R2, and R3, whose points are coded as the one-hot vectors [1, 0, 0], [0, 1, 0], and [0, 0, 1] respectively.
Sparse Coding • Each data point is represented by a linear combination of a small number of codebook atoms. • The coefficients are found by solving the minimization problem $\min_{\gamma}\; \|x - C\gamma\|_2^2 + \lambda \|\gamma\|_1$, where the $\ell_1$ penalty drives most entries of $\gamma$ to zero.
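As an illustration only (not from the slides), the per-point problem above can be solved with scikit-learn's Lasso by treating the $d$ feature dimensions as samples and the $m$ atoms as features; the codebook `C` and the value of `lam` are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(x, C, lam=0.1):
    """Solve min_gamma ||x - C @ gamma||^2 + lam * ||gamma||_1 for one data point.

    C is a (d x m) codebook whose columns are the atoms; returns the m-dimensional code.
    Note: Lasso scales the squared-error term by 1/(2d), which only rescales lam.
    """
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    model.fit(C, x)          # 'samples' = feature dimensions, 'features' = atoms
    return model.coef_       # sparse code gamma(x)
```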
Local Coordinate Coding • It is empirically observed that sparse coding performs better when the non-zero coefficients correspond to bases that are local to the data point. • The conclusion is that locality is more essential than sparsity.
Local Coordinate Coding • Learning method: the reconstruction error is penalized with a distance-weighted $\ell_1$ term, $\min_{\gamma}\; \|x - C\gamma\|_2^2 + \mu \sum_j |\gamma_j|\,\|c_j - x\|_2^2$, so that only atoms close to $x$ receive large coefficients. • It is proved that LCC can learn an arbitrary nonlinear function on the manifold. • The rate of convergence depends only on the intrinsic dimensionality of the manifold, not on the ambient dimension $d$.
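A minimal sketch of one way to solve a distance-weighted objective of this form (the exact exponent and solver used in the paper are assumptions): rescaling each atom by its locality weight turns the weighted $\ell_1$ penalty into a plain Lasso problem.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lcc_code(x, C, mu=0.1, eps=1e-8):
    """Distance-weighted sparse code: min ||x - C g||^2 + mu * sum_j |g_j| * ||c_j - x||^2.

    C is a (d x m) codebook with atoms as columns; returns the m-dimensional code g."""
    w = np.linalg.norm(C - x[:, None], axis=0) ** 2 + eps  # locality weights ||c_j - x||^2
    C_tilde = C / w                                        # substitute beta_j = w_j * g_j
    model = Lasso(alpha=mu, fit_intercept=False, max_iter=10000)
    model.fit(C_tilde, x)                                  # plain L1 problem in beta
    return model.coef_ / w                                 # map back: g_j = beta_j / w_j
```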
Locality-constrained Linear Coding • LCC has a high computational cost and is not suitable for large-scale learning problems. • LLC first guarantees locality by incorporating only the $k$ nearest bases in the coding process, and second minimizes the reconstruction error on the local patch: $\min_{\gamma}\; \|x - B_x\gamma\|_2^2 \;\text{s.t.}\; \mathbf{1}^\top\gamma = 1$, where $B_x$ contains the $k$ atoms nearest to $x$.
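Below is a sketch of the k-nearest-neighbour approximation commonly used for LLC (the regularizer value and the sum-to-one normalization are assumptions consistent with the description above): reconstruct $x$ from its $k$ nearest atoms by solving a small constrained least-squares system.

```python
import numpy as np

def llc_code(x, C, k=5, reg=1e-4):
    """Approximate LLC code: reconstruct x from only its k nearest codebook atoms.

    C is an (m x d) codebook with atoms as rows; returns an m-dimensional code."""
    m = C.shape[0]
    # Locality: keep only the k atoms nearest to x
    idx = np.argsort(np.linalg.norm(C - x, axis=1))[:k]
    z = C[idx] - x                              # local bases, shifted to the data point
    G = z @ z.T                                 # local Gram matrix (k x k)
    G += reg * np.trace(G) * np.eye(k)          # regularize for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                                # enforce the sum-to-one constraint
    code = np.zeros(m)
    code[idx] = w                               # all other coefficients stay zero
    return code
```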
Drawback of locality-constrained methods • They are incapable of representing similarity between non-neighboring points: codes of points that are not close on the data manifold share no active bases.
Drawback of locality-constrained methods • A linear SVM labeling function on the codes can be written as $f(x) = \sum_i \alpha_i y_i\, \gamma(x_i)^\top \gamma(x) + b$. • For points $x$ whose code shares no active bases with the codes of the training points, every inner product $\gamma(x_i)^\top\gamma(x)$ vanishes, so the SVM fails to predict the label of $x$.
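A toy numeric illustration of this failure mode (all values below are made up): when a test point's code shares no active atoms with any training code, every inner product in the expansion is zero and the SVM score collapses to the bias.

```python
import numpy as np

# Codes of two hypothetical support vectors, each active on different atoms
train_codes = np.array([[1.0, 0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0]])
alpha_y = np.array([0.7, -0.7])     # illustrative alpha_i * y_i values
b = 0.0

def svm_score(code):
    """f(x) = sum_i alpha_i * y_i * <gamma(x_i), gamma(x)> + b."""
    return alpha_y @ (train_codes @ code) + b

# A far-away test point coded only on atoms the support vectors never use
far_code = np.array([0.0, 0.0, 0.5, 0.5])
print(svm_score(far_code))          # 0.0 -> the code carries no label information for this point
```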
Local Similarity Global Coding • The idea is to propagate the coding coefficients along the data manifold via a $t$-step diffusion on the neighborhood graph, so that atoms far from $x$ in Euclidean distance can still receive non-zero coefficients if they are reachable along the manifold. • When $t = 1$, the coding is similar to the recent locality-constrained coding methods.
Inductive LSGC • The kernel function is computed from the $t$-step transition probabilities of a random walk on the data graph; it is referred to as the diffusion kernel of order $t$. • The similarity is high if $x$ and $y$ are connected to each other by many paths in the graph. • The parameter $t$ controls the resolution at which we look at the data. • Computing the diffusion kernel over all data points has a high computational cost.
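Here is a dense-graph sketch of a $t$-step diffusion similarity (the Gaussian affinity and the row normalization are assumptions; the paper may build the graph differently): many short paths between two points translate into a large entry of $P^t$.

```python
import numpy as np

def diffusion_similarity(X, t=3, sigma=1.0):
    """t-step diffusion similarities on a Gaussian affinity graph (dense sketch).

    Returns P^t, whose (i, j) entry aggregates all length-t walks from x_i to x_j."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))          # pairwise Gaussian affinities
    P = W / W.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    return np.linalg.matrix_power(P, t)           # t-step transition probabilities
```

Raising $P$ to the power $t$ over all $n$ data points is what makes the fully transductive computation expensive for large $n$.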
Inductive LSGC • A two-step process: • Projection: find a vector $f$ in which the $i$-th element represents the one-step similarity between the data point $x$ and basis $c_i$. • Mapping: propagate the one-step similarities in $f$ to the other bases by a $(t-1)$-step diffusion process.
Inductive LSGC • The coding coefficient of a data point $x$ in basis $c_i$ is obtained by following the one-step similarities in $f$ through $(t-1)$ further diffusion steps among the bases. • The overall coding can therefore be written as a matrix–vector product between the $(t-1)$-step transition matrix of the bases and $f(x)$.
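A sketch of the projection-then-mapping computation (the Gaussian one-step similarity, its normalization, and the direction in which the transition matrix is applied are assumptions): $f$ holds the one-step similarities of $x$ to the bases, and $(t-1)$ diffusion steps among the bases spread them along the manifold.

```python
import numpy as np

def lsgc_code(x, C, P, t=3, sigma=1.0):
    """Inductive LSGC sketch: one-step projection onto the bases, then (t-1)-step diffusion.

    C: (m x d) matrix of bases, P: (m x m) row-stochastic transition matrix among the bases."""
    # Projection: one-step similarity between x and every basis
    f = np.exp(-np.sum((C - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    f /= f.sum()                                      # normalize to a distribution over bases
    # Mapping: propagate the one-step similarities along the basis graph
    return np.linalg.matrix_power(P, t - 1).T @ f     # m-dimensional code gamma(x)
```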
Inductive-to-transductive convergence • The inductive code $q$ and the transductive code $p$ of a data point are related through the diffusion process, and their difference converges to zero.
Experiments