Learn how to visualize and transform large datasets, implement basic linear algebra operations, and connect them to neuronal models and brain function using Principal Components Analysis.
A principled way to principal components analysis. Daniel Zysman, Lecturer.
Teaching activity objectives • Visualize large data sets. • Transform the data to aid in this visualization. • Cluster the data. • Implement basic linear algebra operations. • Connect these operations to neuronal models and brain function.
Context for the activity • Homework assignment in 9.40 Introduction to Neural Computation (sophomore/junior). • In-class activity in 9.014 Quantitative Methods and Computational Models in Neuroscience (1st-year PhD).
MNIST data set • 28 by 28 pixel, 8-bit grayscale images. • These images live in a 784-dimensional space (28 × 28 = 784). http://yann.lecun.com/exdb/mnist/
One possible visualization: scatter plots of one pixel's intensity against another's. But there are more than 300,000 possible pairwise pixel plots!
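The count comes from choosing 2 of the 784 pixels (a short check of the arithmetic, in our own notation):

\[
\binom{784}{2} = \frac{784 \times 783}{2} = 306{,}936.
\]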
Is there a more principled way? • Represent the data in a new basis set. • Aids in visualization and potentially in clustering and dimensionality reduction. • PCA provides such a basis set by finding the directions that capture the most variance. • The directions are ranked by decreasing variance. • It diagonalizes the covariance matrix.
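In symbols (a standard formulation; the notation is ours, not the slides'): for mean-centered data \(X_c\) with samples as rows,

\[
C = \frac{1}{n-1} X_c^\top X_c, \qquad C = V \Lambda V^\top,
\]

where the columns of \(V\) are the principal directions and the diagonal entries of \(\Lambda\) are the variances captured along them, sorted in decreasing order.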
Pedagogical approach • Guide them step by step to implement PCA. • Emphasize visualizations and geometrical approach/intuition. • We don’t use the MATLAB canned function for PCA. • We want students to get their hands “dirty”. This helps build confidence and deep understanding.
PCA Mantra (a minimal sketch implementing these steps follows below) • Reshape the data to the proper format for PCA. • Center the data by performing mean subtraction. • Construct the data covariance matrix. • Perform SVD to obtain the eigenvalues and eigenvectors of the covariance matrix. • Compute the variance explained per component and plot it. • Reshape the eigenvectors and visualize them as images. • Project the mean-subtracted data onto the eigenvector basis.
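A minimal NumPy sketch of these steps (the course assignment is in MATLAB; this translation, the `images` array, and the variable names are illustrative assumptions, not the assignment's code):

```python
import numpy as np

# Assumes `images` is an (n_samples, 28, 28) array of MNIST digits.

# 1. Reshape: each 28x28 image becomes one row of an (n_samples, 784) matrix.
X = images.reshape(len(images), -1).astype(float)

# 2. Center the data by subtracting the mean image (pixel-wise mean).
X_centered = X - X.mean(axis=0)

# 3. Data covariance matrix (784 x 784).
C = np.cov(X_centered, rowvar=False)

# 4. SVD of the symmetric covariance matrix: the columns of U are its
#    eigenvectors and the entries of S its eigenvalues, in decreasing order.
U, S, Vt = np.linalg.svd(C)

# 5. Variance explained per principal component.
var_explained = S / S.sum()

# 6. Each eigenvector can be reshaped back to 28x28 and shown as an image.
first_eigenimage = U[:, 0].reshape(28, 28)

# 7. Project the mean-subtracted data onto the eigenvector basis.
projections = X_centered @ U
```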
Projections onto the first 2 axes. The first two PCs capture ~37% of the variance. The data forms clear clusters that are almost linearly separable.
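A possible way to reproduce this kind of plot, assuming the `projections` array from the sketch above and an integer `labels` array of digit classes (both names are assumptions):

```python
import matplotlib.pyplot as plt

# Scatter plot of the data in the plane spanned by the first two PCs.
plt.scatter(projections[:, 0], projections[:, 1], c=labels, cmap='tab10', s=5)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.colorbar(label='digit')
plt.title('Projections onto the first 2 principal components')
plt.show()
```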
Hebbian Learning (Donald Hebb) • 1949 book, 'The Organization of Behavior': a theory about the neural bases of learning. • Learning takes place at synapses. • Synapses get modified: they get stronger when the pre- and post-synaptic cells fire together. • "Cells that fire together, wire together."
Building Hebbian synapses. The plain Hebbian rule is unstable: the weights grow without bound.
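One standard way to see the instability (our own notation, not the slides'): for a linear neuron \(y = \mathbf{w}^\top \mathbf{x}\) and the plain Hebbian update \(\Delta \mathbf{w} = \eta\, y\, \mathbf{x}\), averaging over zero-mean inputs gives

\[
\langle \Delta \mathbf{w} \rangle = \eta\, \langle \mathbf{x}\mathbf{x}^\top \rangle\, \mathbf{w} = \eta\, C\, \mathbf{w},
\]

so the weight vector rotates toward the leading eigenvector of the covariance matrix \(C\), but its norm keeps growing, since every update has a non-negative component along \(\mathbf{w}\) (\(\mathbf{w}^\top \Delta \mathbf{w} = \eta\, y^2 \ge 0\)).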
Oja's rule (Erkki Oja) • Adds a feedback, forgetting term, or regularizer to the Hebbian update. • Stabilizes the Hebbian rule. • Leads to a covariance learning rule: the weights converge to the first eigenvector of the covariance matrix. • Similar to the power iteration method. Reference: E. Oja, "A simplified neuron model as a principal component analyzer," Journal of Mathematical Biology, 15:267-273 (1982).
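A minimal NumPy simulation sketch of Oja's rule, \(\Delta \mathbf{w} = \eta\, y\,(\mathbf{x} - y\,\mathbf{w})\) with \(y = \mathbf{w}^\top \mathbf{x}\) (the synthetic data, learning rate, and iteration counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: 2000 zero-mean samples in 10 dimensions,
# with the first axis carrying most of the variance.
n, d = 2000, 10
X = rng.normal(size=(n, d))
X[:, 0] *= 3.0
X -= X.mean(axis=0)

# Oja's rule: w <- w + eta * y * (x - y * w), with y = w . x
eta = 1e-3
w = rng.normal(size=d)
w /= np.linalg.norm(w)
for epoch in range(20):
    for x in X:
        y = w @ x
        w += eta * y * (x - y * w)

# Compare the learned weights with the first eigenvector of the covariance.
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
print("|cosine| between w and v1:", abs(w @ v1) / np.linalg.norm(w))
```

With these settings the printed cosine should approach 1, illustrating the power-iteration-like convergence mentioned above.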
Learning outcomes • Visualize and manipulate a relatively large and complex data set. • Perform PCA by building it step by step. • Gain an intuition of the geometry involved in a change of basis and projections. • Start thinking about basic clustering algorithms. • Discuss dimensionality reduction and other PCA applications.
Learning outcomes (cont.) • Discuss the assumptions, limitations, and shortcomings of applying PCA in different contexts. • Build a model of how PCA might actually take place in neural circuits. • Follow-up: eigenfaces. Is the brain doing PCA to recognize faces?