Computational approaches to vision science NRS 495 – Neuroscience Seminar Christopher DiMattina, PhD
Marr’s levels of description NRS 495 - Grinnell College - Fall 2012
David Marr • British computational neuroscientist (1945-1980) • Contributions to cognitive science and machine vision
Understanding vision • The stated goal of visual neuroscience is to understand how the visual brain works • However, it is unclear what is even meant by this • Existing experimental paradigms are inadequate for understanding how it works
Neurophysiology • Suppose we found the grandmother cell (WHAT) • Would not tell us HOW response properties are generated from simple neurons • Would not tell us WHY we have such neurons
HOW • To get HOW, you need a biologically plausible neural network model – computational neuroscience • Still, this does not tell you WHY
WHY • To understand WHY, you need to know what problem the system is trying to solve • Computational theory of visual information processing “A wing would be a most mysterious structure if one did not know that birds flew” - Horace B. Barlow (1961)
Machine Vision • To understand the visual system, one must specify the problem and what computations could solve it • If you really know what information processing is going on, you can implement it on a computer
Three levels of description • Computational Theory • What is the goal of the computation? What is its logic? • Representation and Algorithm • How can you represent the inputs and outputs? What is the algorithm for the transformation? • Hardware Implementation • How do you implement the representation and algorithm?
Example • Computation: addition of numbers for a cash register • Representation: binary or base-10 numbers • Algorithm: the one you learned in grade school • Implementation: mechanical, computer, etc.
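The cash-register example can be made concrete: the grade-school carry algorithm is one and the same computation whether the representation is base 10 or binary. A minimal illustrative sketch (the function name and digit-string interface are hypothetical):

```python
# Grade-school addition with carries, run on two different
# representations (base 10 and binary) of the same computation.

def add_digits(a: str, b: str, base: int) -> str:
    a, b = a[::-1], b[::-1]          # work right-to-left, as taught in school
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i], base) if i < len(a) else 0
        db = int(b[i], base) if i < len(b) else 0
        carry, digit = divmod(da + db + carry, base)
        out.append(digit)
    if carry:
        out.append(carry)
    return "".join("0123456789"[d] for d in reversed(out))

print(add_digits("27", "15", 10))       # 42
print(add_digits("11011", "1111", 2))   # 101010 (= 42 in binary)
```

Same goal and algorithm, different representation; only the implementation level (here, a Python interpreter) would change again on a mechanical register.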
Representational framework for vision • Image (intensity) • Primal sketch (edges) • 2 ½ D sketch (surfaces) • 3D model representation
Edge detection and the primal sketch • Luminance edges are changes in image intensity • Maxima of the absolute value of the first derivative • Zero-crossings of the second derivative
Edges occur on multiple scales • Apply Gaussian blurring to the image to select a scale
Edge detection operator • Apply Gaussian blurring to select scale • Take the second derivative (Laplacian) • Find zero-crossings • Together these define the Laplacian of Gaussian (LoG) operator
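The three steps above can be sketched in one dimension: build a LoG kernel, filter a step edge, and read off the zero-crossing. A minimal sketch using numpy only; the kernel radius and σ are assumed values:

```python
import numpy as np

def log_kernel(sigma):
    # 1-D Laplacian-of-Gaussian kernel: second derivative of a Gaussian.
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    k = (x ** 2 / sigma ** 4 - 1 / sigma ** 2) * g
    return k - k.mean()              # zero-sum: no response in uniform regions

# A luminance step edge: dark region followed by a light region.
signal = np.concatenate([np.zeros(20), np.ones(20)])

response = np.convolve(signal, log_kernel(sigma=2.0), mode="same")

# Zero-crossings of the filtered signal mark the edge
# (ignoring numerically tiny values far from the edge).
strong = np.abs(response) > 1e-6
zc = np.where((response[:-1] * response[1:] < 0) & strong[:-1] & strong[1:])[0]
print(zc)   # the crossing sits at the step, between samples 19 and 20
```

Blurring with a wider Gaussian (larger σ) before taking the Laplacian selects coarser-scale edges, as the previous slide describes.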
Linear filtering • Neuron receptive field modeled as a set of numbers indicating the spatial arrangement of excitation and inhibition • Predicted response of the model neuron is given by multiplying these weights element-wise with the image and summing
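A toy version of this multiply-and-sum prediction, with hypothetical weight values for a receptive field with an excitatory center column and inhibitory flanks:

```python
import numpy as np

# A receptive field written as a grid of weights: excitatory center
# column (+2) flanked by inhibition (-1). Values are hypothetical.
rf = np.array([[-1.0, 2.0, -1.0],
               [-1.0, 2.0, -1.0],
               [-1.0, 2.0, -1.0]])

# An image patch with a vertical bright bar over the excitatory region.
bar = np.array([[0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0]])

# Predicted linear response: multiply weights by the image and sum.
bar_response = float(np.sum(rf * bar))
flat_response = float(np.sum(rf * np.ones((3, 3))))

print(bar_response)    # 6.0: strong response to the preferred stimulus
print(flat_response)   # 0.0: uniform light excites and inhibits equally
```

The zero response to uniform illumination is the same property enforced for the LoG operator: the weights sum to zero.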
Example operators
Laplacian of Gaussian • Resembles retinal ganglion cell receptive fields with center-surround structure
Edge detection
Operator's predictions match retinal ganglion cell responses
Marr’s framework • Specify the problem to be solved (edge detection) • Develop a computational theory - derive center-surround operators similar to retinal ganglion cell receptive fields • For a pixel image, the algorithm is convolution with the operator • Can implement in a variety of ways (brain, computer, etc.)
Barlow and efficient coding
Horace Barlow • Neurophysiologist and theorist of vision • Two major (related) ideas • Single neuron doctrine • Efficient coding hypothesis
Efficient coding • Barlow hypothesizes that sensory relays recode messages so that redundancy is reduced but little information is lost • For instance, linearly arranged retinal ganglion cells may often fire together when there is an edge • V1 recodes this information in a less redundant manner which preserves the essential information
Economy of impulses • Stimuli occurring most often should be coded with a small number of spikes • Stimuli occurring less often should be coded with a large number of spikes • Over the distribution of stimuli, this economizes the code
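The economy argument can be checked with a toy calculation: the expected spike count is the probability-weighted sum of the per-stimulus spike costs. Probabilities and spike counts below are hypothetical:

```python
import numpy as np

# Hypothetical stimulus distribution and two spike-count assignments.
p = np.array([0.7, 0.2, 0.1])       # stimulus probabilities, common first

economical = np.array([1, 3, 5])    # few spikes for common stimuli
wasteful = np.array([5, 3, 1])      # few spikes for rare stimuli

avg_econ = float(p @ economical)    # 0.7*1 + 0.2*3 + 0.1*5 = 1.8
avg_waste = float(p @ wasteful)     # 0.7*5 + 0.2*3 + 0.1*1 = 4.2

print(avg_econ, avg_waste)          # the economical code spends fewer spikes
```

Averaged over the stimulus distribution, assigning cheap codes to common stimuli cuts the expected cost, the same logic as variable-length source coding.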
Neural activity is metabolically expensive • Recent calculations based on cortical metabolism suggest that at most 1% of cells can be firing strongly at any time (Lennie 2003)
Experimental Evidence: Laughlin 1981 • Can maximize information transmission about luminance by making sure all response levels are used with equal frequency • Explains the contrast-response function of fly visual interneurons
Sparse coding
Two ways to exploit redundancy
Compact coding • Assume you have two neurons, each sensitive to one dimension • If the data are correlated, there is redundancy in their responses
Compact coding • One can represent this data more efficiently with one neuron instead of two
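Compact coding is what principal component analysis does: rotate to decorrelated axes and keep the high-variance one. A minimal sketch with two hypothetical correlated "neurons":

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical neurons with strongly correlated responses.
x = rng.normal(size=1000)
data = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(size=1000)])

# Compact coding via PCA: eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(data.T))   # ascending eigenvalues

# The leading component carries nearly all the variance, so one
# projection can stand in for the two redundant neurons.
ratio = eigvals[-1] / eigvals.sum()
print(ratio)   # close to 1.0

# The compact one-dimensional code: project onto the top eigenvector.
code = data @ eigvecs[:, -1]
```

Dropping the low-variance component loses little: that is the sense in which one neuron suffices where two redundant ones were used.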
Sparse coding • Suppose the data are described by a two-armed distribution (non-Gaussian) • What code would represent the data with the fewest active neurons?
Kurtosis • One characteristic of a sparse code is that response distributions over all stimuli have high kurtosis • Cells mostly quiet, but respond strongly to only a few stimuli
V1 responses to natural images
Critical question • If we learn a linear basis for natural images which maximizes the statistical independence and sparseness of cell responses, what would the basis functions look like?
Basis representations of images • An image patch can be represented as a sum of basis patches • Fourier transform represents an image in terms of a sum of sine-wave gratings
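The Fourier case can be written out explicitly: the 2-D DFT coefficients are the weights, and each basis patch is a complex sinusoid (a grating). A minimal sketch on a random 8×8 patch:

```python
import numpy as np

rng = np.random.default_rng(3)

# A random 8x8 "image patch".
n = 8
patch = rng.random((n, n))

# Fourier coefficients: weights on sinusoidal basis patches.
coeffs = np.fft.fft2(patch)

# Reconstruct the patch explicitly as a weighted sum of basis patches
# (each basis patch is a complex sinusoid, i.e. a grating).
ys, xs = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
recon = np.zeros((n, n), dtype=complex)
for u in range(n):
    for v in range(n):
        basis = np.exp(2j * np.pi * (u * ys + v * xs) / n) / n ** 2
        recon += coeffs[u, v] * basis

print(np.allclose(recon.real, patch))   # True: patch = sum of gratings
```

The Fourier basis is fixed in advance; the question on the previous slide is what basis emerges when it is instead learned from natural images under a sparseness criterion.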
Optimization problem • Accurately reconstruct natural images • Maximize sparseness of neuron responses
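These two goals combine into one objective in the Olshausen & Field spirit: reconstruction error plus a sparseness penalty on the coefficients, ½‖I − Φa‖² + λΣ|aᵢ|. A minimal sketch that infers sparse coefficients for one fixed random dictionary by iterative soft-thresholding (ISTA); the dictionary, λ, and the choice of ISTA are all assumptions for illustration, not the original method:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical random dictionary Phi (columns = basis functions) and
# a random "image" vector I standing in for a natural image patch.
n_pix, n_basis = 64, 128
Phi = rng.normal(size=(n_pix, n_basis))
Phi /= np.linalg.norm(Phi, axis=0)       # unit-norm basis functions
I = rng.normal(size=n_pix)

lam = 0.5                                # sparseness penalty (assumed)
L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
a = np.zeros(n_basis)

# ISTA: minimize 0.5*||I - Phi a||^2 + lam*sum|a| by alternating a
# gradient step on reconstruction error with soft-thresholding.
for _ in range(200):
    grad = Phi.T @ (Phi @ a - I)
    a = a - grad / L
    a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)

obj = 0.5 * np.sum((I - Phi @ a) ** 2) + lam * np.sum(np.abs(a))
obj0 = 0.5 * np.sum(I ** 2)              # objective at a = 0
sparsity = float(np.mean(a == 0.0))
print(obj < obj0, sparsity)              # objective drops; many exact zeros
```

In the full model the dictionary Φ itself is also learned over many patches, and it is those learned basis functions that are shown on the next slide.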
The learned filters
V1 neurons form a sparse code • V1 filters → sparse code • Sparse code → V1 filters
ICA • Similar results are obtained when one learns a transformation between a set of inputs and outputs that maximizes output entropy (Bell & Sejnowski 1997) • Exactly the idea proposed by Barlow
Learning hierarchical representations
Can we apply similar ideas to learn more complex representations? • Over all images, responses of V1 neurons follow a Laplacian distribution with constant variance λ • For particular regions, there are characteristic patterns of variance and covariance in the activity histograms
Variance patterns
Complicated statistical dependencies
Hierarchical model • Assume patterns of variance are generated by a code of sparse, independent variables v • Learn a set of weights B which describe the commonly occurring patterns in natural images
Variance components • Learn sensitivity to different higher-order patterns (textures)
Unit activities generalize
More recent model • Learned patterns of covariance in activities of V1 cells • Replicates many response properties observed in complex cells and nonlinear neurons in V2 • Outputs segregate textures well