Latent Variable / Hierarchical Models in Computational Neural Science
Ying Nian Wu, UCLA Department of Statistics
March 30, 2011
Outline • Latent variable models in statistics • Primary visual cortex (V1) • Modeling and learning in V1 • Layered hierarchical models • Joint work with Song-Chun Zhu and Zhangzhang Si
Latent variable models: hidden variables Z, observed variables Y. Learning from examples; inference of the hidden variables given the observed data.
Latent variable models: two classic examples are the mixture model and factor analysis.
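As a sketch of the two examples (standard textbook forms, not taken from the slides themselves):

```latex
% Mixture model: hidden label Z, observed Y
P(Z = k) = \pi_k, \qquad Y \mid Z = k \sim \mathrm{N}(\mu_k, \Sigma_k),
\qquad\Rightarrow\qquad
p(Y) = \sum_{k=1}^{K} \pi_k \, \mathrm{N}(Y;\, \mu_k, \Sigma_k).

% Factor analysis: hidden factor Z, observed Y
Z \sim \mathrm{N}(0, I_d), \qquad
Y = \Lambda Z + \varepsilon, \qquad
\varepsilon \sim \mathrm{N}(0, \Psi), \ \Psi \ \text{diagonal}.
```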
Latent variable models: hidden Z, observed Y. Learning from examples: maximum likelihood via EM or gradient ascent. Inference / explaining away: the E-step / imputation.
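A generic statement of the learning/inference loop referred to here (standard EM, with Y observed, Z hidden, parameters θ):

```latex
\text{E-step (inference / imputation):}\quad
Q(\theta \mid \theta^{(t)}) =
\mathrm{E}_{Z \sim p(Z \mid Y,\, \theta^{(t)})}\big[\log p(Y, Z \mid \theta)\big],

\qquad
\text{M-step (learning):}\quad
\theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)}).
```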
Computational neural science: hidden Z = internal representation by neurons, observed Y = sensory data from the outside environment; the parameters are the connection weights. Hierarchical extension: model Z itself by another layer of hidden variables above it. Inference / explaining away.
Visual cortex: layered hierarchical architecture, with bottom-up and top-down processing. V1 = primary visual cortex, containing simple cells and complex cells. (Source: Scientific American, 1999)
Simple V1 cells (Daugman, 1985). Gabor wavelets: localized sine and cosine waves; the dictionary is generated by translation, rotation, and dilation of a single mother function.
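One common parametrization of the Gabor wavelet (the specific form and constants here are illustrative, not copied from the slide):

```latex
G(x, y) = \exp\!\Big(-\frac{x^2 + \gamma^2 y^2}{2\sigma^2}\Big)\,
\cos\!\Big(\frac{2\pi x}{\lambda} + \phi\Big),
```

with the sine-phase partner obtained at φ = π/2, and the full dictionary generated by translating, rotating, and dilating (x, y).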
V1 simple cells respond to edges in the image pixels.
Complex V1 cells (Riesenhuber and Poggio, 1999) • Larger receptive field • Less sensitive to deformation. Architecture: image pixels → local sum → V1 simple cells → local max → V1 complex cells.
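In equation form (a schematic reading of the diagram, not the authors' exact notation), the two stages are a local sum followed by a local max:

```latex
\text{simple cell (local sum / linear filtering):}\quad
S(x, \theta) = \sum_{(u,v)} G_{\theta}(u, v)\, I\big(x + (u, v)\big),

\qquad
\text{complex cell (local max pooling):}\quad
M(x, \theta) = \max_{\,\delta \in \mathcal{N}(x)} S(x + \delta, \theta).
```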
Independent Component Analysis (Bell and Sejnowski, 1996): independent coefficients with heavy-tailed Laplacian/Cauchy distributions.
Sparse coding (Olshausen and Field, 1996): Laplacian/Cauchy/mixture-of-Gaussians prior on the coefficients.
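Both ICA and sparse coding posit the same linear generative form, differing mainly in how the coefficient prior is specified; schematically:

```latex
I = \sum_{i=1}^{n} c_i\, B_i + \varepsilon,
\qquad
c_i \ \text{independent and heavy-tailed, e.g. } p(c_i) \propto e^{-|c_i|/b},
```

where I is the image, the B_i are basis functions (e.g. Gabor wavelets) learned from data, and ε is residual noise.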
Sparse coding / variable selection • Inference: sparsification, non-linear; lasso / basis pursuit / matching pursuit; the mode and uncertainty of p(C|I); explaining-away, lateral inhibition • Learning: a dictionary of representational elements (regressors)
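A minimal matching-pursuit sketch of the "inference by sparsification" step: greedily pick the dictionary element with the largest response, then explain away (subtract) its contribution. The function name, stopping rule, and data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def matching_pursuit(image_vec, dictionary, n_elements=60):
    """Greedy sparse coding: image_vec ~ sum_i c_i * dictionary[:, i].

    dictionary: (pixels, n_basis) array with unit-norm columns.
    Returns the list of (index, coefficient) pairs selected.
    """
    residual = image_vec.astype(float).copy()
    selected = []
    for _ in range(n_elements):
        responses = dictionary.T @ residual       # inner products <residual, B_i>
        i = int(np.argmax(np.abs(responses)))     # best-matching element (arg-max)
        c = responses[i]
        selected.append((i, c))
        residual -= c * dictionary[:, i]          # explaining away / lateral inhibition
    return selected

# usage sketch: random dictionary with unit-norm columns
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 256))
D /= np.linalg.norm(D, axis=0, keepdims=True)
I = rng.normal(size=100)
codes = matching_pursuit(I, D, n_elements=10)
```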
Restricted Boltzmann Machine (Hinton, Osindero and Teh, 2006): binary hidden and visible units; P(H|V) is factorized, so there is no explaining-away, and likewise for P(V|H).
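The standard RBM energy and its factorized conditionals (generic form for binary units):

```latex
p(V, H) \propto \exp\big( V^{\top} W H + b^{\top} V + c^{\top} H \big),
\qquad
p(H_j = 1 \mid V) = \sigma\big( c_j + V^{\top} W_{\cdot j} \big),
\qquad
p(V_i = 1 \mid H) = \sigma\big( b_i + W_{i \cdot} H \big),
```

with σ the logistic sigmoid; both conditionals factorize over units, hence no explaining-away within a layer.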
Energy-based model (Teh, Welling, Osindero and Hinton, 2003): features, no explaining-away. Maximum entropy with marginals; exponential family with sufficient statistics; Markov random field / Gibbs distribution (Zhu, Wu, and Mumford, 1997; Wu, Liu, and Zhu, 2000).
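The maximum-entropy / exponential-family form referred to here, written schematically (the h_j are feature or filter statistics; the reference distribution q is made explicit to match the "maximum entropy tilting" language later in the deck):

```latex
p(I; \lambda) = \frac{1}{Z(\lambda)}\,
\exp\Big\{ \sum_{j} \big\langle \lambda_j,\, h_j(I) \big\rangle \Big\}\, q(I),
```

i.e. the maximum-entropy tilting of q(I) that matches the observed marginal statistics h_j; when the h_j are local filter responses this is a Markov random field / Gibbs distribution.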
Visual cortex: layered hierarchical architecture, bottom-up/top-down. What is beyond V1? A hierarchical model? (Source: Scientific American, 1999)
Hierarchical ICA / energy-based model? Larger features; must introduce nonlinearities; purely bottom-up.
Hierarchical RBM (Hinton, Osindero and Teh, 2006): P(V,H) = P(H)P(V|H), with the prior P(H) in turn modeled by another RBM P(V',H) one layer up; unfolding, untying, and re-learning the layers, followed by discriminative correction via back-propagation.
Hierarchical sparse coding: attributed sparse coding elements (transformation group, topological neighborhood system); the layer above performs further coding of the attributes of the selected sparse coding elements.
Active basis model (Wu, Si, Gong, Zhu, 10; Zhu, Guo, Wang, Xu, 05; cf. Yuille, Hallinan, Cohen, 92): an n-stroke template, n = 40 to 60, bounding box = 100x100.
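In symbols, the active basis model is a shared template of Gabor elements, each allowed to perturb slightly in location and orientation per example; a schematic form consistent with the slides:

```latex
I_m = \sum_{i=1}^{n} c_{m,i}\,
B_{\,x_i + \Delta x_{m,i},\ \theta_i + \Delta \theta_{m,i}} + U_m,
\qquad m = 1, \dots, M,
```

where the (x_i, θ_i) form the shared "n-stroke" template, the perturbations (Δx, Δθ) are the hidden variables inferred per image, and U_m is the unexplained background texture.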
Simplicity • Simplest AND-OR graph (Pearl, 84; Zhu, Mumford 06) • AND composition and OR perturbations or variations of basis elements • Simplest shape model: average + residual • Simplest modification of Olshausen-Field model • Further sparse coding of attributes of sparse coding elements
Bottom layer: sketch against texture. p(C, U) = p(C) p(U|C) = p(C) q(U|C) = p(C) q(U, C)/q(C). Maximum entropy (Della Pietra, Della Pietra, Lafferty, 97; Zhu, Wu, Mumford, 97; Jin, S. Geman, 06; Wu, Guo, Zhu, 08). Special case: density substitution (Friedman, 87; Jin, S. Geman, 06). • Only need to pool a marginal q(c) as the null hypothesis • natural images: the explicit q(I) of Zhu, Mumford, 97 • this image: the explicit q(I) of Zhu, Wu, Mumford, 97
Shared sketch algorithm: maximum likelihood learning. Finding n strokes to sketch M images simultaneously (n = 60, M = 9). Prototype: shared matching pursuit (closed-form computation). Step 1: two max operations (over local perturbations and over candidate elements) to explain the images by maximum likelihood, with no early decision on edge detection. Step 2: arg-max for inferring the hidden variables. Step 3: the arg-max explains away and thus inhibits (matching pursuit, Mallat, Zhang, 93).
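A toy numpy sketch of the shared-sketch loop under strong simplifying assumptions (1-D "images", one response per candidate position already maxed over perturbations, box-shaped inhibition); names and scoring are illustrative only, not the authors' implementation.

```python
import numpy as np

def shared_sketch(responses, n_strokes=60, inhibit_radius=3):
    """Select n_strokes positions that jointly sketch M images.

    responses: (M, P) array, responses[m, p] = filter response of image m
               at candidate position p (already maxed over local perturbations).
    """
    R = responses.astype(float).copy()
    M, P = R.shape
    template = []
    for _ in range(n_strokes):
        score = R.sum(axis=0)                  # sum over images: shared stroke score
        p = int(np.argmax(score))              # arg-max over candidate positions
        template.append(p)
        lo, hi = max(0, p - inhibit_radius), min(P, p + inhibit_radius + 1)
        R[:, lo:hi] = 0.0                      # inhibition: explain away nearby responses
    return template

# usage sketch: M = 9 images, 200 candidate positions
rng = np.random.default_rng(0)
resp = rng.random(size=(9, 200))
print(shared_sketch(resp, n_strokes=10))
```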
Cortex-like sum-max maps: maximum likelihood inference. Bottom-up sum-max scoring (no early edge decision); top-down arg-max sketching; scan over multiple resolutions. SUM1 layer: the simple V1 cells of Olshausen, Field, 96. MAX1 layer: the complex V1 cells of Riesenhuber, Poggio, 99. • Reinterpreting MAX1: the OR-node of an AND-OR graph, with MAX standing for ARG-MAX in the max-product algorithm • Stick to the Olshausen-Field sparse top-down model: the AND-node of the AND-OR graph • Active basis, SUM2 layer: "neurons" memorize shapes by sparse connections to the MAX1 layer • Hierarchical, recursive AND-OR / SUM-MAX. Architecture: more top-down than bottom-up. Neurons: more representational than operational (OR-neurons / AND-neurons).
Bottom-up scoring and top-down sketching: SUM1 → MAX1 → SUM2 for bottom-up detection, and SUM2 → arg-MAX1 → SUM1 for top-down sketching. Sparse selective connections arise as a result of learning; explaining-away operates in learning but not in inference.
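A minimal numpy sketch of the bottom-up SUM1 → MAX1 → SUM2 scoring for a single orientation, with the template given as (position, weight) pairs; the function, the crude filter, and the pooling rule are illustrative assumptions, not the cortex-like implementation itself.

```python
import numpy as np
from scipy.signal import convolve2d

def sum_max_score(image, gabor, template, pool=3):
    """Bottom-up scoring: SUM1 (filtering), MAX1 (local max), SUM2 (template sum)."""
    sum1 = np.abs(convolve2d(image, gabor, mode="same"))   # SUM1: simple-cell responses
    # MAX1: local max pooling over a (2*pool+1)^2 neighborhood (wrap-around at borders)
    max1 = np.zeros_like(sum1)
    for dy in range(-pool, pool + 1):
        for dx in range(-pool, pool + 1):
            shifted = np.roll(np.roll(sum1, dy, axis=0), dx, axis=1)
            max1 = np.maximum(max1, shifted)
    # SUM2: weighted sum of MAX1 responses at the template's element positions
    return sum(w * max1[y, x] for (y, x), w in template)

# usage sketch: random image, a crude 5x5 "Gabor", a 3-element template
rng = np.random.default_rng(0)
img = rng.random((64, 64))
gab = rng.standard_normal((5, 5))
tmpl = [((20, 20), 1.0), ((30, 25), 0.8), ((40, 30), 0.6)]
print(sum_max_score(img, gab, tmpl))
```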
Scan over multiple resolutions and orientations (rotating template)
Classification based on the log-likelihood ratio score (Freund, Schapire, 95; Viola, Jones, 04)
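The classification rule in generic form, scoring the template model against the background/null model q:

```latex
\mathrm{score}(I) = \log \frac{p(I \mid \text{template})}{q(I)} \ \gtrless \ \tau,
```

with τ a threshold; the active basis score decomposes into a sum of per-element contributions, which is what makes the comparison with boosting-style classifiers (Freund, Schapire, 95; Viola, Jones, 04) natural.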
Adjusting the active basis model by L2-regularized logistic regression (by Ruixun Zhang). The L2-regularized logistic regression re-estimates the λ's, conditional on (1) the selected basis elements and (2) the inferred hidden variables; (1) and (2) come from generative learning. • Exponential family model with q(I) negatives: logistic regression for p(class | image), a partial likelihood • Generative learning without negative examples yields the basis elements and hidden variables • Discriminative adjustment with hugely reduced dimensionality corrects the conditional-independence assumption
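A minimal sketch of the discriminative adjustment step, assuming the responses of the selected basis elements (with hidden perturbations already inferred) have been collected as features; scikit-learn's L2-penalized LogisticRegression stands in for the re-estimation of the λ's, and the data here are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: (n_examples, n_selected_elements) responses of the selected basis elements,
#    computed with the hidden perturbations already inferred (generative stage).
# y: 1 for the object class, 0 for negatives.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = (X[:, :10].sum(axis=1) + 0.5 * rng.standard_normal(200) > 5).astype(int)

# L2-regularized logistic regression: re-weights the selected elements,
# correcting the conditional-independence assumption of the generative score.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
adjusted_lambdas = clf.coef_.ravel()           # discriminatively adjusted weights
print(adjusted_lambdas[:5])
```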
Active basis templates: arg-max inference and explaining-away, no reweighting; residual images neutralize the existing elements; the same set of training examples throughout. AdaBoost templates: no arg-max inference or explaining-away inhibition; reweighted examples neutralize the existing classifiers; the set of examples changes. (Comparison at the same number of elements and at double the number of elements; numbers of negatives: 10556, 7510, 4552, 1493, 12217.)
Mixture model of active basis templates, fitted by EM / maximum likelihood with random initialization (MNIST, 500 in total).
Learning active basis models from non-aligned images: EM-type maximum likelihood learning, initialized by single-image learning.
Hierarchical active basis, by Zhangzhang Si et al. • AND-OR graph: Pearl, 84; Zhu, Mumford, 06 • Compositionality and reusability: Geman, Potter, Chi, 02; L. Zhu, Lin, Huang, Chen, Yuille, 08 • Part-based methods: everyone et al. • Latent SVM: Felzenszwalb, McAllester, Ramanan, 08 • Constellation model: Weber, Welling, Perona, 00. (Examples ranked from high to low log-likelihood.)
Simplicity • Simplest and purest recursive two-layer AND-OR graph • Simplest generalization of active basis model
AND-OR graph and SUM-MAX maps maximum likelihood inference • Cortex-like, related to Riesenhuber, Poggio, 99 • Bottom-up sum-max scoring • Top-down arg-max sketching
Shape script by composing active basis shape motifs Representing elementary geometric shapes (shape motifs) by active bases (Si, Wu, 10) Geometry = sketch that can be parametrized
Summary. Bottom layer: Olshausen-Field (foreground) + Zhu-Wu-Mumford (background); maximum entropy tilting (Della Pietra, Della Pietra, Lafferty, 97): white noise, texture (high entropy), sketch (low and mid entropy), reversing the central-limit-theorem effect of information scaling. Building up the layers: (1) AND-OR, SUM-MAX (top-down arg-MAX); (2) perpetual sparse coding: further coding of the attributes of the current sparse coding elements, with (a) residuals of attributes as continuous OR-nodes and (b) mixture models as discrete OR-nodes.