Visual Attention and Recognition Through Neuromorphic Modeling of “Where” and “What” Pathways Zhengping Ji Embodied Intelligence Laboratory Computer Science and Engineering Michigan State University, Lansing, USA
Outline • Attention and recognition: Chicken-egg problem • Motivation: brain-inspired, neuromorphic, the brain's visual pathway • Saliency-based attention • Where-what Network (WWN): • How to integrate saliency-based attention & top-down attention control • How attention and recognition help each other • Conclusions and future work
Spatial Top-down Attention Control e.g. pay attention to the center
Object-based Top-down Attention Control e.g. pay attention to the square
Chicken-egg Problem • Without attention, recognition cannot do well: • recognition requires attended areas for further processing. • Without recognition, attention is limited: • it needs not only bottom-up saliency-based cues, but also top-down object-dependent signals and top-down spatial controls.
Challenge • High-dimensional space • Background noise • Large variance • Scale • Shape • Illumination • Viewpoint • …
Saliency-based Attention (I) • Naïve way: choose the attention window by guessing (candidate windows Win1–Win6) • [Diagram] Boundary Detection Part: the mapping from two visual images to the correct road boundary type for each sub-window (Reinforcement Learning, IHDR Tree) • [Diagram] Action Generation Part: the mapping from road boundary type to the correct heading direction along the desired path (Supervised Learning, IHDR Tree)
Saliency-based Attention (II) • Low-level image processing (Itti & Koch et al. 1998); a simplified sketch of the idea follows below
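To make the low-level processing concrete, below is a minimal, intensity-only sketch in the spirit of center-surround saliency; the function name, the use of scipy's gaussian_filter, and the two scales are assumptions for illustration, not the full Itti & Koch model (which combines intensity, color, and orientation channels across many scales).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_saliency(image, fine_sigma=1.0, coarse_sigma=4.0):
    """Crude center-surround contrast as an intensity-only saliency map (illustrative assumption)."""
    img = image.astype(float)
    center = gaussian_filter(img, fine_sigma)      # fine-scale (center) smoothing
    surround = gaussian_filter(img, coarse_sigma)  # coarse-scale (surround) smoothing
    saliency = np.abs(center - surround)           # high where local contrast is strong
    return saliency / (saliency.max() + 1e-12)     # normalize to [0, 1]
```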
Review • Attention and recognition: Chicken-egg problem • Motivation: brain-inspired, neuromorphic, the brain's visual pathway • Saliency-based attention • Where-what Network (WWN): • How to integrate saliency-based attention & top-down attention control • How attention and recognition help each other • Conclusions and future work
Challenge: Foreground Teaching • How does a neuron separate a foreground from a complex background? • No need for a teacher to hand-segment the foreground • Fixed foreground, changing background • E.g., during a baby's object tracking • The background weights are averaged out (no effect during neuronal competition); see the sketch below
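A small numerical sketch of why the background averages out; the 40*40 input, the 20*20 foreground placement, and the incremental-mean update are illustrative assumptions, not the exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
foreground = np.linspace(0.0, 1.0, 400).reshape(20, 20)  # a fixed, structured foreground pattern
weight = np.zeros((40, 40))                               # one neuron's synaptic weights over a 40*40 input

for t in range(1, 1001):
    frame = rng.random((40, 40))       # changing random background on every frame
    frame[10:30, 10:30] = foreground   # same foreground at the same place (as in object tracking)
    weight += (frame - weight) / t     # incremental averaging by the winning neuron

# The foreground region keeps its structure (its contrast survives), while the background
# region flattens toward its mean, so it contributes little during neuronal competition.
print("foreground contrast:", weight[10:30, 10:30].std())
print("background contrast:", weight[:10, :].std())
```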
Novelty • Bottom-up attention: • Koch & Ullman 1985, Itti & Koch et al. 1998, Baker et al. 2001, etc. • Position-based top-down control: • Olshausen et al. 1993, Tsotsos et al. 1995, Mozer et al. 1996, Schill et al. 2001, Rao et al. 2004, etc. • Object-based top-down control: • Deco & Rolls 2004 (no performance evaluation), etc. • Our work: • Saliency arises from developed features • Both bottom-up and top-down control • Top-down: either object, position, or none • Attention and recognition form a single process
ICDL Architecture • [Architecture diagram] Image (40*40) → V1 → V2 → "where"-motor and "what"-motor • "where"-motor: pixel-based 40*40 map reporting (r, c); foreground size fixed at 20*20 • Local receptive fields of 11*11 and 21*21, plus global connections between layers
Layer Computation • Compute the pre-response of cell (i, j) at time t • Sort: z1 ≥ z2 ≥ … ≥ zk ≥ … ≥ zm • Only the top-k neurons respond, to keep selectivity and long-term memory • The response range is normalized • Update the local winners (see the sketch below)
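A minimal sketch of the top-k competition step, assuming a flat vector of pre-responses; the function name and the [0, 1] normalization of the winners are assumptions consistent with the slide, not the exact implementation.

```python
import numpy as np

def topk_competition(z, k):
    """Let only the top-k pre-responses fire; normalize the winners' response range."""
    r = np.zeros(z.shape)
    winners = np.argsort(z)[::-1][:k]        # indices of the k largest pre-responses
    zk = z[winners]
    z_min, z_max = zk.min(), zk.max()
    if z_max > z_min:
        r[winners] = (zk - z_min) / (z_max - z_min)  # normalized winner responses in [0, 1]
    else:
        r[winners] = 1.0
    return r                                  # losers stay at zero, preserving their long-term memory
```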
In-place Learning Rule • Do not use back-prop • Not biologically plausible • Does not give long-term memory • Do not use any distribution model (e.g., Gaussian mixture) • Avoids the high complexity of the covariance matrix • New Hebbian-like rule (sketched below): • With automatic plasticity scheduling: only winners update • Minimum error toward the target at every incremental estimation stage (local first principal component)
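A minimal sketch of a Hebbian-like, winner-only in-place update; the amnesic scheduling shown (parameters t1 and c) is an assumption modeled on typical in-place learning formulations and may differ from the exact rule used.

```python
import numpy as np

def inplace_update(weight, x, response, age, t1=20.0, c=2.0):
    """Incrementally move a winning neuron's weights toward its input, gated by its response."""
    # Plasticity scheduling: young neurons learn quickly, mature neurons retain long-term memory.
    mu = 0.0 if age < t1 else c * (age - t1) / age
    retain = (age - 1.0 - mu) / age                  # weight kept from the past
    learn = (1.0 + mu) / age                         # weight given to the new observation
    return retain * weight + learn * response * x    # local, incremental estimate (no back-prop)
```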
Top-down Attention • Recruit & identify class-invariant features • Recruit & identify position-invariant features
Experiment • Foreground objects defined by the "what" motor (20*20) • Attended areas defined by the "where" motor • Randomly selected background patches (40*40)
Developed Layer 1 Bottom-up synaptic weights of neurons in Layer 1, developed through randomly selected patches from natural images.
Developed Layer 2 • Bottom-up synaptic weights of neurons in Layer 2 (not intuitive to interpret visually).
Experimental Result I Recognition rate with incremental learning
Experimental Result II • (a) Examples of input images; (b) responses of the attention ("where") motor when supervised by the "what" motor; (c) responses of the attention ("where") motor when "what" supervision is not available.
Summary • The "what" motor helps direct the network's attention to the features of a particular object • The "where" motor helps direct attention to positional information (from 45% to 100% accuracy when "where" information is present) • Saliency-based bottom-up attention, location-based top-down attention, and object-based top-down attention are integrated in the top-k spatial competition rule
Problems • The accuracy of the "where" motor is not good: 45.53% • Layer 1 was developed offline • More layers are needed to handle more positions • The "where" motor should be given externally, instead of as a retina-based representation • No internal iterations, especially when the number of hidden layers is larger than one • No cross-level projections
Fully Implemented WWN (Original Design) • [Architecture diagram] Ventral ("what") pathway: Image (40*40) → V1 (40*40) → V2 (40*40) → V4 (40*40) → IT (40*40) → "what"-motor (4 objects) • Dorsal ("where") pathway through V3, MT, LIP, and PP to the "where"-motor: (r, c), 25 centers, fixed-size motor • Receptive fields of 11*11, 21*21, and 31*31, plus global connections
Problems • The accuracies of the "where" and "what" motors are not good: 25.53% for the "what" motor and 4.15% for the "where" motor • Too many parameters to be tuned • Training is extremely slow • How to do the internal iterations (contrasted in the sketch below): • "Sweeping" way: always use the most recently updated weights and responses • Alternative: always use the weights and responses from iteration p-1, where p is the current iteration count • The response should not be normalized within each lateral-inhibition neighborhood
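The two candidate iteration schemes can be contrasted with a toy sketch; the Layer class and its damped linear map are illustrative assumptions, not the network's actual layer computation.

```python
import numpy as np

class Layer:
    """Toy layer whose response is a damped linear map of its input (illustrative only)."""
    def __init__(self, w):
        self.w = w
        self.response = np.zeros(w.shape[0])

    def compute(self, prev_response):
        return 0.5 * (self.w @ prev_response)

def iterate_sweeping(input_vec, layers, steps):
    # "Sweeping" way: each layer immediately sees the response its predecessor
    # just produced during the same pass (freshest weights and responses).
    for _ in range(steps):
        prev = input_vec
        for layer in layers:
            layer.response = layer.compute(prev)
            prev = layer.response
    return layers[-1].response

def iterate_synchronous(input_vec, layers, steps):
    # Alternative: every layer reads only the responses from iteration p-1,
    # and all layers switch to their new responses together.
    for _ in range(steps):
        inputs = [input_vec] + [l.response for l in layers[:-1]]
        new = [l.compute(x) for l, x in zip(layers, inputs)]
        for l, r in zip(layers, new):
            l.response = r
    return layers[-1].response
```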
Modified Simple Architecture • Retina-based supervision • [Architecture diagram] Image (40*40) → V1 → V2 → "where"-motor and "what"-motor (5 objects) • "where"-motor: 40*40 map reporting (r, c), 5 centers; foreground size fixed at 20*20 • Local receptive fields of 11*11 and 21*21, plus global connections
Advantages • Internal iterations are not necessary • The network runs much faster • Easier to track neural representations and evaluate performance • Performance evaluation: • The "what" motor reaches 100% accuracy on the disjoint test • The "where" motor reaches 41.09% accuracy on the disjoint test
Problem: Dominance by the Top-down Projection • [Figure] Total responses = bottom-up responses + top-down projection from the motor; the top-down component dominates the total responses
Solution • Sparsify the bottom-up responses by keeping only the local top-k winners of the bottom-up responses (see the sketch below) • The performance of the "where" motor increases from around 40% to 91%.
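A minimal sketch of the sparsification, assuming bottom-up responses arranged on a 2-D neural grid and a square local neighborhood; the neighborhood radius and k are assumptions.

```python
import numpy as np

def sparsify_local_topk(bottom_up, k=1, radius=1):
    """Keep only the local top-k bottom-up responses; zero out the rest before adding top-down input."""
    h, w = bottom_up.shape
    sparse = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - radius), min(h, i + radius + 1)
            j0, j1 = max(0, j - radius), min(w, j + radius + 1)
            neigh = bottom_up[i0:i1, j0:j1].ravel()
            kth = np.sort(neigh)[::-1][min(k, neigh.size) - 1]  # k-th largest value in the neighborhood
            if bottom_up[i, j] >= kth:
                sparse[i, j] = bottom_up[i, j]                   # this neuron is a local winner
    return sparse
```

With the bottom-up responses sparsified this way, the top-down projection can no longer dominate everywhere at once, which is what lifts the "where" accuracy.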
Fully Implemented WWN (Latest) • Each cortex: Modified ADAST • [Architecture diagram] Image (40*40) → V1 (40*35) → V2 (40*40) → V4 (40*40) → "what"-motor (5 objects, smoothed by a Gaussian) • MT branch to the "where"-motor: 40*40 map reporting (r, c), 3*3 centers, fixed foreground size 20*20, smoothed by a Gaussian • Local receptive fields of 11*11 and 21*21
Modified ADAST • [Laminar diagram] Input from the previous cortex (L2/3) enters L4; L2/3 of the current cortex projects to the next cortex; L5 and L6 perform ranking
Other improvements • Smooth the external motors using a Gaussian function (see the sketch below) • "Where" motors are evaluated by regression errors • The local top-k is adaptive to neuron positions • The network does not converge by internal iterations • The learning rate for top-down excitation is adaptive over internal iterations • Use context information
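A minimal sketch of the Gaussian smoothing of an external motor, assuming a one-hot target vector; the sigma value and the peak normalization are assumptions.

```python
import numpy as np

def smooth_motor(one_hot, sigma=1.0):
    """Replace a hard one-hot motor target with a Gaussian profile centered on the supervised unit."""
    center = int(np.argmax(one_hot))
    idx = np.arange(one_hot.size)
    smoothed = np.exp(-((idx - center) ** 2) / (2.0 * sigma ** 2))
    return smoothed / smoothed.max()   # peak of 1 on the supervised class or position
```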
Test Samples • [Figure] Columns: input; "where" motor (ground truth); "what" motor (ground truth); "what" output ("where" supervised); "where" output (saliency-based); "where" output ("what" supervised); "what" output (saliency-based)
Performance Evaluation Average error for “where” and “what” motors (250 test samples)