S9103-N04 / S9104-N04 · FP6-004381
Q-Learning of Sequential Attention for Visual Object Recognition from Local Informative Descriptors
Lucas Paletta, Gerald Fritz, and Christin Seifert
Computational Perception Group, Institute of Digital Image Processing
JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria
Introduction – Attention Patterns in Human Vision
• Attention patterns evolve on relevant cues [Neisser 67, Yarbus 67]
• Behavioural recognition from sensory and motor action traces in memory [Noton & Stark 71, Rimey & Brown 91]
• Eye movements / saccadic scanpaths – sequences of visual features and actions
Introduction – Attentive Recognition and Learning
• COMPUTER VISION
• Iterative object recognition
• Integrating information on local descriptors & geometry
• Recognition in a perception-action framework
• MACHINE LEARNING
• Local descriptors integrated within a decision process
• Learning a sequential focus-of-attention pattern
• Learning transsaccadic object recognition
• Feedback drives the shaping of attention patterns
Introduction – Related Work
• Framework for learning of attentive processes [Bandera et al. 1996, ICML]
• Saccadic visual recognition in a neural framework [Rybak et al. 1998, Vision Research]
• Reinforcement learning in visuomotor decision processes [Fagg et al. 1998, Neural Networks; Coelho & Piater 2000, ICML]
• Reinforcement learning for 2D view planning [Paletta & Pinz 2000, Rob. & Aut. Systems]
• Image processing for a single-object MDP [Minut & Mahadevan 2001, IAS]
Introduction – Multi-Stage Recognition Process
• EARLY VISION: filtering of RELEVANT information – INFORMATIVE descriptors and saliency
• FEATURE CODING: ATTENTION WINDOW CONTENT is matched to memory – the focus of attention is represented by CODEBOOK vectors
• ATTENTION CONTROL: ITERATIVELY adding new information (features, actions) – SEQUENTIAL integration of attentive information – CONTROL: decision making on attentive actions
Introduction – Closed-Loop Recognition Process
[Diagram: Early Vision feeds three stages – INFORMATIVE FEATURES, FEATURE CODING (focus of interest (FOI), codebook vectors), and ATTENTION CONTROL (state representation s(t), posterior entropy reward R(t), MDP decision maker, action a(t)).]
Informative Features – Closed-Loop Recognition Process
[Closed-loop diagram repeated from above; Informative Features stage in focus.]
Informative Features – Attentive Visual Object Detection [Fritz et al., AAAI 04]
[Diagram: image → interest points → local descriptors → selection → informative descriptors. General case: selection via a global MAP; specific case (SIFT) [Lowe, IJCV 04]: local MAPs and object models.]
Informative Features – Selection by Information Content
• Does the feature provide useful information?
• Information content w.r.t. recognition of objects of interest: how much information do we gain on average from observation g with respect to class discrimination?
• Information content = entropy minus conditional entropy: I(O; g) = H(O) − H(O | g)
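As a reading of the slide's formula, a minimal sketch of the information-content score: prior entropy H(O) minus the entropy of the posterior after observing descriptor g. All function and variable names are illustrative, not from the original system.

```python
# Sketch: scoring a local descriptor g by its information content,
# I(O; g) = H(O) - H(O | g): prior entropy minus posterior entropy.
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (in bits)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p + eps))

def information_gain(prior, posterior_given_g):
    """How much observing descriptor g reduces class uncertainty."""
    return entropy(prior) - entropy(posterior_given_g)

# Example: 3 objects, uniform prior; g strongly favors object 1.
prior = np.array([1/3, 1/3, 1/3])
posterior = np.array([0.8, 0.15, 0.05])
print(information_gain(prior, posterior))  # ~0.7 bits gained
```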
Informative Features – Parzen Window Posterior Estimate
[Plot: class-specific training samples of Object 1, Object 2, Object 3 in a 2D feature space (axes e1, e2).]
• Global posterior estimate too costly (huge data set)
• Local estimate preferred: inside a Parzen window, with weighted contributions of class-specific training data
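A minimal sketch of such a local Parzen-window posterior, assuming descriptors are real-valued vectors and a Gaussian kernel; the bandwidth and all names are illustrative.

```python
# Sketch: kernel-weighted class posterior P(o | g) at a query
# descriptor g, estimated from nearby training descriptors only.
import numpy as np

def parzen_posterior(g, train_desc, train_labels, n_classes, sigma=0.1):
    """train_desc: N x D array; train_labels: length-N int array."""
    d2 = np.sum((train_desc - g) ** 2, axis=1)   # squared distances to g
    w = np.exp(-d2 / (2.0 * sigma ** 2))         # Gaussian kernel weights
    post = np.array([w[train_labels == o].sum() for o in range(n_classes)])
    total = post.sum()
    if total == 0:                               # no local support: uniform
        return np.full(n_classes, 1.0 / n_classes)
    return post / total
```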
Informative Features – Scale Invariant Feature Transform (SIFT) [Lowe, IJCV 04]
[Figure: three panels illustrating the SIFT pipeline.]
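For concreteness, one possible way to obtain SIFT keypoints and descriptors, assuming an OpenCV build (>= 4.4) that ships SIFT and a local image file; the original work used Lowe's detector, so this is only a stand-in.

```python
# Sketch: extracting SIFT keypoints and 128-D descriptors with OpenCV.
import cv2

img = cv2.imread("view.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: N x 128 array; keypoints carry location, scale, orientation
```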
Informative Features – Informative Descriptors and Saliency
[Figure: training image and test image, with panels – SIFT, entropy, attended i-SIFT, posterior.]
Informative Features – Informative Local Descriptors (Video)
[Video stills (frames 3, 4, 8, 12) showing informative local descriptors.]
Feature Coding – Closed-Loop Recognition Process
[Closed-loop diagram repeated from above; Feature Coding stage in focus.]
Feature Coding – Focus of Attention from Saliency Maps
[Figure: four panels (1–4) illustrating the steps below.]
• Entropy saliency map
• Discriminative regions from thresholded saliency
• Distance transform for localization of FOA
• Inhibition of return for iterative FOA (see the sketch below)
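A minimal sketch of one FOA step, assuming the saliency map is a 2D array normalized to [0, 1]; the threshold and inhibition radius are illustrative placeholders.

```python
# Sketch: threshold the entropy saliency map, localize the FOA via a
# distance transform, then suppress the attended region (inhibition of
# return) so the next FOA lands elsewhere.
import numpy as np
from scipy.ndimage import distance_transform_edt

def next_foa(saliency, threshold=0.5, inhibit_radius=15):
    mask = saliency > threshold                  # discriminative regions
    dist = distance_transform_edt(mask)          # depth inside each region
    foa = np.unravel_index(np.argmax(dist), dist.shape)
    yy, xx = np.ogrid[:saliency.shape[0], :saliency.shape[1]]
    inhibited = saliency.copy()
    inhibited[(yy - foa[0]) ** 2 + (xx - foa[1]) ** 2 <= inhibit_radius ** 2] = 0.0
    return foa, inhibited

# Usage: for step in range(5): foa, saliency = next_foa(saliency)
```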
Feature Coding – Codebook Patterns
• Investigate the pattern within the FOA
• Match it against prototype patterns
• Prototypes from unsupervised clustering: k-means (EM, etc.), e.g., k = 20 (see the sketch below)
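A sketch of codebook construction with scikit-learn's k-means, assuming FOA patterns are flattened into fixed-length vectors; the file name is a placeholder.

```python
# Sketch: build a k=20 codebook of attention-window patterns, then
# code a new FOA pattern by its nearest prototype index.
import numpy as np
from sklearn.cluster import KMeans

patterns = np.load("foa_patterns.npy")          # placeholder: N x D patterns
codebook = KMeans(n_clusters=20, n_init=10).fit(patterns)

def code_pattern(p):
    """Index of the nearest codebook prototype for FOA pattern p."""
    return int(codebook.predict(p.reshape(1, -1))[0])
```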
Attention Control – Closed-Loop Recognition Process
[Closed-loop diagram repeated from above; Attention Control stage in focus.]
Attention Control – Actions in Sequential Attention
[Figure: eight shift actions a1–a8 arranged around the current focus of attention.]
Attention Control – Closed-Loop Perception-Action
[Diagram: agent–environment loop – the agent emits action a(t) in state s(t), the environment returns reward R(t).]
• State s of the RECOGNITION PROCESS
• Determined by the FEATURE-ACTION sequence (feature g, action a, feature g, …)
• Matched trace on feature PROTOTYPES and SHIFT ACTIONS (see the sketch below)
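One plausible encoding of such a state, assuming prototypes and actions are indexed by integers: the matched trace becomes a hashable tuple, which the lookup-table Q-function on the next slides can key on.

```python
# Sketch: the recognition state as a tuple alternating codebook
# prototype indices and shift-action indices, e.g. (g1, a1, g2).
def make_state(trace):
    """trace: alternating list of prototype and action indices."""
    return tuple(trace)

s = make_state([7, 2, 13])  # feature g=7, action a=2, feature g=13
```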
Attention Control – Markov Decision Processes (MDP) and Sequential Attention [Puterman 1994]
• MDP defined by {S, A, P, R}
• P( s(t+1) = s' | s(t) = s, a(t) = a )
• R( a, s, s' ) = E[ r(t+1) | s(t) = s, a(t) = a, s(t+1) = s' ]
• Markovian property of the environment: s(t) and a(t) determine s(t+1) and r(t+1)
• Recognition states: feature-action-feature sequences
• Reward R: information gain, i.e., entropy loss – H is the posterior entropy associated with state s, with the posterior estimated from (state, object) frequencies
• R := −ΔH = −(H2 − H1) (see the sketch below)
• Actions: shifts of attention (a1, …, a8)
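A direct transcription of the entropy-loss reward, assuming the posteriors before and after the attention shift are available; names are illustrative.

```python
# Sketch: reward R = -(H2 - H1), positive whenever the attention shift
# reduces posterior entropy, i.e. yields an information gain.
import numpy as np

def entropy(p, eps=1e-12):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p + eps))

def reward(post_before, post_after):
    return -(entropy(post_after) - entropy(post_before))
```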
Attention Control – Q-Learning [Watkins & Dayan 1992]
• Q estimator: Q(s, a) estimates the cumulative expected reward R
• Greedy policy is efficient: a(t) = arg max_a Q( s(t), a )
• Learning rule (lookup table): Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ] (sketched below)
• Q converges with probability 1 to Q*
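A minimal tabular Q-learning sketch over such attention states; the learning rate, discount, and epsilon-greedy exploration are illustrative choices, not parameters reported in the work.

```python
# Sketch: lookup-table Q-learning; states are the hashable
# feature-action traces from above, actions the 8 attention shifts.
import random
from collections import defaultdict

ACTIONS = list(range(8))          # shift actions a1..a8
Q = defaultdict(float)            # lookup table keyed by (state, action)

def choose_action(state, epsilon=0.1):
    if random.random() < epsilon:                        # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])     # greedy

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

With the greedy policy a(t) = arg max_a Q(s(t), a), the learned table yields the attention strategy described on this slide.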
Experiments – Overview
• Experiment 1 – indoors
• COIL-20 image database: 20 objects, 72 views each
• Features: local appearances [Crowley & Verdière 98]
• Experiment 2 – outdoors
• TSG-20 building image database: 20 objects, 4 views each
• Features: SIFT descriptors [Lowe 2004]
Experiments I – Reference Image Database COIL-20
• 20 objects, 72 views
• 5 FOA per sequence, k = 20 prototype patterns
• Recognition rate: 94% learned vs. 90% random strategy
Performance – Entropy
[Plot: posterior entropy over attention steps.]
• AVERAGE REDUCTION of ENTROPY per STEP
Performance – Sequence Length
• STEPS PER TRIAL SAVED by the LEARNED vs. RANDOM strategy for meeting the task goal H_goal
Experiments II – Reference Image Database TSG-20
• Tourist Sights Graz: 20 objects, 4 views
• 20 training, 20 test images
• 320 × 240 pixels
• http://dib.joanneum.at/cape/TSG-20/
Performance Evaluation
[Plots: recognition rate vs. number of attention steps on TSG-20, reaching 98.8% and 96.0%.]
Summary & Conclusion
• Transsaccadic object recognition via a Markov decision process on real-world images
• Scalable multi-stage approach to object recognition
• Iterative recognition from local features
• Uses cues from few, informative local features
• Includes geometric information for discrimination
• Works comparably well on COIL-20 and TSG-20
• Future work: hierarchical (PO)MDPs and recognition sub-goals