S9103-N04 / S9104-N04 · FP6-004381
Q-Learning of Sequential Attention for Visual Object Recognition from Local Informative Descriptors
Lucas Paletta, Gerald Fritz, and Christin Seifert
Computational Perception Group, Institute of Digital Image Processing
JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria
Introduction – Attention Patterns in Human Vision
• Attention patterns evolve on relevant cues [Neisser 67, Yarbus 67]
• Behavioural recognition from sensory and motor action traces in memory [Noton & Stark 71, Rimey & Brown 91]
• Eye movements / saccadic scanpaths – sequences of visual features and actions
Introduction – Attentive Recognition and Learning
• COMPUTER VISION
• Iterative object recognition
• Integrating information on local descriptors & geometry
• Recognition in a perception-action framework
• MACHINE LEARNING
• Local descriptors integrated within a decision process
• Learning a sequential focus-of-attention pattern
• Learning transsaccadic object recognition
• Feedback drives the shaping of attention patterns
Introduction – Related Work
• Framework for learning of attentive processes [Bandera et al. 1996, ICML]
• Saccadic visual recognition in a neural framework [Rybak et al. 1998, Vision Research]
• Reinforcement learning in visuomotor decision processes [Fagg et al. 1998, Neural Networks; Coelho & Piater 2000, ICML]
• Reinforcement learning for 2D view planning [Paletta & Pinz 2000, Rob. & Aut. Systems]
• Image processing for a single-object MDP [Minut & Mahadevan 2001, IAS]
Introduction – Multi-Stage Recognition Process
• EARLY VISION: filtering of RELEVANT information – INFORMATIVE descriptors and saliency
• FEATURE CODING: ATTENTION WINDOW CONTENT is matched to memory – the focus of attention is represented by CODEBOOK vectors
• ATTENTION CONTROL: ITERATIVELY adding new information (features, actions) – SEQUENTIAL integration of attentive information – CONTROL: decision making on attentive actions
Introduction – Closed-Loop Recognition Process
[Diagram: Early Vision feeds three stages – INFORMATIVE FEATURES, FEATURE CODING (focus of interest (FOI), codebook vectors), and ATTENTION CONTROL (state representation s(t), posterior entropy reward R(t), MDP decision maker, action a(t)).]
Informative Features – Closed-Loop Recognition Process
[Closed-loop diagram repeated from above; Informative Features stage in focus.]
Informative Features – Attentive Visual Object Detection [Fritz et al., AAAI 04]
[Diagram: image → interest points → local descriptors → selection → informative descriptors. General case: selection via a global MAP; specific case (SIFT) [Lowe, IJCV 04]: local MAPs and object models.]
Informative Features – Selection by Information Content
• Does the feature provide useful information?
• Information content w.r.t. recognition of objects of interest: how much information do we gain on average from observation g with respect to class discrimination?
• Information content = entropy minus conditional entropy: I(O; g) = H(O) − H(O | g)
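As a reading of the slide's formula, a minimal sketch of the information-content score: prior entropy H(O) minus the entropy of the posterior after observing descriptor g. All function and variable names are illustrative, not from the original system.

```python
# Sketch: scoring a local descriptor g by its information content,
# I(O; g) = H(O) - H(O | g): prior entropy minus posterior entropy.
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (in bits)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p + eps))

def information_gain(prior, posterior_given_g):
    """How much observing descriptor g reduces class uncertainty."""
    return entropy(prior) - entropy(posterior_given_g)

# Example: 3 objects, uniform prior; g strongly favors object 1.
prior = np.array([1/3, 1/3, 1/3])
posterior = np.array([0.8, 0.15, 0.05])
print(information_gain(prior, posterior))  # ~0.7 bits gained
```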
Informative Features – Parzen Window Posterior Estimate
[Plot: class-specific training samples of Object 1, Object 2, Object 3 in a 2D feature space (axes e1, e2).]
• Global posterior estimate too costly (huge data set)
• Local estimate preferred: inside a Parzen window, with weighted contributions of class-specific training data
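A minimal sketch of such a local Parzen-window posterior, assuming descriptors are real-valued vectors and a Gaussian kernel; the bandwidth and all names are illustrative.

```python
# Sketch: kernel-weighted class posterior P(o | g) at a query
# descriptor g, estimated from nearby training descriptors only.
import numpy as np

def parzen_posterior(g, train_desc, train_labels, n_classes, sigma=0.1):
    """train_desc: N x D array; train_labels: length-N int array."""
    d2 = np.sum((train_desc - g) ** 2, axis=1)   # squared distances to g
    w = np.exp(-d2 / (2.0 * sigma ** 2))         # Gaussian kernel weights
    post = np.array([w[train_labels == o].sum() for o in range(n_classes)])
    total = post.sum()
    if total == 0:                               # no local support: uniform
        return np.full(n_classes, 1.0 / n_classes)
    return post / total
```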
Informative Features – Scale Invariant Feature Transform (SIFT) [Lowe, IJCV 04]
[Figure: three panels illustrating the SIFT pipeline.]
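For concreteness, one possible way to obtain SIFT keypoints and descriptors, assuming an OpenCV build (>= 4.4) that ships SIFT and a local image file; the original work used Lowe's detector, so this is only a stand-in.

```python
# Sketch: extracting SIFT keypoints and 128-D descriptors with OpenCV.
import cv2

img = cv2.imread("view.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: N x 128 array; keypoints carry location, scale, orientation
```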
Informative Features – Informative Descriptors and Saliency
[Figure: training image and test image, with panels – SIFT, entropy, attended i-SIFT, posterior.]
Informative Features – Informative Local Descriptors (Video)
[Video stills (frames 3, 4, 8, 12) showing informative local descriptors.]
Feature Coding – Closed-Loop Recognition Process
[Closed-loop diagram repeated from above; Feature Coding stage in focus.]
Feature Coding – Focus of Attention from Saliency Maps
[Figure: four panels (1–4) illustrating the steps below.]
• Entropy saliency map
• Discriminative regions from thresholded saliency
• Distance transform for localization of FOA
• Inhibition of return for iterative FOA (see the sketch below)
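A minimal sketch of one FOA step, assuming the saliency map is a 2D array normalized to [0, 1]; the threshold and inhibition radius are illustrative placeholders.

```python
# Sketch: threshold the entropy saliency map, localize the FOA via a
# distance transform, then suppress the attended region (inhibition of
# return) so the next FOA lands elsewhere.
import numpy as np
from scipy.ndimage import distance_transform_edt

def next_foa(saliency, threshold=0.5, inhibit_radius=15):
    mask = saliency > threshold                  # discriminative regions
    dist = distance_transform_edt(mask)          # depth inside each region
    foa = np.unravel_index(np.argmax(dist), dist.shape)
    yy, xx = np.ogrid[:saliency.shape[0], :saliency.shape[1]]
    inhibited = saliency.copy()
    inhibited[(yy - foa[0]) ** 2 + (xx - foa[1]) ** 2 <= inhibit_radius ** 2] = 0.0
    return foa, inhibited

# Usage: for step in range(5): foa, saliency = next_foa(saliency)
```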
Feature Coding – Codebook Patterns
• Investigate the pattern within the FOA
• Match it against prototype patterns
• Prototypes from unsupervised clustering: k-means (EM, etc.), e.g., k = 20 (see the sketch below)
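A sketch of codebook construction with scikit-learn's k-means, assuming FOA patterns are flattened into fixed-length vectors; the file name is a placeholder.

```python
# Sketch: build a k=20 codebook of attention-window patterns, then
# code a new FOA pattern by its nearest prototype index.
import numpy as np
from sklearn.cluster import KMeans

patterns = np.load("foa_patterns.npy")          # placeholder: N x D patterns
codebook = KMeans(n_clusters=20, n_init=10).fit(patterns)

def code_pattern(p):
    """Index of the nearest codebook prototype for FOA pattern p."""
    return int(codebook.predict(p.reshape(1, -1))[0])
```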
Attention Control – Closed-Loop Recognition Process
[Closed-loop diagram repeated from above; Attention Control stage in focus.]
Attention Control – Actions in Sequential Attention
[Figure: eight shift actions a1–a8 arranged around the current focus of attention.]
Attention Control – Closed-Loop Perception-Action
[Diagram: agent–environment loop – the agent emits action a(t) in state s(t), the environment returns reward R(t).]
• State s of the RECOGNITION PROCESS
• Determined by the FEATURE-ACTION sequence (feature g, action a, feature g, …)
• Matched trace on feature PROTOTYPES and SHIFT ACTIONS (see the sketch below)
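One plausible encoding of such a state, assuming prototypes and actions are indexed by integers: the matched trace becomes a hashable tuple, which the lookup-table Q-function on the next slides can key on.

```python
# Sketch: the recognition state as a tuple alternating codebook
# prototype indices and shift-action indices, e.g. (g1, a1, g2).
def make_state(trace):
    """trace: alternating list of prototype and action indices."""
    return tuple(trace)

s = make_state([7, 2, 13])  # feature g=7, action a=2, feature g=13
```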
Attention Control – Markov Decision Processes (MDP) and Sequential Attention [Puterman 1994]
• MDP defined by {S, A, P, R}
• P( s(t+1) = s' | s(t) = s, a(t) = a )
• R( a, s, s' ) = E[ r(t+1) | s(t) = s, a(t) = a, s(t+1) = s' ]
• Markovian property of the environment: s(t) and a(t) determine s(t+1) and r(t+1)
• Recognition states: feature-action-feature sequences
• Reward R: information gain, i.e., entropy loss – H is the posterior entropy associated with state s, with the posterior estimated from (state, object) frequencies
• R := −ΔH = −(H2 − H1) (see the sketch below)
• Actions: shifts of attention (a1, …, a8)
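A direct transcription of the entropy-loss reward, assuming the posteriors before and after the attention shift are available; names are illustrative.

```python
# Sketch: reward R = -(H2 - H1), positive whenever the attention shift
# reduces posterior entropy, i.e. yields an information gain.
import numpy as np

def entropy(p, eps=1e-12):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p + eps))

def reward(post_before, post_after):
    return -(entropy(post_after) - entropy(post_before))
```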
Attention Control – Q-Learning [Watkins & Dayan 1992]
• Q estimator: Q(s, a) estimates the cumulative expected reward R
• Greedy policy is efficient: a(t) = arg max_a Q( s(t), a )
• Learning rule (lookup table): Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ] (sketched below)
• Q converges with probability 1 to Q*
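A minimal tabular Q-learning sketch over such attention states; the learning rate, discount, and epsilon-greedy exploration are illustrative choices, not parameters reported in the work.

```python
# Sketch: lookup-table Q-learning; states are the hashable
# feature-action traces from above, actions the 8 attention shifts.
import random
from collections import defaultdict

ACTIONS = list(range(8))          # shift actions a1..a8
Q = defaultdict(float)            # lookup table keyed by (state, action)

def choose_action(state, epsilon=0.1):
    if random.random() < epsilon:                        # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])     # greedy

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

With the greedy policy a(t) = arg max_a Q(s(t), a), the learned table yields the attention strategy described on this slide.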
Experiments – Overview
• Experiment 1 – indoors
• COIL-20 image database: 20 objects, 72 views each
• Features: local appearances [Crowley & Verdière 98]
• Experiment 2 – outdoors
• TSG-20 building image database: 20 objects, 4 views each
• Features: SIFT descriptors [Lowe 2004]
Experiments I – Reference Image Database COIL-20
• 20 objects, 72 views
• 5 FOA per sequence, k = 20 prototype patterns
• Recognition rate: 94% learned vs. 90% random strategy
Performance – Entropy
[Plot: posterior entropy over attention steps.]
• AVERAGE REDUCTION of ENTROPY per STEP
Performance – Sequence Length
• STEPS PER TRIAL SAVED by the LEARNED vs. RANDOM strategy for meeting the task goal H_goal
Experiments II – Reference Image Database TSG-20
• Tourist Sights Graz: 20 objects, 4 views
• 20 training, 20 test images
• 320 × 240 pixels
• http://dib.joanneum.at/cape/TSG-20/
Performance Evaluation
[Plots: recognition rate vs. number of attention steps on TSG-20, reaching 98.8% and 96.0%.]
Summary & Conclusion
• Transsaccadic object recognition via a Markov decision process on real-world images
• Scalable multi-stage approach to object recognition
• Iterative recognition from local features
• Uses cues from few, informative local features
• Includes geometric information for discrimination
• Works comparably well on COIL-20 and TSG-20
• Future work: hierarchical (PO)MDPs and recognition sub-goals