S9103-N04 / S9104-N04. FP6-004381. Q-Learning of Sequential Attention for Visual Object Recognition from Local Informative Descriptors.


Presentation Transcript


  1. S9103-N04 / S9104-N04 FP6-004381. Q-Learning of Sequential Attention for Visual Object Recognition from Local Informative Descriptors. Lucas Paletta, Gerald Fritz, and Christin Seifert. Computational Perception Group, Institute of Digital Image Processing, JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria

  2. Introduction: Attention Patterns in Human Vision • Attention patterns evolve on relevant cues [Neisser 67, Yarbus 67] • Behavioural recognition from sensory and motor action traces in memory [Noton & Stark 71, Rimey & Brown 91] • Eye movements / saccadic scanpaths – a sequence of visual features and actions

  3. Introduction: Attentive Recognition and Learning • COMPUTER VISION: iterative object recognition; integrating information on local descriptors & geometry; recognition in a perception-action framework • MACHINE LEARNING: local descriptors integrated within a decision process; learning a sequential focus-of-attention pattern; learning transsaccadic object recognition; feedback drives the shaping of attention patterns

  4. Introduction: Related Work • Framework for learning of attentive processes [Bandera et al. 1996, ICML] • Saccadic visual recognition in a neural framework [Rybak et al. 1998, Vision Research] • Reinforcement learning in visuomotor decision processes [Fagg et al. 1998, Neural Networks; Coelho & Piater 2000, ICML] • Reinforcement learning for 2D view planning [Paletta & Pinz 2000, Robotics & Autonomous Systems] • Image processing for a single-object MDP [Minut & Mahadevan 2001, IAS]

  5. Introduction: Multi-Stage Recognition Process • EARLY VISION: filtering of RELEVANT information – INFORMATIVE descriptors and saliency • FEATURE CODING: the ATTENTION WINDOW CONTENT is matched to memory; the focus of attention is represented by CODEBOOK vectors • ATTENTION CONTROL: ITERATIVELY adding new information (features, actions); SEQUENTIAL integration of attentive information; CONTROL – decision making on attentive actions

  6. Introduction: Closed-Loop Recognition Process [Diagram: Early Vision yields INFORMATIVE FEATURES and a Focus of Interest (FOI); FEATURE CODING maps the FOI to codebook vectors; ATTENTION CONTROL builds the state representation s_t, derives the reward R_t from the posterior entropy, and an MDP decision maker selects the next action a_t]

  7. Informative Features: Closed-Loop Recognition Process [Diagram as on slide 6; this section covers the Informative Features stage]

  8. Informative Features: Attentive Visual Object Detection [Fritz et al., AAAI 04] [Diagram: image → interest points → local descriptors → selection → informative descriptors → local MAPs → selection → global MAP, matched against object models; shown for the general case and for the specific case using SIFT [Lowe, IJCV 04]]

  9. Informative Features: Selection by Information Content • Does the feature provide useful information? • Information content w.r.t. recognition of the objects of interest: how much information do we gain on average from an observation g with respect to class discrimination – the entropy of the object posterior, H(O) = -Σ_o P(o) log P(o), versus the conditional entropy H(O|g) = -Σ_o P(o|g) log P(o|g); the information content of g is the gain H(O) - H(O|g)
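A minimal sketch of this criterion in Python, assuming discrete posteriors stored as plain probability vectors (the function names are illustrative, not from the paper): the information content of a descriptor g is the average entropy reduction it yields over the object classes.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), with 0*log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def information_gain(prior, posterior_given_g):
    """H(O) - H(O|g): how much observing descriptor g reduces
    uncertainty about the object class O."""
    return entropy(prior) - entropy(posterior_given_g)

# A descriptor that concentrates the posterior is informative:
prior = [0.25, 0.25, 0.25, 0.25]            # four equally likely objects
posterior = [0.85, 0.05, 0.05, 0.05]        # posterior after observing g
print(information_gain(prior, posterior))   # ~1.15 bits gained
```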

  10. Informative Features: Parzen Window Posterior Estimate [Plot: descriptor samples for Objects 1–3 in eigenspace coordinates (e1, e2), with a local Parzen window] • Global posterior estimate too costly (huge data set) • Local estimate preferred: inside a Parzen window, weighted contributions of class-specific training data
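A sketch of such a local estimate, assuming descriptors live in a Euclidean feature space and using a Gaussian kernel as the Parzen window (sigma and all names are illustrative choices, not the paper's):

```python
import numpy as np

def parzen_posterior(g, train_X, train_y, n_classes, sigma=0.2):
    """Local posterior estimate P(o | g): Gaussian-weighted contributions
    of class-labelled training descriptors near g, normalized over classes."""
    d2 = np.sum((train_X - g) ** 2, axis=1)          # squared distances to g
    w = np.exp(-d2 / (2.0 * sigma ** 2))             # Parzen kernel weights
    scores = np.bincount(train_y, weights=w, minlength=n_classes)
    total = scores.sum()
    if total == 0:                                   # g far from all samples
        return np.full(n_classes, 1.0 / n_classes)   # fall back to uniform
    return scores / total
```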

  11. Informative Features: Scale Invariant Feature Transform (SIFT) [Lowe, IJCV 04]
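For the specific case, SIFT keypoints and descriptors can be extracted with OpenCV; a minimal sketch using the standard library call rather than the authors' exact pipeline (the image filename is a placeholder):

```python
import cv2

# Detect scale-invariant keypoints and compute 128-D SIFT descriptors
# (Lowe, IJCV 2004); requires opencv-python >= 4.4.
img = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N keypoints, N x 128 descriptors
```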

  12. Informative Features: Informative Descriptors and Saliency [Figure: training image and test image with SIFT entropy map, attended i-SIFT keypoints, and the resulting posterior]

  13. Informative Features: Background Rejection

  14. Informative Features: Informative Local Descriptors – Video [Video frames 3, 4, 8, 12]

  15. Feature Coding: Closed-Loop Recognition Process [Diagram as on slide 6; this section covers the Feature Coding stage]

  16. Feature Coding: Focus of Attention from Saliency Maps [Figure: four processing stages] • Entropy saliency map • Discriminative regions from thresholded saliency • Distance transform for localization of the FOA • Inhibition of return for iterative FOA selection
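These four stages map onto a few array operations; a sketch assuming a 2-D entropy saliency map and SciPy's Euclidean distance transform (the threshold, FOA count, and inhibition radius are illustrative parameters):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def next_foas(saliency, threshold, n_foa=5, ior_radius=10):
    """Iteratively pick foci of attention (FOAs) from an entropy saliency
    map: threshold, localize each FOA at the most interior point of the
    salient region (distance transform), then apply inhibition of return."""
    mask = saliency > threshold
    foas = []
    for _ in range(n_foa):
        if not mask.any():
            break
        dist = distance_transform_edt(mask)     # distance to region border
        y, x = np.unravel_index(np.argmax(dist), dist.shape)
        foas.append((y, x))
        yy, xx = np.ogrid[:mask.shape[0], :mask.shape[1]]
        mask &= (yy - y) ** 2 + (xx - x) ** 2 > ior_radius ** 2  # IOR disk
    return foas
```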

  17. Feature Coding: Codebook Patterns • Investigate the pattern within the FOA • Matching with prototypes • Unsupervised clustering: k-means (EM, etc.), e.g., k=20
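A minimal codebook construction, using scikit-learn's k-means for illustration (the descriptors here are synthetic stand-ins for patterns sampled at FOAs):

```python
import numpy as np
from sklearn.cluster import KMeans

# Build k=20 codebook prototypes by unsupervised clustering of the
# patterns observed at FOAs, as on the slide.
descriptors = np.random.rand(1000, 128)          # stand-in FOA patterns
codebook = KMeans(n_clusters=20, n_init=10).fit(descriptors)

# Matching: a new FOA pattern is coded by its nearest prototype index.
code = codebook.predict(np.random.rand(1, 128))[0]
```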

  18. Attention Control: Closed-Loop Recognition Process [Diagram as on slide 6; this section covers the Attention Control stage]

  19. Attention Control: Actions in Sequential Attention [Diagram: eight attention-shift actions a1–a8 arranged around the current focus]

  20. Attention Control: Closed-Loop Perception-Action [Diagram: the agent issues action a_t to the environment and receives state s_t and reward R_t] • State s of the RECOGNITION PROCESS • Determined by the FEATURE-ACTION sequence (feature g, action a, feature g, …) • Matched trace over feature PROTOTYPES and SHIFT ACTIONS
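A sketch of the state as a growing feature-action trace, assuming prototypes are referred to by codebook index and actions by labels a1..a8 (the exact encoding is an illustrative assumption):

```python
# Recognition state: the matched trace of codebook prototypes and shift
# actions accumulated so far. A hashable tuple lets the trace index a
# tabular Q-function directly.
def step_state(state, action, prototype):
    """Extend the trace with the chosen shift action and the prototype
    matched at the new focus of attention."""
    return state + (action, prototype)

state = (7,)                           # prototype matched at the first FOA
state = step_state(state, "a3", 2)     # -> (7, 'a3', 2)
state = step_state(state, "a5", 11)    # -> (7, 'a3', 2, 'a5', 11)
```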

  21. Attention Control: Markov Decision Processes (MDPs) and Sequential Attention [Puterman 1994] • MDP defined by {S, A, P, R} • Transition model P(s(t+1) = s' | s(t) = s, a(t) = a) • Reward R(a, s, s') = E[r(t+1) | s(t) = s, a(t) = a, s(t+1) = s'] • Markov property of the environment: s(t) and a(t) determine s(t+1) and r(t+1) • Recognition states: feature-action-feature sequences • Reward R: information gain (entropy loss), where H is the posterior entropy associated with state s and the posterior is estimated from (state, object) frequencies; R := -ΔH = -(H2 - H1) • Actions: shifts of attention (a1, ..., a8)
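The entropy-loss reward in code, self-contained (posteriors are plain probability vectors, as in the earlier information-gain sketch):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete posterior (0*log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def reward(posterior_before, posterior_after):
    """R := -(H2 - H1): positive whenever the attentive step reduces
    posterior entropy over the object hypotheses (information gain)."""
    return -(entropy(posterior_after) - entropy(posterior_before))

# A step that sharpens the posterior earns a positive reward:
print(reward([0.5, 0.5], [0.9, 0.1]))   # ~0.53
```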

  22. Attention Control: Q-Learning [Watkins & Dayan 1992] • Q estimator: the Q(s,a) function estimates the cumulative expected reward R • Greedy policy is efficient: a(t) = arg max_a Q(s(t), a) • Learning rule (lookup table): Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)] • Q converges with probability 1 to Q*
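A tabular sketch of this update, assuming the trace-tuple states above and the eight shift actions; the hyperparameters alpha, gamma, and eps are illustrative, and the defaultdict plays the role of the slide's lookup table:

```python
from collections import defaultdict
import random

Q = defaultdict(float)                    # lookup table Q[(state, action)]
ACTIONS = [f"a{i}" for i in range(1, 9)]  # the 8 attention shifts
alpha, gamma, eps = 0.1, 0.9, 0.1         # illustrative hyperparameters

def choose_action(state):
    """Epsilon-greedy over Q during learning; at test time the greedy
    policy a = argmax_a Q(s, a) is used."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    """One-step Q-learning (Watkins & Dayan 1992):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```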

  23. Experiments – Overview • Experiment 1 – indoors: COIL-20 image database; features: local appearances [Crowley & Verdière 98]; 20 objects, 72 views each • Experiment 2 – outdoors: TSG-20 building image database; features: SIFT descriptors [Lowe 2004]; 20 objects, 4 views each

  24. Experiments I: Reference Image Database COIL-20 • 20 objects, 72 views • 5 FOAs per sequence, k=20 prototype patterns • Recognition rate: 94% with the learned strategy vs. 90% with a random strategy

  25. Performance – Entropy [Chart: posterior entropy per step] • AVERAGE REDUCTION of ENTROPY per STEP

  26. Performance – Sequence Length • STEPS PER TRIAL SAVED by the LEARNED strategy vs. the RANDOM strategy in meeting the task goal H_goal

  27. Experiments II: Reference Image Database TSG-20 (Tourist Sights Graz) • 20 objects, 4 views • 20 training, 20 test images • 320 x 240 pixels • http://dib.joanneum.at/cape/TSG-20/

  28. Action Sequences: Sequential Attention

  29. Performance Evaluation [Two charts: recognition performance vs. number of steps, reaching 98.8% and 96.0%]

  30. Summary & Conclusion • Transsaccadic object recognition: Markov decision process, real-world images • Scalable multi-stage approach to object recognition • Iterative recognition from local features: uses cues from few, informative local features; includes geometric information for discrimination; works comparably well on COIL-20 and TSG-20 • Future work: hierarchical (PO)MDPs and recognition sub-goals
