Recognition Using Visual Phrases

Recognition Using Visual Phrases. CVPR 2011 Best Student Paper.

zarifa

Presentation Transcript


  1. Recognition Using Visual Phrases CVPR 2011 Best Student Paper

  2. Outline • Introduction • Related Work • Approach • Phrasal Recognition • Decoding Multiple Detections • Results • Discussion

  3. Introduction

  4. Introduction • Visual Phrases • Traditional approach • Detect objects (person, dog, horse, …) • Relation between objects • NMS (non-maximum suppression) • PASCAL • Other • Disadvantage
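
The slide names NMS (non-maximum suppression) as part of the traditional pipeline without spelling it out. A minimal sketch of the usual greedy form follows, assuming axis-aligned boxes given as (x1, y1, x2, y2) and an IoU threshold of 0.5. The function names and the threshold are illustrative choices, not taken from the presentation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of surviving boxes
```

Because the rule looks only at overlap and score, it would suppress one of two heavily overlapping boxes even when both are correct, such as a person and the horse being ridden. That is the kind of interaction the decoding slides later address.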

  5. Introduction

  6. Introduction • Contributions • Introducing visual phrases as categories for recognition • Introducing a novel dataset for phrasal recognition • Comparison with state-of-the-art methods of modeling interactions • A decoding algorithm • State-of-the-art performance results in multi-class object recognition

  7. Related Work • Object Recognition • Deformable templates [IEEE2001,CVPR1998] • Part-based models [CVPR2005,CVPR2003] • Detectors • Deformable part-based models [IEEE2010]

  8. Related Work • Object Interactions • Focus on relation [ECCV2008] • Person with object [CVPR 2010] • Objects [ECCV2010] • Relation of objects [ICCV2010] • left, right, top, down • label • weight, confidence

  9. Related Work • Scene understanding • Represent scenes with global features that take into account general information about images [Vision2001,CVPR2006] • Clustering [ECCV2008]

  10. Related Work • Machine translation • Statistical translation methods [Press2010] • Translation model • Language model • A decoding algorithm • Output: a query sentence • Allows many-to-many translation

  11. Phrasal Recognition • Phrasal Recognition Dataset • Select 8 object classes (Pascal VOC 2008) • person, bike, car, dog, horse, bottle, sofa, chair • A list of 17 visual phrases + background class • Dog jumping, horse jumping, person riding horse, …

  12. Phrasal Recognition

  13. Phrasal Recognition • Dataset • 2769 images (822 negative images) • An average of 120 examples per class • 5067 bounding boxes (1796 phrases, 3271 objects) • The complexity of visual phrases increases • The number of training examples decreases

  14. Phrasal Recognition • Appearance models • Deformable part models • Trained for the 17 phrases in our dataset using the provided bounding boxes • The 8 categories from Pascal are used as models for objects

  15. Decoding Multiple Detections • NMS decoding • Perfect detectors with excellent, tightly tuned models • A natural decoding strategy handles interactions better than NMS • Greedily search the space of labels • Well-designed features (nearby) • [Figure: all detector responses → decoding → final outcome]

  16. Decoding Multiple Detections • Decoding process • We compare our decoding algorithm with that of [2] on our phrase dataset • Step 1: Construct the features • Step 2: Run the algorithm to learn a set of weights that rescore the confidences of the bounding boxes based on interactions • Step 3: Rescore again until optimal

  17. Discriminative models for multi-class object layout

  18. Decoding Multiple Detections • $x_i$: a bounding box in an image • An image is represented as a collection of overlapping bounding boxes • $X = \{x_i : i = 1, \dots, M\}$, where $M$ is the total number of bounding boxes • $K$ is the number of different categories • [equation] • [equation] • [equation]: the score of image $X$ with labeling $Y$ • [symbol]: the set of weights that corresponds to the class of the bounding box
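
The equations on this slide did not survive in the transcript. The following is only one plausible reading of the surviving descriptions, assuming a linear score that sums over the bounding boxes kept by the labeling. The symbols $S$, $y_i$, $c_i$ and $w_{c_i}$ are assumed names rather than the slide's own notation.

```latex
% Assumed notation: y_i = 1 if bounding box i is kept, 0 otherwise;
% c_i is the category of box i; w_{c_i} is the weight vector for that category.
\[
  Y = \{\, y_i : i = 1, \dots, M \,\}, \qquad y_i \in \{0, 1\}, \qquad c_i \in \{1, \dots, K\}
\]
\[
  S(X, Y) = \sum_{i=1}^{M} y_i \, w_{c_i}^{\top} x_i
\]
```

Under this reading, each box contributes to the score only when it is kept, which is consistent with the later statement that bounding boxes are treated as independent given their features.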

  19. Decoding Multiple Detections • Representation • Image = bounding boxes • Confidence • Overlap • Size ratio • Relation • Above, below, overlapping • Window, category, spatial bins • The representation has K*3*3+1 dimensions
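
To make the K*3*3+1 count concrete, here is a minimal sketch of one way such a feature could be assembled. It assumes that for every category and every spatial bin (above, below, overlapping) the strongest neighbouring detection contributes three numbers (confidence, overlap, size ratio), and that the final entry is the box's own confidence. The bin rule, the layout of the vector and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), y grows downward
Detection = Tuple[Box, int, float]        # (box, category index, confidence)

ABOVE, BELOW, OVERLAPPING = 0, 1, 2       # assumed bin order


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap(a: Box, b: Box) -> float:
    """Intersection over union, used here as the overlap measure."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def spatial_bin(target: Box, other: Box) -> int:
    """Assumed rule: intersecting boxes are 'overlapping'; otherwise compare
    the vertical centers to decide between 'above' and 'below'."""
    if overlap(target, other) > 0:
        return OVERLAPPING
    target_cy = (target[1] + target[3]) / 2
    other_cy = (other[1] + other[3]) / 2
    return ABOVE if other_cy < target_cy else BELOW


def context_feature(index: int, detections: List[Detection], num_categories: int) -> List[float]:
    """Build the (K*3*3 + 1)-dimensional feature of detection `index`."""
    box, _, conf = detections[index]
    feat = [0.0] * (num_categories * 3 * 3 + 1)   # empty cells stay at zero
    best: Dict[Tuple[int, int], float] = {}       # (category, bin) -> best confidence
    for j, (obox, ocat, oconf) in enumerate(detections):
        if j == index:
            continue
        b = spatial_bin(box, obox)
        if (ocat, b) not in best or oconf > best[(ocat, b)]:
            best[(ocat, b)] = oconf
            base = (ocat * 3 + b) * 3
            feat[base] = oconf
            feat[base + 1] = overlap(box, obox)
            feat[base + 2] = area(obox) / area(box) if area(box) > 0 else 0.0
    feat[-1] = conf                               # the box's own confidence
    return feat


dets = [((0, 0, 10, 10), 0, 0.9), ((2, 2, 8, 8), 1, 0.6), ((0, 20, 10, 30), 1, 0.4)]
f = context_feature(0, dets, num_categories=2)
```

With K = 2 categories the vector has 2*3*3+1 = 19 entries, and the length grows linearly with K, which matches the remark in the Discussion slides that the feature dimensionality grows with the number of categories.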

  20. Decoding Multiple Detections • Inference • Assume bounding boxes are independent given their features • [equation]
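
If the boxes are independent given their features, the best labeling can be found one box at a time. The sketch below assumes a linear score per box and a decision threshold of zero; both are assumptions, since the slide's equation is missing, and the names are illustrative.

```python
from typing import Dict, List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(features: List[List[float]],
           categories: List[int],
           weights: Dict[int, List[float]]) -> List[int]:
    """Keep box i (label 1) when its class-specific linear score is positive.
    Each decision depends only on that box's own feature vector."""
    return [1 if dot(weights[c], x) > 0 else 0
            for x, c in zip(features, categories)]


weights = {0: [1.0, -1.0], 1: [0.5, 0.5]}          # hypothetical learned weights
features = [[2.0, 1.0], [0.5, 1.0], [-1.0, 0.5]]   # hypothetical box features
categories = [0, 0, 1]
labels = decode(features, categories, weights)
```

The cost is a single pass over the boxes, because no joint search over label combinations is needed.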

  21. Decoding Multiple Detections • Learning • A form of max-margin structured learning • [equation]

  22. Decoding Multiple Detections • [equation] • Our inner maximization is exact and very fast. We solve this optimization problem by subgradient descent.
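
A minimal sketch of what one subgradient step could look like, assuming a structured hinge loss with a Hamming loss over the box labels, an L2 regularizer and a linear per-box score. None of these choices is confirmed by the slide, and the names, learning rate and regularization constant are illustrative. Under these assumptions the inner maximization decomposes per box, so it is exact and cheap.

```python
from typing import Dict, List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def loss_augmented_labels(scores: List[float], truth: List[int]) -> List[int]:
    """Per box, pick y in {0, 1} that maximizes y * score + [y != truth]."""
    labels = []
    for s, t in zip(scores, truth):
        value_keep = s + (1 if t != 1 else 0)
        value_drop = 0.0 + (1 if t != 0 else 0)
        labels.append(1 if value_keep > value_drop else 0)
    return labels


def subgradient_step(weights: Dict[int, List[float]],
                     features: List[List[float]],
                     categories: List[int],
                     truth: List[int],
                     lr: float = 0.1,
                     reg: float = 0.01) -> Dict[int, List[float]]:
    """One step on (reg / 2) * ||w||^2 + structured hinge loss."""
    scores = [dot(weights[c], x) for x, c in zip(features, categories)]
    worst = loss_augmented_labels(scores, truth)
    # Regularizer part of the subgradient: reg * w
    new = {c: [wi - lr * reg * wi for wi in w] for c, w in weights.items()}
    for x, c, p, t in zip(features, categories, worst, truth):
        if p != t:
            # Hinge part: d/dw_c [S(X, worst) - S(X, truth)] = (p - t) * x
            for k, xk in enumerate(x):
                new[c][k] -= lr * (p - t) * xk
    return new


w = {0: [0.0, 0.0]}
feats = [[1.0, 0.0], [0.0, 1.0]]
cats = [0, 0]
truth = [1, 0]                      # first box is correct, second is not
w1 = subgradient_step(w, feats, cats, truth)
for _ in range(50):
    w = subgradient_step(w, feats, cats, truth)
```

In this toy run the weight on the first feature rises and the weight on the second falls, so the true detection ends up with a positive score and the false one with a negative score.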

  23. Result • Single category detection • Deformable part models for the 17 visual phrases • The trained models from […] for objects • Use the PASCAL dataset: 50 positive and 150 negative examples • Show Precision-Recall (PR) curves • These detectors are trained with at most 50 positive examples
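
For reference, a precision-recall curve of the kind mentioned above can be traced by sweeping a threshold down the ranked detections. The sketch below assumes each detection has already been marked as a true positive (1) or false positive (0), for example by an overlap test against ground truth. The function name and the sample data are illustrative.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (precision, recall) after each detection, in descending score order.
    labels[i] is 1 for a true positive and 0 for a false positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    curve = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / total_pos if total_pos else 0.0
        curve.append((tp / (tp + fp), recall))
    return curve


curve = precision_recall([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Note that in a full detection benchmark the recall denominator is the number of ground-truth objects, including those the detector missed entirely. Here it is simplified to the positives present in the list.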

  24. Result

  25. Result

  26. Result

  27. Result

  28. Result • Decoding • [2] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In ICCV, 2009.

  29. Result

  30. Result

  31. Discussion • Introduce visual phrases and a phrasal recognition dataset • A decoding algorithm • The dimensionality of our features grows with the number of categories • Future Work • The relations between attributes and objects • Parts and objects • Visual phrases and scenes • Objects and visual phrases mirror one another

  32. Discussion • Experience • Low complexity • Uses less data for detection • The feature dimensionality grows with the number of categories (exponential, 2^n) • But we don’t need to consider all of the categories when we model the interactions • Building long enough phrase tables is still a challenge
