
Agenda


  1. Agenda • Introduction • Bag-of-words models • Visual words with spatial location • Part-based models • Discriminative methods • Segmentation and recognition • Recognition-based image retrieval • Datasets & Conclusions

  2. Databases • Caltech 101 • Caltech 256 • Pascal Visual Object Classes (VOC) • LabelMe • Slides from Andrew Zisserman

  3. Caltech 101 • Pictures of objects belonging to 101 categories. • About 40 to 800 images per category. Most categories have about 50 images. • The size of each image is roughly 300 x 200 pixels. • Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. • Train on 5, 10, 15, 20 or 30 images per category • Test on the rest and report results per class
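The train/test protocol on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the tutorial: the function names `split_per_class` and `mean_per_class_accuracy` are hypothetical, and averaging the per-class accuracies into a single number is an assumption about how "results per class" get summarized.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Pick n_train random images per class for training; the rest are test.

    labels: list of class labels, one per image (index = image id).
    Returns (train_indices, test_indices).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test


def mean_per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class, then averaged.

    Averaging per class keeps large categories from dominating the score.
    Returns (mean_accuracy, {class: accuracy}).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = {c: correct[c] / total[c] for c in total}
    return sum(per_class.values()) / len(per_class), per_class
```

With category sizes ranging from tens to hundreds of images, a plain overall accuracy would be weighted toward the largest categories, which is one reason to report results per class.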

  4. Caltech 101 images

  5. Caltech-101: Drawbacks • Smallest category size is 31 images • Too easy? • Left-right aligned • Rotation artifacts • Performance will soon saturate

  6. Caltech-256 • Smallest category size now 80 images • About 30K images • Harder • Not left-right aligned • No artifacts • Performance is halved • More categories • New and larger clutter category

  7. Caltech 256 images • baseball-bat • dog • basketball-hoop • kayak • traffic light

  8. The PASCAL Visual Object Classes (VOC) Dataset and Challenge • Mark Everingham, Luc Van Gool, Chris Williams, John Winn, Andrew Zisserman

  9. The PASCAL VOC Challenge • Challenge in visual object recognition funded by the PASCAL network of excellence • Publicly available dataset of annotated images. Development kit available. • Main competitions in classification (is there an X in this image?) and detection (where are the X's?) • "Taster competitions" in segmentation and 2-D human "pose estimation" (2007-present)

  10. Dataset Content • 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV/monitor • Real images downloaded from Flickr, not filtered for "quality" • Complex scenes, scale, pose, lighting, occlusion, ...

  11. Annotation • Complete annotation of all objects • Annotated in one session with written guidelines • Occluded: object is significantly occluded within the bounding box (BB) • Truncated: object extends beyond the BB • Difficult: not scored in evaluation • Pose: e.g. facing left
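The annotation attributes listed on this slide map naturally onto a small record type. The sketch below is illustrative only: the class name `ObjectAnnotation`, its field names, and the helper `scored_objects` are hypothetical and are not taken from the VOC development kit. Dropping "difficult" objects before scoring is one simple reading of "not scored in evaluation".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """One annotated object in an image (field names are illustrative)."""
    label: str                        # class name, e.g. "person"
    bbox: Tuple[int, int, int, int]   # bounding box as (xmin, ymin, xmax, ymax)
    pose: str = "Unspecified"         # e.g. "Left" for an object facing left
    occluded: bool = False            # significantly occluded within the box
    truncated: bool = False           # object extends beyond the box
    difficult: bool = False           # flagged as not scored in evaluation


def scored_objects(objects: List[ObjectAnnotation]) -> List[ObjectAnnotation]:
    """Keep only the objects that count toward the evaluation."""
    return [obj for obj in objects if not obj.difficult]
```

Keeping the flags on the record, rather than deleting flagged objects from the dataset, lets the same annotations serve both training and evaluation.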

  12. Examples • Aeroplane • Bicycle • Bird • Boat • Bottle • Bus • Car • Cat • Chair • Cow

  13. History • New dataset annotated annually • Annotation of test set is withheld until after challenge

  14. Main Challenge Tasks • Classification • Is there a dog in this image? • Evaluation by precision/recall • Detection • Localize all the people (if any) in this image • Evaluation by precision/recall based on bounding box overlap
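Evaluation by bounding box overlap can be sketched as follows. This is a simplified illustration for one image and one class, not the official evaluation code. The overlap measure used here is intersection over union, and the 0.5 threshold is an assumption: the slide only says "bounding box overlap". The function names are hypothetical, and boxes are treated as continuous coordinates `(xmin, ymin, xmax, ymax)`.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, thresh=0.5):
    """Precision/recall points for detections ranked by confidence.

    detections: list of (score, box); ground_truth: list of boxes.
    A detection is a true positive if its best-overlapping ground-truth box
    has IoU above thresh and has not already been matched; otherwise it is
    a false positive (so duplicate detections are penalized).
    """
    matched = [False] * len(ground_truth)
    tp = fp = 0
    curve = []
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best > thresh and not matched[best_j]:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        recall = tp / len(ground_truth) if ground_truth else 0.0
        curve.append((tp / (tp + fp), recall))
    return curve
```

Sweeping down the ranked list yields one (precision, recall) point per detection, which is how a precision/recall curve such as the person-detection example is traced out.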

  15. Example Precision/Recall: 2007 • Person detection

  16. LabelMe • Russell, Torralba, Freeman, 2005

  17. Links to datasets • The next tables summarize some of the available datasets for training and testing object detection and recognition algorithms. These lists are far from exhaustive. • Databases for object localization • Databases for object recognition • On-line annotation tools • Collections

  18. Topics not covered • Context • Scene • Inter-object relations • Video • Tracking & detection • Multiple viewpoints

  19. Summary • Methods reviewed here • Bag of words • Bag of words with location • Parts and structure • Discriminative methods • Combined Segmentation and recognition • Recognition for retrieval • Resources online: http://cs.nyu.edu/~fergus/icml_tutorial • Slides • Code • Links to datasets
