Agenda • Introduction • Bag-of-words models • Visual words with spatial location • Part-based models • Discriminative methods • Segmentation and recognition • Recognition-based image retrieval • Datasets & Conclusions
Databases • Caltech 101 • Caltech 256 • Pascal Visual Object Classes (VOC) • LabelMe • Slides from Andrew Zisserman
Caltech 101 • Pictures of objects belonging to 101 categories. • About 40 to 800 images per category; most categories have about 50 images. • Each image is roughly 300 x 200 pixels. • Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. • Train on 5, 10, 15, 20 or 30 images per category • Test on the rest and report results per class
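The train/test protocol above (a fixed number of training images per category, test on the rest, results reported per class) can be sketched as follows. This is a minimal illustration, not the official Caltech 101 evaluation code: the function names, the random seed, and the choice to average per-class accuracies into one number are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Pick n_train training indices per class; all remaining indices are test.

    `labels` is a list with one class label per image (hypothetical input format).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's test images labelled correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def mean_per_class_accuracy(true_labels, predicted_labels):
    """Average the per-class accuracies so large categories do not dominate."""
    acc = per_class_accuracy(true_labels, predicted_labels)
    return sum(acc.values()) / len(acc)
```

Averaging per class rather than per image matters here because category sizes range from about 40 to 800 images, so a per-image average would be dominated by the few large categories.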
Caltech-101: Drawbacks • Smallest category size is 31 images • Too easy? • Objects are left-right aligned • Rotation artifacts • Performance will soon saturate
Caltech-256 • Smallest category size now 80 images • About 30K images • Harder • Not left-right aligned • No artifacts • Performance is halved • More categories • New and larger clutter category
Caltech 256 images: baseball-bat, dog, basketball-hoop, kayak, traffic light
The PASCAL Visual Object Classes (VOC) Dataset and Challenge • Mark Everingham, Luc Van Gool, Chris Williams, John Winn, Andrew Zisserman
The PASCAL VOC Challenge • Challenge in visual object recognition funded by the PASCAL Network of Excellence • Publicly available dataset of annotated images. Development kit available. • Main competitions in classification (is there an X in this image?) and detection (where are the X’s?) • “Taster competitions” in segmentation and 2-D human pose estimation (2007-present)
Dataset Content • 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV/monitor • Real images downloaded from Flickr, not filtered for “quality” • Complex scenes, scale, pose, lighting, occlusion, ...
Annotation • Complete annotation of all objects • Annotated in one session with written guidelines • Occluded: object is significantly occluded within the bounding box (BB) • Truncated: object extends beyond the BB • Difficult: not scored in evaluation • Pose: e.g. facing left
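The per-object annotation flags above (occluded, truncated, difficult, pose) plus the bounding box can be read from an XML annotation file along these lines. This is a sketch under assumptions: the element names (`object`, `name`, `pose`, `truncated`, `occluded`, `difficult`, `bndbox`) follow the layout of the VOC development kit annotations as commonly distributed, not every release includes every flag, and the sample XML and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed VOC-style layout (not taken from the dataset).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>person</name>
    <pose>Left</pose>
    <truncated>1</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""


def _flag(node, tag):
    """True if the child element `tag` exists and contains '1'; False otherwise."""
    text = node.findtext(tag)
    return text is not None and text.strip() == "1"


def parse_objects(xml_text):
    """Return one dict per annotated object (assumes every object has a bndbox)."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "pose": obj.findtext("pose"),  # None if the element is absent
            "truncated": _flag(obj, "truncated"),
            "occluded": _flag(obj, "occluded"),
            "difficult": _flag(obj, "difficult"),
            "bbox": tuple(int(float(box.findtext(t)))
                          for t in ("xmin", "ymin", "xmax", "ymax")),
        })
    return objects


def scored_objects(objects):
    """Drop objects marked difficult, since those are not scored in evaluation."""
    return [o for o in objects if not o["difficult"]]
```

Missing flags are treated as false here, which is a design choice for the sketch; a stricter reader might raise an error instead.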
Examples: Aeroplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow
History • New dataset annotated annually • Annotation of test set is withheld until after challenge
Main Challenge Tasks • Classification • Is there a dog in this image? • Evaluation by precision/recall • Detection • Localize all the people (if any) in this image • Evaluation by precision/recall based on bounding box overlap
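Detection evaluation by precision/recall based on bounding box overlap can be sketched as below. The overlap measure is intersection over union (IoU). The 0.5 threshold is the criterion usually cited for VOC and is an assumption here, since the slide does not state it. Boxes are `(xmin, ymin, xmax, ymax)` with continuous coordinates, whereas the development kit treats coordinates as inclusive pixel indices; that detail is omitted. Function names are illustrative, and the example handles one class in one image only.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, threshold=0.5):
    """Return [(precision, recall), ...] after each detection, in score order.

    detections: list of (score, box); ground_truth: list of boxes.
    Each ground-truth box can be matched once; a repeated detection of an
    already matched box counts as a false positive.
    """
    matched = [False] * len(ground_truth)
    true_pos = 0
    false_pos = 0
    curve = []
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box that overlaps this detection the most.
        best_overlap, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth):
            overlap = iou(box, gt_box)
            if overlap > best_overlap:
                best_overlap, best_idx = overlap, idx
        if best_overlap > threshold and not matched[best_idx]:
            matched[best_idx] = True
            true_pos += 1
        else:
            false_pos += 1
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / len(ground_truth) if ground_truth else 0.0
        curve.append((precision, recall))
    return curve
```

For the classification task the same precision/recall bookkeeping applies, except that a ranked image is correct when it contains the class, so no box matching is needed.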
Example Precision/Recall: 2007 • Person detection
LabelMe • Russell, Torralba, Freeman, 2005
Links to datasets • The next tables summarize some of the available datasets for training and testing object detection and recognition algorithms. These lists are far from exhaustive. • Databases for object localization • Databases for object recognition • On-line annotation tools • Collections
Topics not covered • Context • Scene • Inter-object relations • Video • Tracking & detection • Multiple viewpoints
Summary • Methods reviewed here • Bag of words • Bag of words with location • Parts and structure • Discriminative methods • Combined Segmentation and recognition • Recognition for retrieval • Resources online: http://cs.nyu.edu/~fergus/icml_tutorial • Slides • Code • Links to datasets