
Unbiased Look at Dataset Bias

Presentation Transcript


  1. Unbiased Look at Dataset Bias Antonio Torralba Massachusetts Institute of Technology Alexei A. Efros Carnegie Mellon University CVPR 2011

  2. Outline • 1. Introduction • 2. Measuring Dataset Bias • 3. Measuring Dataset’s Value • 4. Discussion

  3. Name That Dataset! • Let’s play a game!

  4. Answer

  5. • UIUC test set is not the same as its training set • COIL is a lab-based dataset • Caltech101 and Caltech256 are predictably confused with each other
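
The game on slide 3 mirrors an experiment in the paper: a classifier is trained to guess which dataset an image came from, and the fact that it succeeds well above chance is itself evidence of dataset bias. Below is a minimal, hypothetical sketch of such an experiment; the tiny-thumbnail feature, the scikit-learn classifier, and the data-loading interface are illustrative assumptions, not the authors' exact pipeline.

```python
# Hypothetical sketch of the "Name That Dataset" experiment: train a
# multi-way classifier to predict which dataset an image came from.
# Feature choice (tiny grayscale thumbnails) and data layout are assumptions.
import numpy as np
from PIL import Image
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def tiny_image_feature(image, size=32):
    """Downsample to a size x size grayscale thumbnail and flatten (assumed feature)."""
    thumb = Image.fromarray(image).convert("L").resize((size, size))
    return np.asarray(thumb, dtype=np.float32).ravel() / 255.0

def name_that_dataset(images, dataset_labels):
    """images: list of HxWx3 uint8 arrays; dataset_labels: which dataset each image came from."""
    X = np.stack([tiny_image_feature(im) for im in images])
    y = np.asarray(dataset_labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
    return clf, accuracy_score(y_te, clf.predict(X_te))  # chance level is 1 / number of datasets
```

The per-dataset confusions of such a classifier are what produce observations like those on slide 5, e.g. Caltech101 and Caltech256 being mistaken for each other.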

  6. Caltech-101 and Caltech-256 • Pictures of objects belonging to 101 categories. About 40 to 800 images per category • Most categories have about 50 images • Collected in September 2003 • The size of each image is roughly 300 x 200 pixels

  7. LabelMe • A project created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) • A dataset of digital images with annotations • The most applicable use of LabelMe is in computer vision research • As of October 31, 2010, LabelMe has 187,240 images, 62,197 annotated images, and 658,992 labeled objects

  8. Bias • Urban scenes vs. rural landscapes • Professional photographs vs. amateur snapshots • Entire scenes vs. single objects

  9. The Rise of the Modern Dataset • COIL-100 dataset (a hundred household objects on a black background) • Corel and 15 Scenes were professional collections with greater visual complexity • Caltech-101 (101 objects collected using Google and cleaned by hand) ventured into the wilderness of the Internet • MSRC and LabelMe (both researcher-collected sets) contain complex scenes with many objects

  10. The Rise of the Modern Dataset • PASCAL Visual Object Classes (VOC) was a reaction against the lax training and testing standards of previous datasets • The batch of very-large-scale, Internet-mined datasets (Tiny Images, ImageNet, and SUN09) can be considered a reaction against the inadequacies of training and testing on datasets that are just too small for the complexity of the real world

  11. Outline • 2. Measuring Dataset Bias - 2.1 Cross-dataset generalization - 2.2 Negative Set Bias

  12. Cross-dataset generalization
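
The figure for this slide (not reproduced in the transcript) corresponds to the paper's cross-dataset generalization experiment: train a classifier or detector on each dataset, test it on every dataset, and compare self performance (the diagonal of the resulting matrix) with performance on the other datasets. A hedged sketch of that bookkeeping is below; `train_detector` and `evaluate_ap` are placeholder functions, not a real API.

```python
# Sketch of a cross-dataset generalization matrix: train on each dataset,
# test on every dataset, then compare self-performance (diagonal) with the
# mean performance on the other datasets. Training/eval helpers are assumed.
import numpy as np

def cross_dataset_matrix(datasets, train_detector, evaluate_ap):
    """datasets: dict name -> (train_split, test_split)."""
    names = list(datasets)
    M = np.zeros((len(names), len(names)))
    for i, src in enumerate(names):
        model = train_detector(datasets[src][0])            # train on source dataset
        for j, dst in enumerate(names):
            M[i, j] = evaluate_ap(model, datasets[dst][1])  # test on target dataset
    self_perf = np.diag(M)
    other_perf = (M.sum(axis=1) - self_perf) / (len(names) - 1)
    pct_drop = 100.0 * (self_perf - other_perf) / np.maximum(self_perf, 1e-9)
    return M, pct_drop  # a large drop indicates poor generalization, i.e. strong bias
```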

  13. Negative Set Bias • Evaluate the relative bias in the negative sets of different datasets (e.g. is a “not car” in PASCAL different from a “not car” in MSRC?) • For each dataset, we train a classifier on its own set of positive and negative instances. Then, during testing, the positives come from that dataset, but the negatives come from all datasets combined
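
A minimal sketch of the protocol described in the second bullet, assuming pre-computed feature matrices per dataset; the data layout and helper names are assumptions for illustration.

```python
# Negative-set-bias test: train on a dataset's own positives/negatives, but
# evaluate with the negatives pooled from every dataset. Data access is assumed.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import average_precision_score

def negative_set_bias(train, test, held_out_negatives):
    """
    train / test: dicts name -> (X_pos, X_neg) feature matrices for one class (e.g. "car").
    held_out_negatives: dict name -> X_neg features used only at test time.
    """
    scores = {}
    for name, (pos_tr, neg_tr) in train.items():
        X = np.vstack([pos_tr, neg_tr])
        y = np.r_[np.ones(len(pos_tr)), np.zeros(len(neg_tr))]
        clf = LinearSVC(C=1.0).fit(X, y)

        pos_te = test[name][0]                                 # positives from the same dataset
        neg_te = np.vstack(list(held_out_negatives.values()))  # negatives from all datasets combined
        X_te = np.vstack([pos_te, neg_te])
        y_te = np.r_[np.ones(len(pos_te)), np.zeros(len(neg_te))]
        scores[name] = average_precision_score(y_te, clf.decision_function(X_te))
    return scores  # compare against AP on the dataset's own negatives to expose the bias
```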

  14. Outline • 3. Measuring Dataset’s Value

  15. Measuring Dataset’s Value • Given a particular detection task and benchmark, there are two basic ways of improving the performance • The first solution is to improve the features, the object representation and the learning algorithm for the detector • The second solution is to simply enlarge the amount of data available for training

  16. Market Value for a car sample across datasets
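
The "market value" table on this slide compares how much a training sample from one dataset is worth, on a fixed test set, relative to a sample from another. One way to estimate such an exchange rate, sketched below, is to fit performance-versus-training-size curves per source dataset and invert them; the log-linear interpolation and the placeholder training/evaluation functions are illustrative assumptions, not necessarily the paper's exact fitting procedure.

```python
# Rough sketch of the "market value" idea: performance curves vs. training-set
# size on a common test set let us ask how many samples from dataset X are
# needed to match the performance of n samples from dataset Y.
import numpy as np

def performance_curve(train_detector, evaluate_ap, train_pool, test_set,
                      sizes=(10, 50, 100, 500, 1000)):
    """AP on a fixed test set as a function of the number of training samples."""
    aps = [evaluate_ap(train_detector(train_pool[:n]), test_set) for n in sizes]
    return np.array(sizes), np.array(aps)

def market_value(sizes_x, aps_x, sizes_y, aps_y, n_y):
    """How many dataset-X samples match the AP obtained with n_y dataset-Y samples?"""
    target_ap = np.interp(np.log(n_y), np.log(sizes_y), aps_y)
    # invert X's curve (assumed monotone in log size) to find the matching training size
    n_x = np.exp(np.interp(target_ap, aps_x, np.log(sizes_x)))
    return n_x / n_y  # e.g. 2.0 means one Y sample is "worth" two X samples on this test set
```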

  17. Outline • 4. Discussion

  18. Discussion • Caltech-101 is extremely biased, with virtually no observed generalization, and should have been retired long ago (as argued by [14] back in 2006) • MSRC has also fared very poorly • PASCAL VOC, ImageNet and SUN09 have fared comparatively well
