
Landmark Classification in Large-scale Image Collections

Yunpeng Li, David J. Crandall, Daniel P. Huttenlocher. ICCV 2009.


Presentation Transcript


  1. Landmark Classification in Large-scale Image Collections Yunpeng Li David J. Crandall Daniel P. Huttenlocher ICCV 2009

  2. Outline • Introduction • Building Internet-Scale Datasets • Image Classification • Experiments • Conclusion

  3. Introduction • Goal • Image classification on much larger datasets, featuring millions of images and hundreds of categories • Image classification • Multiclass SVM • Flickr • Landmarks • Geotagged photos • Text tags

  4. Introduction

  5. Building Internet-Scale Datasets • Long-term goal: create large labeled datasets • Retrieve 60 million geotagged photos from Flickr • Geotags give x, y coordinates • Eliminate photos whose geotag precision is worse than about a city block -> 30 million photos • Mean shift clustering • The radius of the disc is about 100 m[3] • Find peaks in the photo density distribution[5] • Count at most 5 photos from any given Flickr user toward any given peak • Take the top 500 peaks as categories • The 500th peak has 585 photos • The 1000th peak has 284 photos • Final dataset: 1.9 million photos
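The mean shift and per-user cap steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: coordinates are assumed to be already projected to planar meters, the kernel is a flat disc of radius 100 m, modes closer than half the radius are merged, and the function names `mean_shift_modes` and `capped_peak_counts` are hypothetical. Each iteration is an O(n) scan, so tens of millions of photos would call for a spatial index instead.

```python
import math
from collections import defaultdict

def mean_shift_modes(points, radius=100.0, max_iter=100, tol=1e-3):
    """Flat-kernel mean shift over (x, y) points in meters (assumed).
    Each start point moves to the mean of the points within `radius`
    until it stops moving; nearby end points are merged into one mode."""
    modes = []
    for cx, cy in points:
        for _ in range(max_iter):
            # Points inside the disc; never empty, since the mean of the
            # previous disc always has some point within `radius` of it.
            nbrs = [(x, y) for x, y in points
                    if math.hypot(x - cx, y - cy) <= radius]
            nx = sum(x for x, _ in nbrs) / len(nbrs)
            ny = sum(y for _, y in nbrs) / len(nbrs)
            moved = math.hypot(nx - cx, ny - cy)
            cx, cy = nx, ny
            if moved < tol:
                break
        # Hypothetical merge rule: same peak if within half the radius.
        if not any(math.hypot(cx - mx, cy - my) < radius / 2
                   for mx, my in modes):
            modes.append((cx, cy))
    return modes

def capped_peak_counts(photos, modes, radius=100.0, per_user_cap=5):
    """Count photos near each mode, letting any one user contribute at
    most `per_user_cap` photos to any one peak. `photos` holds
    (user, x, y) tuples; photos farther than `radius` are ignored."""
    if not modes:
        return []
    per_user = defaultdict(int)  # (peak index, user) -> photos counted
    counts = [0] * len(modes)
    for user, x, y in photos:
        dists = [math.hypot(x - mx, y - my) for mx, my in modes]
        best = min(range(len(modes)), key=dists.__getitem__)
        if dists[best] > radius:
            continue
        if per_user[(best, user)] < per_user_cap:
            per_user[(best, user)] += 1
            counts[best] += 1
    return counts
```

The cap keeps a single prolific user from turning their own photo session into a density peak; ranking peaks by these capped counts then gives the category list.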

  6. Top 5 categories

  7. Image Feature (visual) • Visual words: cluster SIFT descriptors from photos in the training set with k-means, using approximate nearest neighbor (ANN) search[1] • Form a frequency vector that counts the number of occurrences of each visual word in the image • Normalize the vector to unit L2 norm
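The visual-word pipeline on slide 7 can be sketched as below. It is a toy, pure-Python illustration with hypothetical names (`kmeans`, `bow_vector`); it assumes descriptors are plain sequences of floats (real SIFT descriptors are 128-dimensional) and uses exact nearest-center search where the slides use approximate nearest neighbor search[1].

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            # Exact nearest center (the slides use ANN search here).
            j = min(range(k), key=lambda c: sq_dist(v, centers[c]))
            groups[j].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old center if its cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bow_vector(descriptors, centers):
    """Count visual-word occurrences in one image, then scale the
    frequency vector to unit L2 norm."""
    counts = [0.0] * len(centers)
    for d in descriptors:
        j = min(range(len(centers)), key=lambda c: sq_dist(d, centers[c]))
        counts[j] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts
```

Normalizing to unit length makes images with many and few interest points comparable, since only the relative visual-word frequencies remain.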

  8. Image Feature (text tags) • Vocabulary: tags used by at least 3 different users • Binary vector indicating the presence or absence of each tag • Normalize the vector to unit L2 norm
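The text-tag feature on slide 8 can be sketched as follows, assuming each photo is given as a (user, set of tags) pair; `build_tag_vocabulary` and `tag_vector` are hypothetical names. When k vocabulary tags are present, every nonzero entry becomes 1/√k after L2 normalization.

```python
import math
from collections import defaultdict

def build_tag_vocabulary(photos, min_users=3):
    """Keep tags used by at least `min_users` distinct users.
    `photos` is a list of (user, set_of_tags) pairs."""
    users_per_tag = defaultdict(set)
    for user, tags in photos:
        for t in tags:
            users_per_tag[t].add(user)
    return sorted(t for t, users in users_per_tag.items()
                  if len(users) >= min_users)

def tag_vector(tags, vocabulary):
    """Binary presence/absence vector over the vocabulary, unit L2 norm."""
    v = [1.0 if t in tags else 0.0 for t in vocabulary]
    k = sum(v)  # number of vocabulary tags present
    return [x / math.sqrt(k) for x in v] if k else v
```

Requiring several distinct users filters out personal tags (for example, a user's own album name) that would not generalize across photographers.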

  9. Image Feature (combination) • Combined visual and text-tag feature vector • Normalize the vector to unit L2 norm
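The slide states only that the combined feature has unit L2 norm. One plausible construction, assumed here rather than taken from the slides, is to concatenate the visual and tag vectors and renormalize; `l2_normalize` and `combined_feature` are hypothetical names.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def combined_feature(visual, text):
    """Concatenate visual and text features (assumed), then renormalize."""
    return l2_normalize(list(visual) + list(text))
```

If both inputs are already unit-norm, their concatenation has norm √2, so renormalizing divides every entry by √2 and gives the two modalities equal total weight.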

  10. Image Classification • Predict the class with the highest score: $\hat{y} = \arg\max_{y \in \{1, \dots, m\}} s(x, y; w)$ • $m$ is the number of classes • $x$ is the feature vector of an image • $w$ is the weight model • $s(x, y; w)$ is the score for class $y$ under $w$ • This is by nature a multiway (as opposed to binary) classification problem
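The prediction rule on slide 10 can be sketched as an argmax over per-class scores. This assumes a linear score $s(x, y; w) = w_y \cdot x$ with one weight vector $w_y$ per class, the usual form for a multiclass SVM; classes are indexed 0 to m-1 here rather than 1 to m, and `score` and `predict` are hypothetical names.

```python
def score(w, x, y):
    """Linear score s(x, y; w) = w_y . x (assumed form); w is a list of
    per-class weight vectors and x a feature vector of the same length."""
    return sum(wi * xi for wi, xi in zip(w[y], x))

def predict(w, x):
    """Return the class y in {0, ..., m-1} with the highest score."""
    m = len(w)
    return max(range(m), key=lambda y: score(w, x, y))
```

For example, with `w = [[1, 0], [0, 1], [-1, -1]]` the feature vector `[0.2, 0.9]` scores highest for class 1.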

  11. Image Classification • Learn the model $w$ with a multiclass SVM[4] • Implemented with an SVM software package[9] • Given a set of training examples $\{(x_1, y_1), \dots, (x_n, y_n)\}$, the multiclass SVM optimizes an objective function over $w$
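To illustrate what learning w involves, the sketch below minimizes one standard multiclass SVM objective, the Crammer-Singer hinge loss with L2 regularization, $\frac{\lambda}{2}\sum_y \lVert w_y\rVert^2 + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\ 1 + \max_{y \ne y_i} w_y \cdot x_i - w_{y_i} \cdot x_i\bigr)$. Whether this is exactly the objective on the slide is an assumption, and plain stochastic subgradient descent is swapped in for the SVM package[9]; `train_multiclass_svm`, `dot`, and the parameters `lam`, `lr`, and `epochs` are hypothetical.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def train_multiclass_svm(examples, m, lam=0.01, lr=0.1, epochs=50, seed=0):
    """Stochastic subgradient descent on the L2-regularized Crammer-Singer
    multiclass hinge loss. `examples` is a list of (x, y) pairs with
    y in {0, ..., m-1} and m >= 2; returns one weight vector per class."""
    d = len(examples[0][0])
    w = [[0.0] * d for _ in range(m)]
    rng = random.Random(seed)
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = examples[i]
            # Highest-scoring wrong class (the most violating competitor).
            rival = max((c for c in range(m) if c != y),
                        key=lambda c: dot(w[c], x))
            margin = dot(w[y], x) - dot(w[rival], x)
            # Regularizer gradient: shrink every weight vector.
            for c in range(m):
                w[c] = [(1 - lr * lam) * wc for wc in w[c]]
            # Hinge term active: raise the true class, lower the rival.
            if margin < 1:
                w[y] = [wc + lr * xi for wc, xi in zip(w[y], x)]
                w[rival] = [wc - lr * xi for wc, xi in zip(w[rival], x)]
    return w
```

Only the most violating wrong class is updated for each example, which is what distinguishes this joint multiclass loss from training m independent one-versus-rest binary SVMs.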

  12. Experiments (1/6) • Dataset: about 2 million images • Each experiment divided the data evenly into training and test sets • For an m-way classification experiment, the baseline probability of a correct random guess is 1/m
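The 1/m baseline holds even when classes are imbalanced, because a uniform random guess matches any true label with probability 1/m. The hypothetical helpers below check this empirically and express an accuracy as a multiple of the baseline.

```python
import random

def random_guess_accuracy(labels, m, seed=0):
    """Empirical accuracy of guessing uniformly among m classes."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(m) == y for y in labels)
    return hits / len(labels)

def times_baseline(accuracy, m):
    """How many times better an accuracy is than the 1/m baseline."""
    return accuracy * m
```

For example, if the 57.55% visual-only result on slide 15 is a 10-way task, it is about 5.8 times the 10% baseline.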

  13. Experiments (2/6)

  14. Experiments (3/6)

  15. Experiments (4/6) • 20 well-traveled people were each asked to label 50 photos taken at the world's top ten landmarks • Textual tags were also shown for a random subset of the photos • The average human classification accuracy was 68.0% without textual tags and 76.4% when both the image and tags were shown • Thus humans performed better than the automatic classifier when it used visual features alone (68.0% versus 57.55%), and about the same when both text and visual features were available (76.4% versus 80.91%, slightly in the classifier's favor)

  16. Experiments (5/6) • Visual vocabulary size K • 20% • 50%

  17. Experiments (6/6) • Classifying an image on a single 2.66 GHz CPU • Total time: 2.4 s, most of which is consumed by SIFT interest point detection • Once SIFT features are extracted, classification requires only • 3.06 ms for 200 categories • 0.15 ms for 20 categories

  18. Conclusion • Created large labeled image datasets from geotagged image collections, with nearly 2 million labeled images • Demonstrated that multiclass SVM classifiers using SIFT-based bag-of-words features achieve quite good classification rates on large-scale problems, with accuracy that in some cases is comparable to that of humans on the same task • With text features from tags, the accuracy can be hundreds of times the baseline
