
Tamara Berg Object Recognition – BoF models


Presentation Transcript


  1. 790-133 Recognizing People, Objects, & Actions Tamara Berg Object Recognition – BoF models

  2. Topic Presentations • Hopefully you have met your topic presentation group members? • Group 1 – see me to run through slides this week, or Monday at the latest (I’m traveling Thurs/Friday). Send me links to 2-3 papers for the class to read. • Sign up for the class Google group (790-133). To find the group, go to groups.google.com and search for 790-133 (sorted by date). Use this to post/answer questions related to the class.

  3. Bag-of-features models • An object is represented as a bag of ‘features’ source: Svetlana Lazebnik

  4. Exchangeability • De Finetti’s theorem of exchangeability (the “bag of words” theorem): the joint probability distribution underlying the data is invariant to permutation of the observations.
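Written out in standard notation (my phrasing, not from the slide), exchangeability says the joint distribution is unchanged by any reordering of the observations:

```latex
% Exchangeability (De Finetti): the joint distribution is invariant under any
% permutation \pi of the observation indices.
p(x_1, x_2, \dots, x_N) \;=\; p\bigl(x_{\pi(1)}, x_{\pi(2)}, \dots, x_{\pi(N)}\bigr)
\qquad \text{for every permutation } \pi \text{ of } \{1, \dots, N\}.
```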

  5. Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary, Salton & McGill (1983) • Example: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/ source: Svetlana Lazebnik

  6. Bag of words for text • Represent documents as “bags of words”

  7. Example • Doc1 = “the quick brown fox jumped” • Doc2 = “brown quick jumped fox the” Would a bag of words model represent these two documents differently?
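As a quick check, a minimal Python sketch (my own, not from the slides) shows that the two documents get identical bag-of-words representations, so the model cannot distinguish them:

```python
from collections import Counter

doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

# A bag-of-words representation is just the multiset of word counts,
# so word order is discarded entirely.
bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(bow1 == bow2)  # True: the two documents are indistinguishable under BoW
```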

  8. Bag of words for images • Represent images as a “bag of features”

  9. Bag of features: outline • Extract features source: Svetlana Lazebnik

  10. Bag of features: outline • Extract features • Learn “visual vocabulary” source: Svetlana Lazebnik

  11. Bag of features: outline • Extract features • Learn “visual vocabulary” • Represent images by frequencies of “visual words” source: Svetlana Lazebnik

  12. 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic

  13. 2. Learning the visual vocabulary Visual vocabulary Clustering Slide credit: Josef Sivic

  14. K-means clustering (reminder) • Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk • Algorithm: • Randomly initialize K cluster centers • Iterate until convergence: • Assign each data point to the nearest center • Recompute each cluster center as the mean of all points assigned to it source: Svetlana Lazebnik
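A minimal NumPy sketch of the loop described above (variable names are mine; a real system would typically use an optimized library such as scikit-learn's KMeans):

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Cluster `points` (N x D array) into k clusters with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers by picking K distinct data points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each data point to the nearest center (squared Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each cluster center as the mean of all points assigned to it
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, labels
```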

  15. Example visual vocabulary Fei-Fei et al. 2005

  16. Image Representation • For a query image: extract features; associate each feature with the nearest cluster center (visual word); accumulate visual word frequencies over the image. [Figure: query features mapped onto the visual vocabulary]
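A sketch of this step, assuming local descriptors have already been extracted and a vocabulary of cluster centers has been learned (function and variable names are mine):

```python
import numpy as np

def bof_histogram(descriptors, vocabulary):
    """descriptors: M x D local features from one image;
    vocabulary: K x D cluster centers (visual words).
    Returns a normalized K-dimensional visual-word frequency histogram."""
    # Associate each feature with the nearest cluster center (visual word)
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Accumulate visual-word frequencies over the image
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```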

  17. 3. Image representation • [Figure: histogram of visual-word (codeword) frequencies for an image] source: Svetlana Lazebnik

  18. 4. Image classification • Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them? [Figure: codeword frequency histogram for a CAR image] source: Svetlana Lazebnik

  19. Image Categorization • What is this? (e.g. a helicopter) • Choose from many categories

  20. Image Categorization • What is this? Choose from many categories. Example approaches: • SVM/NB – Csurka et al. (Caltech 4/7) • Nearest Neighbor – Berg et al. (Caltech 101) • Kernel + SVM – Grauman et al. (Caltech 101) • Multiple Kernel Learning + SVMs – Varma et al. (Caltech 101) • …

  21. Visual Categorization with Bags of Keypoints Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray

  22. Data • Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books • Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background

  23. Method Steps: • Detect and describe image patches. • Assign patch descriptors to a set of predetermined clusters (a visual vocabulary). • Construct a bag of keypoints, which counts the number of patches assigned to each cluster. • Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector • Determine which category or categories to assign to the image.
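Assembled into code, these steps might look like the following sketch, using scikit-learn's KMeans and LinearSVC; `extract_descriptors` is a hypothetical placeholder for the patch detection/description step, not an API from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def train_bag_of_keypoints(train_images, train_labels, extract_descriptors, k=1000):
    """Hypothetical pipeline. `extract_descriptors(img)` is assumed to return an
    M x D array of local patch descriptors (e.g. SIFT) for one image."""
    # 1. Detect and describe image patches
    all_desc = [extract_descriptors(img) for img in train_images]
    # 2. Build the visual vocabulary by clustering all training descriptors
    km = KMeans(n_clusters=k, n_init=10).fit(np.vstack(all_desc))
    # 3. Represent each image as a bag of keypoints (visual-word histogram)
    X = np.array([np.bincount(km.predict(d), minlength=k) for d in all_desc])
    # 4. Train a multi-class classifier on the histograms
    clf = LinearSVC().fit(X, train_labels)
    return km, clf
```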

  24. Bag-of-Keypoints Approach Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu

  25. SIFT Descriptors Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
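A hedged sketch of the descriptor step using OpenCV's SIFT implementation (the paper pairs an interest-point detector with SIFT descriptors; this simply substitutes OpenCV's combined SIFT detector/descriptor):

```python
import cv2

def sift_descriptors(image_path):
    """Return SIFT descriptors (M x 128 array) for one image, or None if no keypoints."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return descriptors
```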

  26. Bag of Keypoints (1) • Construction of a vocabulary • K-means clustering finds “centroids” (on all the descriptors we find from all the training images) • Define a “vocabulary” as the set of “centroids”, where every centroid represents a “word”. Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu

  27. Bag of Keypoints (2) • Histogram • Counts the number of occurrences of different visual words in each image Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu

  28. Multi-class Classifier • In this paper, classification is based on conventional machine learning approaches • Support Vector Machine (SVM) • Naïve Bayes Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu

  29. SVM

  30. Reminder: Linear SVM [Figure: two classes (x+ and x−) separated by the hyperplane wᵀx + b = 0, with margin boundaries wᵀx + b = 1 and wᵀx + b = −1; the support vectors lie on the margin] Slide credit: Jinwei Gu

  31. Nonlinear SVMs: The Kernel Trick • With a feature mapping φ(x), our discriminant function becomes g(x) = Σi αi yi φ(xi)·φ(x) + b • No need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing. • A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space: K(xi, xj) = φ(xi)·φ(xj) Slide credit: Jinwei Gu

  32. Nonlinear SVMs: The Kernel Trick • Examples of commonly-used kernel functions: • Linear kernel: K(xi, xj) = xiᵀxj • Polynomial kernel: K(xi, xj) = (1 + xiᵀxj)ᵖ • Gaussian (Radial Basis Function, RBF) kernel: K(xi, xj) = exp(−‖xi − xj‖² / 2σ²) • Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1) Slide credit: Jinwei Gu
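These kernels written out as small NumPy functions (parameter names and default values are my choices):

```python
import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, p=3, c=1.0):
    return (x @ y + c) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(x, y, beta0=1.0, beta1=0.0):
    # Note: the sigmoid "kernel" is only positive semidefinite for some parameter choices
    return np.tanh(beta0 * (x @ y) + beta1)
```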

  33. SVM for image classification • Train k binary 1-vs-all SVMs (one per class) • For a test instance, evaluate with each classifier • Assign the instance to the class with the largest SVM output
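A sketch of the one-vs-all scheme with scikit-learn's LinearSVC (which can also handle multi-class internally); the helper names are mine:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_all(X, y):
    """Train one binary SVM per class (positives = that class, negatives = all others)."""
    classes = np.unique(y)
    return {c: LinearSVC().fit(X, (y == c).astype(int)) for c in classes}

def predict_one_vs_all(classifiers, x):
    """Assign the instance to the class whose SVM gives the largest output."""
    scores = {c: clf.decision_function(x.reshape(1, -1))[0]
              for c, clf in classifiers.items()}
    return max(scores, key=scores.get)
```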

  34. Naïve Bayes

  35. Naïve Bayes Model C – Class, F – Features. We only specify (parameters): P(C), the prior over class labels, and P(Fi | C), how each feature depends on the class.

  36. Example: Slide from Dan Klein

  37. Slide from Dan Klein

  38. Slide from Dan Klein

  39. Percentage of documents in training set labeled as spam/ham Slide from Dan Klein

  40. In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein

  41. In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein

  42. Classification • Choose the class that maximizes: c* = argmaxc P(c) Πi P(fi | c)

  43. Classification • In practice

  44. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow

  45. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.

  46. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change.

  47. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change. • So, what we usually compute in practice is: c* = argmaxc [ log P(c) + Σi log P(fi | c) ]
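A sketch of that log-space computation, assuming the log-prior and per-class log-likelihood tables have already been estimated (array names are mine):

```python
import numpy as np

def classify(feature_counts, log_prior, log_likelihood):
    """feature_counts: length-V array of word counts for one document/image;
    log_prior: length-C array of log P(c);
    log_likelihood: C x V array of log P(f | c).
    Returns the index of the class maximizing the summed log-probability."""
    # Sum log probabilities instead of multiplying probabilities (avoids underflow)
    scores = log_prior + log_likelihood @ feature_counts
    return int(np.argmax(scores))
```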

  48. Naïve Bayes on images

  49. Naïve Bayes C – Class, F – Features. We only specify (parameters): P(C), the prior over class labels, and P(Fi | C), how each feature depends on the class.

  50. Naïve Bayes Parameters Problem: Categorize images as one of k object classes using a Naïve Bayes classifier: • Classes: object categories (face, car, bicycle, etc.) • Features: images represented as histograms of visual words; the Fi are visual words. • P(C) is treated as uniform. • P(Fi | C) is learned from training data (images labeled with their category): the probability of a visual word given an image category.
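A sketch of estimating these parameters from labeled training histograms; the Laplace smoothing constant `alpha` is my addition to avoid zero probabilities, and the outputs plug into the `classify` sketch above:

```python
import numpy as np

def fit_naive_bayes(histograms, labels, n_classes, alpha=1.0):
    """histograms: N x V array of visual-word counts per training image;
    labels: length-N array of class indices in [0, n_classes).
    Returns (log_prior, log_likelihood) for use at classification time."""
    labels = np.asarray(labels)
    V = histograms.shape[1]
    # P(C) treated as uniform, as on the slide
    log_prior = np.full(n_classes, -np.log(n_classes))
    log_likelihood = np.empty((n_classes, V))
    for c in range(n_classes):
        # P(visual word | class): smoothed word counts over images labeled with class c
        counts = histograms[labels == c].sum(axis=0) + alpha
        log_likelihood[c] = np.log(counts / counts.sum())
    return log_prior, log_likelihood
```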
