The Role of Features, Algorithms and Data in Visual Recognition

The Role of Features, Algorithms and Data in Visual Recognition Reporter: WuBin Date: 2011.04.29

Outline • Authors • Abstract • Problem • Experiment & Result • Discussion • Conclusion

Authors (Devi Parikh) • Research Assistant Professor at Toyota Technological Institute (TTI) at Chicago • Research interests • computer vision • machine learning • pattern recognition • Education • Ph.D.: Carnegie Mellon University (2009) • MS.: Carnegie Mellon University (2007) • BS.: Rowan University (2005) • Publications • CVPR’11(2), CVPR’10(1), CVPR’09(1), CVPR’08(1), ECCV’08(1), ICCV’07(1), CVPR’07(Best Paper)… • Cornell University (2009) • Microsoft Research (twice) • Intel Research (once)

Authors (C. Lawrence Zitnick) • Microsoft Research, Redmond • Research Interests • Color Image • Single Image • Stereo Matching • Education • PhD student, Robotics Institute, Carnegie Mellon University • Publications • CVPR’10(2), ACM Trans. Graph(2), ECCV’10(2), CVPR’09(2), CVPR’08(2), ECCV’08(2), PAMI’08(1), IJCV’07(1), CVPR’06(1), ECCV’06(1), ICCV’05(1), PAMI’00(1)…

Abstract • Which factor is critical to humans’ superior performance on visual (scene and object) recognition? • Learning algorithm • Amount of training data • Feature representations • In this work we take a small step towards this goal by performing a series of human studies and machine experiments • We find no evidence that human pattern matching algorithms are better than standard machine learning algorithms • We find that humans don’t leverage increased amounts of training data • Through statistical analysis on the machine experiments and supporting human studies, we find that the main factor impacting accuracies is the choice of features

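Abstract (translated from the Chinese slide) • Many computer vision algorithms exist for visual recognition (of scenes and objects). To achieve better recognition, some systems focus on sophisticated learning algorithms, some exploit large amounts of training data, and others model more effective features. Unfortunately, all of these systems fall far short of human recognition ability. If we understood how humans respond on visual recognition tasks, we could gain deeper insight into the effectiveness of these three approaches and discover what actually gives humans their superior recognition ability. • Through a series of human and machine learning experiments, this paper takes a small step in that direction. We find no evidence that human learning algorithms are better than standard machine learning algorithms. Moreover, humans do not rely on large increases in training samples to improve recognition. Statistical analysis of the experiments in this paper shows that the most important factor affecting recognition accuracy is the choice of features.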

Problem What makes humans so much better at these tasks than today’s machines?

Experiment • Machine experiments • Classifiers • Datasets • Feature types • Dimensionality • Proportion of noisy features • Number of training instances • Human studies

Machine experiments (1) • Classifiers • NN: nearest neighbor • NCM: nearest class-mean • LSVM: linear SVM • QSVM: SVM with a quadratic polynomial kernel • CSVM: SVM with a cubic polynomial kernel • RBFSVM: SVM with a Radial Basis Function (RBF) kernel • DT: decision tree • NET: a multi-layer perceptron neural network with 1 hidden layer and 20 hidden layer nodes • BOOST: boosting with linear SVM on individual features as the simple learners • LDASVM: Principal Component Analysis (PCA) then Linear Discriminant Analysis (LDA) followed by a linear SVM classifier
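The two simplest classifiers in the list, NN and NCM, can be sketched in a few lines. This is only an illustrative sketch, not the setup used in the paper: the Euclidean distance, the helper names (`nn_predict`, `class_means`, `ncm_predict`) and the toy 2-D feature vectors are all assumptions made for the example.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_predict(train_x, train_y, query):
    """NN: return the label of the single closest training vector."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return train_y[best]

def class_means(train_x, train_y):
    """NCM training step: average the feature vectors of each class."""
    groups = defaultdict(list)
    for x, y in zip(train_x, train_y):
        groups[y].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def ncm_predict(means, query):
    """NCM: return the label whose class mean is closest to the query."""
    return min(means, key=lambda label: euclidean(means[label], query))

# Toy data (hypothetical): two categories, two features each.
train_x = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
train_y = ["coast", "coast", "forest", "forest"]
means = class_means(train_x, train_y)
print(nn_predict(train_x, train_y, [0.1, 0.1]))
print(ncm_predict(means, [0.9, 1.0]))
```

NCM stores one mean vector per category, while NN keeps every training instance, which is one reason the two can react differently to noisy features.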

Machine experiments (2) • Datasets • OSR: eight categories (coast, forest, highway, inside-city, mountain, open-country, street, tall-building) of outdoor scene recognition dataset • ISR: eight categories (bathroom, bedroom, dining room, gym, kitchen, living room, movie theater and stairs) from the indoor scene recognition dataset • PA1: eight categories (bird, bottle, cat, dog, horse, person, pottedplant, sheep) from the PASCAL object recognition dataset • PA2: eight other categories (aeroplane, bicycle, boat, chair, car, dining table, motorbike, sofa) from the same PASCAL object recognition dataset • CAL: six categories (aeroplane, car-rear, face, ketch, motorbike, watch) from the Caltech-101 object categories dataset

Machine experiments (3) • Feature types • CH: color histogram computed by assigning all pixels in an image to a pre-computed universal color dictionary computed using k-means • TH: texture histogram computed over a discretization of multi-scale edge orientations in the image • GIST: gist descriptor • BOW: bag-of-words feature descriptor for the CAL dataset • ATT: binary attributes indicating whether the objects have certain higher-level attributes such as being round, or furry, or having a head, etc.
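The CH feature can be illustrated as follows. This is a minimal sketch under stated assumptions: the four-entry `dictionary` is made up for the example (in the slides it comes from k-means), pixels are RGB tuples, and normalizing the counts to sum to 1 is an assumption that the slides do not confirm.

```python
def color_histogram(pixels, dictionary):
    """Assign each pixel to its nearest dictionary color and count assignments.

    pixels: list of (r, g, b) tuples.
    dictionary: list of (r, g, b) codewords (hypothetical stand-in for k-means centers).
    Returns a histogram normalized to sum to 1 (normalization is an assumption).
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * len(dictionary)
    for p in pixels:
        # Index of the codeword with the smallest squared distance to this pixel.
        nearest = min(
            range(len(dictionary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, dictionary[k])),
        )
        counts[nearest] += 1
    return [c / len(pixels) for c in counts]

# Toy 4-color dictionary and a 4-pixel "image" (both hypothetical).
dictionary = [(0, 0, 0), (255, 0, 0), (0, 0, 255), (255, 255, 255)]
pixels = [(10, 5, 0), (250, 10, 10), (240, 0, 20), (5, 5, 250)]
hist = color_histogram(pixels, dictionary)
print(hist)
```

The dimensionality of the descriptor equals the dictionary size, which is how the CH dimensionality is varied in the next slide.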

Machine experiments (4) • Dimensionality • CH: {4; 8; 16; 32; 64; 128; 256} • GIST: vary the number of edge orientations (o_si) at each of the three scales ([s1; s2; s3]), as well as the number of spatial blocks (n x n) the image is divided into • TH: vary the number of orientation bins across the three scales to obtain descriptors of different dimensionality • BOW: the dimensionality was kept fixed at 200 by using a dictionary with 200 SIFT codewords • ATT: the dimensionality was also kept fixed at 64 for PA2, while PA1 used a 32-bit version in addition to the 64-bit one by dropping the attributes that were almost always set to zero across the dataset

Machine experiments (5) • Proportion of noisy features • {0%; 25%; 50%; 100%; 200%} • 200% indicates that twice the number of original features are added as noisy features • Noisy features are drawn from a Gaussian distribution with the same mean and standard deviation as the original features
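One way to picture the noise protocol is the sketch below. It is an assumption-laden illustration, not the authors' procedure: the function name `add_noisy_features` is hypothetical, and the mean and standard deviation are pooled over all original feature values, which is only one possible reading of the slide.

```python
import random
import statistics

def add_noisy_features(data, proportion, seed=0):
    """Append Gaussian noise dimensions to every feature vector.

    data: list of equal-length feature vectors.
    proportion: 0.25 adds 25% extra dimensions, 2.0 adds 200%, and so on.
    The noise mean/std are pooled over all original values (an assumption).
    """
    rng = random.Random(seed)
    n_noise = round(len(data[0]) * proportion)
    values = [v for row in data for v in row]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [list(row) + [rng.gauss(mu, sigma) for _ in range(n_noise)]
            for row in data]

# Toy data (hypothetical): three instances with four features each.
data = [[0.1, 0.4, 0.3, 0.2], [0.5, 0.1, 0.2, 0.2], [0.0, 0.3, 0.6, 0.1]]
noisy = add_noisy_features(data, 2.0)
print(len(data[0]), "->", len(noisy[0]))
```

With a proportion of 2.0, a 4-dimensional vector grows to 12 dimensions, of which only the first 4 carry information about the category.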

Machine experiments (6) • Number of training instances • Vary the number of training instances used per category in the range {2; 4; 8; 16; 32; 64; 100(88 for CAL)}

Human Studies (1) • To prevent the use of prior knowledge about images by the subjects, we do not display to them any direct image information such as texture patches or color. Instead, we use abstracted visual patterns as stimuli. • Subjects performed at 34% for Figure 3(a), 47% for (b), 50% for (c) and 47% for (d)

Human Studies (2)

Result • The Role of Algorithms • The Role of Data • The Role of Features

The Role of Algorithms • Fix the set of input features and training data for each set of experiments • The learning algorithm used by humans is not superior to state-of-the-art techniques on these types of problems

The Role of Data (1) • The machine experiments show consistent improvement as the number of training instances increases • It is unlikely that machine accuracies will match those of humans when given the original images • Humans may not be as capable at leveraging large amounts of training data for pattern matching • Humans are very capable of generalizing from a small number of training examples

The Role of Data (2) • For machine experiments, significantly more training examples are needed to achieve similar levels of accuracy • Linear SVMs and NCM are less sensitive to noise • Humans are also susceptible to data noise

The Role of Features (1) • Edge and gradient based features typically outperform color based features across the various datasets • Humans are also known to be very sensitive to edge or contour information • We perform human studies on recognition using the same features and training sets as the machine experiments, and they show similar trends • The feature set is critical for recognition

The Role of Features (2) • Humans are shown natural images from the outdoor scene recognition dataset under different transformations and asked to select a category name from a list • Subjects are not provided with any training data • Two transformations • Block test • The image is divided into non-overlapping blocks, and the pixels in each block are randomly shuffled • Maintains the global layout of the scene, but the local statistics are lost • Puzzle test • The image is divided into non-overlapping blocks, but the blocks are randomly shuffled in the image while maintaining the pixels’ relative locations in the block • Local regions of the image are preserved while the global layout is not
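The two transformations can be sketched on a small grid of pixel values. This is an illustrative sketch only: the function names are hypothetical, the image is a plain list of rows, the grid is n x n, and the image height and width are assumed to be divisible by n.

```python
import random

def split_blocks(image, n):
    """Split an image (list of rows) into an n x n grid of non-overlapping blocks."""
    bh, bw = len(image) // n, len(image[0]) // n
    return [[[row[c * bw:(c + 1) * bw] for row in image[r * bh:(r + 1) * bh]]
             for c in range(n)] for r in range(n)]

def join_blocks(blocks):
    """Inverse of split_blocks: stitch a grid of blocks back into one image."""
    rows = []
    for block_row in blocks:
        for i in range(len(block_row[0])):
            rows.append([px for block in block_row for px in block[i]])
    return rows

def block_test(image, n, seed=0):
    """Shuffle pixels inside each block: global layout kept, local statistics lost."""
    rng = random.Random(seed)
    blocks = split_blocks(image, n)
    for block_row in blocks:
        for k, block in enumerate(block_row):
            bw = len(block[0])
            flat = [px for row in block for px in row]
            rng.shuffle(flat)
            block_row[k] = [flat[i:i + bw] for i in range(0, len(flat), bw)]
    return join_blocks(blocks)

def puzzle_test(image, n, seed=0):
    """Shuffle whole blocks: local regions kept, global layout lost."""
    rng = random.Random(seed)
    flat = [b for block_row in split_blocks(image, n) for b in block_row]
    rng.shuffle(flat)
    return join_blocks([flat[i:i + n] for i in range(0, len(flat), n)])

# Toy 4 x 4 "image" whose pixel values are 0..15 (hypothetical).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
for row in block_test(image, 2):
    print(row)
for row in puzzle_test(image, 2):
    print(row)
```

In the block test each block keeps its set of pixel values but loses their arrangement; in the puzzle test each block stays intact but may land anywhere in the grid.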

The Role of Features (3) • In both high and low resolution images, human recognition is robust to a significant loss of local statistics. This indicates that humans rely on the global layout of the scene for scene recognition • In high resolution images, human recognition rates are also very robust even when the global layout of the scene is drastically altered, which indicates that humans can also rely on local regions of images

The Role of Features (4) • Humans do not rely on a fixed set of features. Depending on the information available to them, humans can adaptively rely on different sets of features during testing. This is true even if similar instances have never been seen before. This ability to adapt during testing is not seen in standard machine learning algorithms

Discussion

Discussion • The notion of features goes beyond the choice between colors, texture, the need for spatial information, etc. It includes the concepts of incorporating semantic attributes that are shared across categories • Perhaps what makes the human feature representation so powerful is that these feature representations are tuned for high performance at a variety of tasks • It is important to note that in addition to visual features, humans leverage prior knowledge from several non-visual higher-level and semantic features about how the world we live in functions

Conclusion • In this paper we study human responses on visual recognition problems as posed to machines, to gain insight into which of the three factors is critical to humans’ superior performance. • learning algorithm • amount of training data • features • We find no evidence that human pattern matching algorithms are better than standard machine learning algorithms. Moreover, we find that humans don’t leverage increased amounts of training data. We thus hypothesize with the aid of ANOVA analysis that features are the main factor contributing to superior human performance. Future work involves extensive studies to identify which visual features humans rely on to aid in the development of novel machine recognition algorithms.

Thanks
