Ivan Laptev IRISA/INRIA, Rennes, France September 07, 2006

Boosted Histograms for Improved Object Detection

Histograms for object recognition
Remarkable success of recognition methods using histograms of local image measurements:
• [Swain & Ballard 1991] - Color histograms
• [Schiele & Crowley 1996] - Receptive field histograms
• [Lowe 1999] - Localized orientation histograms (SIFT)
• [Schneiderman & Kanade 2000] - Localized histograms of wavelet coefficients
• [Leung & Malik 2001] - Texton histograms
• [Belongie et al. 2002] - Shape context
• [Dalal & Triggs 2005] - Dense orientation histograms
Likely explanation: Histograms are robust to image variations such as limited geometric transformations and object class variability.

Histograms: What vs. Where
What to measure? Histograms
Where to measure? [Figure: regions labeled A, B, C, D]
• No guarantee for optimal recognition
• Different regions may have different discriminative power

Idea
[Diagram labels: selected features; boosting; weak classifier]
AdaBoost:
• Efficient discriminative classifier [Freund & Schapire ’97]
• Good performance for face detection [Viola & Jones ’01]
[Diagram labels: Haar features; SVM; Neural Networks; Histogram features; "Too heavy"]
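
The boosting idea on this slide can be illustrated with a minimal discrete AdaBoost sketch. It assumes the simplest possible weak learner, a threshold on a single feature value, so each boosting round effectively selects one feature. The function names (`train_stump`, `adaboost`, `predict`) are hypothetical and not taken from the talk.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                preds = [polarity if x[j] >= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity

def adaboost(X, y, n_rounds):
    """Discrete AdaBoost; labels y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # clamp to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

The feature index stored in each selected stump is what "selected features" refers to in this sketch; a richer weak learner can replace `train_stump` without changing the boosting loop.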

Weak learner Possible approach: 1-dim. projections onto predefined vectors Example 1:

Weak learner Possible approach: 1-dim. projections onto predefined vectors Example 2:

Fisher weak learner
Alternative approach:
• Assume Normal distribution of features (hopefully valid at least for some of ~10^5 features!)
• Compute projection direction by Fisher Linear Discriminant (FLD) from the feature means and feature covariances
• Can be modified to minimize the error of weighted samples (required for boosting)
Evidence from real image training data: [Figure: Fisher learner vs. “1-bin” learner]
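
A minimal sketch of a weighted Fisher weak learner, assuming the textbook FLD direction w ∝ (Σ₊ + Σ₋)⁻¹(μ₊ − μ₋) with class means and covariances computed from the boosting sample weights. The midpoint threshold and the small regularizer `reg` are assumptions added for illustration, not details from the talk.

```python
import numpy as np

def weighted_fld(X, y, w, reg=1e-6):
    """Weighted Fisher linear discriminant.

    X: (n, d) feature vectors, y: labels in {+1, -1}, w: non-negative sample weights.
    Returns (direction, threshold).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    stats = []
    for label in (1, -1):
        Xc, wc = X[y == label], w[y == label]
        wc = wc / wc.sum()                     # normalize weights within the class
        mu = wc @ Xc                           # weighted class mean
        diff = Xc - mu
        cov = (diff * wc[:, None]).T @ diff    # weighted class covariance
        stats.append((mu, cov))
    (mu_pos, cov_pos), (mu_neg, cov_neg) = stats
    scatter = cov_pos + cov_neg + reg * np.eye(X.shape[1])  # reg is an assumption
    direction = np.linalg.solve(scatter, mu_pos - mu_neg)
    threshold = 0.5 * direction @ (mu_pos + mu_neg)         # midpoint rule, an assumption
    return direction, threshold

def fld_predict(X, direction, threshold):
    """Classify by thresholding the 1-D projection onto the FLD direction."""
    return np.where(np.asarray(X, dtype=float) @ direction >= threshold, 1, -1)
```

Because the weights enter only through the class means and covariances, the same function can be called in every boosting round with the updated sample weights.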

Histogram features
• ~10^5 rectangle features
• Histograms over 4 gradient orientations, 4 subdivisions for each rectangle
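
One way such a rectangle feature could be computed is sketched below. The 2x2 layout of the four subdivisions, the magnitude weighting, the unsigned orientation range, and the per-cell normalization are all assumptions; the slide only states 4 orientations and 4 subdivisions. The function name is hypothetical.

```python
import numpy as np

def orientation_histogram_feature(image, rect, n_bins=4, grid=(2, 2)):
    """Concatenated gradient-orientation histograms for one rectangle.

    image: 2-D grayscale array, rect: (x0, y0, width, height) in pixels.
    Returns a vector of length grid[0] * grid[1] * n_bins.
    """
    x0, y0, w, h = rect
    patch = np.asarray(image, dtype=float)[y0:y0 + h, x0:x0 + w]
    gy, gx = np.gradient(patch)                        # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    rows = np.array_split(np.arange(h), grid[0])
    cols = np.array_split(np.arange(w), grid[1])
    hists = []
    for r in rows:
        for c in cols:
            cell_bins = bins[np.ix_(r, c)].ravel()
            cell_mag = magnitude[np.ix_(r, c)].ravel()
            hist = np.bincount(cell_bins, weights=cell_mag, minlength=n_bins)
            total = hist.sum()
            hists.append(hist / total if total > 0 else hist)  # normalize each cell
    return np.concatenate(hists)
```

Enumerating many rectangle positions and sizes inside the detection window with a function like this is what would produce a feature pool on the order of 10^5.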

Training data
• Crop and resize
• Perturb annotation
• Increase training set ×10
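
Annotation perturbation could look like the following sketch, which jitters the position and size of one annotated box to produce ten training crops. The jitter ranges (`max_shift`, `max_scale`) and the function name are assumptions; the slide gives only the factor of 10.

```python
import random

def perturb_box(box, n_copies=10, max_shift=0.05, max_scale=0.05, seed=None):
    """Return n_copies jittered versions of box = (x, y, width, height).

    Shifts are fractions of the box size; scaling keeps the aspect ratio.
    """
    rng = random.Random(seed)
    x, y, w, h = box
    out = []
    for _ in range(n_copies):
        s = 1.0 + rng.uniform(-max_scale, max_scale)
        dx = rng.uniform(-max_shift, max_shift) * w
        dy = rng.uniform(-max_shift, max_shift) * h
        new_w, new_h = w * s, h * s
        cx, cy = x + w / 2 + dx, y + h / 2 + dy   # jittered box center
        out.append((cx - new_w / 2, cy - new_h / 2, new_w, new_h))
    return out
```

Each returned box would then be cropped from the image and resized to the fixed training window size.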

Training: Selected Features
• 0.999 correct classification
• 10^-5 false positives
• 376 of ~10^5 features selected

Object detection
• Scan and classify image windows at different positions and scales
• Cluster detections in position-scale space
• Use the cluster size as the detection confidence
[Figure label: Conf.=5]
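
The three steps above can be sketched as a window generator plus a simple greedy clustering of positive windows. The scale step, stride, tolerances, and the greedy clustering rule are assumptions chosen for illustration; the talk does not specify the clustering method.

```python
import math

def sliding_windows(img_w, img_h, base_w, base_h, scale_step=1.2, stride_frac=0.1):
    """Yield (x, y, w, h) windows over all positions and scales that fit the image."""
    scale = 1.0
    while base_w * scale <= img_w and base_h * scale <= img_h:
        w, h = base_w * scale, base_h * scale
        step = max(1.0, stride_frac * w)
        y = 0.0
        while y + h <= img_h:
            x = 0.0
            while x + w <= img_w:
                yield (x, y, w, h)
                x += step
            y += step
        scale *= scale_step

def _mean_box(members):
    n = len(members)
    return tuple(sum(m[i] for m in members) / n for i in range(4))

def _close(a, b, pos_tol, scale_tol):
    """True if box a is near box b in position (relative to b's width) and in log-scale."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return (math.hypot(dx, dy) / b[2] < pos_tol
            and abs(math.log(a[2] / b[2])) < scale_tol)

def cluster_detections(windows, pos_tol=0.25, scale_tol=math.log(1.3)):
    """Greedily group positive windows; confidence is the number of windows in a cluster."""
    clusters = []
    for win in windows:
        for members in clusters:
            if _close(win, _mean_box(members), pos_tol, scale_tol):
                members.append(win)
                break
        else:
            clusters.append([win])
    return [(_mean_box(m), len(m)) for m in clusters]
```

In use, every window from `sliding_windows` would be passed through the boosted classifier, and only the windows classified as positive would go to `cluster_detections`.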

PASCAL Visual Object Classes Challenge 2005 (VOC’05)
• motorbikes: #217 / #220
• bicycles: #123 / #123
• people: #152 / #149
• cars: #320 / #341

Evaluation criteria
Ground truth annotation
Detection results:
• >50% overlap of bounding box with ground truth (GT)
• one bounding box for each object
• confidence value for each detection
Precision-Recall (PR) curve:
Average Precision (AP) value:
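
The evaluation criteria can be sketched in code as follows. This assumes the overlap measure is intersection over union and that AP is the 11-point interpolated average used in the early VOC challenges; the slide's own formulas did not survive extraction, so both are stated here as assumptions. The sketch handles a single image, and the function names are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection over union of boxes given as (x_min, y_min, x_max, y_max)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(detections, ground_truth, min_overlap=0.5):
    """detections: list of (confidence, box); ground_truth: list of boxes.

    Returns one (precision, recall) point per detection, in order of decreasing
    confidence. Each ground-truth box can be matched only once, so duplicate
    detections of the same object count as false positives.
    """
    matched = set()
    tp = fp = 0
    curve = []
    for _, box in sorted(detections, key=lambda d: -d[0]):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            iou = overlap_ratio(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou > min_overlap and best_j not in matched:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / len(ground_truth)))
    return curve

def average_precision_11pt(curve):
    """Mean of the best precision at recall levels 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        level = i / 10
        total += max((p for p, r in curve if r >= level), default=0.0)
    return total / 11
```

For a full dataset, the matching would be done per image while the detections from all images are ranked together by confidence.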

Evaluation of detection
PR-curves for the “Motorbike” validation dataset: FLD learner + 1-bin classifier
[Levi and Weiss, CVPR 2004] “Learning object detection from a small number of examples: The importance of good features”

Results for VOC’05 Challenge
[Figure panels: People test1; Bicycles test1; Motorbikes test1; Cars test1]

Results for VOC’05 Challenge Average Precision values:

PASCAL Visual Object Classes Challenge 2006 (VOC’06)

Results for VOC’06 Challenge Competition "comp3" (train on VOC data) Class "bicycle" examples

Results for VOC’06 Challenge Competition "comp3" (train on VOC data) Class "cow" examples

Results for VOC’06 Challenge Competition "comp3" (train on VOC data) Class "horse" examples

Results for VOC’06 Challenge Competition "comp3" (train on VOC data) Class "motorbike"

Results for VOC’06 Challenge Competition "comp3" (train on VOC data) Class "person"

Results for VOC’06 Challenge Average Precision values:

Final Notes
• All results are obtained with a single set of parameters
• A small number of training samples is sufficient
• Efficient detection: 10 fps on 320x280 images
• Extension to texton/color histogram features is straightforward
Open questions:
• Are other free-shape regions better? How to find them?
• Better weak learner that takes advantage of histogram properties
• View transformations
• Detection tasks in VOC05 and VOC06 are far from being solved; it is a challenge!
