Combining efficient object localization and image classification. H. Harzallah, F. Jurie and C. Schmid. LEAR, INRIA Grenoble, LJK
Tasks • Image classification: assigning labels to the image (Car: present, Cow: present, Bike: not present, Horse: not present, …) • Object localization: defining the location and the category of each object (figure labels: Cow, Car)
Contributions • Object class localization method • Combining image classification and object localization (figure labels: Localization−− / Classification++ and Localization++ / Classification−−)
Overview • Related work and datasets • Efficient object localization • Experimental results • Combining image classification and localization • Experimental results • Conclusion
Related work • Object localization • Sliding window [Dalal06] [Rowley95] • Implicit shape model [Leibe04] • SVM classifiers [Chum07] [Ferrari08] • Cascade of classifiers [Viola01] [Vedaldi09] • Context information • Combination of context sources [Divvala09] • Graphical model of events in images [Li07] • Local segmentation + global classification [Shotton08] [Heitz08]
PASCAL VOC dataset • PASCAL VOC 2007 and 2008 datasets • Two tasks: classification and localization • Fixed train/test set-up for the 20 object classes • Standard evaluation measure • Area of overlap as detection matching criterion • Average precision for performance evaluation
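The two evaluation ingredients named on this slide can be sketched as follows. This is a minimal illustration, not the official PASCAL VOC evaluation code: boxes are assumed to be `(x1, y1, x2, y2)` tuples in continuous coordinates, the function names are hypothetical, and the average precision shown is the plain non-interpolated variant (VOC 2007 itself reports an 11-point interpolated version). In the VOC protocol a detection is matched to a ground-truth box when the overlap ratio exceeds 0.5.

```python
def overlap_ratio(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scores, is_positive, n_positives):
    """Non-interpolated AP: mean of the precision at the rank of each true positive.

    scores:      one confidence per detection (or per image, for classification)
    is_positive: whether each entry is a correct detection / positive image
    n_positives: total number of ground-truth positives (assumed known)
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_positives if n_positives else 0.0
```

Mean average precision (mAP), as reported later in the deck, is then the mean of the per-class AP values.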
Efficient object localization • Sliding-window-based approach • Image representation: combination of features, extensive parameter evaluation • Robust image representation • Efficient search strategy
Image representation • Combination of 2 image representations • Histogram of Oriented Gradients (HOG) • Gradient-based features • Integral histograms • Bag of Features • SIFT features extracted densely + k-means clustering • Pyramidal representation of the sliding windows • One histogram per tile
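A rough sketch of how integral histograms make "one histogram per tile" cheap for every sliding window. The assumptions here are not from the slides: each pixel (or dense feature location) has already been quantized to a bin index stored in `bin_map`, histograms are plain counts rather than gradient-magnitude-weighted votes, and all function names are hypothetical.

```python
import numpy as np


def integral_histogram(bin_map, n_bins):
    """Cumulative per-bin counts of shape (H + 1, W + 1, n_bins).

    bin_map is an (H, W) integer array holding the bin index of each location.
    """
    h, w = bin_map.shape
    one_hot = (bin_map[..., None] == np.arange(n_bins)).astype(np.int64)
    ih = np.zeros((h + 1, w + 1, n_bins), dtype=np.int64)
    ih[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return ih


def window_histogram(ih, top, left, bottom, right):
    """Histogram of rows [top, bottom) and columns [left, right) from four lookups."""
    return ih[bottom, right] - ih[top, right] - ih[bottom, left] + ih[top, left]


def pyramid_descriptor(ih, top, left, bottom, right, grid=2):
    """Concatenate one histogram per tile of a grid x grid split of the window."""
    rows = np.linspace(top, bottom, grid + 1).astype(int)
    cols = np.linspace(left, right, grid + 1).astype(int)
    tiles = [
        window_histogram(ih, rows[r], cols[c], rows[r + 1], cols[c + 1])
        for r in range(grid)
        for c in range(grid)
    ]
    return np.concatenate(tiles)
```

Once the integral histogram is built, the cost of a window descriptor depends only on the number of tiles and bins, not on the window area.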
Efficient search strategy • Reduce search complexity • Sliding windows: huge number of candidate windows • Cascades: pros/cons • Two-stage cascade: • Filtering classifier with a linear SVM • Low computational cost • Evaluation: capacity to reject negative windows • Scoring classifier with a non-linear SVM • χ² kernel with a channel combination [Zhang07] • Significant increase in performance
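The two-stage idea can be sketched as below, assuming both SVMs are already trained. Everything specific here is an assumption for illustration: the parameter names (`w`, `b`, `support`, `alphas`, `bias`), the number of windows kept, and the use of a single-channel exponential χ² kernel with a hypothetical scale `gamma` in place of the multi-channel combination of [Zhang07].

```python
import numpy as np


def chi2_kernel(x, y, gamma=1.0, eps=1e-10):
    """Exponential chi-square kernel between two non-negative histograms."""
    dist = 0.5 * np.sum((x - y) ** 2 / (x + y + eps))
    return np.exp(-gamma * dist)


def two_stage_scores(windows, w, b, support, alphas, bias, keep=100, gamma=1.0):
    """Filter windows with a linear SVM, then rescore the survivors with a kernel SVM.

    windows: (N, d) array of window descriptors
    w, b:    linear SVM weights and bias (stage 1)
    support, alphas, bias: support vectors, signed dual coefficients and bias (stage 2)
    Returns a dict mapping the index of each kept window to its final score.
    """
    windows = np.asarray(windows, dtype=float)
    support = np.asarray(support, dtype=float)
    alphas = np.asarray(alphas, dtype=float)

    # Stage 1: one dot product per window, keep only the best-scoring candidates.
    linear = windows @ np.asarray(w, dtype=float) + b
    kept = np.argsort(-linear)[:keep]

    # Stage 2: the expensive kernel evaluation runs on the survivors only.
    scores = {}
    for idx in kept:
        k = np.array([chi2_kernel(windows[idx], s, gamma) for s in support])
        scores[int(idx)] = float(alphas @ k + bias)
    return scores
```

The design choice is a cost trade-off: the linear stage touches every candidate window, while the kernel stage costs one kernel evaluation per support vector and is therefore reserved for the few windows that pass the filter.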
Efficiency of the 2-stage localization • Performance with respect to the number of windows selected by the linear SVM (mAP on PASCAL 2007) • Sliding windows: 100k candidate windows • A small number of windows is enough after filtering
Localization performance • Mean Average Precision on all 20 classes • PASCAL 2007 dataset
Localization examples: correct localizations Bicycle Car Horse Sofa
Localization examples: false positives Bicycle Car Horse Sofa
Localization examples: missed objects Bicycle Car Horse Sofa
Image classification and localization use different information. Combination: key points • For many true positives, only one of the two scores is high • Truncated objects: hard for the detector • Small objects: fine for the detector, but not for the classifier, which uses global information
Combination model • Input: classification (Si) and localization (Sw) scores • Output: probability that the object is present • Suppose that classification and localization outputs are independent:
Combination model • For each modality (classification/detection): notion of detectability P(Di) for classifier and P(Dw) for detector • Encodes the ability to detect presence of the objects • Assuming that the classifier/detector outputs conditional probabilities: P(O|Di,Si) and P(O|Dw,Sw)
Combination model • P(O|Si) = P(Di) × P(O|Si,Di) + P(¬Di) × P(O|Si,¬Di) • P(O|Sw) = P(Dw) × P(O|Sw,Dw) + P(¬Dw) × P(O|Sw,¬Dw) • Final probability: • Handle both cases: • Object detectable by two modalities • Object detectable by only one modality
Combination model • P(O|¬Di,Si) and P(O|¬Dw,Sw): constant value • Sw = classification by localization: highest localization score • Priors P(Di) and P(Dw) are class-dependent
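The per-modality part of the model above can be sketched in a few lines. The function and argument names are hypothetical, and the numbers in the usage note are made up for illustration. The sketch covers only the marginalization over detectability and the image-level localization score, not the final combined probability.

```python
def presence_given_score(p_detectable, p_object_if_detectable, p_object_if_not):
    """P(O|S) = P(D) * P(O|S,D) + P(not D) * P(O|S,not D).

    p_detectable:           class-dependent prior P(D) for this modality
    p_object_if_detectable: calibrated output of the classifier or detector, P(O|S,D)
    p_object_if_not:        the constant used for P(O|S,not D)
    """
    return (p_detectable * p_object_if_detectable
            + (1.0 - p_detectable) * p_object_if_not)


def image_level_localization_score(window_scores):
    """Sw: classification by localization, i.e. the highest window score in the image."""
    return max(window_scores)
```

For example, with a prior of 0.8, a calibrated output of 0.9 and a constant of 0.1, `presence_given_score(0.8, 0.9, 0.1)` evaluates to 0.8 × 0.9 + 0.2 × 0.1 = 0.74. A low prior P(D) pulls the result toward the constant, which is how a modality that often cannot see the object is prevented from vetoing the other one.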
Combination experimental setup • Image classifier: INRIA_flat classifier • SVM classifier with a χ² kernel using multiple feature channels [Zhang07] • Excellent results in the PASCAL 2008 challenge • Detector: as described previously • Experimental validation on PASCAL VOC 2007 • Comparison to the state of the art on PASCAL VOC 2008
Experimental results: gain obtained (figure panels: Classification, Localization)
Experimental results: car localization • Correct but low-scoring localization • High classification score • Score increased after combination
Experimental results: car classification • High classification score • No localization • Score decreased after combination
Comparison to the state of the art • Based on blind evaluation on PASCAL VOC 2008 • Classification • Best on 12 classes out of 20 • Localization • Best on 11 classes out of 20
Conclusion • Efficient localization method • Successful combination of classification and localization • State-of-the-art performance on both tasks