Large-Scale Object Recognition with Weak Supervision. Weiqiang Ren , Chong Wang, Yanhua Cheng, Kaiqi Huang, Tieniu Tan. { wqren,cwang,yhcheng,kqhuang,tnt }@nlpr.ia.ac.cn. Task2 : Classification + Localization. Task 2b: Classification + localization with additional training data
Large-Scale Object Recognition with Weak Supervision. Weiqiang Ren, Chong Wang, Yanhua Cheng, Kaiqi Huang, Tieniu Tan. {wqren,cwang,yhcheng,kqhuang,tnt}@nlpr.ia.ac.cn
Task 2: Classification + Localization
Task 2b: Classification + localization with additional training data (ordered by classification error)
• Only classification labels are used
• Full image as object location
Outline • Motivation • Method • Results
Why Weakly Supervised Localization (WSL)? Knowing where to look makes recognizing objects easier! However, in the classification-only task, no annotations of object location are available. Weakly Supervised Localization
• 13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
• 15.0: Object-centric spatial pooling for image classification, ECCV 2012
• 22.4: Multi-fold MIL training for weakly supervised object localization, CVPR 2014
• 22.7: On learning to localize objects with minimal supervision, ICML 2014
• 26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI
• 26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
• 31.6: Weakly supervised object localization with latent category learning, ECCV 2014 (Sep 11, Poster Session 4A, #34)
Our Work
• Weakly Supervised Object Localization with Latent Category Learning (ECCV 2014)
• Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision (submitted to TPAMI)
For high efficiency in large-scale tasks, we use the second one.
Framework [figure: 1 Input Images, Conv Layers, FC Layers; 2 Det Prediction; 3 Rescoring; 4 Cls Prediction]
1st: CNN Architecture Chatfield et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets
MILinear: Region Proposal • Good region proposal algorithms • High recall • High overlap • Small number • Low computation cost • MCG pretrained on VOC 2012 • Additional Data • Training: 128 windows/image • Testing: 256 windows/image • Compared to Selective Search (~2000)
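The "high recall" and "high overlap" criteria above can be checked with a small script. This is a minimal sketch, not the authors' evaluation code: the `(x1, y1, x2, y2)` box format, the function names, and the 0.5 overlap threshold are all assumptions made for illustration; only the 128-window budget comes from the slide.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes contribute no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(gt_boxes, proposals, top_k=128, thresh=0.5):
    """Fraction of ground-truth boxes covered by one of the first top_k proposals.

    thresh=0.5 is an assumed overlap criterion, not a value from the slides.
    """
    if not gt_boxes:
        return 0.0
    kept = proposals[:top_k]
    hits = sum(1 for gt in gt_boxes if any(iou(gt, p) >= thresh for p in kept))
    return hits / len(gt_boxes)
```

Calling `proposal_recall` with `top_k=128` versus a larger budget gives a rough sense of how much recall is traded for the smaller window count.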
MILinear: Feature Representations • Low Level Features • SIFT, LBP, HOG • Shape context, Gabor, … • Mid-Level Features • Bag of Visual Words (BoVW) • Deep Hierarchical Features • Convolutional Networks • Deep Auto-Encoders • Deep Belief Nets
MILinear: Positive Window Mining • Clustering • KMeans • Topic Model • pLSA, LDA, gLDA • CRF • Multiple Instance Learning • DD, EMDD, APR • MI-NN • MI-SVM, mi-SVM • MILBoost
MILinear: Objective Function and Optimization • Multiple instance Linear SVM • Optimization: trust region Newton • A kind of Quasi Newton method • Working in the primal • Faster convergence
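The slide does not reproduce the objective itself, so the following is only a sketch of what a multiple instance linear SVM objective in the primal can look like. It assumes a bag (image) is scored by its best window, labels are +1 or -1 per image, and the loss is a squared hinge; the squared hinge, the constant `C`, and all names here are assumptions, not the authors' formulation.

```python
def dot(w, x):
    """Inner product of two equal-length lists of floats."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mil_svm_objective(w, bags, labels, C=1.0):
    """Primal objective: 0.5 * ||w||^2 + C * sum of squared hinge losses.

    bags   : list of images, each a list of window feature vectors
    labels : +1 / -1 image-level labels (no window-level labels are used)
    """
    reg = 0.5 * dot(w, w)
    loss = 0.0
    for windows, y in zip(bags, labels):
        # Multiple instance assumption: the image score is the max window score.
        bag_score = max(dot(w, x) for x in windows)
        margin = 1.0 - y * bag_score
        loss += max(0.0, margin) ** 2
    return reg + C * loss
```

Only the image-level label enters the loss, which is what makes the supervision weak: the window that attains the maximum acts as the mined positive window.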
3rd: Detection Rescoring • Rescoring with softmax [figure: 128 boxes, max, 1000 dim, 1000 dim, train softmax, 1000 classes] • Softmax: considers all the categories simultaneously at each minibatch of the optimization • Suppresses the responses of other object categories with similar appearance
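One possible reading of the rescoring step is: take the maximum detector response over the boxes for each class, then pass the resulting vector through a softmax. The sketch below shows only that forward computation with hypothetical function names; the trained softmax layer (weights learned over minibatches) is not reproduced here.

```python
import math


def max_pool_boxes(box_scores):
    """Per-class maximum over boxes.

    box_scores: list of boxes, each a list of per-class detector scores.
    Returns one score per class (e.g. 128 x 1000 in, 1000 out).
    """
    num_classes = len(box_scores[0])
    return [max(box[c] for box in box_scores) for c in range(num_classes)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the softmax normalizes across all classes at once, raising the probability of one class necessarily lowers the others, which matches the suppression effect described on the slide.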
4th: Classification Rescoring • Linear Combination [figure: 1000 dim, 1000 dim, 1000 dim] One funny thing: we tried some other score-combination strategies, but none of them seemed to work!
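A linear combination of two per-class score vectors can be sketched as follows. The slides do not give the weights, so the mixing weight `alpha` and the function names are hypothetical and would have to be tuned on validation data.

```python
def combine_scores(cls_scores, wsl_scores, alpha=0.5):
    """Weighted sum of classification and WSL scores, class by class.

    alpha is a hypothetical mixing weight; the slides do not state its value.
    """
    return [alpha * c + (1.0 - alpha) * d for c, d in zip(cls_scores, wsl_scores)]


def top_k_classes(scores, k=5):
    """Indices of the k highest-scoring classes, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

For example, `top_k_classes(combine_scores(cls, wsl, alpha=0.7))` would give a five-class prediction from the fused scores.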
2nd: MILinear on ILSVRC 2013 detection mAP: 9.63% vs. 8.99% (DPM5.0)!
3rd: WSL Rescoring (Softmax) The softmax-based rescoring successfully suppresses the predictions of other object categories with similar appearance!
4th: Cls and WSL Combination WSL and Cls can be complementary to each other!
Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge.
Conclusion • WSL always helps classification • WSL has large potential: WSL data is cheap