
Deep Epitomic Nets and Scale/Position Search for Image Classification

Deep Epitomic Nets and Scale/Position Search for Image Classification. TTIC_ECP team: George Papandreou (Toyota Technological Institute at Chicago) and Iasonas Kokkinos (Ecole Centrale Paris/INRIA).


Presentation Transcript


  1. Deep Epitomic Nets and Scale/Position Search for Image Classification. TTIC_ECP team: George Papandreou (Toyota Technological Institute at Chicago), Iasonas Kokkinos (Ecole Centrale Paris/INRIA).

  2. TTIC_ECP entry in a nutshell. Goal: invariance in deep CNNs. Part 1: deep epitomic nets, for local translation (deformation). Part 2: global scaling and translation. Top-5 error: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain); (2) epitomic DCNN + search, 10.56% (~1.5% gain); fusion (1)+(2), 10.22%. All DCNNs have 6 convolutional and 2 fully connected layers.

  3. Deep Convolutional Neural Networks (DCNNs): convolutional layers followed by fully connected layers. Cascade of convolution + max-pooling blocks (deformation-invariant template matching). Our work: different blocks (Part 1) and a different architecture (Part 2). References: LeCun et al.: Gradient-Based Learning Applied to Document Recognition, Proc. IEEE 1998. Krizhevsky et al.: ImageNet Classification with Deep CNNs, NIPS 2012.

  4. Part 1: Deep epitomic nets

  5. Epitomes: translation-invariant patch models. Patch templates: separate modeling, more data & less power per parameter. Epitomes: a lot more for just a bit more. EM-based training. References: Jojic, Frey, Kannan: Epitomic analysis of appearance and shape, ICCV 2003. Benoit, Mairal, Bach, Ponce: Sparse image representation with epitomes, CVPR 2011. Grosse, Raina, Kwong, Ng: Shift-invariant sparse coding, UAI 2007.

  6. Mini-epitomes for image classification. Dictionary of mini-epitomes vs. dictionary of patches (K-means): gains in (flat) BoW classification. Reference: Papandreou, Chen, Yuille: Modeling Image Patches with a Dictionary of Mini-Epitomes, CVPR 2014.

  7. From flat to deep: epitomic convolution. Max-pooling: max over image positions. Epitomic convolution: max over epitome positions. Reference: G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.
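The contrast on slide 7 can be illustrated with a minimal NumPy sketch. It assumes a single-channel image and a single epitome, and the function names and toy sizes are hypothetical rather than taken from the talk. In `conv_maxpool_response` one small filter is matched and the maximum is taken over neighbouring image positions; in `epitomic_response` the image position is fixed and the maximum is taken over all sub-filters cut from a larger epitome.

```python
import numpy as np

def correlate_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding, stride 1)."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def conv_maxpool_response(image, kernel, pool):
    """One filter, then max over pool x pool image positions (stride 1)."""
    r = correlate_valid(image, kernel)
    out = np.empty((r.shape[0] - pool + 1, r.shape[1] - pool + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = r[i:i + pool, j:j + pool].max()
    return out

def epitomic_response(image, epitome, patch):
    """For each image position, max over every patch x patch
    sub-filter of the (larger) epitome."""
    n = epitome.shape[0] - patch + 1  # number of epitome positions per axis
    responses = [
        correlate_valid(image, epitome[a:a + patch, b:b + patch])
        for a in range(n)
        for b in range(n)
    ]
    return np.max(responses, axis=0)

# Toy sizes (assumptions): 10x10 image, 5x5 epitome, 3x3 sub-filters.
rng = np.random.default_rng(0)
image = rng.standard_normal((10, 10))
epitome = rng.standard_normal((5, 5))
out = epitomic_response(image, epitome, patch=3)
```

Because all sub-filters are cut from the same epitome, they share parameters, which is the sharing that slide 9 credits for faster and more reliable learning.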

  8. Deep Epitomic Convolutional Nets. Epitomic convolution vs. convolution + max-pooling. Supervised dictionary learning by back-propagation. Reference: G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.

  9. Deep Epitomic Convolutional Nets. Parameter sharing: faster and more reliable model learning. Consistent improvements: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain).

  10. Part 2: Global scaling and translation

  11. Scale invariance challenge. Class: dogs. Features: category-dependent (ear detector) and scale-dependent (area).

  12. Scale invariance challenge. Classes: dogs and skyscrapers. Features: category-dependent (ear detector) and scale-dependent.

  13. Scale invariance challenge. Training set. Classes: dogs and skyscrapers. Features: category-dependent (ear detector) and scale-dependent.

  14. Scale invariance challenge. Rule: large skyscrapers have ears, large dogs don’t. Classes: dogs and skyscrapers. Features: category-dependent (ear detector) and scale-dependent.

  15. Scale-invariant classification. Category-dependent and scale-dependent features form a ‘bag’ of features. This work: multiple-instance learning (MIL), with end-to-end training! References: A. Howard. Some improvements on deep convolutional neural network based image classification, 2013. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014. T. Dietterich et al. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 1997.

  16. Step 1: Efficient multi-scale convolutional features. Pipeline: image I(x,y) → pyramid I(x,y,s) → stitch → Patchwork(x,y) → GPU → C(x,y) → unstitch → multi-scale convolutional features C(x,y,s). A 220x220x3 input yields 5x5x512 features. References: Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat, ICLR 2014. Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors, ECCV 2012. Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: Densenet, arXiv 2014.
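The stitch and unstitch steps of slide 16 can be sketched as follows. This is a toy illustration under stated assumptions, not the team's implementation: the function names, the nearest-neighbour resizing, and the `layout` dictionary (scale size mapped to a fixed top-left offset) are all hypothetical, and the sketch ignores the padding and receptive-field effects that a real network introduces at the borders between scales.

```python
import numpy as np

def resize_nearest(image, size):
    """Warp an (H, W, C) image to size x size by nearest-neighbour sampling."""
    H, W = image.shape[:2]
    rows = np.arange(size) * H // size
    cols = np.arange(size) * W // size
    return image[rows][:, cols]

def stitch(image, layout, mosaic_size):
    """Paste every pyramid level into one square mosaic at its fixed offset."""
    mosaic = np.zeros((mosaic_size, mosaic_size, image.shape[2]), dtype=image.dtype)
    for size, (top, left) in layout.items():
        mosaic[top:top + size, left:left + size] = resize_nearest(image, size)
    return mosaic

def unstitch(feature_map, layout, stride):
    """Cut the per-scale blocks back out of a feature map computed on the mosaic.
    Assumes the feature map has resolution mosaic_size / stride and that
    all offsets and sizes are divisible by the stride."""
    return {
        size: feature_map[top // stride:(top + size) // stride,
                          left // stride:(left + size) // stride]
        for size, (top, left) in layout.items()
    }

# Toy example: two scales (8 and 4 pixels) in a 12-pixel mosaic.
image = np.arange(16 * 16 * 3).reshape(16, 16, 3)
layout = {8: (0, 0), 4: (0, 8)}
mosaic = stitch(image, layout, mosaic_size=12)
# With stride 1 the mosaic itself stands in for the feature map.
blocks = unstitch(mosaic, layout, stride=1)
```

The point of the design is that the network runs once on the mosaic, so all scales share a single pass on the GPU instead of one pass per scale.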

  17. Step 2: From fully connected to fully convolutional. Pipeline: image I(x,y) → pyramid I(x,y,s) → stitch → Patchwork(x,y) → GPU → F(x,y). The fully connected layers are applied convolutionally, after the convolutional layers. A 220x220x3 input yields a 1x1x4096 output.
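The idea of slide 17 can be checked numerically with a small sketch: a fully connected layer that expects an h x w x C feature block is the same as a convolution with an h x w x C kernel, so on a larger feature map it produces one output vector per position. The function name and the toy sizes below are assumptions chosen for readability; they are much smaller than the 5x5x512 and 4096 figures on the slides.

```python
import numpy as np

def fc_as_conv(features, weights, bias):
    """Slide a fully connected layer over a feature map.

    features: (H, W, C) feature map
    weights:  (h, w, C, K) FC weight matrix reshaped into a conv kernel
    bias:     (K,)
    returns:  (H - h + 1, W - w + 1, K)
    """
    H, W, _ = features.shape
    h, w, _, K = weights.shape
    flat_w = weights.reshape(-1, K)  # the original FC matrix, (h*w*C, K)
    out = np.empty((H - h + 1, W - w + 1, K))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = features[i:i + h, j:j + w].reshape(-1)
            out[i, j] = window @ flat_w + bias
    return out

# Toy sizes (assumptions): 2x2x3 FC input, 4 outputs, 5x5 feature map.
rng = np.random.default_rng(0)
weights = rng.standard_normal((2, 2, 3, 4))
bias = rng.standard_normal(4)
features = rng.standard_normal((5, 5, 3))

dense_map = fc_as_conv(features, weights, bias)  # (4, 4, 4)
# On an input of exactly the FC size, the result is the usual FC output.
single = fc_as_conv(features[:2, :2], weights, bias)  # (1, 1, 4)
```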

  18. Step 3: Global max-pooling. Pipeline: image I(x,y) → pyramid I(x,y,s) → stitch → Patchwork(x,y) → GPU → global max-pooling, with a learned class-specific bias. Consistent, explicit position and scale search during training and testing. For free: argmax yields 48% localization error. Top-5 error: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain); (2) epitomic DCNN + search, 10.56% (~1.5% gain); fusion (1)+(2), 10.22%.
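A minimal sketch of the search in slide 18, assuming the per-scale class score maps have already been unstitched from the mosaic. The function name and data layout are hypothetical, and adding the class-specific bias after the max is an assumption: the slide says only that such a bias is learned, not where it enters.

```python
import numpy as np

def global_max_pool(score_maps, class_bias):
    """Max over all positions and scales, separately for each class.

    score_maps: dict mapping scale -> (H_s, W_s, K) array of class scores
    class_bias: (K,) per-class bias (assumed here to be added after the max)
    returns:    (K,) pooled scores and, per class, the (scale, row, col) argmax
    """
    K = class_bias.shape[0]
    best = np.full(K, -np.inf)
    where = [None] * K
    for scale, scores in score_maps.items():
        flat = scores.reshape(-1, K)      # one row per position
        idx = flat.argmax(axis=0)         # best position per class
        vals = flat[idx, np.arange(K)]
        for k in range(K):
            if vals[k] > best[k]:
                best[k] = vals[k]
                row, col = divmod(int(idx[k]), scores.shape[1])
                where[k] = (scale, row, col)
    return best + class_bias, where

# Toy example: two scales, three classes.
score_maps = {220: np.zeros((2, 2, 3)), 120: np.zeros((3, 3, 3))}
score_maps[120][1, 2, 0] = 5.0   # class 0 peaks at scale 120, position (1, 2)
score_maps[220][0, 1, 1] = 7.0   # class 1 peaks at scale 220, position (0, 1)
bias = np.array([0.5, 0.0, -1.0])
scores, where = global_max_pool(score_maps, bias)
```

The `where` list is the by-product the slide calls "for free": the winning scale and position for a class give a rough localization without any extra computation.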

  19. Deep Epitomic Nets and Scale/Position Search for Image Classification. Goal: invariance in deep CNNs. DCNN: 6 convolutional + 2 fully connected layers. Top-5 error: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain); (2) epitomic DCNN + search, 10.56% (~1.5% gain); fusion (1)+(2), 10.22%. And with deeper nets: ? The deeper the better: stay tuned!

  20. Epitomic implementation details: • Architecture of our deep epitomic net (11.94%) • Training took 3 weeks on a single Titan (60 epochs) • Standard choices for learning rate, momentum, etc.

  21. Pyramidal search implementation details: • The image is warped to a square image; its position in the mosaic is fixed • Scales: 400, 300, 220, 160, 120, 90 pixels • Mosaic: 720 pixels
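The scales and mosaic size listed in slide 21 are consistent with a simple fixed layout, as the sketch below shows. The greedy shelf-packing rule and the function name are assumptions made for illustration: the slides state only that positions in the mosaic are fixed, not what they are, and a real layout may leave padding between scales.

```python
def shelf_layout(sizes, mosaic_size):
    """Greedy shelf packing of square images into a square mosaic.

    Squares are placed left to right, largest first; a new row starts
    when the current one is full. Returns {size: (top, left)}.
    """
    layout = {}
    top = left = row_height = 0
    for size in sorted(sizes, reverse=True):
        if left + size > mosaic_size:      # row is full: start a new one
            top += row_height
            left = row_height = 0
        if top + size > mosaic_size:
            raise ValueError("squares do not fit in the mosaic")
        layout[size] = (top, left)
        left += size
        row_height = max(row_height, size)
    return layout

# Scales and mosaic size from slide 21.
layout = shelf_layout([400, 300, 220, 160, 120, 90], mosaic_size=720)
```

Under this rule the 400 and 300 pixel scales share the first row (700 of 720 pixels wide) and the four smaller scales share a second row starting at pixel 400, so all six scales fit in a single 720 pixel mosaic.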
