Convolutional Neural Nets

georgej

Presentation Transcript


  1. Convolutional Neural Nets Advanced Vision Seminar April 19, 2015

  2. Overview • The Revolution (2012): ImageNet Classification with Deep Conv. Nets • Large performance gap • Possible explanations • What makes convnets tick? Visualizing and Understanding Deep Conv. Nets • Some limits of conv. nets • Useful resources

  3. ImageNet Classification with Deep Conv. Neural Networks • Until 2012: leading methods used hand-crafted features + encoding methods (e.g., SIFT + Bag-of-Words + SVM) • NIPS 2012, Alex Krizhevsky et al. • Significant improvement over other methods in ImageNet performance (top-5 classification error): • 2010: ~28% (pre-convnet) • 2012: 16% (Krizhevsky) • 2013: 12% • 2014: 6.7%

  4. Causes for the Performance Gap • Deep nets have been around for a long time, so why the sudden gap? • A combination of multiple factors: • Network design • Scale of data • ReLU: faster convergence • Dropout: less overfitting • SGD • GPU computations
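Dropout, listed above as a way to reduce overfitting, fits in a few lines. This is a minimal NumPy sketch of inverted dropout, which is our choice rather than the paper's exact formulation: Krizhevsky et al. applied dropout with probability 0.5 in the first two fully connected layers and multiplied outputs by 0.5 at test time, while the inverted variant rescales during training instead. The two agree in expectation. The function name and arguments are illustrative.

```python
import numpy as np


def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout.

    During training, each unit is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop), so the expected activation is
    unchanged. At test time the input is returned as-is.
    """
    if not training or p_drop == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= p_drop  # True with probability 1 - p_drop
    return activations * keep / (1.0 - p_drop)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = np.ones(10)
    print(dropout(hidden, rng=rng))          # each entry is 0.0 or 2.0
    print(dropout(hidden, training=False))   # unchanged at test time
```

Because the rescaling happens during training, the test-time network needs no special handling.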

  5. Network Design • Basic layer types: • Convolution • Nonlinearity • Pooling: max, avg • Local normalization • 8 layers (deeper and wider than LeCun's 1989 network) • Fully connected layers can also be viewed as convolutions whose receptive field is the entire preceding layer [Figures: LeCun, 1989; Krizhevsky, 2012]

  6. Network Design • Layer 0 (input): 224x224x3, mean-subtracted • Layer 1: 96 kernels of 11x11x3, stride of 4 pixels, max pool and local normalization • Layer 2: 256 kernels of 5x5x96, max pool • Layers 3-5: more convolutions, similar to layers 1 and 2 • Layers 6, 7: fully connected, 4096 hidden units each • Layer 8: softmax over 1000 classes [Figure: Krizhevsky, 2012]
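The layer specification above can be checked with a little arithmetic. The sketch below traces spatial sizes through an AlexNet-style stack and counts weights plus biases per layer. Several values are assumptions, not taken from the slide: the channel counts of layers 3-5 and the halved inputs of layers 2, 4 and 5 (from the two-GPU split in the original paper), 3x3 stride-2 overlapping max pooling, the padding values, and a 227x227 input. With a 224x224 input and no padding, (224 - 11)/4 is not an integer, which is why many reimplementations use 227 or pad the first layer.

```python
# AlexNet-style conv/pool stack: (name, kernel, stride, padding).
# Pooling size and padding values are assumptions, not from the slide.
CONV_STACK = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

# (name, kernels or units, inputs per kernel/unit). conv2, conv4 and conv5
# see only half of the previous layer's channels because the original model
# was split over two GPUs (assumption taken from the paper, not the slide).
WEIGHTED_LAYERS = [
    ("conv1", 96, 11 * 11 * 3),
    ("conv2", 256, 5 * 5 * 48),
    ("conv3", 384, 3 * 3 * 256),
    ("conv4", 384, 3 * 3 * 192),
    ("conv5", 256, 3 * 3 * 192),
    ("fc6", 4096, 6 * 6 * 256),
    ("fc7", 4096, 4096),
    ("fc8", 1000, 4096),
]


def output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size: int) -> list:
    """Return (layer name, spatial size) after each conv/pool layer."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in CONV_STACK:
        size = output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


def count_params() -> dict:
    """Weights plus one bias per kernel/unit, for each weighted layer."""
    return {name: units * (fan_in + 1) for name, units, fan_in in WEIGHTED_LAYERS}


if __name__ == "__main__":
    for name, size in trace_sizes(227):
        print(f"{name}: {size}x{size}")
    params = count_params()
    print(f"total parameters: {sum(params.values()):,}")
```

Under these assumptions the spatial sizes run 55, 27, 27, 13, 13, 13, 13, 6 (hence the 6x6x256 input to fc6), and the total comes to 60,965,224 parameters, consistent with the ~60 million quoted on the next slide. Roughly 96% of them sit in the three fully connected layers.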

  7. Num. Samples vs. Num. Parameters • The network has ~60,000,000 parameters; to avoid overfitting, a lot of data is needed • ImageNet is indeed very large: millions of images • Training set: 1.2 million images, 1000 object classes • Additional samples are generated via data augmentation: simple geometric and color transformations • Dropout
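A minimal NumPy sketch of the geometric part of the augmentation. It assumes the scheme described by Krizhevsky et al.: random 224x224 crops from 256x256 training images plus random horizontal reflections. The color transformations are omitted, and the function name is hypothetical.

```python
import numpy as np


def random_crop_and_flip(image, crop=224, rng=None):
    """Return a random crop x crop window of `image`, mirrored left-right
    with probability 0.5. `image` has shape (H, W) or (H, W, C)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)    # upper bound is exclusive
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]             # horizontal reflection
    return patch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_image = rng.random((256, 256, 3))  # stand-in for a resized training image
    sample = random_crop_and_flip(train_image, rng=rng)
    print(sample.shape)  # (224, 224, 3)
```

Each epoch then sees a slightly different version of every image. With 33 x 33 crop positions and two flips, this gives many distinct samples per original image.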

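  8. Optimization: Definitions • Loss on one sample (softmax loss): $\ell(x, y; w) = -\log \frac{\exp(f_y(x; w))}{\sum_{k} \exp(f_k(x; w))}$, where $f_k(x; w)$ is the network's score for class $k$ • $D$: data/batch size • $v$: “momentum variable” (update history) • Loss on batch: $L(w) = \frac{1}{D} \sum_{i=1}^{D} \ell(x_i, y_i; w)$ • Regularization term: $\frac{\lambda}{2} \lVert w \rVert^2$ • $\epsilon$: learning rate • Momentum improves convergence stability and speed • According to the authors, the regularization term is crucial for performance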
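The definitions above translate directly into code. This is a minimal NumPy sketch. It assumes the network's class scores for a batch are already available as a (D, K) array, and the forward pass itself is not shown. The function names are illustrative.

```python
import numpy as np


def softmax_loss(scores, labels):
    """Mean softmax (cross-entropy) loss over a batch.

    scores: (D, K) array of class scores for D samples and K classes.
    labels: (D,) array of integer class indices.
    """
    shifted = scores - scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def regularized_objective(scores, labels, weights, weight_decay=0.0005):
    """Batch loss plus the L2 regularization term (weight_decay / 2) * ||w||^2."""
    return softmax_loss(scores, labels) + 0.5 * weight_decay * float(np.sum(weights ** 2))


if __name__ == "__main__":
    # With all-equal scores over K = 1000 classes, the loss is log(1000), about 6.9.
    scores = np.zeros((128, 1000))
    labels = np.zeros(128, dtype=int)
    print(softmax_loss(scores, labels))
```

Subtracting the row maximum does not change the softmax, but it keeps `np.exp` from overflowing for large scores.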

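  9. Optimization • Set momentum $\mu = 0.9$, weight decay $\lambda = 0.0005$, learning rate $\epsilon = 0.01$ (initially) • $D = 128$ (batch size) • Num. epochs: 90 • Update: $v \leftarrow \mu v - \epsilon \lambda w - \epsilon \nabla_w L(w)$, then $w \leftarrow w + v$ • This is called SGD with momentum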
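The update rule can be sketched in NumPy as follows. The defaults are momentum 0.9 and weight decay 0.0005 from the hyper-parameters above, with learning rate 0.01, the initial value reported by Krizhevsky et al. The toy quadratic objective is purely illustrative and was chosen so that the result can be checked by hand. With weight decay λ, the minimizer of 0.5 ||w - t||^2 + (λ/2) ||w||^2 is t / (1 + λ).

```python
import numpy as np


def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One step of SGD with momentum and weight decay:

        v <- momentum * v - lr * weight_decay * w - lr * grad
        w <- w + v

    `grad` is the gradient of the (unregularized) batch loss at w;
    the weight-decay term supplies the gradient of (weight_decay / 2) * ||w||^2.
    """
    v = momentum * v - lr * weight_decay * w - lr * grad
    return w + v, v


if __name__ == "__main__":
    # Toy objective L(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
    target = np.array([1.0, -2.0, 3.0])
    w = np.zeros(3)
    v = np.zeros(3)
    for _ in range(2000):
        w, v = sgd_momentum_step(w, v, grad=w - target)
    # Weight decay shrinks the minimizer to target / (1 + weight_decay).
    print(w, target / (1 + 0.0005))
```

In a real network, `grad` would come from backpropagation on a batch of D = 128 samples. The step function itself does not change.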

  10. ReLU: Faster Convergence • Nonlinearity: tanh → ReLU (rectified linear unit), $f(x) = \max(0, x)$ • Easier to differentiate • Avoids saturation • In practice, much faster convergence [Figure: ReLU vs. tanh training curves, Krizhevsky, 2012]
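The saturation argument can be made concrete by comparing derivatives. This is a minimal sketch in plain Python. The tanh derivative 1 - tanh(x)^2 shrinks toward zero as |x| grows, while the ReLU derivative stays at 1 for every positive input. Treating the ReLU derivative at exactly 0 as 0 is a convention we chose.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU (taking 0 at x == 0, a common convention)."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which vanishes as |x| grows (saturation)."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in (0.5, 2.0, 5.0):
        print(f"x={x}: relu'={relu_grad(x):.1f}, tanh'={tanh_grad(x):.6f}")
```

At x = 5 the tanh gradient is below 0.001, so a saturated tanh unit passes almost no error signal backward. This is one reason ReLU networks train faster.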

  11. Stronger Machines • Modern GPU architectures enable the massively parallel computations required by deep conv. nets • Training on two strong GPUs took “only” 6 days, a 50x speedup relative to CPU training

  12. ImageNet Results • ILSVRC: ImageNet Large Scale Visual Recognition Challenge • ImageNet (2014): 1.4 million images, 1000 object classes • Compare: Pascal: 22,000 images, 20 object classes

  13. Results (2012) [Figure: example results, e.g., “agaric”]

  14. Generic Use in Vision • Using the output of the fully connected layers as a generic feature extractor has proven to be very strong • It beats the state of the art on many datasets/benchmarks unrelated to ImageNet • This is now standard in object detection, scene classification, scene parsing, segmentation, and many more • “Machine-crafted” vs. hand-crafted features

  15. Generic Use in Vision • A computer vision scientist: “How long does it take to train these generic features on ImageNet?” Hossein: “2 weeks.” Ali: “Almost 3 weeks, depending on the hardware.” The computer vision scientist: “Hmmmm...” Stefan: “Well, you have to compare the three weeks to the last 40 years of computer vision.” *Quote from http://www.csc.kth.se/cvap/cvg/DL/ots/ A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition", CVPR 2014, DeepVision workshop

  16. Network Design?? • How to determine “hyper-parameters”: • Number of layers? • Kernel size / number of kernels? • Learning rate? Number of training epochs? • … • From my own experience, either: • Start with an existing network and tweak / fine-tune it • Incrementally increase network complexity: start with a few layers, see what works • Domain knowledge: convolutions are especially suited for images, but are not always the right choice

  17. Network Structure • (To my knowledge) there is no rigorous analysis, either theoretical or empirical, of the effect of network structure • Example: systematically check many different network structures / configurations • See what works well and what doesn't, and explain why • My guess: this was probably done by the groups behind successful architectures, but “bad” results were not published

  18. Visualizing & Understanding Conv. Nets • What makes convnets “tick”? • What happens in the hidden units? • Layer 1: easy to visualize • Deeper layers: just a bunch of numbers, or something more meaningful? • Do convnets use context, or do they actually model the target classes?

  19. Introducing: Visualizing & Understanding Conv. Nets • Zeiler & Fergus, 2013 • Goal: Try to visualize the “black box” hidden units, gain insights • Hope: Use conclusions to improve performance • Idea: “Deconvolutional” neural net

  20. Deconvolutional Nets • Originally suggested for unsupervised feature learning: construct a convolutional net whose cost function is the image reconstruction error • Used here to find which stimuli cause the strongest responses in hidden units • Run many images through the net → find the strongest unit activations in each layer → visualize by “reversing” the net's operations
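The middle step of the pipeline above, finding which images excite each unit most strongly, reduces to a sort over stored activations. This is a minimal NumPy sketch. It assumes one layer's feature maps for a set of images have already been collected into an (N, C, H, W) array, and the forward pass is not shown. The default k = 9 and the function name are our choices.

```python
import numpy as np


def top_k_images_per_unit(activations, k=9):
    """Find, for each unit (channel), the k images that excite it most.

    activations: (N, C, H, W) feature maps of one layer for N images.
    Returns a (C, k) array of image indices, strongest first.
    """
    n, c = activations.shape[:2]
    peak = activations.reshape(n, c, -1).max(axis=2)   # (N, C): strongest response per image and unit
    order = np.argsort(-peak, axis=0, kind="stable")   # sort images by descending peak, per unit
    return order[:k].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer_acts = rng.random((100, 16, 13, 13))          # stand-in for real layer activations
    print(top_k_images_per_unit(layer_acts).shape)      # (16, 9)
```

The selected activations are what the deconvnet then projects back down to pixel space.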

  21. Reversing a convnet

  22. “Unpooling”
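In Zeiler & Fergus's deconvnet, max pooling is reversed only approximately. The location of each maximum (the “switches”) is recorded during the forward pass, and unpooling puts the pooled value back at that location, with zeros elsewhere. This is a minimal NumPy sketch for a single 2-D feature map with non-overlapping windows. That is a simplifying assumption, since AlexNet itself pools with overlapping windows.

```python
import numpy as np


def max_pool_with_switches(x, size=2):
    """Non-overlapping max pooling of a 2-D map that also records the
    position of each maximum (the "switches")."""
    rows, cols = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((rows, cols), dtype=x.dtype)
    switches = np.zeros((rows, cols, 2), dtype=int)
    for i in range(rows):
        for j in range(cols):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            di, dj = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[di, dj]
            switches[i, j] = (i * size + di, j * size + dj)
    return pooled, switches


def unpool(pooled, switches, out_shape):
    """Approximate inverse of max pooling: put each pooled value back at its
    recorded position and fill every other position with zero."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = switches[i, j]
            out[r, c] = pooled[i, j]
    return out


if __name__ == "__main__":
    fmap = np.array([[1, 2, 0, 0], [3, 4, 0, 5], [0, 0, 6, 1], [7, 0, 2, 2]], dtype=float)
    pooled, switches = max_pool_with_switches(fmap)
    print(pooled)
    print(unpool(pooled, switches, fmap.shape))
```

The switches are specific to the input image, so the reconstruction keeps the spatial structure of that particular stimulus.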

  23. Deconvolution* • Goal: visualize a strong activation in feature map $P$ of layer $L+1$ by projecting it down to layer $L$ • As in Deep-Belief Nets: $L \leftarrow P * F'$, where $F'$ is the original kernel $F$ flipped in both dimensions • Intuition: one can show that the gradient of convolution with $F$ w.r.t. its input is convolution with $F'$, so this amounts to back-propagating the strongest activations as if they were an error signal • *Note: this is not really “deconvolution”; there is no attempt to recover the original signal
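The “flipped kernel” claim can be checked numerically. This is a minimal NumPy sketch for one 2-D channel with stride 1. It assumes the forward operation is “valid” cross-correlation, which is what most CNN libraries call convolution. Projecting a feature map down is then a zero-padded (“full”) correlation with the kernel flipped in both dimensions, and it matches the gradient of the forward pass with respect to its input. The function names are illustrative.

```python
import numpy as np


def correlate_valid(x, k):
    """2-D 'valid' cross-correlation, the operation most CNN libraries call convolution."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def project_down(p, k):
    """Map a feature map p back to the layer below: zero-pad p by the kernel
    size minus one ('full' mode) and correlate with the kernel flipped in
    both dimensions (F' on the slide)."""
    kh, kw = k.shape
    padded = np.pad(p, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return correlate_valid(padded, k[::-1, ::-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernel = rng.standard_normal((3, 3))
    feature_map = np.zeros((6, 6))
    feature_map[2, 3] = 1.0                          # keep only one strong activation
    print(project_down(feature_map, kernel).shape)   # (8, 8)
```

A single isolated activation projects back to a copy of the kernel F itself, placed at the corresponding input location. This is why lower-layer visualizations look like the learned filters.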

  24. Layer 1

  25. Hidden Layer Visualizations: Layer 2

  26. Hidden Layer Visualizations: Layer 3

  27. Hidden Layer Visualizations: Layer 4

  28. Hidden Layer Visualizations: Layer 5

  29. It’s Nice to Watch, But Is It Useful? • The authors observed aliasing effects caused by the large stride in the lower layers (e.g., loss of fine texture) • Reducing the filter size and stride (layer 1: 11x11 → 7x7 filters, stride 4 → 2) improved performance, and the authors also reported qualitatively “cleaner” looking filters
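This toy 1-D illustration of the aliasing mentioned above is ours, not the authors' experiment. A fine periodic pattern sampled with a stride equal to its period looks constant, while a smaller stride preserves it. The helper name is hypothetical.

```python
import numpy as np


def strided_sample(signal, stride):
    """Keep every `stride`-th sample, which is what a stride-s layer does to
    its input positions (here with a trivial 1-tap kernel)."""
    return signal[::stride]


if __name__ == "__main__":
    n = np.arange(32)
    texture = np.cos(np.pi * n / 2)  # a fine pattern that repeats every 4 samples
    print(np.round(strided_sample(texture, 4), 3))  # constant: the pattern is gone
    print(np.round(strided_sample(texture, 2), 3))  # alternating signs survive
```

Real first-layer filters also blur before subsampling, so the effect in a convnet is milder than this worst case. The direction is the same, though: a large stride discards fine texture.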

  30. Is the Net Using Context? • Let’s test whether the network really focuses on relevant features • Systematically occlude different parts of the image • Check the output confidence for the true class • (This is not really related to the visualization)
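The occlusion test can be sketched in a few lines of NumPy. It works with any classifier that maps an image to class probabilities. The `predict_prob` callable, the patch size, the stride, and the toy classifier in the usage example are hypothetical stand-ins for a trained network.

```python
import numpy as np


def occlusion_map(image, predict_prob, true_class, patch=4, stride=4, fill=0.0):
    """Slide an occluding square over the image and record the classifier's
    probability for the true class at each position.

    predict_prob: any callable mapping an image to a vector of class probabilities.
    Low values in the returned map mark regions the prediction depends on.
    """
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            occluded[i * stride:i * stride + patch, j * stride:j * stride + patch] = fill
            heat[i, j] = predict_prob(occluded)[true_class]
    return heat


if __name__ == "__main__":
    # Hypothetical toy "classifier": class 0 probability = mean brightness
    # of the top-left 4x4 region, so only that region should matter.
    def toy_predict(img):
        p = float(img[:4, :4].mean())
        return np.array([p, 1.0 - p])

    print(occlusion_map(np.ones((8, 8)), toy_predict, true_class=0))
```

If the network relied mainly on context, occluding the object itself would leave the true-class probability high. A sharp drop only over the object suggests that the net is modeling the class.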

  31. Following Network Paths • B. Zhou et al., Object Detectors Emerge in Deep Scene CNNs, ICLR 2015

  32. Going Deeper with Convolutions • Recently, even deeper models have been proposed: GoogLeNet, 22 layers: 6.7% top-5 error • 16- and 19-layer architectures from VGG, similar performance

  33. Limits: Easy Classes • Natural, highly textured, fine-grained

  34. Limits: Difficult Classes • Man-made, simple, non-textured, functional?

  35. Useful Tools • For starters: MatConvNet • MATLAB • Simple, straightforward • Pre-trained popular models • Windows/Linux compatible • More advanced: Caffe • Powerful, open-source framework for training and testing convnets, with C++/Python/MATLAB interfaces • Mostly Linux (some old Windows ports) • “Model Zoo”: updated often with state-of-the-art models • Many more: deeplearning.net/software_links/

  36. Thank You • References: • LeCun et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation (MIT Press), 1989 • Zeiler & Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014 • R. Fergus, Deep Learning for Computer Vision (tutorial), NIPS 2013 (several images in this presentation are taken from these slides) • Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, arXiv:1409.0575, 2014 • A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition", CVPR 2014, DeepVision workshop
