
CSCI 431/631 Foundations of Computer Vision

This lecture explores localization, detection, and segmentation using convolutional neural networks (CNNs). It surveys papers and techniques for these tasks, including RCNN, Fast RCNN, Faster RCNN, and deconvolution networks, and explains how deconvolution is used to examine the feature maps of a CNN.



Presentation Transcript


  1. CSCI 431/631 Foundations of Computer Vision Ifeoma Nwogu ion@cs.rit.edu Lecture – Localization, Detection, Segmentation

  2. Schedule • Last class • TensorFlow • Today • CNN for localization, detection and segmentation • Readings for today:

  3. Simplified architecture Softmax layer:
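The softmax layer mentioned above can be sketched in a few lines of NumPy (an illustrative sketch; the slide shows only a diagram, so the scores below are made up):

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the result because softmax is shift-invariant.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
probs = softmax(scores)             # positive values summing to 1
```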

  4. Overview of CNN Classification Task

  5. Object classification vs localization • Classification: to identify that the picture is a certain category e.g. dog • Localization: to produce a class label and also a bounding box that describes where the object is in the picture.
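The classification-vs-localization distinction can be sketched as a shared feature vector feeding two linear heads, one producing class scores and one regressing box coordinates (a hedged NumPy sketch; the feature size, class count, and weights are hypothetical, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal(512)      # pooled CNN features (hypothetical size)
W_cls = rng.standard_normal((10, 512))   # classification head: 10 class scores
W_box = rng.standard_normal((4, 512))    # localization head: (x, y, w, h) box

class_scores = W_cls @ features          # argmax gives the predicted label
box = W_box @ features                   # regressed bounding-box coordinates
```

The point of the sketch is that localization only adds an output head; the convolutional feature extractor is shared with classification.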

  6. Object segmentation • Localization can involve finding multiple classes • Segmentation: to identify the categories of objects as well as outlining them in the image.

  7. Papers • Detection/Localization: RCNN, Fast RCNN, Faster RCNN, MultiBox, Bayesian Optimization, Multi-region, RCNN Minus R, Image Windows • Segmentation: Semantic Seg, Unconstrained Video, Shape Guided, Object Regions, Shape Sharing

  8. Conv-Deconv networks

  9. Deconv Net • At every layer of the trained CNN attach a “deconvnet” • Deconvnet has a path back to the original input, including image pixels. • When an input image is fed into the CNN, activations are computed at each level. • This is the forward pass. • Now we reverse the process…

  10. To examine the activations of a certain feature map, say in the 4th conv layer: • Store the activations of this one feature map. • Set all of the other activations in the layer to 0. • Pass this feature map as the input into the deconvnet. • This deconvnet has the same filters as the original CNN. • This input then goes through a series of unpool (reverse maxpooling), rectify, and filter operations for each preceding layer until the input space is reached.
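The "keep one feature map, zero the rest" step can be sketched in NumPy (the layer shape and channel index below are hypothetical; a real deconvnet would then pass this masked tensor back through unpool/rectify/filter stages):

```python
import numpy as np

# Hypothetical activations of one conv layer: (channels, height, width).
acts = np.random.default_rng(1).standard_normal((8, 6, 6))

def isolate_feature_map(activations, channel):
    # Keep only the chosen feature map and zero every other channel;
    # this masked tensor is what gets fed into the deconvnet.
    masked = np.zeros_like(activations)
    masked[channel] = activations[channel]
    return masked

masked = isolate_feature_map(acts, channel=3)
```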

  11. Why deconv? • To examine the types of structures that excite a given feature map. • We will review different layers of the CNN

  12. Layer 1

  13. Layer 2

  14. Layer 3

  15. Layers 4 and 5

  16. Deconvolution • Input goes through a series of • unpool (reverse maxpooling), • rectify, and • filter operations for each preceding layer until the input space is reached.
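The unpool and rectify steps above can be sketched in NumPy. A deconvnet's unpooling reuses the max locations ("switches") recorded during the forward pass; this toy 2x2 example records and replays them (shapes and values are illustrative, not from the slides):

```python
import numpy as np

def maxpool_with_switches(x, k=2):
    # k-by-k max pooling that also records where each max came from.
    H, W = x.shape
    out = np.zeros((H // k, W // k))
    switches = np.zeros_like(out, dtype=int)
    for i in range(H // k):
        for j in range(W // k):
            patch = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = patch.argmax()   # flat index within the patch
            out[i, j] = patch.max()
    return out, switches

def unpool(pooled, switches, k=2):
    # Reverse maxpooling: place each value back at its recorded max location,
    # leaving zeros everywhere else.
    H, W = pooled.shape
    out = np.zeros((H * k, W * k))
    for i in range(H):
        for j in range(W):
            di, dj = divmod(switches[i, j], k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out

def rectify(x):
    return np.maximum(x, 0)  # ReLU, the deconvnet's rectify step

x = np.arange(16, dtype=float).reshape(4, 4)
pooled, switches = maxpool_with_switches(x)
restored = unpool(pooled, switches)
```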

  17. What is deconvolution? • (Non-blind) Deconvolution (C) Dhruv Batra

  18. “Transposed convolution” is a convolution! We can express convolution in terms of a matrix multiplication. Example: 1D conv, kernel size=3, stride=1, padding=1. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  19. “Transposed convolution” is a convolution! We can express convolution in terms of a matrix multiplication. Convolution transpose multiplies by the transpose of the same matrix. Example: 1D conv, kernel size=3, stride=1, padding=1. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  20. “Transposed convolution” is a convolution. We can express convolution in terms of a matrix multiplication. Convolution transpose multiplies by the transpose of the same matrix. When stride=1, convolution transpose is just a regular convolution (with different padding rules). Example: 1D conv, kernel size=3, stride=1, padding=1. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
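The matrix view of the slides' 1D example (kernel size 3, stride 1, padding 1) can be sketched in NumPy. This follows the deep-learning convention of cross-correlation (no kernel flip); the kernel and input values are made up for illustration:

```python
import numpy as np

def conv_matrix(kernel, n, stride=1, pad=1):
    # Build the matrix X such that X @ a_padded computes the convolution
    # of a length-n input (zero-padded by `pad` on each side).
    k = len(kernel)
    padded = n + 2 * pad
    n_out = (padded - k) // stride + 1
    X = np.zeros((n_out, padded))
    for r in range(n_out):
        X[r, r*stride:r*stride + k] = kernel
    return X

kernel = np.array([1.0, 2.0, 3.0])
a = np.array([1.0, 2.0, 3.0, 4.0])
X = conv_matrix(kernel, n=4, stride=1, pad=1)   # 4x6 matrix
a_pad = np.pad(a, 1)
y = X @ a_pad      # ordinary convolution: 4 outputs from 4 inputs
y_t = X.T @ y      # "transposed convolution" on the padded grid
```

With stride 1 both `X` and `X.T` are banded matrices with the same kernel sliding along the diagonal, which is why the transpose is itself a regular convolution up to padding.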

  21. But not always. We can express convolution in terms of a matrix multiplication. Example: 1D conv, kernel size=3, stride=2, padding=1. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  22. But not always. We can express convolution in terms of a matrix multiplication. Convolution transpose multiplies by the transpose of the same matrix. Example: 1D conv, kernel size=3, stride=2, padding=1. When stride>1, convolution transpose is no longer a normal convolution! Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
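The stride-2 case can be sketched the same way (again with made-up values): the forward matrix now skips positions, so its transpose scatters each of the 2 pooled outputs back over 6 padded positions, i.e. it upsamples rather than behaving like an ordinary convolution:

```python
import numpy as np

kernel = np.array([1.0, 2.0, 3.0])
n_pad = 6                      # length-4 input, zero-padded by 1 on each side
# stride=2: kernel windows start at positions 0 and 2 -> only 2 output rows
X = np.zeros((2, n_pad))
X[0, 0:3] = kernel
X[1, 2:5] = kernel

a_pad = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 0.0])
y = X @ a_pad                  # downsamples: 2 outputs from 6 padded inputs
up = X.T @ y                   # transpose maps 2 values back to 6: upsampling
```

Because consecutive rows of `X` are shifted by 2, `X.T` interleaves overlapping copies of the kernel between output positions, which is exactly the fractional-stride ("deconvolution") behavior that a stride-1 convolution cannot reproduce.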

  23. Questions
