This lecture explores the concepts of localization, detection, and segmentation using convolutional neural networks (CNNs). It discusses various papers and techniques related to these tasks, including RCNN, Fast RCNN, Faster RCNN, and Deconvolution networks. The lecture also explains the process of deconvolution and its use in examining feature maps of CNNs.
CSCI 431/631 Foundations of Computer Vision Ifeoma Nwogu ion@cs.rit.edu Lecture – Localization, Detection, Segmentation
Schedule • Last class • TensorFlow • Today • CNN for localization, detection and segmentation • Readings for today:
Simplified architecture Softmax layer:
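The softmax layer referenced above turns the network's raw class scores into a probability distribution over categories. A minimal NumPy sketch (the scores below are made-up example values):

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores (logits) into probabilities.

    Subtracting the max before exponentiating is a standard
    numerical-stability trick; it does not change the result.
    """
    shifted = logits - np.max(logits)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

scores = np.array([2.0, 1.0, 0.1])   # hypothetical per-class scores
probs = softmax(scores)
# probs sums to 1; the largest score gets the largest probability
```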
Overview of CNN Classification Task
Object classification vs localization • Classification: identify that the picture belongs to a certain category, e.g. dog • Localization: produce a class label and also a bounding box that describes where the object is in the picture.
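A common way to realize localization is to attach two output heads to the same CNN features: one for class scores and one regressing the four box numbers. A toy sketch (the feature size, class count, and random weights below are all made-up placeholders, not a real trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 512-d feature vector from a CNN backbone,
# 10 object classes, and a 4-number box (x, y, w, h).
features = rng.standard_normal(512)

W_cls = rng.standard_normal((10, 512)) * 0.01   # classification head
W_box = rng.standard_normal((4, 512)) * 0.01    # box-regression head

class_scores = W_cls @ features   # one score per class
box = W_box @ features            # predicted bounding box (x, y, w, h)
```

In training, the classification head gets a cross-entropy loss and the box head a regression (e.g. L2 or smooth-L1) loss, summed together.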
Object segmentation • Localization can involve finding multiple classes • Segmentation: to identify the categories of objects and also outline them in the image.
Papers • Detection/Localization: RCNN, Fast RCNN, Faster RCNN, MultiBox, Bayesian Optimization, Multi-region, RCNN Minus R, Image Windows • Segmentation: Semantic Seg, Unconstrained Video, Shape Guided, Object Regions, Shape Sharing
Deconv Net • At every layer of the trained CNN, attach a “deconvnet” • The deconvnet has a path back to the original input, including the image pixels. • When an input image is fed into the CNN, activations are computed at each layer. • This is the forward pass. • Now we reverse the process…
To examine the activations of a certain feature map, say in the 4th conv layer: • Store the activations of this one feature map • Set all of the other activations in the layer to 0 • Pass this feature map as the input to the deconvnet. • The deconvnet has the same filters as the original CNN. • This input then goes through a series of unpool (reverse max pooling), rectify, and filter operations for each preceding layer until the input space is reached.
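The key mechanical piece of this procedure is unpooling: max pooling records "switches" (where each max came from) on the forward pass, and the deconvnet uses them to place an isolated activation back at its original location. A toy NumPy sketch of that step, with a made-up 4×4 activation map (a real deconvnet would also apply rectification and the transposed filters, omitted here for brevity):

```python
import numpy as np

def maxpool_with_switches(x, size=2):
    """2x2 max pool that also records where each max came from
    (the 'switches' the deconvnet needs for unpooling)."""
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    switches = np.zeros_like(x, dtype=bool)
    for i in range(0, h, size):
        for j in range(0, w, size):
            patch = x[i:i + size, j:j + size]
            r, c = np.unravel_index(np.argmax(patch), patch.shape)
            out[i // size, j // size] = patch[r, c]
            switches[i + r, j + c] = True
    return out, switches

def unpool(pooled, switches, size=2):
    """Reverse max pooling: place each pooled value back at the
    location recorded in the switches, zeros everywhere else."""
    up = np.zeros(switches.shape)
    h, w = pooled.shape
    for i in range(h):
        for j in range(w):
            block = switches[i*size:(i+1)*size, j*size:(j+1)*size]
            r, c = np.argwhere(block)[0]
            up[i*size + r, j*size + c] = pooled[i, j]
    return up

# Forward pass on a toy 4x4 activation map
x = np.array([[1., 2., 0., 1.],
              [3., 0., 1., 0.],
              [0., 1., 5., 2.],
              [2., 0., 0., 1.]])
pooled, switches = maxpool_with_switches(x)

# Deconvnet direction: keep only one activation, zero out the rest,
# then unpool back toward input space.
isolated = np.zeros_like(pooled)
isolated[1, 1] = pooled[1, 1]          # keep one feature activation
reconstructed = unpool(isolated, switches)
# the value 5 lands back at its original position (row 2, col 2)
```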
Why deconv? • To examine the types of structures that excite a given feature map. • We will review different layers of the CNN
Deconvolution • Input goes through a series of • unpool (reverse max pooling), • rectify, and • filter operations for each preceding layer until the input space is reached.
What is deconvolution? • (Non-blind) Deconvolution (C) Dhruv Batra
“Transposed convolution” is a convolution! We can express convolution in terms of a matrix multiplication Example: 1D conv, kernel size=3, stride=1, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
“Transposed convolution” is a convolution! We can express convolution in terms of a matrix multiplication Convolution transpose multiplies by the transpose of the same matrix: Example: 1D conv, kernel size=3, stride=1, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
“Transposed convolution” is a convolution We can express convolution in terms of a matrix multiplication Convolution transpose multiplies by the transpose of the same matrix: When stride=1, convolution transpose is just a regular convolution (with different padding rules) Example: 1D conv, kernel size=3, stride=1, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
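The slides' stride-1 example can be checked numerically: build the matrix A whose rows are shifted copies of the kernel, so A @ x is the convolution, and A.T @ y is the transposed convolution. A sketch with made-up kernel and input values (note the kernel is applied as-is, i.e. cross-correlation, matching deep-learning convention):

```python
import numpy as np

def conv_matrix(kernel, n, stride=1, pad=1):
    """Build the matrix A such that A @ x equals a 1D convolution
    (cross-correlation) of the length-n input x with `kernel`."""
    k = len(kernel)
    n_pad = n + 2 * pad
    n_out = (n_pad - k) // stride + 1
    A = np.zeros((n_out, n_pad))
    for i in range(n_out):
        A[i, i*stride:i*stride + k] = kernel
    # drop the padded columns: they only ever multiply implicit zeros
    return A[:, pad:pad + n] if pad else A

kernel = np.array([1., 2., 3.])
x = np.array([1., 2., 3., 4.])

A = conv_matrix(kernel, n=len(x), stride=1, pad=1)
y = A @ x              # ordinary stride-1 convolution, length 4
x_up = A.T @ y         # "transposed convolution" of y, also length 4
```

With stride=1 the transposed matrix A.T again has shifted copies of the kernel in its rows (reversed), which is why the transpose is itself an ordinary convolution, just with different padding.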
But not always We can express convolution in terms of a matrix multiplication Example: 1D conv, kernel size=3, stride=2, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
But not always We can express convolution in terms of a matrix multiplication Convolution transpose multiplies by the transpose of the same matrix: Example: 1D conv, kernel size=3, stride=2, padding=1 When stride>1, convolution transpose is no longer a normal convolution! Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
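The stride-2 case can be seen in the same matrix view: the conv matrix now skips positions, so it maps length 4 down to length 2, and its transpose maps length 2 back up to length 4. That upsampling behavior is why stride>1 transposed convolution is no longer a normal convolution. A sketch with made-up values (padded input is [0, x0, x1, x2, x3, 0], kernel applied at padded offsets 0 and 2):

```python
import numpy as np

kernel = np.array([1., 2., 3.])

# Stride-2, pad-1 conv of a length-4 input, written as a matrix.
# Row 0: kernel at padded positions 0..2; row 1: positions 2..4.
# (Padded columns are dropped since they multiply implicit zeros.)
A = np.array([[2., 3., 0., 0.],
              [0., 1., 2., 3.]])

x = np.array([1., 2., 3., 4.])
y = A @ x          # downsampling: length 4 -> length 2
x_up = A.T @ y     # transposed conv: length 2 -> length 4 (upsampling)
```

Because A.T spreads each input value across a stride-spaced window of the output, this is exactly the "learnable upsampling" used in segmentation decoders.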