This lecture explores the concepts of localization, detection, and segmentation using convolutional neural networks (CNNs). It discusses various papers and techniques related to these tasks, including RCNN, Fast RCNN, Faster RCNN, and Deconvolution networks. The lecture also explains the process of deconvolution and its use in examining feature maps of CNNs.
CSCI 431/631 Foundations of Computer Vision Ifeoma Nwogu ion@cs.rit.edu Lecture – Localization, Detection, Segmentation
Schedule • Last class • TensorFlow • Today • CNN for localization, detection and segmentation • Readings for today:
Simplified architecture Softmax layer:
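The softmax layer referenced above turns the network's raw class scores into a probability distribution over categories. A minimal NumPy sketch (the scores below are made-up example values):

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores (logits) into probabilities.

    Subtracting the max before exponentiating is a standard
    numerical-stability trick; it does not change the result.
    """
    shifted = logits - np.max(logits)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

scores = np.array([2.0, 1.0, 0.1])   # hypothetical per-class scores
probs = softmax(scores)
# probs sums to 1; the largest score gets the largest probability
```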
Overview of CNN Classification Task
Object classification vs localization • Classification: identify that the picture belongs to a certain category, e.g. dog • Localization: produce a class label and also a bounding box that describes where the object is in the picture.
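A common way to realize localization is to attach two output heads to the same CNN features: one for class scores and one regressing the four box numbers. A toy sketch (the feature size, class count, and random weights below are all made-up placeholders, not a real trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 512-d feature vector from a CNN backbone,
# 10 object classes, and a 4-number box (x, y, w, h).
features = rng.standard_normal(512)

W_cls = rng.standard_normal((10, 512)) * 0.01   # classification head
W_box = rng.standard_normal((4, 512)) * 0.01    # box-regression head

class_scores = W_cls @ features   # one score per class
box = W_box @ features            # predicted bounding box (x, y, w, h)
```

In training, the classification head gets a cross-entropy loss and the box head a regression (e.g. L2 or smooth-L1) loss, summed together.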
Object segmentation • Localization can involve finding multiple classes • Segmentation: to identify the categories of objects and also outline them in the image.
Papers • Detection/Localization: RCNN, Fast RCNN, Faster RCNN, MultiBox, Bayesian Optimization, Multi-region, RCNN Minus R, Image Windows • Segmentation: Semantic Seg, Unconstrained Video, Shape Guided, Object Regions, Shape Sharing
Deconv Net • At every layer of the trained CNN, attach a “deconvnet” • The deconvnet has a path back to the original input, including the image pixels. • When an input image is fed into the CNN, activations are computed at each layer. • This is the forward pass. • Now we reverse the process…
To examine the activations of a certain feature map, say in the 4th conv layer: • Store the activations of this one feature map • Set all of the other activations in the layer to 0 • Pass this feature map as the input to the deconvnet. • The deconvnet has the same filters as the original CNN. • This input then goes through a series of unpool (reverse max pooling), rectify, and filter operations for each preceding layer until the input space is reached.
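The key mechanical piece of this procedure is unpooling: max pooling records "switches" (where each max came from) on the forward pass, and the deconvnet uses them to place an isolated activation back at its original location. A toy NumPy sketch of that step, with a made-up 4×4 activation map (a real deconvnet would also apply rectification and the transposed filters, omitted here for brevity):

```python
import numpy as np

def maxpool_with_switches(x, size=2):
    """2x2 max pool that also records where each max came from
    (the 'switches' the deconvnet needs for unpooling)."""
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    switches = np.zeros_like(x, dtype=bool)
    for i in range(0, h, size):
        for j in range(0, w, size):
            patch = x[i:i + size, j:j + size]
            r, c = np.unravel_index(np.argmax(patch), patch.shape)
            out[i // size, j // size] = patch[r, c]
            switches[i + r, j + c] = True
    return out, switches

def unpool(pooled, switches, size=2):
    """Reverse max pooling: place each pooled value back at the
    location recorded in the switches, zeros everywhere else."""
    up = np.zeros(switches.shape)
    h, w = pooled.shape
    for i in range(h):
        for j in range(w):
            block = switches[i*size:(i+1)*size, j*size:(j+1)*size]
            r, c = np.argwhere(block)[0]
            up[i*size + r, j*size + c] = pooled[i, j]
    return up

# Forward pass on a toy 4x4 activation map
x = np.array([[1., 2., 0., 1.],
              [3., 0., 1., 0.],
              [0., 1., 5., 2.],
              [2., 0., 0., 1.]])
pooled, switches = maxpool_with_switches(x)

# Deconvnet direction: keep only one activation, zero out the rest,
# then unpool back toward input space.
isolated = np.zeros_like(pooled)
isolated[1, 1] = pooled[1, 1]          # keep one feature activation
reconstructed = unpool(isolated, switches)
# the value 5 lands back at its original position (row 2, col 2)
```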
Why deconv? • To examine the types of structures that excite a given feature map. • We will review different layers of the CNN
Deconvolution • Input goes through a series of • unpool (reverse max pooling), • rectify, and • filter operations for each preceding layer until the input space is reached.
What is deconvolution? • (Non-blind) Deconvolution (C) Dhruv Batra
“Transposed convolution” is a convolution! We can express convolution in terms of a matrix multiplication Example: 1D conv, kernel size=3, stride=1, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
“Transposed convolution” is a convolution! We can express convolution in terms of a matrix multiplication Convolution transpose multiplies by the transpose of the same matrix: Example: 1D conv, kernel size=3, stride=1, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
“Transposed convolution” is a convolution We can express convolution in terms of a matrix multiplication Convolution transpose multiplies by the transpose of the same matrix: When stride=1, convolution transpose is just a regular convolution (with different padding rules) Example: 1D conv, kernel size=3, stride=1, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
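The slides' stride-1 example can be checked numerically: build the matrix A whose rows are shifted copies of the kernel, so A @ x is the convolution, and A.T @ y is the transposed convolution. A sketch with made-up kernel and input values (note the kernel is applied as-is, i.e. cross-correlation, matching deep-learning convention):

```python
import numpy as np

def conv_matrix(kernel, n, stride=1, pad=1):
    """Build the matrix A such that A @ x equals a 1D convolution
    (cross-correlation) of the length-n input x with `kernel`."""
    k = len(kernel)
    n_pad = n + 2 * pad
    n_out = (n_pad - k) // stride + 1
    A = np.zeros((n_out, n_pad))
    for i in range(n_out):
        A[i, i*stride:i*stride + k] = kernel
    # drop the padded columns: they only ever multiply implicit zeros
    return A[:, pad:pad + n] if pad else A

kernel = np.array([1., 2., 3.])
x = np.array([1., 2., 3., 4.])

A = conv_matrix(kernel, n=len(x), stride=1, pad=1)
y = A @ x              # ordinary stride-1 convolution, length 4
x_up = A.T @ y         # "transposed convolution" of y, also length 4
```

With stride=1 the transposed matrix A.T again has shifted copies of the kernel in its rows (reversed), which is why the transpose is itself an ordinary convolution, just with different padding.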
But not always We can express convolution in terms of a matrix multiplication Example: 1D conv, kernel size=3, stride=2, padding=1 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
But not always We can express convolution in terms of a matrix multiplication Convolution transpose multiplies by the transpose of the same matrix: Example: 1D conv, kernel size=3, stride=2, padding=1 When stride>1, convolution transpose is no longer a normal convolution! Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
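The stride-2 case can be seen in the same matrix view: the conv matrix now skips positions, so it maps length 4 down to length 2, and its transpose maps length 2 back up to length 4. That upsampling behavior is why stride>1 transposed convolution is no longer a normal convolution. A sketch with made-up values (padded input is [0, x0, x1, x2, x3, 0], kernel applied at padded offsets 0 and 2):

```python
import numpy as np

kernel = np.array([1., 2., 3.])

# Stride-2, pad-1 conv of a length-4 input, written as a matrix.
# Row 0: kernel at padded positions 0..2; row 1: positions 2..4.
# (Padded columns are dropped since they multiply implicit zeros.)
A = np.array([[2., 3., 0., 0.],
              [0., 1., 2., 3.]])

x = np.array([1., 2., 3., 4.])
y = A @ x          # downsampling: length 4 -> length 2
x_up = A.T @ y     # transposed conv: length 2 -> length 4 (upsampling)
```

Because A.T spreads each input value across a stride-spaced window of the output, this is exactly the "learnable upsampling" used in segmentation decoders.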