Introduction to Computer Vision Lecture 6 Dr. Roger S. Gaborski
Intro to CV Graduate Projects • Correlation/Convolution • David Rubel’s Master’s Project (slides included at end of this lecture)
How can we average the pixel values in an image? • The average is computed over a pixel and its neighbors • This is a neighborhood operation • The more neighbors included, the more smoothing (averaging)
Smoothing Example • Done on whiteboard • How do we handle pixels along the edges? (see the sketch below)
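A minimal sketch of the idea (my illustration, not the whiteboard example): average each pixel's 3x3 neighborhood with nested loops. The loop bounds must skip the border pixels, which is exactly the edge problem raised above.

f = im2double(imread('cameraman.tif'));   % any grayscale image
g = f;                                    % output starts as a copy
[rows, cols] = size(f);
for r = 2:rows-1                          % skip the border rows...
    for c = 2:cols-1                      % ...and the border columns
        nbhd = f(r-1:r+1, c-1:c+1);       % 3x3 neighborhood
        g(r, c) = mean(nbhd(:));          % average of the 9 pixels
    end
end                                       % edge pixels are never updated

Padding the image first removes the need to skip the borders, which is what padarray (next slides) provides.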
Padding -- padarray • fp = padarray(f, [r c], method, direction) • f is input image • fp is padded image • [r c] is number of rows and columns to pad f • method and direction – see the options below
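For reference, the options documented for padarray in the Image Processing Toolbox:

method: 'replicate' (repeat the border values), 'symmetric' (mirror across the border), 'circular' (wrap around), or a constant value such as 0 (the default)
direction: 'pre' (before the first element), 'post' (after the last element), 'both' (both sides, the default)

A quick example, padding one row and one column of zeros on every side:

>> fp = padarray(f, [1 1], 0, 'both');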
padarray Example
>> f = [1 2; 3 4]
f =
     1     2
     3     4
>> fp = padarray(f, [3 2], 'replicate', 'post')
fp =
     1     2     2     2
     3     4     4     4
     3     4     4     4
     3     4     4     4
     3     4     4     4
post – pad after the last element along each dimension
[3 2] – pad 3 rows and 2 columns
>> fp = padarray(f, [2 1], 'replicate', 'post')
fp =
     1     2     2
     3     4     4
     3     4     4
     3     4     4
post – pad after the last element along each dimension
[2 1] – pad 2 rows and 1 column
>> f = [1 2 3; 1 2 3; 1 2 3]
f =
     1     2     3
     1     2     3
     1     2     3
>> fp = padarray(f, [2 2], 'symmetric', 'both')
fp = ??????
>> f = [1 2 3; 1 2 3; 1 2 3]
f =
     1     2     3
     1     2     3
     1     2     3
>> fp = padarray(f, [2 2], 'symmetric', 'both')
fp =
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
     2     1     1     2     3     3     2
Spatial Filtering • Neighborhood processing • Define a center point (x,y) • Perform operations involving only pixels in the neighborhood • The result of the operation is the response of the process at that point • Moving the center point defines a new neighborhood • Repeat the process for every point in the image
Linear and Nonlinear Spatial Filtering • Linear operation • Multiply each pixel in the neighborhood by the corresponding coefficient and sum the results to get the response at each point (x,y) • If the neighborhood is m x n, then mn coefficients are required • The coefficients are arranged in a matrix, called a • Filter • Filter mask • Kernel • Template • Mask sizes are typically odd (3x3, 5x5, etc.) • The larger the mask, the greater the compute time (see the sketch below)
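As a concrete sketch (my example; fspecial and imfilter are standard Image Processing Toolbox functions), a 3x3 averaging filter applied with replicate padding:

>> f = im2double(imread('cameraman.tif'));
>> w = fspecial('average', [3 3])    % 3x3 mask, all coefficients 1/9
w =
    0.1111    0.1111    0.1111
    0.1111    0.1111    0.1111
    0.1111    0.1111    0.1111
>> g = imfilter(f, w, 'replicate');  % pad edges by replication, then filter

A larger mask, e.g. fspecial('average', [9 9]), produces visibly more smoothing at a higher compute cost.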
Correlation -- Convolution • Correlation • Place mask w on the image array f as previously described • Convolution • First rotate mask w by 180 degrees • Place rotated mask on image as described previously
Example - Correlation • Assume w and f are one dimensional • The origin of f is its leftmost point • Place w so that its rightmost point coincides with the origin of f • Pad f with 0s so that there are corresponding f points for each w point (also pad the end with 0s) • Multiply corresponding points and sum • In this case (example on next page) the result is zero • Move w to the right one position and repeat the process • Continue the process for the whole length of f
'full' is the result we obtain from the operations on the previous slide. If, instead of aligning the rightmost element of w with the leftmost element of f, we align the center element of w with the leftmost value of f, we obtain the 'same' result, 'same' indicating that the result is the same length as the original f.
'Full' correlation
'Same' correlation
Example - Convolution • Convolution is the same procedure, but the filter is first rotated 180 degrees (see the sketch below) • If the filter is symmetric, correlation and convolution results are the same
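A short MATLAB sketch of both operations (my example; correlation is computed as convolution with the flipped filter, and conv's 'full'/'same' options match the terminology above):

>> f = [0 0 0 1 0 0 0 0];        % impulse signal
>> w = [1 2 3 2 0];              % filter
>> conv(f, w, 'full')            % convolution leaves a copy of w at the impulse
ans =
     0     0     0     1     2     3     2     0     0     0     0     0
>> conv(f, fliplr(w), 'full')    % correlation = convolution with rotated w
ans =
     0     0     0     0     2     3     2     1     0     0     0     0
>> conv(f, fliplr(w), 'same')    % 'same' keeps the length of f
ans =
     0     0     2     3     2     1     0     0

For 2-D images, imfilter(f, w) performs correlation by default, and imfilter(f, w, 'conv') performs convolution.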
The same procedure extends directly to two-dimensional images.
SCENE CLASSIFICATION USING PLSA AND SPATIAL INFORMATION MS Project by David Rubel
OUTLINE Problem Previous Work Datasets Key Concepts Implementation Results Questions
PROBLEM • What is scene classification? • Assigning a scene label to arbitrary images • Potential uses • Content-based image retrieval • Web accessibility • Object detection/localization
PREVIOUS WORK • Holistic Methods • Oliva and Torralba (2001) • Defined a spatial envelope for each image • Consists of naturalness, openness, roughness, expansion and ruggedness • Trained Discriminant Spectral Templates (DSTs) to process novel images • Used K-Nearest Neighbors for classification • Produced an excellent dataset
PREVIOUS WORK • Semantic Methods • Vogel and Schiele (2004) • Divide each image into a 10x10 grid and label each region's material (e.g., water, rock, grass) using SVMs • Create three histograms of materials (COVs) • Classify the image using these COVs • Created another interesting dataset
PREVIOUS WORK • Bag-of-Words Methods • Fei-Fei and Perona (2005) • Search images for textons • Group textons into visual words using k-means clustering • Group visual words together using Bayesian statistics • Label images using a Bayesian classifier • Bosch, Zisserman and Muñoz (2008) • Use SIFT features instead of textons • Use pLSA to group words into topics • Classify images with SVM
DATASET • Oliva and Torralba (OT) • 1472 natural images
DATASET • Oliva and Torralba (OT) • 1216 man-made images
DATASET • Vogel and Schiele (VS) • 700 natural images
KEY CONCEPTS (SIFT) • Scale-Invariant Feature Transform (SIFT) • Interest point detector introduced by David G. Lowe • Points are invariant to scale and rotation • Partially invariant to affine warp and lighting • Four stage process • Scale-space extrema detection • Keypoint localization • Orientation assignment • Keypoint descriptors
KEY CONCEPTS (SIFT) • Scale-space extrema detection
KEY CONCEPTS (SIFT) • Keypoint localization • Keypoints are refined to subpixel accuracy • Keypoints along edges are removed • Keypoints in areas of low contrast are removed • Orientation assignment • Gradient direction and magnitude are computed for the area surrounding the keypoint • The keypoint is assigned the orientation most represented in the pixel neighborhood • Uses a 36-bin directional histogram with Gaussian weighting (see the sketch below)
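A rough illustration of the orientation histogram (my sketch, not Lowe's implementation; patch is assumed to be a grayscale region around the keypoint):

[Gx, Gy] = imgradientxy(patch);              % gradients over the patch
mag = hypot(Gx, Gy);                         % gradient magnitude
ang = mod(atan2d(Gy, Gx), 360);              % direction in [0, 360)
g = fspecial('gaussian', size(patch), size(patch,1)/2);  % Gaussian weight
bins = floor(ang/10) + 1;                    % 36 bins of 10 degrees each
h = accumarray(bins(:), mag(:).*g(:), [36 1]);  % weighted magnitude votes
[~, peak] = max(h);                          % dominant orientation bin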
KEY CONCEPTS (SIFT) • Keypoint descriptors • 4x4x8 bin histogram of gradient magnitudes • Normalized for some lighting invariance
KEY CONCEPTS (PLSA) • Probabilistic Latent Semantic Analysis (pLSA) • Factor analysis presented by Thomas Hofmann • Originally used in the text processing field • Set of words W = {w1, …, wM} • Set of documents D = {d1, …, dN} • Describe each document as a histogram of words n(wi, dj)
KEY CONCEPTS (PLSA) • Compare documents by their distribution of words P(wi | dj) • Not an ideal solution • Synonyms & polysemes • Dense descriptors • pLSA: add a latent variable Z = {z1, …, zK}, so that P(wi | dj) = Σk P(wi | zk) P(zk | dj); that is, the W x D word-document matrix factors into a W x Z matrix times a Z x D matrix
KEY CONCEPTS (PLSA) • Compute the matrices with Expectation Maximization • Expectation step – computes the posterior probabilities P(zk | dj, wi) • Maximization step – re-estimates P(wi | zk) and P(zk | dj) • Continue running until perplexity stops decreasing on hold-out data • A sketch of the iteration follows
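A minimal vectorized sketch of the EM updates (my illustration of Hofmann's standard equations; n is assumed to be the M x N word-by-document count matrix, and a fixed iteration count stands in for the perplexity-based stopping rule):

[M, N] = size(n); K = 25;                    % K topics, as in the project
Pwz = rand(M, K); Pwz = Pwz ./ sum(Pwz, 1);  % P(w|z), columns sum to 1
Pzd = rand(K, N); Pzd = Pzd ./ sum(Pzd, 1);  % P(z|d), columns sum to 1
for iter = 1:100
    Pwd = max(Pwz * Pzd, eps);               % P(w|d), guarded against 0
    newPwz = zeros(M, K); newPzd = zeros(K, N);
    for k = 1:K
        Pz_dw = (Pwz(:,k) * Pzd(k,:)) ./ Pwd;  % E-step: P(zk | d, w)
        nk = n .* Pz_dw;                       % expected counts for topic k
        newPwz(:,k) = sum(nk, 2);              % M-step accumulators
        newPzd(k,:) = sum(nk, 1);
    end
    Pwz = newPwz ./ sum(newPwz, 1);          % renormalize (implicit
    Pzd = newPzd ./ sum(newPzd, 1);          % expansion; bsxfun on old MATLAB)
end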
KEY CONCEPTS (SVMS) • Support Vector Machines (SVMs) • Binary classification tool which finds separating hyperplanes (figure: the best separator lies between the convex hulls of the two classes)
KEY CONCEPTS (SVMS) • Not all problems are linearly separable • Find a best-fit separator • Use a kernel to map data to a higher dimension (figures: best-fit separator; use of an RBF kernel)
IMPLEMENTATION DETAILS • Building visual words • Find SIFT features in the image dataset • Color SIFT features, HSV color space • Dense SIFT detector for better results (M = 8) • Scale invariance via 4 concentric circles (r = 4, 8, 12, 16) • 64-bit floats -> 16-bit unsigned integers • Cluster features to create visual words • SIFT features alone are too varied • Improved k-means clustering by Charles Elkan • K = 1,500 • 200,000 features • Quantize features • Build histograms (see the sketch below)
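A hedged sketch of the last two steps (names are illustrative: centers is the K x 128 codebook from k-means, feats holds one image's descriptors as rows; knnsearch is from the Statistics Toolbox):

K = 1500;
idx = knnsearch(centers, feats);   % nearest visual word per descriptor
h = accumarray(idx, 1, [K 1]);     % visual-word counts
h = h / sum(h);                    % normalized bag-of-words histogram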
IMPLEMENTATION DETAILS • Testing the classification system • Divide the images into training & testing sets • Run pLSA if requested • Use standard pLSA for training data • Use the fold-in heuristic for testing data • Z = 25 • Train SVMs • LIBSVM with MATLAB wrapper • Use the one-versus-all method • RBF kernel (a sketch of one-versus-all training follows)
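A rough sketch of one-versus-all training with the LIBSVM MATLAB wrapper (svmtrain/svmpredict are LIBSVM's functions; the data and label names are illustrative, and the parameters here are defaults rather than the project's tuned values):

classes = unique(trainLabels);
models = cell(numel(classes), 1);
for c = 1:numel(classes)
    ova = 2*double(trainLabels == classes(c)) - 1;          % +1 vs. rest
    models{c} = svmtrain(ova, trainData, '-t 2 -c 1 -b 1'); % RBF kernel
end
scores = zeros(size(testData, 1), numel(classes));
for c = 1:numel(classes)
    dummy = ones(size(testData, 1), 1);                     % labels unused here
    [~, ~, p] = svmpredict(dummy, testData, models{c}, '-b 1');
    scores(:, c) = p(:, models{c}.Label == 1);              % P(class c | x)
end
[~, pred] = max(scores, [], 2);                             % winning class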
RESULTS Test the grouping of pLSA topics
RESULTS Test the discriminative power of pLSA topics
RESULTS KNN: pLSA outperforms BOW (74.6% to 65.0%)
RESULTS • Tried incorporating spatial information • Divide the image into a grid • Train SVMs for each section • Sum the results for each class over all sections (see the sketch below)
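A tiny sketch of the summing step (my illustration; scores{g} is assumed to hold the per-class SVM outputs for grid section g, one row per test image):

[N, C] = size(scores{1});          % N test images, C classes
total = zeros(N, C);
for g = 1:numel(scores)
    total = total + scores{g};     % accumulate evidence per class
end
[~, pred] = max(total, [], 2);     % class with the largest summed score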
RESULTS Four-Class OT (Natural Images)
RESULTS Ambiguous images from the OT dataset
RESULTS Four-Class OT (Man-Made Images)
RESULTS Eight-Class OT (Both)