Sreekanth Vempati ( 200402044 ) Advisors: Dr. C. V. Jawahar ( IIIT Hyderabad ),

Efficient SVM-based object classification and detection. Sreekanth Vempati (200402044). Advisors: Dr. C. V. Jawahar (IIIT Hyderabad), Dr. Andrew Zisserman (Univ. of Oxford)

Large Visual Data • Cheap capture devices, storage and internet access

Rapid growth in the amount of available data, e.g., in the case of YouTube • Video sharing • Image sharing

Problems • Scene/Object Classification • Decide whether specified categories of scenes/objects are present • Examples: Is there a demonstration/protest in this image? Is there a bus in this image? • Object Detection • Find the location of objects of specified categories • Example: Output the bounding box of the bus in this image

Challenges • Intra-class variation, e.g., the Boat/Ship category • Inter-class similarity, e.g., Flowers, Cityscape, Protest

Challenges • Viewpoint variation • Occlusion/truncation

Scalability • Solutions must scale to large amounts of data • Example: testing 140,000 images with the best-performing setup • Feature representation (visual words based) • 6300 dimensions • takes ~50 seconds → total time ~57 days • Classification (SVM with a non-linear kernel) • 20 classes • 3 images/second → total time ~10 days
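The ~10 days figure for classification is reproduced if each of the 20 class-specific classifiers processes about 3 images per second, so every class is a separate pass over the data. That per-class reading is an assumption, not something the slide states. A minimal sketch of the arithmetic (the function name is illustrative):

```python
# Back-of-the-envelope check of the classification estimate on this slide.
# Assumption (not stated on the slide): 3 images/second is the rate of ONE
# class's classifier, so all 20 classes are evaluated separately.
SECONDS_PER_DAY = 24 * 60 * 60


def total_days(num_images, num_classes, images_per_second):
    """Total testing time in days when every class is evaluated separately."""
    seconds = num_images * num_classes / images_per_second
    return seconds / SECONDS_PER_DAY


days = total_days(num_images=140_000, num_classes=20, images_per_second=3)
print(f"~{days:.1f} days")  # ~10.8 days
```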

Overview • Large-scale semantic concept retrieval in videos • Modeling subcategories • Efficient detection using GRBF feature maps • Conclusions

1. Semantic video retrieval • Given a large set of videos, retrieve the videos of a specific category • Example: find all the videos containing soccer

Overview of the approach • Training: example videos → annotated video frames → feature extraction (e.g., PHOW, PHOG, GIST) → classifier (e.g., SVM, Random Forests) • Testing: unseen videos → feature extraction → classifier → ranked shots

Features • GIST (Torralba et al., IJCV 01) • The image is divided into an m × m grid • For each cell, a bank of filters at different scales and orientations is applied • Final descriptor: the filter responses averaged within each cell, concatenated over all cells • Images from "Image Classification for large number of object categories", Anna Bosch, 2006
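The grid-and-filter-bank structure of GIST can be sketched as follows. This is only an illustration of the layout of the descriptor, assuming simple directional finite differences as a stand-in for the real Gabor-like filter bank; `oriented_response`, `block_means` and `gist_like` are hypothetical names.

```python
import math


def oriented_response(img, angle, step):
    """Magnitude of a directional finite difference: a crude stand-in for one
    oriented filter at orientation `angle` and scale `step` (hypothetical filter)."""
    h, w = len(img), len(img[0])
    dx, dy = math.cos(angle), math.sin(angle)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # pixel `step` away along the orientation, clamped to the image border
            r2 = min(max(round(r + step * dy), 0), h - 1)
            c2 = min(max(round(c + step * dx), 0), w - 1)
            out[r][c] = abs(img[r2][c2] - img[r][c])
    return out


def block_means(resp, m):
    """Average a response map within each cell of an m x m grid."""
    h, w = len(resp), len(resp[0])
    feats = []
    for i in range(m):
        for j in range(m):
            vals = [resp[r][c]
                    for r in range(i * h // m, (i + 1) * h // m)
                    for c in range(j * w // m, (j + 1) * w // m)]
            feats.append(sum(vals) / len(vals))
    return feats


def gist_like(img, m=4, scales=(1, 2), n_orient=4):
    """Concatenate per-cell mean responses for every (scale, orientation) filter."""
    desc = []
    for s in scales:
        for k in range(n_orient):
            desc.extend(block_means(oriented_response(img, math.pi * k / n_orient, s), m))
    return desc


# toy 8x8 image with a vertical edge: left half dark, right half bright
img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
print(len(gist_like(img)))  # 2 scales * 4 orientations * 16 cells = 128
```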

Features • Pyramid Histogram of Oriented Gradients (PHOG) • Images from "Image Classification for large number of object categories", Anna Bosch, 2006

Pyramid Histogram of Visual Words (PHOW) • Dense SIFT (Scale Invariant Feature Transform) descriptors • Vector quantization into visual words • "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories", S. Lazebnik et al., CVPR 2006
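A minimal sketch of the PHOW idea: quantize each dense descriptor to its nearest visual word, then concatenate word histograms over a 1×1, 2×2, … spatial pyramid. It assumes a precomputed vocabulary and toy 2-D descriptors in place of real SIFT; the per-level weighting of Lazebnik et al. is deliberately omitted, and all names are illustrative.

```python
def nearest_word(desc, vocab):
    """Vector quantization: index of the closest visual word (squared Euclidean)."""
    return min(range(len(vocab)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, vocab[k])))


def pyramid_histogram(points, descs, vocab, width, height, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids.
    points: (x, y) location of each dense descriptor; descs: the descriptors."""
    words = [nearest_word(d, vocab) for d in descs]
    K = len(vocab)
    hist = []
    for level in range(levels):
        cells = 2 ** level
        level_hist = [0] * (cells * cells * K)
        for (x, y), w in zip(points, words):
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            level_hist[(cy * cells + cx) * K + w] += 1
        hist.extend(level_hist)
    return hist


vocab = [[0.0, 0.0], [1.0, 1.0]]  # toy 2-word vocabulary
points = [(1, 1), (9, 1), (1, 9), (9, 9)]  # one descriptor near each corner
descs = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [1.0, 0.8]]
print(pyramid_histogram(points, descs, vocab, width=10, height=10))
# [2, 2, 1, 0, 0, 1, 1, 0, 0, 1]
```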

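Support Vector Machines (SVM) • Training data: $x_i$, $y_i \in \{-1, +1\}$, $i = 1, \dots, N$ • [Figure: separating hyperplane $w^\top \phi(x) + b = 0$ with margins $w^\top \phi(x) + b = -1$ and $w^\top \phi(x) + b = +1$; support vectors lie on the margins ($\xi = 0$); margin violations have slack $0 < \xi < 1$ and misclassified points have $\xi > 1$]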

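SVM formulation • $\min_{w, b, \xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$ subject to $y_i (w^\top x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$ • Evaluation function: $f(x) = w^\top x + b$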

Kernel Trick • Use a function $\phi$ that maps the input space to a feature space • Then build the classifier in the feature space

Moving to a different space • Dot product in feature space: $f(x) = w^\top \phi(x) + b = \sum_i \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$

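Kernelizing SVMs • Replace the dot product with a kernel function $K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle$, giving $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$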
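The kernelized evaluation function can be written directly as a sum over support vectors. A minimal sketch, assuming the $\alpha_i$, $y_i$ and $b$ come from some already-trained SVM (training is not shown); note that the cost grows with the number of support vectors, which matters later for detection.

```python
def decision_function(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over the support vectors."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


def linear_kernel(u, v):
    """Plain dot product: with this kernel f(x) reduces to w^T x + b."""
    return sum(p * q for p, q in zip(u, v))


# toy model with two support vectors; w = sum_i alpha_i y_i x_i = [1, -1]
svs = [[1.0, 0.0], [0.0, 1.0]]
print(decision_function([2.0, 1.0], svs, [1.0, 1.0], [+1, -1], 0.0, linear_kernel))  # 1.0
```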

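Kernels • Linear: $K(x, y) = x^\top y$ • Polynomial: $K(x, y) = (x^\top y + c)^d$ • Intersection kernel: $K(x, y) = \sum_k \min(x_k, y_k)$ • Generalized RBF kernel: $K(x, y) = \exp(-\gamma D^2(x, y))$, where $D^2$ is a distance such as $\chi^2$ • Weighted combination of multiple kernels: $K(x, y) = \sum_j d_j K_j(x, y)$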
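The kernels listed above, written out for histogram-like vectors. A minimal sketch: the polynomial offset `c`, the choice of the $\chi^2$ distance $\sum_k (x_k - y_k)^2 / (x_k + y_k)$ for $D^2$ (normalizations differ between papers), and the weights of the combination are all illustrative assumptions.

```python
import math


def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, degree=2, c=1.0):
    return (linear(x, y) + c) ** degree


def intersection(x, y):
    return sum(min(a, b) for a, b in zip(x, y))


def chi2_distance(x, y):
    """sum_k (x_k - y_k)^2 / (x_k + y_k), skipping empty bins (assumed normalization)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def grbf(x, y, gamma=1.0, distance=chi2_distance):
    """Generalized RBF: exp(-gamma * D^2(x, y)) for a chosen squared distance."""
    return math.exp(-gamma * distance(x, y))


def combination(x, y, kernels, weights):
    """Weighted sum of several base kernels."""
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


x, y = [0.5, 0.5], [0.25, 0.75]
print(linear(x, y), intersection(x, y), round(grbf(x, y), 4))  # 0.5 0.75 0.8752
```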

TRECVID competition • Objective: rank video shots by the presence of a given concept • We participated in the High-level Feature Extraction task of TRECVID • Organized by NIST, USA • 2008: around 180 submissions from 40 teams worldwide

Some of the classes (High-level Feature Extraction) • Cityscape • Classroom • Driver • Two People • Emergency Vehicle • Harbor • Kitchen • Nighttime • Singing • Demonstration/Protest • Mountain • Hand • Street • Telephone • Flower • Bridge • Airplane flying • Boat/Ship • Bus • Dog

Data Statistics • Evaluation measure: Average Precision (AP), the area under the precision-recall curve
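Average precision of a ranked list of shots can be computed as the mean of the precision values at the rank of each relevant shot. A minimal sketch of plain, non-interpolated AP; the official TRECVID scoring may differ in details (for example, how unjudged shots are handled), so treat this as illustrative only.

```python
def average_precision(scores, relevant):
    """Non-interpolated AP: mean precision at the rank of each relevant shot.
    scores: classifier score per shot; relevant: 1 if the shot shows the concept."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_relevant = sum(relevant)
    if n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_relevant


# relevant shots end up at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```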

Our Approach • Compared performance across different features and SVM parameters • PHOW with the intersection kernel is efficient • Testing is very fast, with little drop in performance • Testing time: ~200,000 (2 lakh) frames in 10 seconds • "Classification using Intersection kernel SVMs is efficient", S. Maji et al., CVPR 2008
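The speed of intersection-kernel SVMs comes from the fact that $f(x) = \sum_k h_k(x_k) + b$ with $h_k(s) = \sum_i c_i \min(x_{ik}, s)$ and $c_i = \alpha_i y_i$: after sorting each dimension once, $h_k(s)$ needs only a binary search plus two precomputed sums. A minimal sketch of this exact evaluation in the spirit of Maji et al. (their paper also describes faster piecewise-linear approximations, not shown); the class name is hypothetical.

```python
import bisect


class FastIntersectionSVM:
    """Evaluate f(x) = sum_i c_i * sum_k min(x_ik, x_k) + b, with c_i = alpha_i * y_i,
    in O(d log N) instead of O(d N) by sorting each dimension once."""

    def __init__(self, support_vectors, coeffs, b):
        self.b = b
        self.tables = []
        for k in range(len(support_vectors[0])):
            pairs = sorted((sv[k], c) for sv, c in zip(support_vectors, coeffs))
            values = [v for v, _ in pairs]
            cum_cv = [0.0]  # cum_cv[r] = sum of c_i * v_i over the r smallest values
            for v, c in pairs:
                cum_cv.append(cum_cv[-1] + c * v)
            suffix_c = [0.0] * (len(pairs) + 1)  # suffix_c[r] = sum of c_i over the rest
            for j in range(len(pairs) - 1, -1, -1):
                suffix_c[j] = suffix_c[j + 1] + pairs[j][1]
            self.tables.append((values, cum_cv, suffix_c))

    def __call__(self, x):
        total = self.b
        for xk, (values, cum_cv, suffix_c) in zip(x, self.tables):
            r = bisect.bisect_right(values, xk)  # number of support values <= xk
            # values <= xk contribute c_i * v_i, values > xk contribute c_i * xk
            total += cum_cv[r] + xk * suffix_c[r]
        return total


svs = [[0.2, 0.5], [0.6, 0.1], [0.4, 0.4]]
coeffs = [1.0, -0.5, 0.25]
model = FastIntersectionSVM(svs, coeffs, b=0.1)
print(round(model([0.3, 0.45]), 6))  # 0.725
```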

Variation with features

Variation with kernels

Results • More results

1. Summary • A visual concept retrieval method suitable for large-scale data • PHOW with the fast intersection kernel is very effective: testing is very fast, with little drop in performance

2. Modeling subcategories

Subcategories in the real world

What did we achieve?

Structural SVM vs SVM • Allows the output label to be a complex variable, using a joint feature map between input and output • Our case: the output is a combination of category and subcategory labels • "Support Vector Learning for Interdependent and Structured Output Spaces", I. Tsochantaridis et al., ICML 04

Use of latent variables • "Learning structural SVMs with latent variables", C. N. Yu et al., ICML 2009
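One common way to pair category and subcategory labels is a joint feature map $\Psi(x, y)$ that copies $x$ into a block reserved for each (category, subcategory) pair, so that prediction is an argmax over pairs. When the subcategory is latent, training alternates with an imputation step that picks the best subcategory consistent with the known category. A minimal sketch under those assumptions; it is not necessarily the joint feature map used in this work, the names are hypothetical, and the training loop itself is omitted.

```python
def joint_feature(x, label, labels):
    """Psi(x, y): copy x into the block reserved for label y, zeros elsewhere."""
    d = len(x)
    psi = [0.0] * (d * len(labels))
    start = labels.index(label) * d
    psi[start:start + d] = x
    return psi


def score(w, x, label, labels):
    return sum(a * b for a, b in zip(w, joint_feature(x, label, labels)))


def predict(w, x, labels):
    """Structured prediction: the (category, subcategory) pair maximizing w . Psi(x, y)."""
    return max(labels, key=lambda y: score(w, x, y, labels))


def impute_subcategory(w, x, category, labels):
    """Latent step: with the category known, choose the best-scoring subcategory."""
    candidates = [y for y in labels if y[0] == category]
    return max(candidates, key=lambda y: score(w, x, y, labels))


labels = [("bus", 0), ("bus", 1), ("other", 0)]  # hypothetical label set
w = [1.0, 0.0, 0.0, 1.0, -1.0, -1.0]  # one 2-D weight block per label
x = [0.2, 0.9]
print(predict(w, x, labels))  # ('bus', 1)
```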

Toy Datasets

Real-world datasets • TRECVID 2009 dataset • PASCAL VOC (Visual Object Classes) 2007 object detection dataset

Results on TRECVID dataset

Improvement with latent SVM

Effect of the number of subclasses

2. Summary • A method for modeling subcategories using structural SVMs • Latent structural SVMs for further improvement • Improved the performance of the linear kernel • Experiments on toy and real data

3. Generalized RBF feature maps for Efficient Detection

Object Detection • Example categories: aeroplane, bicycle, cow, car, horse, motorbike

Part 3: Outline • Introduction: kernels and feature maps • Explicit feature maps for GRBF kernels • Experiments & results

General Framework for Detection • Any image → feature representations → classifier (e.g., SVM) → detections (e.g., car) • Classifier options: linear SVM, non-linear SVM • "Multiple Kernel Learning for Object Detection", Vedaldi et al., ICCV 2009; "Cascade Object Detection with Deformable Part Models", Felzenszwalb et al., CVPR 2010
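Detection multiplies the classifier cost by the number of candidate windows, which is why fast classifiers matter here. A minimal single-scale sliding-window sketch, assuming a grid of per-cell feature vectors and a linear scorer; multi-scale search, non-maximum suppression and the cascades used by the cited detectors are omitted, and the function name is hypothetical.

```python
def sliding_window_detect(feature_map, w, b, win_h, win_w, threshold=0.0):
    """Score every win_h x win_w window of a 2-D grid of per-cell feature vectors
    with a linear classifier; return (score, row, col) tuples above the threshold."""
    H, W = len(feature_map), len(feature_map[0])
    detections = []
    for r in range(H - win_h + 1):
        for c in range(W - win_w + 1):
            # concatenate the cell features inside the window, row by row
            x = [v for i in range(r, r + win_h)
                 for j in range(c, c + win_w)
                 for v in feature_map[i][j]]
            s = sum(a * f for a, f in zip(w, x)) + b
            if s > threshold:
                detections.append((s, r, c))
    return sorted(detections, reverse=True)  # best-scoring windows first


# 3x3 grid of 1-D cell features with a bright 2x2 "object" in the bottom-right
fmap = [[[0.0], [0.0], [0.0]],
        [[0.0], [1.0], [1.0]],
        [[0.0], [1.0], [1.0]]]
print(sliding_window_detect(fmap, [1.0] * 4, -3.0, 2, 2))  # [(1.0, 1, 1)]
```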

Kernels • Linear SVM → additive kernels (e.g., intersection kernel) → generalized RBF kernels (e.g., exp-$\chi^2$ kernel): faster toward the linear end, more discriminative toward the GRBF end • Fast linear SVMs: stochastic SVM (PEGASOS), primal SVM (liblinear), one-slack SVM (SVM-perf)
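As an example of a fast linear SVM solver, PEGASOS runs stochastic sub-gradient descent on $\frac{\lambda}{2}\|w\|^2 + \frac{1}{N}\sum_i \max(0, 1 - y_i w^\top x_i)$ with step size $\eta_t = 1/(\lambda t)$. A minimal sketch, assuming no bias term and omitting the optional projection step and mini-batches of the original algorithm.

```python
import random


def pegasos(X, y, lam=0.01, iterations=2000, seed=0):
    """Stochastic sub-gradient descent on lam/2 ||w||^2 + mean hinge loss (no bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(X))  # one random training example per step
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(a * b for a, b in zip(w, X[i]))
        # shrink w (gradient of the regularizer) ...
        w = [(1 - eta * lam) * a for a in w]
        # ... and step toward the example if it violates the margin
        if margin < 1:
            w = [a + eta * y[i] * b for a, b in zip(w, X[i])]
    return w


X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = pegasos(X, y)
print([yi * sum(a * b for a, b in zip(w, xi)) > 0 for xi, yi in zip(X, y)])  # [True, True, True, True]
```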

Kernels • Problem: good performance requires GRBF kernels, which have high computational complexity • Our solution: approximate generalized RBF kernels with a linear one by using a feature map

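Speeding up non-linear SVMs • A kernel is a dot product in a high-dimensional feature space • Define a feature map $\Psi$ approximating the kernel, $K(x, y) \approx \langle \Psi(x), \Psi(y) \rangle$ • Then $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b \approx w^\top \Psi(x) + b$ with $w = \sum_i \alpha_i y_i \Psi(x_i)$, so testing needs one dot product instead of one kernel evaluation per support vector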
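The collapse of the support-vector sum into a single weight vector can be checked with a kernel whose feature map is finite and exact. This sketch uses the homogeneous degree-2 polynomial kernel $(x^\top y)^2$, whose map is all pairwise products $x_i x_j$, purely as a stand-in: GRBF kernels have infinite-dimensional maps that must be approximated. The coefficients play the role of $\alpha_i y_i$ and are made up.

```python
import itertools


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_map(x):
    """Exact feature map for (x . y)^2: all products x_i * x_j."""
    return [a * b for a, b in itertools.product(x, repeat=2)]


def kernel_decision(x, svs, coeffs, b):
    # one kernel evaluation per support vector
    return sum(c * poly2_kernel(sv, x) for sv, c in zip(svs, coeffs)) + b


def collapse(svs, coeffs):
    # w = sum_i c_i * phi(x_i), computed once after training
    w = [0.0] * len(poly2_map(svs[0]))
    for sv, c in zip(svs, coeffs):
        w = [a + c * f for a, f in zip(w, poly2_map(sv))]
    return w


def linear_decision(x, w, b):
    # a single dot product, independent of the number of support vectors
    return sum(a * f for a, f in zip(w, poly2_map(x))) + b


svs = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
coeffs = [0.3, -0.7, 0.2]  # stand-ins for alpha_i * y_i
w = collapse(svs, coeffs)
x = [1.5, 0.5]
print(abs(kernel_decision(x, svs, coeffs, 0.1) - linear_decision(x, w, 0.1)) < 1e-9)  # True
```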

Explicit feature maps • Feature maps for RBF/multiplicative kernels • [Rahimi and Recht, NIPS 07] • [F. Li et al., DAGM 2010] • Feature maps for additive kernels • [Maji and Berg, ICCV 09] • [Vedaldi and Zisserman, CVPR 2010] • [Perronnin et al., CVPR 2010] • Our contribution • Feature maps for generalized RBF kernels • 2× to 3× speedup with only a small drop in performance


Additive kernels • $K(x, y) = \sum_k k(x_k, y_k)$ • Examples: Hellinger's, $\chi^2$, intersection kernel
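An additive kernel applies a one-dimensional kernel to each coordinate and sums the results. A minimal sketch with the three examples, assuming the forms $\sqrt{ab}$ (Hellinger's), $2ab/(a+b)$ ($\chi^2$) and $\min(a, b)$ (intersection) for non-negative inputs. Hellinger's kernel is the simplest case of an explicit map: the element-wise square root makes it an ordinary dot product.

```python
import math


def additive_kernel(x, y, k):
    """K(x, y) = sum over dimensions of a 1-D kernel k(x_d, y_d)."""
    return sum(k(a, b) for a, b in zip(x, y))


def hellinger_1d(a, b):
    return math.sqrt(a * b)


def chi2_1d(a, b):
    return 2 * a * b / (a + b) if a + b > 0 else 0.0


def intersection_1d(a, b):
    return min(a, b)


def hellinger_map(x):
    """Hellinger's kernel has an exact finite feature map: the element-wise square root."""
    return [math.sqrt(a) for a in x]


x, y = [0.25, 0.75], [0.5, 0.5]
explicit = sum(a * b for a, b in zip(hellinger_map(x), hellinger_map(y)))
print(round(additive_kernel(x, y, hellinger_1d), 4), round(explicit, 4))  # 0.9659 0.9659
```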

Additive Kernel Maps • Feature maps for additive kernels [Vedaldi & Zisserman 10]: a closed-form feature map, approximated by sampling • "Efficient Additive Kernels via Explicit Feature Maps", A. Vedaldi and A. Zisserman, CVPR 2010
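For the one-dimensional $\chi^2$ kernel $k(x, y) = 2xy/(x + y)$, the closed-form map has spectrum $\kappa(\lambda) = \operatorname{sech}(\pi\lambda)$; sampling it at $\lambda = jL$ for $j = 0, \dots, n$ gives a $(2n + 1)$-dimensional approximate feature map. A minimal sketch of that construction for a single input dimension (a vector is mapped by concatenating the per-dimension maps); the values `order=3` and `period=0.5` are illustrative choices, not the settings used in this work.

```python
import math


def chi2_kernel_map(x, order=3, period=0.5):
    """Approximate feature map for the 1-D chi2 kernel k(x, y) = 2xy / (x + y),
    sampling its spectrum kappa(lam) = sech(pi * lam) at lam = j * period."""
    if x <= 0:
        return [0.0] * (2 * order + 1)  # the map tends to zero as x -> 0

    def kappa(lam):
        return 1.0 / math.cosh(math.pi * lam)

    feats = [math.sqrt(x * period * kappa(0.0))]
    for j in range(1, order + 1):
        lam = j * period
        amp = math.sqrt(2 * x * period * kappa(lam))
        feats.append(amp * math.cos(lam * math.log(x)))
        feats.append(amp * math.sin(lam * math.log(x)))
    return feats


def approx_chi2(x, y, **kw):
    """Dot product of the two approximate feature maps."""
    return sum(a * b for a, b in zip(chi2_kernel_map(x, **kw), chi2_kernel_map(y, **kw)))


print(round(approx_chi2(0.2, 0.5), 4), round(2 * 0.2 * 0.5 / 0.7, 4))  # close: approx vs exact
```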

Feature maps for RBF kernels • Random Fourier features [Rahimi & Recht 07] • "Random Features for Large-Scale Kernel Machines", Ali Rahimi, Ben Recht, NIPS 2007
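Random Fourier features approximate the Gaussian RBF kernel $\exp(-\gamma\|x - y\|^2)$ by drawing frequencies $\omega \sim \mathcal{N}(0, 2\gamma I)$ and phases $b \sim U[0, 2\pi)$ and mapping $z(x) = \sqrt{2/D}\,\cos(\omega^\top x + b)$, so that $z(x)^\top z(y)$ matches the kernel in expectation. A minimal sketch using only the standard library; the number of features `D` and the seed are arbitrary choices for illustration.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2):
    omega ~ N(0, 2 * gamma * I), b ~ U[0, 2 pi), z(x) = sqrt(2/D) cos(omega . x + b)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2 * gamma)
    omegas = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(o * a for o, a in zip(om, x)) + b)
                for om, b in zip(omegas, offsets)]
    return z


z = make_rff(dim=3, n_features=4000, gamma=0.5, seed=1)
x, y = [0.1, 0.2, 0.3], [0.3, 0.1, 0.0]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))
print(round(approx, 3), round(exact, 3))  # approx is close to exact
```

A larger `D` lowers the variance of the approximation at the price of a longer feature vector.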
