Efficient SVM based object classification and detection Sreekanth Vempati (200402044) Advisors: Dr. C. V. Jawahar (IIIT Hyderabad), Dr. Andrew Zisserman (Univ. of Oxford)
Large Visual Data Cheap capturing, storage and internet devices
Rapid growth in the amount of data available • Video sharing (in the case of YouTube) • Image sharing
Problems • Scene/Object Classification • Find specified categories of scenes/objects • Ex: Is there a demonstration/protest in this image? Is there a bus in this image? • Object Detection • Find the location of specified categories of scenes/objects • Ex: Output the bounding box of the bus in this image
Challenges • Intra-class variations (Ex: Boat/Ship category) • Inter-class similarity (Ex: Flowers, Cityscape, Protest)
Challenges • Viewpoint variation • Occlusions/truncations
Scalability • We need solutions that scale to large amounts of data • For example, if we have to test 140,000 images with the best-performing setup: • Feature representation (visual words based) • 6300 dimensions • takes ~50 seconds -> total time would be ~57 days • Classification (SVM with non-linear kernel) • 20 classes • 3 images/second, a total time of ~10 days
Overview • Large scale semantic concept retrieval in videos • Modeling subcategories • Efficient detection by using GRBF feature maps • Conclusions
1. Semantic video retrieval • Given a large set of videos, retrieve the videos of a specific category • Ex: Find all the videos containing soccer
Overview of the approach • Training: Example videos -> annotated video frames -> feature extraction (Ex: PHOW, PHOG, GIST) -> classifier (Ex: SVM, Random Forests) • Testing: Unseen videos -> feature extraction -> classifier -> ranked shots
Features • GIST – Torralba et al., IJCV 01 • Image divided into an m x m grid • For each cell, a set of filters (different scales, orientations) is applied • Final descriptor: average of the filter responses over all blocks Images from “Image Classification for large number of object categories”, Anna Bosch, 2006
Features Pyramid Histogram of Oriented Gradients Images from “Image Classification for large number of object categories”, Anna Bosch, 2006
Pyramid Histogram of Visual Words (PHOW) • Uses dense SIFT (Scale Invariant Feature Transform) descriptors • Descriptors are mapped to visual words by vector quantization “Beyond bag of features: Spatial pyramid matching for recognizing natural scene categories”, S. Lazebnik et al., CVPR 2006
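A minimal sketch of how such a descriptor could be assembled, assuming descriptor positions normalised to [0, 1), a precomputed vocabulary, and a three-level pyramid (1x1, 2x2, 4x4). The function name `phow_histogram`, the vocabulary size of 300 and the random inputs are illustrative assumptions, not values taken from the slides; the per-level weighting used in spatial pyramid matching is omitted for brevity.

```python
import numpy as np

def phow_histogram(descriptors, positions, vocabulary, levels=2):
    """Spatial-pyramid histogram of visual words for one image (illustrative).

    descriptors: (N, D) local descriptors, e.g. dense SIFT
    positions:   (N, 2) descriptor centres, normalised to [0, 1)
    vocabulary:  (K, D) visual words, e.g. k-means centres
    """
    # Vector quantisation: index of the nearest visual word for each descriptor.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    K = vocabulary.shape[0]

    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of every descriptor at this pyramid level.
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        cell = cy * cells + cx
        # One K-bin histogram per cell, counted in a single pass.
        parts.append(np.bincount(cell * K + words, minlength=cells * cells * K))

    h = np.concatenate(parts).astype(float)
    return h / max(h.sum(), 1.0)  # L1 normalisation

# Random stand-ins for real descriptors and a real vocabulary (assumptions).
rng = np.random.default_rng(0)
desc = rng.random((500, 8))
pos = rng.random((500, 2))
vocab = rng.random((300, 8))
h = phow_histogram(desc, pos, vocab)
print(h.shape)
```

With 300 words and 1 + 4 + 16 = 21 cells the vector has 6300 entries, which is one configuration that would give the dimensionality quoted on the Scalability slide; the actual vocabulary size is not stated there.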
Support Vector Machines (SVM) • Training data: $x_i$, $y_i$, $i = 1, \dots, N$ [Figure: separating hyperplane $w^T x + b = 0$ with margins $w^T x + b = -1$ and $w^T x + b = +1$; the support vectors lie on the margins; a misclassified point is marked]
SVM formulation • Evaluation function: $f(x) = w^T x + b$
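For reference, the evaluation function above is usually obtained from the standard soft-margin primal problem. The slack variables $\xi_i$ and the regularisation constant $C$ are standard notation assumed here; they do not appear on the slide.

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
y_i\,(w^T x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1, \dots, N
```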
Kernel Trick • Use a function which maps input space to feature space. • And then build the classifier in feature space.
Moving to a different space • Dot product in feature space: $f(x) = w^T \phi(x) + b = \sum_i \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$
Kernelizing SVMs • Replace the dot product in feature space with a kernel function: $K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle$
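A small sketch of the kernelized evaluation function, $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. The support vectors, coefficients and bias are toy values chosen only for illustration. With a linear kernel the result should coincide with $w^T x + b$ for $w = \sum_i \alpha_i y_i x_i$, which the last lines check.

```python
import numpy as np

def decision_function(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    k = np.array([kernel(sv, x) for sv in support_vectors])
    return float(np.dot(alphas * labels, k) + b)

def linear_kernel(u, v):
    return float(np.dot(u, v))

# Toy values, assumed for illustration only.
support_vectors = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]])
alphas = np.array([0.5, 0.25, 0.75])
labels = np.array([1.0, -1.0, 1.0])
b = -0.1
x = np.array([0.5, 1.5])

f_kernel = decision_function(x, support_vectors, alphas, labels, b, linear_kernel)

# Same classifier written in primal form.
w = (alphas * labels) @ support_vectors
f_linear = float(w @ x + b)
print(f_kernel, f_linear)
```

Note that the kernelized form costs one kernel evaluation per support vector for every test point, whereas the primal form is a single dot product.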
Kernels • Linear • Polynomial • Intersection kernel • Generalized RBF kernel • Weighted combination of multiple kernels
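The kernels listed above can be sketched as follows. The formulas are the standard textbook forms and are assumed here rather than taken from the slide; in particular, the chi-squared distance used inside the generalized RBF kernel and the parameter defaults (`degree`, `c`, `gamma`) are illustrative choices.

```python
import numpy as np

def linear(x, y):
    return float(x @ y)

def polynomial(x, y, degree=2, c=1.0):
    return float((x @ y + c) ** degree)

def intersection(x, y):
    # Histogram intersection: sum of element-wise minima.
    return float(np.minimum(x, y).sum())

def chi2_distance(x, y, eps=1e-12):
    # One common convention; some authors include an extra factor of 1/2.
    return float((((x - y) ** 2) / (x + y + eps)).sum())

def generalized_rbf(x, y, gamma=1.0, distance=chi2_distance):
    # exp(-gamma * D(x, y)) for a chosen distance D.
    return float(np.exp(-gamma * distance(x, y)))

def combined(x, y, kernels, weights):
    # Weighted combination of several kernels.
    return float(sum(w * k(x, y) for w, k in zip(weights, kernels)))

x = np.array([0.5, 0.5])
y = np.array([0.25, 0.75])
print(linear(x, y), polynomial(x, y), intersection(x, y), generalized_rbf(x, y))
```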
TRECVID competition • Objective : Rank video shots based on the presence of given concept • Participated in High level feature extraction, TRECVID • Organized by NIST, USA • 2008: around 180 submissions by 40 teams from all over the world
Some of the classes • High-level Feature Extraction • Cityscape • Classroom • Driver • Two People • Emergency Vehicle • Harbor • Kitchen • Nighttime • Singing • Demonstration/Protest • Mountain • Hand • Street • Telephone • Flower • Bridge • Airplane flying • Boat/Ship • Bus • Dog
Data Statistics
Evaluation Measure • Average Precision: area under the precision-recall curve
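As an illustration, the non-interpolated form of average precision for a ranked list can be computed as below. This is a sketch of the common definition (mean of the precision values at the ranks of the relevant items); the exact variant used in the evaluation is not specified on the slide, and the scores and labels are made up.

```python
import numpy as np

def average_precision(scores, relevant):
    """Non-interpolated average precision of a ranking induced by `scores`."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(relevant, dtype=float)[order]
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)                          # relevant items seen so far
    precision_at_k = hits / np.arange(1, len(rel) + 1)
    # Average the precision at each rank where a relevant item appears.
    return float((precision_at_k * rel).sum() / rel.sum())

# Four shots ranked by classifier score; the 1st and 3rd are relevant.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(ap)
```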
Our Approach • Performance compared using different features and SVM parameters • PHOW with the intersection kernel is efficient • Testing is very fast, with little drop in performance • Testing time: ~200,000 (2 lakh) frames in 10 seconds “Classification using Intersection kernel SVMs is efficient”, A. Berg et al., CVPR 2009
Results More Results
1. Summary • A method for visual concept retrieval suitable for large-scale data • PHOW with the fast intersection kernel is very effective
Structural SVM vs SVM • Allows the output label to be a complex variable • Uses a joint feature map between input and output • Our case: the output is a combination of category and subcategory labels “Support Vector Learning for Interdependent and Structured Output Spaces”, I. Tsochantaridis et al., ICML 04
Use of latent variables “Learning structural SVMs with latent variables”, C. N. Yu et al., ICML 2009
Real world datasets • TRECVID 2009 dataset • PASCAL VOC (Visual Object Classes) 2007 • Object Detection dataset
2. Summary • Method for modeling of subcategories using structural SVM • Application of latent structural SVM for further improvements • Improved the performance of linear kernel • Performed various experiments on toy and real data
Object Detection • Example categories: aeroplane, bicycle, cow, car, horse, motorbike
Part 3: Outline • Introduction: Kernels and Feature maps • Explicit feature maps for GRBF kernels • Experiments & Results
General Framework for detection • Any image • Feature representations • Classifier (Ex: SVM): linear SVM or non-linear SVM • Ex: Car “Multiple Kernel Learning for Object Detection”, Vedaldi et al., ICCV 2009; “Cascade Object Detection with Deformable Part Models”, Felzenszwalb et al., CVPR 2010
Kernels • Fast linear SVMs • Stochastic SVM (PEGASOS) • Primal SVM (liblinear) • One-slack SVM (SVM-perf) • From faster to more discriminative: • Linear SVM • Additive kernels (Ex: intersection kernel) • Generalized RBF kernels (Ex: exp-χ² kernel)
Kernels • Problem: good performance requires GRBF kernels, which have high computational complexity • Our solution: approximate the generalized RBF kernel with a linear one by using a feature map
Speeding up non-linear SVMs • A kernel is a dot product in a high dimensional feature space • Define a feature map approximating the kernel
Explicit feature maps • Feature maps for RBF/multiplicative kernels • [Rahimi and Recht, NIPS 07] • [F. Li et al., DAGM 2010] • Feature maps for additive kernels • [Maji and Berg, ICCV 09] • [Vedaldi and Zisserman, CVPR 2010] • [Perronnin et al., CVPR 2010] Our Contribution • Feature maps for generalized RBF kernels • 2x to 3x speedup (with only a small drop in performance)
Part 3: Outline • Introduction: Kernels and Feature maps • Explicit feature maps for GRBF kernels • Experiments & Results
Additive kernels • Examples: Hellinger’s, χ², intersection kernel
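An additive kernel is a sum of one-dimensional kernels over the histogram bins. Hellinger's kernel is the simplest case to illustrate, because its feature map is exact: the element-wise square root. The sketch below uses made-up histograms and illustrative function names.

```python
import numpy as np

def hellinger_kernel(x, y):
    """Additive kernel: sum over bins of sqrt(x_i * y_i)."""
    return float(np.sqrt(x * y).sum())

def hellinger_map(x):
    """Exact feature map: the kernel equals a dot product of square roots."""
    return np.sqrt(x)

# Two toy L1-normalised histograms (assumed values).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.6, 0.3])

exact = hellinger_kernel(x, y)
via_map = float(hellinger_map(x) @ hellinger_map(y))
print(exact, via_map)
```

After applying the map once to every histogram, a linear SVM on the mapped vectors behaves like a Hellinger-kernel SVM on the original ones.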
Additive Kernel Maps Feature maps for additive kernels [Vedaldi & Zisserman 10]: closed form function approximated by sampling “Efficient Additive Kernels via Explicit Feature Maps”, A. Vedaldi and A. Zisserman, CVPR 2010
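A rough sketch of what "closed form function approximated by sampling" can look like for the χ² kernel, written in the style of the Vedaldi and Zisserman construction as understood here. It assumes the spectrum of the χ² kernel is sech(πλ) and samples it at λ = 0, L, 2L, ..., nL; the sampling step `L = 0.4` and order `n = 5` are arbitrary illustrative values, and inputs must be strictly positive.

```python
import numpy as np

def chi2_kernel(x, y):
    """Additive chi-squared kernel: sum_i 2 * x_i * y_i / (x_i + y_i)."""
    return float((2.0 * x * y / (x + y)).sum())

def chi2_feature_map(x, n=5, L=0.4):
    """Approximate feature map of dimension (2n + 1) * len(x); requires x > 0."""
    log_x = np.log(x)
    feats = [np.sqrt(L * x)]                      # lambda = 0 sample, sech(0) = 1
    for j in range(1, n + 1):
        lam = j * L
        kappa = 1.0 / np.cosh(np.pi * lam)        # assumed spectrum: sech(pi * lambda)
        scale = np.sqrt(2.0 * L * x * kappa)
        feats.append(scale * np.cos(lam * log_x))
        feats.append(scale * np.sin(lam * log_x))
    return np.concatenate(feats)

# Toy histograms (assumed values).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.6, 0.3])

exact = chi2_kernel(x, y)
approx = float(chi2_feature_map(x) @ chi2_feature_map(y))
print(exact, approx)
```

The dot product of the mapped vectors should track the exact kernel closely when the ratio between corresponding bins is moderate; accuracy degrades for very large ratios unless `L` is reduced and `n` increased.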
Feature maps for RBF kernels Random Fourier features [Rahimi & Recht 07] “Random Features for Large-Scale Kernel Machines”, Ali Rahimi, Ben Recht NIPS 2007
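A minimal sketch of random Fourier features for the Gaussian RBF kernel $K(x, y) = \exp(-\gamma \|x - y\|^2)$. The helper names, `gamma = 1.0`, the 5000 random features and the seed are illustrative assumptions; the approximation is random, so the two printed values agree only approximately.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def make_random_fourier_map(dim, n_features, gamma, seed=0):
    rng = np.random.default_rng(seed)
    # Frequencies sampled from the kernel's Fourier transform: N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, dim))
    # Random phases, uniform on [0, 2 * pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def feature_map(x):
        return np.sqrt(2.0 / n_features) * np.cos(W @ x + b)

    return feature_map

# Toy inputs (assumed values).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.6, 0.3])
gamma = 1.0

z = make_random_fourier_map(dim=3, n_features=5000, gamma=gamma)
exact = rbf_kernel(x, y, gamma)
approx = float(z(x) @ z(y))
print(exact, approx)
```

The error shrinks roughly with the inverse square root of the number of random features, so the feature dimension trades accuracy against the cost of the resulting linear classifier.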