Empowering visual categorization with the GPU • Presented by 陳群元
Outline • Introduction • Overview of visual categorization • Image feature extraction • Category model learning • Test image classification • GPU accelerated categorization • Experimental setup • Results
Introduction • Use the GPU to accelerate the quantization and classification components of a visual categorization architecture. • The algorithms and their implementations should push the state of the art in categorization accuracy. • Visual categorization must be decomposable into components so that bottlenecks can be located. • Given the same input, implementations of a component on different hardware architectures must give the same output.
Overview
Visual categorization system • Image Feature Extraction • Point Sampling Strategy • Descriptor Computation • Bag-of-Words • Category Model Learning • Test Image Classification
Point sampling strategy • Dense sampling • Typically, around 10,000 points are sampled per image • Salient point methods • Harris-Laplace salient point detector [29] • Difference-of-Gaussians detector [28]
Descriptors • SIFT descriptor: 128 dimensions • 10 frames per second for 640x480 images (GPU) • SURF descriptor • 100 frames per second for 640x480 images (GPU) • ColorSIFT descriptor: 384 dimensions • Three times the length of SIFT (3 x 128)
Bag-of-words • Vector quantization is computationally the most expensive part of the bag-of-words model. • Bag -> image set • Words -> features
Bag-of-words • n descriptors of length d in an image • Codebook with m elements • Assigning every descriptor to its nearest codebook element costs O(ndm) per image • A tree-based codebook reduces this to O(nd log(m)), which runs in real time on the GPU [25].
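To make the O(ndm) cost concrete, here is a minimal sketch of brute-force vector quantization in plain Python. The function names and the toy data are illustrative assumptions, and the use of squared Euclidean distance is also an assumption, since the slide does not name the distance. The nested structure (n descriptors, m codebook elements, d components per comparison) is what produces the n·d·m term.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors: O(d)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bag_of_words(descriptors, codebook):
    """Assign each of the n descriptors to its nearest of the m codebook
    elements and count the assignments: O(n * d * m) in total."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:            # n iterations
        nearest = min(
            range(len(codebook)),             # m candidates, O(d) each
            key=lambda j: squared_distance(descriptor, codebook[j]),
        )
        histogram[nearest] += 1
    return histogram


# Toy data (hypothetical): n = 4 descriptors, d = 2, m = 2 codebook elements.
descriptors = [[0.0, 0.1], [0.9, 1.0], [1.1, 0.9], [0.2, 0.0]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
print(bag_of_words(descriptors, codebook))    # [2, 2]
```

The resulting histogram is the bag-of-words representation of the image: one count per codebook element.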
Category model learning • Precompute kernel function values • Kernel-based SVM algorithm
Support Vector Machines • Kernel Support Vector Machines
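For reference, the textbook form of the kernel SVM decision function is given below. The notation is assumed here and is not taken from the slide.

```latex
f(x) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i \, y_i \, k(x_i, x) + b \Big)
```

Here the x_i are training feature vectors with labels y_i in {-1, +1}, the alpha_i are learned weights (nonzero only for support vectors), b is a bias term, and k is the kernel function. Training and classification touch the data only through values of k, which is why those values can be precomputed.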
Outline • Introduction • Overview of visual categorization • Image feature extraction • Category model learning • Test image classification • GPU accelerated categorization • Parallel Programming on the GPU and CPU • GPU-Accelerated Vector Quantization • GPU-Accelerated Kernel Value Precomputation • Experimental setup • Results
Parallel Programming on the GPU and CPU • SIMD instructions perform the same operation on multiple data elements at the same time
GPU-Accelerated Vector Quantization • The most expensive computational step in vector quantization is the calculation of the distance matrix (n x m). • A (n x d): matrix with all image descriptors as rows • B (m x d): matrix with all codebook elements as rows
GPU-Accelerated Vector Quantization (cont.) • Compute the dot products between all rows of A and B (line 7). • Matrix multiplications are the building block for many algorithms; highly optimized BLAS linear algebra libraries containing this operation exist for both the CPU and the GPU.
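The reason dot products are enough can be sketched as follows, assuming the distance is squared Euclidean (the slide does not say so explicitly): since ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, the whole n x m distance matrix follows from the row norms of A and B plus the product of A with the transpose of B. The pure-Python `matmul_transposed` below is a hypothetical stand-in for the BLAS matrix multiplication; all names are illustrative.

```python
def matmul_transposed(A, B):
    """Dot products between all rows of A (n x d) and all rows of B (m x d).
    This is the step a BLAS matrix multiplication would perform."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B]
            for row_a in A]


def squared_distance_matrix(A, B):
    """n x m matrix of squared Euclidean distances between rows of A and B,
    using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b)."""
    norms_a = [sum(x * x for x in row) for row in A]
    norms_b = [sum(x * x for x in row) for row in B]
    dots = matmul_transposed(A, B)
    return [
        [norms_a[i] + norms_b[j] - 2.0 * dots[i][j] for j in range(len(B))]
        for i in range(len(A))
    ]


# Toy data (hypothetical): n = 2 descriptors, m = 3 codebook elements, d = 2.
A = [[0.0, 0.0], [3.0, 4.0]]
B = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
D = squared_distance_matrix(A, B)
# Index of the nearest codebook element for each descriptor.
nearest = [min(range(len(B)), key=lambda j: row[j]) for row in D]
print(D)        # [[0.0, 9.0, 16.0], [25.0, 16.0, 9.0]]
print(nearest)  # [0, 2]
```

Casting the problem this way moves almost all of the work into one matrix product, which is the operation that tuned libraries on either kind of hardware handle best.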
GPU-Accelerated Kernel Value Precomputation • To compute kernel function values, we use the kernel function based on the distance • Distance between feature vectors F and F' • Kernel function based on this distance
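As an illustration of a distance-based kernel, the sketch below assumes a chi-square distance between histograms and an exponential kernel on top of it, a combination often used with bag-of-words features. The slide text does not state which distance or kernel is used, so both choices, the `scale` parameter, and the function names are assumptions.

```python
import math


def chi2_distance(f, g):
    """Chi-square distance between two histograms (assumed distance)."""
    total = 0.0
    for a, b in zip(f, g):
        if a + b > 0:                  # skip bins that are empty in both
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def kernel_value(f, g, scale):
    """Exponential kernel exp(-dist / scale); `scale` is a hypothetical
    normalization constant, for example an average distance."""
    return math.exp(-chi2_distance(f, g) / scale)


# Toy histograms (hypothetical).
F = [1.0, 0.0]
F_prime = [0.0, 1.0]
print(chi2_distance(F, F_prime))       # 1.0
print(kernel_value(F, F, 1.0))         # 1.0 (identical vectors)
```

Identical feature vectors give a kernel value of 1, and the value decays toward 0 as the distance grows.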
GPU-Accelerated Kernel Value Precomputation (cont.) • Multiple input features • For kernel value precomputation, memory usage is an important problem. • For a dataset with 50,000 images, the input data is 12 GB and the output data is 19 GB. • To avoid holding all data in memory simultaneously, the processing is divided into evenly sized chunks (1024 x 1024).
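A minimal sketch of chunked precomputation, under stated assumptions: the kernel matrix is computed one tile at a time and each finished tile is handed to a `store` callback (hypothetical, for example a function that writes the tile to disk), so only one tile is in memory at once. The function names are illustrative, and the toy example uses a chunk size of 2 rather than 1024. As a rough consistency check, a 50,000 x 50,000 matrix of 8-byte values takes 2 x 10^10 bytes, about 18.6 GiB, which is in line with the 19 GB output figure, although the slide does not state the element size.

```python
def chunk_ranges(n, chunk):
    """Yield (start, stop) index ranges of at most `chunk` items covering 0..n."""
    for start in range(0, n, chunk):
        yield start, min(start + chunk, n)


def precompute_kernel_in_chunks(features, kernel, chunk, store):
    """Compute the n x n kernel matrix one (chunk x chunk) tile at a time.
    Tiles on the right and bottom edges may be smaller than `chunk`."""
    n = len(features)
    for r0, r1 in chunk_ranges(n, chunk):
        for c0, c1 in chunk_ranges(n, chunk):
            tile = [
                [kernel(features[i], features[j]) for j in range(c0, c1)]
                for i in range(r0, r1)
            ]
            store(r0, c0, tile)        # tile can be released after this call


def dot(u, v):
    """Linear kernel, used here only to keep the toy example easy to verify."""
    return sum(a * b for a, b in zip(u, v))


# Toy usage (hypothetical): 3 feature vectors, chunk size 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tiles = {}


def keep_tile(row_start, col_start, tile):
    tiles[(row_start, col_start)] = tile


precompute_kernel_in_chunks(features, dot, 2, keep_tile)
print(sorted(tiles))                   # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

In a GPU setting each tile would typically be one kernel launch or one matrix product, with the chunk size chosen to fit device memory.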
Experimental setup • Experiment 1: Vector Quantization Speed • The CPU implementation is SIMD-optimized. • Codebook of size m = 4,000 • 20,000 descriptors per image • Descriptor lengths of d = 128 (SIFT) and d = 384 (ColorSIFT) • Experiment 2: Kernel Value Precomputation Speed • Uses the large Mediamill Challenge training set of 30,993 frames • Experiment 3: Visual Categorization Throughput • A comparison is made between the quad-core Core i7 920 CPU (2.66 GHz) and the GeForce GTX 260 GPU (27 cores).
Results • Experiment 1: Vector Quantization Speed • Experiment 2: Kernel Value Precomputation Speed • Experiment 3: Visual Categorization Throughput
Other applications • Application 1: k-means Clustering • Application 2: Bag-of-Words Model for Text Retrieval • Application 3: Multi-Frame Processing for Video Retrieval
Conclusions • This paper provides an efficiency analysis of a state-of-the-art visual categorization pipeline based on the bag-of-words model. • Two large bottlenecks were identified: the vector quantization step in image feature extraction and the kernel value computation in category classification. • Compared to a multi-threaded CPU implementation on a quad-core CPU, the GPU is 4.8 times faster.
The end • Thank you!