Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Performance Tuning on Multicore Systems for Feature Matching within Image Collections • Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung and Minyi Guo* • Department of Computer Science, University of Otago, New Zealand • *Department of Computer Science, Shanghai Jiao Tong University, China

Contents • Motivation • Our work • Evaluation • Conclusion

Similarity Search • Definition: • Preprocess a database of N objects so that, given a query object, one can efficiently determine its nearest neighbors in the database. • Applications: • pattern recognition, chemical similarity analysis, statistical classification, etc.

The problem – KNN Search • K Nearest Neighbor Search: • Feature: an array of D elements • $f = [e_1, e_2, \ldots, e_D]$ • Feature Space: a set of features • $F_s = \{f_1, f_2, \ldots, f_N\}$ • Feature Similarity: Euclidean distance • $d(f_i, f_j) = \sqrt{\sum_{m=1}^{D} (f_{im} - f_{jm})^2}$ • Search: given a query feature $f_q$, find the k features in $F_s$ that have the shortest distances to $f_q$.
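The search defined above can be sketched as an exhaustive (brute-force) k-NN. This sketch assumes each feature is stored as a `std::vector<float>`. The names `Feature`, `euclidean` and `knn_brute_force` are illustrative only and do not come from the authors' code. The search performs one distance computation per feature in $F_s$ for every query.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// A feature is an array of D elements (D = 128 for SIFT descriptors).
using Feature = std::vector<float>;

// Euclidean distance d(f_i, f_j) = sqrt(sum over m of (f_im - f_jm)^2).
// Assumes both features have the same length D.
float euclidean(const Feature& a, const Feature& b) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const float diff = a[m] - b[m];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Brute-force k-NN search: compares fq with every feature in Fs and returns
// the indices of the k nearest ones, nearest first.
std::vector<std::size_t> knn_brute_force(const std::vector<Feature>& Fs,
                                         const Feature& fq, std::size_t k) {
    std::vector<std::pair<float, std::size_t>> dist;  // (distance, index in Fs)
    dist.reserve(Fs.size());
    for (std::size_t i = 0; i < Fs.size(); ++i) {
        dist.emplace_back(euclidean(Fs[i], fq), i);
    }
    k = std::min(k, dist.size());
    // Only the k smallest distances need to be put in order.
    std::partial_sort(dist.begin(), dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());
    std::vector<std::size_t> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        result.push_back(dist[i].second);
    }
    return result;
}
```

When two distances are equal, the smaller index wins, because `std::pair` falls back to comparing its second member.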

Our Case Study • Feature Matching: a fundamental problem in many computer vision tasks • Use the SIFT algorithm to generate features for each image; • Use a k-Nearest Neighbors (k-NN) algorithm to find similar features between images

Challenges • Very time-consuming: • datasets are becoming larger: • hundreds or thousands of images; • image resolution is increasing: • 2300×1500 pixels, or higher; • New platforms: • HPC has entered the multi-/many-core era: • AMD 16-core and 64-core machines.

Motivation • Performance evaluation: • Identify common problems that may limit the performance of feature matching on multi-/many-core platforms. • Performance tuning: • Find general methods to solve the identified problems.

Data Distribution

Data Size

Problems • Unbalanced workload: • Levels of parallelism; • Scheduling policy. • Poor last-level cache utilization: • Memory architecture.

Levels of parallelism [figure: Level_1, Level_1&2, Level_2, Level_3 and Level_4 across query images, reference images, features, and search algorithms (Linear, KD-tree, Kmeans, LSH, others)]

Scheduling policy • OpenMP scheduling policy: • Static: the scheduler assigns an equal number of tasks to each thread up front (not used); • Dynamic: when a thread finishes its current task, it takes new tasks from the global task queue; • Guided: like dynamic, but the chunk size shrinks as tasks are requested from the task queue (proportional to the number of remaining tasks divided by the number of threads).
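The dynamic policy can be sketched as an OpenMP loop in which each task is one query image. Because images contain different numbers of features, tasks have uneven cost. Treating one image as one task is an assumption for illustration, and `match_images` is a hypothetical name. If the code is compiled without OpenMP support (for example, GCC without `-fopenmp`), the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Feature = std::vector<float>;

// Squared Euclidean distance: sufficient for finding the nearest neighbour,
// because sqrt does not change the ordering of distances.
static float squared_distance(const Feature& a, const Feature& b) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const float diff = a[m] - b[m];
        sum += diff * diff;
    }
    return sum;
}

// One task per query image: match every feature of images[i] against refs and
// record the index of its nearest reference feature.
std::vector<std::vector<std::size_t>> match_images(
        const std::vector<std::vector<Feature>>& images,
        const std::vector<Feature>& refs) {
    std::vector<std::vector<std::size_t>> result(images.size());
    const long n = static_cast<long>(images.size());
    // schedule(dynamic, 1): an idle thread takes the next image from the shared
    // queue. schedule(guided) would hand out shrinking chunks instead, and
    // schedule(static) would split the images evenly before the loop starts.
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        const std::vector<Feature>& feats = images[static_cast<std::size_t>(i)];
        std::vector<std::size_t>& out = result[static_cast<std::size_t>(i)];
        out.assign(feats.size(), 0);  // entries stay 0 if refs is empty
        for (std::size_t f = 0; f < feats.size(); ++f) {
            float best = std::numeric_limits<float>::max();
            for (std::size_t r = 0; r < refs.size(); ++r) {
                const float d = squared_distance(feats[f], refs[r]);
                if (d < best) {
                    best = d;
                    out[f] = r;
                }
            }
        }
    }
    return result;
}
```

Each iteration writes only its own `result[i]`, so the threads need no locking.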

Memory architecture • More cores share the memory and the last-level cache: • Memory bandwidth: • AMD 16-core 12.8 GB/s • AMD 64-core 25.6 GB/s • Last-level cache: • AMD 16-core 6 MB • AMD 64-core 16 MB • The features of large images may not fit in the cache, which causes frequent main-memory accesses and makes matching hit the memory wall.
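A quick capacity estimate shows how few reference features can stay cache-resident. This sketch assumes 128-dimensional SIFT descriptors stored as 32-bit floats, and it treats "MB" as 2^20 bytes. It also ignores everything else competing for the cache, so the result is only an upper bound. The constant and function names are hypothetical.

```cpp
#include <cstddef>

// Assumption: 128-dimensional SIFT descriptors stored as 32-bit floats,
// i.e. 512 bytes per descriptor where sizeof(float) == 4.
constexpr std::size_t kDescriptorDims = 128;
constexpr std::size_t kBytesPerDescriptor = kDescriptorDims * sizeof(float);

// Assumption: the slide's "MB" means 2^20 bytes.
constexpr std::size_t kMiB = std::size_t{1024} * 1024;

// Upper bound on the number of descriptors that fit in a cache of cache_bytes,
// ignoring query features, index structures and any other data sharing it.
constexpr std::size_t descriptors_that_fit(std::size_t cache_bytes) {
    return cache_bytes / kBytesPerDescriptor;
}
```

Under these assumptions, the 6 MB last-level cache of the 16-core machine holds at most 12,288 descriptors. The 16 MB cache of the 64-core machine holds at most 32,768, and in both cases that space is shared by all cores.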

Divide-and-Merge • We propose Divide-and-Merge: • The whole feature space is split into several smaller sub-spaces; • Each sub-space is searched independently; • Their results are merged.
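The three steps can be sketched for exact k-NN as follows. This is a minimal single-threaded sketch that assumes brute-force search inside each sub-space; the slides also apply Divide-and-Merge to randomized KD-trees, which it does not cover. `chunk_size`, `search_subspace` and `knn_divide_and_merge` are hypothetical names. The slides do not say how the sub-space size is chosen, though sizing each sub-space to fit in the last-level cache is presumably the intent.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Feature = std::vector<float>;
using Neighbor = std::pair<float, std::size_t>;  // (squared distance, index in Fs)

static float squared_distance(const Feature& a, const Feature& b) {
    float sum = 0.0f;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const float diff = a[m] - b[m];
        sum += diff * diff;
    }
    return sum;
}

// Search one sub-space Fs[begin, end) and return its local k best, sorted.
static std::vector<Neighbor> search_subspace(const std::vector<Feature>& Fs,
                                             std::size_t begin, std::size_t end,
                                             const Feature& fq, std::size_t k) {
    std::vector<Neighbor> cand;
    for (std::size_t i = begin; i < end; ++i) {
        cand.emplace_back(squared_distance(Fs[i], fq), i);
    }
    const std::size_t kk = std::min(k, cand.size());
    std::partial_sort(cand.begin(), cand.begin() + static_cast<std::ptrdiff_t>(kk),
                      cand.end());
    cand.resize(kk);
    return cand;
}

// Divide Fs into sub-spaces of at most chunk_size features, search each one
// for every query, then merge the local lists into each query's global k best.
std::vector<std::vector<std::size_t>> knn_divide_and_merge(
        const std::vector<Feature>& Fs, const std::vector<Feature>& queries,
        std::size_t k, std::size_t chunk_size) {
    chunk_size = std::max<std::size_t>(chunk_size, 1);  // avoid an endless loop
    std::vector<std::vector<Neighbor>> merged(queries.size());
    // Divide: sub-spaces in the outer loop, so every query reuses a sub-space
    // while it is (ideally) still in cache.
    for (std::size_t begin = 0; begin < Fs.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, Fs.size());
        for (std::size_t q = 0; q < queries.size(); ++q) {
            const std::vector<Neighbor> local =
                search_subspace(Fs, begin, end, queries[q], k);
            merged[q].insert(merged[q].end(), local.begin(), local.end());
        }
    }
    // Merge: keep the k best candidates gathered from all sub-spaces.
    std::vector<std::vector<std::size_t>> result(queries.size());
    for (std::size_t q = 0; q < queries.size(); ++q) {
        std::vector<Neighbor>& cand = merged[q];
        const std::size_t kk = std::min(k, cand.size());
        std::partial_sort(cand.begin(), cand.begin() + static_cast<std::ptrdiff_t>(kk),
                          cand.end());
        for (std::size_t i = 0; i < kk; ++i) {
            result[q].push_back(cand[i].second);
        }
    }
    return result;
}
```

For exact search the merge loses nothing. Each of a query's global k nearest neighbours lies in some sub-space and must rank among that sub-space's local k best, so it survives into the merged list. A parallel version could distribute the queries (or the sub-spaces) across threads.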

Divide-and-Merge

Time complexity • Accurate algorithms: • Brute force: • Apply DM: • Approximate algorithms: • Randomized KD-Tree: • Apply DM:

Hardware and Software Configuration • Environment: • OpenCV + OpenMP: one of the most frequently used setups for computer vision researchers to exploit parallel platforms

Levels of parallelism

Scheduling policy(on level_1&2)

Scheduling policy(on level_3)

Memory architecture 1. Original Execution 2. Apply Divide-and-Merge

Evaluation on Manawatu Dataset

Conclusion • We have shown that performance tuning is demanding on modern multicore systems. • We have comprehensively evaluated the impact of three factors (levels of parallelism, scheduling policy, and memory architecture) that influence large-scale image feature matching. • We have proposed a Divide-and-Merge algorithm that can greatly improve the speedup and scalability of feature matching algorithms on multicore machines.
