280 likes | 378 Views
Performance Tuning on Multicore Systems for Feature Matching within Image Collections. Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang , Kai-Cheung Leung and Minyi Guo * Department of Computer Science University of Otago , New Zealand * Department of Computer Science
E N D
Performance Tuning on Multicore Systems forFeature Matching within Image Collections XiaoxinTang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung and MinyiGuo* Department of Computer Science University of Otago, New Zealand * Department of Computer Science Shanghai Jiao Tong University, China
Contents • Motivation • Our work • Evaluation • Conclusion
Contents • Motivation • Our work • Evaluation • Conclusion
Similarity Search • Definition: • To preprocess a database of N objects so that given a query object, one can effectively determine its nearest neighbors in database. • Applications: • pattern recognition, chemical similarity analysis, and statistical classification, etc.
The problem – KNN Search • K Nearest Neighbor Search: • Feature: an array of D elements • f = [e1] • Feature Space: a set of features • Fs= {f1} • Feature Similarity: Euclidean distance • =sqrt(Σ(fim-fjm)2) • Search: given a query feature fq, find k features in Fs so that they have the shortest distances to fq.
Our Case Study • Feature Matching: a fundamental problem in many computer vision tasks • Use the SIFT algorithm to generate features for each image; • Use a k-Nearest Neighbors (k-NN) algorithm to find similar features between images
Challenges • Very time-consuming: • datasets become larger: • hundreds or thousands of images; • image resolution increases: • 2300×1500 pixels, or higher; • New platforms: • HPC turns to multi-/many-core age: • AMD 16-core and 64-core machines.
Motivation • Performance evaluation: • Find out common problems that may limit the performance of feature matching on multi-/many-core platforms. • Performance tuning: • Find general methods to solve the identified problems.
Contents • Motivation • Our work • Evaluation • Conclusion
Problems • Unbalanced workload: • Levels of parallelism; • Scheduling policy. • Poor last-level cache utilization: • Memory architecture.
Level_1&2 Level_2 Level_3 Level_4 Level_1 Levels of parallelism Linear KD-tree Kmeans LSH Others ——————— …….. …….. Reference Images Features Query Images
Scheduling policy • OpenMP scheduling policy: • Static: the scheduler will assign an equal number of tasks to each thread (not used); • Dynamic: when one thread finishes its current task, it will take new tasks from the global task queue; • Guided: chunk size is adjusted dynamically when tasks are requested from the task queue.
Memory architecture • More cores are sharing the memory and last-level cache: • Memory bandwidth: • AMD 16-core 12.8 GB/s • AMD 64-core 25.6 GB/s • Last-level cache: • AMD 16-core 6 MB • AMD 64-core 16 MB • Large images may not fit in cache and will cause many memory accesses, which leads to hitting the memory wall.
Divide-and-Merge • We propose Divide-and-Merge: • Whole feature space is split into several smaller sub-spaces; • Search each sub-space independently; • Merge their results.
Time complexity • Accurate algorithms: • Brute force: • Apply DM: • Approximate algorithms: • Randomized KD-Tree: • Apply DM:
Contents • Motivation • Our work • Evaluation • Conclusion
Hardware and Software configuration • Environment: • OpenCV + OpenMP: one of the most frequently used setup for computer vision researchers to utilize parallel platforms
Memory architecture 1. Original Execution 2. Apply Divide-and-Merge
Contents • Motivation • Our work • Evaluation • Conclusion
Conclusion • We have shown that performance tuning is demanding on modern multicore systems. • We have comprehensively evaluated the impact of the three factors that have an influence on large-scale image feature matching. • We have proposed a Divide-and-Merge algorithm that can greatly improve the speedup and scalability of feature matching algorithms on multicore machines.