ECE 562 Computer Architecture and Design Project: Improving Feature Extraction Using SIFT on GPU

Rodrigo Savage, Wo-Tak Wu

Overview
• Application:
  • Object tracking in real time
• Challenges:
  • Static scene
  • Moving objects
  • Occlusion
  • Collision
  • Disappearance
  • Rotation
  • Scaling
• Divide and conquer:
  • Feature extraction and tracking
• Focus:
  • Feature extraction, using SIFT
  • Improving an existing GPU implementation

Scale Invariant Feature Transform (SIFT)
• Input: image
• Output: keypoints

GPU Implementation
• Selected the GPU implementation by Sinha et al. at UNC Chapel Hill
• Open-source SiftGPU available (latest version V4.00, Sept. 2012)
• SIFT is well suited to implementation on a GPU
  • Tens of thousands of threads each handle a subset of the data without communicating with one another
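
The last point, independent threads each working on their own subset of the data, can be illustrated with a small CPU-side sketch in Python. It is not taken from SiftGPU: `launch`, `brighten_kernel`, and `brighten` are hypothetical names, and the per-pixel operation is only a stand-in for real SIFT work. The sketch emulates a one-dimensional CUDA-style launch in which every (block, thread) pair derives its own index and writes only its own output element, so no communication between threads is needed.

```python
def launch(kernel, n_items, block_dim, *args):
    """Emulate a 1-D CUDA-style launch: run the kernel once per (block, thread) pair."""
    grid_dim = (n_items + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, n_items, *args)


def brighten_kernel(block_idx, block_dim, thread_idx, n_items, src, dst, gain):
    """Hypothetical per-pixel kernel: each thread reads and writes one element only."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i >= n_items:
        return  # threads past the end of the data do nothing
    dst[i] = min(255, src[i] * gain)


def brighten(src, gain, block_dim=4):
    """Apply the kernel to a flat list of pixel values."""
    dst = [0] * len(src)
    launch(brighten_kernel, len(src), block_dim, src, dst, gain)
    return dst


if __name__ == "__main__":
    print(brighten([10, 100, 200, 0, 50], 2))
```

Because each output element depends only on read-only input, the result does not depend on the block size or on the order in which the threads run. That independence is what allows the work to be spread over many GPU threads.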

Attempts to Speed Up
• Tackled the 2 most time-consuming processing steps
  • Blurring images with a Gaussian low-pass filter
    • Changed the pixel data access pattern
    • Tried different data partitioning schemes
  • Keypoint descriptor (128-element vector) calculations
    • Optimized the kernel code
      • Applied the usual optimization techniques
      • Changed GPU memory usage
    • Thread management
      • Experimented with kernel parameters
      • Maximized usage of available threads
Result: Reduced descriptor compute time from 73 ms to 22 ms (a reduction of about 70%)
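
For reference, the Gaussian blurring step can be sketched on the CPU in plain Python. This is a minimal illustration, not the SiftGPU kernel and not the access pattern used in this project (the slides do not specify it). It assumes a separable filter, clamped borders, and a default `sigma` of 1.6 (the base value in Lowe's paper); the function names are hypothetical. The vertical pass is done by transposing and reusing the row pass, which is one simple way to keep memory access row-wise in both passes.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)  # clamp to the row
                acc += w * image[y][xx]
            out[y][x] = acc
    return out


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=1.6, radius=None):
    """Separable blur: a horizontal pass, then a vertical pass via transposition."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumed cutoff of 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


if __name__ == "__main__":
    impulse = [[0.0] * 5 for _ in range(5)]
    impulse[2][2] = 1.0
    for row in gaussian_blur(impulse, sigma=1.0, radius=2):
        print(" ".join(f"{v:.3f}" for v in row))
```

Splitting the 2-D filter into two 1-D passes reduces the work per pixel from (2r + 1)² multiplications to 2(2r + 1), where r is the kernel radius, and each output pixel can still be computed independently of the others.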

Conclusion
• The existing implementation is already quite good
• It is hard to take full advantage of the architecture; doing so requires a good understanding of
  • Memory architecture
  • Thread usage
• The CUDA C/C++ compiler (nvcc) optimizes code in different ways; experimentation is needed to gain performance
• Code running on the GPU is hard to debug
• The Visual Profiler can provide valuable insights into code behavior

Backup Slides

References
• SiftGPU, available at http://cs.unc.edu/~ccwu/siftgpu/
• D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, November 2004.
• S. N. Sinha et al., “GPU-based Video Feature Tracking And Matching,” Technical Report TR 06-012, Department of Computer Science, UNC Chapel Hill, May 2006.
• NVIDIA GeForce GT 640M LE
  • CUDA Cores: 384
  • Total available graphics memory: 4095 MB

Test image with keypoints

Algorithm
