Implementing Memory & Run Time Efficient Image Texture Classification using NVIDIA GPU SHREYAS PARNERKAR
Motivation
• Texture analysis is important in many computer image analysis applications that classify or segment images based on local spatial variations of intensity or color.
• Applications include industrial and biomedical surface inspection (e.g., for defects and disease), segmentation of satellite or aerial imagery, and segmentation of textured regions in document analysis.
• Most texture classification methods derive features from the output of large filter banks (13–48-dimensional feature spaces).
Motivation
• Tuzel et al. use image intensities together with first- and second-order derivatives of intensity in both the x and y directions, yielding a 5-dimensional feature space for texture classification.
• These features are used to compute covariance matrices via integral images (P and Q).
• Computing the integral images is computationally intensive because of the highly nested loops involved (a serial reference sketch follows).
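As a point of reference, here is a minimal sketch of the serial recurrence commonly used to build an integral image (summed-area table). All names (integralImageCPU, feat, ii, W, H) are illustrative, not taken from the slides; the same pattern would be applied per feature channel for the P and Q images.

```cpp
// CPU reference for an integral image: ii(x, y) holds the sum of all
// feature values up to and including pixel (x, y).
void integralImageCPU(const float* feat, float* ii, int W, int H) {
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            float left   = (x > 0)          ? ii[y * W + (x - 1)]       : 0.0f;
            float up     = (y > 0)          ? ii[(y - 1) * W + x]       : 0.0f;
            float upleft = (x > 0 && y > 0) ? ii[(y - 1) * W + (x - 1)] : 0.0f;
            // Each cell depends on its left, upper, and upper-left
            // neighbors: this is what produces the dependence graph below.
            ii[y * W + x] = feat[y * W + x] + left + up - upleft;
        }
    }
}
```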
Dependence Graph
[Figure: dependence graph of the integral-image recurrence over the image grid, with axes labeled ROWS and COLUMNS]
GPU Utilization Concerns
• Such scheduling allows at most W or H elements to execute in parallel (a sketch follows below).
• At every other instant, the available parallelism is below that maximum.
• GPU utilization therefore drops, slowing execution because many threads sit idle.
• Such scheduling is hence a poor fit for a GPU implementation.
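The scheduling referenced above is presumably the anti-diagonal (wavefront) order implied by the dependence graph: all cells on one diagonal are mutually independent. A hypothetical CUDA sketch of that schedule (waveFrontStep, feat, ii are illustrative names) makes the utilization problem concrete: the d-th launch has only as many useful threads as the diagonal is long, ramping from 1 up to min(W, H) and back down.

```cpp
// One kernel launch per anti-diagonal d; cells (x, y) with x + y == d
// are independent because their left/up/up-left neighbors lie on
// diagonals d-1 and d-2, computed by earlier launches.
__global__ void waveFrontStep(const float* feat, float* ii,
                              int W, int H, int d) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int x = max(0, d - H + 1) + t;   // walk along the d-th anti-diagonal
    int y = d - x;
    if (x >= W || y < 0 || y >= H) return;
    float left   = (x > 0)          ? ii[y * W + (x - 1)]       : 0.0f;
    float up     = (y > 0)          ? ii[(y - 1) * W + x]       : 0.0f;
    float upleft = (x > 0 && y > 0) ? ii[(y - 1) * W + (x - 1)] : 0.0f;
    ii[y * W + x] = feat[y * W + x] + left + up - upleft;
}

// Host side: W + H - 1 launches in total, most of them underpopulated.
// for (int d = 0; d < W + H - 1; ++d)
//     waveFrontStep<<<blocks, threads>>>(feat, ii, W, H, d);
```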
Memory Concerns
• Shared memory is limited to 4 kB, so the entire image cannot be placed in shared memory.
• Global memory is slow compared to shared memory.
• Uploading the entire image into global memory may also interfere with the graphics display (unverified).
• Instead, stage just the required data in shared memory (see the sketch below).
• In the worst case, however, the required data can be the entire image.
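A generic sketch of the staging pattern the last two bullets describe, not the authors' actual kernel: each block copies one tile of the image into shared memory, sized so it fits the stated 4 kB budget. TILE, img, out, and processTile are illustrative names.

```cpp
#define TILE 32  // 32 * 32 * sizeof(float) = 4 kB, the stated limit

// Stage a TILE x TILE patch in shared memory, then work on it there
// instead of re-reading slow global memory.
__global__ void processTile(const float* img, float* out, int W, int H) {
    __shared__ float tile[TILE][TILE];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < W && y < H)
        tile[threadIdx.y][threadIdx.x] = img[y * W + x];  // one coalesced read
    __syncthreads();
    // ... per-tile computation on tile[][] would go here ...
    if (x < W && y < H)
        out[y * W + x] = tile[threadIdx.y][threadIdx.x];
}
```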
Updated Dependence Graph
[Figure: the 2-D dependence graph decomposed into a row-wise pass plus a column-wise pass, which combine to give the full computation]
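Reading the "rows + columns" figure as the standard separable decomposition (an assumption, since only the axis labels survive): the 2-D prefix sum splits into a pass over independent rows followed by a pass over independent columns, so H (then W) threads stay busy for an entire pass instead of a varying diagonal count. A minimal sketch, with illustrative names:

```cpp
// Pass 1: running sum along each row; every row is independent.
__global__ void rowScan(float* ii, int W, int H) {
    int y = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= H) return;
    for (int x = 1; x < W; ++x)
        ii[y * W + x] += ii[y * W + (x - 1)];
}

// Pass 2: running sum down each column; every column is independent.
__global__ void colScan(float* ii, int W, int H) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= W) return;
    for (int y = 1; y < H; ++y)
        ii[y * W + x] += ii[(y - 1) * W + x];
}

// Host side (assumed): copy the feature image into ii, then
// rowScan<<<...>>>(ii, W, H);  colScan<<<...>>>(ii, W, H);
```

Each per-thread scan is shown serially for clarity; it could itself be replaced by a parallel prefix sum for further speed-up.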
Results
[Figure: runtime results chart, with CPU over-head indicated]
In Conclusion…
• Implement parallel reduction for even more speed-up (in progress; a common form is sketched below).
• Use the computed P and Q integral images to calculate covariance (can be done on the CPU).
• Read data from actual images (currently, random sample data is generated).
• Compare memory usage for the CPU vs. GPU implementations.
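For reference, one common form of the parallel reduction mentioned above, sketched as a shared-memory tree sum (reduceSum, in, partial are illustrative names; launch with blockDim.x * sizeof(float) bytes of dynamic shared memory):

```cpp
// Each block reduces up to 2 * blockDim.x elements to one partial sum;
// a second launch (or a CPU pass) combines the per-block partials.
__global__ void reduceSum(const float* in, float* partial, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x * 2 + tid;
    float v = 0.0f;
    if (i < n)              v += in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];  // first add during load
    s[tid] = v;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];  // halve active threads
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];
}
```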