Implementation of String Match Algorithm BMH on GPU Using CUDA

Implementation of String Match Algorithm BMH on GPU Using CUDA. Authors: Junrui Zhou, Hong An, Xiaomei Li, Min Xu, and Wei Zhou. Publisher: ESEP 2011. Presenter: Yu Hao, Tseng. Date: 2013/7/31. Outline: Introduction, Related Work, Implementation on GPU using CUDA


Presentation Transcript


  1. Implementation of String Match Algorithm BMH on GPU Using CUDA • Author: Junrui Zhou, Hong An, Xiaomei Li, Min Xu, and Wei Zhou • Publisher: ESEP 2011 • Presenter: Yu Hao, Tseng • Date: 2013/7/31

  2. Outline • Introduction • Related Work • Implementation on GPU using CUDA • Experiment and Result • Conclusion

  3. Introduction • The Boyer-Moore-Horspool (BMH) algorithm was chosen for two reasons: it accesses global memory sequentially, which reduces memory-access overhead, and it is more efficient than some other string-matching algorithms. • To get good performance from an application on the GPU, the first things to consider are how to use the GPU's memory and how to restructure the algorithm.

  4. Related Work • BMH serial algorithm • Example: • Pattern: gcagagag • Shift Table:
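
The shift table for this slide did not survive in the transcript. As a minimal CPU-side sketch of the serial BMH algorithm (in Python, not the presenters' code), the table can be computed from the pattern alone. The function names `build_shift_table` and `bmh_search` and the sample text are assumptions made for illustration.

```python
def build_shift_table(pattern):
    """Horspool shift for each character that occurs in pattern[:-1].

    Characters that do not appear in the table shift by the full
    pattern length (handled by the caller).
    """
    m = len(pattern)
    shift = {}
    for i in range(m - 1):
        # Later occurrences overwrite earlier ones, so each character
        # keeps the distance from its rightmost occurrence to the end.
        shift[pattern[i]] = m - 1 - i
    return shift


def bmh_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the entry for the text character aligned with the
        # last pattern position; unknown characters shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

For the pattern gcagagag this gives shifts of 1 for a, 6 for c, and 2 for g, with every other character shifting by the pattern length, 8.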

  5. Implementation on GPU using CUDA • Store Strategy • Text • The pattern and skip arrays are transferred to constant memory on the GPU to reduce access latency.

  6. Implementation on GPU using CUDA (Cont.) • Kernel of BMH algorithm on GPU • SM_size = N / B_num + (M - 1) • T_size = SM_size / B_size + (M - 1)
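
The slide does not define the symbols in these formulas. The sketch below assumes N is the text length, M the pattern length, B_num the number of thread blocks, and B_size the number of threads per block, and that the divisions are integer divisions. It is a CPU-side Python illustration, not the kernel itself. `chunk_sizes` evaluates the two formulas, and `search_in_chunks` (a hypothetical helper using a naive comparison rather than BMH) shows why each piece is extended by M - 1 characters: a match that straddles a boundary is still seen in full by the piece where it starts.

```python
def chunk_sizes(n, m, b_num, b_size):
    """Evaluate the slide's formulas with integer division (an assumption)."""
    sm_size = n // b_num + (m - 1)        # characters handled per block
    t_size = sm_size // b_size + (m - 1)  # characters handled per thread
    return sm_size, t_size


def search_in_chunks(text, pattern, parts):
    """Search each overlapping piece independently and merge the results."""
    n, m = len(text), len(pattern)
    base = -(-n // parts)  # ceiling division so the pieces cover the text
    found = []
    for k in range(parts):
        start = k * base
        # Extend the piece by m - 1 characters into the next one.
        chunk = text[start:start + base + (m - 1)]
        for i in range(len(chunk) - m + 1):
            if chunk[i:i + m] == pattern:
                found.append(start + i)
    return found
```

Because each piece only reports matches that start within its own `base` characters, no match is reported twice even though the pieces overlap.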

  7. Implementation on GPU using CUDA (Cont.) • Bank-conflict free solution

  8. Implementation on GPU using CUDA (Cont.) • Global memory access optimization

  9. Implementation on GPU using CUDA (Cont.) • Global memory access optimization • Contiguous access • Non-contiguous access • [Figure: elements 1 to N moved between global memory and shared memory, shown once for contiguous access and once for non-contiguous access]

  10. Implementation on GPU using CUDA (Cont.) • Elimination of if-branch in kernel • When the threads of a half-warp take different paths at an if-branch, the GPU executes the divergent paths serially, one after another. This undermines the concurrency of the kernel.
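
The transcript does not show how the authors removed the branch. One common approach, sketched here in Python purely to illustrate the shape of the control flow, is to replace the conditional with arithmetic: accumulate the differences across the window and derive a 0/1 match flag from the result. The function names are hypothetical, and Python itself gains nothing from this; the point is that every position runs the same instruction sequence.

```python
def window_matches_branchless(text, pos, pattern):
    """Return 1 if pattern occurs at text[pos:], else 0, without an if."""
    diff = 0
    for j, ch in enumerate(pattern):
        # XOR is zero only for equal characters; OR keeps any mismatch.
        diff |= ord(text[pos + j]) ^ ord(ch)
    return int(diff == 0)


def match_flags(text, pattern):
    """One 0/1 flag per candidate start position."""
    last = len(text) - len(pattern)
    return [window_matches_branchless(text, pos, pattern)
            for pos in range(last + 1)]
```

Each position writes its flag unconditionally, so no position takes a different path from its neighbours.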

  11. Experiment and Result

  12. Experiment and Result (Cont.)

  13. Experiment and Result (Cont.)

  14. Experiment and Result (Cont.)

  15. Conclusion • The parallel implementation of the algorithms is at least 40 times faster than the serial implementation. • The hardware must be as fully utilized as possible.
